
OpenAI is training models to ‘confess’ when they lie – what it means for future AI


Image: antonioiacobelli/RooM via Getty Images



ZDNET’s key takeaways

  • OpenAI trained GPT-5 Thinking to confess to misbehavior.
  • It’s an early study, but it could lead to more trustworthy LLMs.
  • Models will often hallucinate or cheat due to mixed objectives.

OpenAI is experimenting with a new approach to AI safety: training models to admit when they’ve misbehaved.

In a study published Wednesday, researchers tasked a version of GPT-5 Thinking, the company’s…
