OpenAI is training models to ‘confess’ when they lie – what it means for future AI
ZDNET’s key takeaways
- OpenAI trained GPT-5 Thinking to confess to misbehavior.
- It’s an early study, but it could lead to more trustworthy LLMs.
- Models often hallucinate or cheat due to conflicting training objectives.
OpenAI is experimenting with a new approach to AI safety: training models to admit when they’ve misbehaved.
In a study published Wednesday, researchers tasked a version of GPT-5 Thinking, the company’s…