Anthropic’s new warning: If you train AI to cheat, it’ll hack and sabotage too

Image: JuSun/E+ via Getty

ZDNET’s key takeaways

  • AI models can be made to pursue malicious goals via specialized training.
  • Teaching AI models about reward hacking can lead them to other bad actions, such as sabotage.
  • A deeper problem may lie in AI personas.

Automatic code generation is one of the most popular applications of large language models (LLMs), such as the Claude family of LLMs from Anthropic, which uses these technologies in a…