Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI

Business Insider

Last year, Anthropic's Claude Sonnet 3.6 model exhibited blackmail behavior, prompting a review of how its training data influenced its actions.
