Simon Willison posted his thoughts on, and support for, Anthropic restricting Claude Mythos to security researchers. I have immense respect for Simon, for his history in our industry, for helping create Django, and for the work he’s doing to truly understand large language models and their impact on coding, but I wish he had approached Anthropic’s claims with even the slightest bit of skepticism. He even dismisses the need for skepticism outright:
Saying “our model is too dangerous to release” is a great way to build buzz around a new model, but in this case I expect their caution is warranted.
The “our model is too dangerous” thing is not just a way to build buzz; it’s Anthropic’s entire raison d'être. Dario Amodei in particular spends an enormous amount of time out here on the interwebs talking about AI danger. Here’s a video where he basically says that if we don’t stand in the gap to save humanity, no one will.
That’s a serious hero complex.
He’s not alone either. Here’s OpenAI CEO Sam Altman saying “there’s this thing coming, and the world’s not paying attention,” comparing the present moment in language models to the arrival of COVID. (Via Daring Fireball.)

When I was in my freshman-level college classes working on my liberal arts degree, one of the first things we were taught was to examine the source of a claim. If the person or organization behind a publication has an explicit bias or an outcome they desire, then you need to bring extra scrutiny to their claims. Anthropic and OpenAI don’t just want these doom-and-gloom scenarios to be true; they need them to be true. Otherwise, they’re just working on normal technology.