Anderycks.Net by Deryck Hodge

Follow up on Claude Mythos

I wrote previously about the need to take a more skeptical view of Anthropic’s claims about Claude Mythos, its new model that is being released only in preview mode with select Anthropic partners. We’re now starting to get some information from those partners. Results are about what I would have expected. They’re certainly not anywhere near the level of world-ending AI that the “our model is too dangerous to release” approach implies.

Simon Willison—a software developer and blogger whose recent focus has been on generative-AI-assisted coding—shared a couple of links on this topic, including a link to Our evaluation of Claude Mythos Preview’s cyber capabilities from the AI Security Institute. This is a research organization within the UK government's Department for Science, Innovation and Technology, so it’s an impartial evaluation I would trust.

The AISI describes their results:

Mythos Preview’s success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained. However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.

They conclude:

Our testing shows that Mythos Preview can exploit systems with weak security posture, and it is likely that more models with these capabilities will be developed. This highlights the importance of cybersecurity basics, such as regular application of security updates, robust access controls, security configuration, and comprehensive logging.

Again, this feels normal to me, not software-world shattering. It seems Claude Mythos is indeed better at finding and using exploits, but these tests are being done in simulated environments. Criminal hackers could almost certainly use it with some success, but it doesn’t seem likely to lead to anything approaching a shut-down-the-Internet level of security exploit.

I don’t mean to say we shouldn’t take these capabilities seriously. Of course, we should. This is impressive, technically speaking. Model capabilities are clearly improving, and software developers and technologists should understand the implications of those improvements. We should just be careful to not give Anthropic’s hyperbolic claims—claims meant to gin up fear, corporate interest, and spending on their services—any more weight than they deserve.