
Anthropic Accidentally Leaked Its Most Dangerous AI Model

Security researchers found nearly 3,000 unpublished Anthropic documents in a public data cache — including details on Claude Mythos, a new model with unprecedented cybersecurity capabilities.

Vlad Makarov
4 min read

Two security researchers went looking for exposed data. They found Anthropic's next frontier model.

Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge discovered nearly 3,000 unpublished Anthropic documents sitting in a publicly accessible, searchable data cache. Among them: a draft blog post describing Claude Mythos, a model the company calls "a step change" in AI performance and "the most capable we've built to date."

Anthropic didn't announce Mythos. It leaked it.

What the Documents Reveal

The draft materials introduce a new model tier called "Capybara" — positioned above Opus as larger and more intelligent. According to the leaked documents, Capybara "gets dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity" compared to Claude Opus 4.6, Anthropic's current flagship.

The cybersecurity angle is where things get uncomfortable. Anthropic's own assessment describes Claude Mythos as "currently far ahead of any other AI model in cyber capabilities." The company's internal language doesn't hedge: the model "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

This isn't a theoretical concern. Back in February, both OpenAI's GPT-5.3-Codex and Claude Opus 4.6 crossed cybersecurity capability thresholds that triggered internal reviews. Chinese state-sponsored hackers have previously attempted to exploit Claude for offensive operations. Mythos apparently makes those earlier models look tame by comparison.

How It Happened

After Fortune reported on the discovery, Anthropic removed public access to the data store. A spokesperson called it "human error" in CMS configuration — the kind of mundane explanation that somehow makes it worse. A company building what it considers the most dangerous AI model on Earth left its internal documents in an unsecured, publicly searchable location.

The leak also exposed plans for a CEO summit at an 18th-century manor in the UK, described as an "intimate gathering." The juxtaposition writes itself: one of the world's most consequential AI labs discussing world-altering technology in a setting straight out of a period drama.

Why This Matters

Anthropic's planned rollout strategy for Mythos is defensive, not offensive. The company intends to give early access to cyber defenders first, letting security teams use the model to find and patch vulnerabilities before a general release. The model is reportedly expensive to run and not ready for broad availability.

That strategy makes sense on paper. In practice, the details are already public thanks to the leak — which means every threat actor now knows what's coming and can prepare accordingly. So much for controlled disclosure.

The market reacted immediately. CNBC reported that cybersecurity stocks fell on the news, as investors processed the implications of AI that can outpace human defenders.

An Anthropic spokesperson confirmed the company is "developing a general purpose model with meaningful advances in reasoning, coding, and cybersecurity." The framing is careful — general purpose, not weapons-grade — but the leaked documents tell a blunter story.

What Comes Next

The Mythos leak lands in a week already thick with AI capability concerns. OpenAI recently published data showing its coding agents routinely circumvent restrictions. The latest ARC-AGI-3 benchmark results pushed the conversation about model capabilities further. And Anthropic itself just shipped computer use features that give Claude direct access to user machines.

Now we learn the company has something significantly more powerful waiting in the wings — and we only know about it because someone forgot to lock a door. The question Anthropic will need to answer isn't just whether Mythos is safe to release. It's whether a lab that can't secure a CMS should be trusted to secure a model it describes as unprecedented.
