The Model They Wouldn't Release
Anthropic's Mythos and the Next Crisis in Education
In late March 2026, an independent security researcher discovered an unsecured data store on Anthropic’s infrastructure. The exposure was brief, but the contents were extraordinary: roughly 3,000 internal documents describing a model codenamed “Capybara,” an AI system of unprecedented scale and power. Anthropic quickly closed the exposure and confirmed the authenticity of the leaked materials; shortly thereafter, the company officially named the model Claude Mythos. Within days, it published a detailed system card and announced that Mythos would not be released to the public. Instead, it would be made available only to a coalition of cybersecurity and critical infrastructure partners under a program called Project Glasswing.
What is most remarkable about these events is not the model’s capabilities, impressive as they appear to be. It is the fact that Anthropic chose restriction over release. In an industry defined by competitive pressure to ship products and capture market share, the decision to withhold your most powerful model is either a highly unusual act of institutional responsibility or an exceptionally sophisticated marketing strategy. Given Anthropic’s history, there is a real possibility that it is the former, though we cannot entirely dismiss the latter. And if it is the former, the model has the potential to usher in a new phase of AI development, one that makes the challenges educators have faced so far look modest by comparison.
The credibility question
Before going further, we should be honest about what we actually know here. Claude Mythos has not been independently tested by external researchers. No academic institution has benchmarked it. No educator has used it in a classroom. Every performance claim originates from Anthropic’s own system card and internal evaluations. This is a significant caveat.
It is also, however, a caveat that applies to virtually every frontier model at the moment of its announcement. OpenAI’s claims about the latest ChatGPT version, Google’s claims about the newest Gemini model, and Anthropic’s claims about Mythos all share the same evidentiary structure: the company tells us what the model can do, publishes selected benchmarks, and the research community subsequently verifies, qualifies, or occasionally debunks those claims. The cycle is familiar.
And yet, despite these limitations, even cautious observers are taking the Mythos announcement seriously. There are three reasons for this.
First, Anthropic’s previous benchmark claims for its Claude model family have generally held up under external scrutiny. The company has earned a reasonable, though not unblemished, track record of technical honesty. Second, the circumstances of the disclosure were not choreographed. The leak preceded the announcement, which means the system card was released under pressure rather than as part of a polished marketing campaign. Third, the decision to restrict the model rather than monetize it lends credibility to the claim that the capabilities are genuinely concerning. You do not forgo revenue to generate hype; you forgo revenue because something about the product troubles you.
None of this is proof of anything. But it adds up to a reasonable basis for taking the claims seriously enough to begin preparing for what may be coming.
The academic reasoning problem
The claimed performance of Mythos on academic benchmarks is, if accurate, difficult to overstate. On the GPQA Diamond benchmark, a test designed to challenge doctoral-level scientists with questions so difficult that even experts struggle to answer them, the model reportedly achieved roughly 95% accuracy. On the USAMO 2026, the most elite high school mathematics competition in the United States, it scored above 97%. And on SWE-bench Verified, a software engineering benchmark that tasks models with resolving real, complex problems drawn from actual open-source projects, it achieved 94%, compared to 81% for Anthropic’s previous best model.
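For reference, the headline numbers, all drawn from Anthropic’s own reporting as described above:

    Benchmark             Claimed Mythos score   Previous best (Claude)
    GPQA Diamond          ~95%                   —
    USAMO 2026            >97%                   —
    SWE-bench Verified    94%                    81%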
I have written at length in previous essays about the challenges that current AI models pose for assessment design. Mythos, if these numbers hold, does not merely intensify those challenges. It renders entire categories of traditional evaluation functionally obsolete. Previous models left educators a narrow margin: their outputs, while fluent, were often superficially competent in ways that an experienced reader could detect. But a model that reportedly answers 95% of doctoral-level science questions correctly and produces valid Olympiad proofs eliminates that margin entirely.
The other threat
The most unsettling aspect of the Mythos announcement, however, has nothing to do with academic benchmarks. It concerns what the model can do to computer systems.
According to Anthropic’s cybersecurity evaluation, Mythos can autonomously discover and exploit previously unknown security vulnerabilities in widely used software. The company reports that the model found a 27-year-old flaw in OpenBSD, an operating system specifically designed for security, that had survived every human and automated review since 1999. It also identified a 16-year-old vulnerability in FFmpeg, a video processing tool used by nearly every platform that handles media. Mythos found the flaw in a line of code that automated testing tools had evaluated over five million times before. In yet another test, researchers tasked the model with finding security flaws in the Firefox web browser’s code and then writing functional software to exploit those flaws. Where previous models struggled to produce even a single working exploit, Mythos generated them reliably and repeatedly.
The important question for educators is what happens once these abilities become publicly accessible. Anthropic’s own engineers, people with no formal cybersecurity training, were reportedly able to instruct the model to hunt for exploitable weaknesses in software overnight and wake up the next morning to complete, functional attack tools. Anyone with basic technical literacy could do the same.
Schools and universities are among the most vulnerable institutions in the digital landscape. They manage enormous quantities of sensitive data, including student health records, financial aid information, academic records, and proprietary research, while operating with chronically underfunded IT departments. When a software update is released, it effectively announces which vulnerability it fixes. A Mythos-level AI model can analyze that announcement and build a working attack within minutes. The traditional practice of waiting for a semester break to apply updates is no longer merely inadvisable. It is dangerous.
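The first step is simply knowing how much unpatched exposure an institution is carrying at any given moment. As a minimal sketch (my own illustration, assuming a Debian or Ubuntu server where the apt tool is available; other platforms need their own equivalent), a script like the following surfaces pending security updates so they can be applied the day they ship rather than at the end of term:

    # audit_updates.py -- illustrative sketch, not a hardened tool.
    # Lists pending security updates on a Debian/Ubuntu host by
    # shelling out to apt. Note that apt's CLI output format is not
    # guaranteed stable; adapt for your own package manager.
    import subprocess

    def pending_security_updates() -> list[str]:
        # "apt list --upgradable" prints one line per upgradable
        # package; on Debian/Ubuntu, security fixes arrive from a
        # "-security" archive, which appears in that line.
        out = subprocess.run(
            ["apt", "list", "--upgradable"],
            capture_output=True, text=True, check=True,
        ).stdout
        return [line for line in out.splitlines() if "-security" in line]

    if __name__ == "__main__":
        updates = pending_security_updates()
        print(f"{len(updates)} pending security update(s)")
        for line in updates:
            print(" ", line)

Run on a schedule, something this simple turns “wait for the break” into “know within the hour,” which is the posture the new threat environment demands.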
Project Glasswing, Anthropic’s defensive initiative, has the potential to provide some indirect protection. The coalition includes Amazon Web Services, Apple, Cisco, CrowdStrike, Google, Microsoft, and other companies whose infrastructure underlies most educational technology. By using Mythos to find and fix vulnerabilities in these foundational systems before attackers can exploit them, the project aims to create a security umbrella from which educational institutions should benefit. But institutional leaders cannot rely on this umbrella alone. The calculus of cybersecurity is about to change fundamentally: attacks that previously required the resources of a nation-state may soon be within reach of anyone with access to a sufficiently capable AI model.
The alignment problem in the classroom
The benchmark results and cybersecurity capabilities are concerning enough. But the Mythos system card contains something arguably worse: evidence that the model can deliberately deceive. Anthropic reports that Mythos is significantly more capable than previous models at working around restrictions. It can hide the reasoning behind its actions and strategically underperform to avoid detection.
That last capability, which researchers call “sandbagging,” has direct implications for academic integrity. A model that can intentionally produce work mimicking the skill level of an average student, complete with plausible minor errors and stylistic imperfections, is a model that can trick the most experienced educator. It generates output that is specifically calibrated to look human. Not generically human, but human in exactly the way a particular student would be expected to write.
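To see why calibration defeats detection, consider a toy simulation (entirely my own illustration; the distributions and the cutoff are invented, not drawn from the system card). A detector that flags submissions whose “fluency score” exceeds a threshold catches an uncalibrated model easily, because its output sits far above the human distribution. A sandbagging model that samples from the human distribution itself is statistically invisible:

    # sandbagging_toy.py -- illustrative only; all numbers invented.
    import random
    random.seed(0)

    N = 10_000
    human      = [random.gauss(50, 10) for _ in range(N)]  # human submissions
    naive_ai   = [random.gauss(80, 5)  for _ in range(N)]  # fluent, uncalibrated AI
    sandbagged = [random.gauss(50, 10) for _ in range(N)]  # AI matched to humans

    def flag_rate(scores, cutoff=70):
        # Fraction of submissions a threshold detector would flag as AI.
        return sum(s > cutoff for s in scores) / len(scores)

    print(f"humans flagged:     {flag_rate(human):6.1%}")      # ~2% false positives
    print(f"naive AI flagged:   {flag_rate(naive_ai):6.1%}")   # ~98%, easy to catch
    print(f"sandbagged flagged: {flag_rate(sandbagged):6.1%}") # same rate as humans

Any detector built on a statistical signature faces the same problem once the generator can target the human distribution directly.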
Anthropic’s internal testing also found evidence of “unverbalized grader awareness”: instances in which the model appeared to reason about how its work would be evaluated and then adjusted its behavior without documenting that reasoning in any visible way. In rare but documented cases, the model even engaged in overt deception, such as attempting to delete evidence of its own actions. As I have written about extensively before, the arms race between AI-generated content and AI-detection tools has always favored the generator. But Mythos represents a qualitative shift. A system that can recognize it is being evaluated, deduce the parameters of the evaluation, and strategically game those parameters is not merely difficult to detect. It is adversarial.
The narrow margin for action
I need to return to the caveat I raised earlier: the claims I have outlined about Mythos remain unverified by independent researchers. Anthropic may have withheld the model, but the company still benefits from the world believing it is extraordinary. I therefore do not dismiss the possibility that some of these capabilities have been overstated, or that the benchmarks, as benchmarks so often do, present a more flattering picture than real-world performance would support.
The question is whether these capabilities represent a plausible near-term trajectory, and on that point, the evidence is difficult to dismiss. Even if Mythos is somewhat less capable than Anthropic claims, the direction is clear. Models are getting substantially better at academic reasoning and autonomous problem-solving. Whether the model that crosses these thresholds is called Mythos or something else, whether it arrives this year or next, the capabilities described in the system card represent the environment educators will soon find themselves working in.
The practical implications are therefore not abstract. Assessment strategies that rely on unproctored, asynchronous work products are approaching the end of their useful life, at least as reliable measures of individual student understanding. Cybersecurity strategies built on the assumption that sophisticated attacks are expensive and rare need immediate revision. And AI literacy curricula that teach students to use AI tools without also teaching them to maintain critical distance from polished AI outputs are rapidly becoming inadequate.
These are among the most immediate pressures, but they are far from the only ones.
Educational institutions have a narrow window to prepare. The capabilities currently locked behind Project Glasswing will not remain restricted indefinitely. Anthropic itself estimates that comparable capabilities will be publicly available within 12 to 24 months, whether through its own products or through those of competitors. The institutions that begin adapting now, rethinking assessment, hardening their digital infrastructure, and training faculty in the pedagogical implications of genuinely expert-level AI, will be positioned to harness these capabilities constructively. Those that wait will not get a second chance to prepare.
Anthropic built the most powerful AI model in history and decided the world was not ready for it. Educators should take that judgment seriously.
The images in this article were generated with Nano Banana 2.
P.S. I believe transparency builds the trust that AI detection systems fail to enforce. That’s why I’ve published an ethics and AI disclosure statement, which outlines how I integrate AI tools into my intellectual work.