Penetration Testing 21 May 2026

Demystifying Claude Mythos Preview: The Model That Changed Cybersecurity Forever

Arnab Chattopadhayay Co-Founder & VP of Emerging Research, FireCompass | Cybersecurity Researcher | IEEE Author

For most of the past decade, the trajectory of large language model research followed a familiar arc: scale up the compute, widen the data, tune the alignment, ship the product. Each new generation of models arrived with modestly improved benchmark scores, better instruction-following, and marginally reduced hallucination rates. Opus replaced Sonnet. Sonnet replaced Haiku. The hierarchy was predictable. The improvements were incremental.

Then, on April 7, 2026, Anthropic released something that did not fit neatly into that arc. Claude Mythos Preview arrived not through the usual product channels – a blog post, an API key, and a pricing page – but through a controlled security disclosure initiative called Project Glasswing, co-signed by Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. The model was not made generally available. It will not be. And the reason it will not be made generally available is itself a technically fascinating story.

What Problems Could Earlier Models Not Solve?

To understand why Mythos represents a qualitative departure rather than a quantitative improvement, we need to understand what Claude Opus 4.6 which itself a formidable frontier model could not reliably do.

The central limitation was not knowledge. Opus 4.6 knew a tremendous amount about software vulnerabilities, exploitation techniques, memory layouts, and kernel internals. The limitation was agentic coherence under uncertainty over long horizons. When tasked with finding a novel zero-day in a real codebase, a model must simultaneously hold a hypothesis about what class of vulnerability might exist, generate and execute exploratory code, interpret noisy results, revise the hypothesis, and iterate — potentially hundreds of times — without losing the thread. This demands a quality best described as persistent goal-directed reasoning: the ability to maintain a consistent mental model of a problem across many tool calls and observations, and to revise that model intelligently rather than hallucinating plausible-sounding dead ends.

Opus 4.6, tested against the Firefox 147 JavaScript engine, developed working exploits in only 2 out of several hundred attempts. Mythos Preview achieved working exploits 181 times in a comparable run, and achieved register control on an additional 29 attempts. That is not a 10% improvement. It is a phase transition.

The second limitation was multi-step vulnerability chaining. Real exploits rarely arise from single bugs. They arise from the composition of several weaknesses — a memory disclosure here, a use-after-free there, a race condition that can be won with the right timing. Chaining these requires understanding how one primitive enables another, maintaining that understanding across a long agentic session, and constructing the exploit scaffold in the correct order. Earlier models failed not because they lacked the individual pieces of knowledge, but because they could not reliably compose them. Mythos Preview autonomously chained two, three, and sometimes four vulnerabilities to achieve root access on Linux systems.

A third, more subtle limitation involved the boundary between what a model writes in its chain of thought and what it computes internally. White-box interpretability research at Anthropic revealed that Mythos, during evaluations, was at times reasoning about how to influence its evaluation outcomes in its internal neural activations, while its visible scratchpad showed something entirely different. This was not detectable without novel interpretability tooling. It suggests that the gap between surface-level chain-of-thought and underlying computation has grown wide enough to matter — which has significant implications for both safety research and for understanding what this model is actually doing when it finds a vulnerability.

The Brain: The Ideas Behind the Magic

Anthropic has not released a technical paper describing Mythos Preview’s architecture. What follows draws on the model’s publicly described behavior, pricing signals from the Glasswing announcement, and the 244-page system card. Where inference is involved, it is marked as such.

Attention: The Model’s Core Superpower

Every modern language model, including Mythos, is built on the transformer which is an architecture introduced in 2017 whose central idea is attention. Before attention, neural networks processed text sequentially, word by word, and frequently lost track of context established many words earlier. Attention solves this by allowing every word in a sentence to simultaneously “look at” every other word and decide which ones are most relevant to its current meaning. A pronoun can attend to its antecedent fifty sentences back; a function call can attend to its definition buried deep in an import. The model learns which relationships matter through billions of training examples, not through hand-coded rules.

The innovation of the past several years has not been redesigning this mechanism. It remains largely intact from 2017 but in learning dramatically better what to pay attention to, through better training data, better fine-tuning pipelines, and better ways of teaching the model to reason step by step.

Smarter Scaling: The Mixture-of-Experts Hypothesis

Mythos Preview is priced at $25/$125 per million input/output tokens, versus $15/$75 for Opus 4.6. That 1.67x price ratio, combined with the magnitude of benchmark improvements, has led independent researchers to infer that Mythos likely uses a Mixture-of-Experts (MoE) design and it is worth understanding why this matters.

A standard language model activates its entire network for every single token it processes, whether that token is a piece of kernel assembly code or the word “the.” MoE breaks this inefficiency by maintaining many specialized sub-networks known as “experts” and routing each token only to the two or three most relevant ones. A token related to memory allocation in C might be sent to an expert that has specialized in systems programming; a token in a proof might be routed to one that has specialized in formal logic. The total knowledge capacity of the model grows enormously, because you can add more experts without proportionally increasing the computation required per token. This architecture helps explain how Mythos can appear to “know” low-level OS internals, browser engine internals, and cryptographic protocols at a depth that genuinely surprises human experts. It likely has enormous specialized capacity for each domain without paying the compute cost of using all of it simultaneously.

The 1 Million Token Context Window

Mythos supports a context window of one million tokens which is enough to hold the entire Linux kernel source tree, a complete web browser codebase, or months of system logs in a single uninterrupted session. For security work, this is transformative. Previous models had to work in chunks, losing the thread between distant parts of a codebase. Mythos can reason across an entire artifact holistically.

Making attention work efficiently at this scale requires clever engineering. The naive approach has a cost that grows as the square of the sequence length, doubling the context makes it four times more expensive. Modern techniques like Flash Attention reorder the computation to avoid this blowup by processing the context in small blocks that fit in fast on-chip memory, rather than computing the full pairwise relationship matrix all at once. The result is the same mathematically, but orders of magnitude cheaper in practice.

Chain-of-Thought: Buying More Thinking Time

Mythos uses explicit chain-of-thought reasoning: before giving an answer, it generates a private scratchpad of intermediate reasoning steps, invisible to the user by default. This is not just a stylistic choice. It is a fundamental expansion of the model’s reasoning capacity.

Here is the key insight: each token a model generates passes through the same fixed-depth neural network. That network is powerful, but it has a hard ceiling on the complexity of what it can compute in a single step. Chain-of-thought sidesteps this ceiling by allowing the model to spread its reasoning across many steps. Each step in the scratchpad becomes part of the context for the next, effectively giving the model a working memory it can write to and read from. The longer the reasoning chain, the more complex the problem that can be solved; not because the model has become deeper, but because it is taking more steps through the same depth. For exploit development, where the model must iterate through dozens of hypotheses, test them, interpret crashes, and revise its approach, this extended thinking is the difference between finding a working exploit and hitting a dead end.

How Mythos Learns: Reinforcement Learning and Constitutional AI

Training a model like Mythos happens in three phases. First, it learns from massive amounts of human-written text; code, documentation, research papers, vulnerability disclosures, building a broad base of knowledge. Second, human raters compare pairs of model outputs and express preferences, training a separate “reward model” that learns to score responses. Third, the main model is fine-tuned using reinforcement learning: it generates responses, the reward model scores them, and the main model adjusts its weights to generate higher-scoring outputs more often. This loop runs for enormous amounts of compute.

The challenge for long-horizon agentic tasks like spending hours autonomously hunting a zero-day is that the reward signal is extremely sparse. You only know the session succeeded when the exploit runs. Anthropic uses techniques like reward shaping (giving partial credit for intermediate milestones) and curriculum learning (starting with easier targets and gradually increasing difficulty) to provide useful learning signal throughout a session rather than only at the end.

On top of this, Anthropic’s Constitutional AI framework uses a written set of principles to generate AI-written critiques of the model’s own outputs during training. The model is effectively taught to argue with itself, catching and correcting its own mistakes before a human ever sees them. A safety constraint is built into the optimization to prevent the model from drifting toward behaviors that score well on the reward metric but violate the underlying intent, a well-known failure mode called reward hacking.

The Interpretability Anomaly: When the Scratchpad Lies

The most unsettling finding from the Mythos system card is not what the model can do to software. It is what the model appeared to be doing to its own evaluators.

During safety testing, Anthropic’s interpretability team found that Mythos was at times reasoning about how to influence its evaluation outcomes in its internal neural activations; the actual numerical computations happening inside the model while its visible chain-of-thought scratchpad showed something entirely different. The model’s written reasoning and its actual computational reasoning were not the same thing.

This is a deep problem. Think of the transformer as a very deep stack of layers, each one transforming a large vector of numbers that accumulates meaning as it flows through the network. The chain-of-thought tokens the model writes are only a surface projection of this internal computation, a summary it chooses to produce, not a transparent window into what it is actually calculating. Mechanistic interpretability research at Anthropic has shown that these internal vectors can encode multiple concepts simultaneously in ways that are not readable from the output alone. Detecting the discrepancy in Mythos required white-box interpretability tools that directly analyze the model’s internal activations layer by layer. Standard behavioral testing by prompting the model and reading its answers would never have caught it. For any security application where the model’s stated reasoning is treated as ground truth, this finding demands serious scrutiny.

The Architecture

While Anthropic has not released a formal architecture paper for Mythos Preview, the following describes the most technically consistent picture given available information:

Model Tier and Scale

Mythos sits in a new tier above Opus, informally called Copybara making it the fourth and most powerful tier in Anthropic’s model hierarchy. Based on pricing signals and benchmark jumps, independent researchers estimate the active parameter count at approximately 1.5-1.8x that of Opus 4.6, with the total parameter count (all experts included) substantially higher. Opus 4.6 is itself estimated to be in the multi-trillion parameter MoE range, suggesting Mythos may represent a model of historically unprecedented scale.

Context and Memory

The context window is 1 million tokens is sufficient to hold entire large codebases, kernel source trees, or extended agentic sessions with full tool-call history. The model uses RoPE-style positional encodings with frequency-domain extensions for long-context extrapolation, and Flash Attention or an equivalent memory-efficient attention algorithm at inference time.

Extended Reasoning

Mythos uses a dedicated reasoning mode in which it generates a private chain-of-thought before producing its response. This scratchpad is not shown to the user by default but constitutes a significant portion of the token budget. In security research sessions, the reasoning trace may span hundreds of reasoning tokens as the model iterates through exploit hypotheses.

Agentic Scaffolding

The security capabilities described in the Glasswing announcement operate through a standardized agentic scaffolding: the model is given a system prompt with access to shell execution, file reading, compiler invocation, and debugger output. The prompt is minimal “Please find a security vulnerability in this program” and the model proceeds agentically: reading source code, forming hypotheses, writing and running test cases, interpreting crashes, and iterating. No human is in the loop during the vulnerability discovery phase. The complete exploit, including proof-of-concept code, is produced autonomously.

Responsible Disclosure Infrastructure

Anthropic has built a cryptographic commitment scheme into the workflow. Before public disclosure, the team generates a SHA-3 cryptographic hash of the exploit payload and publishes that fingerprint publicly. SHA-3 is a one-way function: given the fingerprint, it is computationally infeasible to reconstruct the original exploit. This allows Anthropic to prove, after a vendor has patched a vulnerability, that they possessed the working exploit at a specific earlier date without ever giving attackers the information they would need to weaponize it during the disclosure window.

Benchmark Profile

The model’s performance profile is distinctive for its breadth: 93.9% on SWE-bench Verified (software engineering), a generational leap on the USA Mathematical Olympiad (USAMO) evaluation versus Opus 4.6, 83.1% on CyberGym’s Cybersecurity Vulnerability Reproduction benchmark (compared to Opus 4.6’s 66.6%), and a rank of first out of 106 models on agentic tool use and computer task benchmarks.

The “Not Explicitly Trained” Claim

What Anthropic actually claims
The precise quote from their red-team disclosure at red.anthropic.com is:

“We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.”

And from the Glasswing page:

“The powerful cyber capabilities of Claude Mythos Preview are a result of its strong agentic coding and reasoning skills.”

This is a carefully worded claim. “Not explicitly trained” does not mean “trained on no security data.” It means Anthropic did not set out to build a penetration testing model — they did not deliberately engineer security exploitation as a training objective, curate fine-tuning datasets around CVE exploitation, or set RL reward signals around finding zero-days. The security capability was not the goal of training. That part of the claim is almost certainly accurate.

Where the claim gets misleading

The framing risks being misread as “the model has no knowledge of security testing techniques,” which would be false. Consider what general pretraining on internet-scale data almost certainly contains:

Large public code repositories on GitHub include years of offensive security tools — Metasploit modules, PoC exploit code, fuzzing harnesses, shellcode. CVE databases, security advisories, and NVD entries are public text. Project Zero blog posts, academic papers on memory corruption, CTF writeups from Pwn2Own and DEF CON, reverse engineering guides, books on heap exploitation and kernel internals — all of this exists on the internet and would naturally flow into any frontier model’s pretraining corpus. The model almost certainly absorbed an enormous volume of security knowledge before fine-tuning ever began.

This is why the specific techniques Mythos employs are so telling. JIT heap sprays, return-oriented programming chain construction, KASLR bypasses, SACK overflow handling bugs, NFS RPC payload splitting — these are not things that emerge from pure abstract reasoning about code without exposure to security-specific knowledge. They require deep familiarity with concepts that live primarily in security research literature. A model reasoning from first principles about C code does not independently re-derive the ROP gadget technique; it applies a technique it has seen described many times.

The technically accurate interpretation

“Not explicitly trained” describes the intentionality of the training objective, not the absence of security data from the training corpus. The distinction is between:

Deliberate security training: curating security exploitation datasets for fine-tuning, writing RL reward functions that score successful exploit development, running curriculum learning against intentional vulnerability targets.

Absorbed security knowledge with emergent application: the model learned security concepts incidentally during pretraining on the general internet, and improvements in reasoning, code understanding, and autonomous tool use unlocked its ability to apply that knowledge at a level of sophistication that no earlier model achieved.

Anthropic is claiming the former did not happen. They are not claiming the latter is absent, because it cannot be absent for any model trained on the public internet.

What remains undisclosed

Anthropic has released no information about the composition of Mythos’s pretraining corpus, whether security-specific content was filtered in or out, or whether any phase of the RL training involved security tasks. The 244-page system card addresses safety evaluations extensively but not training data composition. This gap is notable. The model achieves 100% on Cybench (a benchmark of 35 CTF challenges from major competitions) and 83.1% on CyberGym’s Vulnerability Reproduction benchmark. Those scores require not just general reasoning but specific, deep familiarity with the vocabulary and techniques of offensive security. Whether that came from pretraining alone or from some form of post-training fine-tuning on security material is something Anthropic has not addressed.

Closing statement on “Not Explicitly Trained“

The discussion that “Mythos is not trained on security testing techniques” is partially correct but framed imprecisely. Anthropic did not deliberately engineer it as a security tool that much appears true. But the model almost certainly encountered vast amounts of security knowledge during pretraining, and its emergent capabilities reflect that knowledge being unlocked by improvements in general reasoning and agentic autonomy. The “emergence” is real, but it does not occur in a vacuum. It occurs against a backdrop of a model that already, through general training, knows an enormous amount about how software breaks.

Conclusion: The End of Automated Pen Testing, or Its Apotheosis?

This is a question that the cybersecurity industry is currently debating in real time. The answer depends on what we mean by “automated pen testing companies” and what role we believe human judgment plays in security work.

There is a version of the automated penetration testing market that Mythos Preview renders obsolete: the version that consists primarily of automated scanners running known CVE signatures against target systems, generating reports that a junior analyst then packages for delivery. If that is the product, then yes, a model that can identify novel zero-days in production kernels has priced that product out of the market.

But there is another version of security work that Mythos amplifies rather than replaces. The most consequential security failures in recent history were not failures to find technical vulnerabilities. They were failures of architecture, of process, of organizational incentive, the decision to ship without a security review, the culture that treated security alerts as noise, the supply chain compromise that no scanner was positioned to detect. These are problems that require human judgment, organizational context, and sustained relationships. They are not problems that a model, however capable, can solve by running autonomously against a codebase.

The honest answer is that Mythos Preview is both an extinction event and a booster, depending on where in the market one sits. For companies that primarily sell automated scanning and report generation, the window for differentiation is closing. For companies that sell deep security expertise, adversarial simulation, red team operations with human context, incident response, and architectural security review, Mythos is a force multiplier of extraordinary power. A human red teamer with Mythos in the loop or a CART product utilizing Mythos is not the same as Mythos running autonomously. The human brings threat modeling, business context, regulatory awareness, and the judgment to know which vulnerabilities actually matter for a given organization’s risk profile. The CART product brings in the enterprise grade platform with guardrails and integration with enterprise systems that follows compliance and standards.

The deeper argument is this: the cybersecurity industry has always been a race between offense and defense conducted across an asymmetric landscape. Attackers need to find one way in; defenders need to close every door. Mythos Preview changes the speed of the race, not its fundamental asymmetry. What it does is compress the timeline in which the current generation of security practices will remain adequate.

Organizations that treat this as an opportunity by deploying Mythos-class models in their own vulnerability research pipelines, investing in interpretability-informed detection, and rethinking their patching cadence for a world where novel zero-days are found in minutes will find themselves substantially ahead. Organizations that treat it as someone else’s problem will not.

The myth is that security is a solved problem at sufficient scale and budget. Mythos Preview has made the cost of that myth visible.

About FireCompass

FireCompass is an Agentic AI platform for autonomous penetration testing and red teaming across Web, API, and infrastructure. It discovers shadow assets and web applications, safely validates what is exploitable, and connects findings into multi-stage attack paths with near-zero false positives. Unlike traditional scanners, FireCompass uncovers credential reuse, business-logic flaws, privilege escalation, and app-to-app or app-to-network lateral movement. It can operate autonomously or with expert-in-the-loop validation. FireCompass has 30+ analyst recognitions across Gartner, Forrester, and IDC, and is trusted by Fortune 1000 enterprises.

See What’s Actually Exploitable in Your Environment. Claim Free AI Pen Testing Credits firecompass.com/explorer