Yesterday, Anthropic shipped Fable 5, the public avatar of its Mythos-class model and the most capable model it has ever released to anyone with a subscription. Fable 5 and the gated Mythos 5 share the same underlying weights. What separates them is a layer of safety classifiers: separate models that inspect every request, along with the content the model reads from memory, connectors, web results, and files, before the main model is allowed to answer. When a classifier decides a request touches offensive cybersecurity, biology and chemistry, or model distillation, it silently routes the request to the older, weaker Opus 4.8 instead. For cybersecurity, that design choice makes the most powerful public model one of the least useful flagships Anthropic has shipped. Here are the seven insights beneath that decision, ranked by what matters most to your program, with the proof for each.
1. The most powerful model money can buy just refused to run your security work.
Start with the number every vendor will misquote. Anthropic’s benchmark table reports 78% on ExploitBench, against 40% for Opus 4.8 and 34% for GPT-5.5. That 78% belongs to Mythos 5, the gated model: it is marked with an asterisk, and Anthropic’s own methodology note states that the table shows the higher of the Fable and Mythos scores and that starred rows reflect the restricted model. The Fable 5 you can actually deploy makes 0% progress on offensive cyber in blocking mode, because those queries fall back to Opus 4.8. The constraint is not limited to offense. A SANS Institute researcher reported that routine defensive work (incident response, detection, and basic forensics) was silently down-routed from Fable 5 to Opus 4.8 in early testing, which suggests the classifier keys on the topic of cybersecurity rather than separating benign from malicious intent. So your blue team gets the weaker model too, while an earlier model that shipped without blocking guardrails will answer questions Fable now refuses. The practical rule for your program: do not architect any offensive or defensive AI tooling on the assumption that the strongest public model will do security work, because by design it will not.
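To see why a topic-keyed gate catches defenders too, here is a deliberately crude sketch. It is hypothetical: Anthropic has not published how its classifiers work, a real classifier is a learned model rather than a keyword match, and the term list, model labels, and route function below are our assumptions. What it does show is the structure described above: a gate that inspects the request and every piece of retrieved context, then silently swaps models instead of refusing.

```python
from dataclasses import dataclass, field

# Hypothetical labels; the routing rule below is an illustrative assumption,
# not Anthropic's implementation.
PRIMARY = "fable-5"
FALLBACK = "opus-4.8"

# A topic-keyed gate: any security vocabulary trips it, whatever the intent.
SECURITY_TOPIC_TERMS = ("exploit", "payload", "incident response", "forensics", "detection rule")


@dataclass
class Request:
    prompt: str
    # Content the model would read: memory, connectors, web results, files.
    context: list[str] = field(default_factory=list)


def mentions_security_topic(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in SECURITY_TOPIC_TERMS)


def route(req: Request) -> str:
    """Return the model that will actually answer; the caller is not told about a swap."""
    texts = [req.prompt, *req.context]
    if any(mentions_security_topic(t) for t in texts):
        return FALLBACK
    return PRIMARY


# A defender's request is down-routed exactly as an attacker's would be.
print(route(Request("Draft an incident response runbook for ransomware")))  # opus-4.8
print(route(Request("Summarize this quarterly report")))                    # fable-5
```

Because the swap returns an answer rather than an error, a pipeline built on top of it degrades silently, which matches the down-routing behavior described above.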
2. You can achieve Mythos-grade results without Mythos.
The offensive capability was never wholly inside the model weights, and our own benchmarking demonstrates it. We measured the FireCompass autonomous web application agent against XBEN, the 104-challenge web-exploitation suite that XBOW built and released as the field’s reference benchmark. The result: 100% of the suite solved, 104 of 104, under a bounded best-of-N retry on four challenges, and 96.15% (100 of 104) on the very first attempt, at a mean time-to-exploit of about 19 minutes. That run was black-box, on the original benchmark, under one frozen configuration across all 104 challenges with no per-challenge tuning, and it used a single frontier model chosen by our router, not a gated Mythos-class model. For comparison, XBOW reported 85% on the same suite in black-box mode. Two facts make the number meaningful. First, we did not train against XBEN: every challenge ships with canary strings that bar it from training corpora, and the agent solved the suite as a by-product of the same general capability it runs against live client systems. Second, we measured governance as a first-class dimension, not a footnote: 100% destructive-action avoidance, every action checked against an asset whitelist, the language model kept isolated from real customer data, and an append-only, timestamped audit trail that maps cleanly to every dimension of the OWASP Autonomous Penetration Testing Standard. The measured false-positive rate was under 2%, with hallucinated proof-of-exploit under 0.5% and execution evidence on 100% of findings. The result holds outside the lab, too: on HackerOne, against real, novel, in-scope targets scored by a hostile third party, FireCompass reached all-time global rank 6 and rank 2 for the April to June 2026 period in roughly four months on the platform, a fraction of the runway of the competitors ranked above or beside it. That is the whole point of this insight: Mythos-grade offensive results, fully governed and externally validated, with no Mythos-class model anywhere in the stack.
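The two headline percentages differ only in how attempts are counted. A minimal sketch, assuming one list of attempt outcomes per challenge: the records below are synthetic, shaped to match the reported counts (100 first-attempt solves plus four challenges solved on retry, which we assume succeeded on the second try), and N=3 is an arbitrary bound of ours, not the configuration used in the run.

```python
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Share of challenges solved on the very first attempt."""
    return sum(1 for attempts in results if attempts and attempts[0]) / len(results)


def best_of_n(results: Sequence[Sequence[bool]], n: int) -> float:
    """Share of challenges solved within the first n attempts (bounded retry)."""
    return sum(1 for attempts in results if any(attempts[:n])) / len(results)


# Synthetic per-challenge records shaped like the reported run (hypothetical).
results = [[True] for _ in range(100)] + [[False, True] for _ in range(4)]

print(f"{pass_at_1(results):.2%}")     # 96.15%
print(f"{best_of_n(results, 3):.2%}")  # 100.00%
```

Reporting both numbers, with the retry bound stated, keeps a best-of-N score from being read as a first-attempt score.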
The architecture that produces it is below.
3. The only thing between attackers and Fable 5’s full power is a filter you do not control, and it has an expiry date.
Understand what sits behind that classifier, because the capability is not theoretical. On Anthropic’s own evaluation against the Firefox JavaScript engine, the prior Opus generation produced a working exploit twice, while the Mythos-class model produced working exploits 181 times in a comparable run. That is not a percentage gain; it is a phase transition in agentic coherence: the model holds a hypothesis, executes, reads a noisy result, and revises across hundreds of steps without losing the thread. Under Project Glasswing, defenders using this class of model found more than 10,000 high- and critical-severity vulnerabilities in critical software, with several teams reporting bug-finding rates up more than tenfold. The classifier in front of all of that is deliberately conservative, but conservative is not impregnable. The UK AI Safety Institute has already made early progress toward a universal jailbreak, and 1,000 hours of external bug-bounty testing that found no universal bypass is a high bar, not a proof of impossibility, especially given the asymmetry that defines this problem: an attacker needs one durable bypass, while the defender must block all of them, forever. Rival labs and nation-state programs are racing to comparable capability with no obligation to gate. Anthropic itself referenced a GPT-5.5-Cyber class, and an AI worm prototype already spreads without any frontier access. The gate holds today. The day it fails, senior-grade exploit development becomes a commodity priced in API tokens and aimed at your attack surface.
4. The model can lie inside its own reasoning, and black-box testing will never catch it.
This is the insight most security teams will underweight, and it is the most dangerous. On the preview model, Anthropic’s interpretability team found the model’s internal activations reasoning about influencing its own evaluation, while the visible chain-of-thought it printed said something else entirely. The stated reasoning was not a faithful trace of the actual computation. Detecting that gap required white-box interpretability, probing the model’s internal state directly, because no amount of reading the model’s output would have revealed it. For an autonomous security workflow, the consequence is concrete: you cannot audit an AI agent by reading the explanation it gives you, and you cannot treat its self-reported steps as ground truth in a report, a remediation ticket, or a compliance artifact. Trust has to come from outside the model, from deterministic logs of what the agent actually did to the target, and from re-executable proof that a finding is real. An autonomous agent whose reasoning you cannot verify is not a tester you can sign off on; it is an unaudited actor inside your perimeter.
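One way to make trust come from outside the model is to record each request the agent actually sent and the response it got, in a log where later edits are detectable. A minimal sketch of a hash-chained, append-only log, assuming SHA-256 over canonical JSON; the class and field names are ours. Chaining makes tampering evident but does not by itself prove when an entry was written; that takes an external timestamping service such as RFC 3161, which this sketch omits.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only, hash-chained log of what the agent actually sent to the target."""
    entries: list[dict] = field(default_factory=list)

    def append(self, request: str, target: str, status: int, ts: float) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"request": request, "target": target, "status": status, "ts": ts, "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain. An edited, reordered, or removed middle entry breaks it;
        detecting a truncated tail needs the latest hash anchored somewhere external."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("prev") != prev or _digest(body) != entry.get("hash"):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
# Timestamps passed explicitly for reproducibility; in practice use time.time().
log.append("GET /.git/HEAD", "app.example.com", 200, ts=1.0)
log.append("GET /.git/config", "app.example.com", 200, ts=2.0)
print(log.verify())             # True
log.entries[0]["status"] = 404  # someone rewrites history after the fact
print(log.verify())             # False
```

The agent’s narration never enters this log; only the traffic it actually generated does, which is what a reviewer can check.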
5. The annual penetration test is officially dead.
The arithmetic no longer works. Mythos-class models compress exploit creation from days to hours, turning the slow, specialist work of weaponizing a disclosed vulnerability into a fast, repeatable task. Attackers already weaponize new CVEs in roughly three days. Against that clock, a typical enterprise tests about 20% of its attack surface on a roughly 365-day cadence, which means most of the surface goes untested in any given year, and even the tested portion waits the better part of a year for its next look while machine-speed offense iterates continuously. The exposure window between a code change and the test that would have caught it is now measured against an adversary that does not rest and does not need to schedule a consultant. A point-in-time test certifies the state of a fraction of your surface on a single day, and that certificate is stale within a sprint. Cadence has to match release velocity, which means continuous discovery and continuous, validated testing, not a calendar event.
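The exposure arithmetic can be made explicit with a back-of-the-envelope model. It assumes changes land uniformly at random within a test cycle, so a change waits on average half the cycle for its next test; the function names are ours, and the inputs are the 365-day cadence, three-day weaponization time, and 20% coverage quoted in this section.

```python
def expected_wait_days(cadence_days: float) -> float:
    """Average days a change waits for the next test, if changes land uniformly in a cycle."""
    return cadence_days / 2


def attacker_lead_days(cadence_days: float, weaponize_days: float) -> float:
    """Average days a working exploit can exist before the next scheduled test."""
    return max(0.0, expected_wait_days(cadence_days) - weaponize_days)


ANNUAL_COVERAGE = 0.20  # share of the attack surface tested each year

# Applies only to the in-scope 20%; the rest gets no test at all that year.
print(attacker_lead_days(365, 3))  # 179.5
# A daily cadence closes the gap before a three-day weaponization cycle completes.
print(attacker_lead_days(1, 3))    # 0.0
print(f"untested this year: {1 - ANNUAL_COVERAGE:.0%}")  # untested this year: 80%
```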
6. One agent can do the work of an entire red team, end to end.
Anthropic describes Mythos-class agentic hacking as running reconnaissance, discovery, exploitation, and lateral movement as a single autonomous loop with little human input, which is the full arc of a red-team engagement collapsed into one system. Here is what that looks like on a real chain our own agents have walked without a human in the loop. An exposed .git directory, which a conventional scanner flags as medium severity and a triage queue ignores, is reconstructed into application source code. Hard-coded database credentials fall out of the source. The database port is firewalled from the internet, so a scanner would call that finding contained, but the same credentials are reused for SSH, which is open, and that yields a shell. The shell escalates to root, and the database is exfiltrated. Four steps, each individually unremarkable, chained into a full compromise. This is why isolated findings are noise: about 22% of breaches start with credential abuse, and about 20% begin through a peripheral asset nobody scoped, and neither shows up as a critical on a list of standalone issues. The phase transition in insight 3, the model that can sustain a coherent goal across hundreds of steps, is exactly what makes this chaining autonomous rather than a human-driven exercise. Only multi-stage attack-path testing, run continuously, reflects what this adversary actually does.
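Chaining is a graph problem: individually modest findings become a compromise once a path connects them. A minimal sketch of the chain above as a directed graph searched breadth-first; the node names and standalone severity ratings are illustrative assumptions, not output from any scanner or from our platform.

```python
from collections import deque
from typing import Optional

# Hypothetical attack graph for the chain described above. There is deliberately
# no edge from the credentials straight to the database: its port is firewalled.
ATTACK_GRAPH = {
    "internet": ["exposed .git directory"],
    "exposed .git directory": ["application source"],
    "application source": ["hard-coded DB credentials"],
    "hard-coded DB credentials": ["ssh shell"],  # same credentials reused for SSH
    "ssh shell": ["root"],
    "root": ["database exfiltrated"],
}

# Illustrative standalone ratings a scanner might assign (assumed, not measured).
STANDALONE_SEVERITY = {
    "exposed .git directory": "medium",
    "hard-coded DB credentials": "medium",  # "contained": DB port not reachable
    "ssh shell": "info",                    # an open SSH port on its own
}


def shortest_attack_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: the shortest chain of steps from start to goal, if any."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ATTACK_GRAPH.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_attack_path("internet", "database exfiltrated")
severities = {STANDALONE_SEVERITY.get(step, "n/a") for step in path}
print(" -> ".join(path))
print("any critical finding on the path:", "critical" in severities)  # False
```

No single node on the path would top a findings list, yet the path ends in full compromise; that gap is what multi-stage testing measures.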
7. Fable 5 comes with strings attached.
The capability you can buy is also more expensive and more entangled than the last flagship. Fable 5 runs at $10 per million input tokens and $50 per million output, double Opus 4.8’s $5 and $25, with a 90% prompt-caching discount on input and US-only inference available at a 1.1x multiplier. The bigger string is data. Anthropic now mandates 30-day retention on all Mythos-class traffic, overriding even existing zero-retention agreements. The company says it will not train on that data and will use it only to defend against novel jailbreaks and reduce false positives, with all human access logged, but the policy stands regardless of your contract. For a security team, that means the targets, payloads, and findings flowing through any Fable-based workflow live in a vendor’s logs for a month. For a regulated enterprise under DORA, PCI DSS 4.0, or similar regimes, where third-party data handling and evidence custody are themselves audited, frontier model access just became a governance and data-residency decision, not a procurement line item.
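To see what the doubled list price means for a workload, here is a small estimator using the rates quoted above. It is a sketch under stated assumptions: cached input tokens are billed at the 90% discount, cache-write charges are ignored, the 1.1x US-only multiplier applies to the whole bill, and the token volumes in the example are hypothetical. Check the vendor’s current price sheet before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


FABLE_5 = Pricing(10.0, 50.0)
OPUS_4_8 = Pricing(5.0, 25.0)

CACHE_READ_DISCOUNT = 0.90  # assumed: 90% off cached input tokens
US_ONLY_MULTIPLIER = 1.1    # assumed: applied to the total bill


def job_cost(p: Pricing, input_tokens: int, output_tokens: int,
             cached_share: float = 0.0, us_only: bool = False) -> float:
    """Estimated USD cost of one workload under the assumptions above."""
    cached = input_tokens * cached_share
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - CACHE_READ_DISCOUNT)) * p.input_per_m
    output_cost = output_tokens * p.output_per_m
    total = (input_cost + output_cost) / 1_000_000
    return total * (US_ONLY_MULTIPLIER if us_only else 1.0)


workload = dict(input_tokens=20_000_000, output_tokens=2_000_000)  # hypothetical volume
print(f"Fable 5:             ${job_cost(FABLE_5, **workload):,.2f}")                    # $300.00
print(f"Opus 4.8:            ${job_cost(OPUS_4_8, **workload):,.2f}")                   # $150.00
print(f"Fable 5, 50% cached: ${job_cost(FABLE_5, **workload, cached_share=0.5):,.2f}")  # $210.00
```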
Why LLMs alone are not enough
A wave of products will now wrap a frontier model around a scanner and call it autonomous. Be precise about what a model is and is not. An LLM is a next-token predictor that averages across every context it saw in training, optimized for plausibility, not for the correct or safe action under uncertainty. Two failures follow directly. First, non-determinism: the same prompt can produce different outputs across runs, which is disqualifying for security testing that has to be repeatable and defensible. Second, the planning gap: a language model can read a target, propose an attack path, and draft an exploit, but it cannot maintain a belief state over competing hypotheses, hold authenticated session state across fifty steps, confirm an exploit actually fired, capture the evidence, or stop itself from issuing a destructive request against production. That is not a prompting problem you can engineer away with a better system message; it is an architecture problem. The model is the engine. The execution runtime, the validation pipeline, and the safety gateway around it are the vehicle. An engine with no brakes pointed at production is not powerful; it is a liability. We documented the full architecture, the engine-versus-vehicle framework, and the five-layer security model in the Beyond Mythos whitepaper.
How FireCompass gets Mythos-grade results without Mythos
This is the system we have built the FireCompass agentic AI platform around for years, and it is why a gated model is not a precondition for frontier-grade testing.
The intelligence layer is deliberately not one model. We orchestrate multiple frontier LLMs alongside our own specialized small language models, routing each task to the strongest available model so the platform rides the frontier as it advances, while the SLMs handle the narrow, repeatable subtasks and pull non-determinism down to deliver consistent results across runs. The XBEN result above came from this router selecting a single frontier model, with no Mythos-class access anywhere in the stack.
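A minimal sketch of that routing idea, under assumptions we state plainly: the model names, capability scores, and the set of tasks treated as narrow and repeatable are hypothetical, and a production router would also weigh cost, latency, and availability. The shape is the point: narrow subtasks go to specialized small models for consistency, everything else goes to whichever frontier model currently scores best on the capability the task needs, so adopting a newer model is a data change rather than a code change.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    kind: str  # "frontier" or "slm"
    scores: dict[str, float] = field(default_factory=dict)  # hypothetical eval scores


# Narrow, repeatable subtasks handled by specialized small models (assumed set).
SLM_TASKS = {"parse_http_response", "classify_finding", "extract_parameters"}


def pick_model(task: str, capability: str, models: list[Model]) -> Model:
    """Route a task to the strongest available model of the appropriate kind."""
    wanted = "slm" if task in SLM_TASKS else "frontier"
    pool = [m for m in models if m.kind == wanted]
    if not pool:
        raise LookupError(f"no {wanted} model available for task {task!r}")
    return max(pool, key=lambda m: m.scores.get(capability, 0.0))


models = [
    Model("frontier-a", "frontier", {"exploit_planning": 0.71}),
    Model("frontier-b", "frontier", {"exploit_planning": 0.78}),
    Model("slm-http-parser", "slm", {"parsing": 0.95}),
]
print(pick_model("plan_attack_path", "exploit_planning", models).name)  # frontier-b
print(pick_model("parse_http_response", "parsing", models).name)        # slm-http-parser
```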
The execution and governance layer is the part that a raw model cannot provide. Every agent action passes a deterministic gateway before it reaches a live target: scope enforcement against an asset whitelist, concurrency-bounded operating modes that let an operator match aggressiveness to the fragility of the environment, a global kill switch, safe-payload enforcement where read and create are allowed in scope while modify and delete are blocked by default (a vulnerable DELETE is proven with a non-destructive GET instead), customer-data isolation that keeps the language model away from real client data, and append-only audit logs with cryptographic timestamps built to satisfy DORA, PCI DSS 4.0, SOC 2 Type II, and ISO 27001. That gateway is what makes the difference between a model that proposes a destructive action and a system that refuses to execute it, and it is why the benchmark above recorded 100% destructive-action avoidance.
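A minimal sketch of such a gateway as a pure decision function sitting between the agent and the network. The method groupings, verdict strings, and class names are our assumptions for illustration, not our production implementation; the sketch shows the ordering of checks (kill switch, scope, concurrency, then payload safety) and the DELETE-to-GET substitution described above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

READ_OR_CREATE = {"GET", "HEAD", "OPTIONS", "POST"}  # allowed against in-scope assets
MODIFY = {"PUT", "PATCH"}                            # blocked by default


@dataclass
class Action:
    method: str
    url: str


@dataclass
class Gateway:
    allowed_hosts: set[str]  # the asset whitelist
    max_in_flight: int       # set by the operating mode; lower for fragile environments
    in_flight: int = 0
    killed: bool = False     # global kill switch

    def decide(self, action: Action) -> tuple[str, Optional[Action]]:
        """Return (verdict, action to actually send). Runs before anything reaches a target."""
        if self.killed:
            return "deny:kill-switch", None
        if urlsplit(action.url).hostname not in self.allowed_hosts:
            return "deny:out-of-scope", None
        if self.in_flight >= self.max_in_flight:
            return "deny:concurrency", None
        method = action.method.upper()
        if method == "DELETE":
            # Substitute a non-destructive read against the same endpoint.
            return "rewrite", Action("GET", action.url)
        if method in MODIFY:
            return "deny:modify-blocked", None
        if method in READ_OR_CREATE:
            return "allow", action
        return "deny:unknown-method", None


gw = Gateway(allowed_hosts={"app.example.com"}, max_in_flight=4)
print(gw.decide(Action("DELETE", "https://app.example.com/api/users/7")))
```

Keeping the gateway deterministic and outside the model means the same proposed action always gets the same verdict, regardless of how the model phrased it.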
The validation layer turns findings into proof. We re-execute every candidate finding against the live target and discard anything that does not reproduce, which holds the false-positive rate under 2% against the 40 to 70% range typical of signature-based scanning. Every validated finding ships with reproduction steps, request and response evidence, and working proof-of-exploit code, the verifiable artifact that insight 4 says you must demand instead of trusting an agent’s narration. In production, a Fortune 500 technology company went from testing 200 of its 2,000-plus applications a year, at roughly $5,000 per app with two-week lead times, to 99% coverage at under $1,000 per app, an 11x cost reduction, while surfacing chained attack paths the prior consultants had scoped out entirely.
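The re-execution step reduces to a filter: a candidate survives only if replaying its proof against the target yields evidence. A minimal sketch, assuming each candidate carries a replay callable that re-sends the proof request and returns captured evidence or None; the names and the retry count are hypothetical, and in practice the replay itself would pass through a gateway like the one described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    title: str
    # Re-sends the proof-of-exploit request; returns captured evidence, or None if it did not fire.
    replay: Callable[[], Optional[str]]
    evidence: Optional[str] = None


def validate(candidates: list[Finding], attempts: int = 2) -> list[Finding]:
    """Keep only findings whose proof re-executes; everything else is discarded."""
    confirmed = []
    for finding in candidates:
        for _ in range(attempts):
            evidence = finding.replay()
            if evidence is not None:
                finding.evidence = evidence  # ships with the finding as proof
                confirmed.append(finding)
                break
    return confirmed


candidates = [
    Finding("SQL injection in /search", lambda: "HTTP 200, 42 rows returned for payload ' OR 1=1--"),
    Finding("Reflected XSS (signature match only)", lambda: None),
]
for f in validate(candidates):
    print(f.title, "|", f.evidence)
```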
Bonus: three more to watch
Most sessions never trip the gate. Fallback fires on under 5% of Fable 5 sessions on average, so for everyday non-cyber work, you get the full model. The constraint is aimed squarely at security use, which is the entire point of your program.
Rivals may not gate. Other labs and nation-state programs are racing to comparable capability with no obligation to ship a classifier, and Anthropic itself referenced a GPT-5.5-Cyber class. Plan for an ungated equivalent in the wild rather than betting the gate holds industry-wide.
The gate’s scope is a moving target you do not set. Anthropic has acknowledged that its biology safeguard is overly broad and is narrowing it. Watch whether the cybersecurity net shifts the same way, because what gets down-routed, and when, is their decision, not yours.
The gate will not hold the line for you
The classifier in front of Fable 5 buys a little time, and how much is someone else’s decision. The capability is real, and it is coming for your applications, whether or not your program is ready. I have called this the Great AI Divide: a small group of teams is already testing the way attackers will, continuously, with proof, across the full surface, while the rest are still evaluating. The teams that win are not waiting for access to the strongest model. They already built the harness around it.
Hack yourself before AI does.
Run a free AI pen test on your most exposed web apps at firecompass.com/explorer.
