AI is accelerating how fast attackers find vulnerabilities, build exploits, and make decisions. The question for security leaders is how to adapt without overreacting to the hype. A CISO Platform community panel, "The Mythos Threat Is Real: How CISOs Should Respond," took up that question with a global group: FireCompass founder and CEO Bikash Barai moderating, co-founder Arnab Chattopadhayay, a security leader from Google Cloud’s Office of the CISO, and a former JPMorgan Chase CISO and board advisor. FireCompass was the knowledge partner. What follows is the part a CISO can act on, including what it means for agentic AI penetration testing.

What this session covered

Whether the Mythos threat is real or just hype.
Why weak asset and attack-surface management, not just more vulnerabilities, is the harder problem.
What is genuinely different about this class of AI, under the hood?
How vulnerability management changes: faster validated remediation, attack-path prioritization, and compensatory controls.
The live questions: attack-path versus severity prioritization, the one action for the next 90 days, and whether the models can be trusted.

For years, defenders were protected by something they rarely named: the scarcity of elite offensive talent. Forming a hypothesis about how weaknesses combine, writing a working exploit, and then pivoting deeper took a skilled human and time. A new class of frontier AI is removing that buffer, and the panel was candid about both the threat and the response.

Watch the panel on demand: The Mythos Threat Is Real: How CISOs Should Respond

Is the Mythos threat real, or hype?

Strip out the apocalyptic framing, and the answer was direct: the acceleration is real and already demonstrated. AI shortens vulnerability discovery, including in well-audited and well-fuzzed code, and it shortens exploit development and chaining just as much. The honest test of the claim will come over the next six months: if breaches, losses, and the number of commonly exploited vulnerabilities rise measurably, the case is settled. The early evidence points that way.

There is a harder truth underneath. Most large enterprises are poor at asset and inventory management, and have been for two decades. Vulnerability counts already run into the hundreds of thousands, and organizations are now deploying AI agents that are not detected, not registered, and not governed by policy. A faster discovery engine does not create that gap. It exposes it. More findings simply land on a system that is already underwater.
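Closing that gap starts with knowing which agents exist at all. As a minimal sketch, assuming a discovery feed of observed agents and a registry of approved agent IDs (the `Agent` schema, `find_ungoverned`, and the sample names are all hypothetical), reconciliation reduces to a filter: anything observed but not registered, or running without an accountable owner, is ungoverned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """An AI agent seen in discovery data (hypothetical schema)."""
    agent_id: str
    owner: Optional[str]  # None when no team has claimed the agent


def find_ungoverned(observed: list[Agent], registry: set[str]) -> list[Agent]:
    """Return agents that are absent from the registry or have no accountable owner."""
    return [a for a in observed if a.agent_id not in registry or a.owner is None]


# Illustrative data: what discovery found versus what the registry knows about.
observed = [
    Agent("billing-bot", "finance"),
    Agent("shadow-summarizer", None),
    Agent("hr-assistant", "hr"),
]
registry = {"billing-bot"}

for agent in find_ungoverned(observed, registry):
    print(agent.agent_id)  # shadow-summarizer, then hr-assistant
```

The hard part in practice is the discovery feed, not the comparison; the sketch only shows why an unregistered agent is invisible to policy until something like it runs.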

What is actually different, under the hood

Earlier models were limited by short reasoning horizons and small context windows, so they could not hold a large codebase or a full exploit chain in view. This class reasons across a very large context, enough to keep an entire kernel source tree in working memory, and it is strong precisely where that matters: data-flow and taint analysis, memory-corruption patterns, and dependency interaction graphs. The architectural direction, inferred from public material, is neurosymbolic and built on a mixture of experts. The practical point for defenders is that it can reason about an entire system at once.

FireCompass sees the same shift from the building side. Even without that model, orchestrating today’s frontier models together with purpose-trained smaller models already reaches comparable capability in dynamic testing, which is where autonomous testing operates, though static analysis remains a different problem. The capability is not coming. In dynamic testing, it is already in the field.

The CVEs will get patched. What you cannot patch fast is the problem.

The first response is to remediate faster and take humans out of the critical path: feed findings into automated fix, test, and redeploy loops so the slow part is not a person. That works for a large share of the backlog.
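A minimal sketch of such a loop, assuming the fix, test, and redeploy steps are supplied as callables (`remediate`, `apply_fix`, `run_tests`, and `redeploy` are hypothetical names, and the lambdas below are stubs, not real tooling). The point it illustrates is structural: no person gates the happy path, and only what automation cannot close is escalated.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    finding_id: str
    fixed: bool = False
    escalated: bool = False


def remediate(
    findings: list[Finding],
    apply_fix: Callable[[Finding], Optional[str]],  # returns a patch, or None if no automated fix exists
    run_tests: Callable[[str], bool],
    redeploy: Callable[[str], None],
    max_attempts: int = 2,
) -> list[Finding]:
    """Fix, test, and redeploy without a human gate; only failures are escalated."""
    for finding in findings:
        for _ in range(max_attempts):
            patch = apply_fix(finding)
            if patch is None:
                break  # no automated fix available, so retrying will not help
            if run_tests(patch):
                redeploy(patch)
                finding.fixed = True
                break
        if not finding.fixed:
            finding.escalated = True  # a person sees only what automation could not close
    return findings


# Illustrative stubs standing in for real fix, test, and deploy tooling.
deployed: list[str] = []
results = remediate(
    [Finding("web-1"), Finding("legacy-erp"), Finding("web-2")],
    apply_fix=lambda f: None if f.finding_id.startswith("legacy") else f"patch-{f.finding_id}",
    run_tests=lambda patch: True,
    redeploy=deployed.append,
)
```

The escalation branch is where systems like the twice-a-year legacy estate end up, which is why the loop handles a large share of the backlog but not all of it.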

It does not work everywhere. Some load-bearing legacy systems are patched twice a year at best, and that timeline will not shorten. For those, accepting the risk is not an option, so the real choices are to replace and modernize, which is the best outcome even if it is slow, or to build layered compensatory controls designed to hold up against this class of attacker. Third-party and SaaS concentration deepens the exposure, because responsibility for response is often unclear until an incident is already underway.

One promising pattern for what you cannot fix is a digital immune system: detect deviation from established patterns and trigger an automated response in milliseconds, wrapped around privileged access so credential abuse and ransomware spread are contained without waiting on a human. It is an interim layer for the tech-debt systems that cannot be changed.
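To make "detect deviation and respond automatically" concrete, here is a minimal sketch that swaps in a simple z-score threshold for whatever behavioral model a real product would use. The metric (files touched per minute by a privileged account), the threshold, and the names `make_monitor`, `observe`, and `contain` are all assumptions for illustration.

```python
import statistics
from typing import Callable


def make_monitor(
    baseline: list[float], threshold: float, contain: Callable[[str], None]
) -> Callable[[str, float], bool]:
    """Build a detector that contains a session when its activity deviates sharply from baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation; needs at least two points

    def observe(session_id: str, value: float) -> bool:
        z = abs(value - mean) / max(stdev, 1e-9)  # guard against a perfectly flat baseline
        if z > threshold:
            contain(session_id)  # automated response, no human approval in the path
            return True
        return False

    return observe


# Illustrative baseline: files touched per minute by a privileged service account.
contained: list[str] = []
observe = make_monitor([4, 5, 6, 5, 4, 6, 5], threshold=4.0, contain=contained.append)
normal = observe("admin-session-1", 6)   # about 1.2 standard deviations: allowed
burst = observe("admin-session-2", 250)  # ransomware-like burst: contained
```

In a real deployment `contain` would revoke the privileged session or isolate the host; here it only records the session ID.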

Stop ranking by severity. Prioritize by attack path.

Severity scoring on individual findings misses how attackers actually win. Consider a chain that autonomous agents have found in the field: a Git repository exposing database credentials, those credentials reused successfully over SSH to gain root, and from that host, full administrative access to the database. The chain crosses from the application layer to the network and back to the application, lateral movement that a scanner would have logged as a single low-severity information disclosure. The starting signal is trivial. The chain makes it critical.
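One way to see why the chain matters more than any single finding is to model attacker footholds as a graph, with each finding as an edge. This is a minimal breadth-first-search sketch, assuming that graph already exists; the node names, `find_attack_path`, and every severity after the first (which the text gives as low) are illustrative assumptions.

```python
from collections import deque
from typing import Optional

# Edges between attacker footholds: (from, to, finding, severity as a scanner would rate it in isolation).
# The chain mirrors the example in the text; severities after the first are illustrative assumptions.
EDGES = [
    ("internet", "db-credentials", "Git repository exposes database credentials", "low"),
    ("db-credentials", "host-root", "Credentials reused over SSH to gain root", "medium"),
    ("host-root", "db-admin", "Host holds administrative database access", "medium"),
    ("internet", "server-banner", "Verbose server banner", "low"),
]


def find_attack_path(
    edges: list[tuple[str, str, str, str]], start: str, goal: str
) -> Optional[list[tuple[str, str]]]:
    """Breadth-first search for the shortest chain of findings from start to goal."""
    graph: dict[str, list[tuple[str, str, str]]] = {}
    for src, dst, finding, severity in edges:
        graph.setdefault(src, []).append((dst, finding, severity))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for dst, finding, severity in graph.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(finding, severity)]))
    return None  # no chain reaches the goal


path = find_attack_path(EDGES, "internet", "db-admin")
for finding, severity in path:
    print(f"{severity:>6}  {finding}")
```

No edge on the returned path is rated critical, yet the path ends in full database administration. That gap between per-finding severity and chain outcome is what attack-path prioritization is meant to close.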

So prioritization has to move to attack-path-based, combined with business risk, and validated continuously rather than once a quarter. But prioritization alone will not save a program if volumes explode, because a prioritized list of millions is still unworkable. The most durable lever is the one fully in your control: reduce what you expose. You do not decide how much an AI can find, but you do decide how many assets you put in front of it.
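A minimal sketch of ranking by attack path plus business risk rather than by severity alone. It assumes each finding already carries the crown-jewel asset that a validated attack path reaches through it. That `reaches` field, the CVSS scores, and the `BUSINESS_VALUE` weights are all hypothetical. CVSS is kept only as a tie-breaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    name: str
    cvss: float
    reaches: Optional[str]  # asset a validated attack path reaches through this finding, if any


# Hypothetical business-impact weights for the assets an attacker could reach.
BUSINESS_VALUE = {"customer-db": 10, "marketing-site": 2}


def priority(f: Finding) -> tuple[int, float]:
    """Sort key: business impact of the reachable asset first, CVSS only as a tie-breaker."""
    impact = BUSINESS_VALUE.get(f.reaches, 0)  # no validated path means no impact score
    return (impact, f.cvss)


findings = [
    Finding("Critical RCE on isolated test VM", 9.8, None),
    Finding("Git repo leaks DB credentials", 3.1, "customer-db"),
    Finding("Outdated JS library", 6.1, "marketing-site"),
]
ranked = sorted(findings, key=priority, reverse=True)
for f in ranked:
    print(f.name)
```

The isolated critical drops to the bottom and the low-scoring credential leak rises to the top, which matches the panel's point. The ranking is only as good as the validation behind `reaches`, so it has to be refreshed continuously.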

Speed is the new risk. Better controls are the brakes.

Going faster safely takes better brakes, and in this context, the brakes are validation and control. Agentic systems are non-deterministic: run the same test twice and the results can differ, and models will sometimes invent a finding that is not there. The fix is to pair the probabilistic engine with deterministic validation, which strips hallucinated findings and holds the false-positive rate under 2 percent, against 40 to 70 percent for traditional scanners. Consistency across runs is the harder engineering problem, but it is solvable.
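As a minimal sketch of pairing a probabilistic engine with deterministic validation, suppose each agent run returns a set of candidate finding IDs, and a deterministic re-check, such as replaying the exploit, confirms or rejects each one. Here `validate`, `unconfirmed_share`, and the `replay_confirms` stub are hypothetical. The metric computed is the share of reported findings that fail validation. That is one common reading of a "false-positive rate", though strictly it is closer to a false discovery rate.

```python
from collections import Counter
from typing import Callable


def validate(runs: list[set[str]], confirm: Callable[[str], bool], min_runs: int = 1) -> list[str]:
    """Keep findings seen in at least min_runs runs that also pass a deterministic re-check."""
    counts = Counter(finding for run in runs for finding in run)
    return sorted(f for f, n in counts.items() if n >= min_runs and confirm(f))


def unconfirmed_share(runs: list[set[str]], validated: list[str]) -> float:
    """Fraction of distinct reported findings that failed validation."""
    reported = set().union(*runs)
    return len(reported - set(validated)) / len(reported) if reported else 0.0


# Two non-deterministic runs against the same target report different findings.
runs = [{"sqli:/login", "xss:/search"}, {"sqli:/login", "idor:/api/users"}]
replay_confirms = {"sqli:/login", "idor:/api/users"}  # stand-in for a deterministic exploit replay

validated = validate(runs, confirm=lambda f: f in replay_confirms)
share = unconfirmed_share(runs, validated)
consistent = validate(runs, confirm=lambda f: f in replay_confirms, min_runs=2)
```

Raising `min_runs` is a crude consistency filter. It trades recall for stability, which is one reason consistency across runs is the harder engineering problem.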

The human role changes with it. The goal is humans on the loop, not in the critical path: analysts shaping automated workflows and setting the guardrails, not approving every action by hand. Deterministic controls are trustworthy enough to run without a person in the middle, and that is the only way to match the speed of AI-accelerated attacks. As one panelist put it, if you want to drive fast, get better brakes.
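"Humans on the loop" can be pictured as a policy gate. Analysts own and tune the guardrails, actions inside them execute immediately, and only actions outside them wait for a person. This is a minimal sketch under that assumption. The `Guardrails` schema, `dispatch`, and the action and asset names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    """Policy that analysts own and tune; the automation acts freely inside it (hypothetical schema)."""
    allowed_actions: frozenset[str]
    protected_assets: frozenset[str]


def dispatch(
    action: str,
    asset: str,
    rails: Guardrails,
    executed: list[tuple[str, str]],
    review_queue: list[tuple[str, str]],
) -> None:
    """Execute in-policy actions at machine speed; route everything else to a human."""
    if action in rails.allowed_actions and asset not in rails.protected_assets:
        executed.append((action, asset))      # runs immediately, no approval step
    else:
        review_queue.append((action, asset))  # outside the guardrails: a human decides


rails = Guardrails(
    allowed_actions=frozenset({"isolate_host", "rotate_credential"}),
    protected_assets=frozenset({"core-banking"}),
)
executed: list[tuple[str, str]] = []
review_queue: list[tuple[str, str]] = []
dispatch("isolate_host", "laptop-17", rails, executed, review_queue)
dispatch("rotate_credential", "svc-backup", rails, executed, review_queue)
dispatch("isolate_host", "core-banking", rails, executed, review_queue)
dispatch("wipe_disk", "laptop-17", rails, executed, review_queue)
```

The analyst's leverage moves from approving each action to editing `Guardrails`, which is the shift the panel described.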

Questions every CISO is now asking

Should CISOs stop ranking vulnerabilities by individual severity and move to attack-path-based prioritization? Yes. Combine attack-path analysis with business risk and validate it continuously, because a critical vulnerability with no exploit path can matter less than a low that chains to full compromise.
If threat groups will soon have zero days at scale, what is the one action for the next 90 days? Scrutinize and reduce your exposed attack surface, and stop exposing what you do not need. It will not solve the problem alone, but it is in your hands, and it measurably lowers risk.
Can we trust the models, and do we need smarter humans in the loop? Deterministic controls around the models are trustworthy today; the constraint is process, not math. Make analysts smarter at designing automated workflows, and keep them on the loop rather than in the critical path.
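For the 90-day action of shrinking exposed attack surface, a first pass can be as simple as listing every internet-facing service alongside the business reason it is exposed, then closing whatever has no reason. This is a minimal sketch under that assumption. The inventory format, `exposure_plan`, and the service names are hypothetical, and counting services is only a crude proxy for exposure.

```python
from typing import Optional


def exposure_plan(exposed: dict[str, Optional[str]]) -> tuple[list[str], float]:
    """Return services with no recorded business justification and the fraction of exposure they represent."""
    to_close = sorted(svc for svc, reason in exposed.items() if not reason)
    reduction = len(to_close) / len(exposed) if exposed else 0.0
    return to_close, reduction


# Illustrative inventory of internet-facing services and why each is exposed.
exposed = {
    "www:443": "public website",
    "vpn:443": "remote access",
    "jenkins:8080": None,
    "legacy-ftp:21": None,
}
to_close, reduction = exposure_plan(exposed)
print(to_close, f"{reduction:.0%} fewer exposed services")
```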

The divide is between programs that test like attackers and programs that test on a calendar

AI has made offensive capability scalable, so security can no longer be a periodic exercise. Attackers weaponize new CVEs in about three days, while most programs still pen test a fraction of their assets once a year. The programs that adapt will validate continuously, prioritize by attack path and business risk, shrink what they expose, and keep humans on the loop with strong controls underneath. Those programs are not slower than the attacker. They are playing a different game entirely.
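The gap between a three-day weaponization window and a yearly test can be put in rough numbers. This back-of-envelope sketch assumes that a weakness appears at a uniformly random time between tests, so on average it waits half a testing interval before it is found. That assumption, and the function names, are illustrative, not figures from the panel.

```python
WEAPONIZATION_DAYS = 3.0  # "about three days", as cited in the text


def expected_detection_delay(test_interval_days: float) -> float:
    """Average wait until the next test, assuming weaknesses appear uniformly at random."""
    return test_interval_days / 2


def attacker_head_start(test_interval_days: float) -> float:
    """Expected days an attacker holds a working exploit before the next test finds the weakness."""
    return max(0.0, expected_detection_delay(test_interval_days) - WEAPONIZATION_DAYS)


for label, interval in [("annual pen test", 365), ("quarterly", 91), ("daily validation", 1)]:
    print(f"{label:>16}: head start of about {attacker_head_start(interval):.1f} days")
```

The model also ignores coverage. Assets outside the yearly test's scope are never found at all, so for that fraction of the estate the real gap is larger.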

Hack yourself before AI does.

Run a free AI-driven pentest against your own web applications and see which findings chain into a real attack path.