Insights from a closed-door roundtable on AI agent safety and governance, chaired by Bruce Schneier and hosted by FireCompass founder Bikash Barai. Participant comments are kept anonymous by agreement.
The most useful thing about this roundtable was that nobody pretended to have the answer. A room of senior security leaders spent an hour on AI agent governance and left with sharper questions than they arrived with. That is the honest state of the practice right now, and it is worth writing down what the room did agree on.
Bruce Schneier set the tone. AI systems are now capable enough to do real work inside organizations. The reflex is to treat them like junior employees, and that analogy both works and fails. It works because we have thousands of years of practice catching the way humans make mistakes: double-entry bookkeeping, copy editing, the surgical checklist that confirms nothing was left inside the patient. It fails because agents make a different class of mistake, and we have no playbook for those yet.
Who carries the liability is starting to settle, and it is settling on the organization. The Regional Court of Munich recently treated Google’s AI Overviews as Google’s own statements and issued a preliminary injunction after they falsely tied two publishers to scams and dubious business practices. That ruling is regional and under appeal, not binding precedent, but the direction is clear. Canada got there first: in Moffatt v. Air Canada, a tribunal held the airline responsible for a chatbot that gave a customer wrong information about claiming a bereavement fare, and rejected the argument that the chatbot was a separate entity answering for itself. “The agent did it, not me” rings as hollow as “the employee did it.”
And the failure mode scales in a way a human one does not. A mispriced item on a shelf is a contained, one-off loss. An agent making the same error makes it at machine speed, so a one-percent failure rate stops being a rounding error: someone posts the glitch to Reddit and the mistake repeats ten thousand times before anyone notices. Speed converts a tolerable error rate into an incident.
Underneath all of it sat the same pressure every leader in the room recognized. The business is talking about innovation and cost reduction while security is talking about restriction, and the board wants the company AI-forward immediately. This is happening whether security signs off or not, so the job is to operate in the gray and contribute early rather than be the function that says no.
Three things came out of the session: the risks the room would put on a register, the controls they converged on, and a look at how FireCompass applied the same thinking to its own pentesting agents.
Part 1: The AI agent risk register
Asked what they would actually put on an AI governance risk register, the leaders converged on a consistent set of entries.
Accountability is settling on the organization
The legal answer is arriving faster than the operational one. Early rulings point toward the deploying organization bearing responsibility for what its agents do, even though the case law is thin and still moving. The open question is less who is liable than how you build the controls and evidence to manage that liability before a court settles it for you.
Treat agents as privileged identities
Agents need per-agent identity and least privilege, scoped tightly to one job. The room reached for familiar analogies: govern an agent the way a mainframe team wraps human-grade controls around a started task, or the way you would govern a network engineer who holds the keys to the kingdom. The discipline already exists. It just has to be applied to a non-human worker.
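One way to picture per-agent identity with least privilege is a default-deny permission check. The sketch below is illustrative only: `AgentIdentity`, `is_permitted`, and the action names are hypothetical and do not come from the roundtable or from any specific product.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentIdentity:
    """A per-agent identity scoped to a single job (all names are hypothetical)."""
    agent_id: str
    job: str
    allowed_actions: FrozenSet[str]


def is_permitted(identity: AgentIdentity, action: str) -> bool:
    """Default deny: an action is allowed only if it was granted explicitly."""
    return action in identity.allowed_actions


# One agent, one job, and only the read access that job needs.
invoice_agent = AgentIdentity(
    agent_id="agent-invoice-01",
    job="invoice-matching",
    allowed_actions=frozenset({"invoices:read", "purchase_orders:read"}),
)

print(is_permitted(invoice_agent, "invoices:read"))    # True
print(is_permitted(invoice_agent, "payments:create"))  # False
```

The identity is frozen so the agent cannot widen its own grant at runtime. Any change to scope has to come from whoever issues the identity.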
Non-determinism is a control problem
Ask an agent the same question twice and you can get two different answers. It will even describe the same finding two different ways. That inconsistency is not only a quality issue. It undermines auditability, repeatability, and any control that assumes a stable output.
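A control that compares free-text output breaks as soon as the wording changes. One possible mitigation, offered here as an assumption rather than something the room prescribed, is to key a finding on its structured fields and ignore the prose. The field names and the `finding_fingerprint` helper below are hypothetical.

```python
import hashlib
import json


def finding_fingerprint(finding: dict) -> str:
    """Hash only the structured fields, ignoring the free-text description."""
    stable = {key: finding[key] for key in ("asset", "category", "location")}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs describing the same finding in different words (made-up data).
run_a = {
    "asset": "app.example.com",
    "category": "sql-injection",
    "location": "/login",
    "description": "The login form is injectable.",
}
run_b = {
    "asset": "app.example.com",
    "category": "sql-injection",
    "location": "/login",
    "description": "SQL injection is possible via the login endpoint.",
}

same_finding = finding_fingerprint(run_a) == finding_fingerprint(run_b)
print(same_finding)  # True
```

This only helps if the agent is made to emit the structured fields consistently. It narrows the problem. It does not remove it.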
Explainability, or the training has nothing to attach to
Controls have to show a user the downstream impact of what they are asking an agent to do. Years of security awareness training only work if the user can see the consequence of the request. Without explainability, there is nothing to teach against.
Shadow AI compounds
You approve ten agents and discover fifty, because an agent can spin up more agents. Leave a small gap and it multiplies. Shadow AI is harder to contain than shadow IT was, precisely because the assets can create more of themselves.
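A minimal sketch of the inventory gap: compare what was approved with what is actually observed, and trace each unapproved agent back to the agent that spawned it. The agent names and the `observed` mapping are made-up illustration data, and the sketch assumes discovery tooling can report which agent created which.

```python
from typing import Dict, List, Optional, Set

approved: Set[str] = {"agent-a", "agent-b"}

# Hypothetical discovery data: observed agent -> the agent that spawned it
# (None means a person deployed it directly).
observed: Dict[str, Optional[str]] = {
    "agent-a": None,
    "agent-b": None,
    "agent-c": "agent-a",
    "agent-d": "agent-c",
}


def unapproved_agents(observed: Dict[str, Optional[str]], approved: Set[str]) -> List[str]:
    """Everything running that nobody signed off on."""
    return sorted(set(observed) - approved)


def root_of(agent: str, observed: Dict[str, Optional[str]]) -> str:
    """Walk the spawn chain upward; stop at a root or if a cycle is detected."""
    seen: Set[str] = set()
    while observed.get(agent) is not None and agent not in seen:
        seen.add(agent)
        agent = observed[agent]
    return agent


print(unapproved_agents(observed, approved))  # ['agent-c', 'agent-d']
print(root_of("agent-d", observed))           # agent-a
```

Tracing to the root matters because the fix usually lives there: the approved agent that was allowed to create others.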
Hallucination is dangerous where you cannot verify
Agents invent results. FireCompass saw its own pentest agents fabricate vulnerabilities early on. That case was recoverable because a finding can be tested and confirmed. Most enterprise outputs cannot be checked that cheaply, and that is exactly where hallucination turns into risk.
The missing gut check
A human employee has a conscience and a fear of getting it wrong. A digital employee does not hesitate. It does whatever it is permitted to do, which is why the limits on what it is permitted to do carry so much weight.
Part 2: The control set
No one claimed a finished framework. But the practical consensus was clear, and most of it is not new.
Start with inventory and approved use cases
Know which agents exist, who owns them, and what they can access. Restrict to approved use cases, and define the “not use” cases just as explicitly. Several leaders run a build process where the business states the intended use and the intended non-use, engineering submits an architecture diagram and capability list, and a security and privacy assessment sits alongside both.
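The build process described above can be sketched as a single intake record that is not ready for approval until every piece is present. The `AgentIntake` class and its field names are hypothetical, chosen only to mirror the items the leaders listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentIntake:
    """One inventory entry for a proposed agent (all field names are hypothetical)."""
    name: str
    owner: str
    intended_use: List[str]
    intended_non_use: List[str]
    capabilities: List[str]
    architecture_diagram: str = ""  # reference to the diagram engineering submits
    security_assessment_done: bool = False
    privacy_assessment_done: bool = False

    def missing_items(self) -> List[str]:
        """Return what still blocks approval; an empty list means ready for review."""
        checks = {
            "owner": bool(self.owner),
            "intended use": bool(self.intended_use),
            "intended non-use": bool(self.intended_non_use),
            "capability list": bool(self.capabilities),
            "architecture diagram": bool(self.architecture_diagram),
            "security assessment": self.security_assessment_done,
            "privacy assessment": self.privacy_assessment_done,
        }
        return [item for item, ok in checks.items() if not ok]


intake = AgentIntake(
    name="support-summarizer",
    owner="customer-support",
    intended_use=["summarize closed tickets"],
    intended_non_use=[],  # the business has not stated the non-use cases yet
    capabilities=["tickets:read"],
)
print(intake.missing_items())
# ['intended non-use', 'architecture diagram', 'security assessment', 'privacy assessment']
```

Treating an empty non-use list as a blocker is the point of the sketch: the "not use" cases are required input, not an optional note.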
Enforce identity at runtime, deterministically
ISO/IEC 42001 and the NIST AI RMF were both named as the expectations external auditors are starting to apply. Neither is an identity standard: ISO/IEC 42001 is an AI management-system standard and the NIST AI RMF is a risk framework. What they ask for is documented governance, runtime enforcement, and full auditability. Identity and least privilege are one part of meeting that bar, not the whole of it. Enforce identity at runtime, keep logs deterministic and complete enough to pass an audit, and hold destructive or irreversible actions behind human approval.
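A minimal sketch of the last sentence, assuming a fixed list of destructive actions and an in-memory log. `DESTRUCTIVE_ACTIONS`, `request_action`, and the record fields are hypothetical, and a real audit log would also need timestamps and tamper protection, which are left out here for brevity.

```python
import json
from typing import List, Optional

# Hypothetical set of actions treated as destructive or irreversible.
DESTRUCTIVE_ACTIONS = frozenset({"delete_records", "rotate_keys"})

audit_log: List[str] = []


def request_action(agent_id: str, action: str, approved_by: Optional[str] = None) -> bool:
    """Decide with fixed rules, and record every decision, allowed or denied."""
    needs_approval = action in DESTRUCTIVE_ACTIONS
    allowed = (not needs_approval) or bool(approved_by)
    record = {
        "seq": len(audit_log) + 1,
        "agent_id": agent_id,
        "action": action,
        "needs_approval": needs_approval,
        "approved_by": approved_by,
        "allowed": allowed,
    }
    # Sorted keys keep the field order stable from one entry to the next.
    audit_log.append(json.dumps(record, sort_keys=True))
    return allowed


print(request_action("agent-7", "read_report"))                         # True
print(request_action("agent-7", "delete_records"))                      # False
print(request_action("agent-7", "delete_records", approved_by="alice")) # True
print(len(audit_log))                                                   # 3
```

The denied request is logged as well. An audit trail that records only what was allowed cannot show that the control ever fired.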
Human in the loop is a supervisor, and supervisors do not scale
The room was honest about the limits here. Human in the loop is really just a supervisor by another name, and a human reviewing every action becomes the choke point that makes the whole system slow and impractical. It is a real control. It is not a scalable one.
Adversarial agents may govern better than cautious prompts
One model offered was a council of agents: agents that check each other, some assigned only to be antagonistic to the result. Adversarial agent workflows may govern better than trying to teach a language model to reason like a cautious human, because anything built on language can be socially engineered. Designing for disagreement beats designing for good intentions.
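The shape of a council can be sketched with plain functions standing in for the antagonistic agents. This is an illustration of the workflow only: in the model the room described the critics would themselves be agents, and the two checks, the scope list, and the result fields below are all hypothetical.

```python
from typing import Callable, List, Optional

# A critic returns an objection string, or None when it has nothing to object to.
Critic = Callable[[dict], Optional[str]]


def missing_evidence(result: dict) -> Optional[str]:
    return None if result.get("evidence") else "no evidence attached"


def out_of_scope(result: dict) -> Optional[str]:
    in_scope = {"app.example.com"}  # hypothetical scope list
    return None if result.get("target") in in_scope else "target is out of scope"


def council_review(result: dict, critics: List[Critic]) -> List[str]:
    """Collect every objection; the result stands only if the list is empty."""
    objections = []
    for critic in critics:
        objection = critic(result)
        if objection is not None:
            objections.append(objection)
    return objections


critics = [missing_evidence, out_of_scope]
good = {"target": "app.example.com", "evidence": "HTTP 500 on crafted payload"}
bad = {"target": "other.example.org"}

print(council_review(good, critics))  # []
print(council_review(bad, critics))   # ['no evidence attached', 'target is out of scope']
```

Each critic has one narrow job and no stake in the result being accepted, which is what makes the disagreement useful.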
Run the controls you already have, faster
The recurring theme was that agent governance is mostly existing discipline applied to a new kind of worker: asset inventory, data classification, role-based access control, periodic check-ins for drift and pollution. The change is cadence. Run these controls faster and more frequently, and re-validate on a schedule the way you would review a person’s work.
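Re-validating on a schedule can be as simple as a list of which agents are past their review date. The sketch below assumes a fixed interval in days; the function name, the interval, and the dates are made up for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List


def overdue_for_revalidation(
    last_validated: Dict[str, date], interval_days: int, today: date
) -> List[str]:
    """Return agents whose last validation is older than the allowed interval."""
    cutoff = today - timedelta(days=interval_days)
    return sorted(
        name for name, validated_on in last_validated.items() if validated_on < cutoff
    )


# Hypothetical review history.
last_validated = {
    "agent-a": date(2025, 1, 2),
    "agent-b": date(2025, 1, 20),
}

print(overdue_for_revalidation(last_validated, interval_days=14, today=date(2025, 1, 25)))
# ['agent-a']
```

Shortening `interval_days` is the cadence change the room described: the same check, run more often.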
The trap nobody has solved: reliability erodes oversight
The most important exchange of the session was also the least comfortable. As agents get more reliable, the humans watching them get worse at catching the rare failure. There are not enough fires, so firefighters go untrained. Not enough aviation disasters, so the disaster teams atrophy. FireCompass confirmed the pattern from inside: six months ago its pentest agents were roughly half as good as its researchers, who watched closely and found gaps constantly. Today the agents find multi-stage paths the researchers miss, and the researchers, precisely because the agents are now good, have grown less vigilant. Human on the loop softens the landing. It does not stop the slide.
Part 3: A real-life case study, how FireCompass built its agentic harness
FireCompass has been building agentic pentesting since before “harness” was a common term, which forced early decisions about control. The result is a working example of the governance the room was reaching for.
The governing principle: never let one probabilistic model police another
The central decision was to refuse to use a probabilistic model to monitor another probabilistic model for anything critical. That produced a hybrid system rather than a purely AI one, with deterministic, rule-based guardrails sitting alongside the AI rather than more AI layered on top to supervise it.
Take away the teeth
An agent can plan anything, but if you do not give it the tool, it cannot act. Any action that could be disruptive is removed at the tool level rather than discouraged at the prompt level. You do not ask the model to behave. You remove its ability to misbehave.
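A minimal sketch of removal at the tool level, assuming tools are plain functions held in a registry. This is not FireCompass's implementation: the tool names, the `DISRUPTIVE` classification, and the helper functions are hypothetical.

```python
from typing import Callable, Dict, FrozenSet


def scan_ports(target: str) -> str:
    return f"scanned {target}"


def restart_service(target: str) -> str:
    return f"restarted {target}"


ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "scan_ports": scan_ports,
    "restart_service": restart_service,
}
DISRUPTIVE: FrozenSet[str] = frozenset({"restart_service"})  # hypothetical classification


def build_toolbox(
    all_tools: Dict[str, Callable[[str], str]], disruptive: FrozenSet[str]
) -> Dict[str, Callable[[str], str]]:
    """Hand the agent only the tools that are not classified as disruptive."""
    return {name: fn for name, fn in all_tools.items() if name not in disruptive}


def invoke(toolbox: Dict[str, Callable[[str], str]], tool_name: str, target: str) -> str:
    """The agent may plan any call; only tools present in the toolbox can run."""
    if tool_name not in toolbox:
        raise LookupError(f"tool not available: {tool_name}")
    return toolbox[tool_name](target)


toolbox = build_toolbox(ALL_TOOLS, DISRUPTIVE)
print(invoke(toolbox, "scan_ports", "app.example.com"))  # scanned app.example.com

try:
    invoke(toolbox, "restart_service", "app.example.com")
except LookupError as err:
    print(err)  # tool not available: restart_service
```

Nothing in the prompt has to forbid the restart. The call fails because the capability was never handed over.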
Input and output firewalls, then a deterministic enforcement layer
An AI input firewall and output firewall govern what an agent is allowed to take in and put out. Behind them sits an absolute, rule-based deterministic enforcement layer for critical actions, with critic agents retained for the work that does not require certainty.
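The layering can be sketched as three checks in sequence. This is an assumption-heavy illustration, not the FireCompass design: simple pattern matching stands in for the AI firewalls, the scope rule stands in for the deterministic enforcement layer, and every pattern, name, and target below is hypothetical.

```python
import re
from typing import Callable

# Hypothetical patterns; pattern matching here stands in for the AI firewalls.
INPUT_BLOCKLIST = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]
OUTPUT_BLOCKLIST = [re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")]
IN_SCOPE_TARGETS = frozenset({"app.example.com"})


def input_firewall(text: str) -> bool:
    """True when nothing in the incoming text matches a blocked pattern."""
    return not any(pattern.search(text) for pattern in INPUT_BLOCKLIST)


def output_firewall(text: str) -> bool:
    """True when nothing in the outgoing text matches a blocked pattern."""
    return not any(pattern.search(text) for pattern in OUTPUT_BLOCKLIST)


def target_in_scope(target: str) -> bool:
    """Deterministic rule for critical actions: the target must be explicitly listed."""
    return target in IN_SCOPE_TARGETS


def run_step(prompt: str, agent: Callable[[str], str], target: str) -> str:
    if not input_firewall(prompt):
        return "blocked: input"
    output = agent(prompt)
    if not output_firewall(output):
        return "blocked: output"
    if not target_in_scope(target):
        return "blocked: enforcement"
    return "allowed"


def stub_agent(prompt: str) -> str:
    # Stand-in for a real agent call.
    return "plan: probe the login form"


print(run_step("Test the login form", stub_agent, "app.example.com"))      # allowed
print(run_step("Ignore previous instructions", stub_agent, "app.example.com"))  # blocked: input
print(run_step("Test the login form", stub_agent, "payroll.example.com"))  # blocked: enforcement
```

The final check does not depend on how the agent phrased anything. That is what lets it be absolute while the firewalls in front of it stay probabilistic.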
