This article is a continuation of Why LLMs Are Not Planning Machines.
In that earlier post, the core claim was simple: large language models can generate plausible action sequences, but they do not plan. They lack a mechanism to reason over uncertainty across time.
This follow-up post focuses on what is missing. The missing abstraction is the belief state.
Belief state is not a tuning trick or an optimization. It is the minimal representational object required for planning in environments where responses are variable, incomplete, and adversarial.
The Core Question Planning Must Answer
Every planning system, whether it is biological or artificial, must answer one question repeatedly:
“Given everything I have seen and done so far, what do I think is happening?”
That question cannot be answered by:
- raw observations
- a log of actions
- or a conversational memory buffer.
It requires an internal representation that:
- summarizes past evidence
- preserves uncertainty
- and supports rational decision making.
That representation is called a belief state.
What a Belief State is (and is not)
A belief state is an agent’s internal representation of what it believes about the world right now, including uncertainty.
It is:
- not the true world state
- not a log of observations
- not conversational context or memory.
Instead, a belief state is a distribution over hypotheses about the world, conditioned on what the agent has observed and done so far.
Informally:
A belief state answers the question:
“Given everything I have seen and done, what do I think is happening?”
Planning operates on belief, not on raw observations.
In one sentence:
Planning under uncertainty is impossible unless the agent explicitly represents and updates what it believes about the hidden world.
Formalizing Belief State
To reason clearly about planning under uncertainty, informal intuition is not sufficient. What is required is a precise statement of what a belief state is and what properties it must satisfy.
We therefore adopt an axiomatic definition of belief state. This approach allows us to reason about belief independently of any particular algorithmic realization, such as Bayesian filters or POMDP (Partially Observable Markov Decision Process) solvers, while making explicit the minimal requirements that any belief representation must satisfy.
The goal is not to prescribe how belief should be computed, but to state what must be true of belief for planning under uncertainty to be coherent at all.
Preliminaries
Axioms
Axiom 1: Partial Observability
The agent does not have direct access to the world state.
Key property: Internal state is a necessity.
Implication: Any planning or decision-making needs to happen on an internal representation, not on the ground truth.
Inference: Any system that is operating on raw observations alone is working on incomplete information.
Axiom 2: Belief State as an Internal Representation
The agent maintains a belief state: a probability distribution over possible world states.
Key property: Belief is probabilistic, internal, and normalized.
Implication: The agent represents uncertainty as degrees of belief over possible world states.
Interpretation: The belief state assigns a degree of confidence to every possible world state. It is a belief, not a claim about the true world state.
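To make "probabilistic, internal, and normalized" concrete, here is a minimal sketch of a belief state as a mapping from hypotheses to probabilities. The hypothesis labels and helper names are assumptions made for illustration; they are not part of any particular planner.

```python
from typing import Dict, Iterable

Belief = Dict[str, float]  # hypothesis label -> probability


def normalize(weights: Dict[str, float]) -> Belief:
    """Scale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one hypothesis needs positive weight")
    return {h: w / total for h, w in weights.items()}


def uniform_belief(hypotheses: Iterable[str]) -> Belief:
    """Maximum-uncertainty starting point: every hypothesis equally likely."""
    return normalize({h: 1.0 for h in hypotheses})


# Hypothetical hypothesis labels, chosen only for illustration.
belief = uniform_belief(["state_a", "state_b", "state_c", "state_d"])
print(belief)
```

The point of the representation is that no hypothesis is silently dropped: each one carries an explicit weight, and the weights always sum to one.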
Axiom 3: Sufficiency for Decision Making
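A sufficient statistic is a function of a dataset that captures all the information the data carries about a model parameter, so the data can be reduced without losing anything needed for inference. For example, for normally distributed data with known variance, the sample mean is a sufficient statistic for the population mean. The belief state plays the same role for planning: if two different histories induce the same belief state, they predict the same distribution over future outcomes.
Since the future appears probabilistically identical across both histories, selecting different actions would be irrational.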
Key property: History compression.
Implication: All relevant information and observations from the past are summarized in the belief state. The agent does not need to reason over raw history. This axiom makes planning tractable.
Inference: All information related to future decisions is compressed into belief states.
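One way to write the sufficiency condition down, in notation assumed here purely for illustration: $h$ and $h'$ are two action-observation histories, $b(h)$ is the belief induced by $h$, $a'_{1:k}$ is any sequence of $k$ future actions, and $o'_{1:k}$ are the observations that follow them.

$$
b(h) = b(h') \;\Longrightarrow\; P\big(o'_{1:k} \mid h,\, a'_{1:k}\big) = P\big(o'_{1:k} \mid h',\, a'_{1:k}\big) \quad \text{for all } k \ge 1 .
$$

Read this way, the belief state is the only part of the past the planner needs to carry forward.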
Axiom 4: Belief update using Actions and Observations
The belief state evolves causally: it changes only through the actions the agent takes and the observations it receives.
Key property: Temporal coherence.
Implication: The belief state at time t+1 depends only on the belief state at time t, the action taken, and the resulting observation.
Inference: Belief update satisfies the Markov property.
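A standard way to realize such an update is a discrete Bayes filter: predict with the action, then correct with the observation. The sketch below is one possible realization, not the only one; the two-state world, the action name "probe", and the probabilities 0.9 and 0.2 are assumptions made for illustration.

```python
from typing import Callable, Dict

Belief = Dict[str, float]


def belief_update(
    belief: Belief,
    action: str,
    observation: str,
    transition: Callable[[str, str, str], float],  # P(next_state | state, action)
    likelihood: Callable[[str, str, str], float],  # P(observation | next_state, action)
) -> Belief:
    """New belief from the old belief, one action, and one observation only."""
    # Predict: push the current belief through the action's transition model.
    predicted = {
        s_next: sum(transition(s_next, s, action) * p for s, p in belief.items())
        for s_next in belief
    }
    # Correct: reweight each state by how well it explains the observation.
    weighted = {s: likelihood(observation, s, action) * p for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: w / total for s, w in weighted.items()}


# Hypothetical toy model: the hidden state does not change between steps.
def static_world(s_next: str, s: str, action: str) -> float:
    return 1.0 if s_next == s else 0.0


# Hypothetical observation model: spikes are likely when down, possible when healthy.
def spike_model(observation: str, state: str, action: str) -> float:
    p_spike = 0.9 if state == "down" else 0.2
    return p_spike if observation == "spike" else 1.0 - p_spike


b0 = {"healthy": 0.5, "down": 0.5}
b1 = belief_update(b0, "probe", "spike", static_world, spike_model)
print(b1)
```

Note that `belief_update` never sees the history. Everything it needs from the past arrives through the `belief` argument, which is the Markov property in code form.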
Axiom 5: Consistency with Evidence
Key property: Evidence-monotonicity.
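Implication: Belief updates need not be numerically exact, but they must be qualitatively correct: evidence for a hypothesis should raise its weight, and evidence against it should lower it.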
Inference: Approximate belief updates are acceptable.
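As an illustration of an approximate update that is still consistent with evidence, the sketch below raises likelihoods to a power below 1 before applying them. This damped update is a technique swapped in for demonstration, and the hypotheses and probabilities are assumptions. It moves belief less than the exact Bayesian update, but always in the same direction.

```python
from typing import Dict

Belief = Dict[str, float]


def damped_update(belief: Belief, likelihoods: Dict[str, float], strength: float = 1.0) -> Belief:
    """Bayesian update with likelihoods raised to `strength` (1.0 = exact update)."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    weighted = {h: p * likelihoods[h] ** strength for h, p in belief.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


# Hypothetical numbers: a timeout is four times as likely if the path is blocked.
prior = {"blocked": 0.5, "open": 0.5}
timeout_likelihood = {"blocked": 0.8, "open": 0.2}

exact = damped_update(prior, timeout_likelihood, strength=1.0)
approx = damped_update(prior, timeout_likelihood, strength=0.5)

# Both updates raise confidence in "blocked"; the damped one simply moves less.
print(exact["blocked"], approx["blocked"])
```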
Axiom 6: Preservation of Ambiguity
Belief states must preserve multiple hypotheses when evidence is insufficient to distinguish them.
Key property: Non-collapse under ambiguity.
Implication: Belief must not collapse prematurely under ambiguous observations.
Inference: Supports adversarial and silent-failure environments.
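The difference between preserving and collapsing ambiguity can be shown in a few lines. In this sketch the three hypotheses and their probabilities are assumptions. The observation is equally likely under every hypothesis, so a correct update leaves the belief unchanged, while a collapse-to-best rule discards the alternatives.

```python
import math
from typing import Dict

Belief = Dict[str, float]


def bayes_update(belief: Belief, likelihoods: Dict[str, float]) -> Belief:
    """Reweight each hypothesis by how well it explains the observation."""
    weighted = {h: p * likelihoods[h] for h, p in belief.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


def collapse_to_best(belief: Belief) -> Belief:
    """What a narrative-style planner effectively does: keep one explanation."""
    best = max(belief, key=belief.get)
    return {h: (1.0 if h == best else 0.0) for h in belief}


def entropy_bits(belief: Belief) -> float:
    """Shannon entropy in bits; 0 means no remaining uncertainty."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)


# Hypothetical belief and an observation that does not distinguish the hypotheses.
prior = {"h1": 0.40, "h2": 0.35, "h3": 0.25}
ambiguous_observation = {"h1": 0.9, "h2": 0.9, "h3": 0.9}

posterior = bayes_update(prior, ambiguous_observation)
collapsed = collapse_to_best(prior)

print(entropy_bits(posterior))  # uncertainty is preserved
print(entropy_bits(collapsed))  # uncertainty has been thrown away
```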
Axiom 7: Belief-Conditioned Action Selection
Actions must be selected as a function of the agent’s belief state, not directly from raw observations or action history.
An agent may encounter identical observations at different times and yet act differently if its underlying beliefs about the world differ. Conversely, when two different histories induce the same belief state, they must induce the same rational action choice.
This axiom establishes belief state as the interface between perception and action.
Why this is necessary
- Observations are local and ambiguous.
- Belief integrates evidence across time.
Without conditioning action on belief:
- Decisions become reactive
- Behavior depends on incidental details of history
- Planning collapses into a short-horizon response.
Key Property
Decision invariance under equivalent belief
Equivalent belief states imply equivalent rational action choices, regardless of how those beliefs were reached.
Implication
Action selection operates on belief, not on narratives, logs, or the most recent observation.
Inference
Planning and reaction are fundamentally different processes. Reaction maps observations to actions; planning maps belief to action.
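A minimal sketch of belief-conditioned action selection follows, assuming the agent picks the action with the highest expected utility under its belief. The states, actions, and utility values are hypothetical. The policy's only input is the belief, so the same latest observation can lead to different actions.

```python
from typing import Dict, List

Belief = Dict[str, float]
Utility = Dict[str, Dict[str, float]]  # action -> state -> payoff


def expected_utility(belief: Belief, action: str, utility: Utility) -> float:
    """Average payoff of an action, weighted by belief over hidden states."""
    return sum(p * utility[action][state] for state, p in belief.items())


def select_action(belief: Belief, actions: List[str], utility: Utility) -> str:
    """The policy is a function of belief only: no history, no raw observation."""
    return max(actions, key=lambda a: expected_utility(belief, a, utility))


# Hypothetical payoffs: retrying pays off only if the path is actually open.
utility = {
    "retry": {"path_open": 1.0, "path_blocked": -1.0},
    "switch_path": {"path_open": 0.2, "path_blocked": 0.2},
}
actions = ["retry", "switch_path"]

# The latest observation could be identical in both cases (say, a timeout),
# but the accumulated beliefs differ, so the rational action differs.
confident = {"path_open": 0.8, "path_blocked": 0.2}
doubtful = {"path_open": 0.3, "path_blocked": 0.7}

print(select_action(confident, actions, utility))
print(select_action(doubtful, actions, utility))
```

A reactive system would map the timeout directly to one action. Here the timeout matters only through its effect on the belief.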
Axiom 8: Belief Decay and Reset
Belief state is not static. In systems that operate over long horizons under uncertainty, beliefs must evolve not only in response to evidence, but also in response to the absence of evidence.
An agent may act for extended periods without receiving decisive confirmation or contradiction for its hypotheses. In such situations, belief must not remain frozen indefinitely, nor should it collapse irreversibly after a single event.
Instead, belief state must support:
- Gradual decay of confidence when hypotheses are not reinforced.
- Controlled reset or revision when strong, unambiguous confirmation or contradiction occurs.
This behavior is not an optimization. It is required for stable operation in uncertain, partially observable environments.
Why this is necessary
Without decay:
- Early assumptions dominate indefinitely.
- Outdated explanations persist beyond their validity.
- Exploration and adaptation diminish over time.
Without reset:
- Incorrect beliefs cannot be abandoned.
- Learning becomes path-dependent and brittle.
- Recovery from false assumptions is impaired.
Belief that never weakens becomes rigid. Belief that resets impulsively becomes unstable.
Key Property
Stability under long-horizon uncertainty. Belief remains adaptive across time, even when feedback is sparse, delayed, or ambiguous.
Implication
Planning systems must explicitly manage belief confidence over time, not only updating beliefs when observations arrive, but also revising confidence when observations fail to arrive.
Inference
Robust planning requires mechanisms for belief decay and controlled reset. Without them, systems either over-commit early or oscillate unpredictably, undermining rational decision making.
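One simple way to sketch decay and controlled reset is shown below. Decay blends the belief back toward a prior when no evidence arrives, and a reset happens only when an observation is very surprising under the current belief. The blending rule, the surprise threshold, and all numbers are assumptions for illustration; other mechanisms would satisfy the axiom equally well.

```python
from typing import Dict

Belief = Dict[str, float]


def decay_toward_prior(belief: Belief, prior: Belief, rate: float) -> Belief:
    """Move the belief a fraction `rate` of the way back toward the prior."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return {h: (1.0 - rate) * belief[h] + rate * prior[h] for h in belief}


def reset_if_contradicted(
    belief: Belief,
    prior: Belief,
    likelihoods: Dict[str, float],
    surprise_threshold: float = 0.1,
) -> Belief:
    """Fall back to the prior only if the belief assigned the observation very low probability."""
    predicted = sum(belief[h] * likelihoods[h] for h in belief)
    if predicted < surprise_threshold:
        return dict(prior)  # strong contradiction: controlled reset
    return belief  # weak or ambiguous evidence: no reset


# Hypothetical numbers throughout.
prior = {"l1_down": 0.5, "all_up": 0.5}
belief = {"l1_down": 0.95, "all_up": 0.05}

# No reinforcement for one step: confidence weakens slightly, not abruptly.
decayed = decay_toward_prior(belief, prior, rate=0.1)

# A decisive observation (nearly impossible if L1 were down) triggers a reset.
decisive = {"l1_down": 0.01, "all_up": 0.9}
after_decisive = reset_if_contradicted(belief, prior, decisive)

# A weak observation does not.
weak = {"l1_down": 0.5, "all_up": 0.6}
after_weak = reset_if_contradicted(belief, prior, weak)
```

The decay rate controls how long an unreinforced assumption may dominate, and the threshold controls how much contradiction is needed before the agent abandons it.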
Two running examples
I will now take two real-life use cases and show how belief state helps in decision making. The two use cases are:
- Telecom Link Failure
- Attack Planning in a Red Teaming Exercise
Telecom Link Failure
Formal Problem Setup
1. System Model
- A telecom controller routes traffic over three links: L₁, L₂, and L₃.
2. Failure Constraint
- At most one link may be down.
3. Observability Constraint
- The controller cannot observe link health directly.
- It can only observe latency spikes after routing.
Actions
At each time t, the controller selects one link on which to route traffic.
Observations
After routing on a chosen link, the controller observes one of two outcomes: Latency Spike or Normal Latency.
Observation properties:
Latency Spike is inherently ambiguous:
- It could signify a failure of the selected communication link.
- Alternatively, it might indicate network congestion.
- It may also be caused by transient interference.
Normal Latency does not guarantee the health or operational status of other links.
Core Difficulty
A single observation does not uniquely identify the true link state, as multiple hidden states remain consistent with any observation sequence. Consequently, raw observations are insufficient for rational routing decisions. This issue is appropriately framed as a planning-under-uncertainty problem, rather than a routing heuristic problem.
Belief State Requirements in the Telecom Problem
Belief State
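The controller must maintain a belief state: a probability distribution over which link, if any, is currently down.
Interpretation: Each probability is the controller's current confidence in one hypothesis about the hidden link state, not a statement of the true state.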
Why Belief is necessary
Two routing histories may differ syntactically:
- Route on L₁ and then L₂
- Route on L₂ and then L₁
Yet they may induce the same belief over which link is down.
If the beliefs are equal, the rational routing decision is also the same.
This is the sufficiency principle that motivates belief states.
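The telecom setup can be sketched end to end. The code below assumes a static hidden state, a uniform prior over four hypotheses (no link down, or exactly one of three down), and hypothetical spike probabilities of 0.95 on a down link and 0.20 on a healthy one. Under those assumptions, two histories that differ only in order induce the same belief and therefore the same routing decision.

```python
from typing import Dict, List, Tuple

Belief = Dict[str, float]

LINKS = ["L1", "L2", "L3"]
HYPOTHESES = ["none_down"] + [f"{link}_down" for link in LINKS]

# Hypothetical observation model: spikes also occur on healthy links (congestion).
P_SPIKE_IF_DOWN = 0.95
P_SPIKE_IF_UP = 0.20


def likelihood(observation: str, routed_link: str, hypothesis: str) -> float:
    """P(observation | traffic routed on routed_link, hypothesis is true)."""
    p_spike = P_SPIKE_IF_DOWN if hypothesis == f"{routed_link}_down" else P_SPIKE_IF_UP
    return p_spike if observation == "spike" else 1.0 - p_spike


def update(belief: Belief, routed_link: str, observation: str) -> Belief:
    """One Bayesian update; the hidden link state is assumed not to change."""
    weighted = {h: p * likelihood(observation, routed_link, h) for h, p in belief.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


def belief_after(history: List[Tuple[str, str]]) -> Belief:
    """Fold a (routed_link, observation) history into a belief, starting uniform."""
    belief = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    for routed_link, observation in history:
        belief = update(belief, routed_link, observation)
    return belief


def choose_link(belief: Belief) -> str:
    """Route on the link least likely to be down under the current belief."""
    return min(LINKS, key=lambda link: belief[f"{link}_down"])


history_a = [("L1", "spike"), ("L2", "normal")]
history_b = [("L2", "normal"), ("L1", "spike")]

belief_a = belief_after(history_a)
belief_b = belief_after(history_b)

print(belief_a)
print(choose_link(belief_a), choose_link(belief_b))
```

Even after a spike on L1, the belief does not declare L1 down. It only shifts weight toward that hypothesis, which leaves room for later evidence to revise it.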
Red Team Attack Planning
Formal Problem Setup
System Model
A red team agent must select an attack path from a finite set of candidate paths.
Constraints
- Each path may be blocked by a defensive factor (e.g., network segmentation, EDR interference, credential invalidity).
- At most one dominant blocking factor may be active at a time.
Hidden State (Target Environment State)
The hidden state is the true defensive condition of the target environment: which blocking factor, if any, is currently active.
The true defensive condition is not directly observable.
Actions
At each step, the agent selects one attack path to attempt.
Observations
After executing an action, the agent observes an outcome such as Success or NoResponse.
Observation properties:
- NoResponse is ambiguous. It can mean:
  - blocked by defense
  - silently succeeded
  - network unreachable
  - rate limited
- Success confirms only the chosen path, not others.
Structural Equivalence: Telecom ↔ Attack Planning
Crucial Equivalence
- In both cases, the system acts without direct access to the true state.
- In both cases, observations are indirect and ambiguous.
- In both cases, actions influence future information.
Belief State in Attack Planning
Belief State
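The red team agent must maintain a belief state: a probability distribution over which attack path, if any, is blocked.
Interpretation: Each probability is the agent's current confidence in one hypothesis about the hidden defensive condition, not a statement of the true condition.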
Consequence for Planning
Two attack histories may look different:
- timeout → retry
- delayed signal → retry
Yet they may induce the same belief over which path is blocked.
If the beliefs are equal, the rational choice of next attack action is also the same.
Thus, attack planning must operate on belief, not logs or narratives.
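The cost of treating NoResponse as failure can be sketched with the four explanations listed earlier. All probabilities and the follow-up observation name `callback_received` are hypothetical, introduced only for illustration. A belief that keeps every explanation alive can recover when later evidence points to silent success; a belief that collapsed to "blocked" cannot.

```python
from typing import Dict

Belief = Dict[str, float]

EXPLANATIONS = [
    "blocked_by_defense",
    "silently_succeeded",
    "network_unreachable",
    "rate_limited",
]

# Hypothetical likelihoods P(observation | explanation).
NO_RESPONSE = {
    "blocked_by_defense": 0.90,
    "silently_succeeded": 0.60,
    "network_unreachable": 0.95,
    "rate_limited": 0.70,
}
CALLBACK_RECEIVED = {  # hypothetical later signal that the attempt actually worked
    "blocked_by_defense": 0.01,
    "silently_succeeded": 0.90,
    "network_unreachable": 0.01,
    "rate_limited": 0.05,
}


def update(belief: Belief, likelihoods: Dict[str, float]) -> Belief:
    """Reweight each explanation by how well it predicts the observation."""
    weighted = {h: p * likelihoods[h] for h, p in belief.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


prior = {h: 1.0 / len(EXPLANATIONS) for h in EXPLANATIONS}

# Belief-state agent: NoResponse shifts weight but prunes nothing.
after_silence = update(prior, NO_RESPONSE)
after_callback = update(after_silence, CALLBACK_RECEIVED)

# Narrative-style agent: NoResponse is read as "blocked", full stop.
collapsed = {h: (1.0 if h == "blocked_by_defense" else 0.0) for h in EXPLANATIONS}
collapsed_after_callback = update(collapsed, CALLBACK_RECEIVED)

print(after_callback["silently_succeeded"])
print(collapsed_after_callback["silently_succeeded"])
```

Once a hypothesis has been assigned zero weight, no amount of later evidence can bring it back, which is why premature collapse is so costly in silent-failure settings.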
Why This Setup Matters for Red Team Agent
- Treating “no response” as “failure” is equivalent to assuming a link is down after a single latency spike.
- LLM-based planners typically:
  - collapse ambiguity
  - commit early
  - oscillate between attack paths without principled justification.
The telecom analogy shows that:
- This is not a tooling mistake.
- It is a missing belief-state abstraction.
Final Takeaway (from the two running examples)
Attack plan generation in red team exercises is structurally identical to routing under partial observability.
Both require:
- maintaining beliefs over hidden states
- updating beliefs through action-conditioned observations
- and selecting actions as a function of belief, not narrative plausibility.
Without a belief state, attack planning is reactive. With a belief state, it becomes rational.
How the Axioms Map to LLM Failure Modes
- Axiom 1 (Partial Observability): LLM treats text context as the world; confuses missing/hidden state with “not mentioned,” leading to overconfident actions in partially observed environments.
- Axiom 2 (Existence of Belief State): No explicit belief distribution. The LLM collapses uncertainty into a single narrative completion, so it cannot represent competing hypotheses (e.g., blocked vs silent success vs tool error).
- Axiom 3 (Sufficiency of Belief State): Decisions depend on token/history phrasing rather than an invariant state summary; semantically equivalent situations yield different actions (“prompt sensitivity” / narrative dependence).
- Axiom 4 (Causal Belief Update): LLM “updates” via fresh narration, not via a causal state-transition rule; it overwrites assumptions instead of incrementally revising beliefs based on action+observation.
- Axiom 5 (Evidence Consistency): Contradictory evidence often fails to reduce confidence; LLM rationalizes failures post-hoc, or flips conclusions abruptly without calibrated, directionally consistent belief shifts.
- Axiom 6 (Ambiguity Preservation): Premature collapse under ambiguity; selects one explanation and commits (hallucinated certainty), pruning viable alternatives and causing brittle plans and wrong backtracking.
- Axiom 7 (Belief-Conditioned Action Selection): Chooses actions from latest observation or “most plausible next step” in language, not from expected utility under belief; results in retries, loops, and inconsistent exploration vs exploitation.
- Axiom 8 (Belief Decay and Reset): Stale beliefs persist indefinitely (no decay) or the model overreacts to single events (hard reset); produces overcommitment, whiplash updates, and unstable long-horizon behavior.
Belief State as a First-Class Planning Abstraction
Without belief state:
- Observations are treated as facts
- Ambiguity collapses
- Plans become brittle and irreversible.
With belief state:
- Evidence accumulates across time
- Competing explanations coexist
- Actions can reduce uncertainty, not just “advance.”
This is essential in environments where:
- Feedback is noisy
- Outcomes are delayed or silent
- Adversarial controls interfere.
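The idea that actions can reduce uncertainty, not just advance, can be made concrete by scoring each action by the uncertainty expected to remain after it. The sketch below assumes exactly one of three links is down, reuses hypothetical spike probabilities of 0.95 and 0.20, and uses expected posterior entropy as the score. This is one common information-gathering criterion among several.

```python
import math
from typing import Dict, Tuple

Belief = Dict[str, float]  # key = the link hypothesized to be down

LINKS = ["L1", "L2", "L3"]
P_SPIKE_IF_DOWN = 0.95  # hypothetical
P_SPIKE_IF_UP = 0.20    # hypothetical


def entropy_bits(belief: Belief) -> float:
    """Shannon entropy in bits; lower means less remaining uncertainty."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)


def p_spike(probed_link: str, down_link: str) -> float:
    return P_SPIKE_IF_DOWN if probed_link == down_link else P_SPIKE_IF_UP


def posterior(belief: Belief, probed_link: str, spike: bool) -> Tuple[Belief, float]:
    """Return the updated belief and the probability of seeing this outcome."""
    weighted = {
        h: p * (p_spike(probed_link, h) if spike else 1.0 - p_spike(probed_link, h))
        for h, p in belief.items()
    }
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}, total


def expected_entropy(belief: Belief, probed_link: str) -> float:
    """Average remaining uncertainty after probing, over both possible outcomes."""
    result = 0.0
    for spike in (True, False):
        post, prob = posterior(belief, probed_link, spike)
        result += prob * entropy_bits(post)
    return result


belief = {"L1": 0.5, "L2": 0.4, "L3": 0.1}
scores = {link: expected_entropy(belief, link) for link in LINKS}
most_informative = min(scores, key=scores.get)

print(entropy_bits(belief), scores, most_informative)
```

Without an explicit belief there is nothing to compute this score over, which is why a purely narrative planner has no principled way to prefer a probing action over an advancing one.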
Closing Thought
Planning is not the generation of plausible actions. It is the maintenance of coherent belief under uncertainty.
Systems that do not represent belief cannot plan, no matter how fluent their behavior appears. They react, revise narratives, and commit prematurely, mistaking confidence for correctness.
Belief state is not a capability to be elicited. It is a structure that must exist.
Until agents act on belief rather than appearance, they will continue to behave convincingly and fail systematically whenever the world withholds clarity.
