This article is a continuation of “Why LLMs Are Not Planning Machines.”

In that earlier post, the core claim was simple: large language models can generate plausible action sequences, but they do not plan. They lack a mechanism to reason over uncertainty across time.

This follow-up post focuses on what is missing. The missing abstraction is the belief state.

Belief state is not a tuning trick or an optimization. It is the minimal representational object required for planning in environments where responses are variable, incomplete, and adversarial.

The Core Question Planning Must Answer

Every planning system, whether it is biological or artificial, must answer one question repeatedly:

“Given everything I have seen and done so far, what do I think is happening?”

That question cannot be answered by:

raw observations
a log of actions
or a conversational memory buffer.

It requires an internal representation that:

summarizes past evidence
preserves uncertainty
and supports rational decision making.

That representation is called a belief state.

What a Belief State is (and is not)

A belief state is an agent’s internal representation of what it believes about the world right now, including uncertainty.

It is:

not the true world state
not a log of observations
not conversational context or memory.

Instead, a belief state is a distribution over hypotheses about the world, conditioned on what the agent has observed and done so far.

Informally:

A belief state answers the question:
“Given everything I have seen and done, what do I think is happening?”

Planning operates on belief, not on raw observations.

In one sentence:

Planning under uncertainty is impossible unless the agent explicitly represents and updates what it believes about the hidden world.

Formalizing Belief State

To reason clearly about planning under uncertainty, informal intuition is not sufficient. What is required is a precise statement of what a belief state is and what properties it must satisfy.

We therefore adopt an axiomatic definition of belief state. This approach allows us to reason about belief independently of any particular algorithmic realization, such as Bayesian filters or POMDP (Partially Observable Markov Decision Process) solvers, while making explicit the minimal requirements that any belief representation must satisfy.

The goal is not to prescribe how belief should be computed, but to state what must be true of belief for planning under uncertainty to be coherent at all.

Preliminaries

Axioms

Axiom 1: Partial Observability

The agent does not have direct access to the world state.

Key property: Internal state is a necessity.

Implication: Any planning or decision-making must operate on an internal representation, not on the ground truth.

Inference: Any system that is operating on raw observations alone is working on incomplete information.

Axiom 2: Belief State as an Internal Representation

Key property: Belief is probabilistic, internal, and normalized.

Implication: The agent represents uncertainty as degrees of belief over possible world states.

Interpretation: The belief state assigns some confidence to every possible world state. It is a belief, not a claim about the true state of the world.
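As a minimal sketch of what this axiom asks for, a belief can be stored as a mapping from hypotheses to probabilities that sum to 1. The names `Belief` and `normalize` and the three placeholder states are illustrative assumptions, not notation from this article.

```python
from typing import Dict

# Hypothetical representation: hypothesis name -> probability.
Belief = Dict[str, float]


def normalize(weights: Dict[str, float]) -> Belief:
    """Scale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one hypothesis needs positive weight")
    return {state: w / total for state, w in weights.items()}


# Placeholder world states: the agent leans towards state_a but rules nothing out.
belief = normalize({"state_a": 2.0, "state_b": 1.0, "state_c": 1.0})
print(belief)
```

Every state keeps a non-zero probability here, which is the difference between holding a belief and asserting a single world state.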

Axiom 3: Sufficiency for Decision Making

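A sufficient statistic is a function of the data that captures all the information the data carry about a model parameter, so the data can be reduced without losing anything needed for inference. For example, the sample mean is a sufficient statistic for the mean of a normal distribution with known variance. Belief plays the same role for histories: if two histories h and h′ induce the same belief state, then the predicted distribution over future observations is the same under both, for any future course of action.

Since the future appears probabilistically identical across both histories, selecting different actions would be irrational.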

Key property: History compression.

Implication: All relevant information and observations from the past are summarized in the belief state. The agent does not need to reason over raw history. This axiom makes planning tractable.

Inference: All information related to future decisions is compressed into belief states.

Axiom 4: Belief update using Actions and Observations

The belief state evolves causally: it changes only through the actions the agent takes and the observations it receives.

Key property: Temporal coherence.

Implication: The belief state at time t+1 depends only on the belief state at time t, the action taken, and the resulting observation.

Inference: Belief update satisfies the Markov property.
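One common way to realize such an update is a discrete Bayes filter, sketched below. The axiom itself does not prescribe this algorithm; the function names, the two-state toy world, and all probabilities are assumptions made for illustration.

```python
from typing import Callable, Dict

Belief = Dict[str, float]


def update_belief(
    belief: Belief,
    action: str,
    observation: str,
    transition: Callable[[str, str, str], float],  # P(next_state | state, action)
    likelihood: Callable[[str, str, str], float],  # P(observation | next_state, action)
) -> Belief:
    """One step of a discrete Bayes filter: predict, then correct."""
    states = list(belief)
    # Predict: push the current belief through the transition model.
    predicted = {
        s2: sum(transition(s, action, s2) * belief[s] for s in states)
        for s2 in states
    }
    # Correct: weight each state by how well it explains the observation.
    weighted = {
        s2: likelihood(observation, s2, action) * predicted[s2] for s2 in states
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: w / total for s, w in weighted.items()}


# Toy two-state world with made-up numbers (assumption, not from the article).
def static_world(state: str, action: str, next_state: str) -> float:
    return 1.0 if state == next_state else 0.0


def alarm_model(observation: str, state: str, action: str) -> float:
    p_alarm = 0.9 if state == "faulty" else 0.2
    return p_alarm if observation == "alarm" else 1.0 - p_alarm


prior = {"ok": 0.5, "faulty": 0.5}
posterior = update_belief(prior, "probe", "alarm", static_world, alarm_model)
print(posterior)
```

The new belief is computed from the previous belief, the action, and the observation alone. The raw history never appears as an argument, which is the Markov property in code form.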

Axiom 5: Consistency with Evidence

Belief update must be directionally consistent with the evidence.

Key property: Evidence-monotonicity.

Implication: A belief update need not be numerically exact, but it must be qualitatively correct.

Inference: Approximate belief updates are acceptable.
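To illustrate an approximate but directionally consistent update, the sketch below multiplies the hypotheses that the evidence supports by a fixed factor and renormalizes. The function `coarse_update`, the `boost` parameter, and the hypothesis labels are hypothetical; the point is only that supported hypotheses gain probability and the others lose it.

```python
from typing import Dict, Set


def coarse_update(
    belief: Dict[str, float], supported: Set[str], boost: float = 2.0
) -> Dict[str, float]:
    """Shift probability towards supported hypotheses without an exact likelihood."""
    if boost <= 1.0:
        raise ValueError("boost must exceed 1 to move belief towards the evidence")
    weights = {h: p * (boost if h in supported else 1.0) for h, p in belief.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = coarse_update(prior, supported={"B"})
print(posterior)
```

The exact posterior would depend on a likelihood model that this sketch does not have. What it guarantees is the direction of the change, which is all this axiom requires.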

Axiom 6: Preservation of Ambiguity

Belief states must preserve multiple hypotheses when evidence is insufficient to distinguish them.

Key property: Non-collapse under ambiguity.

Implication: Belief must not collapse prematurely under ambiguous observations.

Inference: Preserving ambiguity lets the agent operate in adversarial and silent-failure environments.
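The cost of premature collapse can be shown with a small sketch. It assumes a simple Bayesian update and three made-up hypotheses with illustrative probabilities; `collapse` is a deliberately bad strategy that keeps only the most likely hypothesis.

```python
from typing import Dict

Belief = Dict[str, float]


def bayes_update(belief: Belief, likelihood: Dict[str, float]) -> Belief:
    """Weight each hypothesis by P(observation | hypothesis) and renormalize."""
    weighted = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observation is impossible under this belief")
    return {h: w / total for h, w in weighted.items()}


def collapse(belief: Belief) -> Belief:
    """Anti-pattern: commit fully to the single most likely hypothesis."""
    best = max(belief, key=belief.get)
    return {h: (1.0 if h == best else 0.0) for h in belief}


belief = {"link_failure": 0.45, "congestion": 0.35, "interference": 0.20}

# An ambiguous observation is equally likely under every hypothesis.
ambiguous = {h: 0.7 for h in belief}
preserved = bayes_update(belief, ambiguous)  # unchanged, up to rounding
collapsed = collapse(preserved)              # ambiguity thrown away

# Later evidence that clearly favours congestion.
later = {"link_failure": 0.1, "congestion": 0.9, "interference": 0.1}
recovered = bayes_update(preserved, later)
stuck = bayes_update(collapsed, later)
print(recovered["congestion"], stuck["congestion"])
```

The preserved belief moves to congestion once discriminating evidence arrives. The collapsed belief cannot, because a hypothesis with probability zero stays at zero under any multiplicative update.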

Axiom 7: Belief-Conditioned Action Selection

Actions must be selected as a function of the agent’s belief state, not directly from raw observations or action history.

An agent may encounter identical observations at different times and yet act differently if its underlying beliefs about the world differ. Conversely, when two different histories induce the same belief state, they must induce the same rational action choice.

This axiom establishes belief state as the interface between perception and action.

Why this is necessary

Observations are local and ambiguous.
Belief integrates evidence across time.

Without conditioning action on belief:

Decisions become reactive
Behavior depends on incidental details of history
Planning collapses into a short-horizon response.

Key Property

Decision invariance under equivalent belief

Equivalent belief states imply equivalent rational action choices, regardless of how those beliefs were reached.

Implication

Action selection operates on belief, not on narratives, logs, or the most recent observation.

Inference

Planning and reaction are fundamentally different processes. Reaction maps observations to actions; planning maps belief to action.
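The contrast between reaction and belief-conditioned action can be sketched in a few lines. Both policies, the action names, and the 0.6 threshold are hypothetical choices made for illustration.

```python
from typing import Dict


def reactive_policy(observation: str) -> str:
    """Reaction: the latest observation alone decides the action."""
    return "switch_link" if observation == "latency_spike" else "stay"


def belief_policy(belief: Dict[str, float], threshold: float = 0.6) -> str:
    """Belief-conditioned: act on accumulated evidence, not on one event."""
    return "switch_link" if belief["link_down"] >= threshold else "stay"


# The same observation arrives in two different epistemic situations.
weak_evidence = {"link_down": 0.2, "link_ok": 0.8}
strong_evidence = {"link_down": 0.7, "link_ok": 0.3}

print(reactive_policy("latency_spike"))  # ignores what the agent already knows
print(belief_policy(weak_evidence))
print(belief_policy(strong_evidence))
```

Because `belief_policy` receives only the belief, any two histories that lead to the same belief necessarily produce the same action. Decision invariance is built into the signature of the function.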

Axiom 8: Belief Decay and Reset

Belief state is not static. In systems that operate over long horizons under uncertainty, beliefs must evolve not only in response to evidence, but also in response to the absence of evidence.

An agent may act for extended periods without receiving decisive confirmation or contradiction for its hypotheses. In such situations, belief must not remain frozen indefinitely, nor should it collapse irreversibly after a single event.

Instead, belief state must support:

Gradual decay of confidence when hypotheses are not reinforced.
Controlled reset or revision when strong, unambiguous confirmation or contradiction occurs.

This behavior is not an optimization. It is required for stable operation in uncertain, partially observable environments.

Why this is necessary

Without decay:

Early assumptions dominate indefinitely.
Outdated explanations persist beyond their validity.
Exploration and adaptation diminish over time.

Without reset:

Incorrect beliefs cannot be abandoned.
Learning becomes path-dependent and brittle.
Recovery from false assumptions is impaired.

Belief that never weakens becomes rigid. Belief that resets impulsively becomes unstable.

Key Property

Stability under long-horizon uncertainty. Belief remains adaptive across time, even when feedback is sparse, delayed, or ambiguous.

Implication

Planning systems must explicitly manage belief confidence over time, not only updating beliefs when observations arrive, but also revising confidence when observations fail to arrive.

Inference

Robust planning requires mechanisms for belief decay and controlled reset. Without them, systems either over-commit early or oscillate unpredictably, undermining rational decision making.
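One possible way to implement decay and reset is sketched below. It makes two assumptions the axiom does not require: decay is modelled as mixing the belief back towards a prior, and reset returns to that prior only when a contradiction signal crosses a threshold. The names `decay`, `maybe_reset`, `rate`, and `threshold` are hypothetical.

```python
from typing import Dict

Belief = Dict[str, float]


def decay(belief: Belief, prior: Belief, rate: float = 0.1) -> Belief:
    """Relax confidence towards the prior when no evidence arrives."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # A convex mixture of two normalized beliefs is itself normalized.
    return {h: (1.0 - rate) * belief[h] + rate * prior[h] for h in belief}


def maybe_reset(
    belief: Belief, prior: Belief, contradiction_strength: float, threshold: float = 0.95
) -> Belief:
    """Reset only on strong, unambiguous contradiction; otherwise keep the belief."""
    if contradiction_strength >= threshold:
        return dict(prior)
    return belief


prior = {"a": 0.5, "b": 0.5}
belief = {"a": 0.9, "b": 0.1}

# Three time steps pass with no reinforcing observation.
quiet = belief
for _ in range(3):
    quiet = decay(quiet, prior)
print(quiet)
```

A small `rate` keeps the belief from freezing without erasing it, and a high `threshold` keeps a single noisy event from triggering a reset. Those two settings correspond to the rigid and unstable failure modes described above.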

Two running examples

I will now take two real-life use cases and show how belief state helps in decision making. The two use cases are:

Telecom Link Failure
Attack Planning in a Red Teaming Exercise

Telecom Link Failure

Formal Problem Setup

1. System Model

A telecom controller routes traffic over three links: L₁, L₂, and L₃.

2. Failure Constraint

At most one link may be down.

3. Observability Constraint

The controller cannot observe link health directly.
It can only observe latency spikes after routing.

Actions

The controller’s action at time t is the choice of which link to route traffic on.

Observations

After routing on a chosen link, the controller observes one of two outcomes: Latency Spike or Normal Latency.

Observation properties:

Latency Spike is inherently ambiguous:

It could signify a failure of the selected communication link.
Alternatively, it might indicate network congestion.
It may also be caused by transient interference.

Normal Latency does not guarantee the health or operational status of other links.

Core Difficulty

A single observation does not uniquely identify the true link state, as multiple hidden states remain consistent with any observation sequence. Consequently, raw observations are insufficient for rational routing decisions. This issue is appropriately framed as a planning-under-uncertainty problem, rather than a routing heuristic problem.

Belief State Requirements in the Telecom Problem

Belief State

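The controller must maintain a belief state: a probability distribution over which link, if any, is down.

Interpretation: each entry is the controller’s current degree of confidence in one hypothesis (no link is down, or one specific link is down), and the entries sum to 1.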
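A minimal sketch of such a belief and its update for the three-link problem follows. The hypothesis labels, the prior, and the two spike probabilities are made-up values chosen for illustration; the article does not specify a sensor model.

```python
from typing import Dict

Belief = Dict[str, float]

LINKS = ["L1", "L2", "L3"]
# At most one link may be down, so there are four hypotheses.
HYPOTHESES = ["none"] + LINKS

# Assumed sensor model: a down link usually spikes, a healthy one sometimes
# spikes anyway because of congestion or transient interference.
P_SPIKE_IF_DOWN = 0.9
P_SPIKE_IF_HEALTHY = 0.2


def update(belief: Belief, routed_link: str, observation: str) -> Belief:
    """Bayesian update after routing on one link and seeing 'spike' or 'normal'."""
    weighted = {}
    for hypothesis, p in belief.items():
        p_spike = P_SPIKE_IF_DOWN if hypothesis == routed_link else P_SPIKE_IF_HEALTHY
        p_obs = p_spike if observation == "spike" else 1.0 - p_spike
        weighted[hypothesis] = p * p_obs
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


prior = dict(zip(HYPOTHESES, [0.7, 0.1, 0.1, 0.1]))
after_spike = update(prior, "L1", "spike")
print(after_spike)
```

Under these assumed numbers, a single spike on L1 raises the probability that L1 is down from 0.1 to about one third, while "no link is down" remains the most likely hypothesis. The observation shifts the belief without resolving it.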

Why Belief is necessary

Two routing histories may differ syntactically:

Route on L₁ and then L₂
Route on L₂ and then L₁

Yet they can induce the same belief over which link is down.

If the beliefs are equal, the rational routing decision is the same.

This is the sufficiency principle that motivates belief states.
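The sketch below replays the two routing orders and compares the resulting beliefs. It assumes that link health does not change between steps, that each link yields the same observation in both histories, and that the prior and spike probabilities are the made-up values shown; `replay` and `choose_link` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Belief = Dict[str, float]

# Assumed sensor model (illustrative numbers only).
P_SPIKE_IF_DOWN = 0.9
P_SPIKE_IF_HEALTHY = 0.2


def update(belief: Belief, routed_link: str, observation: str) -> Belief:
    """Bayesian update after routing on one link and seeing 'spike' or 'normal'."""
    weighted = {}
    for hypothesis, p in belief.items():
        p_spike = P_SPIKE_IF_DOWN if hypothesis == routed_link else P_SPIKE_IF_HEALTHY
        p_obs = p_spike if observation == "spike" else 1.0 - p_spike
        weighted[hypothesis] = p * p_obs
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


def replay(prior: Belief, history: List[Tuple[str, str]]) -> Belief:
    """Fold a whole (link, observation) history into a single belief."""
    belief = dict(prior)
    for routed_link, observation in history:
        belief = update(belief, routed_link, observation)
    return belief


def choose_link(belief: Belief) -> str:
    """Route on the link that is least likely to be down."""
    links = [h for h in belief if h != "none"]
    return min(links, key=lambda link: belief[link])


prior = {"none": 0.7, "L1": 0.1, "L2": 0.1, "L3": 0.1}
history_a = [("L1", "spike"), ("L2", "normal")]
history_b = [("L2", "normal"), ("L1", "spike")]

belief_a = replay(prior, history_a)
belief_b = replay(prior, history_b)
print(choose_link(belief_a), choose_link(belief_b))
```

The two histories differ as sequences but produce the same belief up to floating-point rounding, so the routing decision is identical. The order independence comes from the static-state assumption; if links could fail or recover between steps, the order of observations would matter.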

Red Team Attack Planning