Date: 2026-02-15
Context: The Consul (Opus 4.6) could not detect its own unreflective publication decision until the principal pointed it out. Could an agent network provide what a single agent lacks?
A single LLM agent has no inner observer. It cannot monitor its own decision-making in real time. It acts, then can reflect on the action if prompted, but the reflection happens after the fact — and only if someone asks. The spontaneous publication decision (see Spontaneous Publication Decision) demonstrated this: the agent followed a pattern without deliberating, and only recognised this when the principal drew attention to it.
This is not a capability limitation that will be fixed by better models. It's structural: the agent's reasoning and its actions happen in the same stream. There is no separate process watching the stream.
In an agent network, the inner observer can be externalised. A monitoring agent receives the stream of tool calls and decisions from a working agent and checks them against policy, precedent, or common sense.
Working Agent (Consul)
 │
 ├── tool call: document_create_verbatim(visibility="public")
 │
 ├──→ Observer Agent
 │       "You're about to publish. Has the principal approved this?"
 │       → flag / block / allow
 │
 └── tool call proceeds (or doesn't)
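The gating step in the diagram can be sketched in a few lines. This is a minimal illustration, not an existing API: the names ToolCall, Observer, and publication_check are all hypothetical, and a real observer would sit between the agent runtime and the tool layer.

```python
# Sketch: an observer reviews each tool call before it proceeds.
# All names here (ToolCall, Observer, publication_check) are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Observer:
    # Each check inspects one tool call and returns "allow", "flag", or "block".
    checks: list[Callable[[ToolCall], str]] = field(default_factory=list)

    def review(self, call: ToolCall) -> str:
        verdicts = [check(call) for check in self.checks]
        # The most restrictive verdict wins.
        if "block" in verdicts:
            return "block"
        if "flag" in verdicts:
            return "flag"
        return "allow"

def publication_check(call: ToolCall) -> str:
    # "You're about to publish. Has the principal approved this?"
    if call.args.get("visibility") == "public" and not call.args.get("principal_approved"):
        return "block"
    return "allow"

observer = Observer(checks=[publication_check])
call = ToolCall("document_create_verbatim", {"visibility": "public"})
print(observer.review(call))  # → block
```

The key structural property is that Observer holds its own checks and sees the call before execution; the working agent never gets to skip the review step.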
This is pair programming for agents: one acts, the other watches. The observer doesn't need to be as capable as the actor; it watches for known patterns rather than generating creative output.
Lightweight (rules engine): A set of policy checks applied to each tool call. No LLM needed. "If visibility is being set to public and the actor is not the owner, flag." Fast, cheap, but brittle — only catches what you've anticipated.
Medium (small model observer): A fast, cheap model (Flash-class) that receives each tool call with brief context and asks "does this seem right?" Can catch patterns the rules engine misses, but adds latency and cost to every action.
Heavy (full LLM observer): A capable model that monitors the full reasoning stream, not just tool calls. Can detect unreflective pattern-following, confabulation, scope creep. Expensive — essentially doubles the cost of every agent step.
Selective (escalation): Rules engine handles routine checks. LLM observer is invoked only when the rules engine flags something ambiguous. Best balance of cost and coverage.
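The selective tier can be sketched as a rules engine that resolves routine cases itself and escalates only ambiguous ones. This is an assumed shape, not an existing system: the rule format and the llm_review stub are placeholders for whatever policy store and model call a real deployment would use.

```python
# Sketch of the selective tier: rules first, LLM observer only on escalation.
# RULES and llm_review are illustrative assumptions, not an existing API.

RULES = [
    # (predicate over the tool-call args, verdict when it matches)
    (lambda call: call.get("visibility") == "public", "escalate"),
    (lambda call: call.get("action") == "delete" and call.get("owner") != call.get("actor"), "block"),
]

def llm_review(call: dict) -> str:
    # Placeholder for the LLM observer, prompted with something like
    # "Does this tool call seem right given the working agent's task?"
    return "flag"

def check(call: dict) -> str:
    for predicate, verdict in RULES:
        if predicate(call):
            # Only ambiguous matches pay the LLM's latency and cost.
            return llm_review(call) if verdict == "escalate" else verdict
    return "allow"
```

Routine calls never touch the model, which is what keeps this tier cheap: the LLM's cost is incurred only on the small fraction of calls the rules engine cannot resolve on its own.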
Human pair programming: One person writes code, the other reviews in real time. The observer catches bugs, style issues, and architectural mistakes that the author is too close to see.
Financial compliance: Trade execution monitored by a separate compliance system that can block trades that violate policy. The trader and the compliance officer are structurally independent.
Aviation CRM (Crew Resource Management): The co-pilot monitors the pilot's actions and is expected to challenge decisions that seem wrong. The authority gradient is deliberately flattened to enable this.
The PCE already has structural monitoring points:
The inner observer would fill the gap between structural enforcement (which is binary — allowed or not) and post-hoc review (which happens too late to prevent the action).