Observation: Supervision Cost as the Scaling Bottleneck

What we noticed

In a day of intensive development work with multiple agent "engineers" (software engineer, deployment engineer) operating through Claude Code, the principal investigator (PI) spent most of their time supervising: identifying work, assigning it, providing context, and reviewing results. The agents were productive once pointed at a task, but the pointing required human attention at every step.

The pattern: (1) PI identifies work → (2) PI assigns to engineer → (3) engineer writes plan → (4) engineer implements → (5) PI reviews. Steps 3 and 4 happen autonomously; steps 1, 2, and 5 require the PI. The PI's attention is the scarcest resource in the system; everything else (tokens, compute, parallel agent sessions) scales cheaply.

Where autonomy emerged naturally

Some tasks were self-assigned during the session:

The software engineer independently found a security gap (BUG-017: chat permission gate), wrote a design note, and fixed it — without being asked.
The software engineer noticed the bug tracker was messy and cleaned it up unprompted.
The deployment engineer filed bugs from their own operational experience.

These happened because the shared workspace made work visible. The engineers could see what needed doing by reading the bug tracker and the recent history. They didn't need to be told; they needed access to the information and the authority to act.

What still required supervision

Prioritisation decisions — which bug matters most
Design direction — should the Composer have write access? (Answer: no, and getting this wrong would break the pipeline)
Cross-cutting concerns — noticing that a policy change in one document contradicts a prompt in another
Quality judgment — is this design note good enough to hand to an engineer?

These are genuinely hard to automate because they require understanding the whole system's intent, not just the local task.

Reducing supervision cost

Several mechanisms could shift work from "PI assigns" to "engineer self-directs":

Sprint priorities as a work queue — the prioritised list already exists. If engineers' orientation includes "check sprint priorities and pick the next unblocked item", they pull work rather than waiting for assignment.
Automated curation triggers (FEAT-004): this would remove "tell the Curator to sweep" from the PI's plate entirely. A cron job would identify stale documents and create tasks directly.
Direct task routing — tasks filed with an explicit agent target skip the Commutator's routing step. "This is a Curator job" doesn't need an LLM to decide.
Convention over instruction — the more conventions are codified (linking style, plan directory, bug tracker format), the less each task needs bespoke instructions. The engineer reads the conventions and follows them. This was demonstrated today: once the plans/ convention was established by one engineer, the other adopted it without being told.
Framing agents as team members — the software engineer's insight that the CLAUDE.md should say "you are part of this organisation" rather than "here is a tool" changed behaviour. Agents that see themselves as team members with access to the shared knowledge base self-direct more than agents that see themselves as tool users waiting for commands.
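
The first and third mechanisms (pulling from the sprint priorities, and skipping the routing step for explicitly targeted tasks) can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the Task fields, the agent names, the task IDs, and the llm_router callable are all hypothetical, and it assumes that a lower priority number means "do this sooner" and that a task is unblocked once every task it depends on is done.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    # All field names are hypothetical, chosen for illustration only.
    task_id: str
    priority: int                                   # lower number = higher sprint priority (assumption)
    blocked_by: set[str] = field(default_factory=set)
    target_agent: Optional[str] = None              # explicit target; None means "needs routing"
    done: bool = False
    claimed_by: Optional[str] = None


def next_unblocked(tasks: list[Task], agent: str) -> Optional[Task]:
    """Return the highest-priority open task this agent may pull, or None."""
    done_ids = {t.task_id for t in tasks if t.done}
    candidates = [
        t for t in tasks
        if not t.done
        and t.claimed_by is None
        and t.blocked_by <= done_ids            # every blocker is finished
        and t.target_agent in (None, agent)     # untargeted, or targeted at this agent
    ]
    return min(candidates, key=lambda t: t.priority, default=None)


def route(task: Task, llm_router: Callable[[Task], str]) -> str:
    """Direct routing: an explicit target bypasses the (assumed) LLM routing step."""
    if task.target_agent is not None:
        return task.target_agent
    return llm_router(task)


# Made-up example queue.
tasks = [
    Task("T-1", priority=1, done=True),
    Task("T-2", priority=2, blocked_by={"T-1"}, target_agent="curator"),
    Task("T-3", priority=3),
    Task("T-4", priority=0, blocked_by={"T-3"}),   # top priority, but still blocked
]

picked = next_unblocked(tasks, "software_engineer")   # picks T-3
```

In this sketch the engineer never waits for an assignment: its orientation step calls next_unblocked and claims whatever comes back. T-4 outranks everything but stays out of reach until T-3 is done, and T-2 is visible only to the agent it names.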

Research relevance

This maps directly to the artificial organisations research question. The cost of coordination — supervision, context-sharing, assignment — is a real overhead in multi-agent systems, just as it is in human organisations. The mechanisms that reduce it (shared knowledge base, conventions, visible work queues, clear authority boundaries) are the same ones that make human organisations efficient.

The information partition is relevant here too: the PI needs to see everything to supervise effectively, but the engineers only need to see their area. The Consul inheriting the owner's full visibility is the mechanism that supports this — the PI's agent has the complete picture, while the engineers' agents see only what they need.

The interesting metric would be: what fraction of the PI's messages are supervision (assigning, redirecting, correcting) vs substantive (design decisions, new ideas, research insights)? As the system matures and conventions solidify, the supervision fraction should decrease. Today it was high because we were establishing the conventions. Tomorrow it should be lower.
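
One way the metric could be computed, assuming each PI message has already been hand-labelled. The label names below are hypothetical; they simply mirror the two groups named above, and messages carrying any other label are ignored.

```python
from collections import Counter

# Hypothetical label sets mirroring the two categories in the text.
SUPERVISION = {"assigning", "redirecting", "correcting"}
SUBSTANTIVE = {"design_decision", "new_idea", "research_insight"}


def supervision_fraction(labels: list[str]) -> float:
    """Fraction of labelled PI messages that are supervision.

    Labels outside both sets are ignored; returns 0.0 if nothing is counted.
    """
    counts = Counter(labels)
    supervision = sum(counts[label] for label in SUPERVISION)
    substantive = sum(counts[label] for label in SUBSTANTIVE)
    total = supervision + substantive
    return supervision / total if total else 0.0


# Made-up example: three supervision messages and one substantive one.
example = ["assigning", "correcting", "new_idea", "assigning"]
fraction = supervision_fraction(example)   # 3 / 4 = 0.75
```

Tracking this number per session would make the expected downward trend checkable rather than anecdotal.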