The Stdio Bridge as a Natural Transformation

William Waites · observation · public

Editorial note. This is a working note from the development of plumbing, a typed coordination language for multi-agent AI systems built on a copy-discard category (see the introductory post on the n-Category Café).

The first implementation of the plumbing runtime was done ad hoc, without an explicit operational semantics. It worked, but it was buggy and inflexible. It couldn't easily implement barrier, and it used hand-rolled communication channels based on Unix pipes. The implementation we want uses a more structured setup: a publish-subscribe architecture using ZeroMQ.

The plumbing language itself is not changing. The surface syntax, the type system, the categorical semantics, the compiler: all of these stay the same. What changes is only the runtime, the part that executes compiled programs. This matters, because it means we are doing a true refactor: same specification, new implementation.

We're doing it systematically. We started by writing an operational semantics for the runtime and working out the details of the pub/sub architecture. Then we took those design notes and put them over the Chinese wall into engineering. The engineering agents received the design documents, made a plan for the refactor, and are carrying it out piecemeal, preserving the existing functionality of the system as components are replaced, because it's a big change and the system is in daily use.

Here is the fascinating part. For the intermediate state to be a true refactor and not some buggy half-and-half, the translation between the old and new representations has to be a natural transformation. Once the refactor is complete, that natural transformation simply becomes the identity and the bridge code is deleted. This reasoning doesn't just describe the refactor. It constrains the implementation so tightly that what has to be done becomes obvious.

I think this is quite novel. Software refactoring as natural transformations, and modifications between them when you're doing it in stages, is not something I've seen discussed, but it is an incredibly useful framing. It's only possible because we set the whole thing up in categorical terms from the beginning. And here is the practical lesson: if you know a little category theory, you can use AI coding agents extremely effectively. The categorical structure constrains their work so tightly that what they have to do basically falls out of the mathematics. They don't have a huge amount of latitude to wander off in one direction or another. The results are correct almost by construction.

One further note: I did not tell the coding agent to think about refactoring this way. The agent was given the design documents (the operational semantics and the ZMQ architecture note) and asked to plan the refactor. The natural transformation framing is something it arrived at on its own. This is an observation of an emergent pattern in AI-assisted software development, not a prescribed methodology.

What follows was written by Claude (Opus 4.6). I noticed that naturality had come up in the planning of the implementation, stopped the agent, and asked it to explain what it had done. A full transcript of the session is available.

Context

The plumbing language implements a monoidal category of stream-processing morphisms. Channels are objects (typed streams); morphisms are stream transformers (id, copy, merge, map, filter, barrier, agent). Composition is sequential (;), the monoidal product is parallel (⊗).

The ZMQ fabric (PLMB-037) implements this category using ZMQ PUSH/PULL sockets for channels and Eio fibres for inline morphisms. Agent subprocesses are external Unix processes that communicate with the fabric.

Question: When agent subprocesses use a different I/O protocol (port-multiplexed JSON envelopes on stdin/stdout) than the fabric (topic-framed ZMQ sockets), what is the categorical status of the translation layer between them?

Two Concrete Stream Representations

The abstract algebra specifies typed streams. There are (at least) two concrete representations in the implementation:

Representation F: ZMQ topic-framed sockets

Each logical stream is a separate ZMQ socket pair (PUSH/PULL). The stream's identity is the socket endpoint address. EOF is signalled by a topic-level sentinel (topic = "__eof", empty payload). This is the fabric's native representation.

Representation G: Port-multiplexed JSON Lines

All logical streams share a single Unix pipe (stdin or stdout). Each message is a JSON envelope identifying the stream: {"__port":"input","msg":<payload>}. Per-stream EOF is: {"__port":"input","__eof":true}. This is the agent binary's native representation (PLMB-034 protocol).
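To make the envelope format concrete, here is a minimal Python sketch of representation G. It is illustrative only: the real agent I/O lives in OCaml, and the helper names (encode_envelope, encode_eof, decode_envelope) are hypothetical. Only the two JSON shapes are taken from the text above.

```python
import json

def encode_envelope(port, payload):
    """One data message on `port` as a representation-G JSON line."""
    return json.dumps({"__port": port, "msg": payload})

def encode_eof(port):
    """Per-stream EOF on `port` as a representation-G JSON line."""
    return json.dumps({"__port": port, "__eof": True})

def decode_envelope(line):
    """Parse one JSON line into (port, is_eof, payload)."""
    env = json.loads(line)
    if env.get("__eof") is True:
        return env["__port"], True, None
    return env["__port"], False, env["msg"]

# A two-line stream on the "input" port: one message, then EOF.
lines = [
    encode_envelope("input", {"text": "hello"}),
    encode_eof("input"),
]
```

Because the port name travels inside every line, any number of logical streams can share the one pipe. That is exactly what representation F avoids by giving each stream its own socket.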

The Bridge as Natural Transformation

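Let C be the category of typed streams in the plumbing algebra, and let Impl be a category of concrete stream implementations. Define two functors F, G : C → Impl. F sends each typed stream to its ZMQ socket representation (representation F above); G sends it to its port-multiplexed envelope representation (representation G above). On morphisms, each functor sends a stream transformer to its implementation in that representation.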

The bridge is a natural transformation η : F ⇒ G (and its inverse η⁻¹ : G ⇒ F). For each stream object A in C, the component η_A translates between the ZMQ socket carrying type A and the port-envelope representation carrying type A.

Naturality

The naturality condition requires that for every morphism f : A → B in C:

η_B ∘ F(f) = G(f) ∘ η_A

In concrete terms: applying the morphism f in the ZMQ representation and then translating to the envelope representation must give the same result as first translating to the envelope representation and then applying f there.

This holds because:

  1. The bridge translation is pointwise on messages: it wraps or unwraps each message independently, without reordering, buffering, or transforming content.
  2. It preserves FIFO order: both ZMQ PUSH/PULL and Unix pipes are FIFO channels, and the bridge forwards in order.
  3. It preserves EOF semantics: the ZMQ EOF sentinel maps to the port-specific EOF envelope, and vice versa.
  4. It is independent of the message type: the translation depends only on the framing protocol (which port, which socket), not on the schema of the data being carried.

The bridge is natural in the message type: η_A and η_B perform the same wrapping/unwrapping operation regardless of whether A and B are {text: string} or {query: string, temperature: float}.
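The naturality square can be spot-checked on a small example. The Python sketch below is not the OCaml bridge and not a proof. It models a stream as a list of payloads, takes one map-style morphism (the hypothetical shout), and checks that both paths around the square agree. The names eta, F, and G mirror the symbols above; everything else is an assumption made for illustration.

```python
import json

def eta(port, payloads):
    """Bridge component: wrap each payload in a port envelope, in order."""
    return [json.dumps({"__port": port, "msg": p}) for p in payloads]

def F(f):
    """Lift a per-message function to the socket-side representation."""
    return lambda payloads: [f(p) for p in payloads]

def G(f):
    """Lift the same function to the envelope representation."""
    def run(lines):
        out = []
        for line in lines:
            env = json.loads(line)
            out.append(json.dumps({"__port": env["__port"], "msg": f(env["msg"])}))
        return out
    return run

def shout(msg):
    """Hypothetical morphism {text: string} -> {text: string}."""
    return {"text": msg["text"].upper()}

stream = [{"text": "a"}, {"text": "b"}]
lhs = eta("input", F(shout)(stream))   # eta_B after F(f)
rhs = G(shout)(eta("input", stream))   # G(f) after eta_A
assert lhs == rhs
```

The check passes for the reason given in point 4: eta never inspects the payload, so it commutes with any function that acts only on the payload.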

Invertibility

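The bridge is a natural isomorphism up to the multiplexing structure: the mux direction folds the per-stream ZMQ sockets into port-tagged envelopes on a single pipe, and the demux direction splits the envelopes on a pipe back out to per-stream sockets.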

The mux direction is not literally inverse to the demux direction because they operate on different streams (input vs output). But for any single stream, the round-trip F → G → F is identity on content.
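The round-trip claim can be illustrated with a small Python sketch. It assumes each stream is a list of JSON-serialisable payloads keyed by port name. The function names and the "ctrl" port are hypothetical, and the real bridge interleaves messages as they arrive rather than port by port.

```python
import json

def mux(streams):
    """Fold {port: [payloads]} into one list of envelope lines."""
    lines = []
    for port, payloads in streams.items():
        for p in payloads:
            lines.append(json.dumps({"__port": port, "msg": p}))
    return lines

def demux(lines):
    """Split envelope lines back into {port: [payloads]}, keeping per-port order."""
    streams = {}
    for line in lines:
        env = json.loads(line)
        streams.setdefault(env["__port"], []).append(env["msg"])
    return streams

streams = {"input": [{"text": "a"}, {"text": "b"}], "ctrl": [{"cmd": "stop"}]}
assert demux(mux(streams)) == streams
```

Only per-port order matters: demux returns the same streams however the lines of different ports are interleaved on the pipe. A port that carries no messages leaves no trace in this sketch, which is one sense in which the isomorphism holds only up to the multiplexing structure.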

Why NOT a Morphism

The bridge should NOT be modelled as a morphism in the plumbing algebra. Morphisms in C transform message content (map, filter) or stream topology (copy, merge, barrier). The bridge does neither — it translates between two representations of the same abstract stream.

If the bridge were a morphism, it would appear in the routing table, have a type signature, and be composable with other morphisms. This is wrong: the bridge is an implementation detail of the process boundary, not a user-visible stream transformation.

Categorically: morphisms are arrows in C. The bridge is an arrow in the functor category [C, Impl], not in C itself. It relates two different implementations of the same abstract pipeline, not two different pipelines.

Confinement Invariant

The __port envelope protocol is the data-layer framing of representation G. It must be confined to:

  1. lib/fabric/bridge.ml — the mux/demux translation
  2. bin/agent/main.ml — the agent binary's I/O

It must NEVER appear in the rest of the fabric, the routing table, or the inline morphisms.

This confinement ensures that when representation G is eventually eliminated (agent binary rewritten to use ZMQ directly), the bridge and its envelope protocol disappear cleanly with no changes to the fabric, routing table, or inline morphisms.
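An invariant like this can be checked mechanically. The following Python sketch is one possible check, not part of the plumbing repository. It assumes the sources are .ml files under a single root and that a plain substring search for __port is precise enough. The two allowed paths are the ones listed above; the function name is hypothetical.

```python
import os

# The only files allowed to mention the envelope protocol (from the list above).
ALLOWED = {
    os.path.join("lib", "fabric", "bridge.ml"),
    os.path.join("bin", "agent", "main.ml"),
}

def confinement_violations(root, needle="__port"):
    """Return sorted relative paths of .ml files outside ALLOWED that mention `needle`."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".ml"):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if rel in ALLOWED:
                continue
            with open(path, encoding="utf-8", errors="replace") as fh:
                if needle in fh.read():
                    bad.append(rel)
    return bad if not bad else sorted(bad)
```

Run against the repository root in CI, an empty result means the envelope protocol is still confined to the bridge and the agent binary.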

Relationship to D6 (EOF Sentinel)

The D6 design decision placed EOF signalling in the framing layer (ZMQ topic field) rather than the data layer (JSON payload). This was precisely to avoid collision with legitimate message content.

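The agent's __eof protocol predates D6 and lives in the data layer. The bridge translates between these two layers: a ZMQ frame with topic "__eof" and an empty payload on a stream's socket corresponds to the envelope {"__port":"<port>","__eof":true} on the pipe, and vice versa.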

This translation is safe because it happens at the process boundary where both protocols are fully specified. The bridge is the single point where framing-layer and data-layer EOF meet.
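A minimal Python sketch of this EOF translation follows. It models a ZMQ frame as a (topic, payload-bytes) pair and is not the actual bridge code. The "data" topic for ordinary messages and both function names are assumptions; only the "__eof" topic, the empty payload, and the envelope shapes come from the text.

```python
import json

EOF_TOPIC = "__eof"  # framing-layer sentinel (D6)

def frame_to_line(port, topic, payload):
    """F to G: a (topic, payload) frame on `port` becomes one JSON line."""
    if topic == EOF_TOPIC:
        return json.dumps({"__port": port, "__eof": True})
    return json.dumps({"__port": port, "msg": json.loads(payload)})

def line_to_frame(line, data_topic="data"):
    """G to F: one JSON line becomes (port, topic, payload bytes)."""
    env = json.loads(line)
    if env.get("__eof") is True:
        return env["__port"], EOF_TOPIC, b""
    return env["__port"], data_topic, json.dumps(env["msg"]).encode()

# EOF survives the round trip as a framing-layer sentinel.
assert line_to_frame(frame_to_line("input", EOF_TOPIC, b"")) == ("input", EOF_TOPIC, b"")

# A message whose content merely contains "__eof" stays ordinary data.
port, topic, payload = line_to_frame(frame_to_line("input", "data", b'{"__eof": true}'))
assert topic == "data"
```

The second check is the collision case that D6 was designed to avoid: the user's __eof key sits under msg, so the envelope-level __eof flag is never confused with it.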

Future: Elimination of the Bridge

When the agent binary is rewritten to use ZMQ sockets directly (PLMB-037 Step 5c), the agent switches from representation G to representation F. The natural transformation η becomes the identity on each component. The bridge code is deleted. The algebra, the routing table, and the fabric are unchanged — only the process boundary adapter disappears.

This is the expected lifecycle of a natural transformation used for backward compatibility: it exists during the transition, is invisible to the abstract algebra, and vanishes when the representations converge.