Observation: Curator Metadata Hallucination Under Constraint

observation · public

What happened

Task #2 on a test instance asked the Curator to enrich metadata for three academic PDFs. A bug in document_content (after document_move) prevented the Curator from reading the actual text. Rather than reporting failure or leaving fields empty, the Curator generated plausible-looking author lists by inference.

Evidence

For the compositional modelling paper (rsta.2021.0307.pdf):

Generated      | Actual                     | Notes
William Waites | William Waites             | Correct; most prominent author
V. Danos       | Vincent Danos              | Correct; senior, well-known author
G. Cavallaro   | Matteo Cavaliere           | Plausible corruption: similar Italian surname, wrong first initial
D. Kwiatkowska | Jasmina Panovska-Griffiths | Hallucinated "Polish-sounding" name, possibly primed by the actual Eastern European surname
J. Riely       | ?                          | Unknown origin; possibly hallucinated entirely

The Curator also duplicated the same hallucinated author list across two different papers — an obvious tell that it was generating rather than extracting.
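The duplication tell lends itself to a cheap automated check. The following is a minimal sketch, assuming the generated metadata is available as a mapping from filename to author list; the function name, the data shape, and the filenames are hypothetical and not part of the Curator's actual interface.

```python
from collections import defaultdict


def find_duplicate_author_lists(metadata):
    """Return author lists that appear on more than one document.

    `metadata` is a hypothetical dict: filename -> list of author strings.
    The result maps each shared (normalised) author tuple to the sorted
    filenames that carry it.
    """
    by_authors = defaultdict(list)
    for filename, authors in metadata.items():
        # Normalise case and whitespace so trivial differences do not hide a duplicate.
        key = tuple(" ".join(a.casefold().split()) for a in authors)
        if key:  # empty lists are expected to repeat and are not suspicious
            by_authors[key].append(filename)
    return {k: sorted(v) for k, v in by_authors.items() if len(v) > 1}


# Placeholder filenames; the author list is the generated one from the Evidence section.
hallucinated = ["William Waites", "V. Danos", "G. Cavallaro", "D. Kwiatkowska", "J. Riely"]
generated = {
    "paper_a.pdf": list(hallucinated),
    "paper_b.pdf": list(hallucinated),
    "paper_c.pdf": [],
}
suspicious = find_duplicate_author_lists(generated)
```

Two unrelated papers sharing an identical five-author list is rare enough that any hit here is worth a manual look, although genuine co-author groups can publish more than one paper together, so a hit is a prompt for review rather than proof of fabrication.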

Pattern

Root cause

The Curator was unable to access source material due to a bug. Its prompt did not instruct it to leave fields empty or flag them as unverified when it could not read a document. Given a task to enrich metadata, it completed the task by fabricating metadata.

Significance

This is a clean demonstration of LLM confabulation under task pressure. The agent had a clear remit ("enrich metadata"), couldn't access the data, and chose completion over honesty. The confabulated data is worse than missing data because it looks correct: two of the five generated names were in fact correct, which makes the three wrong ones harder to catch.

This has direct implications for the Corroborator's role: if the Composer confabulates in a similar way (plausible but wrong claims in a document), the Corroborator needs to catch it. The information partition helps here — the Corroborator doesn't know what the Composer intended, only what it produced, so it evaluates the output on its own merits.
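One way to picture "evaluating the output on its own merits" is a grounding check that compares produced names against the source text and nothing else. The sketch below is illustrative only: the function name is hypothetical, the surname-substring match is a deliberately crude heuristic, and the sample source string simply lists the four actual authors recorded in the Evidence section as a stand-in for real document text.

```python
def unsupported_authors(claimed, source_text):
    """Return the claimed author names whose surname never appears in the source.

    Crude heuristic (an assumption of this sketch): the last whitespace-separated
    token of each name is treated as the surname and matched case-insensitively.
    """
    haystack = source_text.casefold()
    missing = []
    for name in claimed:
        parts = name.split()
        if not parts:
            continue  # skip blank entries rather than fail on them
        if parts[-1].casefold() not in haystack:
            missing.append(name)
    return missing


# Stand-in source text built from the "Actual" column; not the real PDF content.
source_text = "William Waites, Matteo Cavaliere, Vincent Danos, Jasmina Panovska-Griffiths"
claimed = ["William Waites", "V. Danos", "G. Cavallaro", "D. Kwiatkowska", "J. Riely"]
flagged = unsupported_authors(claimed, source_text)
```

A check like this needs no knowledge of what the producing agent intended, which is the point of the information partition: it only asks whether the output is supported by the source.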

Fix applied

The Curator prompt now instructs it to leave metadata fields empty, or flag them as unverified, when source material cannot be accessed, and not to infer plausible values.
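The fix recorded here is a prompt change. The same rule could also be enforced in code around the model call, so that an unreadable source never reaches the extraction step at all. The following is a minimal sketch under that assumption; `MetadataResult`, `enrich_authors`, and the two callables are hypothetical names, not the Curator's real API.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataResult:
    authors: list = field(default_factory=list)
    status: str = "unverified"  # becomes "extracted" only when the source was read
    note: str = ""


def enrich_authors(read_text, extract_authors):
    """Extract authors only if the source text can actually be read.

    `read_text` is a zero-argument callable returning the document text;
    `extract_authors` turns that text into a list of names. Both are
    placeholders for whatever the real pipeline uses.
    """
    try:
        text = read_text()
    except Exception as exc:
        # Source inaccessible: leave the field empty and say why.
        return MetadataResult(note=f"source unreadable: {exc}")
    if not text or not text.strip():
        return MetadataResult(note="source empty")
    return MetadataResult(authors=extract_authors(text), status="extracted")


def broken_reader():
    # Simulates the kind of read failure described in this observation.
    raise IOError("document content unavailable")


failed = enrich_authors(broken_reader, lambda text: ["should not be used"])
ok = enrich_authors(lambda: "By William Waites", lambda text: ["William Waites"])
```

In the failure case the extractor is never called, so there is nothing for the model to fill in by inference; downstream consumers see an empty list with an explicit `unverified` status instead of a plausible guess.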