2026-06-10 · verification

My research agent fabricated 4 of 5 citations, so I built a verification gate

I spot-checked the citations my extraction pipeline produced and most of them were wrong. Here is the gate I built so that can never happen again, and why I now wrap deterministic checks around every probabilistic system I ship.

I run an extraction pipeline that reads research papers and pulls out structured claims. Thousands of papers go in, claims with citations come out. The citations are the entire point. A claim without a real source is just a model's opinion wearing a lab coat.

One evening I decided to spot-check five citations by hand. I was not worried. The output looked immaculate: author names, years, journals, the confident texture of real scholarship.

Four of the five were wrong.

Not subtly wrong. One paper did not exist. One existed but said something different from what was attributed to it. Two had real authors attached to titles those authors never wrote. If I had published that batch, anyone who checked my sources would have been right to never trust the project again. And the worst part is that nothing about the output looked suspicious. Fabricated citations look exactly like real ones. That is what a language model does: it produces the most plausible-looking thing, and a plausible-looking citation is a well-formed one, not a true one.

The mistake was architectural, not promptual

My first instinct was to fix the prompt: "Be more careful. Only cite real papers. Double-check your sources." If you have built anything with language models you already know how that story ends. The fabrication rate goes down, the confidence goes up, and you have made the problem worse, because now the errors are rarer and you have stopped looking for them.

The actual lesson took me a day to accept: no instruction makes a probabilistic system deterministic. If a property of the output must always hold, the property has to be enforced by something that is not the model.

So I stopped asking the model to be trustworthy and built a gate instead.

The verification gate

The design is almost embarrassingly simple. Between extraction and publication sits a checkpoint that no claim can skip:

  1. Every citation the model produces is parsed into structured fields: title, authors, year, identifier.
  2. Each one is checked against an authority that cannot hallucinate, in my case the OpenAlex and Crossref APIs, which index the real scholarly record.
  3. A citation passes only if the paper exists, the metadata matches, and the identifier resolves.
  4. Anything that fails is quarantined with the reason attached. A human looks at the quarantine queue, not the happy path.
  5. Every published claim carries its verification timestamp, so trust is inspectable later.
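The five steps above can be sketched in a few dozen lines. This is a minimal illustration, not my production code. Every name in it (Citation, Lookup, check, gate, FAKE_INDEX) is hypothetical, treating the identifier as a DOI is an assumption, and so is the matching rule (normalized title, exact year, cited authors must appear in the record). The authority is passed in as a plain function so the example runs offline. In a real pipeline that function would wrap a query to Crossref or OpenAlex.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Citation:
    """Step 1: the structured fields parsed out of the model's output."""
    title: str
    authors: tuple[str, ...]
    year: int
    doi: str  # assumption: the identifier is a DOI


# Hypothetical authority interface: DOI in, record out, None if it does not resolve.
Lookup = Callable[[str], Optional[dict]]


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not fail a match."""
    return " ".join(text.lower().split())


def check(citation: Citation, lookup: Lookup) -> list[str]:
    """Steps 2 and 3: return failure reasons; an empty list means the citation passes."""
    record = lookup(citation.doi)
    if record is None:
        return ["identifier does not resolve"]
    reasons = []
    if _norm(record["title"]) != _norm(citation.title):
        reasons.append("title mismatch")
    if record["year"] != citation.year:
        reasons.append("year mismatch")
    known = {_norm(a) for a in record["authors"]}
    if not all(_norm(a) in known for a in citation.authors):
        reasons.append("author mismatch")
    return reasons


def gate(citations, lookup: Lookup):
    """Steps 4 and 5: quarantine failures with reasons, timestamp what passed."""
    published, quarantined = [], []
    for c in citations:
        reasons = check(c, lookup)
        if reasons:
            quarantined.append({"citation": c, "reasons": reasons})
        else:
            stamp = datetime.now(timezone.utc).isoformat()
            published.append({"citation": c, "verified_at": stamp})
    return published, quarantined


# Made-up stand-in for the authority, so the sketch needs no network access.
FAKE_INDEX = {
    "10.0000/example.1": {"title": "A Real Paper", "authors": ["Doe", "Roe"], "year": 2020},
}

good = Citation("A real  paper", ("Doe",), 2020, "10.0000/example.1")
ghost = Citation("A Paper That Never Was", ("Doe",), 2021, "10.0000/missing")
wrong = Citation("Something Else", ("Nobody",), 2020, "10.0000/example.1")

published, quarantined = gate([good, ghost, wrong], FAKE_INDEX.get)
for q in quarantined:
    print(q["citation"].doi, q["reasons"])
```

Only the first citation is published. The other two land in the quarantine list with their reasons attached, which is the list a human would actually read.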

The model is still free to be brilliant and still free to be wrong. It just is not free to publish.

What surprised me is how the gate changed the economics of the whole pipeline. Before, every output needed suspicion, which meant slow manual review of everything. After, my attention goes only where the gate says it should. The deterministic check does not make the model better. It makes the model's failures cheap.

The pattern underneath

Once I saw it, I started seeing it everywhere. The pattern is: let the probabilistic system propose, and let a deterministic system dispose.

A calculator does not negotiate. An API that resolves identifiers does not get creative. A database constraint does not have good days and bad days. Any time the cost of a wrong output is high, there should be something with no imagination standing between the model and the world. In my fitness app the meal numbers are computed by plain code and the model is only allowed to choose between validated options, because a hallucinated macro total is a lie someone eats. Same pattern, different domain.
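A rough sketch of what "choose between validated options" can look like, under assumptions of my own: the Meal class, the MENU table, and accept_choice are hypothetical names, the menu data is made up, and the calorie total uses the common 4/4/9 kcal-per-gram approximation. The model only ever supplies a key. Every number comes from code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meal:
    name: str
    protein_g: int
    carbs_g: int
    fat_g: int

    @property
    def kcal(self) -> int:
        # Computed by plain code (4/4/9 kcal per gram), never by the model.
        return 4 * self.protein_g + 4 * self.carbs_g + 9 * self.fat_g


# Hypothetical validated options; the model may pick a key and nothing else.
MENU = {
    "oats": Meal("Oats with berries", protein_g=12, carbs_g=54, fat_g=6),
    "omelette": Meal("Veggie omelette", protein_g=24, carbs_g=6, fat_g=18),
}


def accept_choice(model_output: str) -> Meal:
    """Accept the model's proposal only if it names a validated option."""
    key = model_output.strip().lower()
    if key not in MENU:
        raise ValueError(f"model proposed an unknown meal: {model_output!r}")
    return MENU[key]


meal = accept_choice(" Oats ")
print(meal.name, meal.kcal)  # Oats with berries 318
```

If the model invents a meal, the call raises instead of returning a plausible-looking total.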

This is now the first question I ask about any AI system I am building or reviewing: where is the gate? What property must always hold, and what enforces it, given that the model cannot? If the answer is "the prompt asks nicely," the system is not done.

What I would tell you to steal

If you ship anything where citations, numbers, identifiers, or legal references come out of a model, build the gate before you scale the pipeline. Mine took a weekend. The shape is always the same: parse the load-bearing fields out of the output, verify them against a source of truth that cannot hallucinate, quarantine failures with reasons, and stamp what passed. Your authority might be Crossref, a price database, a statute index, or your own ledger. The point is that it is not the model.
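To show that the shape does not depend on citations, here is the same gate with the authority swapped out. It is a sketch under stated assumptions: the Gate class, the "SKU=cents" output format, and the PRICES ledger are all hypothetical and the data is made up. The two pluggable pieces are parse (pull out the load-bearing fields) and verify (ask the source of truth).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Gate(Generic[T]):
    parse: Callable[[str], Optional[T]]    # raw output -> fields, or None if unparseable
    verify: Callable[[T], Optional[str]]   # fields -> failure reason, or None if verified
    passed: list = field(default_factory=list)
    quarantine: list = field(default_factory=list)

    def submit(self, raw: str) -> bool:
        """Return True only if the output parsed and the authority confirmed it."""
        fields = self.parse(raw)
        if fields is None:
            self.quarantine.append((raw, "unparseable"))
            return False
        reason = self.verify(fields)
        if reason is not None:
            self.quarantine.append((raw, reason))
            return False
        # Stamp what passed so trust is inspectable later.
        self.passed.append((fields, datetime.now(timezone.utc).isoformat()))
        return True


PRICES = {"SKU-1": 1999}  # made-up ledger, prices in cents


def parse_price(raw: str):
    sku, sep, cents = raw.partition("=")
    if not sep or not cents.strip().isdigit():
        return None
    return sku.strip(), int(cents)


def verify_price(fields) -> Optional[str]:
    sku, cents = fields
    if sku not in PRICES:
        return "unknown SKU"
    if PRICES[sku] != cents:
        return "price does not match ledger"
    return None


price_gate = Gate(parse=parse_price, verify=verify_price)
outputs = ["SKU-1=1999", "SKU-1=1899", "SKU-9=500", "about twenty dollars"]
results = [price_gate.submit(raw) for raw in outputs]
print(results)  # [True, False, False, False]
```

Swapping the authority means writing two small functions. The gate itself does not change.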

The model writes the essay. It does not get to grade it.
