2026-06-11 · verification

Never let the LLM author the numbers

A hallucinated macro total is a lie someone eats. Why every number in my systems comes from plain code, and the model is only allowed to choose between validated options.

There is a moment in every AI product where a number needs to appear on screen. Calories in a meal plan. A setback distance in a zoning answer. A confidence score. The temptation is enormous to let the model produce it, because the model is right there, it already understands the context, and it will happily oblige.

Do not let it.

A language model generating "1,847 calories" is not calculating. It is producing the kind of token that tends to follow the kind of tokens that came before. Often the result is close. Close is worse than wrong, because close passes review. A macro total that is off by 40% gets caught; one that is off by 8% gets eaten, every day, by someone trusting your product to manage exactly that number.

The pattern: models choose, code computes

In my fitness app, the meal engine works like this. Every food item lives in a database with verified nutrition facts. Every quantity is a structured field. When a meal is assembled, plain code sums the macros: the same inputs produce the same totals, forever, and a unit test can prove it.

The model still does real work. It interprets "swap the rice for something lower carb that still gets me to 40 grams of protein," finds candidate substitutions, and proposes them. But its proposals are choices among validated options, not authored facts. The mutation layer rejects anything that does not reference a real item with real numbers. The model is the interface; the code is the accountant.

The same split shows up in every system I build now:

In the zoning checker, the by-law thresholds come from extracted, verified tables. The model assembles the explanation around numbers it is not allowed to invent.
In the research pipeline, effect sizes and sample sizes are extracted fields validated against the source, never summarized into existence.
Anywhere a total, a date, a price, or a threshold appears, there is a deterministic function with its own tests behind it.

Why this is a design principle and not a workaround

It is tempting to treat this as a temporary patch until models get better at arithmetic. I think that misreads the problem. Even a model that computes perfectly is still a probabilistic system, and a property that must always hold cannot rest on a system that is allowed to vary. The architecture question is never "how good is the model," it is "what enforces the invariant when the model has a bad day."

Drawing the line between choosing and computing also makes systems easier to reason about. When a user reports a wrong number, I know exactly where to look, and it is never a prompt. When a model proposes something invalid, the rejection is logged and the failure is cheap. The system degrades into "the assistant is less helpful today," never into "the numbers are quietly wrong."

The model writes the sentence. The code writes the number. Keep it that way and a whole category of trust failures becomes structurally impossible.

The pattern: models choose, code computes

Why this is a design principle and not a workaround

One useful essay a week. No noise.