FUNDAMENTALS · 1/12

Learned in public

—— I ship AI systems for a living. This series is me going one level deeper on each fundamental behind them, in public: read the primary sources, rebuild the mechanic small enough to break on purpose, then write the explainer I wish existed. I learned to code in public once before, in 2015, and it changed my life. Running the same play on the layer beneath my own work.

How do you wrap deterministic checks around a probabilistic system so fabrications cannot ship?

What breaks: No instruction makes a probabilistic system deterministic; the property has to be enforced by something that is not the model.

The build: A citation verifier that validates every cited source against an authority API

Which numbers in an AI system should the model never author, and where does the arithmetic actually live?

The build: A deterministic calculation layer the LLM can invoke but never override

Ships on schedule

What does cosine similarity over embedding space actually measure, and which queries does it quietly fail?

The build: Embed one corpus two ways and show where the retrievals disagree

Ships on schedule

Why does chunk size and overlap dominate retrieval quality more than model choice?

The build: Same corpus, three chunking strategies, measured hit rates

Ships on schedule

What do hybrid retrieval and reranking rescue that pure vector search misses?

The build: Add BM25 and a reranker to week 3's corpus and diff the results

Ships on schedule

Why does output validation belong outside the model, and what does schema-constrained extraction look like at scale?

The build: Extract typed records from 50 messy PDFs with a validator that rejects bad rows

Ships on schedule

Why do language models fabricate, and what conditions raise or lower the rate?

The build: Induce fabrications on demand and measure which prompts make them worse

Ships on schedule

How does a small set of golden questions catch silent quality drift before users do?

The build: A golden set of real decided outcomes that benchmarks models and catches rule bugs

When does retrieval beat context stuffing, and what does caching actually save?

The build: One task, three context strategies, a cost and quality table

Ships on schedule

When does giving a model agency help, and how do you bound the blast radius when it does not?

The build: A three-tool agent with a deliberately constrained action space

Ships on schedule

Why should high-stakes AI suggest rather than act, and what does a trustworthy approval queue with an immutable log require?

The build: Propose, approve, apply: a suggestion queue over a Postgres audit table where UPDATE and DELETE are impossible by trigger

Ships on schedule

What do regulated buyers ask first: PII boundaries, prompt injection, and data governance under pressure?

The build: Red-team my own pipeline and publish the findings

Ships on schedule

Follow along. One fundamental a week, with the build and the breakage.