FUNDAMENTALS · 1/12
Learned in public
I ship AI systems for a living. This series is me going one level deeper on each fundamental behind them, in public: read the primary sources, rebuild the mechanic small enough to break on purpose, then write the explainer I wish existed. I learned to code in public once before, in 2015, and it changed my life. Now I am running the same play on the layer beneath my own work.
How do you wrap deterministic checks around a probabilistic system so fabrications cannot ship?
What breaks: No instruction makes a probabilistic system deterministic; the property has to be enforced by something that is not the model.
The build: A citation verifier that validates every cited source against an authority API
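To make "enforced by something that is not the model" concrete, here is a minimal sketch of a gate of that shape, not the series' actual build. The `[cite:ID]` citation format, the function names, and the `exists_in_authority` callback are all assumptions made for illustration; in a real pipeline the callback would wrap a call to whatever authority API is in use.

```python
import re
from typing import Callable

# Assumed citation format for this sketch only: [cite:SOME-ID]
CITATION_PATTERN = re.compile(r"\[cite:([A-Za-z0-9_.\-]+)\]")


def extract_citations(text: str) -> list[str]:
    """Return each cited ID once, in order of first appearance."""
    return list(dict.fromkeys(m.group(1) for m in CITATION_PATTERN.finditer(text)))


def unverified_citations(text: str, exists_in_authority: Callable[[str], bool]) -> list[str]:
    """Return the cited IDs that the authority lookup does not recognise."""
    return [cid for cid in extract_citations(text) if not exists_in_authority(cid)]


def release(text: str, exists_in_authority: Callable[[str], bool]) -> str:
    """Deterministic gate: return the draft unchanged, or refuse to ship it."""
    missing = unverified_citations(text, exists_in_authority)
    if missing:
        raise ValueError(f"unverified citations: {missing}")
    return text


if __name__ == "__main__":
    # Stand-in for the authority API: a fixed set of known source IDs (hypothetical).
    known_sources = {"A-1", "B-2"}
    lookup = known_sources.__contains__

    print(release("Supported by [cite:A-1] and [cite:B-2].", lookup))
    try:
        release("Supported by [cite:A-1] and [cite:Z-9].", lookup)
    except ValueError as err:
        print("blocked:", err)
```

The check runs on the model's output and never asks the model whether a source is real. A draft with no citations at all passes this gate, so requiring at least one citation would be a separate rule.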
01
Which numbers in an AI system should the model never author, and where does the arithmetic actually live?
The build: A deterministic calculation layer the LLM can invoke but never override
Ships on schedule
02
What does cosine similarity over embedding space actually measure, and which queries does it quietly fail?
The build: Embed one corpus two ways and show where the retrievals disagree
Ships on schedule
03
Why do chunk size and overlap dominate retrieval quality more than model choice?
The build: Same corpus, three chunking strategies, measured hit rates
Ships on schedule
04
What do hybrid retrieval and reranking rescue that pure vector search misses?
The build: Add BM25 and a reranker to week 3's corpus and diff the results
Ships on schedule
05
Why does output validation belong outside the model, and what does schema-constrained extraction look like at scale?
The build: Extract typed records from 50 messy PDFs with a validator that rejects bad rows
Ships on schedule
06
Why do language models fabricate, and what conditions raise or lower the rate?
The build: Induce fabrications on demand and measure which prompts make them worse
Ships on schedule
07
How does a small set of golden questions catch silent quality drift before users do?
The build: A golden set of real decided outcomes that benchmarks models and catches rule bugs
08
When does retrieval beat context stuffing, and what does caching actually save?
The build: One task, three context strategies, a cost and quality table
Ships on schedule
09
When does giving a model agency help, and how do you bound the blast radius when it does not?
The build: A three-tool agent with a deliberately constrained action space
Ships on schedule
10
Why should high-stakes AI suggest rather than act, and what does a trustworthy approval queue with an immutable log require?
The build: Propose, approve, apply: a suggestion queue over a Postgres audit table where UPDATE and DELETE are impossible by trigger
Ships on schedule
11
What do regulated buyers ask first: PII boundaries, prompt injection, and data governance under pressure?
The build: Red-team my own pipeline and publish the findings
Ships on schedule