MIKIAS ABERA · TORONTO
AI systems where being wrong is expensive
—— Senior software engineer. I build document and data pipelines with verification gates, eval harnesses, and audit trails, and I write about what breaks along the way.
extract.
verify.
publish.
Proof over promises
Citation Verification Rate
Numbers straight from my pipeline databases. Every one is checkable.
No model-authored numbers.
Every output earns trust before it ships.
Verified by default
PROJECTS
Systems built to be right
—— Each one carries its own verification story.
COMPLIANCE · ACTIVE
/01
PermitCheck
limits resolved per address · flagged, not guessed
A Toronto permit-readiness checker that verifies a package against the property's actual zoning, and says what it can't confirm instead of guessing.
- Resolves each lot's binding zoning limits from the City's official data
- Flags variances with the governing by-law section cited on every line
- Precision-first: when it flags, it's right; otherwise it says verify
- Graded against real decided applications, the City's own determinations as the key
- Multi-pass extraction and a measured model choice, every change eval-gated
ENACTED · ACTIVE
/02
Enacted
This week · 9 sectors
Every change to Canadian law, in plain English: daily diffs of nearly 9,000 Ontario and federal regulations and statutes, with AI summaries that are never allowed to invent a fact.
- Tracks nearly 9,000 consolidated laws across three jurisdictions: Ontario (daily against e-Laws), the entire federal corpus (git-diffed against Justice Canada's official XML repo), and British Columbia (per Gazette issue), plus Ontario proclamations and per-law RSS watch feeds
- Every amendment becomes an exact, computed line-by-line diff between consolidated versions, never AI output
- AI writes the plain-English summary of the computed diff only, behind a citation gate that rejects any summary introducing facts not present in its input
- Citations, dates, and version numbers render from official metadata, never from the model
- Weekly digest by industry sector, with CASL double opt-in and one-click unsubscribe
- Seeded with 8 weeks of real history on day one: 128 change events, 120 published summaries, 1 caught by the gate
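A minimal sketch of the diff-then-gate idea described above, not the production pipeline. It assumes Python's standard `difflib` for the line-by-line diff and uses numbers as a crude stand-in for "facts"; the function names `compute_diff`, `diff_body`, and `gate_summary` are hypothetical.

```python
import difflib
import re

# Crude proxy for "facts": numeric tokens such as 75, 2025, 1,000 or 16.4.
# A real gate would also need to check names, dates, and section references.
FACT_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def compute_diff(old_text: str, new_text: str) -> list[str]:
    """Return a unified line-by-line diff between two consolidated versions."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


def diff_body(diff_lines: list[str]) -> list[str]:
    """Strip the two file headers, hunk headers, and the +/-/space prefixes."""
    return [line[1:] for line in diff_lines[2:] if not line.startswith("@@")]


def gate_summary(summary: str, diff_lines: list[str]) -> list[str]:
    """Return numeric tokens in the summary that are absent from the diff.

    An empty list means the summary passes; anything else is rejected.
    """
    allowed = set(FACT_PATTERN.findall("\n".join(diff_body(diff_lines))))
    return sorted(set(FACT_PATTERN.findall(summary)) - allowed)


old = "1. The fee is $50.\n2. Applications close on day 30."
new = "1. The fee is $75.\n2. Applications close on day 30."
diff = compute_diff(old, new)

grounded = "The fee rises from $50 to $75."
invented = "The fee rises from $50 to $75, effective 2025."
```

Here `gate_summary(grounded, diff)` returns an empty list, while `gate_summary(invented, diff)` returns the one token the diff never contained. The point of the design is that the rejection is computed by code from the diff, so the model has no say in whether its own summary ships.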
EXTRACTION · ACTIVE
/03
NoeticMap
Total Experiences: 8,940
AI Analyzed: 8,062
Avg. Greyson Score: 16.4
Audio Experiences: 25
Fear of Death · n=7,600
nderf · oberf · adcrf corpora · verbatim-quoted claims
An extraction pipeline that turns consciousness research, 65,074 papers and 8,940 experience accounts, into structured, citable claims.
- 8,940 experience accounts aggregated from NDERF, OBERF, and ADCRF
- 8,062 fully AI-processed with analysis, embeddings, and Greyson scoring
- 65,074 papers discovered, 2,944 relevant, 2,233 fully extracted
- 10,063 key insights pulled from the academic literature
- Deep extraction runs on local Qwen models, not cloud GPUs
EVIDENCE · ACTIVE
/04
Naulus
Fitness & Nutrition · The Evidence, Mapped
Claims, mapped to their evidence.
Getting lean isn’t a mystery. It’s a settled question buried under noise.
A public evidence map for fitness and nutrition claims, where nothing publishes with an unverified citation.
- Citations resolved by DOI against Crossref before they back a claim
- No verified citation, no publication
- 98.2% resolution rate across a 1,000-DOI sample
- Evidence tiered A to D per study, not per headline
- Numbers computed in code, never by the model
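A rough sketch of how "resolved by DOI against Crossref" could look, assuming the public Crossref REST API's `works/{doi}` endpoint, which answers 200 for a registered DOI and 404 for an unknown one. The helper names, the `User-Agent` string, and the injectable `status_of` parameter are illustrative assumptions, not the actual Naulus code.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_status(doi: str) -> int:
    """Return the HTTP status of a DOI lookup against the Crossref REST API.

    Network failures (urllib.error.URLError) are deliberately not caught,
    so an outage is never mistaken for a missing DOI.
    """
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    request = urllib.request.Request(
        url, headers={"User-Agent": "citation-gate-sketch/0.1"}  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def verify_citations(
    dois: Iterable[str],
    status_of: Callable[[str], int] = crossref_status,
) -> tuple[list[str], list[str]]:
    """Split DOIs into (verified, rejected); only a 200 counts as verified."""
    verified: list[str] = []
    rejected: list[str] = []
    for doi in dois:
        (verified if status_of(doi) == 200 else rejected).append(doi)
    return verified, rejected


def resolution_rate(verified: list[str], rejected: list[str]) -> float:
    """Share of DOIs that resolved, computed in code rather than by a model."""
    total = len(verified) + len(rejected)
    return len(verified) / total if total else 0.0
```

Passing the lookup in as `status_of` keeps the gate testable offline: a stub can stand in for Crossref in unit tests, and a claim publishes only if every DOI backing it lands in the verified list.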
DOSSIERS · SHIPPED
/05
Xzema
Eczema treatments, mapped to their actual evidence
Eczema treatments mapped to their actual evidence: 25,037 papers and 1.1 million community reports, distilled into human-reviewed dossiers.
- 25,037 papers sourced, 13,222 relevant, 9,257 fully extracted
- 1.14M community posts mined for signal
- Evidence grades computed from study design, not stated labels
- Sanity-gated rubric caught grade inflation on run one
- Citations must resolve at build time or they are dropped and logged
PRACTICE · SHIPPED
/06
Blindfold Lab
Learn to See Without Your Eyes
An audio-guided practice platform for blindfolded perception: once a session starts, everything is voice, keyboard, or swipe.
- Eyes-free by design: all guidance is spoken, all input is keyboard or swipe
- Three drills shown full-screen: contrast, colors, and shapes
- 12-trial sessions across three difficulty tiers, 5s down to 2s exposure
- Tracks accuracy and reaction time, suggests level changes over your last 3 sessions
- Trust-based practice logs, no verification claims
WRITING
What breaks gets written down
—— Build logs, failure reports, and the fundamentals series. One useful essay a week.
Writing
What breaks needs to be written down.
Essays on making AI systems reliable: pipelines, verification, evals, and the failures that taught me.
- Build logs and case studies
- Fundamentals, learned in public
- Failure reports with fixes
Never let the LLM author the numbers
FUNDAMENTALS
Learned in public
—— One fundamental per week. Primary sources, a tiny build that breaks on purpose, and the explainer I wish existed.
How do you wrap deterministic checks around a probabilistic system so fabrications cannot ship?
The build: A citation verifier that validates every cited source against an authority API
01
Which numbers in an AI system should the model never author, and where does the arithmetic actually live?
The build: A deterministic calculation layer the LLM can invoke but never override
Ships on schedule
02
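One way to picture a calculation layer the LLM can invoke but never override, as a small sketch under assumptions: the model emits a structured request naming an operation and its inputs, and code produces the number. The request shape (`op`, `args`) and the two operations are hypothetical.

```python
from decimal import Decimal


def total(*values: str) -> Decimal:
    """Sum the inputs exactly, using Decimal rather than float."""
    return sum((Decimal(v) for v in values), Decimal(0))


def percent(part: str, whole: str) -> Decimal:
    """Percentage of part over whole, rounded to one decimal place."""
    return (Decimal(part) / Decimal(whole) * 100).quantize(Decimal("0.1"))


# The only arithmetic the model is allowed to request (hypothetical registry).
CALCULATIONS = {"total": total, "percent": percent}


def run_calculation(request: dict) -> Decimal:
    """Execute a model-issued request; the model picks the op, code does the math."""
    if "result" in request:
        # The model may ask for a calculation but may not supply its answer.
        raise ValueError("model-supplied results are not accepted")
    name = request["op"]
    if name not in CALCULATIONS:
        raise ValueError(f"unknown calculation: {name}")
    return CALCULATIONS[name](*request["args"])


def render(template: str, request: dict) -> str:
    """Fill the computed value into text, so the number never passes through the model."""
    return template.format(value=run_calculation(request))


sentence = render("{value}% of trials were correct.", {"op": "percent", "args": ["3", "4"]})
```

In this sketch `sentence` ends up containing the code-computed percentage, and any request that tries to smuggle in a `result` field raises before anything renders.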
What does cosine similarity over embedding space actually measure, and which queries does it quietly fail?
The build: Embed one corpus two ways and show where the retrievals disagree
Ships on schedule
03
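For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction and ignores magnitude. The sketch below uses tiny hand-made vectors in place of real embeddings; `rank` and the corpus entries are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank(query: list[float], corpus: dict[str, list[float]]) -> list[str]:
    """Return corpus keys ordered from most to least similar to the query."""
    return sorted(
        corpus,
        key=lambda name: cosine_similarity(query, corpus[name]),
        reverse=True,
    )


query = [1.0, 2.0]
corpus = {
    "orthogonal": [-2.0, 1.0],       # perpendicular to the query
    "same_direction": [100.0, 200.0],  # far longer, but pointing the same way
}
ordering = rank(query, corpus)
```

The `same_direction` vector is a hundred times longer than the query yet scores as a near-perfect match, which is the magnitude blindness in miniature. Embedding the same corpus two ways and comparing the two `rank` orderings is one simple way to surface queries where retrievals disagree.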
Why do chunk size and overlap dominate retrieval quality more than model choice does?
The build: Same corpus, three chunking strategies, measured hit rates
Ships on schedule
04
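A toy illustration of why overlap matters, under heavy simplification: chunks are fixed-size word windows, and a "hit" means the queried phrase appears intact inside a single chunk. Exact phrase matching stands in for embedding retrieval here, and all names are hypothetical.

```python
def chunk_words(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    start = 0
    while start < len(words):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def contains_phrase(chunk: list[str], phrase: list[str]) -> bool:
    """True if the phrase occurs contiguously inside the chunk."""
    n = len(phrase)
    return any(chunk[i:i + n] == phrase for i in range(len(chunk) - n + 1))


def hit_rate(chunks: list[list[str]], phrases: list[list[str]]) -> float:
    """Fraction of phrases found whole in at least one chunk."""
    hits = sum(any(contains_phrase(c, p) for c in chunks) for p in phrases)
    return hits / len(phrases) if phrases else 0.0


words = "a b c d e f g h".split()
phrases = [["a", "b"], ["d", "e"]]  # the second phrase straddles a chunk boundary

no_overlap = hit_rate(chunk_words(words, size=4, overlap=0), phrases)
with_overlap = hit_rate(chunk_words(words, size=4, overlap=2), phrases)
```

Without overlap the phrase `d e` is split across two chunks and can never be retrieved whole, whichever model does the embedding; with an overlap of two words it lands inside the middle chunk. Swapping in a third strategy only requires another `chunk_words` call against the same phrases.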
FAQ
Asked and answered
—— The questions people actually ask about the work, all in one place.
Document and data pipelines with AI in the loop: extraction systems, retrieval over regulated documents, verification gates that stop fabrications from shipping, and the eval harnesses that catch drift before users do.
Because my own pipeline once fabricated 4 of 5 citations and nothing about the output looked wrong. If a property of the output must always hold, something other than the model has to enforce it. That principle shapes everything I ship.
A 12-week public curriculum: each week I take one fundamental behind systems I already run in production, read the primary sources, rebuild it small enough to break on purpose, and publish the explainer I wish existed.
TypeScript and Python, Next.js, Postgres, and whichever model fits the task. The interesting decisions are rarely the model: they are chunking, retrieval, schemas, evals, and where the deterministic checks live.
Yes. Everything ships to the writing section and the email list, one useful essay a week. The projects pages show the verification story behind each system.
How I work
Verified by default, explained in public.
Every system I ship has a verification story: what must always hold, and the deterministic check that enforces it.
28,000+ papers extracted, 333,000 triaged
Every citation machine-verified against OpenAlex + Crossref.
“Four of the five citations were wrong. Not subtly wrong. Fabricated citations look exactly like real ones, and that is the whole problem: the most plausible-looking thing is a well-formed citation, not a true one. So I stopped asking the model to be trustworthy and built a gate instead.”
The verification gate
Week 01 of Production AI Fundamentals
Read it