PROJECT · active
NoeticMap
An extraction pipeline that turns consciousness research (65,074 papers and 8,940 experience accounts) into structured, citable claims.
TypeScript · Python · Postgres · OpenAlex · local LLM inference
8,940
Total Experiences
8,062
AI Analyzed
16.4
Avg. Greyson Score
25
Audio Experiences
Fear of Death · n=7,600
NDERF · OBERF · ADCRF corpora · verbatim-quoted claims
A research map with fabricated citations is worse than no map at all.
What it is
NoeticMap reads consciousness research at scale (near-death experiences, out-of-body experiences, and after-death communications) and turns it into a structured map of claims: who found what, with what evidence, citing which sources. The platform aggregates 8,940 published experience accounts from the NDERF, OBERF, and ADCRF research foundations; 8,062 are fully AI-processed (analysis and embeddings, plus Greyson NDE Scale scoring on 4,315 of them). On the literature side, the extractor has discovered 65,074 academic papers, triaged them down to 2,944 relevant ones, and fully extracted 2,233, yielding 10,063 structured key insights, 2,751 findings, and 775 case studies.
The problem
Research literature is where answers live, but nobody can read two thousand papers, let alone sixty-five thousand. The naive fix is to point a language model at the pile and ask for summaries. That fails in the worst possible way: the output looks scholarly while quietly inventing its sources. In a research tool, a fabricated citation is not a bug; it is a betrayal.
What I built
A multi-stage pipeline: paper discovery through OpenAlex, relevance triage, two-phase extraction (a cheap scan pass, then deep extraction on papers that earn it), entity structuring, and publication into a queryable map with semantic search. Deep extraction runs on local Qwen models on Apple Silicon rather than cloud GPUs, which forced the pipeline to be efficient instead of expensive: the full 65,074-paper run cost about $44 in API spend, with the heavy extraction free and local.
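The two-phase idea can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the `Paper` fields, the `keyword_scan` scorer, the `sentence_split` stand-in for deep extraction, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Paper:
    # Hypothetical record shape; the real schema is not described in the text.
    paper_id: str
    abstract: str
    full_text: str


def two_phase_extract(
    papers: Iterable[Paper],
    cheap_scan: Callable[[str], float],
    deep_extract: Callable[[str], List[str]],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> Dict[str, List[str]]:
    """Run the cheap pass on every paper and the deep pass only on survivors."""
    results: Dict[str, List[str]] = {}
    for paper in papers:
        # Phase 1: inexpensive relevance score from the abstract alone.
        if cheap_scan(paper.abstract) < threshold:
            continue
        # Phase 2: the expensive pass sees only papers that earned it.
        results[paper.paper_id] = deep_extract(paper.full_text)
    return results


def keyword_scan(abstract: str) -> float:
    """Stand-in for the scan pass: fraction of topic keywords present."""
    keywords = ("near-death", "out-of-body")
    text = abstract.lower()
    return sum(1 for k in keywords if k in text) / len(keywords)


def sentence_split(full_text: str) -> List[str]:
    """Stand-in for deep extraction (a local model in the real system)."""
    return [s.strip() for s in full_text.split(".") if s.strip()]


papers = [
    Paper("p1", "A survey of near-death and out-of-body reports.",
          "Patients reported vivid recall. Interviews were recorded."),
    Paper("p2", "Soil chemistry in alpine meadows.",
          "Nitrogen levels varied by altitude."),
]
results = two_phase_extract(papers, keyword_scan, sentence_split)
```

Only `p1` reaches the deep pass here. The design point is that the expensive function is never called on a paper the cheap function rejected, so cost scales with the number of survivors rather than the size of the corpus.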
The verification story
The extractor is built to distrust its own model. Extraction is two-phase by design, so the deep pass only sees papers that survived triage. Every extracted claim must carry a verbatim quote from the source paper, and empty fields are omitted rather than filled in. Verifiable claims in the corpus, such as veridical perception cases, carry an explicit verification status (verified, unverified, or cannot verify) instead of a confidence score the model invented. How I came to treat model-authored citations as guilty until proven innocent is covered in the verification gate essay.
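A verbatim-quote gate of this kind might look like the sketch below. The claim keys (`quote`, `verification_status`), the `gate_claim` name, and the choice to collapse whitespace before comparing are assumptions for illustration; whitespace is normalized only because text extracted from PDFs often breaks lines mid-sentence.

```python
import re
from typing import Optional

# The three statuses named in the text; anything else is rejected.
ALLOWED_STATUSES = {"verified", "unverified", "cannot verify"}


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip()


def gate_claim(claim: dict, source_text: str) -> Optional[dict]:
    """Return a cleaned claim, or None if it cannot be tied to the source."""
    quote = normalize_ws(claim.get("quote") or "")
    # Reject claims with no quote, or a quote not found in the source.
    if not quote or quote not in normalize_ws(source_text):
        return None
    status = claim.get("verification_status")
    if status is not None and status not in ALLOWED_STATUSES:
        return None
    # Omit empty fields instead of letting anything fill them in later.
    return {k: v for k, v in claim.items() if v not in (None, "", [], {})}


source = "Patients described a sense of peace. Many reported reduced\nfear of death."
kept = gate_claim(
    {"text": "Fear of death decreased",
     "quote": "Many reported reduced fear of death.",
     "page": None},
    source,
)
dropped = gate_claim(
    {"text": "Everyone lost their fear",
     "quote": "All patients lost their fear of death."},
    source,
)
```

Here `kept` survives with its empty `page` field removed, and `dropped` is `None` because its quote does not appear in the source. The check is a plain substring match, so a paraphrase that merely resembles the source fails the gate.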
What broke and what I learned
The first entity-resolution pass merged authors who shared names and split authors who formatted their names differently. Deterministic normalization plus conservative merge rules beat clever model-based matching. And the biggest cost win came from boring engineering: caching, batching, and chunking choices moved throughput more than any model upgrade.
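As a rough sketch of what deterministic normalization plus conservative merging can mean in practice: normalize names with fixed rules, then merge two records only when the names match and some independent evidence agrees. The record keys (`name`, `orcid`, `coauthors`), the specific rules, and the sample names and identifiers are hypothetical, not the project's actual schema or data.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Map formatting variants of one name to a single canonical string."""
    # "Last, First" becomes "First Last".
    if "," in raw:
        last, first = raw.split(",", 1)
        raw = f"{first} {last}"
    # Strip diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, turn punctuation into spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z\s]", " ", plain.lower())
    return " ".join(cleaned.split())


def should_merge(a: dict, b: dict) -> bool:
    """Conservative rule: a matching name alone is never enough."""
    if normalize_name(a["name"]) != normalize_name(b["name"]):
        return False
    # If both records carry an identifier, it decides the question.
    if a.get("orcid") and b.get("orcid"):
        return a["orcid"] == b["orcid"]
    # Otherwise require at least one shared coauthor as evidence.
    co_a = {normalize_name(n) for n in a.get("coauthors", [])}
    co_b = {normalize_name(n) for n in b.get("coauthors", [])}
    return bool(co_a & co_b)


# Invented sample records for illustration only.
rec_a = {"name": "García, Ana", "coauthors": ["Lee, Min"]}
rec_b = {"name": "Ana Garcia", "coauthors": ["Min Lee"]}
rec_c = {"name": "Ana Garcia", "coauthors": ["Jo Park"]}

same_person = should_merge(rec_a, rec_b)   # formatting differs, evidence agrees
name_only = should_merge(rec_a, rec_c)     # same name, no shared evidence
```

The first pair merges despite different formatting; the second does not, even though the normalized names are identical. Erring toward leaving records split addresses both failure modes described above: formatting variants converge through normalization, while namesakes stay separate unless something beyond the name links them.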
Status
Active. The pipeline powers ongoing research work, and its architecture is the template for every document system I build now: extract with a model, constrain it to verbatim evidence, publish with provenance.