Karpathy Killed Your Vector DB With A Folder Of Markdown Files

Andrej Karpathy's viral LLM wiki concept skips vector databases entirely and replaces them with a folder of markdown files. The token math is brutal: one user reported cutting query token usage by 95%.

The VIP Desk

4 min read · May 13, 2026 · Summarizing Nate Herk


Andrej Karpathy posted a quiet tweet last week explaining how he's been organizing his research. No code release, no GitHub repo — just a paragraph describing a folder layout. Within 48 hours it had blown up on X, and Nate Herk walked through a complete implementation on his channel. The takeaway hits harder than the size of the post suggests: the entire vector-database industry might be optional for most people.

The whole "AI second brain" stack everyone has been overbuilding for two years collapses into three files and two directories.

The architecture is almost insulting in its simplicity

Here's what Karpathy's setup looks like on disk:

/raw          ← original source documents go here
/wiki         ← AI-generated synthesis pages live here
agents.md     ← instructions the LLM reads
index.md      ← auto-generated catalog of everything
log.md        ← append-only audit trail

That's it. No embedding model. No vector database. No chunking pipeline. No pinecone.init(). You drop a PDF, an article, or a YouTube transcript into /raw, tell Claude Code to ingest it, and it does five things in order: reads the source, creates or updates wiki pages, extracts entities (people, companies, tools, concepts), updates index.md, and appends to log.md. Every wiki page is a plain markdown file with [[wiki-link]] cross-references. Open the directory in Obsidian and the graph view shows you the relationships.
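The mechanical half of that ingest (the folders, the catalog, the audit trail) is small enough to sketch in plain Python. This is an illustration under assumptions, not Karpathy's or Nate's code: the function names, the one-link-per-line format of index.md, and the timestamp format in log.md are all made up here, and the actual synthesis of wiki pages is left to the LLM.

```python
from datetime import datetime, timezone
from pathlib import Path


def scaffold_vault(root: Path) -> None:
    """Create the two folders and three files from the layout above."""
    for folder in ("raw", "wiki"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in ("agents.md", "index.md", "log.md"):
        (root / name).touch(exist_ok=True)


def rebuild_index(root: Path) -> list[str]:
    """Rewrite index.md as a sorted list of [[wiki-links]], one per wiki page.

    The catalog format is an assumption made for this sketch.
    """
    titles = sorted(page.stem for page in (root / "wiki").glob("*.md"))
    lines = ["# Index", ""] + [f"- [[{title}]]" for title in titles]
    (root / "index.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return titles


def append_log(root: Path, message: str) -> None:
    """Append one timestamped line to log.md; earlier entries are left untouched."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with (root / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} UTC: {message}\n")
```

Calling scaffold_vault(Path("my-vault")) once, then rebuild_index and append_log after each ingest, mirrors the last two of the five steps. In the workflow described here the LLM does this bookkeeping itself, so a script like this is mostly useful as a sanity check.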

Karpathy summed it up in his follow-up gist: "I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries."

The token math is the part that matters

This isn't a quality argument. It's an economics argument.

One X user posted that converting 383 scattered files and 100+ meeting transcripts into Karpathy's wiki layout dropped their Claude query token usage by 95%. That's not a typo. The reason is structural: instead of stuffing similar chunks into context and hoping similarity search lands the right ones, the LLM reads the index, follows the wiki-links it needs, and ignores everything else. Targeted reads beat embedded recall when you have a clean structure.
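To make "targeted reads" concrete, here is a minimal sketch of link-following in Python. It assumes each wiki page lives at wiki/<Title>.md and that links use the [[Title]], [[Title|alias]], or [[Title#heading]] forms; collect_context and its max_hops parameter are hypothetical names for illustration. In the real workflow the LLM decides which links are worth following, which a fixed hop count only approximates.

```python
import re
from pathlib import Path

# Captures the page title from [[Title]], [[Title|alias]] and [[Title#heading]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def linked_pages(text: str) -> list[str]:
    """Return wiki-link targets in order of first appearance, without duplicates."""
    return list(dict.fromkeys(match.strip() for match in WIKI_LINK.findall(text)))


def collect_context(vault: Path, start: str, max_hops: int = 1) -> dict[str, str]:
    """Read `start` plus the pages reachable within `max_hops` links; skip the rest."""
    context: dict[str, str] = {}
    frontier = [start]
    for _ in range(max_hops + 1):
        next_frontier: list[str] = []
        for title in frontier:
            path = vault / "wiki" / f"{title}.md"
            if title in context or not path.is_file():
                continue  # already read, or the link points at a page not written yet
            context[title] = path.read_text(encoding="utf-8")
            next_frontier.extend(linked_pages(context[title]))
        frontier = next_frontier
    return context
```

Only the pages returned by collect_context end up in the prompt, so the context size tracks the number of links followed rather than the size of the vault.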

Nate Herk hit the same result on his own setup — he replaced the old context-file approach in his Claude Code executive-assistant project with a wiki vault, and token usage dropped without quality loss. His exact words: "I used to do this with context files inside this project. When I changed over to this method, I actually saw a reduction in tokens."

What it's not good for

The honest part Karpathy doesn't bury: this approach scales to hundreds of documents and ~500,000 words. Not millions. If you're indexing the entire Stripe customer support corpus or a 10-million-document enterprise knowledge base, you still want a real RAG pipeline. The wiki design lives in the personal-to-team scale — exactly where most of us actually operate.

Here's the cleaner version of the trade-off:

Dimension	Karpathy wiki	Traditional RAG
Setup time	5 minutes	Days
Infrastructure	A folder	Embedding model + vector DB + chunker
How it finds info	Reads indexes, follows links	Similarity search over chunks
Cost per query	Token cost only	Compute + storage + tokens
Update cost	Re-run a lint	Re-embed affected docs
Sweet spot	<500K words	Multi-million-doc corpora
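A quick way to see which column your corpus falls in is to count the words. The sketch below is a rough check only: WORD_BUDGET is the ~500,000-word figure quoted above treated as a soft threshold, the function names are invented for this example, and only .md files are counted, so PDFs or other binaries sitting in /raw are ignored.

```python
from pathlib import Path

WORD_BUDGET = 500_000  # the rough ceiling quoted above, not a hard limit


def vault_word_count(root: Path) -> int:
    """Count whitespace-separated words across every markdown file under root."""
    return sum(
        len(path.read_text(encoding="utf-8", errors="ignore").split())
        for path in root.rglob("*.md")
    )


def fits_wiki_scale(root: Path, budget: int = WORD_BUDGET) -> bool:
    """True when the markdown in the vault stays under the word budget."""
    return vault_word_count(root) < budget
```

If fits_wiki_scale comes back False, that is a hint to look at a conventional RAG pipeline, not proof that the wiki will break.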

The five-minute setup

Nate's walkthrough is the closest thing to an official build guide:

Install Obsidian (free, optional but makes the graph view worth it).
Create a new vault — really just an empty folder.
Open Claude Code in that folder.
Paste Karpathy's gist as the prompt and follow with: "You are now my LLM wiki agent. Implement this exact idea as my complete second brain. Guide me step by step. Create the CLAUDE.md, the schema, the folders."
Drop your first source into /raw — Claude Code will ask whether the vault is for personal use, research, business knowledge, etc. The answer shapes the schema it builds.

That's it. The first ingest of a long article (Nate used the AI 2027 essay) took about 10 minutes and produced 23 interconnected wiki pages from a single source. Each page had tags, back-links to people and concepts, and showed up as a node in the graph view.

Where this gets interesting if you push it

Karpathy keeps his vault simple. The architecture leaves room to grow:

Linting: schedule a daily or weekly LLM pass that scans the wiki for inconsistencies, missing data, and "new article candidates" (gaps that should be filled).
Multiple vaults, one assistant: Nate runs separate vaults (personal brain + YouTube knowledge) and points his Claude Code executive assistant at both via paths in CLAUDE.md. The assistant pulls the right vault for the right question.
Hot cache: a small hot.md (about 500 characters) listing the most recently discussed items, so the assistant doesn't have to traverse the wiki for routine queries.
Web-clipper ingest: the free Obsidian Web Clipper Chrome extension drops any article straight into /raw with one click. Set the destination folder once and you've got a passive-capture pipeline.
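Part of the linting idea can be done without an LLM at all. The sketch below is a cheap structural pre-pass, not the LLM lint described above: it treats link targets that have no page yet as a stand-in for "new article candidates" and flags pages that nothing else links to. The lint_wiki name and the report keys are assumptions made for illustration.

```python
import re
from pathlib import Path

# Captures the page title from [[Title]], [[Title|alias]] and [[Title#heading]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(root: Path) -> dict[str, list[str]]:
    """Report link targets with no page yet, and pages no other file links to."""
    pages = {
        page.stem: page.read_text(encoding="utf-8")
        for page in (root / "wiki").glob("*.md")
    }
    index_file = root / "index.md"
    index_text = index_file.read_text(encoding="utf-8") if index_file.is_file() else ""

    # Start with the links in index.md, then add links between wiki pages.
    targets = {target.strip() for target in WIKI_LINK.findall(index_text)}
    for title, text in pages.items():
        for target in WIKI_LINK.findall(text):
            if target.strip() != title:  # a page linking to itself does not count
                targets.add(target.strip())

    titles = set(pages)
    return {
        "missing_pages": sorted(targets - titles),  # linked, but not written yet
        "orphan_pages": sorted(titles - targets),   # written, but never linked
    }
```

Running this on a schedule and handing the report to the LLM keeps the expensive pass focused on judgment calls such as contradictions and stale claims.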

What to do this week

If the token-cost reduction holds for your workload (and the reports above suggest it often does for personal and team-scale knowledge work), the bet is cheap: about five minutes of setup, then roughly ten minutes per long source you ingest:

Pick the corpus you waste the most tokens re-explaining to Claude — your meeting notes, your YouTube saves, your research articles, your client briefs.
Spin up an empty Obsidian vault and follow Nate's prompt above.
Ingest 5 sources and check whether the synthesis pages and back-links look useful.
Connect it to whatever Claude Code project keeps asking you the same background questions — point it at the vault's CLAUDE.md so it knows how to query.

If the experiment looks good after 5 sources, batch-ingest the rest. If not, you've spent an afternoon and you still have your sources in plain markdown.

The Bottom Line

The AI tooling industry has been racing toward bigger, more complex retrieval systems for two years. Karpathy's response, published as a tweet rather than a paper, is that at personal-to-team scale a folder of markdown files can beat a vector database on both cost and clarity. The infrastructure isn't the moat. The structure is. Stand up the wiki this week and watch what happens to your token bill.
