The Complete Guide to Karpathy's LLM Wiki Workflow
After 5 days, 16M tweet views, and 15+ GitHub implementations: a practical guide to replicating Karpathy's LLM wiki workflow with the exact tools, schemas, and patterns that work.
Five days after Andrej Karpathy posted about his "LLM wiki" workflow, the internet had already produced more than fifteen working implementations, a case study built from 2,500 diary entries, and a running argument about whether any of this is just RAG with extra steps. The tweet is sitting at sixteen million views. The follow-up gist has passed five thousand stars and 1,483 forks. People are not reading this as commentary — they are reading it as instructions.
This guide is for readers in the second camp. It walks through what Karpathy actually built, the exact tools he named, the directory structure that keeps the whole thing coherent, the community implementations that are pushing the idea in useful directions, and the places where it quietly falls apart. It assumes you want to build one.
What Karpathy actually built
On April 2, 2026, Karpathy published a short post on X and a longer gist describing a personal knowledge workflow. The framing mattered as much as the content. He wrote that "a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge" — a notable admission from someone whose public identity is built on writing some of the most-read deep learning code of the last decade.
The core description, from the gist, is one sentence that is worth quoting exactly:
"I index source documents into a raw/ directory, then I use an LLM to incrementally 'compile' a wiki, which is just a collection of .md files in a directory structure."
That is the entire mechanism. Raw inputs go into one folder. An LLM reads them and emits linked markdown files into another folder. The LLM is responsible for the cross-references, the index, the headings, the formatting, and — critically — for going back and updating earlier pages when new information arrives. The human operates the wiki through Obsidian, not through a custom interface.
Karpathy framed it as a "vibe-coded" personal system, not a product. That framing is part of why it spread: it was obviously reproducible by anyone with a terminal and an LLM key. Within 48 hours, people were forking the gist and pushing working repos. Within five days, the ecosystem had enough variation to talk about best practices.
The exact tools Karpathy uses
Karpathy was specific about his stack. It is worth listing the pieces in full because a lot of the secondary commentary has been sloppy about this.
- Obsidian — the frontend and IDE. Karpathy browses, queries, and edits the wiki through Obsidian's markdown-native interface. He did not build a custom UI. This is a deliberate separation: the AI writes the files, Obsidian reads them.
- Obsidian Web Clipper — the ingestion path for web articles. It converts pages into clean markdown that lands in raw/ without manual copy-paste.
- A custom, vibe-coded CLI/web search tool — Karpathy mentions a small search layer he wrote himself for discovering source documents. He did not release it. It is an idiosyncratic piece that most reimplementations replace with ripgrep or a third-party search.
- qmd by Tobi Lütke — Shopify's CEO shipped a hybrid BM25 + vector search tool that Karpathy endorsed as the scaling path once a wiki outgrows a single index file. qmd is the answer to "what do I do when index.md no longer fits in context."
- Marp — markdown-to-slides. Karpathy uses his wiki as the source of truth for talks. A slide deck is just another compiled output from the same content.
- Dataview — an Obsidian plugin for querying frontmatter. If every page has a confidence: 0.7 or stale: true field, Dataview turns the vault into a queryable database without leaving markdown.
- matplotlib — Karpathy has the LLM generate Python plots from structured data in the wiki and render them as images next to the prose. Charts are compiled, not hand-drawn.
- Git — versioning for both raw/ and wiki/. Every LLM edit becomes a commit, which means you can diff what the model did and roll back when it hallucinates.
He did not name a specific LLM. But the schema file in his gist is called CLAUDE.md, which is the convention used by Claude Code. The community has taken this as a strong hint that the primary driver is Claude Code operating on a local directory.
The architecture: raw/ vs wiki/
The directory layout is the load-bearing idea. Nearly every mistake in the early reimplementations traces back to blurring the two folders.
knowledge/
├── raw/ # immutable source documents (never edited by LLM)
│ ├── articles/
│ ├── papers/
│ ├── notes/
│ └── transcripts/
├── wiki/ # LLM-owned markdown (rewritten on every ingest)
│ ├── index.md
│ ├── people/
│ ├── concepts/
│ └── projects/
└── CLAUDE.md # schema, conventions, instructions for the agent
raw/ is immutable. You drop files in, the LLM reads them, nothing writes back. If you edit a raw file, you are editing your sources, and the wiki becomes a lie. Treat it like a read-only archive. Most implementations enforce this with a git pre-commit hook or a simple chmod.
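One way to enforce the read-only convention is a pre-commit hook. The sketch below is an assumption-laden illustration, not from Karpathy's gist: it allows new files to be added under raw/ but blocks modifications and deletions. Copy it to .git/hooks/pre-commit and mark it executable to install.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch enforcing an immutable raw/.

Assumption: adding new files to raw/ is fine; modifying or deleting
existing ones is what we block.
"""
import subprocess


def violations(staged: list[tuple[str, str]]) -> list[str]:
    """Given (status, path) pairs from git, return blocked raw/ changes.

    Status letters follow `git diff --name-status`: A=added, M=modified,
    D=deleted. Only additions to raw/ are allowed through.
    """
    return [
        path
        for status, path in staged
        if path.startswith("raw/") and status != "A"
    ]


def staged_changes() -> list[tuple[str, str]]:
    """Read the currently staged changes as (status, path) pairs."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-status"],
        capture_output=True, text=True, check=True,
    ).stdout
    pairs = []
    for line in out.splitlines():
        status, _, path = line.partition("\t")
        pairs.append((status[:1], path))  # rename statuses like "R100" collapse to "R"
    return pairs


def main() -> int:
    bad = violations(staged_changes())
    if bad:
        print("raw/ is immutable; refusing to commit changes to:")
        print("\n".join(f"  {p}" for p in bad))
        return 1
    return 0

# When installed as a hook, finish the script with:
# import sys; sys.exit(main())
```

A chmod -R a-w raw/ achieves roughly the same thing with less ceremony, at the cost of making legitimate ingestion of new files slightly more awkward.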
wiki/ is LLM-owned. You do not edit wiki pages by hand. If you do, the next ingest will silently overwrite your changes. If you want to correct something, you either fix the underlying raw document or add an instruction to CLAUDE.md. This feels strict, and it is, but it is the only way the "self-maintaining" property holds.
CLAUDE.md (or AGENTS.md) is the schema file. It tells the agent what conventions to use: how to name files, what frontmatter fields are required, what style the prose should be in, how to handle contradictions, when to create a new page versus update an existing one. This file is the closest thing the workflow has to a configuration layer.
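To make that concrete, a minimal CLAUDE.md might look like the following. Every convention here is illustrative — Karpathy did not publish his actual schema file — but the categories (naming, frontmatter, style, contradictions, page creation) are the ones the gist describes:

```markdown
# Wiki conventions

## Files
- Pages live in wiki/{people,concepts,projects}/ with kebab-case filenames.
- Never write to anything under raw/.

## Frontmatter (required on every page)
- title, confidence (0-1), last_ingested, sources, content_hash, stale

## Style
- Terse, declarative prose. One H1 per page.
- Link a concept on first mention: [[concepts/compilation]].

## Contradictions
- If a new source contradicts an existing claim, keep both claims, lower
  confidence, and list the page under "Needs review" in wiki/index.md.

## New page vs update
- Create a new page only when a topic appears in two or more sources;
  otherwise fold it into the nearest existing concept page.
```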
Three operations run against this structure:
- Ingest — a new file lands in raw/. The agent reads it, identifies which existing wiki pages it touches, and updates roughly ten to fifteen pages in one pass: the new topic page, the index, cross-referenced concepts, relevant people pages. This is the expensive operation. It is where the "compilation" happens.
- Query — you ask a question. The agent reads wiki/index.md, decides which pages are relevant, opens those pages, and synthesizes an answer. Critically, it queries the compiled wiki, not the raw archive. The compilation work pays off here.
- Lint — run on a schedule or on demand. The agent audits the wiki for contradictions ("page A says X, page B says not-X"), orphans (pages nothing links to), and stale claims (assertions whose source in raw/ has been superseded). This is the operation that prevents decay.
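The query operation is the simplest of the three to sketch. The version below is a hedged illustration, not Karpathy's code: it assumes the directory layout above and a `complete(prompt)` function wrapping whatever model you use (Claude, a local model, anything that maps prompt to text). The two-pass shape — index first, then only the named pages — is the whole point.

```python
from pathlib import Path
from typing import Callable


def query(question: str, wiki: Path, complete: Callable[[str], str]) -> str:
    """Two-pass query: pick relevant pages from the index, then synthesize."""
    index = (wiki / "index.md").read_text()

    # Pass 1: the model sees only the index and names the pages it needs.
    pick = complete(
        f"Index:\n{index}\n\nQuestion: {question}\n"
        "Reply with one relevant page path per line, nothing else."
    )
    pages = [wiki / line.strip() for line in pick.splitlines() if line.strip()]

    # Pass 2: the model answers from the compiled pages, not from raw/.
    context = "\n\n".join(
        f"## {p}\n{p.read_text()}" for p in pages if p.exists()
    )
    return complete(
        f"{context}\n\nQuestion: {question}\nAnswer from the pages above."
    )
```

Ingest and lint follow the same pattern but in reverse: the model reads raw files or wiki pages and writes markdown back, which is why they are the expensive operations.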
What people are actually building
Within five days of the original post, GitHub had more than fifteen reimplementations. A handful are substantive enough to learn from.
Ar9av/obsidian-wiki treats the agent layer as pluggable. The same wiki can be driven by Claude Code, Codex, Cursor, Windsurf, Copilot, or Gemini, switched through a skills-based architecture that wraps each provider. This is useful if you want to avoid betting the whole setup on one vendor's pricing curve.
nvk/llm-wiki is from NVK, a long-time Bitcoin developer, and it is the most opinionated implementation so far. It introduces three query depth levels (fast, deep, exhaustive), explicit confidence scoring on every claim, and dual-linking — every wiki page links both to the raw sources it was built from and to other wiki pages it relates to. The dual-linking is the idea worth stealing.
ussumant/llm-wiki-compiler published the only real benchmark so far. Tested on a corpus of 383 markdown files (13.1 MB), it reports roughly 84% fewer tokens per query session compared to loading the raw files directly — about 3,200 lines of context per session dropping to around 330. This is the empirical result nobody has surfaced prominently, and it is the strongest quantitative case for the whole approach.
kenhuangus/llm-wiki runs the entire pipeline on local models — LM Studio serving Gemma 4 — and adds automated monitoring of arXiv and CVE feeds as continuous ingest sources. No cloud, no API bills, and the wiki updates itself overnight while the machine is idle.
iamsashank09/llm-wiki-kit reimplements the whole thing as an MCP server. Instead of running the agent as a CLI, it exposes ingest/query/lint as MCP tools that Cursor or Claude Code can call directly. This is probably where the ecosystem is heading.
swarajbachu/cachezero was the first Show HN launch in the space, billing itself as "Karpathy's LLM wiki idea as one NPM install." It is the lowest-friction way to try the pattern if you want to get a working vault running in under ten minutes.
The Farzapedia case study
The most striking practitioner example is from Farza Majeed (@FarzaTV). Farza fed roughly 2,500 entries from his personal diary, Apple Notes, and iMessage conversations into a Karpathy-style pipeline and called the result "Farzapedia." The output was around 400 interlinked wiki articles covering friends, past startups, research interests, and — because this is Farza — several anime deep-dives.
What makes Farzapedia interesting is the source material. It is not a research corpus. It is the disordered ambient data of a life: texts, throwaway notes, dated diary entries. The compilation step pulled all of it into something navigable. Karpathy himself quote-tweeted it, calling the resulting memory artifact "explicit and navigable" — which is a careful phrase. The point is not that the wiki knows more than Farza's notes did. It is that the wiki can be walked.
This is the most compelling existence proof so far that the pattern generalizes beyond Karpathy's own research workflow.
Best practices that emerged in days
Despite the short timeline, a rough consensus has formed on what works.
YAML frontmatter with confidence and staleness. Every wiki page gets a header like:
---
title: "Pipeline v2 architecture"
confidence: 0.8
last_ingested: 2026-04-06
sources: [raw/notes/2026-03-22-sync.md, raw/articles/pipeline-blog.md]
content_hash: 8f3a...
stale: false
---
Dataview queries then surface low-confidence pages, stale pages, or orphans directly in Obsidian.
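For example, a Dataview query like this one (field names match the frontmatter example above) surfaces the pages most likely to need attention:

```dataview
TABLE confidence, last_ingested, stale
FROM "wiki"
WHERE confidence < 0.6 OR stale = true
SORT confidence ASC
```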
Strict raw/ vs wiki/ separation. Already covered above, but it is the practice most often broken in early forks, and it is the one with the worst failure mode.
Steph Ango's vault separation pattern. Steph Ango (@Kepano), Obsidian's CEO, posted during the week that agents should "make a mess in their own space" rather than editing the human's personal vault. His recommendation is to keep your hand-written Obsidian notes in one vault and point the LLM wiki at a separate, sacrificial vault. This prevents what he calls hallucination contamination — the situation where a model invents a fact, writes it into your trusted notes, and then reads it back next week as if it were ground truth. Once a hallucinated link is in your graph, it gets reinforced on every subsequent ingest.
Content hash detection. Hash every raw file on ingest. If the hash changes, flag every wiki page that cited it as stale. This is the cheapest possible lint step and catches most decay.
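A minimal version of this check fits in a few lines. The sketch below (my construction, not from any published implementation) keeps a JSON manifest of per-file hashes next to the wiki; each run reports which raw paths changed since the last run, and any wiki page whose frontmatter sources list includes a returned path should be flagged stale.

```python
import hashlib
import json
from pathlib import Path


def hash_raw(raw: Path) -> dict[str, str]:
    """sha256 of every file under raw/, keyed by path relative to raw/."""
    return {
        str(p.relative_to(raw)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw.rglob("*"))
        if p.is_file()
    }


def changed_sources(raw: Path, manifest_file: Path) -> list[str]:
    """Return raw/ paths that are new or changed since the last run.

    Rewrites the manifest afterwards, so the next run compares against
    the current state. Wiki pages citing a returned path are stale.
    """
    current = hash_raw(raw)
    previous = (
        json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    )
    manifest_file.write_text(json.dumps(current, indent=2))
    return sorted(p for p, h in current.items() if previous.get(p) != h)
```

Running this as the first step of the lint operation means the agent only re-reads sources that actually changed.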
qmd for scaling search. The single-index-file pattern works up to roughly 100–150 articles. Beyond that, index.md itself becomes larger than a comfortable context window, and the agent starts thrashing. Tobi Lütke's qmd gives you a hybrid BM25/vector retrieval layer that the agent can call as a tool, so the index becomes an API call instead of a file read. This is the only clean scaling path anyone has published.
Schema files as conventions. CLAUDE.md and AGENTS.md are converging as the two filenames agents look for. Put your style guide, your required frontmatter fields, your cross-linking rules, and your disambiguation rules in one of these files. Everything else follows.
Where it breaks down
This is the part of the workflow most of the enthusiast coverage skips. The pattern is good. It is not finished.
The index outgrows the context window. Past a few hundred articles, wiki/index.md becomes unwieldy. You can compress it, shard it, or move to qmd, but none of these are free. Every scaling solution introduces retrieval, and retrieval is the thing the pattern was supposed to replace. At some corpus size the workflow quietly becomes RAG-over-compiled-markdown, which is defensible but worth admitting.
Citation precision is weak. The wiki can say "this decision was made on March 22" and cite raw/notes/2026-03-22-sync.md, but it cannot easily cite "page 47, paragraph 3." For research work where provenance matters, this is a gap. The only workaround so far is to preprocess raw files into small enough chunks that the filename is precise enough — which, again, starts to look like chunking.
Hallucination contamination is real. If the LLM invents a link on ingest, that link persists. On the next ingest the model will read its own invention as source material and reinforce it. Steph Ango's separate-vault pattern mitigates this; lint steps catch some of it; but nobody has a principled solution. The wiki can drift away from the raw archive and there is no automatic alarm.
Token costs at scale are unpublished. Nobody running a wiki in production has posted real numbers. The ingest operation touches ten to fifteen pages per new document, each full-read-and-rewrite. At Claude Sonnet prices, a busy knowledge worker could burn through meaningful money per month, but we do not yet know whether it is ten dollars or two hundred. The ussumant benchmark is about query-time savings, not ingest-time costs.
The "this is just RAG" debate. A significant chunk of the technical commentary is some variant of "compile markdown then query it is literally retrieval-augmented generation." The distinction worth preserving is that compilation is lossy and opinionated — the LLM makes editorial decisions on ingest that classical RAG defers to query time. Whether that counts as a new paradigm or a relocation of the same work is a definitional argument, and it is not going to be settled by a tweet.
The meeting transcript opportunity
Karpathy listed several source types in the gist. One line is worth quoting in full because it points at the gap nobody has filled:
"Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. The wiki stays current because the LLM does the maintenance that no one on the team wants to do."
That sentence contains the unspoken thesis of every meeting AI tool on the market. Meeting transcripts are the highest-yield input for a Karpathy-style wiki: they are generated automatically, they are relational, they compound, and nobody wants to maintain them by hand. They are exactly the kind of document that benefits from compilation rather than storage.
And yet, as of this week, no product automatically feeds meeting conversations into a compounding Karpathy-style wiki. The closest adjacent tools stop at per-meeting summaries. The gap between "here is a summary of today's call" and "here is a maintained wiki page for the Q3 pricing decision, updated across the seven meetings where it came up" is large, and it is still mostly empty.
Proudfrog is thinking about this problem, and the earlier piece on Karpathy's wiki and meeting knowledge goes into the architectural implications in more depth. For the broader case about what a compiled meeting knowledge graph looks like in daily use, see the knowledge worker workflow, the features page, or the pricing page.
The short version: Karpathy showed what the destination looks like. The interesting work now is plumbing it into the places where the raw material is already being produced.
Frequently Asked Questions
What LLM does Karpathy use for his wiki?
Karpathy did not explicitly name a model in the original post or gist. However, the schema file in his published directory structure is named CLAUDE.md, which is the convention used by Anthropic's Claude Code. The community consensus is that the primary driver is Claude Code running against a local directory, though implementations have been published using GPT-based Codex, Gemini, Cursor, and local models via LM Studio. The workflow is model-agnostic — what matters is that the agent can read and write files in a directory.
Do I need to be a developer to build my own LLM wiki?
You need to be comfortable with a terminal, git, and an LLM coding agent like Claude Code or Cursor. You do not need to write code from scratch — several reimplementations (cachezero, llm-wiki-kit, obsidian-wiki) install in a few minutes and give you a working vault. The ongoing work is curating the schema file, deciding what goes into raw/, and reviewing what the agent writes. Think of it as operating a system rather than building one.
How is this different from RAG?
Classical RAG chunks documents, embeds them, and retrieves similar chunks at query time. The retrieval is similarity-based and happens when you ask a question. Karpathy's wiki compiles raw documents into structured, linked markdown at ingest time, and the LLM makes editorial decisions — what matters, what links to what, what to update — before any query is asked. At query time the model reads the compiled wiki, not the raw archive. Past a few hundred articles the distinction gets blurrier because the index itself has to be searched, but at personal scale the compilation step carries meaningful information that pure similarity search discards.
What happens when the wiki gets too big for the context window?
The single-index-file pattern breaks around 100–150 articles, when wiki/index.md stops fitting comfortably in context. The cleanest scaling path published so far is qmd by Tobi Lütke, a hybrid BM25 and vector search tool that the agent calls as a retrieval step. Beyond that, you start sharding the wiki by topic or time period. This is the honest place where the pattern reintroduces retrieval, and it is worth planning for if your corpus is growing.
Can I use this with my meeting transcripts?
In principle, yes — drop transcripts into raw/transcripts/ and let the agent compile them. In practice, meeting transcripts are noisy, speaker-attributed, and timestamped, and a generic wiki agent does not know how to weight decisions, extract action items, or de-duplicate recurring discussions. This is the gap Proudfrog is working on. For a deeper treatment of what a meeting-native compiled wiki would look like, see Karpathy's LLM Wiki and What It Means for Meeting Knowledge.
What's the difference between raw/ and wiki/?
raw/ holds your source documents — articles, papers, notes, transcripts — and is immutable. The LLM reads from it but never writes back. wiki/ holds compiled markdown that the LLM owns completely: it creates, updates, and restructures pages as new sources arrive. You do not hand-edit wiki/ files, because the next ingest pass will overwrite them. If you need to correct something, fix the underlying source in raw/ or update the rules in CLAUDE.md. Keeping the two folders strictly separated is the single most important convention in the workflow, and it is the one most commonly broken in early forks.