#ai #knowledge-management

LLM Wiki: what it is and how the self-updating knowledge base works

The LLM Wiki is the knowledge base an AI agent builds and updates on its own from your documents. What it is, how it works and how to bring it in-house.

by Miro Radenovic

What an LLM Wiki is

An LLM Wiki is a knowledge base made of plain markdown files that an artificial-intelligence agent builds and keeps up to date on its own, starting from the raw documents of a person or an organisation. Instead of searching the documents at every question, as a traditional RAG system does, the LLM Wiki compiles the knowledge once, during ingestion, producing summary pages, entity pages and concept pages that are already linked to one another.

The result is knowledge that compounds over time, like interest: no new source stays isolated, because each one enriches and updates the pages that already exist. It is a simple idea with significant consequences for anyone who has to manage a lot of internal documentation.

Where it comes from: Andrej Karpathy’s idea

The pattern was popularised by Andrej Karpathy, one of the most closely followed voices in the AI field. On 4 April 2026 he published a so-called idea file on GitHub: not a code library, but a structured description of the pattern, designed to be pasted into an agent like Claude Code and adapted to your own needs (original source).

His starting observation is familiar to anyone who accumulates notes and documents: you can search them by keyword, but you can’t ask them a question. The LLM Wiki solves exactly this, turning a passive archive into a knowledge base you can query and that keeps growing.

How it works: the three-layer architecture

The heart of the pattern is a deliberately minimal three-layer structure.

| Layer | What it contains | Who manages it |
|---|---|---|
| Raw sources (raw/) | Immutable original documents: reports, PDFs, transcripts, CSVs, articles | The user (the AI reads but does not modify) |
| Wiki (wiki/) | AI-generated markdown pages: summaries, entities, concepts, comparisons | The AI agent |
| Schema (e.g. CLAUDE.md) | The instructions on how the agent structures and maintains the wiki | The user (it is configuration) |

The raw sources are the source of truth: they are never touched. The wiki is the knowledge itself, written and rewritten by the AI. The schema is the configuration file that dictates conventions and workflows.
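To make the three layers tangible, here is a minimal sketch that creates the layout on disk. The folder names raw/ and wiki/ and the file name CLAUDE.md come from the table above; the starter schema text, the index.md file name and the scaffold function are illustrative assumptions, not part of the original pattern description.

```python
from pathlib import Path
import tempfile

# Hypothetical starter schema: the real conventions are whatever you decide.
SCHEMA_TEMPLATE = """\
# Wiki schema (illustrative)
- raw/ is read-only: never edit or delete files there.
- wiki/ holds one markdown page per summary, entity or concept.
- Every page cites the raw files it was derived from.
"""


def scaffold(root: Path) -> None:
    """Create the three layers: raw sources, wiki pages, schema file."""
    (root / "raw").mkdir(parents=True, exist_ok=True)
    (root / "wiki").mkdir(parents=True, exist_ok=True)

    schema = root / "CLAUDE.md"
    if not schema.exists():  # never overwrite the user's configuration
        schema.write_text(SCHEMA_TEMPLATE, encoding="utf-8")

    index = root / "wiki" / "index.md"  # assumed name for the wiki index
    if not index.exists():
        index.write_text("# Index\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold(Path(tmp))
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp).as_posix())
```

The function is idempotent on purpose: running it again leaves an existing schema and index untouched, which mirrors the rule that configuration belongs to the user.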

The three operations: ingest, query, lint

  • Ingest — You drop in a new source and the agent reads it, discusses the key points with you, writes a summary page, updates the index and updates the related entity and concept pages across the whole wiki. A single source can touch 10-15 different pages.
  • Query — You ask a question and the agent synthesises the answer by reading the relevant pages, with citations to the sources. The answer can be text, a table or even slides.
  • Lint — Periodically the agent checks the health of the wiki: contradictions, outdated claims, orphan pages, missing links.
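Part of the lint step is purely mechanical and needs no model at all. The sketch below covers only the two structural checks from the list above, orphan pages and missing links; contradictions and outdated claims would still need the agent. It assumes pages link to each other with a `[[Page Name]]` syntax and that a page called index is the entry point; both are assumptions, since the link convention is whatever your schema defines.

```python
import re

# Assumed link syntax: [[page]] or [[page|label]]. Your schema may differ.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint(pages: dict[str, str],
         roots: frozenset[str] = frozenset({"index"})) -> dict[str, list[str]]:
    """Report links to pages that do not exist and pages nobody links to."""
    missing: set[str] = set()
    linked: set[str] = set()
    for name, body in pages.items():
        for target in (m.strip() for m in WIKILINK.findall(body)):
            if target == name:
                continue  # a self-link does not rescue a page from being an orphan
            if target in pages:
                linked.add(target)
            else:
                missing.add(f"{name} -> {target}")
    # Entry points such as the index are allowed to have no inbound links.
    orphans = sorted(set(pages) - linked - roots)
    return {"missing_links": sorted(missing), "orphans": orphans}


if __name__ == "__main__":
    demo = {
        "index": "[[acme]] [[pricing]]",
        "acme": "Competes on [[pricing]]; see also [[roadmap]].",
        "pricing": "Price positioning notes.",
        "old-notes": "Nothing links here.",
    }
    print(lint(demo))
```

In the demo, the report flags the link from acme to the non-existent roadmap page and lists old-notes as an orphan.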

A concrete ingestion example

Imagine a company feeding its own LLM Wiki. A market analysis of a competitor comes in: the agent reads it, creates an entity page for that competitor, updates the “price positioning” concept page and flags a contradiction with a figure entered six months earlier. The next day a sales-call transcript arrives: the agent links the objections that surfaced to the same competitor page, updates the customer page and notes down a decision that was made.

No one wrote a single one of these connections by hand. When, weeks later, someone asks “how do we position ourselves against that competitor?”, the answer is already there: it synthesises the market analysis, the real customer objections and the pricing history, all at once. This is the difference between storing information and building knowledge.
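The fan-out described in this example, where one source updates several pages, can be sketched in a few lines. This is an illustration under heavy assumptions: the extract callable stands in for the model call, fake_extract uses hard-coded rules instead of an LLM, and "Acme", the page names and the file paths are all made up for the demo.

```python
from typing import Callable

# Stand-in for the model call: maps a source text to {page_name: note}.
Extractor = Callable[[str], dict[str, str]]


def ingest(wiki: dict[str, str], source_id: str, text: str,
           extract: Extractor) -> list[str]:
    """Apply one raw source to the wiki and return the pages it touched."""
    touched = []
    for page, note in extract(text).items():
        if page not in wiki:
            # New entity or concept: create the page and register it in the index.
            wiki[page] = f"# {page}\n"
            wiki["index"] = wiki.get("index", "# Index\n") + f"- [[{page}]]\n"
        # Every note keeps a citation back to the raw source it came from.
        wiki[page] += f"- {note} (source: {source_id})\n"
        touched.append(page)
    return touched


def fake_extract(text: str) -> dict[str, str]:
    """Hard-coded rules in place of an LLM, for illustration only."""
    notes = {}
    if "Acme" in text:
        notes["acme"] = text
    if "price" in text.lower():
        notes["price-positioning"] = text
    return notes


if __name__ == "__main__":
    wiki: dict[str, str] = {}
    print(ingest(wiki, "raw/market-analysis.pdf",
                 "Acme cut its price by 10%.", fake_extract))
    print(ingest(wiki, "raw/sales-call.txt",
                 "Customer says Acme onboarding is faster.", fake_extract))
    print(wiki["acme"])
```

After the two calls, the acme page carries one note from each source. That accumulation on an existing page, rather than a new isolated document, is what the compounding effect refers to.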

LLM Wiki vs RAG: why knowledge “compounds”

The key difference compared with RAG (Retrieval-Augmented Generation) is when the cognitive work happens.

With RAG, at every question the model starts from scratch: it retrieves raw fragments, reconstructs their relationships, synthesises a conclusion. It is like re-running the same calculation every time. With the LLM Wiki, by contrast, that work has already been done during ingestion: the links are already there, the contradictions have already been flagged, the synthesis already reflects everything you have read.
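A back-of-the-envelope model makes the "when" argument concrete. It is a simplification we introduce here, not something taken from the original idea file: let $S$ be the number of sources ingested, $Q$ the number of questions asked, $c_i$ the average cost of ingesting one source, $c_r$ the cost of answering one question with RAG, and $c_q$ the cost of answering one question from already compiled pages, with $c_q < c_r$ assumed.

```latex
\begin{align}
C_{\text{RAG}}(Q) &= Q\,c_r \\
C_{\text{wiki}}(S, Q) &= S\,c_i + Q\,c_q \\
C_{\text{wiki}}(S, Q) < C_{\text{RAG}}(Q) &\iff Q > \frac{S\,c_i}{c_r - c_q}
\end{align}
```

Under these assumptions the wiki pays an upfront cost per source and recovers it once enough questions are asked. The model ignores lint runs, re-ingestion and retrieval-index maintenance, so it should be read as intuition rather than as a sizing formula.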

It is not an absolute opposition: RAG remains the stronger choice for huge volumes and for data that changes in real time, while the LLM Wiki wins on consistency and cost when the knowledge base is of a manageable size. Often the best solution is to combine them. We explored when to choose one or the other in our dedicated article on LLM Wiki vs RAG.

Why businesses need it

The problem the LLM Wiki tackles is not theoretical. According to McKinsey research, knowledge workers spend about 20% of the work week — almost a full day — searching for internal information and tracking down the right colleagues (McKinsey, The social economy). That is time taken away from higher-value work.

On top of this comes the loss of institutional knowledge: when someone leaves the company, the context they carried in their head and that was written down nowhere leaves with them. A wiki maintained by an AI captures that context as it goes, instead of losing it.

There is then a third advantage, often underestimated: portability. An LLM Wiki lives in markdown files, readable by a human, versionable in Git and independent of any proprietary platform. It is open knowledge in the most practical sense of the term — you genuinely own it, you are not locked in to a vendor and you can move it whenever you want. For a company building an asset meant to last for years, that is no small detail.

Open knowledge: truly owning your own knowledge

It is worth dwelling on portability, because it touches a strategic point that is often overlooked: the difference between using knowledge and owning it.

Many knowledge-management platforms keep corporate knowledge in proprietary formats, inside databases and indexes accessible only through their software. It works as long as you remain a customer of that vendor; the day you want to switch, exporting everything cleanly is often an ordeal — and that is precisely the mechanism of lock-in.

An LLM Wiki takes the opposite approach, that of open knowledge: the knowledge lives in markdown files readable by a human, versionable in Git and independent of any platform. You can read them with a text editor, move them around, put them under version control as you do with code, and switch model or vendor without losing anything. This is not just a technical matter: for a company building an asset meant to grow for years, it means that asset stays theirs, today and in the future. It is the same logic with which development teams treat code — and applying it to knowledge is a shift in mindset before it is a shift in tooling.

How to build a corporate LLM Wiki

For a team of one, a folder and an agent are enough. For a company, the step change lies in the integration with existing systems and in respecting security, permissions and compliance.

In the Microsoft ecosystem — the one we work in every day at Dev4Side — a typical architecture connects:

  • SharePoint, Teams and OneDrive as document sources, honouring the permissions already configured;
  • Azure OpenAI as the generative model, so that data stays inside the corporate security perimeter and does not leak out to public services;
  • Azure AI Search when, beyond the synthesised wiki, you need a retrieval layer over large volumes: this is where LLM Wiki and RAG on Azure OpenAI come together.
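Honouring existing permissions has a subtle consequence for a wiki: a compiled page mixes several sources, so a reader should see it only if they may read every source behind it. The sketch below shows that rule in plain Python. It calls no SharePoint or Azure API; the Page class, the source_acls field and the group names are hypothetical, and in a real system the access lists would be read from the source platform.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    body: str
    # One entry per raw source behind the page: the groups allowed to read it.
    # Hypothetical field, filled in at ingestion time from the source system.
    source_acls: list[frozenset[str]] = field(default_factory=list)


def readable_pages(pages: list[Page], user_groups: set[str]) -> list[Page]:
    """Keep a page only if the user can read every source it was derived from."""
    return [
        p for p in pages
        # Fail closed: a page with no recorded sources is hidden, not shown.
        if p.source_acls and all(acl & user_groups for acl in p.source_acls)
    ]


if __name__ == "__main__":
    pages = [
        Page("acme", "...", [frozenset({"sales"}), frozenset({"sales", "marketing"})]),
        Page("salary-bands", "...", [frozenset({"hr"})]),
        Page("draft", "..."),
    ]
    print([p.name for p in readable_pages(pages, {"marketing", "sales"})])
```

Requiring access to all sources is the conservative choice. A looser rule (any source) would let a synthesis leak details from a document the reader was never allowed to open.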

The delicate part is not the model: it is the engineering around the model — secure connectors, permission governance, incremental updating, auditing. It is exactly the kind of work that separates an experiment from a system a company can genuinely rely on.
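Incremental updating, for instance, can start from something as simple as content hashing: only sources that are new or have changed since the last run are handed to the agent. This is a minimal sketch assuming local files and a JSON manifest; the changed_sources function and the manifest format are our own illustration, and a production connector would more likely rely on the change tracking offered by the source system.

```python
import hashlib
import json
from pathlib import Path


def changed_sources(raw_dir: Path, manifest_path: Path) -> list[Path]:
    """Return raw files that are new or modified since the last recorded run.

    The manifest must live outside raw_dir, otherwise it would hash itself.
    """
    seen = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        key = path.relative_to(raw_dir).as_posix()
        current[key] = digest
        if seen.get(key) != digest:
            changed.append(path)
    # Simplification: the manifest is saved right away. A robust pipeline
    # would save it only after the changed files were ingested successfully.
    manifest_path.write_text(json.dumps(current, indent=2))
    return changed
```

A first call returns every file under raw/; a second call with nothing changed returns an empty list, so the agent has no work to do.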

The limits of the LLM Wiki: when it is not the right choice

Being honest about the limits is the best way to use a technology well. The LLM Wiki is not a universal solution, and there are cases where it is not the right tool:

  • Real-time data. If the knowledge changes every minute (prices, stock availability, incoming tickets), the wiki is only ever as fresh as its latest ingestion and risks being a step behind. Here RAG is a better fit.
  • Enormous volumes. When the sources number in the millions, the approach based on summary pages and an index shows its limits: you need a large-scale retrieval layer, typically Azure AI Search.
  • Purely transactional knowledge. If the goal is to find the single exact document (a specific contract, an invoice), the wiki’s synthesis adds little compared with a good search.

In many real-world scenarios, though, these limits are not an obstacle but a clue: they show where to combine the LLM Wiki with RAG, in a hybrid architecture that takes the best of both worlds. It is a topic we explored in our comparison LLM Wiki vs RAG.
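In a hybrid setup, the three limits above become routing rules. The sketch below is one possible way to express them; the Route enum, the choose_route function and especially the wiki_capacity threshold of 10,000 sources are assumptions made for illustration, not measured figures.

```python
from enum import Enum


class Route(Enum):
    WIKI = "wiki"       # compiled pages: synthesis across sources
    RETRIEVAL = "rag"   # retrieval layer: live, very large or exact-document lookups


def choose_route(needs_live_data: bool, wants_exact_document: bool,
                 source_count: int, wiki_capacity: int = 10_000) -> Route:
    """Send a question to the wiki unless one of its known limits applies.

    wiki_capacity is an arbitrary placeholder: tune it to your own corpus.
    """
    if needs_live_data:            # limit 1: real-time data
        return Route.RETRIEVAL
    if source_count > wiki_capacity:   # limit 2: enormous volumes
        return Route.RETRIEVAL
    if wants_exact_document:       # limit 3: purely transactional lookups
        return Route.RETRIEVAL
    return Route.WIKI


if __name__ == "__main__":
    print(choose_route(False, False, source_count=800))
    print(choose_route(True, False, source_count=800))
```

Deciding whether a question needs live data or an exact document is itself a classification step, which in practice the agent or a small set of rules would perform before this function is called.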

Where to start

The LLM Wiki is not just yet another AI tool: it is a different way of thinking about corporate knowledge, as an asset that grows in value over time instead of dispersing. The pattern is simple; bringing it into production securely and integrated with your systems is an engineering project.

At Dev4Side we design and build knowledge-AI systems integrated with Microsoft 365 and Azure — we have done it for ourselves too: discover how we built an LLM Wiki for our marketing team. If you want to understand how an LLM Wiki could work in your organisation, talk to one of our experts.

Frequently asked questions

What is an LLM Wiki? An LLM Wiki is a knowledge base made of markdown files that an AI agent builds and keeps up to date from an organisation’s raw documents. Unlike RAG, which searches the documents at every question, the LLM Wiki compiles the knowledge once during ingestion, creating summary pages that are linked to one another.

Who invented the LLM Wiki? The pattern was popularised by Andrej Karpathy, who on 4 April 2026 published an “idea file” on GitHub describing the three-layer architecture (raw sources, AI-generated wiki, schema file) and the three core operations: ingest, query and lint.

What is the difference between an LLM Wiki and RAG? RAG retrieves fragments from the documents at every query and rebuilds the context from scratch each time. The LLM Wiki, by contrast, synthesises and links the knowledge once during ingestion: the links, contradictions and summaries are already in place. The two techniques are often combined.

Is an LLM Wiki suitable for a business? Yes, especially as an internal knowledge base over documents that change at a moderate pace. In an enterprise setting it needs to be integrated with existing systems (SharePoint, Teams, Azure) and with security and permission controls: it is exactly the kind of project a partner like Dev4Side can build to fit your needs.

What technologies is an LLM Wiki built on in a business? Typically on versioned markdown files (even just in a Git repository), with an AI agent that maintains them. In a Microsoft context it integrates with SharePoint and Teams as sources, Azure OpenAI as the model and, where retrieval over large volumes is needed, Azure AI Search.

Written by Miro Radenovic · Modern AI Apps · Dev4Side

Dev4Side Software · Microsoft Gold Partner
