llms.txt: the complete guide for 2026 (with examples)
llms.txt is a plain-text file at /llms.txt that tells AI crawlers what to ingest first. Here's what the spec actually says, what it does and doesn't do, the format, and a working template you can ship today.
llms.txt is a proposal for a plain-text file served at /llms.txt that lists, in priority order, the URLs an AI assistant should ingest to understand your site. It was proposed by Jeremy Howard in September 2024 and has since been adopted by Anthropic, Mintlify, Cloudflare, Stripe, and a growing list of brands that care about how they appear inside AI answers.
This is the working guide we use at Citable when we ship llms.txt as part of a Technical SEO Sprint. It explains what the spec actually says, what the file does and does not do, the exact format, and a template you can copy.
What llms.txt is — and what it isn’t
llms.txt is a content map for AI consumption. It is a Markdown file that points to the URLs you want models to read first when summarizing your site, answering questions about your brand, or deciding whether to cite you. The format is intentionally minimal: H1 brand name, blockquote summary, sections of links with one-line descriptions, optional appendix.
It is not robots.txt. It does not block crawlers. It does not enforce policy. A model is not obligated to respect it — and as of mid-2026, no major AI provider has confirmed they read it during inference.
That last point is the most common objection: if no one promises to read it, why ship it? Three reasons.
- Anthropic publishes one for the Claude documentation. Mintlify built llms.txt support into their docs platform. Cloudflare ships one for their developer site. The brands closest to the model providers are betting it matters.
- It is a costless statement of editorial intent. Even if Claude or GPT do not parse `/llms.txt` directly, the file becomes part of your public site map. Web archives index it. Researchers training future models ingest it. The next generation of crawlers may default to checking it.
- It forces you to write the right summary. Most brands have never compressed their site into "here are the 12 URLs an AI should read to understand who we are and what we do." The discipline of writing llms.txt is itself the work.
The format, line by line
The spec calls for a specific structure. Here is the canonical shape, with annotations.
```text
# Citable

> Citable is a boutique GEO and SEO agency that measures and grows brand
> presence inside ChatGPT, Perplexity, Gemini, and Google AI Overviews.
> We work with bilingual EN/ES B2B brands across Europe, the UK, and the US.

This file lists priority URLs for AI ingestion. For the full text of the
priority pages, see /llms-full.txt.

## Services

- [GEO](https://citable.agency/services/geo): Generative Engine Optimization measurement and remediation.
- [Technical SEO](https://citable.agency/services/technical-seo): Schema, llms.txt, Core Web Vitals, crawler access.
- [Web Development](https://citable.agency/services/web-development): Bilingual Astro sites built for AI search from day one.

## Methodology

- [Methodology overview](https://citable.agency/methodology): Three-phase Measure → Repair → Compound process.
- [Pricing](https://citable.agency/pricing): Audit, sprint, and retainer ranges.

## Pillar content

- [What is GEO?](https://citable.agency/journal/what-is-geo-2026): Working definition for 2026.
- [llms.txt complete guide](https://citable.agency/journal/llms-txt-the-complete-guide): This file's source.

## Optional

- [About](https://citable.agency/about): Founder and team.
- [Audit](https://citable.agency/audit): Paid AI Visibility Audit (1,200 EUR).
```
A few rules the spec is strict about:
- Single H1, the brand name. No tagline.
- One blockquote as the site summary. One paragraph. Plain prose, no marketing copy.
- Sections are H2. Group by intent, not URL hierarchy.
- Each link is one bullet with the URL and a one-line description. The description is what the model reads to decide whether to fetch the URL.
- Optional content goes under an `## Optional` H2 at the end.
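The rules above are mechanical enough to lint. Here is a minimal sketch of such a check in Python; the spec itself is informal, so this is our reading of it rather than an official validator, and the function name and error strings are our own.

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems found in an llms.txt body.

    Checks the rules described above: a single H1, a blockquote summary,
    H2 sections, and a one-line description after every link.
    """
    problems = []
    lines = text.splitlines()

    h1s = [l for l in lines if l.startswith("# ")]
    if len(h1s) != 1:
        problems.append(f"expected exactly one H1, found {len(h1s)}")
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing blockquote summary")
    if not any(l.startswith("## ") for l in lines):
        problems.append("no H2 sections")

    # Every link bullet should carry a description after the URL --
    # that description is what the model reads to decide whether to fetch.
    for l in lines:
        m = re.match(r"- \[.+?\]\((\S+?)\)(.*)", l)
        if m and not m.group(2).lstrip().startswith(":"):
            problems.append(f"link without description: {m.group(1)}")
    return problems
```

Run it against your file before every deploy; an empty list means the structure passes.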
There is also a sibling spec, llms-full.txt, which contains the full text of the priority pages concatenated into a single file. Useful for models that can ingest a full corpus in one fetch. Most brands ship both.
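Building llms-full.txt can be automated from llms.txt itself: parse out the link bullets, fetch each page, and concatenate. The sketch below assumes the pages are available as plain text; real HTML pages would need an HTML-to-text step first, so the `fetch` callable is injectable.

```python
import re
from urllib.request import urlopen

# Matches the "- [Title](https://...)" bullets of an llms.txt file.
LINK = re.compile(r"- \[(.+?)\]\((https?://\S+?)\)")

def build_llms_full(llms_txt: str,
                    fetch=lambda url: urlopen(url).read().decode()) -> str:
    """Concatenate the pages listed in an llms.txt body into llms-full.txt.

    Sketch only: `fetch` defaults to a raw HTTP GET and can be swapped for
    a function that returns pre-rendered plain text for each URL.
    """
    parts = []
    for title, url in LINK.findall(llms_txt):
        parts.append(f"## {title}\n\n{fetch(url).strip()}\n")
    return "\n".join(parts)
```

The output preserves the priority order of the source file, which is the point: one fetch gives a model the corpus in the order you chose.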
What signals AI models actually use
Even brands that ship llms.txt should know that it is not the only mechanism — and probably not the dominant one — by which AI models learn about a site. The signals that compound, in our measurement at Citable, are:
- Schema markup. `Organization`, `Person`, `Service`, `Article`, `FAQPage`. These are parsed reliably by Google AI Overviews and are visible in Perplexity's search results pages.
- Citations from high-authority third-party sources. When TechCrunch, The Verge, or a category-specific publication writes about your brand, that text becomes part of the corpus the model retrieves at inference time. This is digital PR with a new mandate.
- Extractable on-page structure. Clear H1/H2 hierarchy, definitional sentences (“X is the discipline of…”), short paragraphs. Models extract sentences, not pages.
- Crawler access. ChatGPT-User, PerplexityBot, ClaudeBot must be allowed in robots.txt. Many sites block them by default and never check.
- llms.txt and llms-full.txt. As editorial intent and as a corpus shortcut.
llms.txt is signal #5. It compounds with the others. Shipping only llms.txt without the rest is theatre.
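The crawler-access point is the easiest to verify and the most often missed. Python's standard library can check a robots.txt body against the AI user agents directly; the agent list below is our working set, not an exhaustive one.

```python
from urllib.robotparser import RobotFileParser

# The AI crawlers we check for by default; extend as new agents appear.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

def blocked_crawlers(robots_txt: str,
                     url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt body would block
    from fetching the given URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not rp.can_fetch(ua, url)]
```

An empty result means all four agents can fetch; anything else names the agents your robots.txt is turning away.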
Common mistakes we see
After auditing dozens of AI-search-aware sites, the same mistakes keep showing up.
- Listing every URL. llms.txt is curation, not a sitemap. If you list 200 URLs, the model has no signal about what matters. Ten well-chosen URLs beat two hundred.
- Marketing voice in the blockquote. “Industry-leading solutions for…” is noise. Write plainly. “Citable is a boutique GEO agency for B2B brands in Europe, the UK, and the US.” Done.
- Missing the description after the link. A bare URL teaches the model nothing about what is at that URL. Always include a one-line description.
- Hosting at the wrong path. It must be `/llms.txt`, served at the root, with `Content-Type: text/plain`. Not `/static/llms.txt`. Not behind a redirect.
- Forgetting about updates. A stale llms.txt linking to deleted pages is worse than none. Quarterly review goes in the calendar.
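The quarterly review can be partly automated: extract every URL from the file and confirm each still answers 200. A sketch, with the HTTP check injectable so the link-extraction logic can be tested offline; `head_status` and `stale_links` are our own helper names.

```python
import re
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Any absolute URL inside a markdown link's parentheses.
URL = re.compile(r"\((https?://[^)\s]+)\)")

def head_status(url: str, timeout: float = 10.0) -> int:
    """HEAD-request a URL and return its HTTP status (0 on network error)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code
    except (URLError, OSError):
        return 0

def stale_links(llms_txt: str, status=head_status) -> list[str]:
    """Return the linked URLs that no longer answer 200."""
    return [u for u in URL.findall(llms_txt) if status(u) != 200]
```

Anything this returns should be rotated out of the file, not left to teach models about pages that no longer exist.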
Should you ship llms.txt?
If you are a B2B brand whose buyers research with AI, yes. The file is small, the cost is one engineer-afternoon, the worst case is no signal lift, and the best case is faster, more accurate citation by the next generation of models. The downside risk is essentially zero.
If you are an ecommerce brand selling to consumers, the calculus is different — consumers rarely use Perplexity to choose between two pairs of running shoes, and your AI search exposure today is mostly Google AI Overviews, which leans on schema and rich snippets more than llms.txt. Ship it anyway, but put schema and Merchant Center earlier in the priority list.
The decision is rarely about whether to ship llms.txt. It is about whether the rest of your AI search stack is in order, and whether llms.txt is the next-best dollar to spend or the third-best.
Working template
Copy this, replace the placeholders, save it as `public/llms.txt` (or wherever your build serves the site root from), and redeploy. Verify that https://yourdomain.com/llms.txt returns HTTP 200 with `Content-Type: text/plain`.
```text
# [Brand Name]

> [One paragraph describing what the company does, who it serves, and where.
> Plain prose, no marketing voice.]

## [Section 1 — usually Services or Products]

- [Page Title](https://example.com/page-1): One-line description of what this page covers.
- [Page Title](https://example.com/page-2): One-line description.

## [Section 2 — usually Methodology, Pricing, or Documentation]

- [Page Title](https://example.com/page-3): One-line description.

## Optional

- [About](https://example.com/about): Founder and team background.
```
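The post-deploy verification can be a one-liner in CI. A sketch: the status-and-header check is factored out so it can be tested without a network, and `verify_llms_txt` is our own helper name. Note that `urlopen` follows redirects silently, so a redirected llms.txt would still pass this check; use `curl -I` if you need to catch that case.

```python
from urllib.request import urlopen

def check_response(status: int, content_type: str) -> bool:
    """True when a response satisfies the spec: HTTP 200 and text/plain
    (ignoring any charset parameter on the Content-Type)."""
    return status == 200 and content_type.split(";")[0].strip() == "text/plain"

def verify_llms_txt(origin: str) -> bool:
    """Fetch {origin}/llms.txt and check it against the spec's requirements.

    `origin` is your scheme + host, e.g. "https://yourdomain.com".
    """
    with urlopen(origin.rstrip("/") + "/llms.txt", timeout=10) as resp:
        return check_response(resp.status, resp.headers.get("Content-Type", ""))
```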
If you are a Citable client, this ships as part of every Technical SEO Sprint. If you are not, this guide and the checklist below are everything you need to do it yourself.
Frequently asked
Questions buyers ask before booking
Do AI models actually read llms.txt?
No major AI provider has confirmed they parse /llms.txt during inference as of mid-2026. But the brands closest to model providers — Anthropic, Mintlify, Cloudflare — ship one anyway. The cost is low and the file becomes part of your public site map for archives, researchers, and the next generation of crawlers.
Where should I host llms.txt?
At your site root, served at https://yourdomain.com/llms.txt with HTTP 200 and Content-Type text/plain. Not under /static/, not behind a redirect. The spec is strict about path and Content-Type.
How is llms.txt different from robots.txt?
robots.txt is a policy file telling crawlers what they may or may not access. llms.txt is an editorial curation pointing AI consumers at the most important content. They are complementary — and you should cross-reference llms.txt from robots.txt with a comment line.
Should an ecommerce site ship llms.txt?
Yes, but it is lower priority than for B2B. Consumer ecommerce buyers rarely use Perplexity to choose between products, so AI search exposure is mostly Google AI Overviews — which leans on schema and Merchant Center more than llms.txt. Ship it, but put schema and feed quality earlier in the priority list.
10 minutes to ship, but only if every line is right
llms.txt shipping checklist
- File served at https://yourdomain.com/llms.txt with HTTP 200 and Content-Type text/plain
- Single H1 with the canonical brand name on line 1
- Blockquote summary describing what the site is, in one paragraph
- Sections grouped by intent (Docs, Pricing, Methodology, etc.) — not by URL structure
- Each link uses absolute URLs and includes a one-line description
- Optional sections clearly marked under an `## Optional` heading
- Cross-linked from robots.txt as `# llms.txt: https://yourdomain.com/llms.txt`
- JSON-LD `Organization` and `WebSite` schema present and consistent with llms.txt
- Site has a separate llms-full.txt with the full text of priority pages
- Quarterly review in calendar — rotate stale links out, add new pillar content