By Meiko Neuman, Founder | April 29, 2026 | 5 min read | GEO, llms.txt, AI Discoverability

llms.txt Explained: What AI Crawlers Actually Read on Your Site

A markdown file at the root of your site decides whether ChatGPT, Claude, and Perplexity summarize you correctly. Here's what llms.txt is, what to put in it, and the common mistakes that make it useless.

A single markdown file at https://yourdomain.com/llms.txt shapes how accurately AI engines describe your business when they cite it. The file is small, the convention is loose, and most sites still don’t have one, which gives the few that do an edge when AI tools pick sources to cite.

What llms.txt is

llms.txt is a plain-text markdown file at the root of your site, modeled loosely after robots.txt and the way a developer-friendly README is structured. Its job is to give a large language model a fast, deterministic way to understand:

What your site is about, in one sentence
The handful of URLs that are actually worth reading
How to contact the people behind it

It is not a replacement for sitemap.xml. Sitemap is for search engine crawlers and lists every indexable URL. llms.txt is for LLMs and lists the few pages that compress well into context. We compare them in detail in Sitemap.xml vs llms.txt — do you need both? (yes).

Where it came from

The convention was proposed by Jeremy Howard (fast.ai, Answer.AI) in September 2024 and adopted within months by Anthropic, Vercel, Mintlify, Cloudflare, and a long tail of SaaS docs sites. There is no IETF RFC and no W3C blessing — it is a de facto standard that LLM tooling now expects to find.

The anatomy of a good llms.txt

The format is loose, but most published examples settle on the four-part pattern below: a title with a one-sentence summary, a capabilities list, key pages, and contact details.

# Site Name

> One-sentence description. Be specific. The model reads this first.

## What this site does

- Bullet list of 3-6 items.
- Concrete capabilities, not marketing adjectives.

## Key pages

- [Page name](https://example.com/page) — short note about what's there.
- [Another page](https://example.com/other)

## Contact

- Email: [email protected]
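If the facts about your site already live in code, the template above can be rendered rather than hand-written. The following is a minimal sketch, not a reference implementation: the `Page` and `SiteSummary` types, the `render_llms_txt` function, and the Example Co data are all hypothetical names chosen for illustration. It separates a link from its note with a colon, which is one common choice of separator.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class SiteSummary:
    name: str
    description: str         # the one-sentence summary rendered as a blockquote
    capabilities: list[str]  # 3-6 concrete bullets
    pages: list[Page]
    email: str


def render_llms_txt(site: SiteSummary) -> str:
    """Render the four-part layout (title, capabilities, key pages, contact)."""
    lines = [f"# {site.name}", "", f"> {site.description}", ""]
    lines += ["## What this site does", ""]
    lines += [f"- {item}" for item in site.capabilities]
    lines += ["", "## Key pages", ""]
    for page in site.pages:
        link = f"- [{page.title}]({page.url})"
        lines.append(f"{link}: {page.note}" if page.note else link)
    lines += ["", "## Contact", "", f"- Email: {site.email}"]
    return "\n".join(lines) + "\n"


# Hypothetical example data, for illustration only.
example = SiteSummary(
    name="Example Co",
    description="Example Co sells refurbished espresso machines in the EU.",
    capabilities=["Sells refurbished machines", "Ships spare parts"],
    pages=[
        Page("Pricing", "https://example.com/pricing", "Current plans and prices."),
        Page("Docs", "https://example.com/docs"),
    ],
    email="hello@example.com",
)

if __name__ == "__main__":
    print(render_llms_txt(example), end="")
```

Keeping the content in a data structure means the same values can feed both the human-facing pages and this file.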

The mistakes that make it useless

1. Marketing fluff at the top

“We are a forward-thinking team passionate about empowering businesses to…” is the kind of opener that wastes the only sentence the LLM is guaranteed to read. Lead with what you are and what you do. Specific nouns and verbs only.

2. A 200-line file

The whole point is that an LLM can read it in one shot. If your llms.txt is too long to take in with a single fetch and a single pass, you have rebuilt sitemap.xml in markdown. Keep it under 60 lines. Link out to llms-full.txt for full content.
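A length limit is easy to enforce automatically. Here is a small sketch of a budget check you could run in CI; the function name `line_budget_report` is made up for this example, and it assumes blank lines count toward the 60-line limit, which the guideline above does not specify.

```python
def line_budget_report(text: str, max_lines: int = 60) -> tuple[bool, int]:
    """Return (within_budget, line_count).

    Assumption: every line counts, including blank ones, and the file
    must stay strictly under max_lines.
    """
    line_count = len(text.splitlines())
    return line_count < max_lines, line_count


if __name__ == "__main__":
    # Assumes the script runs from the directory that holds llms.txt.
    with open("llms.txt", encoding="utf-8") as handle:
        ok, count = line_budget_report(handle.read())
    print(f"llms.txt has {count} lines")
    raise SystemExit(0 if ok else 1)
```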

3. Linking to internal-only or paywalled URLs

Crawlers and agents follow the links. If they hit a 404 or a login wall, they get nothing to read and may discount the rest of the file. Every URL listed should be publicly readable.
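One way to catch dead or gated links before a crawler does is to request every listed URL anonymously and flag anything that does not come back as 200. The sketch below uses only the Python standard library; `extract_urls`, `http_status`, and `find_unreadable` are illustrative names, and the user-agent string is arbitrary. Note the limitation: a login wall that answers 200 with a sign-in page will pass this check, so treat it as a first filter rather than proof.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Matches markdown links of the form [text](http://... or https://...).
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_urls(markdown: str) -> list[str]:
    """Return link targets in the order they appear in the file."""
    return LINK_RE.findall(markdown)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Anonymous GET; returns the final status code, or 0 on network failure."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "llms-txt-link-check"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses land here
    except OSError:
        return 0  # DNS failure, refused connection, timeout


def find_unreadable(
    markdown: str, fetch: Callable[[str], int] = http_status
) -> dict[str, int]:
    """Map each listed URL that did not return 200 to the status it returned."""
    problems = {}
    for url in extract_urls(markdown):
        status = fetch(url)
        if status != 200:
            problems[url] = status
    return problems
```

The `fetch` parameter exists so the check can be exercised with a stub instead of real network calls.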

4. Forgetting it exists

Out-of-date llms.txt files are worse than missing ones. If your pricing changed in March and your llms.txt still references the old plans, the LLM will confidently quote stale numbers. Tie regeneration of the file to your deploy pipeline.
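A deploy pipeline can guard against this by regenerating the file and comparing it with what is about to ship. The sketch below assumes you already have some way to produce the expected text from current site data; the function name `llms_txt_drift` and the idea of failing the build on any difference are illustrative choices, not part of the llms.txt convention.

```python
import difflib
from pathlib import Path


def llms_txt_drift(expected: str, published_path: Path) -> list[str]:
    """Return unified-diff lines between the published file and fresh output.

    An empty list means the published file is current. A missing file is
    treated as empty, so every expected line shows up as an addition.
    """
    if published_path.exists():
        published = published_path.read_text(encoding="utf-8")
    else:
        published = ""
    return list(
        difflib.unified_diff(
            published.splitlines(),
            expected.splitlines(),
            fromfile=str(published_path),
            tofile="regenerated",
            lineterm="",
        )
    )
```

In a CI step, printing the returned lines and exiting non-zero when the list is non-empty is enough to block a deploy that would publish stale plans or prices.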

llms-full.txt: the bigger sibling

Where llms.txt is a navigation index, llms-full.txt is the full corpus of marketing content, concatenated as plain markdown. Useful when an AI agent wants to ingest your entire pitch in one fetch rather than crawling page-by-page. We generate it dynamically from the same data modules our pages render, so the LLM-readable copy can never drift from what humans see. You can read ours here.
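The concatenation step itself can be very small. This is a rough sketch of the idea, not our production generator: `PageContent` and `build_llms_full` are hypothetical names, and the "Source:" line under each heading is an assumption about what a reader of the file would find useful.

```python
from typing import NamedTuple


class PageContent(NamedTuple):
    title: str
    url: str
    markdown: str  # page body, already converted to markdown


def build_llms_full(site_name: str, pages: list[PageContent]) -> str:
    """Concatenate page bodies into one markdown document, in the order given."""
    parts = [f"# {site_name}"]
    for page in pages:
        parts.append(
            f"## {page.title}\n\nSource: {page.url}\n\n{page.markdown.strip()}"
        )
    return "\n\n".join(parts) + "\n"
```

Feeding this function from the same records that render the HTML pages is what keeps the two copies in step.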

How to verify it’s working

curl -I https://yourdomain.com/llms.txt should return 200 with content-type: text/plain or text/markdown.
Ask Perplexity or ChatGPT a question whose answer is in your llms.txt. Compare the cited summary to your file.
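Check your server logs for hits from GPTBot, ClaudeBot, and PerplexityBot; they should be requesting /llms.txt. Don't expect to see Google-Extended there: it is a robots.txt control token, and Google crawls with its existing user agents.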
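The log check can be scripted. The sketch below assumes access logs in the common "combined" format used by nginx and Apache, where the request line and status appear as `"GET /path HTTP/1.1" 200` and the user agent is somewhere on the same line. The function name `count_llms_txt_hits` is illustrative, and Google-Extended is left out on the understanding that it is a robots.txt token rather than a string that appears in request logs.

```python
import re
from collections import Counter
from typing import Iterable

# Substrings to look for in the user-agent field.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Request line and status code as they appear in combined-format logs.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def count_llms_txt_hits(
    log_lines: Iterable[str], path: str = "/llms.txt"
) -> Counter:
    """Count requests for `path` per (crawler, status code) pair."""
    hits: Counter = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        requested = match.group(1).split("?")[0]  # ignore query strings
        if requested != path:
            continue
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[(bot, int(match.group(2)))] += 1
    return hits
```

Calling `count_llms_txt_hits(open("access.log", encoding="utf-8"))` returns a counter keyed by crawler and status, so a row of 404s stands out as clearly as no hits at all.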

Every site SOSEI rebuilds ships llms.txt and llms-full.txt by default, both regenerated on every deploy from the live site content. Want to see how your current site scores on AI discoverability? Get started with SOSEI — the GEO category covers llms.txt presence, format, and freshness directly.