SOSEI

Sitemap.xml vs llms.txt: Do You Need Both?

Both files live at the root of your domain, and both are about discoverability. So a reasonable question comes up a lot: do you need both, or does one make the other redundant? Short answer: keep both. They are not redundant; they target different consumers and they fail in different ways. Here is why.

Different consumers, different formats

| | sitemap.xml | llms.txt |
|---|---|---|
| Consumer | Search engine crawlers (Google, Bing) | LLM ingestion pipelines (ChatGPT, Claude, Perplexity) |
| Format | Machine-readable XML, schema-rigid | Human-readable Markdown, loosely structured |
| Scope | Every indexable URL on the site | The handful of URLs that summarize what the site is |
| Status | Documented protocol; submitted through Google Search Console and Bing Webmaster Tools | De facto convention with no formal standard; expected by some AI tooling |
| Origin | Google Sitemaps, 2005; sitemaps.org protocol, 2006 | Jeremy Howard / Answer.AI proposal, 2024 |

Sitemap.xml: the URL inventory

The sitemap protocol exists so that a crawler can discover every URL on your site without depending on link-graph traversal — useful for deep pages, freshly published content, and sites with weak internal linking. A typical entry:

<url>
  <loc>https://example.com/pricing</loc>
  <lastmod>2026-04-29</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.9</priority>
</url>

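Google reads sitemap.xml as a hint about which URLs to crawl, not a guarantee of indexing, and its documentation says it ignores the changefreq and priority values; lastmod is the field worth keeping accurate. Bing reads sitemaps too, but neither engine promises to fetch every listed URL on a fixed schedule. Either way, a missing or stale sitemap usually means slower discovery of new content.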
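To make the "regenerate it, do not hand-edit it" idea concrete, here is a minimal sketch of building a sitemap from a list of pages with Python's standard library. The `build_sitemap` helper and the `PAGES` list are illustrative assumptions, not part of any particular site generator; it emits only loc and lastmod.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (absolute_url, last_modified) pairs.

    Hypothetical helper for illustration; a real build would pull the pairs
    from the site's router or content directory.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example data (assumed URLs and dates).
PAGES = [
    ("https://example.com/pricing", date(2026, 4, 29)),
    ("https://example.com/how-it-works", date(2026, 4, 2)),
]
sitemap_xml = build_sitemap(PAGES)
```

Writing `sitemap_xml` to `sitemap.xml` as a deploy step keeps the file in step with the pages that actually exist.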

llms.txt: the editorial index

Where the sitemap is mechanical, llms.txt is editorial. You list the URLs an LLM should read first to understand what your site is. A skeleton:

# Site Name

> One-sentence description. The model reads this first.

## Key pages

- [Pricing](https://example.com/pricing) — plans and what each includes
- [How it works](https://example.com/how-it-works) — 6-step explanation

Full anatomy and common mistakes: llms.txt explained.
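The skeleton above is simple enough to render from data, which keeps the editorial list in one reviewable place. A minimal sketch, assuming a hypothetical `Page` record and `build_llms_txt` helper; it uses the `: ` separator after each link, as in the llms.txt proposal, rather than the dash in the skeleton.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One hand-picked entry for llms.txt (illustrative structure)."""
    title: str
    url: str
    note: str


def build_llms_txt(site_name, summary, pages, section="Key pages"):
    """Render an llms.txt body: H1 title, blockquote summary, one link list."""
    lines = [f"# {site_name}", "", f"> {summary}", "", f"## {section}", ""]
    lines += [f"- [{p.title}]({p.url}): {p.note}" for p in pages]
    return "\n".join(lines) + "\n"


# Example data mirroring the skeleton (assumed values).
llms_txt = build_llms_txt(
    "Site Name",
    "One-sentence description. The model reads this first.",
    [
        Page("Pricing", "https://example.com/pricing", "plans and what each includes"),
        Page("How it works", "https://example.com/how-it-works", "6-step explanation"),
    ],
)
```

The list of pages stays a human decision; the code only guarantees consistent formatting.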

Why one does not replace the other

It is tempting to think “the LLM can read sitemap.xml, so I do not need llms.txt.” In practice, two things break that:

  1. A sitemap gives an LLM very little to work with. Models can parse the XML, but the structure is information-poor: every URL is equal, with no hierarchy and no editorial signal. llms.txt tells the model which pages matter. Sitemap is an inventory; llms.txt is a recommendation.
  2. Search engines are not known to rely on llms.txt. Googlebot reads sitemaps as a documented part of the protocol; it does not reliably follow llms.txt links. Drop sitemap.xml and new or deep pages are likely to take longer to get indexed.

The third file: llms-full.txt

Increasingly common alongside the index: llms-full.txt contains the entire marketing content of the site concatenated as markdown. Useful when an LLM agent wants to ingest the whole pitch in one fetch rather than crawling page-by-page. It is the equivalent of handing the model a printed brochure instead of a table of contents.
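One way to produce such a file is to concatenate each page's Markdown source under its own heading. The sketch below assumes the content is already available as (title, url, markdown) tuples; `build_llms_full` is a hypothetical helper, and the "Source:" line is an illustrative choice rather than anything a specification requires.

```python
def build_llms_full(site_name, documents):
    """Concatenate page bodies into one Markdown document.

    documents: list of (title, url, markdown_body) tuples in reading order.
    """
    parts = [f"# {site_name}"]
    for title, url, body in documents:
        # Each page becomes a section that records where it came from.
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


# Example data (assumed content).
llms_full = build_llms_full(
    "Site Name",
    [
        ("Pricing", "https://example.com/pricing", "Three plans, billed monthly.\n"),
        ("How it works", "https://example.com/how-it-works", "Six steps from audit to launch.\n"),
    ],
)
```

Feeding it the same Markdown that the rendered pages are built from is what keeps the two from drifting apart.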

What to ship in 2026

  • sitemap.xml: auto-generated, every indexable URL, regenerated on deploy. Submitted to GSC and Bing Webmaster.
  • robots.txt: references the sitemap, names AI crawlers explicitly.
  • llms.txt: navigational, under 60 lines, kept editorial.
  • llms-full.txt: full content corpus, generated from the same source-of-truth as the rendered pages so it cannot drift.
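Two items on this checklist are easy to automate at deploy time: emitting a robots.txt that points at the sitemap and names AI crawlers, and checking that llms.txt stays short. A minimal sketch follows; the helper names are hypothetical, and the user-agent tokens listed (GPTBot, ClaudeBot, PerplexityBot) are examples that should be checked against each vendor's current documentation.

```python
# Example AI crawler tokens; verify against vendor docs before relying on them.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def build_robots_txt(sitemap_url, ai_crawlers=AI_CRAWLERS):
    """Return a permissive robots.txt that names AI crawlers and the sitemap."""
    groups = ["User-agent: *\nAllow: /"]
    groups += [f"User-agent: {bot}\nAllow: /" for bot in ai_crawlers]
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


def llms_txt_within_budget(text, max_lines=60):
    """True when llms.txt has fewer than max_lines lines (the 'under 60' rule)."""
    return len(text.splitlines()) < max_lines


robots_txt = build_robots_txt("https://example.com/sitemap.xml")
```

Failing the build when `llms_txt_within_budget` returns False is one way to keep the index editorial rather than letting it grow into a second sitemap.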

Every SOSEI rebuild ships all four, regenerated on every deploy. Get started with SOSEI to see what your current site is missing.
