**An XML sitemap is the map you hand crawlers: a list of the URLs you want found and when they last changed.** This check confirms your sitemap is reachable (a `200` at a common location like `/sitemap.xml`) and that it's well-formed XML with a valid `<urlset>` or `<sitemapindex>`. A missing or malformed sitemap slows how quickly engines discover your new and updated pages.
This check looks for your XML sitemap and confirms that crawlers can use it. Specifically:
- Reachable: the sitemap returns a `200` at a common location (such as `/sitemap.xml` or a sitemap index file).
- Well-formed XML: it parses as valid XML with a proper `<urlset>` (a list of URLs) or `<sitemapindex>` (a list of sitemaps).
- Sensible contents: the URLs it lists should be canonical, indexable pages that return `200`, not redirects, errors or noindex URLs.
A reachable, valid sitemap passes; one that exists but is malformed or partial is a warning; no sitemap found at common locations is a fail.
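The pass, warning and fail rule above can be sketched in a few lines of Python. This is a minimal illustration, not GEObubbly's actual implementation: the function name `classify_sitemap`, the choice to treat any non-`200` status as "not found", and the reading of "partial" as "no `<loc>` entries" are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VALID_ROOTS = {"urlset", "sitemapindex"}


def local_name(tag: str) -> str:
    """Strip ElementTree's '{namespace}' prefix, leaving the bare tag name."""
    return tag.rsplit("}", 1)[-1]


def classify_sitemap(status_code: Optional[int], body: str) -> str:
    """Return 'pass', 'warning' or 'fail' for one fetched sitemap candidate.

    status_code is None when nothing responded at any common location.
    """
    # Assumption: anything other than a 200 counts as "no sitemap found".
    if status_code != 200:
        return "fail"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "warning"  # the file exists but is not well-formed XML
    if local_name(root.tag) not in VALID_ROOTS:
        return "warning"  # well-formed XML, but not a sitemap structure
    # Assumption: a sitemap with no non-empty <loc> entries counts as partial.
    has_loc = any(
        local_name(el.tag) == "loc" and (el.text or "").strip()
        for el in root.iter()
    )
    return "pass" if has_loc else "warning"
```

Calling `classify_sitemap(200, body)` with the response body of a fetched sitemap returns one of the three outcomes; checking whether the listed URLs are themselves canonical and indexable would need a further crawl and is left out here.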
GEObubbly fetches the sitemap at common paths (and via the `Sitemap:` line in `robots.txt`), confirms a `200` response, and parses it to check for a valid `<urlset>` or `<sitemapindex>` structure. It's a core check worth 2 points in Crawlability & Indexability, a lower weight than the indexing signals because a sitemap aids discovery rather than gating it.
A sitemap doesn't force indexing, but it speeds up discovery: it tells crawlers which URLs you consider important and when they last changed, so new and updated pages get found and re-crawled sooner. That matters most for large sites, fresh content, and pages with few internal links that crawlers might otherwise miss. A malformed sitemap can undo that benefit, because errors can cause engines to ignore it or trust it less, and a sitemap full of redirects, errors or noindex URLs sends mixed signals about which pages you actually want indexed. For GEO, faster discovery means your freshest content reaches AI engines sooner, and freshness is something answer engines weight heavily. Keep the sitemap valid, limited to canonical indexable URLs, and declared in `robots.txt` so it's easy to find. It complements the rest of the Crawlability & Indexability signals.
An XML sitemap is a machine-readable file (usually at /sitemap.xml) that lists the URLs on your site you want search engines and AI crawlers to find, often with each page's last-modified date. Its purpose is discovery: it tells crawlers which pages exist and which are important, so they're found and re-crawled faster. It's especially valuable for large sites, brand-new pages, and pages that aren't well linked internally and might otherwise be missed.
A sitemap is not strictly required: search engines can and do index pages without one, primarily by following links. A sitemap is a strong aid to discovery rather than a requirement, which is why a missing one is a low-weight issue rather than something that blocks indexing. That said, a valid sitemap materially speeds up how quickly new and updated pages are found, and it's close to essential for large sites or pages with weak internal linking. There's little downside to having a clean one.
Place it at a predictable location such as yourdomain.com/sitemap.xml (or use a sitemap index that points to several sitemaps), and declare it with a `Sitemap:` line in your `robots.txt`, giving the sitemap's full URL, so crawlers discover it automatically. You can also submit it directly in search-engine tools like Google Search Console and Bing Webmaster Tools. Keeping it referenced in `robots.txt` is the simplest way to ensure both search and AI crawlers can find it without you submitting it anywhere.
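To confirm that a `robots.txt` file really declares a sitemap, Python's standard-library `urllib.robotparser` can read the `Sitemap:` lines (its `site_maps()` method is available from Python 3.8). The sketch below parses text you already have rather than fetching anything; the helper name `sitemap_urls_from_robots` and the `example.com` URLs are made up for illustration.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemap_urls_from_robots(robots_txt: str) -> List[str]:
    """Return every URL declared on a 'Sitemap:' line, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


robots = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""
print(sitemap_urls_from_robots(robots))
```

An empty list means crawlers have to fall back on guessing common paths or on a manual submission.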
Include only the URLs you want indexed: canonical, indexable pages that return a clean `200`. Leave out redirects, error pages, noindex URLs, duplicate parameterised versions and pages blocked by `robots.txt`, because listing those sends mixed signals about which pages matter. Keep last-modified dates accurate so crawlers know what's changed. For very large sites, split URLs across multiple sitemaps referenced by a sitemap index, keeping each file within the limits of 50,000 URLs and 50 MB uncompressed.
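As a rough sketch of those rules, the Python below filters a list of page records and writes one `<urlset>` per chunk of at most 50,000 URLs. The record shape (`url`, `status`, `noindex`, `canonical`, `lastmod` keys) and the function names are assumptions for the example; a real site would pull these from its CMS or crawl data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the sitemap protocol


def is_listable(page: Dict) -> bool:
    """Keep only canonical, indexable pages that return 200."""
    return (
        page["status"] == 200
        and not page.get("noindex", False)
        # A page with no canonical recorded is assumed to be its own canonical.
        and page.get("canonical", page["url"]) == page["url"]
    )


def build_urlsets(pages: Iterable[Dict], max_urls: int = MAX_URLS_PER_FILE) -> List[str]:
    """Return one <urlset> XML string per chunk of at most max_urls pages."""
    keep = [p for p in pages if is_listable(p)]
    documents = []
    for start in range(0, len(keep), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for page in keep[start:start + max_urls]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = page["url"]
            if page.get("lastmod"):
                ET.SubElement(url, "lastmod").text = page["lastmod"]
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents
```

The sketch only enforces the URL count. It does not measure file size against the 50 MB limit, add an XML declaration, or write the sitemap index that would reference each generated file.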
Sitemaps primarily help traditional search crawlers discover and re-crawl pages quickly, and that faster discovery benefits AI visibility indirectly: the sooner your fresh, updated content is crawled and indexed, the sooner AI answer engines, which weight freshness heavily, can surface and cite it. AI crawlers rely heavily on links and their own discovery, but a clean sitemap that surfaces new content fast still supports the freshness signals that matter for GEO as well as classic search.