A1 · Crawlability & Indexability

Indexability Checker — is your page eligible to be indexed?

**The indexability verdict is a single yes/no answer to one question: is this page allowed to be indexed by search engines and read by AI crawlers?** GEObubbly produces it by combining five independent signals — the HTTP status code, `robots.txt`, the meta robots tag, the `X-Robots-Tag` HTTP header, and the canonical tag — into one verdict. If any of them blocks indexing, the page won't appear in search results or be cited in AI answers, no matter how good its content is.

What does the indexability verdict check?

It answers one question with a single verdict: is this page eligible to be indexed and shown? Rather than testing one signal at a time, it rolls up five independent indexing signals, because each can block a page on its own:

- HTTP status — the page must return a genuine 200 OK, not a 4xx, 5xx or a soft-404.

- robots.txt — no Disallow rule may block the URL or its critical resources.

- Meta robots tag — no noindex in the page's <meta name="robots">.

- X-Robots-Tag header — no noindex in the HTTP response header (invisible in the page source).

- Canonical tag — the canonical must point to this page or the correct primary version, not a blocked or redirecting URL.

If all five are clear, the page is indexable. If any one blocks indexing, the verdict fails.
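The roll-up described above can be sketched as a simple all-or-nothing check. This is a minimal illustration, not GEObubbly's implementation: the `Signals` class, its field names and the `blocking_signals` helper are hypothetical, and each field is assumed to have been computed already by a separate check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical container: True means the signal does not block indexing."""
    http_status: bool   # genuine 200, not a 4xx, 5xx or soft-404
    robots_txt: bool    # no Disallow rule covers the URL
    meta_robots: bool   # no noindex in <meta name="robots">
    x_robots_tag: bool  # no noindex in the X-Robots-Tag response header
    canonical: bool     # canonical points to this page or the primary version


def blocking_signals(signals: Signals) -> list[str]:
    """Return the names of the signals that block indexing, in field order."""
    return [name for name, clear in vars(signals).items() if not clear]


def is_indexable(signals: Signals) -> bool:
    """A page is indexable only if all five signals are clear."""
    return not blocking_signals(signals)


# Example: everything is clear except a noindex in the response header.
page = Signals(True, True, True, False, True)
print(is_indexable(page))      # False
print(blocking_signals(page))  # ['x_robots_tag']
```

Returning the list of blocking signals, rather than only a boolean, makes it easy to see which of the five needs fixing.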

How is it evaluated, and how is it scored?

GEObubbly fetches the page and its robots.txt, reads the response headers and the rendered <head>, follows the canonical, and combines the five signals into one verdict. It is a core check that runs automatically and carries a weight of 5 points, the highest in the Crawlability & Indexability category, because nothing else you do matters if the page can't be indexed in the first place.
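To make the "response headers and <head>" step concrete, here is a rough sketch using only the Python standard library. It assumes the headers and HTML have already been fetched, it parses static HTML only (no JavaScript rendering), and the names `HeadSignals`, `header_directives` and `has_noindex` are invented for this example.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects meta robots directives and the first canonical href."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.meta_robots |= {d.strip().lower() for d in content.split(",") if d.strip()}
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = attr.get("href")


def header_directives(headers: dict[str, str]) -> set[str]:
    """Directives from X-Robots-Tag; a bot prefix such as 'googlebot:' is dropped."""
    value = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    return {p.rsplit(":", 1)[-1].strip().lower() for p in value.split(",") if p.strip()}


def has_noindex(directives: set[str]) -> bool:
    # 'none' is shorthand for 'noindex, nofollow'.
    return bool(directives & {"noindex", "none"})


# Example: the HTML looks fine, but the header carries a noindex.
html = (
    '<html><head><meta name="robots" content="index, follow">'
    '<link rel="canonical" href="https://example.com/page"></head></html>'
)
headers = {"X-Robots-Tag": "googlebot: noindex"}

parser = HeadSignals()
parser.feed(html)
print(has_noindex(parser.meta_robots))           # False
print(has_noindex(header_directives(headers)))   # True
print(parser.canonical)                          # https://example.com/page
```

The example shows why the header has to be read separately: nothing in the page source hints at the noindex.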

Why indexability matters for SEO and GEO

Indexability is upstream of everything. A page that can't be indexed scores zero where it counts: it won't rank in search, and it won't be available for AI answer engines like ChatGPT, Perplexity and Google AI Overviews to read and cite. The most common cause is an accidental noindex — left over from a staging environment, applied by a CMS setting, or hidden in an X-Robots-Tag header nobody checks. Because the failure is silent, it can suppress a page for months before anyone notices. Getting the indexability verdict to pass is the first, non-negotiable step toward both search rankings and AI citations — from there, the rest of the Crawlability & Indexability checks and the heavier GEO / LLM Readiness signals build on a page that can actually be seen.

How this check scores

Pass: every signal clear — the page is eligible for indexing.
Warning: indexable, but a soft signal is off (e.g. a nofollow directive, which does not block indexing but stops the page's links from being followed).
Fail: a noindex/block from any source makes the page non-indexable.
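The three outcomes above can be read as a precedence order: any block wins over any soft signal. The sketch below illustrates that order only. The `score` function and its parameters are hypothetical, treating `nofollow` as the only soft signal is an assumption based on the example given, and how the 5 points are distributed across outcomes is not specified here.

```python
from __future__ import annotations

BLOCKING = {"noindex", "none"}  # directives that make the page non-indexable
SOFT = {"nofollow"}             # assumption: the one soft signal named in the text


def score(directives: set[str], other_blocks: tuple[str, ...] = ()) -> str:
    """Map signals to Pass, Warning or Fail.

    directives:   lowercased meta robots and X-Robots-Tag directives combined.
    other_blocks: names of other failing signals, e.g. ("robots_txt",).
    """
    if other_blocks or directives & BLOCKING:
        return "Fail"
    if directives & SOFT:
        return "Warning"
    return "Pass"


print(score({"index", "follow"}))        # Pass
print(score({"index", "nofollow"}))      # Warning
print(score({"noindex", "nofollow"}))    # Fail
print(score(set(), ("robots_txt",)))     # Fail
```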

FAQ

What is an indexability verdict?

An indexability verdict is a single, combined answer to the question "can this page be indexed and shown by search engines and AI engines?" Instead of checking one signal in isolation, GEObubbly evaluates the HTTP status code, robots.txt rules, the meta robots tag, the X-Robots-Tag header and the canonical tag together, because any one of them can silently block indexing. The verdict tells you in one line whether the page is eligible to appear in results — the prerequisite for everything else in SEO and GEO.

Why is my page not being indexed even though it returns a 200 status?

A 200 OK status only means the server responded; it does not guarantee indexing. The page can still be kept out of the index by a noindex in the meta robots tag, a noindex in the X-Robots-Tag HTTP header (which is invisible in the page source), a canonical tag pointing to a different URL (which asks engines to index that URL instead), a Disallow in robots.txt, or a soft-404, where the server returns 200 but the content is an error page. The indexability verdict checks all of these at once, which is why it catches problems a status-code check alone would miss.

What is the difference between crawlability and indexability?

Crawlability is whether a search engine or AI bot can reach and read a page; indexability is whether, having read it, the engine is allowed to store it and show it in results. The two are independent: a page can be fully crawlable but carry a noindex tag (read but never shown), and a page blocked in robots.txt can still be indexed from external links — just without its content. You need both to be clear for a page to appear, and the indexability verdict focuses specifically on the second half of that equation.

How do I fix a page that is not indexable?

Work through the five signals the verdict combines. Confirm the page returns a genuine 200 (not a 4xx, 5xx or soft-404); remove any noindex from the meta robots tag and from the X-Robots-Tag response header; make sure robots.txt does not Disallow the URL; and check that the canonical tag points to this page (or the correct primary version) rather than a blocked or redirecting URL. Fix whichever signal is failing, then re-run the check — and allow time for engines to re-crawl before the change is reflected.
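For the canonical step, a quick way to see whether a page's canonical points back to itself is to resolve and normalize both URLs before comparing them. This is a minimal sketch with hypothetical helper names. It deliberately treats trailing-slash and query-string differences as different URLs, which may be stricter than a search engine is.

```python
from urllib.parse import urljoin, urlsplit


def _normalize(url: str) -> tuple:
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; an empty path is equivalent to "/".
    # The fragment is ignored because it is never sent to the server.
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_is_self(page_url: str, canonical_href: str) -> bool:
    """True if the canonical resolves to the page's own URL."""
    resolved = urljoin(page_url, canonical_href)  # handles relative hrefs
    return _normalize(resolved) == _normalize(page_url)


print(canonical_is_self("https://Example.com/guide", "/guide"))            # True
print(canonical_is_self("https://example.com", "https://example.com/"))    # True
print(canonical_is_self("https://example.com/guide?page=2",
                        "https://example.com/guide"))                      # False
```

A `False` result is not automatically a failure: the canonical may legitimately point to the correct primary version. It is a prompt to confirm that the target is the URL you want indexed and that it is neither blocked nor redirecting.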

Do AI crawlers like GPTBot respect noindex and robots.txt?

The major AI crawlers (GPTBot, ClaudeBot and PerplexityBot) generally respect robots.txt, so a Disallow will keep them out. Google-Extended is honored the same way, but it is a robots.txt control token rather than a separate crawler: the fetching itself is done by Google's existing crawlers. noindex is primarily a search-indexing directive, and AI crawlers treat it inconsistently. For AI visibility, the bigger risks are being blocked in robots.txt, being blocked silently at the CDN edge, or having content that only renders with JavaScript, which most AI crawlers do not execute. The indexability verdict covers the indexing signals; pair it with the AI-bot access and JavaScript-rendering checks for full coverage.
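To check what a given robots.txt says to each of these tokens, Python's standard `urllib.robotparser` can evaluate the rules offline. The sketch below is an approximation: the `ai_bot_access` helper and the sample robots.txt are made up for illustration, the standard-library parser applies the first matching rule rather than the longest match some engines use, and it cannot detect blocking at the CDN edge.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the answer above.
AI_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_bot_access(robots_txt: str, url: str) -> dict:
    """Return, per token, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return {token: parser.can_fetch(token, url) for token in AI_TOKENS}


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(ai_bot_access(ROBOTS_TXT, "https://example.com/page"))
# {'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True, 'Google-Extended': True}
```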
