H2 · Infrastructure, Bots & DNS

AI Bot Access Checker — is your CDN blocking AI crawlers?

**Your CDN or edge can block AI crawlers before they ever reach your robots.txt, silently cutting you out of AI search.** This check looks at whether your edge or bot-management layer is allowing or blocking AI crawlers like GPTBot and PerplexityBot. Aggressive default bot rules often treat these as unwanted traffic, so you can be GEO-ready on-page yet invisible to the engines you want to be cited by.

What does the edge / CDN bot-handling check look for?

It looks at whether your infrastructure layer permits the AI crawlers you want, separate from what your robots.txt says. Specifically:

- AI crawler access: whether bots like GPTBot, PerplexityBot and ClaudeBot appear to be allowed through. (Google-Extended is a robots.txt control token rather than a separate crawler, so it is governed in robots.txt, not by user-agent rules at the edge.)

- Edge / WAF blocking: whether CDN bot-management or firewall rules are challenging or blocking these crawlers.

- Consistency with intent: whether the edge behaviour matches your robots.txt and your goal of being AI-visible.

The check passes when AI crawlers are allowed at the edge, warns when there is some friction or partial blocking, and fails when AI crawlers are blocked before reaching the site.

How is it evaluated, and how is it scored?

GEObubbly probes how your edge or CDN responds to AI-crawler user-agents. It's a core, scored Infrastructure check that runs server-side, since edge behaviour is observed from the live responses.
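As a rough illustration of this kind of probe (not GEObubbly's actual implementation), the sketch below requests one URL with several AI-crawler user-agents and records the status code each receives. The user-agent strings are simplified placeholders and the names `AI_USER_AGENTS`, `probe_status` and `probe_all` are hypothetical. A probe that only spoofs the user-agent is an approximation: an edge that verifies crawlers by source IP may treat it differently from the real bot.

```python
import urllib.error
import urllib.request

# Simplified placeholder user-agent strings (assumptions, not the vendors'
# exact strings); check each vendor's documentation for the current values.
AI_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
}


def probe_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want to record.
        return err.code


def probe_all(url, fetch=probe_status):
    """Probe the URL once per AI user-agent; `fetch` can be swapped for tests."""
    return {bot: fetch(url, ua) for bot, ua in AI_USER_AGENTS.items()}
```

Calling `probe_all("https://example.com/")` would issue one live request per bot; passing a stub as `fetch` lets the same logic run offline.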

Criteria: Pass — AI UAs get a normal 200. Warning — some challenged or throttled. Fail — edge returns 403 / challenge to AI crawlers.

Why edge/CDN AI-bot handling matters for GEO

There's a layer in front of your site (your CDN, edge network or web application firewall, WAF) that decides which requests even reach your origin, and its bot-management rules operate independently of your robots.txt. This is a distinctly GEO-relevant trap: you can do everything right on-page and in robots.txt, yet still be invisible to AI engines if your edge is blocking their crawlers. Many bot-management products treat unfamiliar or non-search crawlers as unwanted scraping by default, so AI crawlers like GPTBot (OpenAI), PerplexityBot and ClaudeBot can be challenged or blocked at the edge before they ever see your content. (Google-Extended is different: it is a robots.txt control token, not a separate crawler, so it is governed in robots.txt rather than by user-agent rules at the edge.) The result is that your pages never get crawled by the engines you want to be cited by, an easy-to-miss exclusion from AI search.

The fix is to review your CDN/WAF bot rules and explicitly allow the AI crawlers you want, while still blocking genuinely abusive traffic. Whether to allow AI crawlers at all is a business decision, and some sites choose to block them, but it should be a choice, not an accidental default.

How this check scores

  • Pass: All tested AI bots reach the origin with a 200 response.
  • Warning: One or two AI bots are challenged or rate-limited; the rest pass.
  • Fail: The edge layer returns a 403, a 503 or a JS challenge to an AI user-agent.
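The three tiers above can be sketched as a small scoring function. This is a minimal sketch under stated assumptions, not the tool's real scoring code: it works from status codes only, treats 429 as the "rate-limited" signal (the text does not name a status code), and does not try to detect JS challenges, which would need inspection of the response body or headers. The names `classify_status` and `score` are hypothetical.

```python
def classify_status(status):
    """Map one HTTP status code to a coarse per-bot outcome."""
    if status == 200:
        return "allowed"
    if status == 429:  # assumption: 429 stands in for "rate-limited"
        return "friction"
    if status in (403, 503):
        return "blocked"
    return "unknown"


def score(statuses):
    """Combine per-bot status codes into pass / warning / fail."""
    outcomes = [classify_status(code) for code in statuses.values()]
    if "blocked" in outcomes:
        return "fail"
    if any(outcome != "allowed" for outcome in outcomes):
        return "warning"
    return "pass"
```

In this sketch a single hard block outweighs everything else, and any other non-200 response is reported as a warning rather than silently ignored.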

FAQ

Can my CDN block AI crawlers even if robots.txt allows them?

Yes — and this is a common, easily-missed problem. Your CDN, edge network or web application firewall sits in front of your origin and decides which requests get through, using its own bot-management rules that operate independently of your robots.txt. So even if your robots.txt explicitly allows GPTBot or PerplexityBot, an edge rule can challenge or block those crawlers before they ever reach your site or read your robots.txt. To be truly accessible to AI crawlers, you need to allow them both in robots.txt and at the edge/CDN layer.
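One way to surface this mismatch is to compare what robots.txt says with what the edge actually returns. The sketch below uses Python's standard `urllib.robotparser` to read the robots.txt rules and flags any bot that robots.txt allows but that did not get a 200 from the edge. The helper names `robots_allows` and `find_mismatches` are hypothetical, and the `edge_statuses` mapping is assumed to come from a separate probe.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt, bot, url):
    """True if the given robots.txt text permits `bot` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


def find_mismatches(robots_txt, url, edge_statuses):
    """Bots that robots.txt allows but the edge did not serve with a 200."""
    return [
        bot
        for bot, status in edge_statuses.items()
        if robots_allows(robots_txt, bot, url) and status != 200
    ]


ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /
"""

# Hypothetical probe results: both bots get a 403 from the edge.
mismatches = find_mismatches(
    ROBOTS_TXT, "https://example.com/", {"GPTBot": 403, "ClaudeBot": 403}
)
```

Here only GPTBot is reported: ClaudeBot is also blocked at the edge, but robots.txt disallows it, so the edge behaviour matches the stated intent.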

Which AI crawlers should I make sure can access my site?

The major AI crawlers to consider include GPTBot (OpenAI), PerplexityBot (Perplexity) and ClaudeBot (Anthropic). Google-Extended also matters, but it is a robots.txt token that governs whether Google can use your content for its AI features, not a separate crawler: Google fetches pages with its existing user-agents, so Google-Extended is controlled in robots.txt rather than at the edge. There are others, and the list evolves as the AI search landscape changes. If your goal is to be discovered and cited by AI answer engines, you want these crawlers able to reach and read your content, both permitted in robots.txt and not blocked by your CDN or firewall. Identify the engines you care about and verify their crawlers aren't being filtered out.

Why would a CDN block AI crawlers by default?

Many CDN and bot-management products are designed to block automated traffic in order to protect sites from abuse, content scraping and excess load, and their default rules often treat unfamiliar or non-search-engine crawlers as unwanted. Because AI crawlers are relatively new and don't always appear on the allowlist of recognised search bots, they can get caught by these aggressive defaults and challenged or blocked automatically. It's rarely a deliberate decision to exclude AI engines; it's a side effect of bot protection erring on the side of blocking unknown automated traffic, which is exactly why it's worth checking explicitly.

How do I allow AI crawlers at the edge or CDN?

Review your CDN or WAF's bot-management settings and add explicit allow rules for the AI crawler user-agents you want to permit, so they're not caught by general bot-blocking rules. The exact steps depend on your provider, but most bot-management dashboards let you allowlist specific bots or user-agents. Verify the change by checking how your edge responds to those crawlers' user-agents. At the same time, keep blocking genuinely abusive traffic — the goal is to distinguish the AI crawlers you want from the scraping you don't, rather than opening up to everything.
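To verify a rule change, it can help to compare probe results taken before and after. The sketch below assumes you already have two mappings of bot name to HTTP status code, however they were collected, and reports which bots were fixed, which are still blocked, and which regressed. The function name `compare_runs` and the report keys are hypothetical.

```python
def compare_runs(before, after):
    """Summarise how each bot's access changed between two probe runs."""
    report = {"fixed": [], "still_blocked": [], "regressed": []}
    for bot in sorted(after):
        was_ok = before.get(bot) == 200  # bots missing from `before` count as not OK
        is_ok = after[bot] == 200
        if is_ok and not was_ok:
            report["fixed"].append(bot)
        elif not is_ok and not was_ok:
            report["still_blocked"].append(bot)
        elif not is_ok and was_ok:
            report["regressed"].append(bot)
    return report


# Hypothetical results from before and after adding an allow rule.
before = {"GPTBot": 403, "PerplexityBot": 403, "ClaudeBot": 200}
after = {"GPTBot": 200, "PerplexityBot": 403, "ClaudeBot": 200}
report = compare_runs(before, after)
```

A non-empty `regressed` list is worth checking first, since it suggests the new rule affected a crawler that previously got through.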

Should I block or allow AI crawlers?

That's a genuine business decision, not a one-size-fits-all answer. Allowing AI crawlers makes your content eligible to be discovered, summarised and cited by AI answer engines, which is increasingly a source of visibility and referral; blocking them protects your content from being used in AI outputs but forfeits that visibility. Some publishers deliberately block AI crawlers, others actively court them. The important thing is that it should be a deliberate choice aligned with your strategy — not an accidental result of default CDN rules silently excluding crawlers you actually wanted to reach you.
