
🤖 Bulk User Agent Checker

Audit how websites respond to AI, search engine, and SEO bot user agents — in bulk

Bot Categories

Enter URLs or domains, one per line. We'll check how each responds to selected bot user agents.

πŸ“ Pi Proxy Active
Your Cloudflare Worker endpoint (saved locally)
200ms default β€” increase to avoid rate limiting
Use residential IP instead of Cloudflare Worker datacenter IP
Starting…

Status Code Reference

Code | Meaning | Severity | What to Check / What to Do
200 | Allowed | ✅ Good | Bot can access this URL normally. No action needed.
301 / 302 | Redirect | ℹ Info | URL redirects elsewhere. Common for HTTP→HTTPS or www→non-www. Most bots follow redirects — check the destination URL still returns 200. Excessive redirect chains slow crawling.
403 | Forbidden / Blocked | ⚠ High | Server is actively rejecting this bot's User-Agent. Check: (1) robots.txt Disallow rules, (2) Cloudflare WAF/Bot Management settings, (3) .htaccess or nginx UA blocking rules, (4) CDN security configs. If blocking a search engine bot, this is a critical SEO issue.
404 | Not Found | ⚠ Medium | URL doesn't exist. Verify the URL is correct. If it existed before, check for redirects or content removal. Bots will deindex 404 pages over time.
429 | Rate Limited | ⚠ Medium | Too many requests too fast. Increase the delay setting in this tool. For the real bot, review crawl rate settings in Google Search Console or the bot's dashboard.
451 | Legal Block | ⚠ Medium | Content blocked for legal reasons (GDPR, CCPA, court order). Usually geo-restricted. Check whether this is intentional — if not, review your geo-blocking rules or CDN configuration.
500 / 502 / 503 | Server Error | ⚠ High | Server-side failure not specific to the bot. Check server health, error logs, memory/CPU usage. Bots may back off and retry later. Persistent 5xx causes deindexing.
TIMEOUT | No Response | ⚠ High | Connection opened but no response within 10s. Could be geo/IP blocking at the network level, DDoS protection dropping the connection, a very slow server, or a firewall silently dropping bot traffic.
ERR | Network Error | 🚨 Check | Complete connection failure — DNS didn't resolve, server unreachable, or deep firewall block. Verify the domain is live and DNS has propagated. This tool uses a proxy, so some networks may cause false positives.
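As a quick reference, the severity mapping above can be expressed as a small lookup — a minimal sketch, where `TIMEOUT` and `ERR` are this tool's own pseudo-codes rather than HTTP statuses:

```python
# Severity lookup mirroring the status-code table above.
# "TIMEOUT" and "ERR" are the tool's pseudo-codes for
# no-response and complete network failure.
def classify(status):
    if status == 200:
        return "good"    # bot can access normally
    if status in (301, 302):
        return "info"    # follow the redirect, check the destination
    if status in (404, 429, 451):
        return "medium"
    if status in (403, 500, 502, 503) or status == "TIMEOUT":
        return "high"
    if status == "ERR":
        return "check"   # verify DNS and that the host is live
    return "unknown"
```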

By Bot — What a Block Means & What to Do

Search Googlebot / Googlebot-Mobile 🚨 Critical
🚫 If you want to block
  • You almost certainly don't — blocking Googlebot means your site won't appear in Google Search at all
  • Only block specific paths (e.g. /admin/), never the whole site
✅ If it's showing blocked (and you don't want that)
  • Cloudflare → Security → Bots → disable "Block AI Scrapers" (this setting can catch Googlebot)
  • Check robots.txt for Disallow: / under User-agent: * or User-agent: Googlebot
  • Check server firewall / .htaccess for UA-based deny rules
  • Verify in Google Search Console → Settings → Crawl Stats
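For the UA-based deny rules mentioned above, a blocking rule in an nginx config typically looks like the hypothetical snippet below — if you find something similar matching Googlebot, remove or narrow it:

```nginx
# Hypothetical nginx rule that returns 403 to any request whose
# User-Agent contains "Googlebot" — a common accidental SEO killer.
if ($http_user_agent ~* "Googlebot") {
    return 403;
}
```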
Search Bingbot / DuckDuckBot ⚠ High
🚫 If you want to block
  • Add to robots.txt: User-agent: bingbot + Disallow: /
  • Note: blocking Bingbot also removes you from Yahoo and MSN results
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for Bingbot-specific Disallow rules
  • Check Bing Webmaster Tools for crawl error reports
  • Check Cloudflare WAF for User-Agent rules catching bingbot
Search Baiduspider ⚠ Regional
🚫 If you want to block
  • Add to robots.txt: User-agent: Baiduspider + Disallow: /
  • Baidu doesn't always honour robots.txt — for reliable blocking use Cloudflare WAF: User-Agent contains Baiduspider
✅ If it's showing blocked (and you don't want that)
  • Only matters if you target Chinese-language audiences
  • Check server/WAF for geo-blocking rules (China IPs are often blocked by default)
  • Check robots.txt for User-agent: Baiduspider Disallow
Search NaverBot ℹ Regional
🚫 If you want to block
  • Add to robots.txt: User-agent: Naverbot + Disallow: /
✅ If it's showing blocked (and you don't want that)
  • Only relevant if you target South Korean audiences (Naver = ~60% Korean search share)
  • Check WAF/firewall for Asian IP geo-blocks catching Naver's crawlers
AI GPTBot / ChatGPT-User ℹ Your choice
ℹ These are two different bots
  • GPTBot — collects training data for OpenAI models
  • ChatGPT-User — real-time browsing; blocking means ChatGPT won't cite your site in answers
🚫 If you want to block
  • User-agent: GPTBot + Disallow: / in robots.txt
  • User-agent: ChatGPT-User + Disallow: / in robots.txt
  • Optionally add <meta name="robots" content="noai, noimageai"> — a non-standard opt-out that only some crawlers honour
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for GPTBot/ChatGPT-User Disallow entries
  • Check Cloudflare → WAF for rules matching OpenAI UA strings
AI OAI-SearchBot ⚠ Easy to miss
ℹ Different from GPTBot
  • Powers ChatGPT's live web search — not training data. Many sites block GPTBot and forget this one
  • Blocking = your site won't appear as a source in ChatGPT search results
🚫 If you want to block
  • User-agent: OAI-SearchBot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt — a User-agent: * Disallow will catch this bot
  • Check Cloudflare WAF for broad OpenAI/bot UA rules
AI Google-Extended ℹ Safe to block
🚫 If you want to block
  • Blocks Gemini (formerly Bard) AI training — does NOT affect regular Google Search crawling or indexing
  • User-agent: Google-Extended + Disallow: / in robots.txt
  • Safest AI opt-out — zero SEO risk
✅ If it's showing blocked (and you don't want that)
  • Remove any User-agent: Google-Extended Disallow from robots.txt
  • Check WAF rules catching Google-Extended UA
AI ClaudeBot ℹ Your choice
🚫 If you want to block
  • User-agent: ClaudeBot + Disallow: / in robots.txt
  • Anthropic states it respects robots.txt opt-outs
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for ClaudeBot Disallow entries or broad User-agent: * blocks
  • Allowing means Anthropic may use your content for Claude training
AI Bytespider (TikTok / ByteDance) ⚠ Consider blocking
🚫 If you want to block
  • Many sites block due to privacy/data sovereignty concerns (ByteDance is Chinese-owned)
  • Does not reliably respect robots.txt — use Cloudflare WAF instead
  • WAF rule: User-Agent contains Bytespider
✅ If it's showing blocked (and you don't want that)
  • Check WAF rules for Bytespider or broad bot-blocking rules
  • Check if geo-blocking of Chinese IPs is enabled
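The WAF rule above can be written as a Cloudflare custom-rule expression with the action set to Block; the same pattern works for any bot that ignores robots.txt:

```
(http.user_agent contains "Bytespider")
```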
AI CCBot (Common Crawl) ℹ Your choice
🚫 If you want to block
  • CCBot builds the Common Crawl dataset, which has supplied training data for GPT-3, LLaMA, and many open-source AI models
  • Blocking it indirectly blocks multiple AI pipelines at once
  • User-agent: CCBot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for CCBot Disallow entries
  • Allowing contributes to open-source AI datasets
AI Amazonbot ℹ Your choice
🚫 If you want to block
  • User-agent: Amazonbot + Disallow: / in robots.txt
  • Amazon states it respects robots.txt
✅ If it's showing blocked (and you don't want that)
  • Feeds Alexa and Amazon AI services — allowing may improve your presence in Amazon's ecosystem
  • Check robots.txt and WAF for Amazonbot rules
AI cohere-ai ℹ Your choice
🚫 If you want to block
  • User-agent: cohere-ai + Disallow: / in robots.txt
  • Powers enterprise AI tools used by many businesses
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for cohere-ai entries or broad User-agent: * Disallow
SEO AhrefsBot / SemrushBot / MJ12bot ℹ Often intentional
🚫 If you want to block
  • Prevents competitors from seeing your backlink profile in Ahrefs/Semrush — a common choice
  • Add individual User-agent Disallow rules in robots.txt for each
  • Or use Cloudflare WAF to match their UA strings
✅ If it's showing blocked (and you don't want that)
  • You won't be able to audit your own site's backlinks in these tools
  • Remove their Disallow rules from robots.txt and check WAF
  • Alternatively: whitelist your own IP in Cloudflare when running audits
SEO ScreamingFrog / Sitebulb ⚠ Self-audit issue
🚫 If you want to block
  • Add their UA strings to robots.txt Disallow or block via Cloudflare WAF
  • Note: this means you can't crawl your own site with these tools
✅ If it's showing blocked (and you don't want that)
  • These only crawl when you manually run them — being blocked breaks your own site audits
  • Check Cloudflare for aggressive bot rules catching their UA strings
  • Fix: whitelist your IP in Cloudflare when running a crawl
SEO Diffbot ℹ Often intentional
🚫 If you want to block
  • Diffbot is a commercial data extraction service — blocking is common and generally fine
  • Cloudflare WAF: User-Agent contains Diffbot
  • Or User-agent: Diffbot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • If you use Diffbot's own API to extract your data, blocking will break your own usage
  • Whitelist Diffbot's IP ranges or remove UA block rules

Quick Fix Recipes

🚨 Googlebot is blocked — fix it
  • Cloudflare dashboard → Security → Bots → check settings
  • Check robots.txt — must NOT have Disallow: / under User-agent: * or User-agent: Googlebot
  • Check .htaccess / nginx for BotBlocker or UA-based deny rules
  • Test in Google Search Console → URL Inspection → "Test Live URL"
🛑 Block all AI training bots
  • Add a two-line entry to robots.txt for each bot (User-agent on one line, Disallow: / on the next):
  • User-agent: GPTBot + Disallow: /
  • User-agent: ChatGPT-User + Disallow: /
  • User-agent: Google-Extended + Disallow: /
  • User-agent: ClaudeBot + Disallow: /
  • User-agent: CCBot + Disallow: /
  • Optionally add <meta name="robots" content="noai, noimageai"> — non-standard, honoured by only some crawlers
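Assembled as an actual robots.txt fragment, the recipe above looks like this (ChatGPT-User is browsing rather than training, as noted earlier — include it only if you also want to opt out of citations):

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```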
📋 Reading robots.txt results
  • Click the 📋 icon next to any URL in the table to see its parsed robots.txt rules
  • Disallow: / = entire site blocked for that bot
  • Disallow: /wp-admin/ = only that path blocked
  • A more specific Allow overrides a broader Disallow
  • Missing User-agent entry = bot uses * (wildcard) rules
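The matching rules above can be checked with Python's built-in robots.txt parser — a minimal sketch using a made-up robots.txt and example.com URLs (note that urllib.robotparser applies rules in file order, so the more specific Allow is listed first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot blocked from the whole site; all
# other bots blocked from /wp-admin/ except the admin-ajax.php endpoint.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/page"))       # Disallow: / → False
print(rp.can_fetch("SomeBot", "https://example.com/page"))      # wildcard rules → True
print(rp.can_fetch("SomeBot", "https://example.com/wp-admin/")) # path blocked → False
print(rp.can_fetch("SomeBot",
      "https://example.com/wp-admin/admin-ajax.php"))           # Allow wins → True
```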
Note: Checks are performed via a Cloudflare Worker proxy. Results may differ from direct server responses. Some servers detect proxy requests and return different status codes than they would to real bots. Use a server-side tool for production-grade auditing.