
🤖 Bulk User Agent Checker

Audit how websites respond to AI, search engine, and SEO bot user agents — in bulk

Bot Categories

Enter URLs or domains, one per line. We'll check how each responds to selected bot user agents.

πŸ“ Pi Proxy Active
Your Cloudflare Worker endpoint (saved locally)
200ms default β€” increase to avoid rate limiting
Use residential IP instead of Cloudflare Worker datacenter IP
Starting…

Status Code Reference

Code | Meaning | Severity | What to Check / What to Do
200 | Allowed | ✅ Good | Bot can access this URL normally. No action needed.
301 / 302 | Redirect | ℹ Info | URL redirects elsewhere. Common for HTTP→HTTPS or www→non-www. Most bots follow redirects — check the destination URL still returns 200. Excessive redirect chains slow crawling.
403 | Forbidden / Blocked | ⚠ High | Server is actively rejecting this bot's User-Agent. Check: (1) robots.txt Disallow rules, (2) Cloudflare WAF/Bot Management settings, (3) .htaccess or nginx UA blocking rules, (4) CDN security configs. If blocking a search engine bot, this is a critical SEO issue.
404 | Not Found | ⚠ Medium | URL doesn't exist. Verify the URL is correct. If it existed before, check for redirects or content removal. Bots will deindex 404 pages over time.
429 | Rate Limited | ⚠ Medium | Too many requests too fast. Increase the delay setting in this tool. For the real bot, review crawl rate settings in Google Search Console or the bot's dashboard.
451 | Legal Block | ⚠ Medium | Content blocked for legal reasons (GDPR, CCPA, court order). Usually geo-restricted. Check whether this is intentional — if not, review your geo-blocking rules or CDN configuration.
500 / 502 / 503 | Server Error | ⚠ High | Server-side failure not specific to the bot. Check server health, error logs, memory/CPU usage. Bots may back off and retry later. Persistent 5xx causes deindexing.
TIMEOUT | No Response | ⚠ High | Connection opened but no response within 10s. Could be geo/IP blocking at the network level, DDoS protection dropping the connection, a very slow server, or a firewall silently dropping bot traffic.
ERR | Network Error | 🚨 Check | Complete connection failure — DNS didn't resolve, server unreachable, or deep firewall block. Verify the domain is live and DNS has propagated. This tool uses a proxy, so some networks may cause false positives.
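As a quick reference, the severity mapping above can be expressed as a small lookup — a minimal sketch, where `TIMEOUT` and `ERR` are this tool's own pseudo-codes rather than HTTP statuses:

```python
# Severity lookup mirroring the status-code table above.
# "TIMEOUT" and "ERR" are the tool's pseudo-codes for
# no-response and complete network failure.
def classify(status):
    if status == 200:
        return "good"    # bot can access normally
    if status in (301, 302):
        return "info"    # follow the redirect, check the destination
    if status in (404, 429, 451):
        return "medium"
    if status in (403, 500, 502, 503) or status == "TIMEOUT":
        return "high"
    if status == "ERR":
        return "check"   # verify DNS and that the host is live
    return "unknown"
```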

By Bot — What a Block Means & What to Do

Search Googlebot / Googlebot-Mobile 🚨 Critical
🚫 If you want to block
  • You almost certainly don't — blocking Googlebot means your site won't appear in Google Search at all
  • Only block specific paths (e.g. /admin/), never the whole site
✅ If it's showing blocked (and you don't want that)
  • Cloudflare → Security → Bots → disable "Block AI Scrapers" (this setting can catch Googlebot)
  • Check robots.txt for Disallow: / under User-agent: * or User-agent: Googlebot
  • Check server firewall / .htaccess for UA-based deny rules
  • Verify in Google Search Console → Settings → Crawl Stats
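For the UA-based deny rules mentioned above, a blocking rule in an nginx config typically looks like the hypothetical snippet below — if you find something similar matching Googlebot, remove or narrow it:

```nginx
# Hypothetical nginx rule that returns 403 to any request whose
# User-Agent contains "Googlebot" — a common accidental SEO killer.
if ($http_user_agent ~* "Googlebot") {
    return 403;
}
```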
Search Bingbot / DuckDuckBot ⚠ High
🚫 If you want to block
  • Add to robots.txt: User-agent: bingbot + Disallow: /
  • Note: blocking Bingbot also removes you from Yahoo and MSN results
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for Bingbot-specific Disallow rules
  • Check Bing Webmaster Tools for crawl error reports
  • Check Cloudflare WAF for User-Agent rules catching bingbot
Search Baiduspider ⚠ Regional
🚫 If you want to block
  • Add to robots.txt: User-agent: Baiduspider + Disallow: /
  • Baidu doesn't always honour robots.txt — for reliable blocking use Cloudflare WAF: User-Agent contains Baiduspider
✅ If it's showing blocked (and you don't want that)
  • Only matters if you target Chinese-language audiences
  • Check server/WAF for geo-blocking rules (China IPs are often blocked by default)
  • Check robots.txt for User-agent: Baiduspider Disallow
Search NaverBot ℹ Regional
🚫 If you want to block
  • Add to robots.txt: User-agent: Naverbot + Disallow: /
✅ If it's showing blocked (and you don't want that)
  • Only relevant if you target South Korean audiences (Naver = ~60% Korean search share)
  • Check WAF/firewall for Asian IP geo-blocks catching Naver's crawlers
AI GPTBot / ChatGPT-User ℹ Your choice
ℹ These are two different bots
  • GPTBot — collects training data for OpenAI models
  • ChatGPT-User — real-time browsing; blocking means ChatGPT won't cite your site in answers
🚫 If you want to block
  • User-agent: GPTBot + Disallow: / in robots.txt
  • User-agent: ChatGPT-User + Disallow: / in robots.txt
  • Optionally add <meta name="robots" content="noai, noimageai"> — a non-standard opt-out that only some crawlers honour
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for GPTBot/ChatGPT-User Disallow entries
  • Check Cloudflare → WAF for rules matching OpenAI UA strings
AI OAI-SearchBot ⚠ Easy to miss
ℹ Different from GPTBot
  • Powers ChatGPT's live web search — not training data. Many sites block GPTBot and forget this one
  • Blocking = your site won't appear as a source in ChatGPT search results
🚫 If you want to block
  • User-agent: OAI-SearchBot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt — a User-agent: * Disallow will catch this bot
  • Check Cloudflare WAF for broad OpenAI/bot UA rules
AI Google-Extended ℹ Safe to block
🚫 If you want to block
  • Blocks Gemini (formerly Bard) AI training — does NOT affect regular Google Search crawling or indexing
  • User-agent: Google-Extended + Disallow: / in robots.txt
  • Safest AI opt-out — zero SEO risk
✅ If it's showing blocked (and you don't want that)
  • Remove any User-agent: Google-Extended Disallow from robots.txt
  • Check WAF rules catching Google-Extended UA
AI ClaudeBot ℹ Your choice
🚫 If you want to block
  • User-agent: ClaudeBot + Disallow: / in robots.txt
  • Anthropic states it respects robots.txt opt-outs
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for ClaudeBot Disallow entries or broad User-agent: * blocks
  • Allowing means Anthropic may use your content for Claude training
AI Bytespider (TikTok / ByteDance) ⚠ Consider blocking
🚫 If you want to block
  • Many sites block due to privacy/data sovereignty concerns (ByteDance is Chinese-owned)
  • Does not reliably respect robots.txt — use Cloudflare WAF instead
  • WAF rule: User-Agent contains Bytespider
✅ If it's showing blocked (and you don't want that)
  • Check WAF rules for Bytespider or broad bot-blocking rules
  • Check if geo-blocking of Chinese IPs is enabled
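The WAF rule above can be written as a Cloudflare custom-rule expression with the action set to Block; the same pattern works for any bot that ignores robots.txt:

```
(http.user_agent contains "Bytespider")
```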
AI CCBot (Common Crawl) ℹ Your choice
🚫 If you want to block
  • CCBot builds the Common Crawl dataset, which has supplied training data for GPT-3, LLaMA, and many open-source AI models
  • Blocking it indirectly blocks multiple AI pipelines at once
  • User-agent: CCBot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for CCBot Disallow entries
  • Allowing contributes to open-source AI datasets
AI Amazonbot ℹ Your choice
🚫 If you want to block
  • User-agent: Amazonbot + Disallow: / in robots.txt
  • Amazon states it respects robots.txt
✅ If it's showing blocked (and you don't want that)
  • Feeds Alexa and Amazon AI services — allowing may improve your presence in Amazon's ecosystem
  • Check robots.txt and WAF for Amazonbot rules
AI cohere-ai ℹ Your choice
🚫 If you want to block
  • User-agent: cohere-ai + Disallow: / in robots.txt
  • Powers enterprise AI tools used by many businesses
✅ If it's showing blocked (and you don't want that)
  • Check robots.txt for cohere-ai entries or broad User-agent: * Disallow
SEO AhrefsBot / SemrushBot / MJ12bot ℹ Often intentional
🚫 If you want to block
  • Prevents competitors from seeing your backlink profile in Ahrefs/Semrush — a common choice
  • Add individual User-agent Disallow rules in robots.txt for each
  • Or use Cloudflare WAF to match their UA strings
✅ If it's showing blocked (and you don't want that)
  • You won't be able to audit your own site's backlinks in these tools
  • Remove their Disallow rules from robots.txt and check WAF
  • Alternatively: whitelist your own IP in Cloudflare when running audits
SEO ScreamingFrog / Sitebulb ⚠ Self-audit issue
🚫 If you want to block
  • Add their UA strings to robots.txt Disallow or block via Cloudflare WAF
  • Note: this means you can't crawl your own site with these tools
✅ If it's showing blocked (and you don't want that)
  • These only crawl when you manually run them — being blocked breaks your own site audits
  • Check Cloudflare for aggressive bot rules catching their UA strings
  • Fix: whitelist your IP in Cloudflare when running a crawl
SEO Diffbot ℹ Often intentional
🚫 If you want to block
  • Diffbot is a commercial data extraction service — blocking is common and generally fine
  • Cloudflare WAF: User-Agent contains Diffbot
  • Or User-agent: Diffbot + Disallow: / in robots.txt
✅ If it's showing blocked (and you don't want that)
  • If you use Diffbot's own API to extract your data, blocking will break your own usage
  • Whitelist Diffbot's IP ranges or remove UA block rules

Quick Fix Recipes

🚨 Googlebot is blocked — fix it
  • Cloudflare dashboard → Security → Bots → check settings
  • Check robots.txt — must NOT have Disallow: / under User-agent: * or User-agent: Googlebot
  • Check .htaccess / nginx for BotBlocker or UA-based deny rules
  • Test in Google Search Console → URL Inspection → "Test Live URL"
🛑 Block all AI training bots
  • Add a two-line entry to robots.txt for each bot (User-agent on one line, Disallow: / on the next):
  • User-agent: GPTBot + Disallow: /
  • User-agent: ChatGPT-User + Disallow: /
  • User-agent: Google-Extended + Disallow: /
  • User-agent: ClaudeBot + Disallow: /
  • User-agent: CCBot + Disallow: /
  • Optionally add <meta name="robots" content="noai, noimageai"> — non-standard, honoured by only some crawlers
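Assembled as an actual robots.txt fragment, the recipe above looks like this (ChatGPT-User is browsing rather than training, as noted earlier — include it only if you also want to opt out of citations):

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```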
📋 Reading robots.txt results
  • Click the 📋 icon next to any URL in the table to see its parsed robots.txt rules
  • Disallow: / = entire site blocked for that bot
  • Disallow: /wp-admin/ = only that path blocked
  • A more specific Allow overrides a broader Disallow
  • Missing User-agent entry = bot uses * (wildcard) rules
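The matching rules above can be checked with Python's built-in robots.txt parser — a minimal sketch using a made-up robots.txt and example.com URLs (note that urllib.robotparser applies rules in file order, so the more specific Allow is listed first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot blocked from the whole site; all
# other bots blocked from /wp-admin/ except the admin-ajax.php endpoint.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/page"))       # Disallow: / → False
print(rp.can_fetch("SomeBot", "https://example.com/page"))      # wildcard rules → True
print(rp.can_fetch("SomeBot", "https://example.com/wp-admin/")) # path blocked → False
print(rp.can_fetch("SomeBot",
      "https://example.com/wp-admin/admin-ajax.php"))           # Allow wins → True
```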
Note: Checks are performed via a Cloudflare Worker proxy. Results may differ from direct server responses. Some servers detect proxy requests and return different status codes than they would to real bots. Use a server-side tool for production-grade auditing.