Robots.txt Tester
Inspect how richproxy.app controls crawler access, blocked paths, sitemap references and AI crawler rules.
Preview
Score: 92
- At least one user-agent group has Disallow: /, which blocks the entire site for that crawler.
Robots.txt Status
Present
Score: 92/100 · Strong
Robots.txt Content Preview
    # As a condition of accessing this website, you agree to abide by the following
    # content signals:
    # (a) If a Content-Signal = yes, you may collect content for the corresponding
    # use.
    # (b) If a Content-Signal = no, you may not collect content for the
    # corresponding use.
    # (c) If the website operator does not include a Content-Signal for a
    # corresponding use, the website operator neither grants nor restricts
    # permission via Content-Signal with respect to the corresponding use.
    # The content signals and their meanings are:
    # search: building a search index and providing search results (e.g., returning
    # hyperlinks and short excerpts from your website's contents). Search does not
    # include providing AI-generated search summaries.
    # ai-input: inputting content into one or more AI models (e.g., retrieval
    # augmented generation, grounding, or other real-time taking of content for
    # generative AI search answers).
    # ai-train: training or fine-tuning AI models.
    # ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
    # RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
    # AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

    # BEGIN Cloudflare Managed content
    User-agent: *
    Content-Signal: search=yes,ai-train=no
    Allow: /

    User-agent: Amazonbot
    Disallow: /

    User-agent: Applebot-Extended
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CloudflareBrowserRenderingCrawler
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: GPTBot
    Disallow: /

    User-agent: meta-externalagent
    Disallow: /
    # END Cloudflare Managed Content

    # Rich Proxy - Optimized robots.txt
    # For help, see: https://developers.google.com/search/docs/crawling-indexing/robots/intro
    User-agent: *
    Allow: /

    # EXCLUDE SYSTEM AND AUTOMATION FILES
    Disallow: /*.py$
    Disallow: /*.json$
    Disallow: /*.log$
    Disallow: /*.bak$
    Disallow: /*.txt$
    Allow: /robots.txt
    Allow: /llms.txt

    # EXCLUDE SCRIPTS AND INTERNAL DIRECTORIES
    Disallow: /__pycache__/
    Disallow: /node_modules/
    Disallow: /scripts/
    Disallow: /tmp/

    # Keep public rendering assets crawlable so search engines can render pages correctly.
    Allow: /assets/

    # EXCLUDE DEBUG & BACKUP HTML FILES
    Disallow: /*_debug.html
    Disallow: /*_test.html
    Disallow: /*_final_v*.html
    Disallow: /index_debug.html
    Disallow: /light-theme-test.html
    Disallow: /404.html

    # PROTECT SENSITIVE FILES
    Disallow: /package.json
    Disallow: /package-lock.json
    Disallow: /webhook.php

    # SET SITEMAP PATH
    Sitemap: https://richproxy.app/sitemap.xml

    # PREVENT AI CRAWLERS FROM SCRAPING DATA (Optional, but recommended for niche data businesses)
    # User-agent: GPTBot
    # Disallow: /
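The file above contains two separate `User-agent: *` groups plus one group per named crawler. As a rough illustration of how a tester might split such a file into groups, here is a minimal Python sketch. The function name `parse_groups` and the shortened `SAMPLE` text are assumptions made for this example, not part of the tool, and the sketch ignores every field except User-agent, Allow and Disallow.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agents, rules) groups.

    Hypothetical helper: only User-agent, Allow and Disallow are handled;
    other fields (Sitemap, Content-Signal, ...) are skipped.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so this line starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


# Shortened sample modelled on the file above (an assumption, not the full file).
SAMPLE = """\
User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
Disallow: /tmp/
Sitemap: https://richproxy.app/sitemap.xml
"""

groups = parse_groups(SAMPLE)
for agents, rules in groups:
    print(agents, rules)
```

Under this sketch the two `*` groups stay separate, which matches how the report lists `*` twice in its rules table. A crawler that follows the Robots Exclusion Protocol would normally merge groups for the same user-agent before evaluating them.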
User-agent Rules
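| User-agent(s) | Allowed paths | Disallowed paths |
|---|---|---|
| * | `/` | No explicit Disallow rules. |
| amazonbot | No explicit Allow rules. | `/` |
| applebot-extended | No explicit Allow rules. | `/` |
| bytespider | No explicit Allow rules. | `/` |
| ccbot | No explicit Allow rules. | `/` |
| claudebot | No explicit Allow rules. | `/` |
| cloudflarebrowserrenderingcrawler | No explicit Allow rules. | `/` |
| google-extended | No explicit Allow rules. | `/` |
| gptbot | No explicit Allow rules. | `/` |
| meta-externalagent | No explicit Allow rules. | `/` |
| * | `/`, `/robots.txt`, `/llms.txt`, `/assets/` | `/*.py$`, `/*.json$`, `/*.log$`, `/*.bak$`, `/*.txt$`, `/__pycache__/`, `/node_modules/`, `/scripts/`, `/tmp/`, `/*_debug.html`, `/*_test.html`, `/*_final_v*.html`, `/index_debug.html`, `/light-theme-test.html`, `/404.html`, `/package.json`, `/package-lock.json`, `/webhook.php` |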
Blocked and Allowed Paths
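| Directive | Value |
|---|---|
| Blocked paths | `/` (named crawlers only), `/*.py$`, `/*.json$`, `/*.log$`, `/*.bak$`, `/*.txt$`, `/__pycache__/`, `/node_modules/`, `/scripts/`, `/tmp/`, `/*_debug.html`, `/*_test.html`, `/*_final_v*.html`, `/index_debug.html`, `/light-theme-test.html`, `/404.html`, `/package.json`, `/package-lock.json`, `/webhook.php` |
| Allowed paths | `/`, `/robots.txt`, `/llms.txt`, `/assets/` |
| Crawl-delay | No Crawl-delay directive detected. |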
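Several of the paths in this robots.txt use the `*` wildcard and the `$` end anchor, and some Allow rules (such as `/robots.txt`) carve exceptions out of broader Disallow patterns (such as `/*.txt$`). The Python sketch below shows one common way to evaluate such rules: the longest matching pattern wins, and Allow wins a tie. The helper names and the shortened `RULES` list are assumptions for illustration; individual crawlers may resolve conflicts differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (kind, pattern) pairs, kind being 'allow' or
    'disallow'. The longest matching pattern wins; Allow wins a tie.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# A subset of the second "User-agent: *" group, chosen for illustration.
RULES = [
    ("allow", "/"),
    ("disallow", "/*.txt$"),
    ("allow", "/robots.txt"),
    ("allow", "/llms.txt"),
    ("disallow", "/tmp/"),
    ("allow", "/assets/"),
    ("disallow", "/*_debug.html"),
]

for path in ["/robots.txt", "/notes.txt", "/assets/app.js", "/index_debug.html", "/pricing"]:
    print(path, is_allowed(path, RULES))
```

With these assumptions, `/robots.txt` stays crawlable because its Allow pattern is longer than `/*.txt$`, while any other `.txt` file at the site is blocked.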
Sitemaps Detected
- https://richproxy.app/sitemap.xml
AI Crawler Policy
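Of the AI crawlers checked, GPTBot, ClaudeBot and Google-Extended are blocked site-wide by robots.txt (Disallow: /). ChatGPT-User and PerplexityBot are not named in the file, so they fall under the general * rules. Amazonbot, Applebot-Extended, Bytespider, CCBot and meta-externalagent are also blocked site-wide.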
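A check like this can be approximated with Python's standard `urllib.robotparser`. The sketch below is only an illustration: `ROBOTS_TXT` is a reduced excerpt of the file rather than the full content, `blocked_crawlers` is a hypothetical helper, and the standard-library parser does not interpret `*` or `$` wildcards in paths, so it is suited only to whole-site checks like this one.

```python
from urllib.robotparser import RobotFileParser

# Reduced excerpt of the robots.txt (an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""

# The crawler names the report mentions in its AI crawler policy.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt, crawlers, url="https://richproxy.app/"):
    """Return the crawlers that may not fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


blocked = blocked_crawlers(ROBOTS_TXT, AI_CRAWLERS)
print(blocked)
```

On this excerpt, the crawlers with their own Disallow: / group are reported as blocked, and the ones that are not named fall back to the `*` group and are allowed.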
Issues Found
- At least one user-agent group has Disallow: /, which blocks the entire site for that crawler.
Recommendations
- Avoid blocking the entire site (Disallow: /); restrict only sensitive or low-value paths instead.
- Review your AI crawler policy for GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot and Google-Extended to ensure it matches your content strategy.
- Ensure important pages, CSS and JavaScript assets are crawlable so search engines can fully render your site.