Robots.txt Generator
Take complete control over which bots crawl your site — and which ones don't. Generate a production-ready robots.txt file in seconds.
Includes templates for blocking AI crawlers (GPTBot, Google-Extended, CCBot), managing crawl budgets, and configuring sitemap directives. Copy, paste, deploy.
Configure Rules
Googlebot ignores Crawl-delay. This is mainly for Bing and other bots.
robots.txt Output
# robots.txt
# Generated by Webvello Robots.txt Generator
# https://www.webvello.com/tools/robots-generator

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
How to deploy
- Copy the output above
- Save as robots.txt in your website's root directory
- Verify it's accessible at https://yoursite.com/robots.txt
- Test in Google Search Console → URL Inspection
Why You Need a Robots.txt File
Your robots.txt is the gatekeeper of your entire website. Configure it wrong and search engines can't find your content. Configure it right and you unlock these six advantages.
Control Crawler Access
Decide exactly which bots can access which parts of your site. Block admin panels, staging areas, and duplicate content from being crawled.
Block AI Crawlers
Stop GPTBot, Google-Extended, CCBot, and other AI training crawlers from scraping your content — with ready-made directives built into the tool.
Manage Crawl Budget
Every site has a limited crawl budget. Direct search engines to your most important pages by blocking low-value content from being crawled.
Point to Your Sitemap
Include Sitemap directives so search engines discover your XML sitemap immediately — ensuring all your important pages get found and indexed.
Reduce Server Load
Set crawl-delay directives to throttle aggressive bots. Prevent unnecessary crawling that wastes your server resources and bandwidth.
Pre-Built Templates
Start with common configurations — standard SEO setup, AI blocker template, WordPress defaults — and customize from there. No syntax memorization needed.
Critical Warning: Disallow Is Not Noindex
Disallow only prevents crawling; it does not remove a page from Google's index. To deindex a page reliably, use a noindex meta tag and allow crawling so Google can see the directive. Blocking with Disallow while expecting deindexing is the single most common robots.txt mistake in SEO.
How to Use This Robots.txt Generator
Whether you're creating a robots.txt from scratch or updating an existing one, this tool walks you through it. Here's the fastest path from zero to deployed.
Start with a Template (or Blank)
Choose a pre-built template if one fits your use case — standard SEO configuration, AI crawler blocker, WordPress defaults, or a restrictive setup for staging sites. Or start blank and build custom rules from scratch.
Add User-Agent Rules
Specify which bots each rule applies to. Use "*" for all bots, or name specific crawlers like "Googlebot", "Bingbot", or "GPTBot". Each User-agent block can have its own set of Allow and Disallow directives.
Configure Allow and Disallow Directives
Set which paths each bot can and cannot access. Disallow "/" blocks the entire site. Disallow "/admin/" blocks your admin area. Use wildcards (* and $) for pattern matching. Order matters — more specific rules override general ones for the same bot.
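Put together, a rule set built from these steps might look like the fragment below. The paths and the Googlebot exception are illustrative assumptions, not output from the tool:

```
# All bots: allow everything except admin and internal search
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search

# Googlebot gets its own group, which replaces the * rules for it
User-agent: Googlebot
Disallow: /admin/
Allow: /admin/public-page/
```

Note that a named User-agent group replaces, rather than extends, the `*` group for that bot, so repeat any general rules you still want applied.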
Add Your Sitemap URL
Include your XML sitemap URL (e.g., "Sitemap: https://yourdomain.com/sitemap.xml"). This is the simplest way to ensure every crawler immediately discovers your full page listing. Add multiple sitemaps if you have them.
Copy, Deploy, and Test
Copy the generated robots.txt content. Upload it to your site's root directory so it's accessible at yourdomain.com/robots.txt. Then test it using Google Search Console's robots.txt Tester (or Bing Webmaster Tools) to verify your directives work as expected.
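Before uploading, you can also sanity-check a draft locally. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the draft rules and URLs are illustrative. One caveat: Python's parser applies rules in file order (first match wins), not Googlebot's longest-match rule, so keep it to simple prefix rules like these:

```python
# Sketch: sanity-check a draft robots.txt locally before uploading.
# Standard library only; the rules and URLs below are examples.
from urllib import robotparser

draft = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# URLs you expect to be blocked should come back False, allowed ones True.
for url in ("https://example.com/admin/settings",
            "https://example.com/api/v1/users",
            "https://example.com/blog/post"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")
```

This catches typos like a malformed directive before the file ever reaches production.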
Robots.txt by the Numbers
A tiny text file with outsized impact on how search engines interact with your site.
The Robots Exclusion Protocol was first proposed by Martijn Koster in 1994. Source: robotstxt.org.
Robots.txt Best Practices
A misconfigured robots.txt file can silently kill your search traffic. No error messages, no warnings — just pages that never get crawled, never get indexed, and never rank. Follow these best practices to ensure your robots.txt works for you, not against you.
Start Permissive, Then Restrict
The safest default is to allow everything and then block specific paths you don't want crawled. Start with User-agent: * and Allow: /, then add targeted Disallow rules for admin areas, staging pages, internal search results, and other low-value content. This approach prevents the common mistake of accidentally blocking important pages.
Always Include Your Sitemap
Adding a Sitemap directive is the single easiest SEO win you can get from robots.txt. It takes one line — Sitemap: https://yourdomain.com/sitemap.xml — and it ensures that every crawler, from Googlebot to the smallest niche search engine, knows exactly where to find your complete page listing. If you have multiple sitemaps (blog, products, pages), list all of them.
Block Low-Value URL Patterns
Internal search result pages, faceted navigation URLs, print versions, paginated archives, and URL parameter variations all waste crawl budget. Use wildcard patterns to block them efficiently. For example, Disallow: /search* blocks all internal search pages, and Disallow: /*?sort= blocks sort-parameter URLs. This focuses crawler attention on your canonical, high-value pages.
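As a sketch, a block for these low-value patterns might look like this (the exact paths and parameter names depend on your site):

```
User-agent: *
Disallow: /search*
Disallow: /*?sort=
Disallow: /*?sessionid=
Disallow: /print/
```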
Handle AI Crawlers Deliberately
The rise of AI crawlers has added a new dimension to robots.txt management. Bots like GPTBot (OpenAI), Google-Extended (Google AI training), CCBot (Common Crawl), anthropic-ai (Anthropic), and Applebot-Extended (Apple Intelligence) scrape web content for training large language models. Decide your policy: block all AI crawlers, allow specific ones, or allow everything. Whatever you choose, make it a deliberate decision rather than a passive default.
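If your policy is to block all AI training crawlers, the corresponding directives are straightforward; each bot gets its own group with a full-site Disallow:

```
# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```

To allow a specific bot instead, simply omit its group (or give it `Allow: /`).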
Don't Block CSS and JavaScript
This was common advice in the early 2000s but is actively harmful today. Google needs to render your pages to understand them fully. If you block CSS and JavaScript files in robots.txt, Googlebot can't render your page, which means it can't evaluate your content layout, user experience, or mobile friendliness. Always allow crawling of CSS and JS resources.
Use Crawl-delay Wisely
The Crawl-delay directive throttles how frequently a bot makes requests. It's useful for reducing server load from aggressive crawlers — but there's a catch. Googlebot ignores Crawl-delay entirely. To control Google's crawl rate, use the crawl rate settings in Google Search Console. Bingbot, Yandex, and most other crawlers do respect Crawl-delay. A value of 1-10 seconds is typical; anything higher risks slowing crawl discovery significantly.
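For example, to ask Bingbot to wait five seconds between requests (a value Googlebot will ignore, as noted above):

```
User-agent: Bingbot
Crawl-delay: 5
```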
Test Before and After Deploying
Before uploading a new robots.txt, test it using Google Search Console's robots.txt Tester. Enter URLs you expect to be blocked and URLs you expect to be allowed, and verify the tool shows the correct result for each. After deploying, re-test to confirm the live file matches what you intended. A typo in a single line can inadvertently block your entire site.
Remember: Robots.txt Is Not Security
This cannot be emphasized enough. The Robots Exclusion Protocol is voluntary. Well-behaved crawlers follow it; malicious bots, scrapers, and security scanners ignore it completely. Your robots.txt file is publicly accessible — anyone can read it and see which paths you're trying to hide. Never rely on robots.txt to protect sensitive information. Use proper authentication, server-side access controls, and firewall rules instead.
Keep It Simple and Maintainable
A robots.txt file with 200 lines of rules is a maintenance nightmare. Group your rules logically: one block for all-bot rules, one for AI crawlers, one for specific search engine exceptions. Add comments (lines starting with #) to explain why each rule exists. When you revisit the file in six months, those comments will save you from accidentally breaking something.
Need a Full Technical SEO Audit?
Robots.txt is one piece of technical SEO. Our team audits crawl accessibility, site architecture, page speed, mobile usability, structured data, and more — then builds an action plan to fix what's holding your rankings back.
Common Robots.txt Mistakes to Avoid
These five mistakes cause more SEO damage than almost any other technical issue — because they're completely silent. No error messages. No warnings. Just pages that never rank.
Accidentally Blocking Your Entire Site
It only takes two lines: "User-agent: *" and "Disallow: /". During development or staging, this is standard practice. But deploying it to production is catastrophic. Your entire site disappears from search results within days. Always double-check your robots.txt after site migrations, CMS updates, and staging-to-production deployments.
Blocking CSS and JavaScript Files
In 2024+, Googlebot needs to render your page to understand it. Blocking CSS and JS prevents rendering, which means Google can't evaluate your layout, mobile experience, or content structure. The old practice of "Disallow: /wp-content/" or "Disallow: /*.js$" actively hurts your SEO. Remove these blocks immediately.
Confusing Disallow with Noindex
Disallow prevents crawling. Noindex prevents indexing. They're fundamentally different. If a page has inbound links from external sites, Google may index the URL (without a snippet) even if it's Disallowed — because Google discovers the URL through links, not crawling. To deindex a page reliably, use a noindex meta tag and allow crawling.
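The noindex directive lives on the page itself, not in robots.txt. A minimal example, placed in the page's head (and the page must not be Disallowed, or Google will never see it):

```html
<!-- In the <head> of the page you want removed from the index -->
<meta name="robots" content="noindex">
```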
Forgetting Subdomain-Specific Robots.txt
Your robots.txt at example.com only applies to example.com. If you have blog.example.com, app.example.com, or docs.example.com, each subdomain needs its own robots.txt file. A missing file means no crawl restrictions — which may or may not be what you want. Audit every subdomain, not just your main domain.
Not Testing After Changes
A single typo — "Disalow" instead of "Disallow" — silently breaks the entire rule. Google ignores malformed lines without warning. After every edit, test your robots.txt using Google Search Console's robots.txt Tester. Enter critical URLs and verify they show the expected "Allowed" or "Blocked" status.
AI Crawler Reference Guide
These are the major AI crawlers you should know about. Decide your blocking policy for each one — and document it in your robots.txt.
GPTBot
OpenAI
Trains GPT models and powers ChatGPT web browsing features.
User-agent: GPTBot

ChatGPT-User
OpenAI
Real-time browsing agent when ChatGPT users ask it to visit URLs.
User-agent: ChatGPT-User

Google-Extended
Google
AI/Gemini training crawler. Separate from Googlebot (search indexing).
User-agent: Google-Extended

CCBot
Common Crawl
Non-profit web archive used by many AI companies as training data.
User-agent: CCBot

anthropic-ai
Anthropic
Collects data for training Claude AI models.
User-agent: anthropic-ai

Applebot-Extended
Apple
Apple Intelligence and Siri AI training data collection.
User-agent: Applebot-Extended

Understanding Robots.txt Syntax
Robots.txt syntax is deceptively simple — just four main directives — but the interactions between them trip up even experienced developers. Here's how it all works.
Every robots.txt file is composed of one or more rule groups. Each group starts with a User-agent line that specifies which crawler the rules apply to, followed by one or more Allow or Disallow directives. The wildcard * in a User-agent line means "all crawlers."
Precedence matters. When a URL matches both an Allow and a Disallow rule, most crawlers (including Googlebot) use the most specific match. A longer path wins over a shorter one. If specificity is tied, Allow wins over Disallow. This lets you write broad Disallow rules and then carve out exceptions with Allow.
For example, you might block all of /admin/ but allow /admin/public-page/. The more specific Allow for /admin/public-page/ overrides the broader Disallow for /admin/.
Wildcards extend your pattern-matching power. The asterisk (*) matches any sequence of characters. The dollar sign ($) anchors to the end of the URL. So Disallow: /*.pdf$ blocks URLs that end in .pdf, while Disallow: /search* blocks any URL path starting with /search.
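The longest-match and wildcard behavior can be modeled in a few lines of Python. This is an illustrative sketch of Google-style matching, not a full parser; the rule list and paths are made up:

```python
# Sketch: Google-style robots.txt path matching. '*' matches any
# character sequence, '$' anchors the end of the URL, the longest
# matching pattern wins, and Allow beats Disallow on a length tie.
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape regex metacharacters, then restore robots.txt wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[: -len(r"\$")] + "$"
    return re.compile(regex)

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: list of ('allow'|'disallow', pattern). True if crawlable."""
    best_len, best_verdict = -1, True  # no match at all -> allowed by default
    for verdict, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longer pattern wins; on a tie, Allow wins over Disallow.
            if length > best_len or (length == best_len and verdict == "allow"):
                best_len, best_verdict = length, (verdict == "allow")
    return best_verdict

rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public-page/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed("/admin/settings", rules))      # only /admin/ matches -> blocked
print(is_allowed("/admin/public-page/", rules))  # longer Allow overrides -> allowed
print(is_allowed("/files/report.pdf", rules))    # matches /*.pdf$ -> blocked
```

This mirrors the /admin/ vs. /admin/public-page/ example above: the longer Allow pattern carves an exception out of the broader Disallow.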
Comments are lines starting with # and are ignored by crawlers. Use them generously — a well-commented robots.txt is a maintainable one.
Quick Syntax Reference
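A condensed, annotated example covering each directive discussed above (paths and domain are illustrative):

```
User-agent: Googlebot     # which crawler the group applies to (* = all bots)
Disallow: /private/       # block this path prefix
Allow: /private/help/     # carve out an exception (longer match wins)
Crawl-delay: 5            # seconds between requests (ignored by Googlebot)
Sitemap: https://example.com/sitemap.xml   # absolute URL, applies site-wide

# Wildcards: * matches any character sequence, $ anchors the end of the URL
# Comments: lines starting with # are ignored by crawlers
```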
Frequently Asked Questions
Everything you need to know about robots.txt files, crawler control, and managing bot access to your website.
Need Help With Technical SEO?
Robots.txt is just one piece of the technical SEO puzzle. Our team can audit your crawl accessibility, site architecture, page speed, and indexing health — then build an action plan that drives measurable results.