
Robots.txt Generator

Take complete control over which bots crawl your site — and which ones don't. Generate a production-ready robots.txt file in seconds.

Includes templates for blocking AI crawlers (GPTBot, Google-Extended, CCBot), managing crawl budgets, and configuring sitemap directives. Copy, paste, deploy.

Configure Rules

Note: Googlebot ignores Crawl-delay; the directive applies mainly to Bingbot and other crawlers.

robots.txt Output

# robots.txt
# Generated by Webvello Robots.txt Generator
# https://www.webvello.com/tools/robots-generator

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

How to deploy

  1. Copy the output above
  2. Save as robots.txt in your website's root directory
  3. Verify it's accessible at https://yoursite.com/robots.txt
  4. Test in Google Search Console → URL Inspection

Why You Need a Robots.txt File

Your robots.txt is the gatekeeper of your entire website. Configure it wrong and search engines can't find your content. Configure it right and you unlock these six advantages.

Control Crawler Access

Decide exactly which bots can access which parts of your site. Block admin panels, staging areas, and duplicate content from being crawled.

Block AI Crawlers

Stop GPTBot, Google-Extended, CCBot, and other AI training crawlers from scraping your content — with ready-made directives built into the tool.

Manage Crawl Budget

Every site has a limited crawl budget. Direct search engines to your most important pages by blocking low-value content from being crawled.

Point to Your Sitemap

Include Sitemap directives so search engines discover your XML sitemap immediately — ensuring all your important pages get found and indexed.

Reduce Server Load

Set crawl-delay directives to throttle aggressive bots. Prevent unnecessary crawling that wastes your server resources and bandwidth.

Pre-Built Templates

Start with common configurations — standard SEO setup, AI blocker template, WordPress defaults — and customize from there. No syntax memorization needed.

Critical Warning: Disallow Is Not Noindex

A common — and dangerous — misconception: Disallow does not remove pages from search results. If external links point to a Disallowed page, Google may still show the URL in results (with no snippet). To reliably deindex a page, use a noindex meta tag and allow crawling so Google can see the directive. Blocking with Disallow while expecting deindexing is the single most common robots.txt mistake in SEO.
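To deindex reliably, the page itself must carry the directive while remaining crawlable. A minimal sketch of the tag that belongs in the page's head (and note that robots.txt must not Disallow this URL, or Googlebot never sees it):

```html
<!-- In the page's <head>: tells compliant crawlers not to index this URL -->
<meta name="robots" content="noindex">
```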

How to Use This Robots.txt Generator

Whether you're creating a robots.txt from scratch or updating an existing one, this tool walks you through it. Here's the fastest path from zero to deployed.

1

Start with a Template (or Blank)

Choose a pre-built template if one fits your use case — standard SEO configuration, AI crawler blocker, WordPress defaults, or a restrictive setup for staging sites. Or start blank and build custom rules from scratch.

2

Add User-Agent Rules

Specify which bots each rule applies to. Use "*" for all bots, or name specific crawlers like "Googlebot", "Bingbot", or "GPTBot". Each User-agent block can have its own set of Allow and Disallow directives.

3

Configure Allow and Disallow Directives

Set which paths each bot can and cannot access. Disallow "/" blocks the entire site. Disallow "/admin/" blocks only your admin area. Use wildcards (* and $) for pattern matching. Note that rule order doesn't decide conflicts for Google and Bing: the most specific (longest) matching rule wins, so a targeted Allow can carve an exception out of a broader Disallow.

4

Add Your Sitemap URL

Include your XML sitemap URL (e.g., "Sitemap: https://yourdomain.com/sitemap.xml"). This is the simplest way to ensure every crawler immediately discovers your full page listing. Add multiple sitemaps if you have them.
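Sitemap lines apply site-wide rather than to a specific User-agent group, and you can stack as many as you need. For example (URLs are placeholders):

```text
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/blog-sitemap.xml
Sitemap: https://yourdomain.com/product-sitemap.xml
```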

5

Copy, Deploy, and Test

Copy the generated robots.txt content. Upload it to your site's root directory so it's accessible at yourdomain.com/robots.txt. Then verify it with the robots.txt report in Google Search Console (successor to the retired standalone robots.txt Tester), or with Bing Webmaster Tools, to confirm your directives work as expected.

Robots.txt by the Numbers

A tiny text file with outsized impact on how search engines interact with your site.

1994 — Year introduced: the Robots Exclusion Protocol
6+ — AI crawlers you can block today
1st — The first file checked by every well-behaved bot
No limit — Rules allowed: any number of directives

The Robots Exclusion Protocol was first proposed by Martijn Koster in 1994. Source: robotstxt.org.

Robots.txt Best Practices

A misconfigured robots.txt file can silently kill your search traffic. No error messages, no warnings — just pages that never get crawled, never get indexed, and never rank. Follow these best practices to ensure your robots.txt works for you, not against you.

Start Permissive, Then Restrict

The safest default is to allow everything and then block specific paths you don't want crawled. Start with User-agent: * and Allow: /, then add targeted Disallow rules for admin areas, staging pages, internal search results, and other low-value content. This approach prevents the common mistake of accidentally blocking important pages.

Always Include Your Sitemap

Adding a Sitemap directive is the single easiest SEO win you can get from robots.txt. It takes one line — Sitemap: https://yourdomain.com/sitemap.xml — and it ensures that every crawler, from Googlebot to the smallest niche search engine, knows exactly where to find your complete page listing. If you have multiple sitemaps (blog, products, pages), list all of them.

Block Low-Value URL Patterns

Internal search result pages, faceted navigation URLs, print versions, paginated archives, and URL parameter variations all waste crawl budget. Use wildcard patterns to block them efficiently. For example, Disallow: /search* blocks all internal search pages, and Disallow: /*?sort= blocks sort-parameter URLs. This focuses crawler attention on your canonical, high-value pages.
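Pattern blocks along these lines might look like the following (paths and parameter names are illustrative):

```text
User-agent: *
Disallow: /search*        # internal search result pages
Disallow: /*?sort=        # sort-parameter URL variations
Disallow: /print/         # print versions of pages
```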

Handle AI Crawlers Deliberately

The rise of AI crawlers has added a new dimension to robots.txt management. Bots like GPTBot (OpenAI), Google-Extended (Google AI training), CCBot (Common Crawl), anthropic-ai (Anthropic), and Applebot-Extended (Apple Intelligence) scrape web content for training large language models. Decide your policy: block all AI crawlers, allow specific ones, or allow everything. Whatever you choose, make it a deliberate decision rather than a passive default.

Don't Block CSS and JavaScript

This was common advice in the early 2000s but is actively harmful today. Google needs to render your pages to understand them fully. If you block CSS and JavaScript files in robots.txt, Googlebot can't render your page, which means it can't evaluate your content layout, user experience, or mobile friendliness. Always allow crawling of CSS and JS resources.

Use Crawl-delay Wisely

The Crawl-delay directive throttles how frequently a bot makes requests. It's useful for reducing server load from aggressive crawlers, but there's a catch: Googlebot ignores Crawl-delay entirely and manages its own crawl rate automatically (Search Console's old crawl-rate limiter has been retired). Bingbot, Yandex, and most other crawlers do respect Crawl-delay. A value of 1-10 seconds is typical; anything higher risks slowing crawl discovery significantly.
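A throttling group aimed at a bot that honors the directive might look like:

```text
User-agent: Bingbot
Crawl-delay: 5   # wait 5 seconds between consecutive requests
```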

Test Before and After Deploying

Before uploading a new robots.txt, test it: in Google Search Console, use the robots.txt report and the URL Inspection tool to check that URLs you expect to be blocked and URLs you expect to be allowed each show the correct result. After deploying, re-test to confirm the live file matches what you intended. A typo in a single line can inadvertently block your entire site.
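As a rough local check, Python's standard-library urllib.robotparser can evaluate a draft file before you deploy. Caveat: it follows the original 1994 spec (first matching rule in file order wins, no path wildcards), so treat it as a sanity check rather than a Google-accurate simulator. The rules and URLs below are hypothetical:

```python
from urllib import robotparser

# Draft rules we intend to deploy (no wildcards: the stdlib
# parser does not implement Google-style pattern matching).
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Check URLs we expect to be blocked vs. allowed.
print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True
```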

Remember: Robots.txt Is Not Security

This cannot be emphasized enough. The Robots Exclusion Protocol is voluntary. Well-behaved crawlers follow it; malicious bots, scrapers, and security scanners ignore it completely. Your robots.txt file is publicly accessible — anyone can read it and see which paths you're trying to hide. Never rely on robots.txt to protect sensitive information. Use proper authentication, server-side access controls, and firewall rules instead.

Keep It Simple and Maintainable

A robots.txt file with 200 lines of rules is a maintenance nightmare. Group your rules logically: one block for all-bot rules, one for AI crawlers, one for specific search engine exceptions. Add comments (lines starting with #) to explain why each rule exists. When you revisit the file in six months, those comments will save you from accidentally breaking something.
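A compact, commented layout along these lines stays maintainable (paths and the sitemap URL are placeholders):

```text
# --- All bots: block non-public areas ---
User-agent: *
Disallow: /admin/
Disallow: /staging/

# --- AI crawlers: block model-training bots ---
User-agent: GPTBot
Disallow: /

# --- Sitemap ---
Sitemap: https://yourdomain.com/sitemap.xml
```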

Need a Full Technical SEO Audit?

Robots.txt is one piece of technical SEO. Our team audits crawl accessibility, site architecture, page speed, mobile usability, structured data, and more — then builds an action plan to fix what's holding your rankings back.

Common Robots.txt Mistakes to Avoid

These five mistakes cause more SEO damage than almost any other technical issue — because they're completely silent. No error messages. No warnings. Just pages that never rank.

Accidentally Blocking Your Entire Site

It only takes two lines: "User-agent: *" and "Disallow: /". During development or staging, this is standard practice. But deploying it to production is catastrophic. Your entire site disappears from search results within days. Always double-check your robots.txt after site migrations, CMS updates, and staging-to-production deployments.

Blocking CSS and JavaScript Files

Modern Googlebot needs to render your page to understand it. Blocking CSS and JS prevents rendering, which means Google can't evaluate your layout, mobile experience, or content structure. The old practice of "Disallow: /wp-content/" or "Disallow: /*.js$" actively hurts your SEO. Remove these blocks immediately.

Confusing Disallow with Noindex

Disallow prevents crawling. Noindex prevents indexing. They're fundamentally different. If a page has inbound links from external sites, Google may index the URL (without a snippet) even if it's Disallowed — because Google discovers the URL through links, not crawling. To deindex a page reliably, use a noindex meta tag and allow crawling.

Forgetting Subdomain-Specific Robots.txt

Your robots.txt at example.com only applies to example.com. If you have blog.example.com, app.example.com, or docs.example.com, each subdomain needs its own robots.txt file. A missing file means no crawl restrictions — which may or may not be what you want. Audit every subdomain, not just your main domain.

Not Testing After Changes

A single typo ("Disalow" instead of "Disallow") silently breaks the entire rule. Google ignores malformed lines without warning. After every edit, test your robots.txt with Google Search Console's robots.txt report and URL Inspection tool. Check critical URLs and verify they show the expected "Allowed" or "Blocked" status.

AI Crawler Reference Guide

These are the major AI crawlers you should know about. Decide your blocking policy for each one — and document it in your robots.txt.

GPTBot

OpenAI

Trains GPT models and powers ChatGPT web browsing features.

User-agent: GPTBot

ChatGPT-User

OpenAI

Real-time browsing agent when ChatGPT users ask it to visit URLs.

User-agent: ChatGPT-User

Google-Extended

Google

AI/Gemini training crawler. Separate from Googlebot (search indexing).

User-agent: Google-Extended

CCBot

Common Crawl

Non-profit web archive used by many AI companies as training data.

User-agent: CCBot

anthropic-ai

Anthropic

Collects data for training Claude AI models.

User-agent: anthropic-ai

Applebot-Extended

Apple

Apple Intelligence and Siri AI training data collection.

User-agent: Applebot-Extended
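Blocking all six at once is a matter of stacking one group per bot. A combined template:

```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```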

Understanding Robots.txt Syntax

Robots.txt syntax is deceptively simple — just four main directives — but the interactions between them trip up even experienced developers. Here's how it all works.

Every robots.txt file is composed of one or more rule groups. Each group starts with a User-agent line that specifies which crawler the rules apply to, followed by one or more Allow or Disallow directives. The wildcard * in a User-agent line means "all crawlers."

Precedence matters. When a URL matches both an Allow and a Disallow rule, most crawlers (including Googlebot) use the most specific match. A longer path wins over a shorter one. If specificity is tied, Allow wins over Disallow. This lets you write broad Disallow rules and then carve out exceptions with Allow.

For example, you might block all of /admin/ but allow /admin/public-page/. The more specific Allow for /admin/public-page/ overrides the broader Disallow for /admin/.
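That exception pattern looks like this in practice (paths are hypothetical):

```text
User-agent: *
Disallow: /admin/
Allow: /admin/public-page/
```

Because "/admin/public-page/" is longer, and therefore more specific, than "/admin/", crawlers that use longest-match precedence (including Googlebot) will crawl it while the rest of /admin/ stays blocked.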

Wildcards extend your pattern-matching power. The asterisk (*) matches any sequence of characters. The dollar sign ($) anchors to the end of the URL. So Disallow: /*.pdf$ blocks URLs that end in .pdf, while Disallow: /search* blocks any URL path starting with /search.

Comments are lines starting with # and are ignored by crawlers. Use them generously — a well-commented robots.txt is a maintainable one.

Quick Syntax Reference

User-agent: * — applies to all bots.
Disallow: /path/ — blocks the path.
Allow: /path/exception/ — allows within a blocked parent.
Sitemap: URL — points to your sitemap.
Crawl-delay: N — seconds between requests (ignored by Googlebot).
# — comment line.

Frequently Asked Questions

Everything you need to know about robots.txt files, crawler control, and managing bot access to your website.

What is a robots.txt file?

A robots.txt file is a plain text file that lives at the root of your website (e.g., https://example.com/robots.txt). It follows the Robots Exclusion Protocol — an industry standard since 1994 — and gives instructions to web crawlers about which pages they're allowed to access. When a bot arrives at your site, robots.txt is the very first file it reads. Think of it as the bouncer at the door of your website.

How do I block AI crawlers like GPTBot?

Add User-agent directives for each AI crawler you want to block. For example: "User-agent: GPTBot" followed by "Disallow: /" blocks OpenAI's crawler entirely. Do the same for "Google-Extended" (Google's AI training bot), "CCBot" (Common Crawl), "anthropic-ai" (Anthropic/Claude), and "Applebot-Extended" (Apple Intelligence). This tool includes a pre-built AI blocker template that handles all of them at once.

Does robots.txt affect SEO?

Yes — but not in the way you might think. Robots.txt doesn't boost rankings, but it can absolutely tank them. If you accidentally block your important pages with a Disallow directive, search engines can't crawl them, which means they can't index them. No index = no rankings. On the positive side, a well-configured robots.txt helps search engines focus their crawl budget on your highest-value pages.

What is crawl budget?

Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe. For large sites (10,000+ pages), this is a real constraint. If Googlebot spends its budget crawling admin pages, print stylesheets, and duplicate filter URLs, your important content pages may not get crawled frequently enough. Robots.txt lets you block low-value URLs so crawlers spend their budget where it matters.

What does Crawl-delay do?

Crawl-delay tells bots to wait a specified number of seconds between consecutive requests. If you set "Crawl-delay: 10", compliant bots will wait 10 seconds between each page fetch. Important caveat: Googlebot ignores this directive entirely and manages its own crawl rate automatically (Search Console's old crawl-rate limiter has been retired). Bingbot, Yandex, and most other crawlers do respect it.

Should I add my sitemap to robots.txt?

Absolutely. Adding a "Sitemap: https://example.com/sitemap.xml" line is one of the simplest and most effective things you can do. It ensures that every well-behaved crawler immediately knows where to find your complete page listing. You can list multiple sitemaps too — for example, separate sitemaps for pages, blog posts, and images. It takes five seconds and eliminates a common crawling blind spot.

What's the difference between Disallow and noindex?

Disallow (in robots.txt) prevents a crawler from even visiting a URL. Noindex (a meta tag on the page itself) tells a crawler that visited the page not to add it to the search index. Here's the critical gotcha: if you Disallow a page, the crawler never sees the noindex tag. So if a page has external links pointing to it, it might still appear in search results (with no snippet) even with Disallow. For reliable deindexing, use noindex and allow crawling.

Can I use wildcards in robots.txt?

Yes. Googlebot, Bingbot, and most modern crawlers support two wildcard characters. The asterisk (*) matches any sequence of characters — "Disallow: /search*" blocks all URLs starting with /search. The dollar sign ($) matches the end of a URL — "Disallow: /*.pdf$" blocks all PDF files regardless of directory. These patterns let you write concise rules instead of listing hundreds of individual URLs.

Will robots.txt stop malicious bots?

No — and this is critical to understand. Robots.txt is a voluntary protocol. Well-behaved bots (Googlebot, Bingbot) follow it. Malicious bots, scrapers, and security scanners ignore it completely. Never use robots.txt to hide sensitive information like login pages, admin panels with weak authentication, or private data. Use proper authentication, IP restrictions, and server-side access controls for real security.

Is this robots.txt generator free?

Yes — 100% free, no sign-up, no usage limits. Generate robots.txt files for as many sites as you need. The tool includes pre-built templates, AI crawler blocking directives, and real-time preview. Everything runs in your browser, so your configuration data stays private.

Need Help With Technical SEO?

Robots.txt is just one piece of the technical SEO puzzle. Our team can audit your crawl accessibility, site architecture, page speed, and indexing health — then build an action plan that drives measurable results.

Get Free Growth Plan