Robots.txt Generator
Take complete control over which bots crawl your site — and which ones don't. Generate a production-ready robots.txt file in seconds.
Includes templates for blocking AI crawlers (GPTBot, Google-Extended, CCBot), managing crawl budgets, and configuring sitemap directives. Copy, paste, deploy.
Configure Rules
Googlebot ignores Crawl-delay. This is mainly for Bing and other bots.
robots.txt Output
# robots.txt
# Generated by Webvello Robots.txt Generator
# https://www.webvello.com/tools/robots-generator

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
How to deploy
- Copy the output above
- Save as robots.txt in your website's root directory
- Verify it's accessible at https://yoursite.com/robots.txt
- Test in Google Search Console → URL Inspection
Why You Need a Robots.txt File
Your robots.txt is the gatekeeper of your entire website. Configure it wrong and search engines can't find your content. Configure it right and you unlock these six advantages.
Control Crawler Access
Decide exactly which bots can access which parts of your site. Block admin panels, staging areas, and duplicate content from being crawled.
Block AI Crawlers
Stop GPTBot, Google-Extended, CCBot, and other AI training crawlers from scraping your content — with ready-made directives built into the tool.
Manage Crawl Budget
Every site has a limited crawl budget. Direct search engines to your most important pages by blocking low-value content from being crawled.
Point to Your Sitemap
Include Sitemap directives so search engines discover your XML sitemap immediately — ensuring all your important pages get found and indexed.
Reduce Server Load
Set crawl-delay directives to throttle aggressive bots. Prevent unnecessary crawling that wastes your server resources and bandwidth.
Pre-Built Templates
Start with common configurations — standard SEO setup, AI blocker template, WordPress defaults — and customize from there. No syntax memorization needed.
Critical Warning: Disallow Is Not Noindex
Disallow only prevents crawling; it does not remove a page from Google's index. To deindex a page reliably, use a noindex meta tag and allow crawling so Google can see the directive. Blocking with Disallow while expecting deindexing is the single most common robots.txt mistake in SEO.
How to Use This Robots.txt Generator
Whether you're creating a robots.txt from scratch or updating an existing one, this tool walks you through it. Here's the fastest path from zero to deployed.
Start with a Template (or Blank)
Choose a pre-built template if one fits your use case — standard SEO configuration, AI crawler blocker, WordPress defaults, or a restrictive setup for staging sites. Or start blank and build custom rules from scratch.
Add User-Agent Rules
Specify which bots each rule applies to. Use "*" for all bots, or name specific crawlers like "Googlebot", "Bingbot", or "GPTBot". Each User-agent block can have its own set of Allow and Disallow directives.
Configure Allow and Disallow Directives
Set which paths each bot can and cannot access. Disallow "/" blocks the entire site. Disallow "/admin/" blocks your admin area. Use wildcards (* and $) for pattern matching. Order matters — more specific rules override general ones for the same bot.
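Put together, a rule set built from these steps might look like the fragment below. The paths and the Googlebot exception are illustrative assumptions, not output from the tool:

```
# All bots: allow everything except admin and internal search
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search

# Googlebot gets its own group, which replaces the * rules for it
User-agent: Googlebot
Disallow: /admin/
Allow: /admin/public-page/
```

Note that a named User-agent group replaces, rather than extends, the `*` group for that bot, so repeat any general rules you still want applied.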
Add Your Sitemap URL
Include your XML sitemap URL (e.g., "Sitemap: https://yourdomain.com/sitemap.xml"). This is the simplest way to ensure every crawler immediately discovers your full page listing. Add multiple sitemaps if you have them.
Copy, Deploy, and Test
Copy the generated robots.txt content. Upload it to your site's root directory so it's accessible at yourdomain.com/robots.txt. Then test it using Google Search Console's robots.txt Tester (or Bing Webmaster Tools) to verify your directives work as expected.
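Before uploading, you can also sanity-check a draft locally. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the draft rules and URLs are illustrative. One caveat: Python's parser applies rules in file order (first match wins), not Googlebot's longest-match rule, so keep it to simple prefix rules like these:

```python
# Sketch: sanity-check a draft robots.txt locally before uploading.
# Standard library only; the rules and URLs below are examples.
from urllib import robotparser

draft = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# URLs you expect to be blocked should come back False, allowed ones True.
for url in ("https://example.com/admin/settings",
            "https://example.com/api/v1/users",
            "https://example.com/blog/post"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")
```

This catches typos like a malformed directive before the file ever reaches production.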
Robots.txt by the Numbers
A tiny text file with outsized impact on how search engines interact with your site.
The Robots Exclusion Protocol was first proposed by Martijn Koster in 1994. Source: robotstxt.org.
Robots.txt Best Practices
A misconfigured robots.txt file can silently kill your search traffic. No error messages, no warnings — just pages that never get crawled, never get indexed, and never rank. Follow these best practices to ensure your robots.txt works for you, not against you.
Start Permissive, Then Restrict
The safest default is to allow everything and then block specific paths you don't want crawled. Start with User-agent: * and Allow: /, then add targeted Disallow rules for admin areas, staging pages, internal search results, and other low-value content. This approach prevents the common mistake of accidentally blocking important pages.
Always Include Your Sitemap
Adding a Sitemap directive is the single easiest SEO win you can get from robots.txt. It takes one line — Sitemap: https://yourdomain.com/sitemap.xml — and it ensures that every crawler, from Googlebot to the smallest niche search engine, knows exactly where to find your complete page listing. If you have multiple sitemaps (blog, products, pages), list all of them.
Block Low-Value URL Patterns
Internal search result pages, faceted navigation URLs, print versions, paginated archives, and URL parameter variations all waste crawl budget. Use wildcard patterns to block them efficiently. For example, Disallow: /search* blocks all internal search pages, and Disallow: /*?sort= blocks sort-parameter URLs. This focuses crawler attention on your canonical, high-value pages.
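As a sketch, a block for these low-value patterns might look like this (the exact paths and parameter names depend on your site):

```
User-agent: *
Disallow: /search*
Disallow: /*?sort=
Disallow: /*?sessionid=
Disallow: /print/
```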
Handle AI Crawlers Deliberately
The rise of AI crawlers has added a new dimension to robots.txt management. Bots like GPTBot (OpenAI), Google-Extended (Google AI training), CCBot (Common Crawl), anthropic-ai (Anthropic), and Applebot-Extended (Apple Intelligence) scrape web content for training large language models. Decide your policy: block all AI crawlers, allow specific ones, or allow everything. Whatever you choose, make it a deliberate decision rather than a passive default.
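If your policy is to block all AI training crawlers, the corresponding directives are straightforward; each bot gets its own group with a full-site Disallow:

```
# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```

To allow a specific bot instead, simply omit its group (or give it `Allow: /`).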
Don't Block CSS and JavaScript
This was common advice in the early 2000s but is actively harmful today. Google needs to render your pages to understand them fully. If you block CSS and JavaScript files in robots.txt, Googlebot can't render your page, which means it can't evaluate your content layout, user experience, or mobile friendliness. Always allow crawling of CSS and JS resources.
Use Crawl-delay Wisely
The Crawl-delay directive throttles how frequently a bot makes requests. It's useful for reducing server load from aggressive crawlers — but there's a catch. Googlebot ignores Crawl-delay entirely. To control Google's crawl rate, use the crawl rate settings in Google Search Console. Bingbot, Yandex, and most other crawlers do respect Crawl-delay. A value of 1-10 seconds is typical; anything higher risks slowing crawl discovery significantly.
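For example, to ask Bingbot to wait five seconds between requests (a value Googlebot will ignore, as noted above):

```
User-agent: Bingbot
Crawl-delay: 5
```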
Test Before and After Deploying
Before uploading a new robots.txt, test it using Google Search Console's robots.txt Tester. Enter URLs you expect to be blocked and URLs you expect to be allowed, and verify the tool shows the correct result for each. After deploying, re-test to confirm the live file matches what you intended. A typo in a single line can inadvertently block your entire site.
Remember: Robots.txt Is Not Security
This cannot be emphasized enough. The Robots Exclusion Protocol is voluntary. Well-behaved crawlers follow it; malicious bots, scrapers, and security scanners ignore it completely. Your robots.txt file is publicly accessible — anyone can read it and see which paths you're trying to hide. Never rely on robots.txt to protect sensitive information. Use proper authentication, server-side access controls, and firewall rules instead.
Keep It Simple and Maintainable
A robots.txt file with 200 lines of rules is a maintenance nightmare. Group your rules logically: one block for all-bot rules, one for AI crawlers, one for specific search engine exceptions. Add comments (lines starting with #) to explain why each rule exists. When you revisit the file in six months, those comments will save you from accidentally breaking something.
Need a Full Technical SEO Audit?
Robots.txt is one piece of technical SEO. Our team audits crawl accessibility, site architecture, page speed, mobile usability, structured data, and more — then builds an action plan to fix what's holding your rankings back.
Common Robots.txt Mistakes to Avoid
These five mistakes cause more SEO damage than almost any other technical issue — because they're completely silent. No error messages. No warnings. Just pages that never rank.
Accidentally Blocking Your Entire Site
It only takes two lines: "User-agent: *" and "Disallow: /". During development or staging, this is standard practice. But deploying it to production is catastrophic. Your entire site disappears from search results within days. Always double-check your robots.txt after site migrations, CMS updates, and staging-to-production deployments.
Blocking CSS and JavaScript Files
In 2024+, Googlebot needs to render your page to understand it. Blocking CSS and JS prevents rendering, which means Google can't evaluate your layout, mobile experience, or content structure. The old practice of "Disallow: /wp-content/" or "Disallow: /*.js$" actively hurts your SEO. Remove these blocks immediately.
Confusing Disallow with Noindex
Disallow prevents crawling. Noindex prevents indexing. They're fundamentally different. If a page has inbound links from external sites, Google may index the URL (without a snippet) even if it's Disallowed — because Google discovers the URL through links, not crawling. To deindex a page reliably, use a noindex meta tag and allow crawling.
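The noindex directive lives on the page itself, not in robots.txt. A minimal example, placed in the page's head (and the page must not be Disallowed, or Google will never see it):

```html
<!-- In the <head> of the page you want removed from the index -->
<meta name="robots" content="noindex">
```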
Forgetting Subdomain-Specific Robots.txt
Your robots.txt at example.com only applies to example.com. If you have blog.example.com, app.example.com, or docs.example.com, each subdomain needs its own robots.txt file. A missing file means no crawl restrictions — which may or may not be what you want. Audit every subdomain, not just your main domain.
Not Testing After Changes
A single typo — "Disalow" instead of "Disallow" — silently breaks the entire rule. Google ignores malformed lines without warning. After every edit, test your robots.txt using Google Search Console's robots.txt Tester. Enter critical URLs and verify they show the expected "Allowed" or "Blocked" status.
AI Crawler Reference Guide
These are the major AI crawlers you should know about. Decide your blocking policy for each one — and document it in your robots.txt.
GPTBot
OpenAI
Trains GPT models and powers ChatGPT web browsing features.
User-agent: GPTBot

ChatGPT-User
OpenAI
Real-time browsing agent when ChatGPT users ask it to visit URLs.
User-agent: ChatGPT-User

Google-Extended
Google
AI/Gemini training crawler. Separate from Googlebot (search indexing).
User-agent: Google-Extended

CCBot
Common Crawl
Non-profit web archive used by many AI companies as training data.
User-agent: CCBot

anthropic-ai
Anthropic
Collects data for training Claude AI models.
User-agent: anthropic-ai

Applebot-Extended
Apple
Apple Intelligence and Siri AI training data collection.
User-agent: Applebot-Extended

Understanding Robots.txt Syntax
Robots.txt syntax is deceptively simple — just four main directives — but the interactions between them trip up even experienced developers. Here's how it all works.
Every robots.txt file is composed of one or more rule groups. Each group starts with a User-agent line that specifies which crawler the rules apply to, followed by one or more Allow or Disallow directives. The wildcard * in a User-agent line means "all crawlers."
Precedence matters. When a URL matches both an Allow and a Disallow rule, most crawlers (including Googlebot) use the most specific match. A longer path wins over a shorter one. If specificity is tied, Allow wins over Disallow. This lets you write broad Disallow rules and then carve out exceptions with Allow.
For example, you might block all of /admin/ but allow /admin/public-page/. The more specific Allow for /admin/public-page/ overrides the broader Disallow for /admin/.
Wildcards extend your pattern-matching power. The asterisk (*) matches any sequence of characters. The dollar sign ($) anchors to the end of the URL. So Disallow: /*.pdf$ blocks URLs that end in .pdf, while Disallow: /search* blocks any URL path starting with /search.
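The longest-match and wildcard behavior can be modeled in a few lines of Python. This is an illustrative sketch of Google-style matching, not a full parser; the rule list and paths are made up:

```python
# Sketch: Google-style robots.txt path matching. '*' matches any
# character sequence, '$' anchors the end of the URL, the longest
# matching pattern wins, and Allow beats Disallow on a length tie.
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape regex metacharacters, then restore robots.txt wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[: -len(r"\$")] + "$"
    return re.compile(regex)

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: list of ('allow'|'disallow', pattern). True if crawlable."""
    best_len, best_verdict = -1, True  # no match at all -> allowed by default
    for verdict, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longer pattern wins; on a tie, Allow wins over Disallow.
            if length > best_len or (length == best_len and verdict == "allow"):
                best_len, best_verdict = length, (verdict == "allow")
    return best_verdict

rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public-page/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed("/admin/settings", rules))      # only /admin/ matches -> blocked
print(is_allowed("/admin/public-page/", rules))  # longer Allow overrides -> allowed
print(is_allowed("/files/report.pdf", rules))    # matches /*.pdf$ -> blocked
```

This mirrors the /admin/ vs. /admin/public-page/ example above: the longer Allow pattern carves an exception out of the broader Disallow.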
Comments are lines starting with # and are ignored by crawlers. Use them generously — a well-commented robots.txt is a maintainable one.
Quick Syntax Reference
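A condensed, annotated example covering each directive discussed above (paths and domain are illustrative):

```
User-agent: Googlebot     # which crawler the group applies to (* = all bots)
Disallow: /private/       # block this path prefix
Allow: /private/help/     # carve out an exception (longer match wins)
Crawl-delay: 5            # seconds between requests (ignored by Googlebot)
Sitemap: https://example.com/sitemap.xml   # absolute URL, applies site-wide

# Wildcards: * matches any character sequence, $ anchors the end of the URL
# Comments: lines starting with # are ignored by crawlers
```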
Frequently Asked Questions
Everything you need to know about robots.txt files, crawler control, and managing bot access to your website.
Need Help With Technical SEO?
Robots.txt is just one piece of the technical SEO puzzle. Our team can audit your crawl accessibility, site architecture, page speed, and indexing health — then build an action plan that drives measurable results.