
Robots.txt Generator

Build and validate robots.txt files visually

RFC 9309 Builder

About the Robots.txt Generator

The Robots.txt Generator is a visual builder for valid RFC 9309 robots.txt files. It ships with platform presets (WordPress, Next.js, Laravel, Shopify, Astro, generic static), per-user-agent Allow/Disallow rule blocks, a crawl-delay directive, a sitemap line, wildcard (`*`) and end-anchor (`$`) support, and a URL tester that simulates how Googlebot and Bingbot would evaluate any path against your draft file.

It is built for developers shipping a new site who need a sane robots.txt baseline, SEO consultants auditing client robots.txt files for the common ranking-destroying mistakes, agency teams maintaining robots.txt across many client domains, and self-hosters running static sites who want explicit AI-crawler controls (GPTBot, ClaudeBot, Google-Extended).

All rule building, validation, and URL testing run locally in JavaScript. The page makes no network call after first load — nothing is sent to a server, no URLs are fetched on your behalf, and the URL tester evaluates paths against the in-memory robots.txt object only.

Robots.txt is advisory: well-behaved crawlers honour it, malicious scrapers ignore it. Never rely on Disallow for access control or secrecy. A blocked URL can still appear in search results if other sites link to it, and because Google never fetches a blocked page, it never sees a noindex tag placed on that page. For true non-indexing, allow the crawl and add a meta robots noindex tag (or an X-Robots-Tag header) on the page itself. The other classic failure is blocking `/wp-content/`, `/assets/`, or any directory holding CSS or JS; Google needs those to render the page, and blocking them tanks rankings. The validator flags both anti-patterns before you copy the output.

Privacy: 100% client-side · no rules or URLs transmitted
Standard: RFC 9309, including the * and $ wildcards used by Googlebot and Bingbot
Last reviewed: 2026-05-14 by Dennis Traina

    What Is robots.txt and How Does It Work?

    Every website can include a plain-text file at /robots.txt that tells search engine crawlers and other automated bots which parts of the site they may or may not access. When Googlebot, Bingbot, or any well-behaved crawler arrives at your domain, the first thing it requests is https://yourdomain.com/robots.txt. If the file exists, the crawler reads it line by line and obeys the directives inside — skipping paths you have marked as Disallow and freely crawling anything you have explicitly or implicitly allowed. It is important to understand that robots.txt is advisory, not enforceable; malicious scrapers can ignore it entirely. However, every major search engine and most legitimate bots honour the protocol. The file must be placed at the root of your domain (not a subdirectory) and must be served as plain text with a 200 status code. If the server returns a 404 for robots.txt, crawlers assume everything is fair game.
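
    Because crawlers only ever look for the file at the root of a host, its location can be derived mechanically from any page URL. The Python sketch below assumes absolute URLs, and `robots_url_for` is a hypothetical helper rather than part of any crawler's API. Each scheme, host, and port combination is governed by its own file, so shop.yourdomain.com needs a robots.txt separate from the one on yourdomain.com.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    robots.txt always lives at the root path of the same scheme, host,
    and port; the page's own path, query, and fragment are irrelevant.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Drop any "user:password@" prefix but keep the host and explicit port.
    host_and_port = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((parts.scheme, host_and_port, "/robots.txt", "", ""))


print(robots_url_for("https://yourdomain.com/blog/post?page=2"))  # https://yourdomain.com/robots.txt
print(robots_url_for("https://shop.yourdomain.com/cart"))         # https://shop.yourdomain.com/robots.txt
print(robots_url_for("http://localhost:8080/admin/"))             # http://localhost:8080/robots.txt
```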

    robots.txt Syntax Reference Guide

    The format is deceptively simple but has important nuances that trip up even experienced developers. Every block starts with a User-agent line that names the crawler the rules apply to; use * for all bots or a specific name like Googlebot for targeted rules. Below the User-agent line you place one or more Disallow directives (paths the bot may not visit) and optional Allow directives (paths within a disallowed tree that should remain accessible). Paths are case-sensitive and use prefix matching: Disallow: /admin blocks /admin, /admin/, and /admin/settings/users alike. Two special characters, standardized in RFC 9309 and supported by Google and Bing, add pattern matching: * matches any sequence of characters and $ anchors the match to the end of the URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf anywhere on the site. The Sitemap directive tells crawlers where your XML sitemap lives and can appear anywhere in the file. Google ignores the Crawl-delay directive, but Bing, Yandex, and several other engines respect it as a polite request to pause between requests.
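
    Prefix matching and the two wildcards combine with the precedence rule RFC 9309 specifies: the most specific (longest) matching rule wins, and Allow wins a tie. The Python sketch below models that behaviour. It is an illustration, not Google's parser: the `Rule` type and the `to_regex`/`is_allowed` functions are hypothetical, paths are assumed to be percent-encoded consistently already, and specificity is measured as the length of the pattern as written.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    allow: bool   # True for "Allow:", False for "Disallow:"
    pattern: str  # the path pattern exactly as written in the file


def to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' pins the match to
    the end of the path, and everything else is literal. re.match only
    anchors at the start, which gives robots.txt prefix matching.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Apply the longest matching rule; Allow wins ties; no match means allowed."""
    best: Optional[Rule] = None
    for rule in rules:
        if not rule.pattern:
            continue  # an empty "Disallow:" line matches nothing
        if not to_regex(rule.pattern).match(path):
            continue
        longer = best is None or len(rule.pattern) > len(best.pattern)
        tie_goes_to_allow = (best is not None and rule.allow
                             and len(rule.pattern) == len(best.pattern))
        if longer or tie_goes_to_allow:
            best = rule
    return True if best is None else best.allow


rules = [
    Rule(allow=False, pattern="/admin"),
    Rule(allow=True, pattern="/admin/public"),
    Rule(allow=False, pattern="/*.pdf$"),
]
for path in ["/admin/settings/users", "/administrator", "/admin/public/faq",
             "/files/report.pdf", "/files/report.pdf?download=1", "/blog"]:
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

    Run as-is, this reports /admin/settings/users, /administrator, and /files/report.pdf as blocked, and /admin/public/faq, /files/report.pdf?download=1, and /blog as allowed. The query string defeats the $ anchor because the URL no longer ends in .pdf.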

    Common robots.txt Mistakes That Hurt SEO

    The single most damaging mistake is accidentally blocking CSS and JavaScript resources that Google needs to render your page. If Googlebot cannot load your stylesheets and scripts, it cannot evaluate mobile-friendliness or understand dynamic content — and your rankings suffer silently because Google Search Console may not surface the issue prominently. A close second is the bare Disallow: / with no further Allow directives, which blocks the entire site. This happens more often than you would expect, especially on staging sites that go live without removing the development robots.txt. Other frequent issues include:

    • Conflicting rules: having both Allow: /blog and Disallow: /blog under the same User-agent creates ambiguity. Google resolves conflicts by applying the most specific (longest) matching path and, when an Allow and a Disallow are equally specific, the less restrictive Allow; other crawlers may behave differently.
    • Missing trailing slash: Disallow: /admin also blocks /administrator because robots.txt uses prefix matching, not exact matching. Write Disallow: /admin/ to target only the directory.
    • No sitemap declaration: while not strictly required, omitting the Sitemap directive forces crawlers to discover your sitemap through other channels, which can delay indexing of new content.
    • Blocking query-parameter pages: Disallow: /*? blocks every URL that contains a query string (a bare Disallow: /? only matches query strings on the homepage), which can inadvertently hide paginated content or filtered product pages that you actually want indexed.
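
    Most of these mistakes can be caught with simple text checks. The Python sketch below is a deliberately small, hypothetical linter, not this page's validator: `parse_groups`, `lint`, and the `ASSET_DIRS` list are illustrative assumptions, the trailing-slash check is a heuristic that will also flag some intentional rules, and a whole-site block is reported only for the catch-all `*` group, because a targeted Disallow: / for a crawler such as GPTBot is usually deliberate.

```python
from dataclasses import dataclass, field

# Assumed asset directories (from the examples in this article); adjust per stack.
ASSET_DIRS = ("/wp-content/", "/assets/", "/static/")


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)


def parse_groups(text: str) -> tuple[list[Group], bool]:
    """Split robots.txt into user-agent groups and note whether a Sitemap line exists."""
    groups: list[Group] = []
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "sitemap":
            has_sitemap = True
        elif key == "user-agent":
            # Consecutive User-agent lines share a group; one after rules starts a new group.
            if not groups or groups[-1].allow or groups[-1].disallow:
                groups.append(Group())
            groups[-1].agents.append(value.lower())
        elif key == "allow" and groups and value:
            groups[-1].allow.append(value)
        elif key == "disallow" and groups and value:  # empty Disallow = allow all
            groups[-1].disallow.append(value)
    return groups, has_sitemap


def lint(text: str) -> list[str]:
    groups, has_sitemap = parse_groups(text)
    warnings: list[str] = []
    for g in groups:
        who = ", ".join(g.agents)
        if "*" in g.agents and "/" in g.disallow and not g.allow:
            warnings.append(f"[{who}] Disallow: / blocks the entire site")
        for path in g.disallow:
            if path != "/" and any(d.startswith(path) or path.startswith(d) for d in ASSET_DIRS):
                warnings.append(f"[{who}] Disallow: {path} may block CSS/JS needed for rendering")
            if path.startswith("/*?"):
                warnings.append(f"[{who}] Disallow: {path} blocks every URL with a query string")
            if not path.endswith(("/", "$")) and not any(c in path for c in "*.?"):
                warnings.append(f"[{who}] Disallow: {path} also matches any path starting with {path}")
        for path in sorted(set(g.allow) & set(g.disallow)):
            warnings.append(f"[{who}] Allow and Disallow both list {path}")
    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


sample = """\
User-agent: *
Disallow: /wp-content/
Disallow: /admin

User-agent: GPTBot
Disallow: /
"""
for warning in lint(sample):
    print(warning)
```

    For the sample file this prints three warnings (the /wp-content/ asset block, the /admin prefix match, and the missing Sitemap line) and stays silent about the intentional GPTBot block.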

    robots.txt Examples for WordPress, Shopify, and Custom Sites

    A well-configured WordPress robots.txt blocks /wp-admin/ (the dashboard) while explicitly allowing /wp-admin/admin-ajax.php (required by many plugins for front-end functionality). It may also block /author/ archives if thin-content author pages do not add SEO value. Crucially, it does not block /wp-content/uploads/ (your media), /wp-content/themes/ (your CSS), /wp-content/plugins/ (your scripts), or /wp-includes/, which serves core scripts and stylesheets (such as the bundled jQuery) that themes load on the front end. Shopify stores have a platform-generated robots.txt that blocks admin paths, checkout URLs, cart pages, and internal search. You can customise it via the robots.txt.liquid template if you need additional rules. For custom-built sites, whether Next.js, Laravel, Rails, or static, the principle is the same: block administrative, internal, and duplicate-content paths while leaving all user-facing content, assets, and sitemaps accessible. A Next.js app should leave /_next/static/ crawlable, because that directory holds the hashed JavaScript and CSS bundles every page needs to render. A Laravel project typically blocks /storage/, /vendor/, and /nova/ (if using Nova).
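
    At its core, a generator like this one serializes rule groups into text. The Python sketch below is a hypothetical illustration (the `RuleGroup` type, the `render_robots` function, and the example.com sitemap URL are assumptions): it emits the WordPress /wp-admin/ rules described above plus a group blocking two AI crawlers, relying on the fact that consecutive User-agent lines share one group.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuleGroup:
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)


def render_robots(groups: list[RuleGroup], sitemap: Optional[str] = None) -> str:
    """Serialize rule groups into robots.txt text, one blank line between groups."""
    lines: list[str] = []
    for group in groups:
        # Several User-agent lines in a row form a single group.
        lines += [f"User-agent: {agent}" for agent in group.user_agents]
        # Allow first: Google ignores order, but first-match parsers do not.
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines.append("")
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    # Exactly one trailing newline, whether or not a sitemap was added.
    return "\n".join(lines).rstrip("\n") + "\n"


wordpress = [
    RuleGroup(["*"], disallow=["/wp-admin/"], allow=["/wp-admin/admin-ajax.php"]),
    RuleGroup(["GPTBot", "ClaudeBot"], disallow=["/"]),
]
print(render_robots(wordpress, sitemap="https://example.com/sitemap.xml"), end="")
```

    Allow lines are written before Disallow lines as a design choice: Google applies the longest match regardless of order, but simpler parsers that stop at the first matching rule (Python's urllib.robotparser among them) then also honour the admin-ajax.php exception.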

    How to Test Your robots.txt File

    Before deploying a new robots.txt, always test it. Google Search Console used to offer a standalone robots.txt Tester; Google retired it at the end of 2023 and replaced it with the robots.txt report (under Settings), which lists the robots.txt files Google has fetched for your site, when they were last crawled, and any warnings or errors. To check whether a specific URL is blocked for Googlebot, use the URL Inspection tool. The tool on this page offers a similar check without leaving the browser: enter a path like /admin/settings, choose a user-agent, and instantly see the verdict. Beyond testing individual URLs, review the validation report for systemic issues: conflicting rules, blocked assets, or a missing sitemap. Once deployed, monitor Google Search Console’s Page indexing report (formerly Coverage) for “Blocked by robots.txt” entries; this is the fastest way to catch rules that accidentally exclude important pages. Remember that Google caches robots.txt, generally for up to 24 hours and sometimes longer, so changes are not picked up instantly; the robots.txt report lets you request a recrawl of the file itself. If you need specific pages recrawled urgently, use the URL Inspection tool to request indexing, or submit an updated sitemap.
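
    For a quick scripted check outside the browser, Python's standard library includes urllib.robotparser. It is handy for simple files but is not a faithful model of Googlebot: it applies the first matching rule in file order instead of the longest match, and it does not understand the * and $ wildcards. The sketch below uses a made-up example.com file to show the consequence: the admin-ajax.php URL that Google would allow comes back as blocked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; Disallow deliberately precedes Allow.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

for path in ["/blog/hello-world", "/wp-admin/options.php", "/wp-admin/admin-ajax.php"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path}: {verdict}")
# /blog/hello-world: allowed
# /wp-admin/options.php: blocked
# /wp-admin/admin-ajax.php: blocked   (first match wins here; Google would allow it)

print(parser.site_maps())            # ['https://example.com/sitemap.xml']
print(parser.crawl_delay("bingbot"))  # 10
```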

    Frequently Asked Questions

    What is the difference between Disallow and noindex?

    Disallow in robots.txt prevents crawling but does not prevent indexing, so blocked URLs can still appear in search results from external links. To keep a page out of the index, allow crawling and use a meta robots noindex tag or X-Robots-Tag header. Google retired support for noindex in robots.txt in 2019.
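
    As an illustration of the header option, here is a minimal standard-library WSGI sketch that leaves a page crawlable but sends X-Robots-Tag: noindex with it. The /thank-you/ prefix and the `app` function are hypothetical placeholders for whichever pages you want kept out of the index; those paths must not also be disallowed in robots.txt, or crawlers will never see the header.

```python
# Pages under these prefixes stay crawlable (they must NOT be disallowed in
# robots.txt) but ask search engines not to index them.
NOINDEX_PREFIXES = ("/thank-you/",)  # hypothetical example paths


def app(environ, start_response):
    """Minimal WSGI application that adds X-Robots-Tag to selected pages."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Thanks for your order</title>"]
```

    To try it locally, pass `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)`, call `serve_forever()` on the returned server, and inspect the response headers with `curl -i` or your browser's developer tools.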

    Should I block CSS and JavaScript in robots.txt?

    No. Google needs to render pages with their CSS and JS to assess content and mobile usability, so blocking /wp-content/, /assets/, or /static/ is one of the most common ranking-destroying mistakes.

    Can I use wildcards in robots.txt?

    Google and Bing support * to match any sequence of characters and $ to anchor a pattern at the end of a URL. These began as extensions to the original 1994 robots.txt convention and are now standardized in RFC 9309. For example, Disallow: /*.pdf$ blocks only URLs ending in .pdf. Smaller crawlers may still not support wildcards.

    What happens if robots.txt returns a 500 error?

    Googlebot treats 5xx responses as a signal to pause crawling the site entirely, because it cannot confirm what is allowed. If the file stays unreachable for more than 30 days, Google falls back to the last cached copy of robots.txt, and if no cached copy exists it assumes there are no crawl restrictions. Always monitor the file's availability.
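
    The fetch-status rules can be summarized as a small decision function. In this Python sketch, `CrawlPolicy` and `policy_for_status` are hypothetical names; the 2xx, 4xx, and 5xx branches follow RFC 9309 (use the file, assume no restrictions, assume a complete disallow), the 30-day fallback is an assumption modelled on Google's documented behaviour, and redirects (3xx) are assumed to have been followed by the caller before the function is used.

```python
from enum import Enum


class CrawlPolicy(Enum):
    PARSE_FILE = "apply the rules in the fetched file"
    ALLOW_ALL = "treat the site as having no restrictions"
    DISALLOW_ALL = "treat the whole site as disallowed"
    USE_CACHED = "keep using the last cached robots.txt"


def policy_for_status(status: int, days_unreachable: int = 0,
                      has_cached_copy: bool = False) -> CrawlPolicy:
    """Map the final HTTP status of a robots.txt fetch to a crawl policy."""
    if 200 <= status < 300:
        return CrawlPolicy.PARSE_FILE
    if 400 <= status < 500:
        # File unavailable (e.g. 404): crawlers may access everything.
        return CrawlPolicy.ALLOW_ALL
    if 500 <= status < 600:
        # Server error: assume full disallow, unless the outage is long-lived.
        if days_unreachable > 30:
            return CrawlPolicy.USE_CACHED if has_cached_copy else CrawlPolicy.ALLOW_ALL
        return CrawlPolicy.DISALLOW_ALL
    raise ValueError(f"unexpected status {status}; resolve redirects before calling")


print(policy_for_status(503).value)  # treat the whole site as disallowed
```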

    Is robots.txt enforceable?

    No. It is advisory, defined in RFC 9309. Respectful crawlers like Googlebot and Bingbot honour it, but malicious scrapers ignore it entirely. For genuine access control, use authentication, IP allowlists, or rate limiting at the server level.
