About the Robots.txt Generator
The Robots.txt Generator is a visual builder for valid RFC 9309 robots.txt files. It ships with platform presets (WordPress, Next.js, Laravel, Shopify, Astro, generic static), per-user-agent Allow/Disallow rule blocks, a crawl-delay directive, a sitemap line, wildcard (`*`) and end-anchor (`$`) support, and a URL tester that simulates how Googlebot and Bingbot would evaluate any path against your draft file.
It is built for developers shipping a new site who need a sane robots.txt baseline, SEO consultants auditing client robots.txt files for the common ranking-destroying mistakes, agency teams maintaining robots.txt across many client domains, and self-hosters running static sites who want explicit AI-crawler controls (GPTBot, ClaudeBot, Google-Extended).
All rule building, validation, and URL testing run locally in JavaScript. The page makes no network call after first load — nothing is sent to a server, no URLs are fetched on your behalf, and the URL tester evaluates paths against the in-memory robots.txt object only.
Robots.txt is advisory: well-behaved crawlers honour it, malicious scrapers ignore it. Never rely on Disallow for access control or secrecy. URLs blocked here can still appear in search results when other sites link to them, and because Google never fetches a blocked page, it never sees a noindex tag placed on it. For true non-indexing, allow the crawl and add a meta robots noindex (or X-Robots-Tag) on the page itself. The other classic failure is blocking `/wp-content/`, `/assets/`, or any directory holding CSS or JS; Google needs those to render the page, and blocking them tanks rankings. The validator flags both anti-patterns before you copy the output.
What Is robots.txt and How Does It Work?
Every website can include a plain-text file at /robots.txt that tells search
engine crawlers and other automated bots which parts of the site they may or may not access.
When Googlebot, Bingbot, or any well-behaved crawler arrives at your domain, the first thing
it requests is https://yourdomain.com/robots.txt. If the file exists, the crawler
reads it line by line and obeys the directives inside — skipping paths you have marked
as Disallow and freely crawling anything you have explicitly or implicitly
allowed. It is important to understand that robots.txt is advisory, not enforceable; malicious
scrapers can ignore it entirely. However, every major search engine and most legitimate bots
honour the protocol. The file must be placed at the root of each host (not a subdirectory), so a subdomain
such as blog.yourdomain.com needs its own copy, and it must be served as plain text with a 200 status code. If the server returns a 404 for
robots.txt, crawlers assume everything is fair game.
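Because robots.txt is scoped to a single scheme, host, and port, a crawler derives the file's location from any page URL by discarding the path, query, and fragment. The Python sketch below illustrates that rule; `robots_url` is a hypothetical helper name, not part of any crawler's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Only scheme, host and port are kept; path, query and fragment are
    replaced by /robots.txt, because the file always lives at the host root.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # netloc keeps an explicit port (e.g. :8080); drop any user:password@ prefix
    host = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((parts.scheme, host, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?id=3#top"))
print(robots_url("https://blog.yourdomain.com/archive/"))
```

The second call shows why a subdomain needs its own file: its robots.txt URL is on a different host.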
robots.txt Syntax Reference Guide
The format is deceptively simple but has important nuances that trip up even experienced
developers. Every block starts with a User-agent line that names the crawler
the rules apply to — use * for all bots or a specific name like
Googlebot for targeted rules. Below the User-agent line you place one or more
Disallow directives (paths the bot may not visit) and optional
Allow directives (paths within a disallowed tree that should remain
accessible). Paths are case-sensitive and use prefix matching: Disallow: /admin
blocks /admin, /admin/, and /admin/settings/users
alike. RFC 9309 defines two special characters, which Google and Bing both support:
* matches any sequence of characters and $ anchors the match to
the end of the URL. For example, Disallow: /*.pdf$ blocks all URLs ending in
.pdf anywhere on the site. The Sitemap directive tells crawlers
where your XML sitemap lives (as an absolute URL) and can appear anywhere in the file. Google ignores the
Crawl-delay directive, but Bing and several other engines respect
it as a polite request to pause between requests.
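The matching rules above (case-sensitive prefix matching, `*`, and a trailing `$`) can be sketched by translating each pattern into a regular expression. This is a minimal Python illustration that assumes paths are compared as raw strings; real crawlers also normalise percent-encoding first. `pattern_to_regex` and `matches` are illustrative names, not a library API.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Compile a robots.txt path pattern into a regex.

    - Plain paths are prefix matches: "/admin" also matches "/administrator".
    - "*" matches any sequence of characters, including "/".
    - A trailing "$" anchors the pattern to the end of the URL.
    Matching is case-sensitive, as in robots.txt.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape regex metacharacters in each literal piece, then rejoin with ".*"
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors only at the start of the string, which gives prefix semantics
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin", "/admin/settings/users"))         # True
print(matches("/admin", "/administrator"))                # True
print(matches("/*.pdf$", "/docs/report.pdf"))             # True
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
print(matches("/Admin", "/admin"))                        # False
```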
Common robots.txt Mistakes That Hurt SEO
The single most damaging mistake is accidentally blocking CSS and JavaScript resources that
Google needs to render your page. If Googlebot cannot load your stylesheets and scripts, it
cannot evaluate mobile-friendliness or understand dynamic content — and your rankings
suffer silently because Google Search Console may not surface the issue prominently. A close
second is the bare Disallow: / with no further Allow directives, which blocks
the entire site. This happens more often than you would expect, especially on staging sites
that go live without removing the development robots.txt. Other frequent issues include:
- Conflicting rules: having both Allow: /blog and Disallow: /blog under the same User-agent creates ambiguity. Google, following RFC 9309, applies the most specific (longest) matching rule and lets Allow win when an Allow and a Disallow match equally, but other crawlers may resolve the tie differently.
- Missing trailing slash: Disallow: /admin also blocks /administrator because robots.txt uses prefix matching, not exact matching.
- No sitemap declaration: while not strictly required, omitting the Sitemap directive forces crawlers to discover your sitemap through other channels, which can delay indexing of new content.
- Blocking query-parameter pages: Disallow: /*? blocks every URL that contains a query string, which can inadvertently hide paginated content, search results, or filtered product pages that you actually want indexed. (A plain Disallow: /? only matches query strings on the homepage, because of prefix matching.)
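The conflict and query-string pitfalls are easiest to see with an evaluator that follows the longest-match rule from RFC 9309. This is a simplified Python sketch under stated assumptions: the `(kind, pattern)` rule format and `is_allowed` are hypothetical, pattern length is counted in characters (equal to octets for ASCII paths), and the path passed in includes its query string.

```python
import re
from typing import List, Tuple

# Hypothetical rule format: ("allow" | "disallow", path pattern) for one User-agent group
Rule = Tuple[str, str]


def _compile(pattern: str) -> "re.Pattern":
    # Prefix match by default; "*" is a wildcard, a trailing "$" anchors the end
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-match evaluation: the longest matching pattern wins,
    Allow beats Disallow on a tie, and no match at all means allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty Allow/Disallow value matches nothing
            continue
        if _compile(pattern).match(path) is None:
            continue
        allow = kind.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, best_allow = len(pattern), allow
    return best_allow


# Conflicting rules of equal length: Allow wins
print(is_allowed([("allow", "/blog"), ("disallow", "/blog")], "/blog/post"))  # True
# "/?" is a prefix match, so it only catches query strings on the homepage
print(is_allowed([("disallow", "/?")], "/products?color=red"))                # True
print(is_allowed([("disallow", "/*?")], "/products?color=red"))               # False
```

Because the winner is chosen by length rather than position, the order of lines inside a group does not change the result here; first-match parsers behave differently.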
robots.txt Examples for WordPress, Shopify, and Custom Sites
A well-configured WordPress robots.txt blocks /wp-admin/ (the dashboard) while
explicitly allowing /wp-admin/admin-ajax.php (required by many plugins for
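front-end functionality). Some guides also disallow /wp-includes/ on the assumption that it only holds
internal PHP files, but WordPress serves front-end scripts such as jQuery from /wp-includes/js/, so blocking it
can break rendering. A WordPress robots.txt may also block /author/ archives if thin-content
author pages do not add SEO value. Crucially, it does not block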
/wp-content/uploads/ (your media), /wp-content/themes/ (your CSS),
or /wp-content/plugins/ (your scripts). Shopify stores have a platform-generated
robots.txt that blocks admin paths, checkout URLs, cart pages, and internal search. You can
customise it via the robots.txt.liquid template if you need additional rules.
For custom-built sites — whether Next.js, Laravel, Rails, or static — the
principle is the same: block administrative, internal, and duplicate-content paths while
leaving all user-facing content, assets, and sitemaps accessible. A Next.js app should leave
/_next/static/ crawlable, because it holds the JavaScript and CSS chunks Google needs to render
each page; its rules usually target internal routes such as /api/ instead. A Laravel project blocks /storage/,
/vendor/, and /nova/ (if using Nova).
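As a rough illustration of how a platform preset becomes file text, the Python sketch below renders a single User-agent group plus an optional Sitemap line. The `PRESETS` dictionary, its contents, and `build_robots` are hypothetical examples, not this generator's actual presets or code.

```python
from typing import Dict, List, Optional

# Hypothetical preset data for illustration only; not this tool's actual presets.
PRESETS: Dict[str, Dict[str, List[str]]] = {
    "wordpress": {
        "allow": ["/wp-admin/admin-ajax.php"],
        "disallow": ["/wp-admin/"],
    },
    "nextjs": {
        "allow": [],
        "disallow": ["/api/"],  # /_next/static/ deliberately stays crawlable
    },
}


def build_robots(preset: str, sitemap: Optional[str] = None, user_agent: str = "*") -> str:
    rules = PRESETS[preset]
    lines = [f"User-agent: {user_agent}"]
    # Allow exceptions go first so that first-match parsers see them before
    # the broader Disallow; longest-match parsers are unaffected by order.
    lines += [f"Allow: {path}" for path in rules["allow"]]
    lines += [f"Disallow: {path}" for path in rules["disallow"]]
    if sitemap:
        # Sitemap is not tied to a group and should be an absolute URL
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


print(build_robots("wordpress", sitemap="https://yourdomain.com/sitemap.xml"))
```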
How to Test Your robots.txt File
Before deploying a new robots.txt, always test it. Google retired the legacy robots.txt Tester from
Search Console in late 2023. Its replacement, the robots.txt report under Settings, shows which robots.txt
files Google fetched and any parsing problems, while the URL Inspection tool tells you whether a specific live URL
is blocked for Googlebot. The tool on this page covers the draft stage without leaving the browser:
enter a path like /admin/settings, choose a user-agent, and instantly
see the verdict. Beyond testing individual URLs, review the validation report for systemic
issues: conflicting rules, blocked assets, or a missing sitemap. Once deployed, monitor
the Page indexing report in Google Search Console (formerly the Coverage report) for URLs listed as
“Blocked by robots.txt”; this is the fastest way to catch rules that accidentally exclude important pages.
Remember that changes to robots.txt can take hours or even days to propagate to Google;
the file is cached on their end. If you need an urgent recrawl, use the URL Inspection tool
to request indexing of specific pages, or submit an updated sitemap.
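The systemic checks mentioned above (whole-site blocks, blocked assets, missing sitemap) can be approximated with a small line-by-line linter. This is a hypothetical Python sketch, not the validator this page uses: the asset prefixes are assumptions you would adjust per stack, and wildcards inside Disallow values are not analysed.

```python
from typing import List, Set

# Assumed asset directories; adjust for your stack.
ASSET_PREFIXES = ("/wp-content/", "/assets/", "/static/", "/_next/static/")


def lint_robots(text: str) -> List[str]:
    """Return human-readable warnings for common robots.txt anti-patterns."""
    warnings: List[str] = []
    agents: Set[str] = set()
    in_rules = False
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = set(), False
            agents.add(value.lower())
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "allow" or not value:
                continue
            if value == "/":
                # Blocking one named bot (e.g. GPTBot) entirely is a legitimate choice
                if "*" in agents:
                    warnings.append("Disallow: / under User-agent: * blocks the entire site")
            elif agents & {"*", "googlebot"} and any(
                value.startswith(p) or p.startswith(value) for p in ASSET_PREFIXES
            ):
                warnings.append(f"Disallow: {value} may block CSS/JS Google needs to render pages")
    if not has_sitemap:
        warnings.append("No Sitemap directive found")
    return warnings


print(lint_robots("User-agent: *\nDisallow: /wp-content/\n"))
```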
Frequently Asked Questions
What is the difference between Disallow and noindex?
Disallow in robots.txt prevents crawling but does not prevent indexing, so blocked URLs can still appear in search results from external links. To keep a page out of the index, allow crawling and use a meta robots noindex tag or X-Robots-Tag header. Google retired support for noindex in robots.txt in 2019.
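For a page that crawlers may fetch but should not index, the header form of noindex can be set by the application itself. A minimal sketch using Python's standard WSGI interface; the app name `private_page_app` and the page content are placeholders, and the header only takes effect if robots.txt does not disallow the URL.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def private_page_app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """Serve a page that crawlers may fetch but should keep out of the index."""
    body = b"<!doctype html><title>Internal report</title><p>Not for search results.</p>"
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Header equivalent of <meta name="robots" content="noindex">;
        # Google only sees it if robots.txt allows this URL to be crawled.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, private_page_app).serve_forever()` serves the app on port 8000.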
Should I block CSS and JavaScript in robots.txt?
No. Google needs to render pages with their CSS and JS to assess content and mobile usability. Blocking /wp-content/, /assets/, or /static/ is one of the most common ranking-destroying mistakes. Google's own Search documentation warns that blocking resources needed for rendering can stop it from understanding your pages correctly.
Can I use wildcards in robots.txt?
Google and Bing support * for any characters and $ to anchor at the end of a URL. These began as extensions to the original 1994 robots exclusion protocol, and RFC 9309 (published in 2022) now standardizes both. For example, Disallow: /*.pdf$ blocks only URLs ending in .pdf. Smaller crawlers may not support wildcards.
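You can see this gap in Python's standard-library parser, `urllib.robotparser`, which (at least in CPython releases up to 3.13) treats rule paths as literal prefixes, so `*` and `$` have no special meaning, and applies the first matching rule rather than the longest one. The sample file and `stdlib_allows` helper below are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /*.pdf$
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def stdlib_allows(path: str) -> bool:
    return parser.can_fetch("ExampleBot", "https://yourdomain.com" + path)


# Google's longest-match rules would allow the first URL and block the second;
# urllib.robotparser stops at the first matching rule and reads "*" and "$" literally.
print(stdlib_allows("/wp-admin/admin-ajax.php"))  # False
print(stdlib_allows("/docs/report.pdf"))          # True
```

Listing the Allow exception before the broader Disallow is a cheap way to get consistent results from both kinds of parser.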
What happens if robots.txt returns a 500 error?
Googlebot treats 5xx (and 429) responses as if the whole site were disallowed and keeps retrying, because it cannot confirm what is allowed. If the file stays unreachable for more than 30 days, Google falls back to the last cached copy of robots.txt or, if it has none, assumes there are no crawl restrictions. Always monitor the file's availability.
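A simplified model of this status-code handling, written as a Python sketch. It assumes Google's documented behaviour (429 treated like 5xx, a 30-day threshold before falling back to a cached copy, 4xx treated as "no robots.txt") and ignores redirects and timeouts; `robots_fetch_policy` and its return labels are hypothetical.

```python
from typing import Optional


def robots_fetch_policy(status: int, days_unreachable: float = 0.0,
                        cached_copy: Optional[str] = None) -> str:
    """Map a robots.txt fetch result to 'parse', 'allow_all',
    'disallow_all' or 'use_cached' (simplified; no 3xx or timeout handling)."""
    if 200 <= status < 300:
        return "parse"  # obey the rules in the file
    if status == 429 or 500 <= status < 600:
        if days_unreachable <= 30:
            return "disallow_all"  # cannot confirm what is allowed, so crawl nothing
        # After 30 days: last cached copy if one exists, otherwise no restrictions
        return "use_cached" if cached_copy is not None else "allow_all"
    if 400 <= status < 500:
        return "allow_all"  # e.g. 404: treated as if no robots.txt exists
    raise ValueError(f"status {status} is outside this sketch's scope")


print(robots_fetch_policy(503))
print(robots_fetch_policy(503, days_unreachable=31))
```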
Is robots.txt enforceable?
No. It is advisory, defined in RFC 9309. Respectful crawlers like Googlebot and Bingbot honour it, but malicious scrapers ignore it entirely. For genuine access control, use authentication, IP allowlists, or rate limiting at the server level.