mage2kishan / module-robots-seo
Panth Robots SEO — dedicated robots.txt, X-Robots-Tag, and LLM-bot (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) policy control for Magento 2. Extracted from Panth_AdvancedSEO. Self-contained: emits a per-store robots.txt, adds X-Robots-Tag response headers, validates robots meta directi
Package info
github.com/mage2sk/module-robots-seo
Type: magento2-module
pkg:composer/mage2kishan/module-robots-seo
Requires
- php: ~8.1.0||~8.2.0||~8.3.0||~8.4.0
- mage2kishan/module-core: ^1.0
- magento/framework: ^103.0
- magento/module-backend: ^102.0
- magento/module-catalog: ^104.0
- magento/module-cms: ^104.0
- magento/module-config: ^101.2
- magento/module-store: ^101.1
- magento/module-ui: ^101.2
Suggests
- hyva-themes/magento2-default-theme: For Hyva frontend compatibility (no jQuery, TailwindCSS, AlpineJS).
README
Magento 2 Robots SEO Extension: robots.txt, X-Robots-Tag and LLM Bot Policy Control (Hyva + Luma)
Full robots and crawler policy control for Magento 2. Panth Robots SEO takes over /robots.txt at the router layer, emits an X-Robots-Tag HTTP header on every frontend response, and gives you one-click toggles for 14 AI and LLM crawlers including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Every directive passes a CRLF-safe validator before it reaches the wire. Works identically on Hyva and Luma.
Product page: kishansavaliya.com/magento-2-robots-seo.html
Quick Answer
What is Panth Robots SEO? It is a Magento 2 extension that replaces Magento's limited robots handling with a dedicated controller for /robots.txt, a per-response X-Robots-Tag HTTP header, and an admin UI for managing per-user-agent path policies and AI crawler access.
What does it add to my store?
- A dynamic /robots.txt per store view built from LLM bot toggles, admin policy rows, crawl-delay, and sitemap references. No static file is ever read from disk.
- 14 LLM and AI crawler toggles for GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot, Google-Extended, Cohere-AI, Bytespider, and more. One click blocks or allows each bot.
- An X-Robots-Tag HTTP response header on every frontend HTML page, with automatic noindex for error pages, private paths, layered-nav filters, and search result pages.
- An admin CRUD grid for per-user-agent, per-path, per-store-view allow/disallow rows, plus a live robots.txt Preview page so you can verify output before it goes public.
Which themes are supported? Both Hyva and Luma. The module works at the controller and plugin layer, so no theme-specific template is needed.
What does it need? Magento 2.4.4 to 2.4.8, PHP 8.1 to 8.4, and the free mage2kishan/module-core package.
Need Custom Magento 2 Development?
Get a free quote for your project in 24 hours for custom modules, Hyva themes, performance work, M1 to M2 migrations, and Adobe Commerce Cloud.
Visit our website: kishansavaliya.com | Get a quote: kishansavaliya.com/get-quote
Table of Contents
- Who Is It For
- Key Features
- Screenshots
- Compatibility
- Installation
- Configuration
- Supported LLM Bots
- How It Works
- FAQ
- Support
- About Panth Infotech
- Quick Links
Who Is It For
- Stores worried about AI training scrapers that want to block GPTBot, Bytespider, CCBot, or other data-collection bots in one click rather than hand-editing a file on every deploy.
- SEO-conscious merchants who need layered-nav pages, search result pages, and customer account paths excluded from indexing through proper HTTP headers, not just a meta tag.
- Multi-store setups where each store view needs its own robots.txt body, noindex path list, and LLM bot policy.
- Stores upgrading from Panth_AdvancedSEO that want robots handling as a standalone, self-contained module without pulling in the full SEO suite.
- Developers who need a structured, schema-backed policy grid instead of a single admin textarea with no validation.
Key Features
Dynamic robots.txt Per Store View
- Router-level controller takes over /robots.txt so the response is built from live config every time. No static file is ever served.
- LLM and AI bot blocks are written as User-agent: <bot>\nDisallow: / sections when their toggle is set to No.
- Admin policy rows from the panth_seo_robots_policy table are merged under the matching user-agent block.
- Crawl-delay, Sitemap, and Host lines are appended automatically from config.
- Custom body override lets you paste your own robots.txt verbatim and skip the generation pipeline entirely.
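The generation pipeline above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the module's PHP code (the real aggregation lives in Model\Robots\PolicyResolver): the function name build_robots_txt, the (user_agent, directive, path) row tuples, and the choice to emit the default * block first are all hypothetical, and Host lines are left out for brevity.

```python
def build_robots_txt(blocked_uas, policy_rows, crawl_delay=0, sitemaps=()):
    """Assemble a robots.txt body from its parts (illustrative sketch only)."""
    # Dicts keep insertion order, so the default "*" block is emitted first.
    groups = {"*": []}
    # Every LLM bot whose toggle is No gets a full-site Disallow section.
    for ua in blocked_uas:
        groups.setdefault(ua, []).append("Disallow: /")
    # Admin policy rows are merged under the block for their user-agent.
    for ua, directive, path in policy_rows:
        groups.setdefault(ua, []).append(f"{directive.capitalize()}: {path}")
    # Crawl-delay belongs to the default block; 0 means "omit the line".
    if crawl_delay > 0:
        groups["*"].append(f"Crawl-delay: {crawl_delay}")
    sections = []
    for ua, lines in groups.items():
        # Only "*" can be empty here; give it an explicit allow-all rule.
        rules = lines or ["Allow: /"]
        sections.append("\n".join([f"User-agent: {ua}", *rules]))
    # Sitemap lines are not tied to any user-agent group, so they go last.
    if sitemaps:
        sections.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    return "\n\n".join(sections) + "\n"
```

Grouping lines by user-agent before rendering is what lets policy rows for GPTBot land under the same User-agent: GPTBot header as its toggle block instead of producing a second, duplicate group.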
X-Robots-Tag HTTP Response Header
- Added to every frontend HTML response by Plugin\Response\XRobotsTagPlugin before the response is sent.
- Automatic noindex for error pages (404, 410, 500, 503), non-HTML assets (.pdf, .doc, .xls), layered-nav filter pages, and search result pages.
- Configurable noindex path list with wildcard * support for private paths like /customer/*, /checkout, /wishlist, and more.
- max-image-preview and max-snippet are appended to every header value. max-image-preview defaults to large, the setting recommended for Google Discover.
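How the final header value is composed can be shown with a small sketch. The Python function below is a hypothetical illustration (its name and parameters are not the module's API); the defaults mirror the documented config defaults (large and -1), and the output format follows the sample header shown under Verify Installation.

```python
def x_robots_tag_value(base, max_image_preview="large", max_snippet=-1):
    """Build an X-Robots-Tag value from a base directive (illustrative sketch)."""
    # Accept both "index,follow" and "index, follow" style input.
    tokens = [t.strip() for t in base.split(",") if t.strip()]
    # The two preview directives are appended to every value.
    tokens.append(f"max-image-preview:{max_image_preview}")
    tokens.append(f"max-snippet:{max_snippet}")
    return ", ".join(tokens)
```

For example, x_robots_tag_value("noindex,nofollow") yields the same string as the customer login page header in the verification step.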
14 LLM and AI Crawler Toggles
- One Yes/No toggle per bot in the LLM Bot Policy config group. Turning a bot to No writes a Disallow: / block for that user-agent in robots.txt.
- Bots covered: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot (covers Claude-Web too), Anthropic-AI, Google-Extended (covers GoogleOther), PerplexityBot, Cohere-AI, CCBot, Bytespider, Amazonbot, Applebot-Extended, FacebookBot, Meta-ExternalAgent.
- CCBot and Bytespider are blocked by default because they feed large-scale training pipelines and are known to ignore partial disallows.
Admin Policy Grid and robots.txt Preview
- The panth_seo_robots_policy table stores per-user-agent, per-path, per-store-view allow/disallow rows with a priority column.
- Full CRUD grid at Admin -> Panth Infotech -> Robots & LLM Bots -> Robots Policies with mass enable, disable, and delete actions.
- robots.txt Preview page renders the live output for any store view so you can check the result before it goes public.
- Store-view scope on every row and config value so each store can have its own policy.
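Selecting which grid rows apply to a given store view might look like the sketch below. It is a hypothetical Python model, not the module's schema: the PolicyRow field names, the convention that store_id 0 means "all store views", and the assumption that a higher priority number wins (and is emitted first) are illustrative guesses, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class PolicyRow:
    # Hypothetical fields mirroring the grid columns described above.
    user_agent: str
    directive: str  # "allow" or "disallow"
    path: str
    store_id: int   # assumption: 0 means "all store views"
    priority: int
    is_active: bool = True


def rows_for_store(rows, store_id):
    """Active rows for one store view, highest priority first (assumed ordering)."""
    applicable = [r for r in rows if r.is_active and r.store_id in (0, store_id)]
    # sorted() stays stable with reverse=True, so equal priorities keep input order.
    return sorted(applicable, key=lambda r: r.priority, reverse=True)
```

Keeping the filter (active flag, store scope) separate from the ordering step makes it easy to preview exactly which rows a store view will receive before any robots.txt text is rendered.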
Security Built In
- CRLF-injection-safe validator runs on every directive string before it reaches a response header or the robots.txt body. \r, \n, and \0 are rejected outright.
- User-agent and path validation on every policy save. UAs must match /^[A-Za-z0-9._\-+*\/ ]+$/; paths must start with / and contain no control bytes.
- ACL on every admin controller. All routes require a valid admin session and declare their own ADMIN_RESOURCE.
- XSS-safe Preview page renders the robots.txt body through escapeHtml() so a hostile custom body cannot execute script in the admin browser.
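The validation rules above can be restated as a short sketch. This Python version only mirrors the documented rules (no CR, LF, or NUL; printable ASCII only; the user-agent character class; paths starting with /); the module itself does this in PHP via Service\DirectiveValidator, and the function names here are hypothetical.

```python
import re

# Character class taken from the documented UA rule /^[A-Za-z0-9._\-+*\/ ]+$/.
# fullmatch() is used because Python's "$" would also accept a trailing "\n".
UA_CHARS = re.compile(r"[A-Za-z0-9._\-+*/ ]+")
FORBIDDEN = ("\r", "\n", "\0")


def is_safe_directive(value: str) -> bool:
    """Reject CR, LF, NUL and any character outside printable ASCII (0x20-0x7E)."""
    if any(ch in value for ch in FORBIDDEN):
        return False
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def is_valid_user_agent(ua: str) -> bool:
    return is_safe_directive(ua) and UA_CHARS.fullmatch(ua) is not None


def is_valid_path(path: str) -> bool:
    # Paths must start with "/" and pass the same control-byte check.
    return path.startswith("/") and is_safe_directive(path)
```

Rejecting CR and LF is what blocks header injection: a value like "GPTBot\r\nSet-Cookie: x=1" would otherwise split into a second header line.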
Built to Last
- Constructor DI only across all classes. No ObjectManager calls.
- Full Page Cache friendly. The robots.txt controller and the X-Robots-Tag plugin do not break Varnish or Fastly.
- Translation ready. All admin labels use Magento's __() function.
- Zero data loss on upgrade from Panth_AdvancedSEO. The panth_seo_robots_policy table name is preserved and the schema shapes match exactly.
Screenshots
Live Walkthrough
End-to-end admin flow: enable the module, toggle a few LLM bots, add a policy row, preview the generated robots.txt, curl /robots.txt on both Hyva and Luma, and confirm the X-Robots-Tag header on a customer account page.
Admin Configuration
Global configuration: toggle the module, set the default meta robots value, configure layered-nav and catalogsearch noindex, edit the noindex path list, and set max-image-preview, max-snippet, and Crawl-delay.
Robots Policies Grid
One row per user-agent, path, directive, and store view combination. Filter by store, mass-enable, disable or delete, and set priority so the evaluator knows which rule wins when two patterns overlap.
Policy Edit Form
Pick a user-agent (* for the default block, or GPTBot, ClaudeBot, a custom crawler), pick allow or disallow, enter a path, scope to a store view, and set priority and active flag.
robots.txt Preview
The dedicated Panth Infotech -> Robots & LLM Bots -> robots.txt Preview page renders the live body for the selected store view, exactly as the frontend controller will serve it.
Compatibility
| Requirement | Versions Supported |
|---|---|
| Magento Open Source | 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8 |
| Adobe Commerce | 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8 |
| Adobe Commerce Cloud | 2.4.4 to 2.4.8 |
| PHP | 8.1.x, 8.2.x, 8.3.x, 8.4.x |
| Hyva Theme | 1.0+ (fully compatible) |
| Luma Theme | Native support |
| Required Dependency | mage2kishan/module-core (free) |
Installation
Composer Installation (Recommended)
composer require mage2kishan/module-robots-seo
bin/magento module:enable Panth_Core Panth_RobotsSeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento setup:static-content:deploy -f
bin/magento cache:flush
Manual Installation via ZIP
- Download the latest release from Packagist or from the product page.
- Extract it to app/code/Panth/RobotsSeo/ in your Magento install.
- Make sure Panth_Core is installed too (required dependency).
- Run the commands above starting from bin/magento module:enable.
Verify Installation
bin/magento module:status Panth_RobotsSeo
# Expected: Module is enabled
curl -s -o /dev/null -w '%{http_code}\n' https://your-store.test/robots.txt
# 200
curl -sI https://your-store.test/customer/account/login | grep -i x-robots-tag
# X-Robots-Tag: noindex, nofollow, max-image-preview:large, max-snippet:-1
After install, open:
Admin -> Panth Infotech -> Robots & LLM Bots
Configuration
Go to Stores -> Configuration -> Panth Infotech -> Robots & LLM Bots.
General
| Setting | Group | Default | Description |
|---|---|---|---|
| Enable Module | General | Yes | Master switch. When No, the X-Robots-Tag plugin is a no-op and /robots.txt serves a stock User-agent: *\nAllow: /. |
| Debug Logging | General | No | When Yes, every header and meta decision is written to var/log/panth_robots_seo.log. |
| Default Meta Robots | General | index,follow | Baseline directive applied when no per-entity or per-path override fires. |
| Noindex Layered-Nav Filtered Pages | General | Yes | Emit noindex, follow when a catalog listing has layered-nav or sort/limit/page query parameters. |
| Noindex Search Result Pages | General | Yes | Emit noindex, follow on /catalogsearch/result/* pages. |
| Noindex URL Paths (one per line) | General | 18-line seeded list | Private path patterns that must emit noindex, nofollow. Wildcards supported: * matches anything. |
| max-image-preview Directive | General | large | Appended to every X-Robots-Tag. large is recommended for Google Discover eligibility. |
| max-snippet Directive | General | -1 | -1 = unlimited. A positive integer caps the SERP snippet length. |
| Crawl-delay (seconds) | General | 0 | Emitted under User-agent: * in robots.txt. 0 omits the directive. |
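The Noindex URL Paths setting treats * as "match anything". The Python sketch below shows one way to evaluate such a list; it is a stand-in for Service\NoindexPathMatcher, not its actual code, and the assumption that each pattern must match the whole request path (so /checkout does not cover /checkout/cart) is a guess.

```python
import re


def compile_noindex_pattern(pattern):
    # Escape every regex metacharacter, then turn the escaped "*" back into ".*".
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def is_noindex_path(path, pattern_lines):
    """True if the request path fully matches any configured pattern (anchoring assumed)."""
    for line in pattern_lines:
        pattern = line.strip()
        # Blank lines in the textarea are ignored.
        if pattern and compile_noindex_pattern(pattern).fullmatch(path):
            return True
    return False
```

Escaping first and only then restoring * means a stray ".", "?" or "(" in a configured path is matched literally rather than being interpreted as regex syntax.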
LLM Bot Policy
| Setting | Group | Default | Description |
|---|---|---|---|
| Allow GPTBot (OpenAI) | LLM Bot Policy | Yes | No = emits User-agent: GPTBot\nDisallow: /. |
| Allow ClaudeBot (Anthropic) | LLM Bot Policy | Yes | Covers both ClaudeBot and Claude-Web. |
| Allow Google-Extended | LLM Bot Policy | Yes | Covers both Google-Extended and GoogleOther. |
| Allow CCBot (Common Crawl) | LLM Bot Policy | No | Blocked by default. CCBot feeds dataset-scale training pipelines. |
| Allow PerplexityBot | LLM Bot Policy | Yes | |
| Allow Bytespider (ByteDance) | LLM Bot Policy | No | Blocked by default. Bytespider is known to ignore partial disallows. |
| Allow ChatGPT-User | LLM Bot Policy | Yes | |
| Allow OAI-SearchBot | LLM Bot Policy | Yes | |
| Allow Anthropic-AI | LLM Bot Policy | Yes | |
| Allow Cohere-AI | LLM Bot Policy | Yes | |
| Allow Amazonbot | LLM Bot Policy | Yes | |
| Allow Applebot-Extended | LLM Bot Policy | Yes | |
| Allow Facebookbot | LLM Bot Policy | Yes | |
| Allow Meta-ExternalAgent | LLM Bot Policy | Yes | |
robots.txt Override
| Setting | Group | Default | Description |
|---|---|---|---|
| Use Custom robots.txt Body | robots.txt Override | No | When Yes, the custom body REPLACES the generated output. All LLM toggles and policy rows are ignored. |
| Custom robots.txt Body | robots.txt Override | (empty) | Pasted verbatim into the response. CRLF is normalised to LF. Leave empty to use the generated output. |
Every setting resolves at store-view scope, so each store can have a different LLM policy, noindex path list, or custom body.
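The override rules in this table reduce to a small decision. The Python function below is a hypothetical sketch of that decision, not the module's code: the body is used only when the toggle is Yes and the field is non-empty, and only CRLF is converted to LF, as documented.

```python
def resolve_body(use_custom, custom_body, generated):
    """Pick the robots.txt body to serve (illustrative sketch; names are hypothetical)."""
    # An empty custom body falls back to the generated output, even when the toggle is Yes.
    if use_custom and custom_body:
        # CRLF from pasted Windows text is normalised to LF.
        return custom_body.replace("\r\n", "\n")
    return generated
```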
Supported LLM Bots
| Bot | UA string(s) | Default |
|---|---|---|
| GPTBot (OpenAI) | GPTBot | Allow |
| ChatGPT-User | ChatGPT-User | Allow |
| OAI-SearchBot | OAI-SearchBot | Allow |
| ClaudeBot (Anthropic) | ClaudeBot, Claude-Web | Allow |
| Anthropic-AI | anthropic-ai | Allow |
| Google-Extended | Google-Extended, GoogleOther | Allow |
| PerplexityBot | PerplexityBot | Allow |
| Cohere-AI | cohere-ai | Allow |
| CCBot (Common Crawl) | CCBot | Disallow |
| Bytespider (ByteDance) | Bytespider | Disallow |
| Amazonbot | Amazonbot | Allow |
| Applebot-Extended | Applebot-Extended | Allow |
| FacebookBot | FacebookBot | Allow |
| Meta-ExternalAgent | meta-externalagent | Allow |
Bots not listed here (YouBot, PetalBot, Diffbot, AI2Bot, etc.) are not blocked by default. To block them, add a Disallow: / row in the Robots Policies grid with the UA as the user-agent string.
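The table above translates directly into data. The Python sketch below encodes each toggle's UA strings and default, then emits a Disallow: / section for every bot toggled to No. The dictionary and function names are hypothetical, and writing one section per UA string (rather than grouping Claude-Web under ClaudeBot's header) is an assumption about the output format.

```python
# Toggle -> (UA strings it controls, allowed by default?), copied from the table above.
LLM_BOTS = {
    "GPTBot": (("GPTBot",), True),
    "ChatGPT-User": (("ChatGPT-User",), True),
    "OAI-SearchBot": (("OAI-SearchBot",), True),
    "ClaudeBot": (("ClaudeBot", "Claude-Web"), True),
    "Anthropic-AI": (("anthropic-ai",), True),
    "Google-Extended": (("Google-Extended", "GoogleOther"), True),
    "PerplexityBot": (("PerplexityBot",), True),
    "Cohere-AI": (("cohere-ai",), True),
    "CCBot": (("CCBot",), False),
    "Bytespider": (("Bytespider",), False),
    "Amazonbot": (("Amazonbot",), True),
    "Applebot-Extended": (("Applebot-Extended",), True),
    "FacebookBot": (("FacebookBot",), True),
    "Meta-ExternalAgent": (("meta-externalagent",), True),
}


def llm_disallow_blocks(overrides=None):
    """Emit a full-site Disallow section for each UA whose toggle resolves to No."""
    overrides = overrides or {}
    blocks = []
    for name, (uas, default_allow) in LLM_BOTS.items():
        if overrides.get(name, default_allow):
            continue  # bot allowed: nothing to write
        blocks.extend(f"User-agent: {ua}\nDisallow: /" for ua in uas)
    return "\n\n".join(blocks)
```

With no overrides, only CCBot and Bytespider are blocked, matching the shipped defaults.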
How It Works
- Controller\Robots\Index serves GET /robots.txt with the generated or override body as Content-Type: text/plain; charset=utf-8. The core Magento robots router is disabled in etc/frontend/di.xml so this controller always wins.
- Setup\Patch\Data\InstallRobotsTxtRewrite writes the url_rewrite row that maps /robots.txt to the module controller at install time. RefreshRobotsTxtRewrite re-points stale rows left behind by Panth_AdvancedSEO, so upgrades need no manual DB work.
- Plugin\Response\XRobotsTagPlugin runs beforeSendResponse on Magento\Framework\App\Response\Http. It reads the request path, HTTP status code, and Content-Type, then sets X-Robots-Tag once per response using this precedence order:
  1. Self-skip on /robots.txt.
  2. Error-code override (404, 410, 500, 503) to noindex, nofollow.
  3. Non-HTML asset override (.pdf, .doc, .xls, .xlsx) to noindex, nofollow.
  4. Catalogsearch noindex when noindex_search_results = Yes.
  5. Configured noindex_paths match via Service\NoindexPathMatcher.
  6. Layered-nav or sort-filter parameters to noindex, follow.
  7. Default directive from panth_robots_seo/general/default_directive.
- Model\Robots\PolicyResolver aggregates LLM-bot toggles, rows from panth_seo_robots_policy, the configured crawl-delay, and sitemap references into the final robots.txt body for a given store.
- Service\DirectiveValidator is the single chokepoint every directive string passes through before it reaches a response header or the robots.txt body. It rejects any string containing \r, \n, \0, or bytes outside printable ASCII.
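The plugin's precedence order can be sketched as a first-match-wins function. This Python version is an illustration, not the PHP plugin: the cfg keys noindex_layered_nav and filter_params are hypothetical, asset detection by path extension stands in for the real Content-Type check, and fnmatchcase is a stand-in for Service\NoindexPathMatcher (unlike the documented rule, it also treats ? and [...] as wildcards). The sort and paging names (product_list_order, product_list_dir, product_list_limit, p) are Magento's standard toolbar and pager query parameters.

```python
from fnmatch import fnmatchcase

ERROR_CODES = {404, 410, 500, 503}
ASSET_EXTENSIONS = (".pdf", ".doc", ".xls", ".xlsx")
# Magento toolbar/pager query parameters; layered-nav filter codes are store-specific.
SORT_PAGE_PARAMS = {"product_list_order", "product_list_dir", "product_list_limit", "p"}


def decide_directive(path, status, params, cfg):
    """Return the base X-Robots-Tag directive, or None to skip (first match wins)."""
    if path == "/robots.txt":
        return None  # 1. self-skip
    if status in ERROR_CODES:
        return "noindex, nofollow"  # 2. error pages
    if path.lower().endswith(ASSET_EXTENSIONS):
        return "noindex, nofollow"  # 3. non-HTML assets
    if cfg["noindex_search_results"] and path.startswith("/catalogsearch/result"):
        return "noindex, follow"  # 4. search result pages
    if any(fnmatchcase(path, p) for p in cfg["noindex_paths"]):
        return "noindex, nofollow"  # 5. configured private paths
    filter_like = SORT_PAGE_PARAMS | set(cfg["filter_params"])
    if cfg["noindex_layered_nav"] and set(params) & filter_like:
        return "noindex, follow"  # 6. layered-nav, sort, or paging parameters
    return cfg["default_directive"]  # 7. baseline directive
```

Checking the error status before anything else is what makes the 404 override unconditional: no path rule or default can reach a response that already returned at step 2.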
FAQ
Does it work on Hyva themes?
Yes. The module works at the controller and plugin layer, not through layout or template. Both Hyva and Luma stores get the same robots.txt output and X-Robots-Tag header with no extra configuration.
Magento already has a robots.txt textarea in Content -> Design -> Configuration. Why replace it?
Magento's built-in option is a single store-wide textarea with no header control, no LLM bot awareness, and no path-level noindex logic. Panth Robots SEO adds per-store-view generation, 14 AI crawler toggles, a structured policy grid, and the X-Robots-Tag header that Magento does not provide at all.
Will it affect my existing robots.txt content?
When you first install the module, the generated robots.txt starts from your LLM bot toggles and any policy rows you create. If you want to keep your existing content exactly, paste it into the Custom robots.txt Body field and enable the override.
Can I block just one path for a specific bot, not the whole site?
Yes. Open Admin -> Panth Infotech -> Robots & LLM Bots -> Robots Policies and add a row with the user-agent, the disallow directive, and the path you want blocked. You can scope the row to a specific store view and set a priority.
Does it set the HTML <meta name="robots"> tag too?
The module sets the X-Robots-Tag HTTP response header on every page, which search engines treat as equivalent to the meta tag. The HTML meta tag is updated too if Panth_AdvancedSEO is installed alongside this module. Without AdvancedSEO, only the HTTP header is set.
Is the noindex path list configurable?
Yes. Go to Stores -> Configuration -> Panth Infotech -> Robots & LLM Bots -> General -> Noindex URL Paths. Enter one path per line. The * wildcard matches anything. The seeded default covers /customer/*, /checkout, /wishlist, /sales/*, and about 14 other private patterns.
What happens on a 404 page?
The X-Robots-Tag plugin hard-overrides to noindex, nofollow for HTTP status codes 404, 410, 500, and 503, regardless of any other config, so error pages are kept out of the index.
Does it need Panth_AdvancedSEO?
No. The module is fully standalone. If AdvancedSEO is also installed, they share the panth_seo_robots_policy table and do not conflict with each other.
Is it multi-store safe?
Yes. Every config value, every policy row, and every X-Robots-Tag decision resolves at store-view scope. A setting on one store view never affects another.
Support
| Channel | Contact |
|---|---|
| Product Page | kishansavaliya.com/magento-2-robots-seo.html |
| Email | kishansavaliyakb@gmail.com |
| Website | kishansavaliya.com |
| Phone | +91 84012 70422 |
| GitHub Issues | github.com/mage2sk/module-robots-seo/issues |
| Upwork (Top Rated Plus) | Hire Kishan Savaliya |
| Upwork Agency | Panth Infotech |
Response time: 1-2 business days.
Need Custom Magento Development?
Looking for custom Magento module development, Hyva theme work, store migrations, or performance tuning? Get a free quote in 24 hours at kishansavaliya.com/get-quote.
About Panth Infotech
Built and maintained by Kishan Savaliya (kishansavaliya.com), a Top Rated Plus Magento developer on Upwork with 10+ years of eCommerce experience.
Panth Infotech is a Magento 2 development agency that builds high-quality, security-focused extensions and themes for both Hyva and Luma storefronts. The extension suite covers SEO, performance, checkout, product presentation, customer engagement, and store management, with each module built to MEQP standards and tested across Magento 2.4.4 to 2.4.8.
Browse the full extension catalog on our Magento extensions page or on Packagist.
Quick Links
| Resource | Link |
|---|---|
| Product Page | magento-2-robots-seo.html |
| Packagist | mage2kishan/module-robots-seo |
| GitHub | mage2sk/module-robots-seo |
| Website | kishansavaliya.com |
| Free Quote | kishansavaliya.com/get-quote |
| Upwork (Top Rated Plus) | Hire Kishan Savaliya |
| Upwork Agency | Panth Infotech |
| Email | kishansavaliyakb@gmail.com |
| Phone | +91 84012 70422 |
Ready to take control of how bots and crawlers see your store?
SEO Keywords: magento 2 robots.txt, magento 2 robots seo, magento 2 x-robots-tag, magento 2 llm bot policy, magento 2 ai crawler control, magento 2 block gptbot, magento 2 block claudebot, magento 2 block perplexitybot, magento 2 block bytespider, magento 2 google-extended, magento 2 noindex, magento 2 noindex layered nav, magento 2 noindex search results, magento 2 crawl delay, magento 2 robots meta, magento 2 seo headers, hyva robots seo, luma robots seo, magento 2 robots extension, magento 2 robots module, mage2kishan robots seo, panth robots seo, panth infotech, hire magento developer, top rated plus upwork, kishan savaliya magento, custom magento development, magento 2.4.8 robots, php 8.4 magento seo




