mage2kishan/module-robots-seo

Panth Robots SEO — dedicated robots.txt, X-Robots-Tag, and LLM-bot (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) policy control for Magento 2. Extracted from Panth_AdvancedSEO. Self-contained: emits a per-store robots.txt, adds X-Robots-Tag response headers, validates robots meta directi

Maintainers

Package info

github.com/mage2sk/module-robots-seo

Homepage

Type:magento2-module

pkg:composer/mage2kishan/module-robots-seo

Statistics

Installs: 24

Dependents: 0

Suggesters: 1

Stars: 0

Open Issues: 0

1.0.8 2026-06-18 09:43 UTC

This package is auto-updated.

Last update: 2026-06-18 09:44:13 UTC


README

Magento 2 Robots SEO Extension: robots.txt, X-Robots-Tag and LLM Bot Policy Control (Hyva + Luma)

Magento 2.4.4 - 2.4.8 PHP 8.1 - 8.4 Hyva + Luma Live Demo & Details Packagist Upwork Top Rated Plus Website

Full robots and crawler policy control for Magento 2. Panth Robots SEO takes over /robots.txt at the router layer, emits an X-Robots-Tag HTTP header on every frontend response, and gives you one-click toggles for 14 AI and LLM crawlers including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Every directive passes a CRLF-safe validator before it reaches the wire. Works identically on Hyva and Luma.

Product page: kishansavaliya.com/magento-2-robots-seo.html

Quick Answer

What is Panth Robots SEO? It is a Magento 2 extension that replaces Magento's limited robots handling with a dedicated controller for /robots.txt, a per-response X-Robots-Tag HTTP header, and an admin UI for managing per-user-agent path policies and AI crawler access.

What does it add to my store?

  • A dynamic /robots.txt per store view built from LLM bot toggles, admin policy rows, crawl-delay, and sitemap references. No static file is ever read from disk.
  • 14 LLM and AI crawler toggles for GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot, Google-Extended, Cohere-AI, Bytespider, and more. One click blocks or allows each bot.
  • An X-Robots-Tag HTTP response header on every frontend HTML page, with automatic noindex for error pages, private paths, layered-nav filters, and search result pages.
  • An admin CRUD grid for per-user-agent, per-path, per-store-view allow/disallow rows, plus a live robots.txt Preview page so you can verify output before it goes public.

Which themes are supported? Both Hyva and Luma. The module works at the controller and plugin layer, so no theme-specific template is needed.

What does it need? Magento 2.4.4 to 2.4.8, PHP 8.1 to 8.4, and the free mage2kishan/module-core package.

Need Custom Magento 2 Development?

Get a free quote for your project in 24 hours for custom modules, Hyva themes, performance work, M1 to M2 migrations, and Adobe Commerce Cloud.

Get a Free Quote

Kishan Savaliya

Top Rated Plus on Upwork

Hire on Upwork

100% Job Success - 10+ Years Magento Experience Adobe Certified - Hyva Specialist

Panth Infotech Agency

Magento Development Team

Visit Agency

Custom Modules - Theme Design - Migrations Performance - SEO - Adobe Commerce Cloud

Visit our website: kishansavaliya.com  |  Get a quote: kishansavaliya.com/get-quote

Table of Contents

Who Is It For

  • Stores worried about AI training scrapers that want to block GPTBot, Bytespider, CCBot, or other data-collection bots in one click rather than hand-editing a file on every deploy.
  • SEO-conscious merchants who need layered-nav pages, search result pages, and customer account paths excluded from indexing through proper HTTP headers, not just a meta tag.
  • Multi-store setups where each store view needs its own robots.txt body, noindex path list, and LLM bot policy.
  • Stores upgrading from Panth_AdvancedSEO that want robots handling as a standalone, self-contained module without pulling in the full SEO suite.
  • Developers who need a structured, schema-backed policy grid instead of a single admin textarea with no validation.

Key Features

Dynamic robots.txt Per Store View

  • Router-level controller takes over /robots.txt so the response is built from live config every time. No static file is ever served.
  • LLM and AI bot blocks are written as User-agent: <bot>\nDisallow: / sections when their toggle is set to No.
  • Admin policy rows from the panth_seo_robots_policy table are merged under the matching user-agent block.
  • Crawl-delay, Sitemap, and Host lines are appended automatically from config.
  • Custom body override lets you paste your own robots.txt verbatim and skip the generation pipeline entirely.

X-Robots-Tag HTTP Response Header

  • Added to every frontend HTML response by Plugin\Response\XRobotsTagPlugin before the response is sent.
  • Automatic noindex for error pages (404, 410, 500, 503), non-HTML assets (.pdf, .doc, .xls), layered-nav filter pages, and search result pages.
  • Configurable noindex path list with wildcard * support for private paths like /customer/*, /checkout, /wishlist, and more.
  • max-image-preview and max-snippet are appended to every header value, including the large setting recommended for Google Discover.

14 LLM and AI Crawler Toggles

  • One Yes/No toggle per bot in the LLM Bot Policy config group. Turning a bot to No writes a Disallow: / block for that user-agent in robots.txt.
  • Bots covered: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot (covers Claude-Web too), Anthropic-AI, Google-Extended (covers GoogleOther), PerplexityBot, Cohere-AI, CCBot, Bytespider, Amazonbot, Applebot-Extended, FacebookBot, Meta-ExternalAgent.
  • CCBot and Bytespider are blocked by default because they feed large-scale training pipelines and are known to ignore partial disallows.

Admin Policy Grid and robots.txt Preview

  • panth_seo_robots_policy table stores per-user-agent, per-path, per-store-view allow/disallow rows with a priority column.
  • Full CRUD grid at Admin - Panth Infotech - Robots & LLM Bots - Robots Policies with mass enable, disable, and delete actions.
  • robots.txt Preview page renders the live output for any store view so you can check the result before it goes public.
  • Store-view scope on every row and config value so each store can have its own policy.

Security Built In

  • CRLF-injection-safe validator runs on every directive string before it reaches a response header or the robots.txt body. \r, \n, and \0 are rejected outright.
  • User-agent and path validation on every policy save. UAs must match /^[A-Za-z0-9._\-+*\/ ]+$/; paths must start with / and contain no control bytes.
  • ACL on every admin controller. All routes require a valid admin session and declare their own ADMIN_RESOURCE.
  • XSS-safe Preview page renders the robots.txt body through escapeHtml() so a hostile custom body cannot execute script in the admin browser.

Built to Last

  • Constructor DI only across all classes. No ObjectManager calls.
  • Full Page Cache friendly. The robots.txt controller and the X-Robots-Tag plugin do not break Varnish or Fastly.
  • Translation ready. All admin labels use Magento's __() function.
  • Zero data loss on upgrade from Panth_AdvancedSEO. The panth_seo_robots_policy table name is preserved and the schema shapes match exactly.

Screenshots

Live Walkthrough

End-to-end admin flow: enable the module, toggle a few LLM bots, add a policy row, preview the generated robots.txt, curl /robots.txt on both Hyva and Luma, and confirm the X-Robots-Tag header on a customer account page.

Panth Robots SEO demo

Admin Configuration

Global configuration: toggle the module, set the default meta robots value, configure layered-nav and catalogsearch noindex, edit the noindex path list, and set max-image-preview, max-snippet, and Crawl-delay.

Admin configuration

Robots Policies Grid

One row per user-agent, path, directive, and store view combination. Filter by store, mass-enable, disable or delete, and set priority so the evaluator knows which rule wins when two patterns overlap.

Admin grid

Policy Edit Form

Pick a user-agent (* for the default block, or GPTBot, ClaudeBot, a custom crawler), pick allow or disallow, enter a path, scope to a store view, and set priority and active flag.

Admin edit form

robots.txt Preview

Dedicated Panth Infotech - Robots & LLM Bots - robots.txt Preview page renders the live body for the selected store view, exactly as the frontend controller will serve it.

robots.txt preview

Compatibility

Requirement Versions Supported
Magento Open Source 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8
Adobe Commerce 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8
Adobe Commerce Cloud 2.4.4 to 2.4.8
PHP 8.1.x, 8.2.x, 8.3.x, 8.4.x
Hyva Theme 1.0+ (fully compatible)
Luma Theme Native support
Required Dependency mage2kishan/module-core (free)

Installation

Composer Installation (Recommended)

composer require mage2kishan/module-robots-seo
bin/magento module:enable Panth_Core Panth_RobotsSeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento setup:static-content:deploy -f
bin/magento cache:flush

Manual Installation via ZIP

  1. Download the latest release from Packagist or from the product page.
  2. Extract it to app/code/Panth/RobotsSeo/ in your Magento install.
  3. Make sure Panth_Core is installed too (required dependency).
  4. Run the commands above starting from bin/magento module:enable.

Verify Installation

bin/magento module:status Panth_RobotsSeo
# Expected: Module is enabled

curl -s -o /dev/null -w '%{http_code}\n' https://your-store.test/robots.txt
# 200

curl -sI https://your-store.test/customer/account/login | grep -i x-robots-tag
# X-Robots-Tag: noindex, nofollow, max-image-preview:large, max-snippet:-1

After install, open:

Admin -> Panth Infotech -> Robots & LLM Bots

Configuration

Go to Stores -> Configuration -> Panth Infotech -> Robots & LLM Bots.

General

Setting Group Default Description
Enable Module General Yes Master switch. When No, the X-Robots-Tag plugin is a no-op and /robots.txt serves a stock User-agent: *\nAllow: /.
Debug Logging General No When Yes, every header and meta decision is written to var/log/panth_robots_seo.log.
Default Meta Robots General index,follow Baseline directive applied when no per-entity or per-path override fires.
Noindex Layered-Nav Filtered Pages General Yes Emit noindex, follow when a catalog listing has layered-nav or sort/limit/page query parameters.
Noindex Search Result Pages General Yes Emit noindex, follow on /catalogsearch/result/* pages.
Noindex URL Paths (one per line) General 18-line seeded list Private path patterns that must emit noindex, nofollow. Wildcards supported: * matches anything.
max-image-preview Directive General large Appended to every X-Robots-Tag. large is recommended for Google Discover eligibility.
max-snippet Directive General -1 -1 = unlimited. A positive integer caps the SERP snippet length.
Crawl-delay (seconds) General 0 Emitted under User-agent: * in robots.txt. 0 omits the directive.

LLM Bot Policy

Setting Group Default Description
Allow GPTBot (OpenAI) LLM Bot Policy Yes No = emits User-agent: GPTBot\nDisallow: /.
Allow ClaudeBot (Anthropic) LLM Bot Policy Yes Covers both ClaudeBot and Claude-Web.
Allow Google-Extended LLM Bot Policy Yes Covers both Google-Extended and GoogleOther.
Allow CCBot (Common Crawl) LLM Bot Policy No Blocked by default. CCBot feeds dataset-scale training pipelines.
Allow PerplexityBot LLM Bot Policy Yes
Allow Bytespider (ByteDance) LLM Bot Policy No Blocked by default. Bytespider is known to ignore partial disallows.
Allow ChatGPT-User LLM Bot Policy Yes
Allow OAI-SearchBot LLM Bot Policy Yes
Allow Anthropic-AI LLM Bot Policy Yes
Allow Cohere-AI LLM Bot Policy Yes
Allow Amazonbot LLM Bot Policy Yes
Allow Applebot-Extended LLM Bot Policy Yes
Allow Facebookbot LLM Bot Policy Yes
Allow Meta-ExternalAgent LLM Bot Policy Yes

robots.txt Override

Setting Group Default Description
Use Custom robots.txt Body robots.txt Override No When Yes, the custom body REPLACES the generated output. All LLM toggles and policy rows are ignored.
Custom robots.txt Body robots.txt Override (empty) Pasted verbatim into the response. CRLF is normalised to LF. Leave empty to use the generated output.

Every setting resolves at store-view scope, so each store can have a different LLM policy, noindex path list, or custom body.

Supported LLM Bots

Bot UA string(s) Default
GPTBot (OpenAI) GPTBot Allow
ChatGPT-User ChatGPT-User Allow
OAI-SearchBot OAI-SearchBot Allow
ClaudeBot (Anthropic) ClaudeBot, Claude-Web Allow
Anthropic-AI anthropic-ai Allow
Google-Extended Google-Extended, GoogleOther Allow
PerplexityBot PerplexityBot Allow
Cohere-AI cohere-ai Allow
CCBot (Common Crawl) CCBot Disallow
Bytespider (ByteDance) Bytespider Disallow
Amazonbot Amazonbot Allow
Applebot-Extended Applebot-Extended Allow
FacebookBot FacebookBot Allow
Meta-ExternalAgent meta-externalagent Allow

Bots not listed here (YouBot, PetalBot, Diffbot, AI2Bot, etc.) are not blocked by default. To block them, add a Disallow: / row in the Robots Policies grid with the UA as the user-agent string.

How It Works

  1. Controller\Robots\Index serves GET /robots.txt with the generated or override body at Content-Type: text/plain; charset=utf-8. The core Magento robots router is disabled in etc/frontend/di.xml so this controller always wins.
  2. Setup\Patch\Data\InstallRobotsTxtRewrite writes the url_rewrite row that maps /robots.txt to the module controller at install time. RefreshRobotsTxtRewrite re-points stale rows left behind by Panth_AdvancedSEO, so upgrades need no manual DB work.
  3. Plugin\Response\XRobotsTagPlugin runs beforeSendResponse on Magento\Framework\App\Response\Http. It reads the request path, HTTP status code, and Content-Type, then sets X-Robots-Tag once per response using this precedence order:
    • Self-skip on /robots.txt.
    • Error-code override (404, 410, 500, 503) to noindex, nofollow.
    • Non-HTML asset override (.pdf, .doc, .xls, .xlsx) to noindex, nofollow.
    • Catalogsearch noindex when noindex_search_results = Yes.
    • Configured noindex_paths match via Service\NoindexPathMatcher.
    • Layered-nav or sort-filter parameters to noindex, follow.
    • Default directive from panth_robots_seo/general/default_directive.
  4. Model\Robots\PolicyResolver aggregates LLM-bot toggles, rows from panth_seo_robots_policy, the configured crawl-delay, and sitemap references into the final robots.txt body for a given store.
  5. Service\DirectiveValidator is the single chokepoint every directive string passes through before it reaches a response header or the robots.txt body. It rejects any string containing \r, \n, \0, or bytes outside printable ASCII.

FAQ

Does it work on Hyva themes?

Yes. The module works at the controller and plugin layer, not through layout or template. Both Hyva and Luma stores get the same robots.txt output and X-Robots-Tag header with no extra configuration.

Magento already has a robots.txt textarea in Content -> Design -> Configuration. Why replace it?

Magento's built-in option is a single store-wide textarea with no header control, no LLM bot awareness, and no path-level noindex logic. Panth Robots SEO adds per-store-view generation, 14 AI crawler toggles, a structured policy grid, and the X-Robots-Tag header that Magento does not provide at all.

Will it affect my existing robots.txt content?

When you first install the module, the generated robots.txt starts from your LLM bot toggles and any policy rows you create. If you want to keep your existing content exactly, paste it into the Custom robots.txt Body field and enable the override.

Can I block just one path for a specific bot, not the whole site?

Yes. Open Admin - Panth Infotech - Robots & LLM Bots - Robots Policies and add a row with the user-agent, disallow, and the path you want blocked. You can scope the row to a specific store view and set a priority.

Does it set the HTML <meta name="robots"> tag too?

The module sets the X-Robots-Tag HTTP response header on every page, which search engines treat as equivalent to the meta tag. The HTML meta tag is updated too if Panth_AdvancedSEO is installed alongside this module. Without AdvancedSEO, only the HTTP header is set.

Is the noindex path list configurable?

Yes. Go to Stores - Configuration - Panth Infotech - Robots & LLM Bots - General - Noindex URL Paths. Enter one path per line. The * wildcard matches anything. The seeded default covers /customer/*, /checkout, /wishlist, /sales/*, and about 14 other private patterns.

What happens on a 404 page?

The X-Robots-Tag plugin hard-overrides to noindex, nofollow for HTTP status codes 404, 410, 500, and 503, regardless of any other config. Error pages can never appear in the index.

Does it need Panth_AdvancedSEO?

No. The module is fully standalone. If AdvancedSEO is also installed, they share the panth_seo_robots_policy table and do not conflict with each other.

Is it multi-store safe?

Yes. Every config value, every policy row, and every X-Robots-Tag decision resolves at store-view scope. A setting on one store view never affects another.

Support

Channel Contact
Product Page kishansavaliya.com/magento-2-robots-seo.html
Email kishansavaliyakb@gmail.com
Website kishansavaliya.com
WhatsApp +91 84012 70422
GitHub Issues github.com/mage2sk/module-robots-seo/issues
Upwork (Top Rated Plus) Hire Kishan Savaliya
Upwork Agency Panth Infotech

Response time: 1-2 business days.

Need Custom Magento Development?

Looking for custom Magento module development, Hyva theme work, store migrations, or performance tuning? Get a free quote in 24 hours:

Get a Free Quote

Hire on Upwork    Visit Agency    View Product Page

About Panth Infotech

Built and maintained by Kishan Savaliya (kishansavaliya.com), a Top Rated Plus Magento developer on Upwork with 10+ years of eCommerce experience.

Panth Infotech is a Magento 2 development agency that builds high quality, security focused extensions and themes for both Hyva and Luma storefronts. The extension suite covers SEO, performance, checkout, product presentation, customer engagement, and store management, with each module built to MEQP standards and tested across Magento 2.4.4 to 2.4.8.

Browse the full extension catalog on our Magento extensions page or on Packagist.

Quick Links

Resource Link
Product Page magento-2-robots-seo.html
Packagist mage2kishan/module-robots-seo
GitHub mage2sk/module-robots-seo
Website kishansavaliya.com
Free Quote kishansavaliya.com/get-quote
Upwork (Top Rated Plus) Hire Kishan Savaliya
Upwork Agency Panth Infotech
Email kishansavaliyakb@gmail.com
WhatsApp +91 84012 70422

Ready to take control of how bots and crawlers see your store?
See Robots SEO

SEO Keywords: magento 2 robots.txt, magento 2 robots seo, magento 2 x-robots-tag, magento 2 llm bot policy, magento 2 ai crawler control, magento 2 block gptbot, magento 2 block claudebot, magento 2 block perplexitybot, magento 2 block bytespider, magento 2 google-extended, magento 2 noindex, magento 2 noindex layered nav, magento 2 noindex search results, magento 2 crawl delay, magento 2 robots meta, magento 2 seo headers, hyva robots seo, luma robots seo, magento 2 robots extension, magento 2 robots module, mage2kishan robots seo, panth robots seo, panth infotech, hire magento developer, top rated plus upwork, kishan savaliya magento, custom magento development, magento 2.4.8 robots, php 8.4 magento seo