angeo / module-llms-txt
Magento 2 module for AI Engine Optimization (AEO). Generates spec-compliant llms.txt and llms-full.txt per llmstxt.org standard, plus streaming JSONL for vector indexing. Multi-store, multi-website, CLI, cron, async admin UI, Page Builder-aware sanitization, customer-group pricing, atomic writes, ET
Package info
github.com/angeo-dev/module-llms-txt
Type:magento2-module
pkg:composer/angeo/module-llms-txt
Requires
- php: ~8.1.0||~8.2.0||~8.3.0||~8.4.0||~8.5.0
- ext-json: *
- ext-mbstring: *
- magento/framework: >=102.0
- magento/module-backend: >=102.0
- magento/module-catalog: >=104.0
- magento/module-catalog-inventory: >=100.4
- magento/module-catalog-url-rewrite: >=100.4
- magento/module-cms: >=104.0
- magento/module-config: >=101.2
- magento/module-store: >=101.0
- magento/module-url-rewrite: >=102.0
Requires (Dev)
- magento/magento-coding-standard: ^32.0
- phpstan/phpstan: ^1.10
- phpunit/phpunit: ^10.5
Suggests
- magento/module-page-builder: Enable to opt-in or opt-out of Page Builder content elements per content-type during sanitization
- magento/module-shared-catalog: Adobe Commerce: integrate B2B shared catalogs so llms.txt only exposes the allowed catalog
README
AI Engine Optimization (AEO) for Magento 2 / Adobe Commerce. Generates
spec-compliant llms.txt, llms-full.txt, and JSONL files so ChatGPT,
Claude, Gemini, Perplexity, and other LLM-powered crawlers can ingest your
catalog efficiently.
What this module does
After install, your storefront serves:
| URL | What it is |
|---|---|
https://shop/llms.txt |
Spec-compliant llmstxt.org file (compact markdown) |
https://shop/llms-full.txt |
Same structure, full sanitized descriptions inline |
https://shop/llms.jsonl |
One JSON record per line, for vector indexing |
https://shop/{url-key}.md |
On-the-fly Markdown mirror of any product/category/CMS page |
Generation happens via cron (daily by default), CLI, or the admin "Generate
Now" button. The output is streamed to disk with bounded memory, atomically
renamed on completion, and served with proper ETag / Cache-Control headers.
Why this module exists
LLM crawlers can ingest a typical Magento storefront — full theme, JS, image sprites, navigation chrome — but that's wasteful for everyone. The llmstxt.org standard defines a clean text format optimized for AI ingestion: stable links, structured headings, descriptions in their natural prose form rather than buried in product cards.
This module produces that format for Magento, with care taken for the things Magento makes hard: multi-store layout, Page Builder content, CMS directive resolution, customer-group pricing, and very large catalogs.
Installation
composer require angeo/module-llms-txt:^3.0 bin/magento module:enable Angeo_LlmsTxt bin/magento setup:upgrade bin/magento setup:di:compile # only in production mode bin/magento setup:static-content:deploy adminhtml # only in production mode bin/magento cache:flush
Then generate your first batch:
bin/magento angeo:llms:generate
Visit https://your-store.tld/llms.txt.
Configuration reference
All settings live at Stores → Configuration → Angeo → LLMs.txt.
General
| Field | Default | Notes |
|---|---|---|
| Enable | Yes | Master switch. |
| Exclude This Scope | No | Available at website + store scope. Skips generation for this scope. |
| Store Summary | — | One-line summary used as the spec-compliant blockquote. If empty, falls back to Design → HTML Head → Default Description. |
Content
| Field | Default | Notes |
|---|---|---|
| Include Categories | Yes | |
| Include CMS Pages | Yes | |
| Include Products | Yes | |
Products under ## Optional |
Yes | Recommended. Lets context-budget-constrained AI clients drop products without losing categories / pages. |
| Product Limit | 5000 | 0 = unlimited. |
| Exclude Out-of-Stock Products | No | |
| CMS Identifiers to Exclude | no-route, enable-cookies, privacy-policy-cookie-restriction-mode |
Comma- or newline-separated. |
| Customer Group for Pricing | NOT LOGGED IN | Which group's final price (with special / group prices) is exposed. |
Output formats
| Field | Default | Notes |
|---|---|---|
| Generate llms.txt | Yes | |
| Generate llms-full.txt | No | 5–50× larger; enable only if you actually want it. |
| Generate JSONL | Yes | One record per line; embeds-ready. |
Serve /url-key.md Mirrors |
No | Per-entity Markdown rendering; on-the-fly, no disk. |
Content sanitization
| Field | Default | Notes |
|---|---|---|
| Resolve CMS Directives | Yes | Renders {{widget}}, {{block}}, {{var}} via Magento's frontend filter. |
| Page Builder Strategy | Exclude | See below. |
| Excluded Content-Types | products, banner, slider, slide, video, map, buttons, button-item, block, dynamic-block, divider, spacer |
Used under Exclude strategy. |
| Allowed Content-Types | text, heading, html, tabs, tab-item, row, column, column-group |
Used under Allow strategy. |
Page Builder strategies
| Strategy | Effect |
|---|---|
| Preserve | Keep all Page Builder content; only strip wrapper attributes. |
| Exclude | Drop elements whose data-content-type is in the excluded list. Default. |
| Allow | Drop everything EXCEPT data-content-type in the allowed list. |
| Strip | Drop ALL elements that carry a data-content-type attribute. |
The filter parses content with DOMDocument (not regex), so nested Page
Builder containers are handled correctly. Known content-types include:
row, column-group, column, tabs, tab-item, text, heading,
html, image, video, map, divider, spacer, buttons, button-item,
banner, slider, slide, products, block, dynamic-block.
Performance
| Field | Default | Notes |
|---|---|---|
| Collection Page Size | 1000 | Lower if hitting memory limits on shared hosting. |
HTTP caching
| Field | Default | Notes |
|---|---|---|
| Cache-Control TTL (s) | 3600 | Sent as public, max-age=… on the served files. |
Cron
| Field | Default | Notes |
|---|---|---|
| Cron Expression | 0 2 * * * |
Daily at 02:00 server time. |
CLI commands
# Generate everything for all eligible stores bin/magento angeo:llms:generate # Single store, skip JSONL bin/magento angeo:llms:generate --store=default --no-jsonl # Per-store/per-format last-run status bin/magento angeo:llms:status # Lint generated files for spec compliance bin/magento angeo:llms:validate
Extending — custom providers
Drop a new section into llms.txt (e.g. a "Brands" list, a "Recent Posts"
section, etc.) by implementing Angeo\LlmsTxt\Api\ProviderInterface and
registering it via di.xml.
namespace Vendor\Module\Provider\Llms; use Angeo\LlmsTxt\Api\OutputContextInterface; use Angeo\LlmsTxt\Model\Provider\AbstractProvider; class BrandsProvider extends AbstractProvider { public function provide(OutputContextInterface $context): iterable { yield "## Brands\n\n"; foreach ($this->brandRepo->getList($context->getStore()->getId()) as $brand) { $label = $this->escapeMarkdown($brand->getName()); yield "- [{$label}]({$brand->getUrl()})\n"; } yield "\n"; } }
<!-- etc/di.xml --> <type name="Angeo\LlmsTxt\Model\Generator\LlmsTxtGenerator"> <arguments> <argument name="providers" xsi:type="array"> <item name="brands" xsi:type="object">Vendor\Module\Provider\Llms\BrandsProvider</item> </argument> </arguments> </type>
The base class gives you escapeMarkdown(), encodeJsonl(), isJsonl(),
isFullTxt(), and isApplicable() overridable to opt out per-format.
Extending — custom sanitizer filters
Insert your own filter between Page Builder and HTML stripping (e.g. to
remove <script> data attributes, redact phone numbers, etc.) by implementing
Angeo\LlmsTxt\Api\SanitizerFilterInterface and re-declaring the pipeline
in di.xml.
<type name="Angeo\LlmsTxt\Model\Sanitizer\Sanitizer"> <arguments> <argument name="filters" xsi:type="array"> <item name="cms_directive" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\CmsDirectiveFilter</item> <item name="page_builder" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\PageBuilderFilter</item> <item name="redact_pii" xsi:type="object">Vendor\Module\Sanitizer\Filter\PiiRedactionFilter</item> <item name="html" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\HtmlFilter</item> <item name="whitespace" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\WhitespaceFilter</item> </argument> </arguments> </type>
Events
Hook in via observers — three events are dispatched per store/format pass:
| Event | Data |
|---|---|
angeo_llms_generation_before |
store, format, context |
angeo_llms_generation_after |
store, format, file, bytes, items, duration |
angeo_llms_generation_failed |
store, format, error |
Migrating from 2.x
- Old files in
media/llms/can be deleted (output now lives inmedia/angeo/llms/). - Any custom
ProviderInterfaceimplementations must change from returning astringto yieldingiterable<string>. See Extending — custom providers. - Drop any reverse-proxy / Nginx rewrites pointing at the old paths.
- Re-run Stores → Configuration → Angeo → LLMs.txt to set the new fields (Page Builder strategy, customer group, etc.).
- External tooling that called the GET
/admin/angeo_llms/generate/indexURL must switch to the CLI command (the admin endpoint is now POST + CSRF).
License
MIT — see LICENSE.
Support
- GitHub Issues: https://github.com/angeo-dev/module-llms-txt/issues
- Email: support@angeo.dev