angeo/module-llms-txt

Magento 2 module for AI Engine Optimization (AEO). Generates spec-compliant llms.txt and llms-full.txt per llmstxt.org standard, plus streaming JSONL for vector indexing. Multi-store, multi-website, CLI, cron, async admin UI, Page Builder-aware sanitization, customer-group pricing, atomic writes, ET

Maintainers

Package info

github.com/angeo-dev/module-llms-txt

Homepage

Type:magento2-module

pkg:composer/angeo/module-llms-txt

Statistics

Installs: 588

Dependents: 0

Suggesters: 3

Stars: 5

Open Issues: 0

3.0.5 2026-06-04 19:39 UTC

This package is auto-updated.

Last update: 2026-06-11 16:42:44 UTC


README

AI Engine Optimization (AEO) for Magento 2 / Adobe Commerce. Generates spec-compliant llms.txt, llms-full.txt, and JSONL files so ChatGPT, Claude, Gemini, Perplexity, and other LLM-powered crawlers can ingest your catalog efficiently.

Magento PHP License

What this module does

After install, your storefront serves:

URL What it is
https://shop/llms.txt Spec-compliant llmstxt.org file (compact markdown)
https://shop/llms-full.txt Same structure, full sanitized descriptions inline
https://shop/llms.jsonl One JSON record per line, for vector indexing
https://shop/{url-key}.md On-the-fly Markdown mirror of any product/category/CMS page

Generation happens via cron (daily by default), CLI, or the admin "Generate Now" button. The output is streamed to disk with bounded memory, atomically renamed on completion, and served with proper ETag / Cache-Control headers.

Why this module exists

LLM crawlers can ingest a typical Magento storefront — full theme, JS, image sprites, navigation chrome — but that's wasteful for everyone. The llmstxt.org standard defines a clean text format optimized for AI ingestion: stable links, structured headings, descriptions in their natural prose form rather than buried in product cards.

This module produces that format for Magento, with care taken for the things Magento makes hard: multi-store layout, Page Builder content, CMS directive resolution, customer-group pricing, and very large catalogs.

Installation

composer require angeo/module-llms-txt:^3.0
bin/magento module:enable Angeo_LlmsTxt
bin/magento setup:upgrade
bin/magento setup:di:compile      # only in production mode
bin/magento setup:static-content:deploy adminhtml   # only in production mode
bin/magento cache:flush

Then generate your first batch:

bin/magento angeo:llms:generate

Visit https://your-store.tld/llms.txt.

Configuration reference

All settings live at Stores → Configuration → Angeo → LLMs.txt.

General

Field Default Notes
Enable Yes Master switch.
Exclude This Scope No Available at website + store scope. Skips generation for this scope.
Store Summary One-line summary used as the spec-compliant blockquote. If empty, falls back to Design → HTML Head → Default Description.

Content

Field Default Notes
Include Categories Yes
Include CMS Pages Yes
Include Products Yes
Products under ## Optional Yes Recommended. Lets context-budget-constrained AI clients drop products without losing categories / pages.
Product Limit 5000 0 = unlimited.
Exclude Out-of-Stock Products No
CMS Identifiers to Exclude no-route, enable-cookies, privacy-policy-cookie-restriction-mode Comma- or newline-separated.
Customer Group for Pricing NOT LOGGED IN Which group's final price (with special / group prices) is exposed.

Output formats

Field Default Notes
Generate llms.txt Yes
Generate llms-full.txt No 5–50× larger; enable only if you actually want it.
Generate JSONL Yes One record per line; embeds-ready.
Serve /url-key.md Mirrors No Per-entity Markdown rendering; on-the-fly, no disk.

Content sanitization

Field Default Notes
Resolve CMS Directives Yes Renders {{widget}}, {{block}}, {{var}} via Magento's frontend filter.
Page Builder Strategy Exclude See below.
Excluded Content-Types products, banner, slider, slide, video, map, buttons, button-item, block, dynamic-block, divider, spacer Used under Exclude strategy.
Allowed Content-Types text, heading, html, tabs, tab-item, row, column, column-group Used under Allow strategy.

Page Builder strategies

Strategy Effect
Preserve Keep all Page Builder content; only strip wrapper attributes.
Exclude Drop elements whose data-content-type is in the excluded list. Default.
Allow Drop everything EXCEPT data-content-type in the allowed list.
Strip Drop ALL elements that carry a data-content-type attribute.

The filter parses content with DOMDocument (not regex), so nested Page Builder containers are handled correctly. Known content-types include: row, column-group, column, tabs, tab-item, text, heading, html, image, video, map, divider, spacer, buttons, button-item, banner, slider, slide, products, block, dynamic-block.

Performance

Field Default Notes
Collection Page Size 1000 Lower if hitting memory limits on shared hosting.

HTTP caching

Field Default Notes
Cache-Control TTL (s) 3600 Sent as public, max-age=… on the served files.

Cron

Field Default Notes
Cron Expression 0 2 * * * Daily at 02:00 server time.

CLI commands

# Generate everything for all eligible stores
bin/magento angeo:llms:generate

# Single store, skip JSONL
bin/magento angeo:llms:generate --store=default --no-jsonl

# Per-store/per-format last-run status
bin/magento angeo:llms:status

# Lint generated files for spec compliance
bin/magento angeo:llms:validate

Extending — custom providers

Drop a new section into llms.txt (e.g. a "Brands" list, a "Recent Posts" section, etc.) by implementing Angeo\LlmsTxt\Api\ProviderInterface and registering it via di.xml.

namespace Vendor\Module\Provider\Llms;

use Angeo\LlmsTxt\Api\OutputContextInterface;
use Angeo\LlmsTxt\Model\Provider\AbstractProvider;

class BrandsProvider extends AbstractProvider
{
    public function provide(OutputContextInterface $context): iterable
    {
        yield "## Brands\n\n";
        foreach ($this->brandRepo->getList($context->getStore()->getId()) as $brand) {
            $label = $this->escapeMarkdown($brand->getName());
            yield "- [{$label}]({$brand->getUrl()})\n";
        }
        yield "\n";
    }
}
<!-- etc/di.xml -->
<type name="Angeo\LlmsTxt\Model\Generator\LlmsTxtGenerator">
    <arguments>
        <argument name="providers" xsi:type="array">
            <item name="brands" xsi:type="object">Vendor\Module\Provider\Llms\BrandsProvider</item>
        </argument>
    </arguments>
</type>

The base class gives you escapeMarkdown(), encodeJsonl(), isJsonl(), isFullTxt(), and isApplicable() overridable to opt out per-format.

Extending — custom sanitizer filters

Insert your own filter between Page Builder and HTML stripping (e.g. to remove <script> data attributes, redact phone numbers, etc.) by implementing Angeo\LlmsTxt\Api\SanitizerFilterInterface and re-declaring the pipeline in di.xml.

<type name="Angeo\LlmsTxt\Model\Sanitizer\Sanitizer">
    <arguments>
        <argument name="filters" xsi:type="array">
            <item name="cms_directive" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\CmsDirectiveFilter</item>
            <item name="page_builder"  xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\PageBuilderFilter</item>
            <item name="redact_pii"    xsi:type="object">Vendor\Module\Sanitizer\Filter\PiiRedactionFilter</item>
            <item name="html"          xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\HtmlFilter</item>
            <item name="whitespace"    xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\WhitespaceFilter</item>
        </argument>
    </arguments>
</type>

Events

Hook in via observers — three events are dispatched per store/format pass:

Event Data
angeo_llms_generation_before store, format, context
angeo_llms_generation_after store, format, file, bytes, items, duration
angeo_llms_generation_failed store, format, error

Migrating from 2.x

  • Old files in media/llms/ can be deleted (output now lives in media/angeo/llms/).
  • Any custom ProviderInterface implementations must change from returning a string to yielding iterable<string>. See Extending — custom providers.
  • Drop any reverse-proxy / Nginx rewrites pointing at the old paths.
  • Re-run Stores → Configuration → Angeo → LLMs.txt to set the new fields (Page Builder strategy, customer group, etc.).
  • External tooling that called the GET /admin/angeo_llms/generate/index URL must switch to the CLI command (the admin endpoint is now POST + CSRF).

License

MIT — see LICENSE.

Support