flowd / phirewall-preset-bots
Bot and AI-crawler control presets for the Phirewall PHP firewall - block AI scrapers and rate-limit aggressive SEO crawlers
Requires
- php: >=8.2
- flowd/phirewall: ^0.6
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.89
- nyholm/psr7: ^1.8
- phpstan/phpstan: ^1.12
- phpunit/phpunit: ^11.5
- rector/rector: ^1.2
This package is auto-updated.
Last update: 2026-06-17 21:55:36 UTC
README
Block AI crawlers and rate-limit aggressive SEO crawlers with flowd/phirewall.
Presets are PortableConfig data (and ConfigLayers), materialized onto your cache with Config::with().
Installation
composer require flowd/phirewall-preset-bots
Usage
use Flowd\Phirewall\Config;
use Flowd\PhirewallPresetBots\Presets;

// Block AI crawlers, rate-limit SEO crawlers.
$config = (new Config($cache))->with(
    Presets::blockAiCrawlers(),
    Presets::throttleSeoCrawlers(limit: 60, period: 60),
);
Presets:
| Preset | Effect |
|---|---|
| Presets::blockAiCrawlers() | Blocks (403) requests whose User-Agent matches a known AI/LLM crawler. |
| Presets::throttleAiCrawlers(limit, period) | Rate-limits AI crawlers per client IP instead of blocking; stays indexable. |
| Presets::throttleSeoCrawlers(limit, period) | Rate-limits aggressive SEO/marketing crawlers per client IP. |
Each rule is named preset.bots.*; to override a rule, add a later layer that redefines the same name.
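The override-by-name behaviour can be pictured as a keyed merge in which the last definition of a name wins. The sketch below is a conceptual model in plain PHP and does not call Phirewall: mergeLayers() and both rule names are hypothetical, and the real layers are PortableConfig/ConfigLayer objects rather than arrays.

```php
<?php

declare(strict_types=1);

/**
 * Conceptual model only: merge rule layers keyed by rule name, where a later
 * layer replaces an earlier rule of the same name. Not Phirewall's own code.
 *
 * @param array<string, array<string, mixed>> ...$layers
 * @return array<string, array<string, mixed>>
 */
function mergeLayers(array ...$layers): array
{
    // array_replace() lets values from later arrays win for identical keys.
    return array_replace([], ...$layers);
}

// Hypothetical rule names under the preset.bots.* prefix.
$preset = [
    'preset.bots.ai.block' => ['action' => 'block', 'agents' => ['GPTBot', 'Amazonbot', 'FacebookBot']],
    'preset.bots.seo.throttle' => ['action' => 'throttle', 'limit' => 60, 'period' => 60],
];

// A later layer that redefines one name with a narrower list.
$override = [
    'preset.bots.ai.block' => ['action' => 'block', 'agents' => ['GPTBot']],
];

$effective = mergeLayers($preset, $override);
```

In this model the redefined rule replaces the preset's rule wholesale, while rules the later layer does not mention are left untouched.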
What it matches
blockAiCrawlers() targets crawlers that identify as AI/LLM agents - GPTBot, ChatGPT-User,
OAI-SearchBot, ClaudeBot, anthropic-ai, Claude-Web, CCBot, PerplexityBot, Bytespider,
Amazonbot, Meta-ExternalAgent, FacebookBot, cohere-ai, Diffbot, omgili, ImagesiftBot,
Timpibot, YouBot, DuckAssistBot. The SEO list covers AhrefsBot, SemrushBot, MJ12bot, DotBot,
DataForSeoBot, BLEXBot, rogerbot, PetalBot and similar. Full lists: CrawlerCatalog.
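To illustrate what matching on the User-Agent means in practice, here is a minimal sketch that assumes a case-insensitive substring test against a token list. It is not the package's matcher: matchCrawler() is a hypothetical helper, the token list is a short excerpt of the names above, and the two User-Agent strings are sample values.

```php
<?php

declare(strict_types=1);

/**
 * Returns the first token found in the User-Agent (case-insensitive), or null.
 * Assumption: substring matching; the real catalogue may match differently.
 *
 * @param list<string> $tokens
 */
function matchCrawler(string $userAgent, array $tokens): ?string
{
    foreach ($tokens as $token) {
        // Skip empty tokens, which would otherwise match every request.
        if ($token !== '' && stripos($userAgent, $token) !== false) {
            return $token;
        }
    }

    return null;
}

// A few tokens from the list above; the full list lives in CrawlerCatalog.
$aiTokens = ['GPTBot', 'ClaudeBot', 'CCBot', 'PerplexityBot', 'Bytespider'];

// Sample User-Agent strings for illustration.
$gptBot = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)';
$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

$matched = matchCrawler($gptBot, $aiTokens);       // matches the GPTBot token
$unmatched = matchCrawler($googlebot, $aiTokens);  // no AI token present
```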
Search and link-preview agents (Googlebot, bingbot, Applebot, facebookexternalhit) are not in
either list. Google-Extended and Applebot-Extended are robots.txt opt-out tokens, not
User-Agents, so they are not matched - set those in robots.txt.
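Because these two tokens only take effect in robots.txt, one way to set them is to generate the corresponding groups. The sketch below is independent of this package: renderOptOut() is a hypothetical helper, and it assumes you want to opt the whole site out with Disallow: /.

```php
<?php

declare(strict_types=1);

/**
 * Renders one robots.txt group per token, each disallowing the whole site.
 *
 * @param list<string> $tokens robots.txt product tokens
 */
function renderOptOut(array $tokens): string
{
    $groups = [];
    foreach ($tokens as $token) {
        $groups[] = "User-agent: {$token}\nDisallow: /\n";
    }

    // Groups are separated by a blank line.
    return implode("\n", $groups);
}

$robotsTxt = renderOptOut(['Google-Extended', 'Applebot-Extended']);
```

Append the result to your existing robots.txt rather than replacing it, so rules for other agents are kept.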
Limits to be aware of
- User-Agent matching is policy enforcement, not a security control. It stops honest crawlers that send a truthful User-Agent; a hostile scraper can send any string. Pair this with the OWASP CRS and rate-limit presets for hostile traffic.
- Throttles key on the client IP from REMOTE_ADDR. Behind a proxy or CDN that is the proxy address, which buckets every client together. Configure a trusted client-IP resolver on the Config, or override the throttle by name.
- The catalogue is opinionated. To keep a crawler you value (for example Amazonbot or FacebookBot), override the rule by name with your own narrower list.
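The REMOTE_ADDR caveat is easier to see with a concrete resolver. The following is a rough sketch of the general technique in plain PHP, not Phirewall's resolver API: resolveClientIp() is hypothetical, it assumes your proxy appends to X-Forwarded-For, and it compares proxy addresses as exact strings (no CIDR ranges or IPv6 normalization).

```php
<?php

declare(strict_types=1);

/**
 * Resolves the client IP from REMOTE_ADDR and X-Forwarded-For.
 * The header is only honoured when the direct peer is a known proxy.
 *
 * @param list<string> $trustedProxies exact IP addresses of your proxies
 */
function resolveClientIp(string $remoteAddr, ?string $forwardedFor, array $trustedProxies): string
{
    // An untrusted peer can forge the header, so ignore it entirely.
    if ($forwardedFor === null || !in_array($remoteAddr, $trustedProxies, true)) {
        return $remoteAddr;
    }

    // Walk the chain right to left; the first untrusted hop is the client.
    $hops = array_reverse(array_map('trim', explode(',', $forwardedFor)));
    foreach ($hops as $hop) {
        if (filter_var($hop, FILTER_VALIDATE_IP) === false) {
            break; // malformed entry: stop trusting the header
        }
        if (!in_array($hop, $trustedProxies, true)) {
            return $hop;
        }
    }

    // Every hop was a trusted proxy (or the header was malformed).
    return $remoteAddr;
}

$proxies = ['10.0.0.1'];

// Two different clients behind the same proxy resolve to distinct keys.
$clientA = resolveClientIp('10.0.0.1', '203.0.113.7', $proxies);
$clientB = resolveClientIp('10.0.0.1', '203.0.113.8', $proxies);
```

In a real request the inputs would come from $_SERVER['REMOTE_ADDR'] and $_SERVER['HTTP_X_FORWARDED_FOR']. Without such a resolver, both clients above would share the throttle bucket for 10.0.0.1.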
Development
composer install
composer test   # rector (dry-run), php-cs-fixer (dry-run), phpunit, phpstan
License
LGPL-3.0-or-later (dual-licensed, proprietary licensing available), like flowd/phirewall.