PHP Robots.txt
A modern, fluent PHP package for managing robots.txt rules with type safety and great developer experience.
Requirements
- PHP 8.3 or higher
- A code coverage driver (only needed to run the test suite with coverage)
Installation
You can install the package via composer:
composer require fkrzski/robots-txt
Documentation
The RobotsTxt class provides a fluent interface for creating and managing robots.txt rules with type safety and immutability.
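For instance, a minimal end-to-end sketch built only from the methods documented below (the paths and sitemap URL are illustrative) could look like this:
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->allow('/public')
    ->sitemap('https://example.com/sitemap.xml');
// Write the generated rules to robots.txt in the project root
$robots->toFile();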
Basic Methods
Constructor
Creates a new instance of the RobotsTxt class.
$robots = new RobotsTxt();
Output:
// Empty output - no rules defined yet
allow(string $path)
Adds an Allow rule for the specified path. The path must:
- Start with a forward slash (/)
- Not contain query parameters
- Not contain fragments
- Not be empty
$robots = new RobotsTxt();
$robots->allow('/public');
Output:
User-agent: *
Allow: /public
disallow(string $path)
Adds a Disallow rule for the specified path. It has the same path requirements as allow().
$robots = new RobotsTxt();
$robots->disallow('/private');
Output:
User-agent: *
Disallow: /private
crawlDelay(int $seconds)
Sets the crawl delay in seconds. The delay value must be non-negative.
$robots = new RobotsTxt();
$robots->crawlDelay(10);
Output:
User-agent: *
Crawl-delay: 10
sitemap(string $url)
Adds a Sitemap URL. The URL must:
- Be a valid URL
- Use HTTP or HTTPS protocol
- Have an .xml extension
$robots = new RobotsTxt();
$robots->sitemap('https://example.com/sitemap.xml');
Output:
Sitemap: https://example.com/sitemap.xml
disallowAll(bool $disallow = true)
A convenience method for quickly blocking access to the entire site. When $disallow is true (the default):
- Clears all existing rules in the current context (global or user-agent specific)
- Adds a single "Disallow: /*" rule
- Preserves sitemap entries and rules for other user agents
// Block everything globally
$robots = new RobotsTxt();
$robots
    ->allow('/public')   // This will be cleared
    ->disallow('/admin') // This will be cleared
    ->disallowAll();     // Only Disallow: /* remains
Output:
User-agent: *
Disallow: /*
Block access only for a specific crawler:
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')              // Global rule - kept
    ->userAgent(CrawlerEnum::GOOGLE)
    ->allow('/public')                // Google rule - cleared
    ->disallow('/private')            // Google rule - cleared
    ->disallowAll()                   // Only Disallow: /* remains for Google
    ->userAgent(CrawlerEnum::BING)
    ->disallow('/secret');            // Bing rule - kept
Output:
User-agent: *
Disallow: /admin
User-agent: Googlebot
Disallow: /*
User-agent: Bingbot
Disallow: /secret
When $disallow is false, the method does nothing.
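The boolean parameter makes it easy to drive site-wide blocking from your own configuration. A hedged sketch ($isStaging is a hypothetical flag from your application, not part of the package):
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->disallowAll($isStaging); // blocks everything on staging, no-op otherwise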
userAgent(CrawlerEnum $crawler)
Sets the context for subsequent rules to apply to a specific crawler.
$robots = new RobotsTxt();
$robots->userAgent(CrawlerEnum::GOOGLE);
Output:
User-agent: Googlebot
Combining Methods
Basic Rule Combinations
You can chain multiple rules together:
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->allow('/public')
    ->crawlDelay(5);
Output:
User-agent: *
Disallow: /admin
Allow: /public
Crawl-delay: 5
Crawler-Specific Rules
You can set rules for specific crawlers:
$robots = new RobotsTxt();
$robots
    ->userAgent(CrawlerEnum::GOOGLE)
    ->disallow('/private')
    ->allow('/public')
    ->crawlDelay(10);
Output:
User-agent: Googlebot
Disallow: /private
Allow: /public
Crawl-delay: 10
Multiple Crawlers
You can define rules for multiple crawlers:
$robots = new RobotsTxt();
$robots
    ->userAgent(CrawlerEnum::GOOGLE)
    ->disallow('/google-private')
    ->userAgent(CrawlerEnum::BING)
    ->disallow('/bing-private');
Output:
User-agent: Googlebot
Disallow: /google-private
User-agent: Bingbot
Disallow: /bing-private
Using forUserAgent()
The forUserAgent() method provides a closure-based syntax for grouping crawler-specific rules:
$robots = new RobotsTxt();
$robots->forUserAgent(CrawlerEnum::GOOGLE, function (RobotsTxt $robots): void {
    $robots
        ->disallow('/private')
        ->allow('/public')
        ->crawlDelay(10);
});
Output:
User-agent: Googlebot
Disallow: /private
Allow: /public
Crawl-delay: 10
Complex Example
Combining global rules, multiple crawlers, and sitemaps:
$robots = new RobotsTxt();
$robots
    ->disallow('/admin') // Global rule
    ->sitemap('https://example.com/sitemap1.xml')
    ->forUserAgent(CrawlerEnum::GOOGLE, function (RobotsTxt $robots): void {
        $robots
            ->disallow('/google-private')
            ->allow('/public/*');
    })
    ->forUserAgent(CrawlerEnum::BING, function (RobotsTxt $robots): void {
        $robots
            ->disallow('/bing-private')
            ->crawlDelay(5);
    })
    ->sitemap('https://example.com/sitemap2.xml');
Output:
User-agent: *
Disallow: /admin
User-agent: Googlebot
Disallow: /google-private
Allow: /public/*
User-agent: Bingbot
Disallow: /bing-private
Crawl-delay: 5
Sitemap: https://example.com/sitemap1.xml
Sitemap: https://example.com/sitemap2.xml
Working with Wildcards
The library supports wildcards in paths:
$robots = new RobotsTxt();
$robots
    ->disallow('/*.php')      // Block all PHP files
    ->allow('/public/*')      // Allow all paths under /public
    ->disallow('/private/$'); // Exact match for /private/
Output:
User-agent: *
Disallow: /*.php
Allow: /public/*
Disallow: /private/$
toFile(?string $path = null)
Saves the robots.txt content to a file. If no path is provided, it saves to robots.txt in the project root directory.
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->allow('/public');

// Save to default location (project root)
$robots->toFile();

// Save to custom location
$robots->toFile('/var/www/html/robots.txt');
The method will throw a RuntimeException if:
- The target directory doesn't exist or isn't writable
- The existing robots.txt file isn't writable
Returns true if the file was successfully written.
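Because the write can fail, you may want to guard the call. A minimal sketch (the logging call is just an example; use whatever fits your application):
try {
    $robots->toFile('/var/www/html/robots.txt');
} catch (RuntimeException $e) {
    // The directory is missing or the file is not writable
    error_log('Could not write robots.txt: ' . $e->getMessage());
}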
Best Practices
- Start with Global Rules: Define global rules before crawler-specific rules for better organization.
- Group Related Rules: Use the forUserAgent() method to group rules for the same crawler.
- Use Wildcards Carefully: Be precise with wildcard patterns to avoid unintended matches.
- Order Matters: More specific rules should come before more general ones (see the sketch after this list).
- Validate Paths: Always ensure paths start with a forward slash and don't contain query parameters or fragments.
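As an illustration of the ordering rule, place the specific exception before the broader block (a sketch; the paths are only examples):
$robots = new RobotsTxt();
$robots
    ->allow('/private/docs/*') // specific exception first
    ->disallow('/private/*');  // broader block second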
Error Handling
The class will throw an InvalidArgumentException in the following cases:
- Path doesn't start with a forward slash
- Path contains query parameters or fragments
- Path is empty
- Sitemap URL is invalid or not HTTP/HTTPS
- Sitemap URL doesn't end with .xml
- Crawl delay is negative
These validations ensure that the generated robots.txt file is always valid and follows the standard format.
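If rule values come from user input or configuration, you can catch the validation failure. A minimal sketch (plain PHP exception handling, not a package-specific API):
$robots = new RobotsTxt();
try {
    $robots->disallow('admin'); // invalid: missing the leading slash
} catch (InvalidArgumentException $e) {
    echo 'Invalid robots.txt rule: ' . $e->getMessage();
}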
Testing and Development
The project includes several command groups for testing and code quality:
Code Quality & Formatting
# Run profanity checks
composer test:profanity

# Run static analysis with PHPStan
composer analyse

# Format code with Laravel Pint
composer lint

# Check code formatting (without fixing)
composer test:lint

# Run automated refactoring with Rector
composer refactor

# Check refactor suggestions (dry-run)
composer test:refactor

# Check type coverage (100% required)
composer test:type-coverage
Testing
# Check PHP syntax
composer test:syntax

# Run unit tests with coverage
composer test:unit

# Run mutation testing
composer test:unit:mutation
Complete Test Suite
# Run all tests and quality checks
composer test
Contributing
We welcome contributions! Please see our Contributing Guide for details on:
- Setting up the development environment
- Running tests
- Submitting pull requests
- Code style guidelines
- Reporting issues
Quick Contribution Setup
- Fork this repository
- Clone your fork:
git clone https://github.com/yourusername/robots-txt.git
- Install dependencies:
composer install
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes and run tests:
composer test
- Submit a pull request
License
This project is open-sourced software licensed under the MIT License.
Author
PHP Robots.txt was created by Filip Krzyżanowski.
Acknowledgments
Special thanks to the amazing PHP community and the maintainers of the open-source tools this package builds on.
Happy coding!