nickmoline / robots-checker
Class to check a URL for robots exclusion using all possible methods of robots exclusion
v1.0.5
2018-04-03 21:27 UTC
Requires
- php: >=5.5.9
- ext-fileinfo: *
- ext-intl: *
- ext-mbstring: *
- league/uri: ^4.0 || ^5.0
- php-curl-class/php-curl-class: 3.5.*
- thesoftwarefanatics/php-html-parser: ^1.7
- tomverran/robots-txt-checker: ^1.14
Requires (Dev)
- friendsofphp/php-cs-fixer: ^1.13 || ^2.0
- phpspec/phpspec: ~2.0
- phpunit/phpunit: ~4.8
README
These classes allow you to check a URL against all of the different ways it can be excluded from search engines.
Classes
You can instantiate the following classes:
NickMoline\Robots\RobotsTxt
: Checks the corresponding robots.txt file for a URL
NickMoline\Robots\Status
: Checks the HTTP status code for an indexable URL
NickMoline\Robots\Header
: Checks the HTTP X-Robots-Tag header
NickMoline\Robots\Meta
: Checks the <meta name="robots"> tag (as well as bot-specific tags)
NickMoline\Robots\All
: Wrapper class that will run all of the above checks
Example Usage
```php
<?php

use NickMoline\Robots\RobotsTxt;
use NickMoline\Robots\Header as RobotsHeader;
use NickMoline\Robots\All as RobotsAll;

$checker = new RobotsTxt("http://www.example.com/test.html");
$allowed = $checker->verify();                          // By default it checks Googlebot
$allowed = $checker->setUserAgent("bingbot")->verify(); // Checks whether bingbot is blocked by the robots.txt file
echo $checker->getReason();                             // Get the reason the URL is allowed or denied

$checker2 = new RobotsHeader("http://www.example.com/test.html");
$allowed = $checker2->verify();   // Same as above, but tests the X-Robots-Tag HTTP headers

$checkerAll = new RobotsAll("http://www.example.com/test.html");
$allowed = $checkerAll->verify(); // This one runs all of the available tests
```
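The Status and Meta checkers listed above are not shown in the example. A minimal sketch of using them, assuming they expose the same `verify()`/`getReason()` interface as the other checkers (the method names below mirror the documented classes; the URL is a placeholder):

```php
<?php

use NickMoline\Robots\Status;
use NickMoline\Robots\Meta;

// Check whether the HTTP status code allows the URL to be indexed
$statusChecker = new Status("http://www.example.com/test.html");
$allowed = $statusChecker->verify();
echo $statusChecker->getReason();

// Check the <meta name="robots"> tag, including bot-specific tags
$metaChecker = new Meta("http://www.example.com/test.html");
$allowed = $metaChecker->setUserAgent("bingbot")->verify();
echo $metaChecker->getReason();
```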