A web crawler built on the Guzzle HTTP client for PHP, with concurrency support for faster operation. It includes support for downloading any content type, a link profiler, and response observers.
Thingston Crawler requires:
- PHP 7.1 or above.
Add Thingston Crawler to any PHP project using Composer:
composer require thingston/crawler
Simply create a new Crawler instance and invoke the start method with any public URI:
use Thingston\Crawler;

$crawler = new Crawler();
$crawler->start('https://www.wikipedia.org/');
To process the results of a crawl, you may add as many Observers as you need.
An Observer is a concrete class implement
If you find issues with this code, please open a ticket in GitHub Issues at https://github.com/thingston/crawler/issues.
Open Source is made of contributions. If you want to contribute to Thingston, please follow these steps:
- Fork the latest version into your own repository.
- Write your changes or additions and commit them.
- Follow the PSR-2 coding style standard.
- Make sure your changes are fully covered by unit tests.
- Go to GitHub Pull Requests at https://github.com/thingston/crawler/pulls and create a new pull request.
All relevant changes to this code are recorded in a separate log file.
Version numbers follow recommendations from Semantic Versioning.
Thingston code is maintained under The MIT License.