thingston/crawler

Web crawler based on PHP Guzzle HTTP Client with concurrency support for faster operation.

0.7.0 2018-11-01 22:41 UTC

Last update: 2022-01-16 19:20:11 UTC


README

Web crawler based on the PHP Guzzle HTTP Client, with concurrency support for faster operation. It supports downloading any content type and includes a link profiler and response observers.

Requirements

Thingston Crawler requires:

Installation

Add Thingston Crawler to any PHP project using Composer:

composer require thingston/crawler
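
The same dependency can also be declared by hand in the project's composer.json. The fragment below is a minimal sketch; the `^0.7` constraint is an assumption based on the 0.7.0 release listed for this package, so adjust it to the version you actually want:

```json
{
    "require": {
        "thingston/crawler": "^0.7"
    }
}
```

After editing the file, running `composer update thingston/crawler` resolves and installs the package.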

Getting Started

Create a new Crawler instance and invoke its start method with any public URI:

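// Load the Composer autoloader (path assumes this script sits in the project root).
require __DIR__ . '/vendor/autoload.php';

use Thingston\Crawler;

// Create the crawler and start crawling from a seed URI.
$crawler = new Crawler();
$crawler->start('https://www.wikipedia.org/');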

To process the results of a crawl, you may add as many Observers as you need. An Observer is a concrete class implementing Thingston\Crawler\Observer\ObserverInterface.
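
The observer idea can be illustrated with a small, self-contained sketch. This README does not show the methods of the real ObserverInterface, so the code below does not use the library at all. `HypotheticalObserver`, `onResponse`, `SummaryObserver` and `notifyObservers` are made-up names that only demonstrate the pattern, and the sketch assumes PHP 7.4 or later. Check the package source for the actual interface before writing a real Observer.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Thingston\Crawler\Observer\ObserverInterface.
// The real interface's method names and signatures are NOT taken from the
// library; they are assumptions made for illustration only.
interface HypotheticalObserver
{
    public function onResponse(string $uri, int $statusCode, string $body): void;
}

// Example observer: records one summary row per response it is notified of.
final class SummaryObserver implements HypotheticalObserver
{
    /** @var array<int, array{uri: string, status: int, bytes: int}> */
    private array $rows = [];

    public function onResponse(string $uri, int $statusCode, string $body): void
    {
        $this->rows[] = [
            'uri' => $uri,
            'status' => $statusCode,
            'bytes' => strlen($body),
        ];
    }

    /** @return array<int, array{uri: string, status: int, bytes: int}> */
    public function rows(): array
    {
        return $this->rows;
    }
}

// Simulates the crawler side: every registered observer is told about a response.
function notifyObservers(array $observers, string $uri, int $statusCode, string $body): void
{
    foreach ($observers as $observer) {
        $observer->onResponse($uri, $statusCode, $body);
    }
}

$observer = new SummaryObserver();
notifyObservers([$observer], 'https://www.wikipedia.org/', 200, '<html></html>');
print_r($observer->rows());
```

Keeping result handling in observers means the crawling loop stays unchanged while each observer decides what to store, log, or ignore.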

Reporting Issues

If you find an issue with this code, please open a ticket in GitHub Issues at https://github.com/thingston/crawler/issues.

Contributors

Open source is built on contributions. If you want to contribute to Thingston, please follow these steps:

  1. Fork the latest version into your own repository.
  2. Write your changes or additions and commit them.
  3. Follow the PSR-2 coding style standard.
  4. Make sure your changes are fully covered by unit tests.
  5. Go to GitHub Pull Requests at https://github.com/thingston/crawler/pulls and create a new request.

Thank you!

Changes and Versioning

All relevant changes to this code are recorded in a separate log file.

Version numbers follow recommendations from Semantic Versioning.

License

Thingston code is maintained under The MIT License.