brittainmedia/phpcrawl

PHPCrawl is a web crawler/web spider library written in PHP. It supports filters, limiters, cookie handling, robots.txt handling, multiprocessing, and much more.

0.9.16 2022-11-29 13:00 UTC

README

This started as a plain copy of http://phpcrawl.cuab.de/, forked from mmerian for use with Composer.

Because the main project now appears to be abandoned (no updates for 4 years), I will make changes and fixes in this repository.

Latest updates

  • PHP 7 only; not backwards compatible with the 0.8 versions.
  • Introduced namespaces
  • Lots of bug fixes
  • Refactored various class sections
  • Preparation for multiprocess mode on Windows (pthreads or parallel extension)

Pull requests are welcome