lexxtor/easy-php-crawler

There is no license information available for the latest version (dev-master) of this package.

Simple yet flexible URL crawler.

dev-master / 0.0.x-dev 2017-02-20 10:16 UTC

This package is not auto-updated.

Last update: 2024-05-25 18:33:18 UTC


README

A simple yet flexible crawler that parses URLs and loads their content.
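As an illustration, here is one common way a crawler can discover URLs in a loaded page: walk the `<a href>` attributes with PHP's built-in `DOMDocument` and turn them into absolute URLs. The function name `extractLinks` is hypothetical and not part of EasyPhpCrawler. The source does not say how the package parses URLs, and the relative-URL handling here is deliberately simplified (no `../` normalisation).

```php
<?php
// Hypothetical helper (not part of EasyPhpCrawler): returns the unique
// absolute http(s) URLs linked from an HTML page, in document order.
function extractLinks(string $html, string $pageUrl): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Assumes $pageUrl is absolute, e.g. http://host/dir/page.html
    $parts = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    // Directory of the current page, used for plain relative links.
    $baseDir = isset($parts['path']) ? preg_replace('#/[^/]*$#', '/', $parts['path']) : '/';

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty link or in-page anchor
        }
        if (preg_match('#^https?://#i', $href)) {
            $url = $href; // already absolute
        } elseif (strpos($href, '//') === 0) {
            $url = $scheme . ':' . $href; // protocol-relative
        } elseif ($href[0] === '/') {
            $url = $origin . $href; // root-relative
        } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            continue; // other schemes such as mailto: or javascript:
        } else {
            $url = $origin . $baseDir . $href; // relative; "../" is not normalised
        }
        $links[$url] = true; // array keys de-duplicate
    }
    return array_keys($links);
}
```

Resolving relative links matters before any URL filtering, because a rule that looks for `//news.yandex.ru/` can only match absolute URLs.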

Usage Example

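<?php

use Lexxtor\EasyPhpCrawler\EasyPhpCrawler;

// Load the crawler class file.
require 'EasyPhpCrawler.php';

// Start crawling from the given URL; the options array sets callbacks and URL filters.
EasyPhpCrawler::go('http://news.yandex.ru', [
    // Called before each URL is requested: prints "index/queue size  url".
    'beforeLoadUrl' => function($url, $crawler) {
        echo $crawler->currentUrlIndex . '/' . $crawler->getQueueSize() . "  $url  ";
    },
    // Called after a page loads successfully: prints the content length in bytes.
    'afterLoadUrlSuccess' => function($url, $content, $crawler) {
        echo 'loaded: ' . strlen($content) . "\n";
    },
    // Called when a page cannot be loaded.
    'afterLoadUrlFail' => function($url, $errorMessage, $crawler) {
        echo 'Error: ' . $errorMessage . "\n";
    },
    // Regular expressions describing URLs that may be crawled.
    'allowUrlRules' => [
        '/\/\/news\.yandex\.ru\//',
    ],
    // Regular expressions for URLs to skip, e.g. '/\/$/' skips URLs ending in a slash.
    'denyUrlRules' => [
        '/search/',
        '/\/$/',
        '/maps/',
        '/themes/',
        '/\?redircnt=/',
    ],
]);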
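The options above suggest a queue-driven crawl. In this model, each discovered URL is checked against the allow and deny regexes before it is queued. The sketch below is a hypothetical reconstruction of that flow, not the package's code. It assumes a URL is kept only when it matches at least one `allowUrlRules` pattern and no `denyUrlRules` pattern. The helper names `isUrlAllowed` and `crawlQueue` and the injected `$fetch` and `$links` callables are invented for the example. For brevity, its callbacks omit the `$crawler` argument that the real callbacks receive.

```php
<?php
// Hypothetical sketch, NOT EasyPhpCrawler's implementation.

// Assumed rule semantics: keep a URL only if it matches at least one
// allow pattern and none of the deny patterns.
function isUrlAllowed(string $url, array $allowRules, array $denyRules): bool
{
    $allowed = false;
    foreach ($allowRules as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            $allowed = true;
            break;
        }
    }
    if (!$allowed) {
        return false;
    }
    foreach ($denyRules as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    return true;
}

// Breadth-first queue that fires callbacks named like the options above.
// $fetch(string $url): string returns page content or throws RuntimeException;
// $links(string $content): array returns candidate URLs. Both are injected so
// the loop can run without network access. Returns every URL that was queued.
function crawlQueue(string $startUrl, array $options, callable $fetch, callable $links): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    for ($i = 0; $i < count($queue); $i++) { // queue grows while iterating
        $url = $queue[$i];
        if (isset($options['beforeLoadUrl'])) {
            $options['beforeLoadUrl']($url);
        }
        try {
            $content = $fetch($url);
        } catch (RuntimeException $e) {
            if (isset($options['afterLoadUrlFail'])) {
                $options['afterLoadUrlFail']($url, $e->getMessage());
            }
            continue;
        }
        if (isset($options['afterLoadUrlSuccess'])) {
            $options['afterLoadUrlSuccess']($url, $content);
        }
        foreach ($links($content) as $next) {
            $ok = isUrlAllowed($next, $options['allowUrlRules'] ?? [], $options['denyUrlRules'] ?? []);
            if (!isset($seen[$next]) && $ok) {
                $seen[$next] = true;
                $queue[] = $next;
            }
        }
    }
    return $queue;
}
```

Injecting the fetcher and link extractor keeps the loop testable offline. In practice, `$fetch` could wrap `file_get_contents` and throw a `RuntimeException` when that call returns `false`.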