deloz / botium
A lightweight web crawler written in PHP
0.1.0
2015-07-09 07:38 UTC
Requires
- php: >=5.6.0
- guzzlehttp/guzzle: 6.0.2
- symfony/css-selector: ^2.7
- symfony/dom-crawler: ^2.7
Requires (Dev)
- phpunit/php-invoker: *
- phpunit/phpunit: 5.0.x-dev
This package is auto-updated.
Last update: 2025-01-15 20:44:35 UTC
README
A lightweight web crawler written in PHP.
Installation
- Install Composer:
    curl -sS https://getcomposer.org/installer | php
  You can then add Botium as a dependency using the composer.phar CLI:
    php composer.phar require deloz/botium:~0.1
- Alternatively, you can specify Botium as a dependency in your project's existing composer.json file:
    {
        "require": {
            "deloz/botium": "~0.1"
        }
    }
- After installing, you need to require Composer's autoloader:
    require 'vendor/autoload.php';
Running the tests
cd tests
php runtest.php
Usage
$settings must contain baseUrl, e.g.:
$settings = [
    'baseUrl' => 'www.douban.com',
    'debug' => true,
    'interval' => 10,
    'every' => 5,
];
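Since baseUrl is the only key stated as required, it can help to check the array before passing it on. The helper below is a minimal sketch: assertValidSettings is a hypothetical name, not part of Botium, and it assumes nothing about what the other keys mean.

```php
<?php

/**
 * Hypothetical helper (not part of Botium): fail early when the
 * required 'baseUrl' key is missing, not a string, or blank.
 */
function assertValidSettings(array $settings)
{
    if (
        !isset($settings['baseUrl'])
        || !is_string($settings['baseUrl'])
        || trim($settings['baseUrl']) === ''
    ) {
        throw new InvalidArgumentException("settings must contain a non-empty 'baseUrl'");
    }

    // Return the array unchanged so the call can be used inline.
    return $settings;
}

$settings = assertValidSettings([
    'baseUrl' => 'www.douban.com',
    'debug' => true,
    'interval' => 10,
    'every' => 5,
]);
```

Failing with an exception here makes a missing baseUrl visible before any crawling starts.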
Each site is a class that inherits from Deloz\Botium\Botium and overrides the methods shown below:
namespace Tests;

use Symfony\Component\DomCrawler\Crawler;
use Deloz\Botium\Response;
use Deloz\Botium\Botium;

class Haixiu extends Botium
{
    public function start()
    {
        $res = $this->crawl('http://www.douban.com/group/haixiuzu/discussion');
        $res and $this->index($res);
    }

    public function index(Response $res)
    {
        $res->doc('td.title > a')->each(function (Crawler $node, $i) {
            $link = $node->attr('href');
            if ($link) {
                $res = $this->crawl($link);
                $res and $this->detail($res);
            }
        });
    }

    public function detail(Response $res)
    {
        $title = $res->doc('#content > h1')->text();
        $author = $res->doc('#content > div > div.article > div.topic-content.clearfix > div.topic-doc > h3 > span.from > a')->text();
        $images = [];
        $res->doc('div.topic-content > div.topic-figure.cc img')->each(function (Crawler $node, $i) use (&$images, $res) {
            $img = $node->attr('src');
            if ($img) {
                $images[] = $img;
            }
        });
        $this->result([
            'title' => $title,
            'author' => $author,
            'images' => $images,
        ]);
    }

    public function result(array $item = [])
    {
        var_dump($item);
    }
}
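To see what detail() collects without touching the network, the title and image extraction can be approximated with PHP's bundled DOM extension. This is only a sketch under stated assumptions: the HTML is made up for illustration, and the XPath expressions are rough stand-ins for the CSS selectors above. Botium itself relies on symfony/dom-crawler, which this snippet does not use.

```php
<?php

// Made-up HTML standing in for a topic page; not real site markup.
$html = <<<'HTML'
<html><body>
<div id="content">
  <h1>Sample topic</h1>
  <div class="topic-content">
    <div class="topic-figure cc"><img src="http://example.com/a.jpg"></div>
    <div class="topic-figure cc"><img src="http://example.com/b.jpg"></div>
  </div>
</div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Rough XPath equivalent of the CSS selector '#content > h1'.
$titleNode = $xpath->query('//*[@id="content"]/h1')->item(0);
$title = $titleNode ? trim($titleNode->textContent) : '';

// Rough equivalent of 'div.topic-content > div.topic-figure.cc img'.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " topic-content ")]'
    . '/div[contains(concat(" ", normalize-space(@class), " "), " topic-figure ")'
    . ' and contains(concat(" ", normalize-space(@class), " "), " cc ")]//img';

$images = [];
foreach ($xpath->query($query) as $img) {
    $src = $img->getAttribute('src');
    if ($src !== '') {
        $images[] = $src; // skip images without a src, as detail() does
    }
}

// Same item shape that detail() passes to result(), minus the author.
$item = ['title' => $title, 'images' => $images];
```

Trying selectors against a saved HTML snippet like this is a cheap way to check them before running a full crawl.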
For more examples, see the tests directory.
License
Licensed under the MIT license.