mediashare/crawler

Crawls URLs from a webpage and provides a DomCrawler through the Scraper library

0.1.6 2020-01-27 12:58 UTC

This package is auto-updated.

Last update: 2020-03-27 13:17:26 UTC


README

💫 Crawls URLs from a webpage and provides a DomCrawler through the Scraper library.

DomCrawler

Scraper uses the DomCrawler library, a Symfony component for navigating the DOM of HTML and XML documents. See the Symfony DomCrawler component documentation for details.
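
As a rough illustration of the kind of DOM navigation DomCrawler offers, here is a minimal sketch that uses the Symfony component directly. It assumes symfony/dom-crawler is installed through Composer; the sample markup and the variable names ($html, $dom, $title, $links) are made up for this example and are not part of mediashare/crawler.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler as DomCrawler;

// Sample markup standing in for a fetched page (hypothetical content).
$html = <<<'HTML'
<html>
  <body>
    <h1>Example page</h1>
    <a href="/docs">Docs</a>
    <a href="/blog">Blog</a>
  </body>
</html>
HTML;

$dom = new DomCrawler($html);

// filterXPath() works without the optional symfony/css-selector package.
$title = $dom->filterXPath('//h1')->text();

// each() hands every matched node, wrapped in its own Crawler, to the callback
// and collects the return values into an array.
$links = $dom->filterXPath('//a')->each(
    function (DomCrawler $node) {
        return $node->attr('href');
    }
);

var_dump($title, $links);
```

XPath is used here only to keep the sketch free of extra dependencies; CSS selectors through filter() are also available once symfony/css-selector is installed.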

Installation

composer require mediashare/crawler

Usage

<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;

$crawler = new Crawler("https://mediashare.fr");
$crawler->run();
dump($crawler);

With Config

<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;
use Mediashare\Crawler\Config;

$config = new Config();
$config->setWebspider(true); // Crawl the whole website
$config->setVerbose(true); // Display a progress bar
$config->setPathRequires(['/Kernel/']); // Crawl only these paths
$config->setPathExceptions(['/CodeSnippet/']); // Do not crawl these paths

$crawler = new Crawler("https://mediashare.fr", $config);
$crawler->run();
dump($crawler);
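
To make the interplay of setPathRequires and setPathExceptions more concrete, the sketch below shows one way such filters could combine. It is not the library's implementation: shouldCrawl is a hypothetical helper, the example URLs are invented, and it assumes the configured values are matched as plain substrings of the URL path, which may differ from what mediashare/crawler actually does.

```php
<?php

/**
 * Hypothetical helper, not part of mediashare/crawler: decides whether a URL
 * should be crawled given "required" and "excepted" path fragments.
 * Assumption: fragments are matched as plain substrings of the URL path.
 */
function shouldCrawl(string $url, array $pathRequires, array $pathExceptions): bool
{
    // Fall back to the root path when the URL has no path or cannot be parsed.
    $path = parse_url($url, PHP_URL_PATH) ?: '/';

    // Any excepted fragment rejects the URL outright.
    foreach ($pathExceptions as $fragment) {
        if (strpos($path, $fragment) !== false) {
            return false;
        }
    }

    // With no required fragments, every remaining URL is accepted.
    if ($pathRequires === []) {
        return true;
    }

    // Otherwise at least one required fragment must be present.
    foreach ($pathRequires as $fragment) {
        if (strpos($path, $fragment) !== false) {
            return true;
        }
    }

    return false;
}

$requires = ['/Kernel/'];
$exceptions = ['/CodeSnippet/'];

var_dump(shouldCrawl('https://mediashare.fr/Kernel/Router', $requires, $exceptions));        // bool(true)
var_dump(shouldCrawl('https://mediashare.fr/Kernel/CodeSnippet/1', $requires, $exceptions)); // bool(false)
var_dump(shouldCrawl('https://mediashare.fr/Blog/post', $requires, $exceptions));            // bool(false)
```

In this sketch exceptions take precedence over requirements, so a URL that matches both lists is skipped; that ordering is a design choice of the example, not a documented guarantee of the package.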