mediashare / crawler
Crawl URLs from a web page and provide a DomCrawler through the Scraper library
0.2.8
2021-11-27 19:44 UTC
Requires
- league/climate: ^3.5
- mediashare/scraper: *
Requires (Dev)
- tracy/tracy: ^2.7
This package is auto-updated.
Last update: 2024-11-12 20:43:14 UTC
README
💫 Crawl URLs from a web page and provide a DomCrawler through the Scraper library.
DomCrawler
Scraper uses the DomCrawler library, a Symfony component for navigating the DOM of HTML and XML documents. See the DomCrawler documentation for details.
Installation
composer require mediashare/crawler
Usage
<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;

$crawler = new Crawler("https://mediashare.fr");
$crawler->run();
dump($crawler);
With Config
<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;
use Mediashare\Crawler\Config;

$config = new Config();
$config->setWebspider(true); // Crawl the whole website
$config->setVerbose(true); // Show a progress bar
$config->setPathRequires(['/Kernel/']); // Only crawl paths matching these entries
$config->setPathExceptions(['/CodeSnippet/']); // Never crawl paths matching these entries

$crawler = new Crawler("https://mediashare.fr", $config);
$crawler->run();
dump($crawler);
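The two path options can be read as an include/exclude filter applied to each discovered URL. The sketch below illustrates that reading in plain PHP. It is not the library's code: `shouldCrawl()` is a hypothetical helper, the example.com URLs are made up, and it assumes each entry is treated as a PCRE pattern with its delimiters (so '/Kernel/' matches any URL containing "Kernel"). The README does not say how the entries are actually matched.

```php
<?php
// Hypothetical helper, not part of mediashare/crawler.
// Assumption: each entry is a PCRE pattern including delimiters, e.g. '/Kernel/'.
function shouldCrawl(string $url, array $pathRequires, array $pathExceptions): bool
{
    // Reject the URL as soon as it matches any excluded pattern.
    foreach ($pathExceptions as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }

    // With no required patterns, every remaining URL is allowed.
    if ($pathRequires === []) {
        return true;
    }

    // Otherwise the URL must match at least one required pattern.
    foreach ($pathRequires as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }

    return false;
}

// Made-up URLs, for illustration only.
$urls = [
    'https://example.com/Kernel/intro',
    'https://example.com/Kernel/CodeSnippet/demo',
    'https://example.com/about',
];

foreach ($urls as $url) {
    $decision = shouldCrawl($url, ['/Kernel/'], ['/CodeSnippet/']) ? 'crawl' : 'skip';
    echo $url, ' => ', $decision, PHP_EOL;
}
```

In this sketch exclusions are checked first, so a URL that matches both lists is skipped. Whether the library resolves that conflict the same way is not stated in the README.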