piopi / behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
Type: behat-extension
Requires
- php: >=7.4
- behat/behat: 3.*
- behat/mink-extension: *
- behat/mink-selenium2-driver: *
- guzzlehttp/psr7: ^1.6
This package is auto-updated.
Last update: 2024-03-29 04:25:57 UTC
README
The BehatCrawler is a Behat extension, built on MinkExtension and Selenium2Driver, that crawls a given URL and executes user-defined functions on each crawled page.
Multiple crawling options are available; see the available options below.
Installation
composer require piopi/behatcrawler
Usage
Start by importing the extension in your FeatureContext (or any of your contexts):
use Behat\Crawler\Crawler;
Create your Crawler object with the default configuration:
Note: at this time, the crawler is only compatible with Selenium2Driver.

```php
// new Crawler(BehatSession)
$crawler = new Crawler($this->getSession());
```
Custom settings are passed as an array; see the table below for all available options.
```php
$crawler = new Crawler($this->getSession(), [
    "internalLinksOnly" => true,
    "HTMLOnly" => true,
    "MaxCrawl" => 20,
]);
```
Available options (more features coming soon):
| Option | Description | Default Value |
|---|---|---|
| Depth | Maximum depth that can be crawled from the starting URL | 0 (unlimited) |
| MaxCrawl | Maximum number of pages to crawl | 0 (unlimited) |
| HTMLOnly | Only crawl HTML/XHTML pages | true |
| internalLinksOnly | Only crawl internal links (links with the same domain name as the initial URL) | true |
| waitForCrawl | Wait for the crawler to finish crawling before throwing any exception originating from the user-defined functions (compiles a list of all exceptions found, with their respective locations) | false |
Options can be set either in the constructor or with the appropriate getters/setters:
```php
$crawler = new Crawler($this->getSession(), ["MaxCrawl" => 10]);
// or
$crawler->setMaximumCrawl(10);
```
Start Crawling
After creating and setting up the crawler, you can start crawling by passing your function as an argument:
Please refer to the PHP Callables documentation for more details.
Examples:
Closure::fromCallable is used to pass a private function as a parameter:
```php
// function1 is a private function
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));

// function2 is a public class method
$crawler->startCrawling([$this, 'function2']);
```
For functions with one or more arguments, the arguments can be passed as follows:
```php
$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);
```
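To illustrate why Closure::fromCallable is needed for private methods, here is a minimal plain-PHP sketch (the class and method names are hypothetical, not part of the extension): the closure it creates keeps the private scope of the defining class, so it stays invokable from outside, whereas a plain `[$this, 'method']` array callable to a private method would fail when called externally.

```php
<?php

class PageChecks
{
    private function check(string $url): string
    {
        return "checked: $url";
    }

    public function asCallable(): Closure
    {
        // Closure::fromCallable binds the private scope, so the
        // resulting closure can be invoked from outside the class.
        return Closure::fromCallable([$this, 'check']);
    }
}

$fn = (new PageChecks())->asCallable();
echo $fn('https://example.com'), "\n"; // prints "checked: https://example.com"
```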
Usage Example
```php
use Behat\Crawler\Crawler;

// Crawler with different settings
$crawler = new Crawler($this->getSession(), [
    "internalLinksOnly" => true,
    "HTMLOnly" => true,
    "MaxCrawl" => 20,
    "waitForCrawl" => true,
]);

// Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); // starts crawling

// Function with one or more arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);
```
In a Behat step function:
```php
/**
 * @Given /^I crawl the website with a maximum of (\d+) level$/
 */
public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
{
    $crawler = new Crawler($this->getSession(), ["Depth" => $arg1]);
    $crawler->startCrawling([$this, 'test']);
}
```
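The `test` callable in the step definition above is user-defined. A minimal sketch of what such a callback might look like (an assumption, not part of the extension's API; it only relies on the standard Mink session being positioned on the page the crawler just visited):

```php
// Hypothetical callback: the crawler invokes it on each crawled page.
public function test()
{
    $page = $this->getSession()->getPage();

    // Example check: every crawled page should contain an <h1> element.
    if ($page->find('css', 'h1') === null) {
        throw new \Exception(
            'Missing <h1> on ' . $this->getSession()->getCurrentUrl()
        );
    }
}
```

With `waitForCrawl` enabled, exceptions thrown here are collected and reported together after the crawl finishes instead of aborting on the first failure.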
Copyright
Copyright (c) 2020 Mostapha El Sabah elsabah.mostapha@gmail.com
Maintainers
Mostapha El Sabah Piopi