piopi / behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
Type: behat-extension
Requires
- php: >=7.4
- behat/behat: 3.*
- behat/mink-extension: *
- behat/mink-selenium2-driver: *
- guzzlehttp/psr7: ^1.6
This package is auto-updated.
Last update: 2024-03-29 04:25:57 UTC
README
The BehatCrawler is a Behat extension, built on MinkExtension and Selenium2Driver, that crawls a given URL and executes user-defined functions on each crawled page.
Multiple crawling options are available; see the available options below.
Installation
composer require piopi/behatcrawler
Usage
Start by importing the extension in your FeatureContext (or any of your contexts):
use Behat\Crawler\Crawler;
Create your Crawler object with the default configuration:
Note: at this time, the crawler is only compatible with Selenium2Driver.

```php
// new Crawler(BehatSession)
$crawler = new Crawler($this->getSession());
```
Custom settings are passed as an array; see the table below for all available options.
```php
$crawler = new Crawler($this->getSession(), [
    "internalLinksOnly" => true,
    "HTMLOnly" => true,
    "MaxCrawl" => 20,
]);
```
Available options (more features coming soon):
| Option | Description | Default Value |
|---|---|---|
| Depth | Maximum depth that can be crawled from the starting URL | 0 (unlimited) |
| MaxCrawl | Maximum number of pages to crawl | 0 (unlimited) |
| HTMLOnly | Only crawl HTML/XHTML pages | true |
| internalLinksOnly | Only crawl internal links (links with the same domain name as the initial URL) | true |
| waitForCrawl | Wait for the crawler to finish crawling before throwing any exception originating from the user-defined functions (compiles a list of all exceptions found, with their respective locations) | false |
Options can be set either in the constructor or with the appropriate getters/setters:
```php
$crawler = new Crawler($this->getSession(), ["MaxCrawl" => 10]);
// or
$crawler->setMaximumCrawl(10);
```
Start Crawling
After creating and setting up the crawler, you can start crawling by passing your function as an argument:
Please refer to the PHP Callables documentation for more details.
Examples:
Closure::fromCallable is used to pass a private function as a parameter:
```php
// function1 is a private function
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));

// function2 is a public class method
$crawler->startCrawling([$this, 'function2']);
```
For functions with one or more arguments, the arguments can be passed as follows:
```php
$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);
```
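To illustrate why Closure::fromCallable is needed for private methods, here is a minimal plain-PHP sketch (the class and method names are hypothetical, not part of the extension): the closure it creates keeps the private scope of the defining class, so it stays invokable from outside, whereas a plain `[$this, 'method']` array callable to a private method would fail when called externally.

```php
<?php

class PageChecks
{
    private function check(string $url): string
    {
        return "checked: $url";
    }

    public function asCallable(): Closure
    {
        // Closure::fromCallable binds the private scope, so the
        // resulting closure can be invoked from outside the class.
        return Closure::fromCallable([$this, 'check']);
    }
}

$fn = (new PageChecks())->asCallable();
echo $fn('https://example.com'), "\n"; // prints "checked: https://example.com"
```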
Usage Example
```php
use Behat\Crawler\Crawler;

// Crawler with different settings
$crawler = new Crawler($this->getSession(), [
    "internalLinksOnly" => true,
    "HTMLOnly" => true,
    "MaxCrawl" => 20,
    "waitForCrawl" => true,
]);

// Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); // starts crawling

// Function with one or more arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);
```
In a Behat step function:
```php
/**
 * @Given /^I crawl the website with a maximum of (\d+) level$/
 */
public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
{
    $crawler = new Crawler($this->getSession(), ["Depth" => $arg1]);
    $crawler->startCrawling([$this, 'test']);
}
```
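The `test` callable in the step definition above is user-defined. A minimal sketch of what such a callback might look like (an assumption, not part of the extension's API; it only relies on the standard Mink session being positioned on the page the crawler just visited):

```php
// Hypothetical callback: the crawler invokes it on each crawled page.
public function test()
{
    $page = $this->getSession()->getPage();

    // Example check: every crawled page should contain an <h1> element.
    if ($page->find('css', 'h1') === null) {
        throw new \Exception(
            'Missing <h1> on ' . $this->getSession()->getCurrentUrl()
        );
    }
}
```

With `waitForCrawl` enabled, exceptions thrown here are collected and reported together after the crawl finishes instead of aborting on the first failure.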
Copyright
Copyright (c) 2020 Mostapha El Sabah elsabah.mostapha@gmail.com
Maintainers
Mostapha El Sabah Piopi