piopi/behatcrawler

A Behat extension that crawls the links on a website and executes a user-defined function on each of them.

Type:behat-extension

dev-master 2020-10-01 18:23 UTC

This package is auto-updated.

Last update: 2024-03-29 04:25:57 UTC


README

BehatCrawler is an extension for Behat, MinkExtension and Selenium2Driver that crawls a given URL and executes user-defined functions on each crawled page.

Several crawling options are available; see the available options.

Installation

composer require piopi/behatcrawler

Usage

Start by importing the extension into your FeatureContext (or any of your other context classes):

use Behat\Crawler\Crawler;

Note: the crawler is currently compatible only with Selenium2Driver.

Create your Crawler object with the default configuration:

// $crawler = new Crawler(BehatSession);
$crawler = new Crawler($this->getSession());

For custom settings, pass an array of options as the second argument (see the following table for all available options):

$crawler = new Crawler($this->getSession(), ["internalLinksOnly" => true, "HTMLOnly" => true, "MaxCrawl" => 20]);

Available options (more functionality coming soon):

| Option | Description | Default value |
| --- | --- | --- |
| Depth | Maximum depth that can be crawled from the initial URL | 0 (unlimited) |
| MaxCrawl | Maximum number of crawls | 0 (unlimited) |
| HTMLOnly | Crawl HTML/XHTML pages only | true |
| internalLinksOnly | Crawl internal links only (links with the same domain name as the initial URL) | true |
| waitForCrawl | Wait for the crawler to finish before throwing any exception raised by the user-defined functions, and compile a list of all exceptions found with their respective locations | false |
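To make the waitForCrawl description more concrete, here is a minimal sketch of the idea: run a callback over a list of pages and either fail on the first exception or collect all of them with their locations. It is not the extension's actual implementation. The `runOnPages` helper and the URLs are hypothetical, and for illustration the callback receives the URL directly, which may differ from how BehatCrawler invokes your function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of BehatCrawler): runs $callback for every URL.
 * With $waitForCrawl = false the first exception propagates immediately;
 * with true, exceptions are collected per URL and reported together at the end.
 */
function runOnPages(array $urls, callable $callback, bool $waitForCrawl): void
{
    $failures = [];
    foreach ($urls as $url) {
        try {
            $callback($url);
        } catch (Throwable $e) {
            if (!$waitForCrawl) {
                throw $e; // fail fast on the first error
            }
            $failures[$url] = $e->getMessage(); // remember where it happened
        }
    }
    if ($failures !== []) {
        $lines = [];
        foreach ($failures as $url => $message) {
            $lines[] = "$url: $message";
        }
        throw new RuntimeException(implode("\n", $lines));
    }
}

// Example callback: fails on any URL containing "broken".
$check = function (string $url): void {
    if (strpos($url, 'broken') !== false) {
        throw new RuntimeException('check failed');
    }
};

$urls = ['/home', '/broken-1', '/about', '/broken-2'];

$report = '';
try {
    runOnPages($urls, $check, true); // visit every page, then report
} catch (RuntimeException $e) {
    $report = $e->getMessage();
}
echo $report, PHP_EOL;
```

With the flag set to false, the same call would stop at the first failing page instead of visiting the remaining ones.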

Options can be set either in the constructor or with the corresponding setter:

$crawler = new Crawler($this->getSession(), ["MaxCrawl" => 10]);
// or
$crawler->setMaximumCrawl(10);
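The documented defaults can also be written out as a plain array. The sketch below shows one way user-supplied options could be merged over those defaults; `resolveCrawlerOptions` and `CRAWLER_DEFAULTS` are hypothetical names that are not part of BehatCrawler, and rejecting unknown keys is an assumption made here for illustration, not documented behaviour.

```php
<?php
declare(strict_types=1);

// Defaults as documented in the options table.
const CRAWLER_DEFAULTS = [
    'Depth'             => 0,     // 0 = unlimited
    'MaxCrawl'          => 0,     // 0 = unlimited
    'HTMLOnly'          => true,
    'internalLinksOnly' => true,
    'waitForCrawl'      => false,
];

/**
 * Hypothetical helper: merges user options over the documented defaults
 * and rejects option names that are not in the table (for example typos).
 */
function resolveCrawlerOptions(array $options): array
{
    $unknown = array_diff_key($options, CRAWLER_DEFAULTS);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }
    // Later arrays win, so user values override the defaults.
    return array_merge(CRAWLER_DEFAULTS, $options);
}

$resolved = resolveCrawlerOptions(['MaxCrawl' => 10, 'waitForCrawl' => true]);
var_export($resolved);
```

Note that the option names are case-sensitive array keys in this sketch, so `'maxcrawl'` would be rejected rather than silently ignored.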

Start Crawling

After creating and setting up the crawler, start crawling by passing your function to startCrawling as a PHP callable.

Please refer to the PHP Callables documentation for more details.

Examples:

Closure::fromCallable is used to pass a private method as a parameter:

// function1 is a private method
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));
// function2 is a public method
$crawler->startCrawling([$this, 'function2']);
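The reason a private method needs Closure::fromCallable can be shown without Behat at all. In the standalone sketch below, `PageChecks` and `invoke` are hypothetical stand-ins (not part of BehatCrawler): the array form of a private method is not callable from outside its class, while a Closure created inside the class is.

```php
<?php
declare(strict_types=1);

final class PageChecks
{
    // Private: not callable from outside this class.
    private function status(): string
    {
        return 'checked';
    }

    // Wraps the private method in a Closure bound to $this,
    // so code outside the class can invoke it.
    public function privateAsCallback(): Closure
    {
        return Closure::fromCallable([$this, 'status']);
    }
}

// Stand-in for any code that receives and invokes a callback.
function invoke(callable $callback)
{
    return $callback();
}

$checks = new PageChecks();

// From outside the class, the array form of a private method is not callable.
$arrayFormCallable = is_callable([$checks, 'status']);

// The Closure created inside the class is callable anywhere.
$result = invoke($checks->privateAsCallback());

var_dump($arrayFormCallable, $result);
```

Closure::fromCallable is available from PHP 7.1 onward.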

For functions that take one or more arguments, pass the arguments as an array in the second parameter:

$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);
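One way to picture the argument array, assuming the values are handed to your function in order as positional parameters (the README does not show the internals), is PHP's argument unpacking. The `callOnEachPage` helper and the page list below are hypothetical and only illustrate that assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: calls $callback once per page, spreading $args
// into positional parameters (assumed behaviour, not confirmed by the source).
function callOnEachPage(array $pages, callable $callback, array $args = []): array
{
    $results = [];
    foreach ($pages as $page) {
        $results[$page] = $callback(...$args);
    }
    return $results;
}

// A two-parameter function, comparable to function4 in the example above.
$describe = function (string $label, int $limit): string {
    return sprintf('%s (limit %d)', $label, $limit);
};

$results = callOnEachPage(['/home', '/about'], $describe, ['checked', 3]);
print_r($results);
```

Under this reading, the array must list the arguments in the same order as the function's parameters.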

Usage Example

use Behat\Crawler\Crawler;

// Crawler with custom settings
$crawler = new Crawler($this->getSession(), ["internalLinksOnly" => true, "HTMLOnly" => true, "MaxCrawl" => 20, "waitForCrawl" => true]);
// Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); // starts crawling
// Function with one or more arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);

In a Behat step function:

    /**
     * @Given /^I crawl the website with a maximum of (\d+) level$/
     */
    public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
    {
        $crawler = new Crawler($this->getSession(), ["Depth" => $arg1]);
        $crawler->startCrawling([$this, 'test']);
    }

Copyright

Copyright (c) 2020 Mostapha El Sabah <elsabah.mostapha@gmail.com>

Maintainers

Mostapha El Sabah Piopi