A powerful web crawler and web scraper with Blackfire support


v1.9.0 2020-06-23 15:05 UTC


Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraping application. It provides a DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses.

Some Blackfire Player use cases:

  • Crawl a website/API and check expectations -- aka Acceptance Tests;
  • Scrape a website/API and extract values;
  • Monitor a website;
  • Test code with unit test integration (PHPUnit, Behat, Codeception, ...);
  • Test code behavior from the outside thanks to the native Blackfire Profiler integration -- aka Unit Tests from the HTTP layer (tm).

Blackfire Player executes scenarios written in a special DSL (files should end with .bkf).


To run .bkf files, download the Blackfire Player:

curl -OLsS

Use php blackfire-player.phar to run the player or make it executable and move it to a directory under your PATH:

chmod +x blackfire-player.phar
mv blackfire-player.phar /usr/local/bin/blackfire-player


Blackfire Player is licensed under the MIT Open-Source license. Its source code is hosted on GitHub.


Use the run command to execute a scenario file:

blackfire-player run scenario.bkf

You can also run scenarios contained in a .blackfire.yaml file:

blackfire-player run .blackfire.yaml

Use the --endpoint option to override the endpoint defined in the scenario file:

blackfire-player run scenario.bkf --endpoint=

Use the --json option to output a JSON report:

blackfire-player run scenario.bkf --json

Use the --variable option to override variable values:

blackfire-player run scenario.bkf --variable "foo=bar" --variable "bar=foo"

Use the --concurrency option to run scenarios in parallel (experimental):

blackfire-player run scenario.bkf --concurrency=5

Use -v to get logs about the progress of the player, or use the --tracer option to store all requests and responses on disk.

The command returns 1 if at least one scenario fails, 0 otherwise.
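The exit code makes the player easy to wire into a CI script. The following is a minimal sketch, assuming blackfire-player is on your PATH; the run_scenarios helper and the BLACKFIRE_PLAYER override variable are hypothetical names introduced for illustration and are not part of the player:

```shell
#!/bin/sh
# Hypothetical helper: runs the player and reports the outcome from its exit code.
# BLACKFIRE_PLAYER is an assumed override for the binary name (useful for dry runs).
run_scenarios() {
    player="${BLACKFIRE_PLAYER:-blackfire-player}"
    if "$player" run "$@"; then
        echo "all scenarios passed"
    else
        echo "the player reported a failure" >&2
        return 1
    fi
}

# Usage (commented out so that sourcing this file runs nothing):
# run_scenarios scenario.bkf || exit 1
```

Propagating the non-zero status with return 1 lets the calling CI job fail as soon as the player does.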

Crawling an HTTP application

Blackfire Player lets you crawl an application thanks to descriptive scenarios written in a domain specific language:

name "A build made of scenario"

# Default endpoint
# Can be overridden with the "--endpoint=" option
endpoint ""

scenario
    name "Scenario Name"

    visit url('/')
        expect status_code() == 200

This example shows how to make a request on an HTTP application and verify that it behaves as expected by writing expectations (here, that the status code of the response is 200).

Store the scenario in a scenario.bkf, and run it:

blackfire-player run scenario.bkf

# or
php blackfire-player.phar run scenario.bkf

Add more requests to a scenario by indenting lines as below:

    visit url('/')
        expect status_code() == 200

    visit url('/blog/')
        expect status_code() == 200


Indentation defines the structure, as in Python scripts or YAML files. Validate .bkf files with the validate command: blackfire-player validate scenario.bkf.
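Validation can be applied to a whole directory before running anything. The following is a rough sketch that only relies on the validate command shown above; the validate_all helper and the BLACKFIRE_PLAYER override variable are hypothetical names, and it assumes validate exits with a non-zero status for an invalid file:

```shell
#!/bin/sh
# Hypothetical helper: validates every .bkf file in a directory and counts failures.
# BLACKFIRE_PLAYER is an assumed override for the binary name.
validate_all() {
    dir="${1:-.}"
    player="${BLACKFIRE_PLAYER:-blackfire-player}"
    failures=0
    for file in "$dir"/*.bkf; do
        # Skip the unexpanded pattern when the directory has no .bkf files
        [ -e "$file" ] || continue
        if ! "$player" validate "$file"; then
            echo "invalid: $file" >&2
            failures=$((failures + 1))
        fi
    done
    echo "$failures file(s) failed validation"
    # Succeed only when every file validated
    [ "$failures" -eq 0 ]
}

# Usage: validate_all ./scenarios && blackfire-player run ./scenarios/main.bkf
```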

A scenario is a sequence of HTTP calls (steps) that share the HTTP session and cookies. Scenario definitions are declarative; the order of settings (like expectations) within a "step" does not matter.

Instead of making discrete requests like above, you can also interact with the HTTP response if the content type is HTML by clicking on links, submitting forms, or following redirections (see Making Requests for more information):

    visit url('/')
        expect status_code() == 200

    click link('Read more')
        expect status_code() == 200


If your scenario does not work as expected, use -v to get a more verbose output.


You can add comments in a scenario file by prefixing the line with #:

# This is a comment
    # Comments are ignored
    visit url('/')
        expect status_code() == 200

Making Requests

There are several ways you can jump from one HTTP request to the next.

Visiting a Page with visit

visit goes directly to the referenced HTTP URL (defaults to the GET HTTP method unless you define one explicitly):

    visit url('/')
        method 'POST'

You can also pass a request body:

    visit url('/')
        method 'PUT'
        body '{ "title": "New Title" }'


An expression can be written on several lines with the following syntax:

    visit url('/login')
        method 'POST'
        body
        """
        {
            "user": "john",
            "password": "doe"
        }
        """

Clicking on a Link with click

click clicks on a link in an HTML page (takes an expression as an argument):

    click link("Add a blog post")

Submitting Forms with submit

submit submits a form in an HTML page (takes an expression as an argument); parameters to submit with the form are defined via param entries:

    submit button("Submit")
        param title 'Happy Scraping'
        param content 'Scraping with Blackfire Player is so easy!'

        # File Upload:
        # the path is relative to the current .bkf file
        # the name parameter is optional
        param image file('relative/path/to/image.png', 'blackfire.png')

Values can also be randomly generated via the fake() function:

    submit button("Submit")
        param title fake('sentence', 5)
        param content join(fake('paragraphs', 3), "\n\n")


fake() uses the Faker library under the hood.

Following Redirections

HTTP redirections are never followed automatically to let you write expectations and assertions on redirect responses:

    visit "redirect.php"
        expect status_code() == 302
        expect header('Location') == '/redirected.php'

Use follow to follow one redirection:

    visit "redirect.php"
        expect status_code() == 302
        expect header('Location') == '/redirected.php'

    follow
        expect status_code() == 200

follow_redirects switches the player to automatically follow all redirections:

    follow_redirects true


    visit "redirect.php"

Please note that when using follow_redirects, expectations (expect) and assertions (assert) are checked on the redirecting response (so, before the redirection). Use a follow step if you need to check them after the redirection.

Embedding Scenarios with include

include lets you embed repetitive steps in several scenarios, so you avoid copying and pasting the same code over and over again.

In a groups.bkf file, write a group that contains the logic to log in:

group login
    visit url('/login')
        expect status_code() == 200

    submit button('Login')
        param user 'admin'
        param password 'admin'

Then, in another file, load the group and include it when you need it:

load "groups.bkf"

scenario
    name "Scenario Name"

    include login

    visit url('/admin')
        expect status_code() == 200

Configuring the Request

Each step can be configured via the following options.

Setting a Header with header

header sets a header:

    visit url('/')
        header "Accept-Language: en-US"


Simulating a specific browser is as simple as overriding the default User-Agent and using fake():

    visit url('/')
        header 'User-Agent: ' ~ fake('firefox')

Setting a User and Password with auth

auth sets the Authorization header:

    visit url('/')
        auth "username:password"

Waiting after sending the request with wait

wait adds a delay in milliseconds after sending the request:

    visit url('/')
        wait 10000

The wait value can be any valid expression; get a random delay by using fake():

    visit url('/')
        wait fake('numberBetween', 1000, 3000)

Sending a JSON Body with json

json configures the request to upload JSON-encoded data as the body:

    visit url('/')
        method 'POST'
        param foo "bar"
        json true

Setting Options for all Steps

You can also set some of these options for all steps of a scenario:

    auth "username:password"
    header "Accept-Language: en-US"

... which can be disabled on any given step by setting the value to false:

    visit url('/')
        header "Accept-Language: false"
        auth false

Writing Expectations

Expectations are expressions evaluated against the current HTTP response. If one of them returns a falsy value, Blackfire Player stops the run and generates an error.

Expressions have access to the following functions:

  • current_url(): Returns the current URL;
  • status_code(): The HTTP status code for the current HTTP response;
  • header(): Returns the value of an HTTP header;
  • body(): The HTTP body for the current HTTP response;
  • trim(): Strip whitespace from the beginning and end of a string;
  • unique(): Removes duplicate values from an array;
  • join(): Join array elements with a string;
  • merge(): Merge one or more arrays;
  • regex(): Perform a regular expression match;
  • css(): Returns nodes matching the CSS selector (for HTML responses);
  • xpath(): Returns nodes matching the XPath selector (for HTML and XML responses);
  • json(): Returns JSON elements matching the JMESPath expression (for JSON responses);
  • transform(): Returns the elements of the given data matching the JMESPath expression.

The css() and xpath() functions return Symfony\Component\DomCrawler\Crawler instances; see the Symfony DomCrawler documentation for the methods you can call on them. The json() function returns a PHP array.

The json() function accepts JMESPath.

The result of a function call can be checked with operators. Learn more about the expression syntax and its operators in the Symfony documentation.

Here are some expression examples:

# return all HTML nodes matching ".post h2 a"
css(".post h2 a")

# return the text of the first node matching ".post h2 a"
css(".post h2 a").first().text()

# return the href attribute of the first node matching ".post h2 a"
css(".post h2 a").first().attr("href")

# check that "h1" contains "Welcome"
css("h1:contains('Welcome')").count() > 0

# same as above
css("h1").first().text() matches "/Welcome/"

# return the Age response HTTP header
header("Age")

# check that the HTML body contains "Welcome"
body() matches "/Welcome/"

# get a value

# get keys
json('arguments."sql.pdo.queries".keys(@)')

Using Variables

Variables can be defined to make your scenarios dynamic. Use set to define the default value:

scenario
    name "HTTP Cache"
    set env "dev"
    set urls [ ... ]

    when "prod" == env
        with url in urls
            # check HTTP cache, but only on production

And override it with the --variable option on the CLI:

blackfire-player run scenario.bkf --variable env=prod
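The same override can be scripted to run one scenario file against several values in turn. This is an illustrative sketch built only on the --variable option shown above; the run_for_envs helper and the BLACKFIRE_PLAYER override variable are hypothetical names, not part of the player:

```shell
#!/bin/sh
# Hypothetical helper: runs one scenario file once per value of the "env" variable.
# BLACKFIRE_PLAYER is an assumed override for the binary name.
run_for_envs() {
    scenario="$1"
    shift
    player="${BLACKFIRE_PLAYER:-blackfire-player}"
    status=0
    for value in "$@"; do
        echo "running $scenario with env=$value"
        # Keep going after a failure, but remember it for the final exit code
        "$player" run "$scenario" --variable "env=$value" || status=1
    done
    return "$status"
}

# Usage: run_for_envs scenario.bkf dev prod
```

Collecting the status instead of stopping at the first failure means every value is exercised in a single pass.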

Organizing Scenario Files

To run scenarios defined in several files, you can use load instead of listing all the files as arguments to the player:

# load and execute all scenarios from files in this directory
load "*.bkf"

# load and execute all scenarios from files in all sub-directories
load "**/*.bkf"

Blackfire Profiler integration

Blackfire Player integrates seamlessly with Blackfire Profiler. Read the dedicated documentation to learn more about Blackfire Profiler integration.

Scraping Values

When crawling an HTTP application, you can extract values from HTTP responses:

    visit url('/')
        expect status_code() == 200
        set latest_post_title css(".post h2").first()
        set latest_post_href css(".post h2 a").first().attr("href")
        set latest_posts css(".post h2 a").extract('_text', 'href')
        set age header("Age")
        set content_type header("Content-Type")
        set token regex('/name="_token" value="([^"]+)"/')

set takes two arguments:

  • The name of the variable you want to store the value in;
  • An expression to evaluate.

Using json(), css(), and xpath() on JSON, HTML, and XML responses is recommended, but for pure text responses or complex values, you can use the generic regex() function.


regex() takes a regex as an argument and always returns the first match. Note that backslashes must be escaped by doubling them: "/\\.git/".

The values are also available at the end of a crawling session:

# use --json to display a report including variable values
blackfire-player run scenario.bkf --json
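To keep the scraped values for later processing, the report can be redirected to a file. This is a small sketch that assumes the JSON report is written to standard output; the save_report helper, the report.json default file name, and the BLACKFIRE_PLAYER override variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: runs a scenario with --json and stores the report in a file.
# Assumes the JSON report goes to standard output.
save_report() {
    scenario="$1"
    report="${2:-report.json}"
    player="${BLACKFIRE_PLAYER:-blackfire-player}"
    status=0
    "$player" run "$scenario" --json > "$report" || status=$?
    echo "report written to $report (exit code $status)" >&2
    # Preserve the player's exit code for the caller
    return "$status"
}

# Usage: save_report scenario.bkf build/report.json
```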

Variable values can also be injected before running another scenario:

scenario
    name "Scenario name"
    auth api_username ~ ':' ~ api_password
    set profile_uuid 'zzzz'

    visit url('/profiles/' ~ profile_uuid)
        expect status_code() == 200
        set sql_queries json('arguments."sql.pdo.queries".keys(@)')
        set store_url json("")

    visit url(store_url)
        method 'POST'
        body '{ "foo": "batman" }'
        expect status_code() == 200