This package is abandoned and no longer maintained. No replacement package was suggested.

A simple PHP Web Scraper

v1.0.1 2013-03-08 08:00 UTC

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides an API to crawl websites and extract data from the HTML/XML responses.


Requirements

Goutte works with PHP 5.3.3 or later.


Installation

Installing Goutte takes a single step: download the goutte.phar file.


Usage

Require the Goutte phar file to use Goutte in a script:

require_once '/path/to/goutte.phar';

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):

use Goutte\Client;

$client = new Client();

Make requests with the request() method:

$crawler = $client->request('GET', '');

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).

Click on links:

$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
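To see which address a link resolves to before clicking it, the lookup can be tried offline on an HTML string. This is a minimal sketch under stated assumptions, not part of Goutte itself: the markup and the `http://localhost/docs/` base URI are invented placeholders, and the DomCrawler classes are assumed to be loadable from the phar.

```php
<?php
require_once '/path/to/goutte.phar';

use Symfony\Component\DomCrawler\Crawler;

// Invented markup standing in for a fetched page.
$html = '<html><body><a href="/plugins">Plugins</a></body></html>';

// The second argument is the page's absolute URI (a placeholder here);
// relative hrefs are resolved against it.
$crawler = new Crawler($html, 'http://localhost/docs/');

// Same lookup as above: find the link by its text.
$link = $crawler->selectLink('Plugins')->link();

// getUri() returns the absolute URI that click() would request.
$target = $link->getUri();
echo $target, "\n";
```

Passing `$link` to `$client->click()` would then issue a request for that resolved URI.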

Submit forms:

$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
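To check which values a form would send without contacting a server, the form can be built from an inline HTML string. The following is a minimal sketch rather than documented Goutte usage: the markup, the `http://localhost/` base URI, and the `/login` action are made-up placeholders, and the DomCrawler classes are assumed to be loadable from the phar.

```php
<?php
require_once '/path/to/goutte.phar';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical login page; the markup is invented for this sketch.
$html = <<<'HTML'
<html><body>
  <form action="/login" method="post">
    <input type="text" name="signin[username]" value="" />
    <input type="password" name="signin[password]" value="" />
    <input type="submit" value="sign in" />
  </form>
</body></html>
HTML;

// The second argument is the page's absolute URI (a placeholder here);
// the form's action is resolved against it.
$crawler = new Crawler($html, 'http://localhost/');

// Locate the button, then get its form with the given values filled in.
$form = $crawler->selectButton('sign in')->form(array(
    'signin[username]' => 'fabien',
    'signin[password]' => 'xxxxxx',
));

// getValues() returns the field values keyed by field name.
$values = $form->getValues();

printf("%s %s\n", $form->getMethod(), $form->getUri());
printf("username: %s\n", $values['signin[username]']);
```

Inspecting the form this way can help confirm field names before passing the same array to `$client->submit()`.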

Extract data:

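$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    // Stop and report the text of the first matching node.
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}

// text() returns a string; %d converts it to an integer for display.
printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());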
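The same extraction calls can be tried offline by building a Crawler from an HTML string. This is a minimal sketch under stated assumptions: the markup and the `countTasks` helper are hypothetical, and the DomCrawler and CssSelector classes are assumed to be loadable from the phar.

```php
<?php
require_once '/path/to/goutte.phar';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: returns the task count shown on a page,
 * or null when the #nb_tasks element is missing.
 */
function countTasks(Crawler $crawler)
{
    $nodes = $crawler->filter('#nb_tasks');

    // text() throws on an empty node list, so check count() first.
    return $nodes->count() ? (int) trim($nodes->text()) : null;
}

// Invented markup standing in for a fetched page.
$page = new Crawler(
    '<html><body><p id="nb_tasks">42</p>'
    . '<ul><li>Write</li><li>Test</li></ul></body></html>'
);

var_dump(countTasks($page));

// Iterating a Crawler yields the underlying DOM nodes.
foreach ($page->filter('li') as $node) {
    echo $node->nodeValue, "\n";
}
```

Guarding with `count()` avoids an exception when a selector matches nothing, which is why the helper returns null instead of calling `text()` directly.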

More Information

Read the documentation of the BrowserKit and DomCrawler Symfony Components for more information about what you can do with Goutte.

Technical Information

Goutte is a thin wrapper around the following fine PHP libraries:

  • Symfony Components: BrowserKit, ClassLoader, CssSelector, DomCrawler, Finder, and Process

  • Guzzle


License

Goutte is licensed under the MIT license.