socloz / goutte
A simple PHP Web Scraper
Installs: 20 167
Dependents: 1
Suggesters: 0
Security: 0
Stars: 0
Watchers: 11
Forks: 918
Type: application
Requires
- php: >=5.3.0
- ext-curl: *
- guzzle/guzzle: >=3.0, <3.4
- symfony/browser-kit: ~2.1
- symfony/css-selector: ~2.1
- symfony/dom-crawler: ~2.1
- symfony/finder: ~2.1
- symfony/process: ~2.1
This package is not auto-updated.
Last update: 2020-01-20 05:26:49 UTC
README
Goutte is a screen scraping and web crawling library for PHP.
Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.
Requirements
Goutte works with PHP 5.3.3 or later.
Installation
Installing Goutte is as easy as it can get. Download the Goutte.phar file and you're done!
Usage
Require the Goutte phar file to use Goutte in a script:
require_once '/path/to/goutte.phar';
Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):
use Goutte\Client;
$client = new Client();
Make requests with the request() method:
$crawler = $client->request('GET', 'http://www.symfony-project.org/');
The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
Click on links:
$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
Submit forms:
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
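The keys passed to submit() are the field names as they appear in the form's HTML name attributes, brackets included. As a minimal sketch of what a PHP server would do with such names, the snippet below URL-encodes the two fields by hand and decodes them with parse_str(), which applies the same bracket rules PHP uses for $_POST. It uses only built-in PHP functions and does not call Goutte; the request body it builds is an illustration, not necessarily byte-for-byte what Goutte sends.

```php
<?php
// Flat field names, exactly as written in the submit() call above.
$fields = array(
    'signin[username]' => 'fabien',
    'signin[password]' => 'xxxxxx',
);

// Build a URL-encoded request body from the flat names.
$pairs = array();
foreach ($fields as $name => $value) {
    $pairs[] = urlencode($name) . '=' . urlencode($value);
}
$body = implode('&', $pairs);

// Decode it the way PHP decodes form data: brackets become nested arrays.
parse_str($body, $parsed);

echo $body, "\n";
print_r($parsed);
```

On the receiving side the two values end up under a single signin key, as $parsed['signin']['username'] and $parsed['signin']['password'].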
Extract data:
$nodes = $crawler->filter('.error_list');
if ($nodes->count())
{
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}
printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
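To try the extraction step without a network connection or Goutte installed, the following sketch runs comparable queries with PHP's built-in DOM extension. The HTML string is a hypothetical stand-in for a fetched page, and the XPath expressions are hand-written approximations of the CSS selectors .error_list and #nb_tasks; they are assumptions for illustration, not the exact expressions the CssSelector component generates.

```php
<?php
// Hypothetical page content; both elements are present so both queries match.
$html = <<<HTML
<html><body>
<ul class="messages error_list"><li>Bad credentials</li></ul>
<span id="nb_tasks">12</span>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Rough XPath equivalent of the CSS class selector ".error_list".
$errors = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' error_list ')]"
);
// Rough XPath equivalent of the CSS id selector "#nb_tasks".
$tasks = $xpath->query("//*[@id='nb_tasks']");

// Check that something matched before reading its text.
$errorText = $errors->length > 0 ? trim($errors->item(0)->textContent) : null;
$taskCount = $tasks->length > 0 ? (int) $tasks->item(0)->textContent : null;

printf("Error: %s\n", $errorText);
printf("Nb tasks: %d\n", $taskCount);
```

Checking the match count first mirrors the count() guard in the snippet above, so an absent element is handled explicitly instead of being read blindly.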
More Information
Read the documentation of the BrowserKit and DomCrawler Symfony Components for more information about what you can do with Goutte.
Technical Information
Goutte is a thin wrapper around the following fine PHP libraries:
- Symfony Components: BrowserKit, ClassLoader, CssSelector, DomCrawler, Finder, and Process
License
Goutte is licensed under the MIT license.