helloiamlukas / laravel-chrome
A Laravel Wrapper for Chrome Headless. Get the DOM of any webpage.
Installs: 4 216
Dependents: 0
Suggesters: 0
Security: 0
Stars: 8
Watchers: 1
Forks: 4
Open Issues: 0
Requires
- php: ^7.1
- helloiamlukas/chrome-php: ^2.0
Requires (Dev)
- orchestra/testbench: ^3.6
- phpunit/phpunit: ^6.1|^7.0
This package is auto-updated.
Last update: 2021-09-16 19:48:49 UTC
README
Get the DOM of any webpage by using headless Chrome.
💡 This is a Laravel wrapper of helloiamlukas/chrome-php.
Requirements
This package requires the Puppeteer Chrome Headless Node library.
If you want to install it on Ubuntu 16.04 you can do it like this:
sudo apt-get update
curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
sudo apt-get install -y nodejs gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
sudo npm install --global --unsafe-perm puppeteer
sudo chmod -R o+rx /usr/lib/node_modules/puppeteer/.local-chromium
Installation
You can install this package via composer by running:
composer require helloiamlukas/laravel-chrome
After that, the package will automatically register itself.
To publish the configuration file, you need to run:
php artisan vendor:publish --provider="ChromeHeadless\ChromeHeadlessServiceProvider"
This will create a config file at config/chrome.php
.
Configuration
The configuration can be found at config/chrome.php
.
Custom Chrome Path
You can specify a custom path to your Chrome installation.
/* |-------------------------------------------------------------------------- | Chrome Path |-------------------------------------------------------------------------- | | Manually set the path where Google Chrome is installed. | */ 'exec_path' => '/path/to/chrome',
Custom User Agent
You can specify a custom user agent. By default the standard Chrome Headless user agent will be used.
/* |-------------------------------------------------------------------------- | User Agent |-------------------------------------------------------------------------- | | Change the user agent that will be used by Google Chrome. | */ 'user_agent' => 'nice-user-agent',
Timeout
You can specify a timeout after which the process will be killed. The timeout should be given in seconds.
/* |-------------------------------------------------------------------------- | Timeout |-------------------------------------------------------------------------- | | Specify a timeout in seconds. | (null = no timeout) | */ 'timeout' => 10,
If the process runs out of time a Symfony\Component\Process\Exception\ProcessTimedOutException
will be thrown.
Viewport
You can specify a custom viewport that will be used when you make a request. By default the Chrome Headless standard of 800x600px will be used.
/* |-------------------------------------------------------------------------- | Viewport |-------------------------------------------------------------------------- | | Specify a viewport. | */ 'viewport' => [ 'width' => 1920, 'height' => 1080 ],
Blacklist
You can specify a list of regular expressions for files that should not be loaded when you request a website. These expressions will be checked against the url of the file.
/* |-------------------------------------------------------------------------- | Blacklist |-------------------------------------------------------------------------- | | Specify a list of files that should not be loaded. | */ 'blacklist' => [ 'www.google-analytics.com', 'analytics.js' ],
Custom Headers
You can specify custom headers which will be used for the request.
/* |-------------------------------------------------------------------------- | Additional Request Headers |-------------------------------------------------------------------------- | | Specify additional headers. | */ 'headers' => [ 'DNT' => 1 // DO NOT TRACK ],
Usage
Here is a quick example how to use this package:
use ChromeHeadless\ChromeHeadless; $html = ChromeHeadless::url('https://example.com')->getHtml();
Instead of getting the DOM as a string, you can also use thegetDOMCrawler()
method, which will return a Symfony\Component\DomCrawler\Crawler
instance.
use ChromeHeadless\ChromeHeadless; $dom = ChromeHeadless::url('https://example.com')->getDOMCrawler(); $title = $dom->filter('title')->text();
This makes it easy to filter the DOM for specific elements. Check the full documentation here.
Testing
You can run the tests by using
composer test