doghouseagency / puphpeteer
A Puppeteer bridge for PHP, supporting the entire API.
Requires
- php: >=7.3
- ext-json: *
- nigelcunningham/rialto: master
- psr/log: ^1.0 || ^2.0 || ^3.0
- vierbergenlars/php-semver: master
Requires (Dev)
- monolog/monolog: ^2.0
- phpunit/phpunit: ^9.0
- symfony/console: ^4.0|^5.0|^6.0
- symfony/process: ^4.0|^5.0|^6.0
This package is not auto-updated.
Last update: 2024-11-07 11:12:43 UTC
README
PHP 8.1 compatibility:
This package relies on php-semver, which is currently (28 Feb 2023) not patched with PHP 8.1 support. A patch is available at https://github.com/RobinDev/php-semver/tree/patch-1, which can be installed using composer in the following way:
In the composer.json of the project that requires puphpeteer, add a custom repository to the "repositories" section:
"RobinDev/php-semver": {
"type": "vcs",
"url": "https://github.com/RobinDev/php-semver"
}
Then composer require the upgraded versions of Puphpeteer, Rialto and php-semver:
composer require nigelcunningham/puphpeteer:master nigelcunningham/rialto:master vierbergenlars/php-semver:dev-patch-1
This will pull in the forked php-semver.
I'm writing this having just got the above steps to work for me. If further modifications are required, I'll fork php-semver too and update the steps above.
===
A Puppeteer bridge for PHP, supporting the entire API. Based on Rialto, a package to manage Node resources from PHP.
Here are some examples borrowed from Puppeteer's documentation and adapted to PHP's syntax:
Example - navigating to https://example.com and saving a screenshot as example.png:
use NigelCunningham\Puphpeteer\Puppeteer;
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');
$page->screenshot(['path' => 'example.png']);
$browser->close();
Example - evaluate a script in the context of the page:
use NigelCunningham\Puphpeteer\Puppeteer;
use NigelCunningham\Rialto\Data\JsFunction;
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');
// Get the "viewport" of the page, as reported by the page.
$dimensions = $page->evaluate(JsFunction::createWithBody("
return {
width: document.documentElement.clientWidth,
height: document.documentElement.clientHeight,
deviceScaleFactor: window.devicePixelRatio
};
"));
printf('Dimensions: %s', print_r($dimensions, true));
$browser->close();
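Example - send a POST request:
use NigelCunningham\Puphpeteer\Puppeteer;
use NigelCunningham\Rialto\Data\JsFunction;
// $options['form_params'] is assumed to be an array of form fields defined by the caller.
$puppeteer = new Puppeteer();
$browser = $puppeteer->launch();
$page = $browser->newPage();
// Intercept every request so it can be rewritten before it is sent.
$page->setRequestInterception(true);
$page->on('request', new JsFunction(
['interceptedRequest'],
"
var data = {
'method': 'POST',
'postData': '" . http_build_query($options['form_params']) . "'
};
interceptedRequest.continue(data);
"
));
$response = $page->goto('https://example.com');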
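The POST example above splices a PHP string into a quoted JavaScript literal by hand. As a minimal sketch of an alternative, the JavaScript body can be generated with `json_encode`, which takes care of quoting. `buildPostContinueBody` is a hypothetical helper name used for illustration only; it is not part of PuPHPeteer or Rialto.

```php
<?php
// Hypothetical helper (not part of PuPHPeteer): builds the JavaScript body
// passed to JsFunction in the POST example.
function buildPostContinueBody(array $formParams): string
{
    $overrides = [
        'method' => 'POST',
        // Explicit '&' separator so the result does not depend on php.ini settings.
        'postData' => http_build_query($formParams, '', '&'),
    ];

    // json_encode() emits a valid JavaScript object literal, so no manual quoting is needed.
    return 'interceptedRequest.continue(' . json_encode($overrides) . ');';
}

echo buildPostContinueBody(['name' => 'Ada Lovelace', 'lang' => 'php']), PHP_EOL;
```

The returned string can then be used as the function body, for instance `new JsFunction(['interceptedRequest'], buildPostContinueBody($options['form_params']))`.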
Requirements and installation
This package requires PHP >= 7.3 and Node >= 8.
Install it with these two command lines:
composer require nesk/puphpeteer
npm install @nesk/puphpeteer
Notable differences between PuPHPeteer and Puppeteer
Puppeteer's class must be instantiated
Instead of requiring Puppeteer:
const puppeteer = require('puppeteer');
You have to instantiate the Puppeteer class:
$puppeteer = new Puppeteer;
This will create a new Node process controlled by PHP.
You can also pass options to the constructor (see Rialto's documentation). PuPHPeteer adds the following option to them:
[
// Logs the output of Browser's console methods (console.log, console.debug, etc...) to the PHP logger
'log_browser_console' => false,
]
⏱ Want to use some timeouts higher than 30 seconds in Puppeteer's API?
If you use some timeouts higher than 30 seconds, you will have to set a higher value for the `read_timeout` option (default: `35`):
```php
$puppeteer = new Puppeteer([
    'read_timeout' => 65, // In seconds
]);
$puppeteer->launch()->newPage()->goto($url, [
    'timeout' => 60000, // In milliseconds
]);
```
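The two values above use different units (seconds for `read_timeout`, milliseconds for Puppeteer's `timeout`), which is easy to get wrong. The following sketch derives one from the other. `readTimeoutFor` is a hypothetical helper, and the 5-second margin is only an assumption inferred from the numbers in this README (35 s for the 30 s default, 65 s for 60 s); it is not a documented rule.

```php
<?php
// Hypothetical helper: converts a Puppeteer timeout (milliseconds) into a
// read_timeout value (seconds), leaving some headroom for PHP to receive the result.
function readTimeoutFor(int $puppeteerTimeoutMs, int $marginSeconds = 5): int
{
    // Round up so a fractional second never shortens the read timeout.
    return (int) ceil($puppeteerTimeoutMs / 1000) + $marginSeconds;
}

$timeoutMs = 60000;
echo readTimeoutFor($timeoutMs), PHP_EOL; // 65, the value used in the example above
```

The result could then be passed as `'read_timeout' => readTimeoutFor($timeoutMs)` when constructing `Puppeteer`, with the same `$timeoutMs` given to `goto()`.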
No need to use the await keyword
With PuPHPeteer, every method call or property getting/setting is synchronous.
Some methods have been aliased
The following methods have been aliased because PHP doesn't support the $ character in method names:
- $ => querySelector
- $$ => querySelectorAll
- $x => querySelectorXPath
- $eval => querySelectorEval
- $$eval => querySelectorAllEval
Use these aliases just like you would have used the original methods:
$divs = $page->querySelectorAll('div');
Evaluated functions must be created with JsFunction
Functions evaluated in the context of the page must be created with the JsFunction class, and their bodies must be written in JavaScript instead of PHP.
use NigelCunningham\Rialto\Data\JsFunction;
$pageFunction = JsFunction::createWithParameters(['element'])
->body("return element.textContent");
Exceptions must be caught with ->tryCatch
If an error occurs in Node, a Node\FatalException will be thrown and the process closes; you will then have to create a new instance of Puppeteer.
To avoid that, you can ask Node to catch these errors by prepending your instruction with ->tryCatch:
use NigelCunningham\Rialto\Exceptions\Node;
try {
$page->tryCatch->goto('invalid_url');
} catch (Node\Exception $exception) {
// Handle the exception...
}
With ->tryCatch, a Node\Exception is thrown instead, and the Node process stays alive and usable.
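Because the Node process survives a Node\Exception, the same page object can be reused for another attempt. Here is a minimal sketch of a generic retry wrapper. `retryOn` is a hypothetical helper, not part of PuPHPeteer or Rialto, and whether retrying makes sense depends on the error being handled.

```php
<?php
// Hypothetical helper: runs $action and retries it when it throws an exception
// of class $exceptionClass, up to $maxAttempts attempts in total.
function retryOn(string $exceptionClass, int $maxAttempts, callable $action)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $action($attempt);
        } catch (\Throwable $exception) {
            // Rethrow unrelated errors, or the last error once attempts are exhausted.
            if (!($exception instanceof $exceptionClass) || $attempt >= $maxAttempts) {
                throw $exception;
            }
        }
    }
}

// Intended use with PuPHPeteer (assumes $page and $url already exist):
// retryOn(Node\Exception::class, 3, function () use ($page, $url) {
//     return $page->tryCatch->goto($url);
// });
```

Without ->tryCatch a retry would not help, since a Node\FatalException closes the process and the page object becomes unusable.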
Puppeteer plugins
To use puppeteer-extra plugins, add them to your project:
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Then override the JavaScript inclusion with the js_extra option:
$puppeteer = new Puppeteer([
'js_extra' => /** @lang JavaScript */ "
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
instruction.setDefaultResource(puppeteer);
"
]);
License
The MIT License (MIT). Please see License File for more information.
Logo attribution
PuPHPeteer's logo is composed of:
- Puppet by Luis Prado from the Noun Project.
- Elephant by Lluisa Iborra from the Noun Project.
Thanks to Laravel News for picking the icons and colors of the logo.