revolution / salvager
Tiny WebCrawler for Laravel using Playwright
Requires
- php: ^8.3
- illuminate/support: ^11.0||^12.0
- playwright-php/playwright: ^1.0
Requires (Dev)
- laravel/pint: ^1.26
- mockery/mockery: ^1.6.1
- orchestra/testbench: ^10.8
- phpunit/phpunit: ^12.4
- revolution/laravel-boost-copilot-cli: ^1.0
README
Version 2
Version 2 has been reworked as a simple package built on playwright-php/playwright. It implements only minimal functionality, because you can use playwright-php/playwright directly for anything more.
Version 2.2 also adds support for the Vercel agent-browser.
Requirements
- PHP >= 8.3
- Laravel >= 11.x
Installation
composer require revolution/salvager
Playwright
Install Playwright browsers:
vendor/bin/playwright-install --browsers
Or install Playwright browsers with OS dependencies:
vendor/bin/playwright-install --with-deps
Vercel agent-browser
Install agent-browser globally, then install Chromium with agent-browser install. Salvager runs agent-browser as a Laravel Process.
npm install -g agent-browser
agent-browser install
To use a custom agent-browser or Chromium binary, set its path in your .env file:
# .env
SALVAGER_AGENT_BROWSER_PATH=/path/to/agent-browser
SALVAGER_AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium
SALVAGER_AGENT_BROWSER_OPTIONS=
Usage
Playwright
The browser is terminated when Salvager::browse() returns, so collect any data you need inside the Salvager::browse() closure. The Page object cannot be used outside of Salvager::browse().
use Revolution\Salvager\Facades\Salvager;
use Playwright\Page\Page;

class SalvagerController
{
    public function __invoke()
    {
        // $url and $text are captured by reference so their values
        // are still available after browse() has closed the browser.
        Salvager::browse(function (Page $page) use (&$url, &$text) {
            $page->goto('https://example.com/');
            $page->screenshot(config('salvager.screenshots').'example.png');

            $url = $page->url();
            $text = $page->locator('p')->first()->innerText();
        });

        dump($url);
        dump($text);
    }
}
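The controller above relies on PHP's by-reference closure capture (use (&$url, &$text)) to get data out of the closure before the browser goes away. The following is a minimal, self-contained sketch of just that PHP pattern. fakeBrowse() is a hypothetical stand-in for Salvager::browse() that does not launch any browser; it only mimics the "callback runs, then the resource is gone" lifecycle.

```php
<?php

// Hypothetical stand-in for Salvager::browse(): it builds a fake "page",
// passes it to the callback, and discards it when the function returns.
function fakeBrowse(callable $callback): void
{
    $page = ['url' => 'https://example.com/', 'text' => 'Example text'];
    $callback($page);
}

// Captured by value: the assignment inside the closure is not visible outside.
$byValue = null;
fakeBrowse(function (array $page) use ($byValue) {
    $byValue = $page['url'];
});

// Captured by reference: assignments inside the closure survive the call.
fakeBrowse(function (array $page) use (&$url, &$text) {
    $url = $page['url'];
    $text = $page['text'];
});

var_dump($byValue); // NULL
var_dump($url);     // string(20) "https://example.com/"
```

Without the &, the closure would write to its own copy and the caller would see nothing, which is why the README's example captures by reference.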
For more control, launch the browser yourself with Salvager::launch() and close it when you are done.
use Playwright\Browser\BrowserContextInterface;
use Revolution\Salvager\Facades\Salvager;

/** @var BrowserContextInterface $browser */
$browser = Salvager::launch();

try {
    $page = $browser->newPage();
    $page->goto('https://example.com/');

    // Do something...
} finally {
    // Always close the browser, even if an exception was thrown above
    $browser->close();
}
Vercel agent-browser
use Revolution\Salvager\AgentBrowser;
use Revolution\Salvager\Facades\Salvager;

Salvager::agent(function (AgentBrowser $agent) use (&$url, &$text, &$html) {
    $agent->userAgent('Chromium');
    $agent->open('https://example.com/');
    $agent->screenshot(config('salvager.screenshots').'agent-test.png');

    $url = $agent->url();
    $text = $agent->text('xpath=//p[1]', '--json');
    $html = $agent->html('css=html');

    // Run any agent-browser command
    $result = $agent->run(command: '', args: '', options: '');

    $agent->close();
});
Because text() and html() are based on Playwright's page.locator(), a CSS selector that matches more than one element results in an error. To select one of several matching elements, use an XPath selector instead.
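When picking one element with XPath, note that //p[1] does not mean "the first p on the page": it matches every p that is the first p child of its parent, which can still be several elements. A self-contained sketch using PHP's built-in DOMXPath (XPath 1.0, the same language browsers evaluate) counts what each form matches. It does not call agent-browser, and the HTML is made up for illustration.

```php
<?php

// Made-up HTML: two paragraphs share one parent, a third has its own parent.
$html = <<<'HTML'
<html><body>
  <div><p>first</p><p>second</p></div>
  <div><p>third</p></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "//p" matches every <p> element.
$all = $xpath->query('//p');

// "//p[1]" matches each <p> that is the first <p> child of its parent,
// so here it matches two elements ("first" and "third").
$firstChildren = $xpath->query('//p[1]');

// "(//p)[2]" takes the second <p> in document order: exactly one element.
$second = $xpath->query('(//p)[2]');

echo $all->length, "\n";                  // 3
echo $firstChildren->length, "\n";        // 2
echo $second->item(0)->textContent, "\n"; // second
```

By the same XPath rules, a selector such as xpath=(//p)[2] should address only the second paragraph; this is an inference from XPath semantics rather than a documented Salvager example.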
LICENSE
MIT