ordinary9843 / html-master
Analyze and crawl the HTML structure of static and dynamic websites.
Requires
- php: >7.1
- guzzlehttp/guzzle: ^6.5
Requires (Dev)
- phpunit/phpunit: >7.0
This package is auto-updated.
Last update: 2024-10-23 09:22:55 UTC
README
Intro
Analyze and crawl the HTML structure of static and dynamic websites.
Requirements
This library has the following requirements:
- PHP 7.1+
- Node.js 12+
- Browser (default path: /usr/bin/chromium)
Installation
Install the system dependencies:
apt-get install nodejs
apt-get install chromium # or `chromium-browser`
Require the package via Composer:
composer require ordinary9843/html-master
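Depending on the distribution, the browser package may install the binary as `chromium` or `chromium-browser`, so it can help to check which path actually exists before passing it to `setExecutablePath()`. The sketch below assumes the binary lives in one of the listed candidate paths; `findBrowserPath()` is a hypothetical helper written for illustration, not part of html-master.

```php
<?php

/**
 * Hypothetical helper (not part of html-master): return the first candidate
 * path that points to an executable file, or null if none does.
 *
 * @param string[] $candidates Paths to check, in order of preference.
 */
function findBrowserPath(array $candidates): ?string
{
    foreach ($candidates as $path) {
        // is_file() rules out directories, which is_executable() can also accept.
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

// Assumed install locations for the two package names mentioned above.
$browserPath = findBrowserPath(['/usr/bin/chromium', '/usr/bin/chromium-browser']);

if ($browserPath === null) {
    echo "No Chromium binary found in the candidate paths.\n";
} else {
    echo "Using browser at {$browserPath}\n";
}
```

If a path is found, it can be passed to `setExecutablePath()` instead of a hard-coded string.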
Usage
Example usage:
<?php

require './vendor/autoload.php';

use Ordinary9843\HtmlMaster;

$htmlMaster = new HtmlMaster();

// When using this package for the first time, enabling debug mode is recommended.
$htmlMaster->setDebug(true);

// Set the browser path for dynamic mode.
$htmlMaster->setExecutablePath('/usr/bin/chromium');

/**
 * Set the wait time (in seconds) for dynamic mode.
 *
 * If you cannot obtain the rendered HTML of a dynamic (SPA) site,
 * try increasing the wait time so the site's JavaScript can finish rendering.
 */
$htmlMaster->setWaitSeconds(5);

// Set the connection timeout and request timeout (in seconds) for static mode.
$htmlMaster->setConnectTimeout(5);
$htmlMaster->setTimeout(5);

/**
 * Whether the crawler runs in static or dynamic mode depends on whether the
 * browser path is set correctly. Use `setExecutablePath()` to set it.
 *
 * Output: [
 *     'title' => '',
 *     'description' => '',
 *     'meta' => [
 *         'keywords' => '',
 *         'description' => '',
 *         'viewport' => '',
 *         'author' => '',
 *         'copyright' => '',
 *         'robots' => '',
 *         'og' => [],
 *         'twitter' => []
 *     ],
 *     'icons' => [],
 *     'images' => [],
 *     'css' => [],
 *     'js' => []
 * ]
 */
$result = $htmlMaster->parse('https://github.com/ordinary9843');

/**
 * Get all messages.
 *
 * Output: [
 *     '[INFO] Message.',
 *     '[ERROR] Message.'
 * ]
 */
$messages = $htmlMaster->getMessages();
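The message list can be post-processed with plain PHP, for example to surface only errors after a crawl. This is a minimal sketch that assumes every message starts with a `[LEVEL]` prefix, as in the documented `getMessages()` output; `filterMessagesByLevel()` is a hypothetical helper, not part of the library.

```php
<?php

/**
 * Hypothetical helper (not part of html-master): keep only the messages whose
 * level prefix matches $level, e.g. "ERROR" for entries like "[ERROR] Message.".
 *
 * @param string[] $messages Messages in the "[LEVEL] text" form.
 * @return string[] Matching messages, re-indexed from 0.
 */
function filterMessagesByLevel(array $messages, string $level): array
{
    $prefix = '[' . $level . ']';

    // strpos(...) === 0 is a PHP 7-compatible "starts with" check.
    return array_values(array_filter($messages, function (string $message) use ($prefix): bool {
        return strpos($message, $prefix) === 0;
    }));
}

// Sample input in the documented format; in practice this would come from
// $htmlMaster->getMessages().
$messages = ['[INFO] Message.', '[ERROR] Message.'];

foreach (filterMessagesByLevel($messages, 'ERROR') as $error) {
    echo $error, "\n";
}
```

`array_values()` is applied because `array_filter()` preserves the original keys, which would otherwise leave gaps in the returned list.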
Testing
composer test
License
(The MIT License)