tomaj / meta-scraper
Page meta scraper library
Requires
- php: >= 7.1.0
- guzzlehttp/guzzle: ^6.0 | ^7.0
Requires (Dev)
- codeclimate/php-test-reporter: 0.4.4
- phpunit/phpunit: ^8 || ^9
- squizlabs/php_codesniffer: ^3.5
Suggests
- ext-dom: Required for Tomaj\Scraper\Parser\OgDomParser
- ext-libxml: Required for Tomaj\Scraper\Parser\OgDomParser
README
Page meta scraper parses meta information from a page.
Installation
via composer:
composer require tomaj/meta-scraper
How to use
Example:
use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new OgParser()];
$meta = $scraper->parse(file_get_contents('http://www.google.com/'), $parsers);
var_dump($meta);
Or you can use the parseUrl method (which internally uses the Guzzle library):
use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new OgParser()];
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);
Parsers
Three parsers are included in the package, and you can create your own by implementing the Tomaj\Scraper\Parser\ParserInterface interface.
The three included parsers:
- Tomaj\Scraper\Parser\OgParser: based on og (Open Graph) meta attributes in HTML (built on regular expressions)
- Tomaj\Scraper\Parser\OgDomParser: also based on og (Open Graph) meta attributes in HTML (built on the PHP DOM extension)
- Tomaj\Scraper\Parser\SchemaParser: based on the schema JSON structure
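To illustrate the difference between the regular-expression and DOM approaches to reading Open Graph tags, here is a minimal standalone sketch. It is not the library's implementation: the function names extractOgWithRegex and extractOgWithDom are hypothetical, the regular expression is deliberately simple (it assumes double-quoted attributes with property before content), and the DOM variant needs ext-dom and ext-libxml.

```php
<?php
// Illustrative sketch only; not the code used by OgParser or OgDomParser.

/**
 * Extract Open Graph properties with a regular expression.
 * Assumes property comes before content and both are double-quoted.
 */
function extractOgWithRegex(string $html): array
{
    $result = [];
    $pattern = '/<meta\s+property="og:([^"]+)"\s+content="([^"]*)"/i';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the name after "og:", $match[2] is the content value
            $result[$match[1]] = html_entity_decode($match[2], ENT_QUOTES, 'UTF-8');
        }
    }
    return $result;
}

/**
 * Extract the same properties by walking <meta> elements with the DOM extension.
 */
function extractOgWithDom(string $html): array
{
    $result = [];
    // Collect libxml warnings internally so sloppy HTML does not emit PHP warnings
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($document->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if (strpos($property, 'og:') === 0) {
            $result[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $result;
}

$html = '<html><head>'
    . '<meta property="og:title" content="Example title">'
    . '<meta content="Example description" property="og:description">'
    . '</head><body></body></html>';

// This simple pattern misses the second tag because its attributes are reversed.
var_dump(extractOgWithRegex($html));
// The DOM walk does not depend on attribute order and finds both tags.
var_dump(extractOgWithDom($html));
```

In this sketch the regular-expression variant has no extension dependency but is sensitive to markup variations, while the DOM variant tolerates them at the cost of requiring the suggested extensions.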
You can combine these parsers. Data that the first parser does not find is filled in from the second parser.
use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\SchemaParser;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new SchemaParser(), new OgParser()];
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);
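The fallback rule for combined parsers can be pictured with a small standalone sketch. This is an assumption about the general idea, not the library's code: plain associative arrays stand in for the result object, and mergeByPriority is a hypothetical helper. Each field keeps the first non-empty value found when walking the parser results in order.

```php
<?php
// Illustrative sketch only; plain arrays stand in for the library's result object.

/**
 * Merge parser results in priority order.
 * A field keeps the first non-empty value; later results only fill gaps.
 *
 * @param array[] $results one associative array per parser, highest priority first
 */
function mergeByPriority(array $results): array
{
    $merged = [];
    foreach ($results as $result) {
        foreach ($result as $field => $value) {
            $alreadySet = isset($merged[$field]) && $merged[$field] !== '';
            if (!$alreadySet && $value !== null && $value !== '') {
                $merged[$field] = $value;
            }
        }
    }
    return $merged;
}

// Hypothetical results: the first parser found a title but no description.
$fromSchema = ['title' => 'Schema title', 'description' => null];
$fromOg = [
    'title' => 'OG title',
    'description' => 'OG description',
    'image' => 'https://example.com/a.png',
];

$merged = mergeByPriority([$fromSchema, $fromOg]);
var_dump($merged);
```

Here the title comes from the first result, while the description and image are filled in from the second, which mirrors placing SchemaParser before OgParser in the $parsers array.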