tomaj/meta-scraper

Page meta scraper library

4.1.0 2020-06-05 09:29 UTC

Last update: 2020-07-05 11:27:06 UTC


README

Page meta scraper parses meta information from a page.

Installation

Via Composer:

composer require tomaj/meta-scraper

How to use

Example:

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;

$scraper = new Scraper();
$parsers = [new OgParser()];
// Fetch the HTML yourself and pass it to parse() as a string.
$meta = $scraper->parse(file_get_contents('http://www.google.com/'), $parsers);
var_dump($meta);

Alternatively, use the parseUrl method, which fetches the page for you (it uses the Guzzle library internally):

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\OgParser;
$scraper = new Scraper();
$parsers = [new OgParser()];
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);

Parsers

Two parsers are included in the package, and you can create your own by implementing the Tomaj\Scraper\Parser\ParserInterface interface.

Included parsers:

  • Tomaj\Scraper\Parser\OgParser - based on og meta attributes in the HTML
  • Tomaj\Scraper\Parser\SchemaParser - based on the schema JSON structure
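
To make "og meta attributes" concrete, here is a minimal, self-contained sketch of pulling og:* tags out of an HTML string with PHP's built-in DOM extension. It is not the library's OgParser implementation. The function name extractOgTags and the sample HTML are made up for illustration, and the real parser may return a different structure.

```php
<?php
// Illustrative sketch only: NOT the code used by tomaj/meta-scraper.
// extractOgTags() is a hypothetical helper name.

function extractOgTags(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely strict.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if (strpos($property, 'og:') === 0) {
            // Drop the "og:" prefix, so "og:title" is stored under "title".
            $tags[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Sample input (made up): two og tags and one unrelated meta tag.
$html = '<html><head>'
    . '<meta property="og:title" content="Example title">'
    . '<meta property="og:description" content="Example description">'
    . '<meta name="viewport" content="width=device-width">'
    . '</head><body></body></html>';

var_dump(extractOgTags($html));
```

Meta tags without a property attribute starting with "og:" are ignored, so only the title and description entries end up in the result.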

You can combine these parsers. Data that is not found by the first parser is filled in with data from the second parser.

use Tomaj\Scraper\Scraper;
use Tomaj\Scraper\Parser\SchemaParser;
use Tomaj\Scraper\Parser\OgParser;
$scraper = new Scraper();
$parsers = [new SchemaParser(), new OgParser()];
$meta = $scraper->parseUrl('http://www.google.com/', $parsers);
var_dump($meta);
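
The fallback order described above can be pictured with plain arrays. The sketch below is an assumption about the general idea, not the library's actual merge code: mergeParserResults is a hypothetical helper, the field names and values are invented, and treating null or an empty string as "not found" is a guess.

```php
<?php
// Illustrative sketch only: NOT part of tomaj/meta-scraper.
// mergeParserResults() is a hypothetical helper name.

function mergeParserResults(array $results): array
{
    $merged = [];
    foreach ($results as $result) {
        foreach ($result as $field => $value) {
            // Assumption: null and '' both mean "this parser found nothing".
            $alreadySet = isset($merged[$field]) && $merged[$field] !== '';
            if (!$alreadySet && $value !== null && $value !== '') {
                // Earlier parsers win; later ones only fill the gaps.
                $merged[$field] = $value;
            }
        }
    }
    return $merged;
}

// Made-up results in the same order as [new SchemaParser(), new OgParser()].
$fromSchema = ['title' => 'Schema title', 'description' => null];
$fromOg = [
    'title' => 'OG title',
    'description' => 'OG description',
    'image' => 'https://example.com/cover.png',
];

var_dump(mergeParserResults([$fromSchema, $fromOg]));
```

In this sketch the title comes from the first array, while the description and image come from the second because the first did not supply them. Swapping the order of the parsers would swap which source takes priority.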