innmind/crawler

Library to extract meaningful information from a web page

6.1.0 2021-02-13 11:35 UTC

This package is auto-updated.

Last update: 2024-04-13 18:38:30 UTC


README

This tool lets you extract useful information from a web resource, whether it is an HTML page, an image, or anything else.

Installation

composer require innmind/crawler

Usage

use function Innmind\Crawler\bootstrap;
use Innmind\OperatingSystem\Factory;
use Innmind\UrlResolver\UrlResolver;
use Innmind\Url\Url;
use Innmind\Http\{
    Message\Request\Request,
    Message\Method\Method,
    ProtocolVersion,
};
use function Innmind\Html\bootstrap as reader;

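// Build the operating system abstraction that provides the HTTP client and the clock
$os = Factory::build();

// Assemble the crawler from its four dependencies
$crawl = bootstrap(
    $os->remote()->http(), // HTTP transport used to fetch the resource
    $os->clock(),          // clock abstraction
    reader(),              // HTML reader from innmind/html
    new UrlResolver,       // URL resolver
);

// Crawl a page by sending a GET request over HTTP/2.0
$resource = $crawl(
    new Request(
        Url::of('https://en.wikipedia.org/wiki/H2g2'),
        new Method('GET'),
        new ProtocolVersion(2, 0),
    ),
);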

Here $resource is an instance of HttpResource.