innmind / crawler

Library to extract meaningful information out of a webpage

Package info

github.com/Innmind/Crawler

Language: HTML

pkg:composer/innmind/crawler

6.1.0 2021-02-13 11:35 UTC

MIT 3225b8748e742bc9fd82b220334a87e772021c3c

Baptiste Langlade <langlade.baptiste.woop@gmail.com>

Published on Packagist.org on 2021-02-13 11:36 UTC

This package is auto-updated.

Last update: 2026-07-15 13:30:31 UTC

README

This tool allows you to extract a lot of useful information out of a web page (whether it is HTML, an image, or anything else).

Installation

composer require innmind/crawler

Usage

use function Innmind\Crawler\bootstrap;
use Innmind\OperatingSystem\Factory;
use Innmind\UrlResolver\UrlResolver;
use Innmind\Url\Url;
use Innmind\Http\{
    Message\Request\Request,
    Message\Method\Method,
    ProtocolVersion,
};
use function Innmind\Html\bootstrap as reader;

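// Build the operating system abstraction that provides the HTTP client and the clock
$os = Factory::build();

// Bootstrap the crawler with an HTTP client, a clock, an HTML reader and a URL resolver
$crawl = bootstrap(
    $os->remote()->http(),
    $os->clock(),
    reader(),
    new UrlResolver
);

// Crawl the page by sending a GET request over HTTP/2
$resource = $crawl(
    new Request(
        Url::of('https://en.wikipedia.org/wiki/H2g2'),
        new Method('GET'),
        new ProtocolVersion(2, 0),
    ),
);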

Here $resource is an instance of HttpResource.