innmind/crawler-app

Crawl the web and publish the graph to an API

1.5.2 2020-10-25 17:28 UTC

This package is auto-updated.

Last update: 2024-11-29 05:38:41 UTC


README

This is an app to crawl the internet and publish resource attributes to a Library.

Installation

composer install
docker-compose up -d

Copy config/.env.dist to config/.env and adapt the URL of the AMQP server to your needs.
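
The copy step can be scripted. The following is a minimal sketch, not part of the project: `init_env` is a hypothetical helper name, and it assumes it is run from the project root, where the `config` directory lives. It only copies the template when no local file exists, so an already customised `config/.env` is never overwritten.

```shell
# Hypothetical helper (not shipped with the app): create the local env file
# from the distributed template, unless a local copy already exists.
# The optional first argument is the config directory; it defaults to "config".
init_env() {
    dir="${1:-config}"
    if [ ! -f "$dir/.env" ]; then
        cp "$dir/.env.dist" "$dir/.env"
    fi
}
```

Calling `init_env` with no argument from the project root performs the copy described above. The AMQP server URL still has to be edited by hand in `config/.env` afterwards.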

Usage

bin/crawler consume crawler

This will launch a consumer that reads the URLs to crawl.

bin/console crawl http://the.url/to/crawl https://innmind_library.host/

This will crawl http://the.url/to/crawl, extract the resource attributes and publish them to the library https://innmind_library.host/. It will automatically detect the API resource to publish to.
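
To crawl several URLs against the same library, the command above can be wrapped in a small loop. This is a sketch under assumptions: `crawl_all` and the `CRAWL_BIN` variable are hypothetical names introduced here for illustration, and the URL list is assumed to be a plain text file with one URL per line.

```shell
# Hypothetical helper (not shipped with the app): run the crawl command once
# per URL listed in a file, publishing every result to the same library.
# CRAWL_BIN defaults to the command shown above and can be overridden,
# for example with "echo" to preview the calls without crawling anything.
crawl_all() {
    url_file="$1"
    library="$2"
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'.
        case "$url" in ''|'#'*) continue ;; esac
        # Detach stdin so the command cannot consume the rest of the list;
        # stop at the first failure.
        "${CRAWL_BIN:-bin/console}" crawl "$url" "$library" < /dev/null || return 1
    done < "$url_file"
}
```

For example, `crawl_all urls.txt https://innmind_library.host/` would crawl each entry of a hypothetical `urls.txt` in order and stop at the first command that exits with an error.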