innmind / crawler-app
Crawl the web and publish the graph to an api
Type: project
Requires
- php: ~7.4
- innmind/amqp: ~3.0
- innmind/cli: ~2.0
- innmind/cli-framework: ^1.2
- innmind/crawler: ~6.0
- innmind/genome: ^3.0
- innmind/homeostasis: ~4.0
- innmind/installation-monitor: ~3.0
- innmind/ipc: ~3.0
- innmind/json: ^1.1
- innmind/logger: ~2.0
- innmind/operating-system: ~2.0
- innmind/rest-client: ~8.0
- innmind/robots-txt: ~5.0
- innmind/silent-cartographer: ~2.0
- innmind/stack: ^1.0
- monolog/monolog: ~2.0
- symfony/dotenv: ~5.0
Requires (Dev)
- giorgiosironi/eris: ^0.11.0
- innmind/debug: ~2.0
- phpunit/phpunit: ~8.0
- roave/security-advisories: dev-master
- vimeo/psalm: ~4.0
Provides
- innmind/genome-genes: 3.0
This package is auto-updated.
Last update: 2024-11-29 05:38:41 UTC
README
This is an app to crawl the internet and publish resource attributes to a Library.
Installation
composer install
docker-compose up -d
Copy config/.env.dist to config/.env and adapt the URL of the AMQP server to your needs.
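As a rough sketch, the AMQP line in config/.env might look like the one below. This assumes a local RabbitMQ broker reachable on its default port 5672 with the default guest credentials. It is not confirmed that docker-compose starts such a broker. The key name AMQP_SERVER is hypothetical, so keep whatever key config/.env.dist actually defines and only change its value.

```dotenv
# config/.env (sketch only)
# AMQP_SERVER is a hypothetical key name: reuse the key found in config/.env.dist.
# Standard AMQP URL shape: amqp://user:password@host:port
AMQP_SERVER=amqp://guest:guest@localhost:5672
```

Only the host, port and credentials should need adapting when the broker runs elsewhere.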
Usage
bin/crawler consume crawler
This launches a consumer that reads the URLs to crawl.
bin/console crawl http://the.url/to/crawl https://innmind_library.host/
This will crawl http://the.url/to/crawl, extract the resource attributes and publish them to the library https://innmind_library.host/. It automatically detects the API resource to publish to.