mediamonks / crawler
Crawl your own website with various clients for SEO and indexing purposes.
Requires
- php: ^5.5|^7.0
- fabpot/goutte: ^3.0
- league/uri: ^4.2
- psr/log: ^1.0
- symfony/dom-crawler: ^2.8|^3.0|^4.0
Requires (Dev)
- codeclimate/php-test-reporter: dev-master@dev
- mockery/mockery: ^0.9.4
- monolog/monolog: ^1.21
- phpunit/phpunit: ^4.8
Last update: 2024-10-29 04:47:40 UTC
README
MediaMonks Crawler
This tool lets you crawl a website and get a DOM object for every URL it finds. We use it to crawl our own site's pages, whether their content is rendered on the server, on the client, or both, by using the Prerender.io client. The resulting data can be used to build a full site search or to improve SEO for single-page applications.
Highlights
- Ships with Prerender & Prerender.io clients, uses Goutte by default
- Supports any Symfony BrowserKit client
- Supports both whitelisting and blacklisting of URLs
- Supports URL normalization, which lets you prevent duplicates caused by minor URL differences
- Implements the PSR-3 Logger Interface
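To make the normalization and whitelist/blacklist highlights concrete, here is a minimal sketch in plain PHP of what such filtering can look like. It does not use the library's API: the helpers `normalizeUrl` and `shouldCrawl`, the regex patterns, the example URLs, and the rule that the blacklist wins over the whitelist are all assumptions made for illustration. The documentation in the /doc folder describes the options the crawler actually offers.

```php
<?php
// Illustrative only: these helpers are hypothetical and are NOT part of the
// mediamonks/crawler API.

/**
 * Normalize a URL so that trivially different spellings compare equal:
 * lowercase the scheme and host, drop the fragment, strip trailing slashes.
 * User info (user:pass@) is ignored in this sketch.
 */
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    $port  = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path  = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return strtolower($parts['scheme']) . '://'
        . strtolower($parts['host']) . $port . $path . $query;
}

/**
 * Decide whether a URL should be crawled. Assumed rule: a blacklist match
 * always rejects; an empty whitelist accepts everything else.
 */
function shouldCrawl($url, array $whitelist, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    if (count($whitelist) === 0) {
        return true;
    }
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }
    return false;
}

// Hypothetical URLs discovered on a page.
$found = [
    'https://Example.com/about/',
    'https://example.com/about#team',
    'https://example.com/admin/login',
];

$whitelist = ['~^https://example\.com/~'];
$blacklist = ['~/admin/~'];

$queue = [];
foreach ($found as $url) {
    $normalized = normalizeUrl($url);
    if (shouldCrawl($normalized, $whitelist, $blacklist)) {
        $queue[$normalized] = true; // array keys deduplicate
    }
}
$queue = array_keys($queue);
```

In this sketch the first two URLs collapse into a single queue entry because they differ only in host casing, a trailing slash, and a fragment, while the third is rejected by the blacklist pattern.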
Documentation
Documentation and examples can be found in the /doc folder.
System Requirements
To use the library, you need:
- PHP >= 5.5.0
Install
Install this package by using Composer.
$ composer require mediamonks/crawler
Security
If you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.
License
The MIT License (MIT). Please see License File for more information.