mediamonks/crawler

Crawl your own website with various clients for SEO and indexing purposes.

2.0.0 2017-12-04 14:27 UTC

This package is auto-updated.

Last update: 2024-10-29 04:47:40 UTC




MediaMonks Crawler

This tool lets you crawl a website and get a DOM object for every URL it finds. We use it with the Prerender.io client to crawl our own site pages, whether their content is generated on the server, on the client, or both. The resulting data can be used to build a full site search or to improve SEO for single-page applications.

Highlights

  • Ships with Prerender & Prerender.io clients, uses Goutte by default
  • Supports any Symfony BrowserKit client
  • Supports both whitelisting and blacklisting of URLs
  • Supports URL normalization, which allows you to prevent duplicates caused by minor URL differences
  • Implements the PSR-3 Logger Interface
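
To make the normalization and whitelist/blacklist highlights more concrete, here is a minimal standalone sketch of the two ideas in plain PHP. It is not the library's API: the function names normalizeUrl and isUrlAllowed, the choice of normalization rules, and the use of regular expressions as patterns are all assumptions made for illustration. See the /doc folder for how the crawler itself is configured.

```php
<?php

// Hypothetical helper, NOT part of mediamonks/crawler.
// Reduces a URL to one canonical form so that minor differences
// (letter case of scheme/host, trailing slash, query parameter order,
// fragment) do not produce duplicate entries.
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    $scheme = strtolower(isset($parts['scheme']) ? $parts['scheme'] : 'http');
    $host = strtolower($parts['host']);
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    // Strip trailing slashes, but keep "/" for the site root.
    $path = rtrim(isset($parts['path']) ? $parts['path'] : '', '/');
    if ($path === '') {
        $path = '/';
    }

    // Sort query parameters by name; the fragment is dropped on purpose.
    $query = '';
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
        ksort($params);
        if (!empty($params)) {
            $query = '?' . http_build_query($params);
        }
    }

    return $scheme . '://' . $host . $port . $path . $query;
}

// Hypothetical helper, NOT part of mediamonks/crawler.
// A URL is rejected if it matches any blacklist pattern. If a whitelist
// is given, the URL must also match at least one whitelist pattern.
function isUrlAllowed($url, array $whitelist, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    if (empty($whitelist)) {
        return true;
    }
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }
    return false;
}
```

Under these assumed rules, normalizeUrl('HTTP://Example.com/About/?b=2&a=1#team') returns 'http://example.com/About?a=1&b=2', and isUrlAllowed('http://example.com/admin/login', [], ['~/admin/~']) returns false. Checking the blacklist first means an exclusion always wins over an inclusion, which is one reasonable design choice among several.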

Documentation

Documentation and examples can be found in the /doc folder.

System Requirements

To use the library, you need:

  • PHP >= 5.5.0

Install

Install this package by using Composer.

$ composer require mediamonks/crawler

Security

If you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.

License

The MIT License (MIT). Please see License File for more information.