buse974/simple-page-crawler

ZF3 module v0.3.0 - Provides a crawler to get web page information: title, meta, heading tags and images

Type: module

0.3.0 2013-01-22 13:19 UTC

This package is auto-updated.

Last update: 2024-10-16 02:07:33 UTC


README

Version 0.3.0 Created by Vincent Blanchon

Introduction

SimplePageCrawler is a web page crawler. It can retrieve the following information from a page:

  • Title
  • Meta (description, Open Graph, etc.)
  • H1, H2, etc.
  • List of the images
  • List of the links
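
To make the list above concrete, here is a minimal standalone sketch of how this kind of information can be pulled out of an HTML document with PHP's bundled DOM extension. It is not SimplePageCrawler's own code: the function name extractPageInfo() and the shape of the returned array are assumptions made for illustration only.

```php
<?php
// Illustrative sketch only: extractPageInfo() is a hypothetical helper,
// not part of the SimplePageCrawler API.

function extractPageInfo(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $info = ['title' => null, 'meta' => [], 'headings' => [], 'images' => [], 'links' => []];

    // Title: text of the first <title> element, if any.
    $titleNode = $xpath->query('//title')->item(0);
    if ($titleNode !== null) {
        $info['title'] = trim($titleNode->textContent);
    }

    // Meta: keyed by "name" (description, keywords) or "property" (Open Graph).
    foreach ($xpath->query('//meta[@content and (@name or @property)]') as $meta) {
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        $info['meta'][$key] = $meta->getAttribute('content');
    }

    // Headings: grouped by tag name (h1, h2, ...), in document order.
    foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $heading) {
        $info['headings'][strtolower($heading->nodeName)][] = trim($heading->textContent);
    }

    // Images and links: raw attribute values, not resolved against a base URL.
    foreach ($xpath->query('//img[@src]') as $img) {
        $info['images'][] = $img->getAttribute('src');
    }
    foreach ($xpath->query('//a[@href]') as $a) {
        $info['links'][] = $a->getAttribute('href');
    }

    return $info;
}

$html = '<html><head><title>Example</title>'
      . '<meta name="description" content="A sample page"></head>'
      . '<body><h1>Hello</h1><img src="/logo.png"><a href="/about">About</a></body></html>';

$info = extractPageInfo($html);
printf('The title is "%s"' . PHP_EOL, $info['title']);
```

The sketch works on an HTML string and leaves fetching the page to the caller. The module itself takes a URL and returns a page object, as shown in the Usage section.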

Usage

Get page information:

$crawler = $this->getServiceLocator()->get('SimplePageCrawler');
$page = $crawler->get('http://www.nytimes.com');

echo sprintf('The title is "%s"', $page->getTitle());
echo sprintf('The description is "%s"', $page->getMeta('description'));

You can also use the action helper:

$page = $this->simplePageCrawler('http://www.nytimes.com');

echo sprintf('The title is "%s"', $page->getTitle());
echo sprintf('The description is "%s"', $page->getMeta('description'));

Advanced usage

You can get Open Graph metadata:

$page = $this->simplePageCrawler('http://www.nytimes.com');
$metas = $page->getMeta()->getOpenGraph();
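
The README does not show what getOpenGraph() returns. As a rough illustration of what Open Graph metadata looks like once collected, the following standalone sketch gathers every meta tag whose property starts with "og:" into an array. The helper name extractOpenGraph() and the array layout are assumptions, not the module's actual API or return type.

```php
<?php
// Illustrative sketch only: extractOpenGraph() is a hypothetical helper,
// not part of the SimplePageCrawler API.

function extractOpenGraph(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings caused by invalid markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $openGraph = [];

    // Open Graph tags look like <meta property="og:title" content="...">.
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        // Drop the "og:" prefix so "og:title" is stored under "title".
        $key = substr($meta->getAttribute('property'), 3);
        $openGraph[$key] = $meta->getAttribute('content');
    }

    return $openGraph;
}

$html = '<html><head>'
      . '<meta property="og:title" content="Example">'
      . '<meta property="og:type" content="website">'
      . '<meta name="description" content="Not an Open Graph tag">'
      . '</head><body></body></html>';

foreach (extractOpenGraph($html) as $name => $value) {
    printf('%s: %s' . PHP_EOL, $name, $value);
}
```

If a property appears more than once (for example several og:image tags), this sketch keeps only the last value. A fuller implementation would collect repeated properties into lists.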