adabra/googlesitemapparser

Google Sitemap is a Sitemap standard supported by Ask.com, Google, Yahoo and MSN Search. This library reads such Sitemaps and parses all URLs from them.

1.1.6 2020-10-21 15:21 UTC

This package is not auto-updated.

Last update: 2024-12-13 09:51:40 UTC


README

An easy-to-use library to parse sitemaps compliant with the Google Standard

Install

Install via composer:

{
    "require": {
        "tzfrs/googlesitemapparser": "1.0.*"
    }
}

Run composer install or composer update.

Getting Started

Basic parsing

Parses the data from the sitemap.xml of your server. Supports both the .xml and the text format.

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;

try {
    $posts = (new GoogleSitemapParser('http://tzfrs.de/sitemap.xml'))->parse();
    foreach ($posts as $post) {
        print $post . '<br>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}
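
To illustrate the difference between the two supported formats, here is a minimal sketch that extracts URLs from either an XML sitemap or a plain-text sitemap (one URL per line). It is not the library's own implementation: the function name sketchExtractUrls and the sample data are hypothetical, and it assumes the SimpleXML extension is available.

```php
<?php
// Hypothetical helper, not part of the library: returns the URLs found in a
// sitemap body that is either sitemaps.org XML or a plain-text URL list.
function sketchExtractUrls(string $body): array
{
    // Collect XML errors internally so plain-text input does not emit warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    if ($xml !== false) {
        // <urlset> holds <url><loc>, <sitemapindex> holds <sitemap><loc>.
        foreach ($xml->url as $entry) {
            $urls[] = trim((string) $entry->loc);
        }
        foreach ($xml->sitemap as $entry) {
            $urls[] = trim((string) $entry->loc);
        }
        return $urls;
    }

    // Not XML: treat the body as the text format, one URL per line.
    foreach (preg_split('/\R/', $body) as $line) {
        $line = trim($line);
        if ($line !== '' && filter_var($line, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $line;
        }
    }
    return $urls;
}

// Sample data (made up for this sketch).
$xmlBody = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/a</loc></url>'
    . '<url><loc>http://example.com/b</loc></url>'
    . '</urlset>';
$textBody = "http://example.com/a\nhttp://example.com/b\n";

print_r(sketchExtractUrls($xmlBody));
print_r(sketchExtractUrls($textBody));
```

Both calls yield the same two URLs, which is why a caller of parse() does not need to know which format the server uses.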

Parsing from robots.txt

Searches the robots.txt for Sitemap entries and parses the files they point to. Gzip-compressed sitemaps are also downloaded, extracted and parsed.

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;

try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/robots.txt'))->parseFromRobots();
    foreach ($posts as $post) {
        print $post . '<br>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}
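
The first step described above, finding the Sitemap entries in a robots.txt, can be sketched as follows. This is an illustration only and makes no claim about how parseFromRobots works internally; the function name sketchFindSitemaps and the sample robots.txt are hypothetical.

```php
<?php
// Hypothetical helper, not part of the library: returns the URLs of all
// "Sitemap:" lines in a robots.txt body, in order of appearance.
function sketchFindSitemaps(string $robotsTxt): array
{
    // The field name is matched case-insensitively at the start of a line;
    // commented-out lines ("# Sitemap: ...") do not match.
    preg_match_all('/^[ \t]*Sitemap:[ \t]*(\S+)/mi', $robotsTxt, $matches);
    return $matches[1];
}

// Sample robots.txt (made up for this sketch).
$robotsTxt = "User-agent: *\n"
    . "Disallow: /tmp\n"
    . "# Sitemap: http://example.com/ignored.xml\n"
    . "Sitemap: http://example.com/sitemap.xml\n"
    . "sitemap: http://example.com/sitemap2.xml.gz\n";

print_r(sketchFindSitemaps($robotsTxt));
```

Each URL found this way would then be fetched and parsed like a regular sitemap, with .gz files decompressed first.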

Including the priority for the sitemap entry in the response

If you also want the priority of each sitemap entry, set the 2nd parameter of the constructor to true. If the priority can't be found or is not set in the file, an empty string is returned.

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;
try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/robots.txt', true))->parseFromRobots();
    foreach ($posts as $post=>$priority) {
        print 'URL: '. $post . '<br>Priority: '. $priority . '<hr>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}
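
The shape of the result with priorities enabled (URL as key, priority as value, empty string when no priority is set) can be reproduced with a short sketch. It is an assumption-laden illustration, not the library's code: the function name sketchUrlsWithPriority and the sample XML are hypothetical, and SimpleXML is assumed to be available.

```php
<?php
// Hypothetical helper, not part of the library: maps each <loc> of an XML
// sitemap to its <priority>, using '' when the element is missing.
function sketchUrlsWithPriority(string $xmlBody): array
{
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlBody);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    if ($xml === false) {
        return $result; // not valid XML, nothing to report
    }
    foreach ($xml->url as $entry) {
        // Casting a missing child element to string gives an empty string.
        $result[trim((string) $entry->loc)] = trim((string) $entry->priority);
    }
    return $result;
}

// Sample sitemap (made up for this sketch); the second entry has no priority.
$sample = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/a</loc><priority>0.8</priority></url>'
    . '<url><loc>http://example.com/b</loc></url>'
    . '</urlset>';

foreach (sketchUrlsWithPriority($sample) as $url => $priority) {
    print 'URL: ' . $url . ' Priority: ' . $priority . PHP_EOL;
}
```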

Parsing compressed sitemaps

If you have a URL to a compressed sitemap, such as example.com/sitemap.xml.gz, use the parseCompressed method.

<?php
require __DIR__ . '/vendor/autoload.php';

use \tzfrs\GoogleSitemapParser;
use \tzfrs\Exceptions\GoogleSitemapParserException;
try {
    $posts = (new GoogleSitemapParser('http://www.sainsburys.co.uk/wcsstore/robots/sitemap_10151_4.xml.gz'))->parseCompressed();
    // Priorities are only returned when the constructor's 2nd parameter is true,
    // so this loop prints the URLs only.
    foreach ($posts as $post) {
        print $post . '<br>';
    }
} catch (GoogleSitemapParserException $e) {
    print $e->getMessage();
}
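
For readers curious what handling a .gz sitemap involves, the decompression step can be sketched with PHP's zlib functions. This is a minimal illustration under the assumption that the zlib extension is enabled; the function name sketchDecompress is hypothetical and does not reflect how parseCompressed is implemented.

```php
<?php
// Hypothetical helper, not part of the library: returns the decompressed body
// if $raw is gzip data, otherwise returns $raw unchanged.
function sketchDecompress(string $raw): string
{
    // Gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($raw, "\x1f\x8b", 2) !== 0) {
        return $raw;
    }
    $decoded = gzdecode($raw);
    if ($decoded === false) {
        throw new RuntimeException('Could not decompress gzip data');
    }
    return $decoded;
}

// Sample data (made up for this sketch): compress a tiny sitemap in memory.
$plain = '<urlset><url><loc>http://example.com/a</loc></url></urlset>';
$compressed = gzencode($plain);

print sketchDecompress($compressed) . PHP_EOL;
```

The decompressed string can then be parsed exactly like an uncompressed sitemap.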

Methods

parse
parseFromRobots
parseCompressed

Contributing is surely allowed! :-)

See the file LICENSE for licensing information.