bitandblack/sitemap

Creates a sitemap.xml by parsing the whole website.

2.1.0 2024-10-26 14:01 UTC


Last update: 2024-11-26 14:12:20 UTC


README


Bit&Black Sitemap

Creates a sitemap.xml by parsing the whole website including all language versions and all images.

If multiple language versions are found, multiple XML files will be written.
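To illustrate what splitting by language means, here is a minimal sketch in plain PHP (DOM extension only). It groups page URLs by a language code and builds one sitemaps.org urlset document per language. The function buildSitemapsPerLanguage() and the url/lang array shape are hypothetical and not part of bitandblack/sitemap; the files the library actually writes may look different.

```php
<?php

const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

/**
 * Hypothetical helper, not part of bitandblack/sitemap: groups page URLs by
 * language code and builds one sitemap document per language.
 *
 * @param array<int, array{url: string, lang: string}> $pages
 * @return array<string, string> Map of language code => sitemap XML
 */
function buildSitemapsPerLanguage(array $pages): array
{
    $documents = [];

    foreach ($pages as $page) {
        $lang = $page['lang'];

        // Create one <urlset> root per language the first time it is seen.
        if (!isset($documents[$lang])) {
            $document = new DOMDocument('1.0', 'UTF-8');
            $document->formatOutput = true;
            $document->appendChild($document->createElementNS(SITEMAP_NS, 'urlset'));
            $documents[$lang] = $document;
        }

        $document = $documents[$lang];
        $url = $document->createElementNS(SITEMAP_NS, 'url');
        $loc = $document->createElementNS(SITEMAP_NS, 'loc');
        // A text node escapes characters such as "&" in the URL.
        $loc->appendChild($document->createTextNode($page['url']));
        $url->appendChild($loc);
        $document->documentElement->appendChild($url);
    }

    // array_map() keeps the language codes as keys.
    return array_map(
        static fn (DOMDocument $document): string => $document->saveXML(),
        $documents
    );
}
```

Each returned string could then be saved under its own file name, for example one file per language code.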

Installation

This library is made for use with Composer. Add it to your project by running $ composer require bitandblack/sitemap.

Usage

Auto-generation of a sitemap for a whole website

Set up the sitemap generation like this:

<?php

use BitAndBlack\Sitemap\Config\YamlConfig;
use BitAndBlack\Sitemap\SitemapCrawler;
use BitAndBlack\Sitemap\Writer\FileWriter;

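// Keeps the crawling status so that an interrupted run can be continued.
$config = new YamlConfig('/path/to/config.yaml');

// Folder the generated XML files are written to.
$writer = new FileWriter('/path/to/xml/files');

$sitemapCrawler = new SitemapCrawler(
    $config,
    $writer
);

// Starts crawling, or continues a run that stopped at the time limit.
$sitemapCrawler->createSitemap('https://crawl.me');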

YamlConfig stores the information that is needed when the process has to run in multiple steps, so it needs a path where the config file can be stored.

FileWriter stores the XML files, so it needs to know the folder to put them in.

createSitemap() starts crawling. When the time limit is reached, the process stops and stores its status in the config file. Calling createSitemap() again continues the process from there. This is helpful for large websites that take a long time to crawl.
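The stop-and-continue behavior can be pictured with the following sketch. It is not the library's implementation: the function crawlStep(), the JSON state file (used instead of YAML to avoid an extra dependency) and the injected $fetchLinks callback are assumptions made for illustration. It crawls breadth-first until a time budget runs out, saves the queue and the visited URLs, and picks them up again on the next call.

```php
<?php

/**
 * Hypothetical sketch, not the library's implementation: crawls breadth-first
 * until a time budget is used up, then saves the queue and the visited URLs
 * to a JSON state file so that the next call continues where this one stopped.
 *
 * @param callable(string): string[] $fetchLinks Returns the links found on a page.
 * @return bool True when the crawl is complete, false when it paused.
 */
function crawlStep(string $startUrl, string $stateFile, float $timeBudget, callable $fetchLinks): bool
{
    // Resume from the saved state, or start fresh with only the start URL queued.
    $state = is_file($stateFile)
        ? json_decode(file_get_contents($stateFile), true)
        : ['queue' => [$startUrl], 'visited' => []];

    $deadline = microtime(true) + $timeBudget;

    // The deadline is checked before each page, so a page is never half-processed.
    while ($state['queue'] !== [] && microtime(true) < $deadline) {
        $url = array_shift($state['queue']);

        if (in_array($url, $state['visited'], true)) {
            continue;
        }

        $state['visited'][] = $url;

        foreach ($fetchLinks($url) as $link) {
            if (!in_array($link, $state['visited'], true) && !in_array($link, $state['queue'], true)) {
                $state['queue'][] = $link;
            }
        }
    }

    // Persist progress; an empty queue means there is nothing left to crawl.
    file_put_contents($stateFile, json_encode($state));

    return $state['queue'] === [];
}
```

Calling such a function repeatedly, for example once per cron run, until it returns true roughly corresponds to calling createSitemap() again after a stop. The linear in_array() lookups keep the sketch short; a real crawler would use a keyed lookup for large sites.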

Options

Page limit

Set a page limit that stops the crawler when the defined page count has been reached:

<?php

$sitemapCrawler->setCrawlingLimit(500);
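As a rough illustration of what such a limit does, the hypothetical function crawlWithLimit() below runs a breadth-first crawl over an in-memory link map and stops once the given number of pages has been visited. Whether the library counts pages exactly this way is an assumption.

```php
<?php

/**
 * Hypothetical sketch of a crawling limit: a breadth-first crawl that stops
 * once $limit pages have been visited, even if more links are still queued.
 *
 * @param array<string, string[]> $links Map of page URL => URLs it links to.
 * @return string[] The visited pages, in crawl order.
 */
function crawlWithLimit(string $startUrl, array $links, int $limit): array
{
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $limit) {
        $url = array_shift($queue);

        // Duplicates may be queued; they are skipped here when dequeued.
        if (in_array($url, $visited, true)) {
            continue;
        }

        $visited[] = $url;

        foreach ($links[$url] ?? [] as $link) {
            $queue[] = $link;
        }
    }

    return $visited;
}
```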

Crawling a single page

You can crawl a single page by using the PageCrawler class. The result is an object containing the page's headers, its body and information about its languages, links and media.

<?php

use BitAndBlack\Sitemap\PageCrawler;

$pageCrawler = new PageCrawler('https://www.bitandblack.com/de.html');
$page = $pageCrawler->getPage();
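The languages, links and media of a page typically come from hreflang alternates, anchors and image tags in its HTML. The following sketch extracts them with PHP's DOMDocument and DOMXPath. The function extractPageInfo() and its return shape are hypothetical and do not describe the object returned by getPage().

```php
<?php

/**
 * Hypothetical sketch, not the PageCrawler API: extracts hreflang alternates,
 * link targets and image sources from an HTML body.
 *
 * @return array{languages: array<string, string>, links: string[], images: string[]}
 */
function extractPageInfo(string $html): array
{
    $document = new DOMDocument();

    // Collect parser warnings (e.g. about HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $info = ['languages' => [], 'links' => [], 'images' => []];

    // <link rel="alternate" hreflang="de" href="..."> declares a language version.
    foreach ($xpath->query('//link[@rel="alternate"][@hreflang]') as $node) {
        $info['languages'][$node->getAttribute('hreflang')] = $node->getAttribute('href');
    }

    foreach ($xpath->query('//a[@href]') as $node) {
        $info['links'][] = $node->getAttribute('href');
    }

    foreach ($xpath->query('//img[@src]') as $node) {
        $info['images'][] = $node->getAttribute('src');
    }

    return $info;
}
```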

Manual generation of the sitemap.xml

You can also create the sitemap.xml yourself:

<?php

use BitAndBlack\Sitemap\Collection;
use BitAndBlack\Sitemap\Config\YamlConfig;
use BitAndBlack\Sitemap\Page;
use BitAndBlack\Sitemap\SitemapXML;

$collection = new Collection(
    new YamlConfig()
);

/**
 * The page doesn't need to exist.
 */
$page = new Page('https://example.org');

$sitemapXML = new SitemapXML($collection, [$page]);

file_put_contents(
    'sitemap.xml',
    $sitemapXML->getSitemap()->saveXML()
);

Available Crawlers

By default, the Bit&Black Sitemap library uses the Symfony HttpClient for requests.

You can use a different crawler, depending on your needs. Currently supported are:

The AutoPageCrawler detects the available crawler on its own.

However, you can set up the PageCrawler with a specific crawler, for example:

<?php

use BitAndBlack\Sitemap\PageCrawler;
use BitAndBlack\Sitemap\PageCrawler\ReactCrawler;

$pageCrawler = new PageCrawler();

// Use the ReactCrawler for the requests.
$pageCrawler->setPageCrawler(new ReactCrawler());
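The automatic detection mentioned above could work along the lines of this sketch, which returns the first installed class from a list of candidates. The function detectCrawlerClass() is hypothetical; which crawlers AutoPageCrawler actually checks, and in which order, is not documented here.

```php
<?php

/**
 * Hypothetical sketch of auto-detection, not the AutoPageCrawler source:
 * returns the first candidate class that is available, or null if none is.
 *
 * @param string[] $candidates Fully qualified class names in order of preference.
 */
function detectCrawlerClass(array $candidates): ?string
{
    foreach ($candidates as $className) {
        // class_exists() triggers registered autoloaders, such as Composer's.
        if (class_exists($className)) {
            return $className;
        }
    }

    return null;
}
```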

Help

If you have any questions, feel free to contact us at hello@bitandblack.com.

Further information about Bit&Black can be found at www.bitandblack.com.