bitandblack / sitemap
Creates a sitemap.xml by parsing the whole website.
Requires
- php: >=8.2
- ext-dom: *
- ext-libxml: *
- bitandblack/composer-helper: ^0 || ^1.0 || ^2.0
- bitandblack/helpers: ^1.0 || ^2.0
- nyholm/psr7: ^1.0
- php-http/discovery: ^1.0
- psr/http-client: ^1.0
- psr/http-message: ^1.0 || ^2.0
- psr/log: ^1.0 || ^2.0 || ^3.0
- symfony/console: ^5.0 || ^6.0 || ^7.0
- symfony/http-client: ^6.0 || ^7.0
- symfony/yaml: ^5.0 || ^6.0 || ^7.0
Requires (Dev)
- ext-curl: *
- ext-json: *
- guzzlehttp/guzzle: ^7.0
- monolog/monolog: ^3.0
- phpstan/phpstan: ^1.0
- phpunit/phpunit: ^11.0
- react/async: ^4.0
- react/http: ^1.0
- rector/rector: ^1.0
- spatie/browsershot: ^3.0 || ^4.0
- symplify/easy-coding-standard: ^12.0
README
Bit&Black Sitemap
Creates a sitemap.xml by parsing the whole website, including all language versions and all images.
If multiple language versions are found, multiple XML files will be written.
Installation
This library is made for use with Composer. Add it to your project by running $ composer require bitandblack/sitemap.
Usage
Auto-generation of a sitemap for a whole website
Set up the sitemap generation like this:
<?php
use BitAndBlack\Sitemap\Config\YamlConfig;
use BitAndBlack\Sitemap\SitemapCrawler;
use BitAndBlack\Sitemap\Writer\FileWriter;
$config = new YamlConfig('/path/to/config.yaml');
$writer = new FileWriter('/path/to/xml/files');
$sitemapCrawler = new SitemapCrawler(
    $config,
    $writer
);
$sitemapCrawler->createSitemap('https://crawl.me');
The YamlConfig stores information that is needed when the process has to run in multiple steps. It therefore needs a path where the config file can be stored.
The FileWriter stores the XML files, so it needs to know a folder for those files.
createSitemap() starts the crawling. If the time limit has been reached, the process will stop and store its status in the config file. If you call createSitemap() again, it will continue the process. This is helpful for large websites, which may take a long time to crawl.
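The stop-and-resume behaviour described above can be pictured with a small, self-contained sketch. It is not the library's implementation: the function runResumable(), the JSON state file and its "pending"/"done" keys are all hypothetical and only illustrate the general idea of persisting progress when a time limit is hit and picking it up on the next call.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration of a resumable job; NOT part of bitandblack/sitemap.
 * Processes queued URLs until the time limit is reached, then persists the rest.
 *
 * @param list<string>           $seedUrls  Used only when no state file exists yet.
 * @param callable(string): void $visit     Called once per processed URL.
 */
function runResumable(string $stateFile, array $seedUrls, float $timeLimit, callable $visit): bool
{
    // Resume from the stored state if a previous run was interrupted.
    $state = is_file($stateFile)
        ? json_decode((string) file_get_contents($stateFile), true, 512, JSON_THROW_ON_ERROR)
        : ['pending' => $seedUrls, 'done' => []];

    $deadline = microtime(true) + $timeLimit;

    while ($state['pending'] !== [] && microtime(true) < $deadline) {
        $url = array_shift($state['pending']);
        $visit($url);
        $state['done'][] = $url;
    }

    // Persist the progress so that the next call can continue from here.
    file_put_contents($stateFile, json_encode($state, JSON_THROW_ON_ERROR));

    // True once nothing is left to process.
    return $state['pending'] === [];
}
```

Calling such a function repeatedly, for example from a cron job, until it returns true mirrors calling createSitemap() again after a run has been stopped by the time limit.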
Options
Page limit
Set a page limit that stops the crawler when the defined page count has been reached:
<?php
$sitemapCrawler->setCrawlingLimit(500);
Crawling a single page
You can crawl a single page by using the PageCrawler class. It returns an object containing the page's headers, the body, and some information about the languages, links and media.
<?php
use BitAndBlack\Sitemap\PageCrawler;
$pageCrawler = new PageCrawler('https://www.bitandblack.com/de.html');
$page = $pageCrawler->getPage();
Manual generation of the sitemap.xml
You can also create the sitemap.xml on your own:
<?php
use BitAndBlack\Sitemap\Collection;
use BitAndBlack\Sitemap\Config\YamlConfig;
use BitAndBlack\Sitemap\Page;
use BitAndBlack\Sitemap\SitemapXML;
$collection = new Collection(
    new YamlConfig()
);
/**
 * The page doesn't need to exist.
 */
$page = new Page('https://example.org');
$sitemapXML = new SitemapXML($collection, [$page]);
file_put_contents(
    'sitemap.xml',
    $sitemapXML->getSitemap()->saveXML()
);
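For orientation, the following sketch shows the basic shape of a sitemap file as defined by the sitemaps.org protocol, built with plain ext-dom. The helper buildMinimalSitemap() is a hypothetical name used for illustration only; the output of SitemapXML may contain additional elements, such as language alternates or images, and may be structured differently.

```php
<?php

declare(strict_types=1);

/**
 * Builds a minimal sitemap document for the given URLs.
 * Illustration only; NOT part of bitandblack/sitemap.
 *
 * @param list<string> $urls
 */
function buildMinimalSitemap(array $urls): DOMDocument
{
    // Namespace defined by the sitemaps.org protocol.
    $namespace = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    $document = new DOMDocument('1.0', 'UTF-8');
    $document->formatOutput = true;

    $urlSet = $document->createElementNS($namespace, 'urlset');
    $document->appendChild($urlSet);

    foreach ($urls as $url) {
        $urlElement = $document->createElementNS($namespace, 'url');
        $loc = $document->createElementNS($namespace, 'loc');
        // A text node is escaped on output, so "&" in query strings becomes "&amp;".
        $loc->appendChild($document->createTextNode($url));
        $urlElement->appendChild($loc);
        $urlSet->appendChild($urlElement);
    }

    return $document;
}
```

Calling saveXML() on the returned document yields the XML string, in the same way as in the example above.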
Available Crawlers
By default, the Bit&Black Sitemap library uses the Symfony Http Client for requests.
You can use a different crawler, depending on your needs. Currently supported are:
The AutoPageCrawler will detect the available crawler on its own.
However, you can set up the PageCrawler with a specific crawler, for example:
<?php
use BitAndBlack\Sitemap\PageCrawler;
use BitAndBlack\Sitemap\PageCrawler\ReactCrawler;
$pageCrawler = new PageCrawler();
$pageCrawler->setPageCrawler(new ReactCrawler());
Help
If you have any questions, feel free to contact us at hello@bitandblack.com.
Further information about Bit&Black can be found at www.bitandblack.com.