alexoliverwd / sitemap-parser
A simple sitemap parser that makes parsing sitemaps easier.
Requires
- php: >=8.3
Requires (Dev)
- pestphp/pest: ^3.6
- phpstan/phpstan: ^2.0
This package is auto-updated.
Last update: 2025-02-24 22:03:55 UTC
Overview
The Sitemap Parser is a PHP class designed to parse XML sitemaps, whether they are local files or hosted URLs. It can handle both standard sitemaps and sitemap indexes, extracting location URLs and their last modification dates.
Class: Parser
Namespace: AOWD\SitemapParser
Features
- Parse XML sitemaps from local files or URLs
- Support for nested sitemaps (sitemap index files)
- Extract URL locations and last modification dates
- Built-in URL validation
- cURL-based URL content fetching
- Error handling with custom exceptions
Public Methods
parser(string $sitemap_location): array
The main method to parse a sitemap.
Parameters:
- $sitemap_location: String containing either a file path or URL to the sitemap
Returns:
- Array of parsed entries, each containing:
  - location: The URL from the sitemap
  - updated: The last modification date
Example:
$entries = Parser::parser('https://example.com/sitemap.xml');
// or
$entries = Parser::parser('/path/to/local/sitemap.xml');
Error Handling
The class uses a custom ParserException class for error handling. Exceptions are thrown for:
- Invalid XML content
- File reading errors
- URL fetching errors
Requirements
- PHP 8.3 or higher
- cURL extension
- SimpleXML extension
Example Usage
use AOWD\SitemapParser\Parser;
use AOWD\Exceptions\ParserException;

try {
    // Parse a sitemap
    $entries = Parser::parser('https://example.com/sitemap.xml');

    // Access the parsed entries
    foreach ($entries as $entry) {
        echo "URL: " . $entry['location'] . "\n";
        echo "Last Updated: " . $entry['updated'] . "\n";
    }
} catch (ParserException $e) {
    echo "Error: " . $e->getMessage();
}
Technical Details
Entry Format
Each parsed entry is stored as an associative array with the following structure:
[
    'location' => 'https://example.com/page',
    'updated' => '2023-01-01T12:00:00+00:00'
]
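Because each entry is a plain associative array, ordinary PHP array handling is enough to post-process the results. The following is a minimal sketch that keeps only entries updated on or after a given date. The helper name `entriesUpdatedSince` and the sample data are hypothetical and not part of the package, and it assumes that `updated` may be empty when a sitemap omits the last modification date.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): keep only entries whose
 * 'updated' value is on or after $since. Entries with a missing, empty,
 * or unparseable date are skipped.
 */
function entriesUpdatedSince(array $entries, DateTimeImmutable $since): array
{
    $recent = [];
    foreach ($entries as $entry) {
        $raw = $entry['updated'] ?? '';
        if (!is_string($raw) || $raw === '') {
            continue;
        }
        try {
            $updated = new DateTimeImmutable($raw);
        } catch (Exception $e) {
            // The date string could not be parsed; skip this entry.
            continue;
        }
        if ($updated >= $since) {
            $recent[] = $entry;
        }
    }
    return $recent;
}

// Sample data in the documented entry format (made up for illustration).
$entries = [
    ['location' => 'https://example.com/page', 'updated' => '2023-01-01T12:00:00+00:00'],
    ['location' => 'https://example.com/old', 'updated' => '2021-06-15T08:30:00+00:00'],
    ['location' => 'https://example.com/undated', 'updated' => ''],
];

$recent = entriesUpdatedSince($entries, new DateTimeImmutable('2022-01-01T00:00:00+00:00'));
foreach ($recent as $entry) {
    echo $entry['location'], "\n";
}
```

In real use, `$entries` would be the array returned by `Parser::parser()`.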
Supported Sitemap Types
- Standard XML sitemaps
- Sitemap index files
- Nested sitemaps
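In the sitemap protocol, a standard sitemap has a `urlset` root element, while a sitemap index has a `sitemapindex` root that points at further sitemaps. The sketch below shows one way to tell the two apart with SimpleXML. It illustrates the distinction only and is not the package's internal logic; the function name `sitemapKind` and the sample XML are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: classify an XML string by its root element.
 * Returns 'index', 'urlset', 'unknown', or 'invalid'.
 */
function sitemapKind(string $xml): string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $root = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($root === false) {
        return 'invalid';
    }

    return match ($root->getName()) {
        'sitemapindex' => 'index',
        'urlset' => 'urlset',
        default => 'unknown',
    };
}

// Sample documents (made up for illustration).
$indexXml = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>'
    . '</sitemapindex>';
$urlsetXml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>https://example.com/page</loc></url>'
    . '</urlset>';

echo sitemapKind($indexXml), "\n";
echo sitemapKind($urlsetXml), "\n";
```

A parser that supports nested sitemaps would typically fetch each `loc` listed in an index and parse it in turn.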
Validation
- URL validation using filter_var()
- File extension checking (.xml)
- File existence verification for local files
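The checks listed above could be combined roughly as follows. This is a sketch under assumptions: the order of the checks, the exception type, and the function name `classifySitemapLocation` are illustrative, and the package may apply its checks differently (for example, it may also check the extension of remote URLs).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decide whether a location is a URL or a local file.
 * Returns 'url' or 'file'; throws InvalidArgumentException otherwise.
 */
function classifySitemapLocation(string $location): string
{
    // filter_var() returns false when the string is not a valid URL.
    if (filter_var($location, FILTER_VALIDATE_URL) !== false) {
        return 'url';
    }

    // Local files are expected to carry an .xml extension.
    $extension = strtolower(pathinfo($location, PATHINFO_EXTENSION));
    if ($extension !== 'xml') {
        throw new InvalidArgumentException("Not an .xml file: {$location}");
    }

    // The file must exist and be a regular file.
    if (!is_file($location)) {
        throw new InvalidArgumentException("File not found: {$location}");
    }

    return 'file';
}

// Create a temporary local sitemap so the example runs anywhere.
$tmpFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'sitemap_' . uniqid() . '.xml';
file_put_contents($tmpFile, '<urlset></urlset>');

echo classifySitemapLocation('https://example.com/sitemap.xml'), "\n";
echo classifySitemapLocation($tmpFile), "\n";

unlink($tmpFile);
```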
Best Practices
- Always wrap parser calls in try-catch blocks
- Verify sitemap file extensions
- Ensure proper file permissions for local files
- Handle potential network issues for remote sitemaps
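One possible way to handle transient network issues is to retry the call a limited number of times before giving up. The wrapper below is a generic sketch, not something the package provides; `withRetries` and the simulated flaky callable are hypothetical. In real use the callable might be `fn () => Parser::parser($url)`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run $operation, retrying on exceptions up to
 * $maxAttempts times, optionally sleeping $delayMs between attempts.
 * Rethrows the last exception if every attempt fails.
 */
function withRetries(callable $operation, int $maxAttempts = 3, int $delayMs = 0): mixed
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            $lastError = $e;
            // Wait before the next attempt, but not after the final one.
            if ($attempt < $maxAttempts && $delayMs > 0) {
                usleep($delayMs * 1000);
            }
        }
    }

    throw $lastError;
}

// Simulated fetch that fails twice, then succeeds (stands in for a parser call).
$calls = 0;
$flaky = function () use (&$calls): array {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('temporary network failure');
    }
    return [['location' => 'https://example.com/page', 'updated' => '2023-01-01T12:00:00+00:00']];
};

$entries = withRetries($flaky, 3);
echo count($entries), " entry after {$calls} attempts\n";
```

Retrying only makes sense for failures that may be temporary; invalid XML or a missing local file will fail the same way on every attempt.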
Limitations
- Only supports XML format sitemaps
- Requires proper XML formatting
- Network dependent for remote sitemaps
This parser provides a robust solution for handling XML sitemaps in PHP applications, with built-in error handling and support for both local and remote sitemap files.