wp-php-toolkit / xml
XML component for WordPress.
Package info
pkg:composer/wp-php-toolkit/xml
Requires
- php: >=7.2
- wp-php-toolkit/encoding: ^0.8
Requires (Dev)
- phpunit/phpunit: ^9.5
Versions
- dev-trunk
- v0.8.0
- v0.7.9
- v0.7.8
- v0.7.7
- v0.7.6
- v0.7.5
- v0.7.4
- v0.7.3
- v0.7.2
- v0.7.1
- v0.7.0
- v0.6.2
- v0.6.1
- v0.6.0
- v0.5.1
- v0.5.0
- v0.4.1
- v0.4.0
- v0.3.1
- v0.3.0
- v0.2.0
- v0.1.5
- v0.1.4
- v0.1.3
- v0.1.2
- v0.1.1
- v0.1.0
- 0.0.19
- 0.0.18
- 0.0.17
- 0.0.16
- v0.0.15
- v0.0.15-alpha
- 0.0.14
- 0.0.13
- 0.0.12
- 0.0.11
- v0.0.8-alpha
- 0.0.7
- v0.0.7-alpha
- 0.0.6
- v0.0.6-alpha
- v0.0.5-alpha
- v0.0.4-alpha
- v0.0.3-alpha
- v0.0.2-alpha
- v0.0.1-alpha
This package is auto-updated.
Last update: 2026-05-19 20:26:19 UTC
README
| slug | xml |
|---|---|
| title | XML |
| install | wp-php-toolkit/xml |
| see_also | |
A streaming, namespace-aware XML processor in pure PHP. Read and modify huge feeds, WXR exports, ePub manifests, and Office Open XML parts without loading the whole document into memory and without depending on libxml2.
When the native API extension is loaded, XMLProcessor can use a native delegate by default while preserving PHP fallback behavior. Define WP_NATIVE_APIS_DISABLE_DEFAULTS before loading the component to force the pure PHP fallback.
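A minimal sketch of forcing the pure PHP path. The README only says to define the constant before the component loads; using `true` as its value is an assumption.

```php
<?php
// Run this before XMLProcessor is first loaded, for example before
// requiring '/php-toolkit/vendor/autoload.php' and using the component.
// Assumption: the README only says "define" the constant; the value
// true is chosen here for illustration.
if ( ! defined( 'WP_NATIVE_APIS_DISABLE_DEFAULTS' ) ) {
    define( 'WP_NATIVE_APIS_DISABLE_DEFAULTS', true );
}
```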
Why this exists
SimpleXMLElement and DOMDocument both need libxml2 and both build a complete in-memory tree. XMLProcessor walks the document forward as a cursor, keeps modifications in a side buffer, and emits the full updated XML with get_updated_xml() only when you ask for it.
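To make the side-buffer idea concrete, here is a toy sketch in plain PHP. It is not XMLProcessor's code: the hypothetical `SideBufferEdits` class records (offset, length, replacement) edits during one forward pass and splices them in only when `get_updated()` is called. A regular expression stands in for the XML cursor purely to keep the sketch short.

```php
<?php
// Toy illustration of "keep modifications in a side buffer": the source
// string is never mutated while scanning. Edits are recorded as
// (byte offset, length, replacement) and spliced in only on request.
// Hypothetical class; NOT XMLProcessor's implementation.

final class SideBufferEdits {
    /** @var string */
    private $source;
    /** @var array<int, array{0: int, 1: int, 2: string}> */
    private $edits = array();

    public function __construct( string $source ) {
        $this->source = $source;
    }

    // Record a replacement; nothing is rewritten yet.
    public function replace( int $start, int $length, string $text ): void {
        $this->edits[] = array( $start, $length, $text );
    }

    // Build the updated string. Edits are applied from the highest offset
    // down so earlier offsets stay valid. Assumes edits do not overlap.
    public function get_updated(): string {
        $edits = $this->edits;
        usort( $edits, function ( $a, $b ) {
            return $b[0] <=> $a[0];
        } );
        $out = $this->source;
        foreach ( $edits as list( $start, $length, $text ) ) {
            $out = substr_replace( $out, $text, $start, $length );
        }
        return $out;
    }
}

$xml = '<book price="29.99"/><book price="14.50"/>';
$buf = new SideBufferEdits( $xml );

// One forward pass over the input, recording a 10% price bump per match.
$offset = 0;
while ( preg_match( '/price="([^"]*)"/', $xml, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
    list( $value, $start ) = $m[1];
    $buf->replace( $start, strlen( $value ), number_format( (float) $value * 1.10, 2, '.', '' ) );
    $offset = $start + strlen( $value );
}

$updated = $buf->get_updated();
echo $updated, "\n"; // <book price="32.99"/><book price="15.95"/>
```

The original `$xml` is untouched after the pass; only the recorded edits differ, which is what lets a cursor-based processor defer the rewrite until you ask for it.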
This design came from WordPress-scale documents such as WXR exports. A migration may only need to rewrite wp:attachment_url values or bump a feed attribute, so the processor optimizes for targeted cursor edits instead of a full validating XML stack.
Footgun: Namespace-aware methods use the namespace URI, not the prefix written in the tag. In WXR, get_attribute( 'wp', 'status' ) looks for a namespace literally named wp; for the usual WXR declaration you want get_attribute( 'http://wordpress.org/export/1.2/', 'status' ).
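A toy sketch of why the URI is the stable key. The hypothetical helpers `root_bindings()` and `expanded_name()` resolve a prefixed name through the root element's `xmlns:` declarations and print it in `{uri}local` form. Reading only the root element is a simplification; real XML lets any element declare or rebind prefixes.

```php
<?php
// Two documents bind the same WXR namespace to different prefixes.
// Resolving "prefix:local" through the xmlns declarations yields the same
// expanded name for both, while a lookup by the literal prefix "wp" fails
// on the second one. Hypothetical helpers, not part of the XML component.

function root_bindings( string $xml ): array {
    $bindings = array();
    // First start tag that is not an XML declaration or a <! construct.
    if ( preg_match( '/<[^?!][^>]*>/', $xml, $root ) ) {
        preg_match_all( '/xmlns:([A-Za-z_][\w.-]*)="([^"]*)"/', $root[0], $decls, PREG_SET_ORDER );
        foreach ( $decls as $decl ) {
            $bindings[ $decl[1] ] = $decl[2];
        }
    }
    return $bindings;
}

function expanded_name( string $qname, array $bindings ): string {
    if ( false === strpos( $qname, ':' ) ) {
        return $qname; // Unprefixed; the default xmlns is ignored in this toy.
    }
    list( $prefix, $local ) = explode( ':', $qname, 2 );
    $uri = isset( $bindings[ $prefix ] ) ? $bindings[ $prefix ] : '?';
    return '{' . $uri . '}' . $local;
}

$a = '<rss xmlns:wp="http://wordpress.org/export/1.2/"><wp:status>publish</wp:status></rss>';
$b = '<rss xmlns:export="http://wordpress.org/export/1.2/"><export:status>publish</export:status></rss>';

$name_a      = expanded_name( 'wp:status', root_bindings( $a ) );
$name_b      = expanded_name( 'export:status', root_bindings( $b ) );
$by_prefix_b = expanded_name( 'wp:status', root_bindings( $b ) ); // no "wp" binding in $b

echo $name_a, "\n", $name_b, "\n", $by_prefix_b, "\n";
```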
Footgun: In streaming mode next_tag() can return false because input ran out, not because the document ended. Check is_paused_at_incomplete_input() before assuming you're done.
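The pause-versus-end distinction can be sketched with a hypothetical `ToyTagScanner` that is fed chunks. It is not the XMLProcessor streaming API; it only shows why a scanner that runs out of buffered bytes mid-tag has to report "paused" rather than "done" until the caller declares the input finished.

```php
<?php
// Toy sketch of the streaming ambiguity: a chunk can end mid-tag, so
// "no tag right now" must be reported separately from "document over".
// Hypothetical class, far simpler than XMLProcessor (no attributes
// containing '>', no comments or CDATA).

final class ToyTagScanner {
    private $buffer   = '';
    private $pos      = 0;
    private $finished = false;
    private $paused   = false;

    public function append( string $chunk ): void {
        $this->buffer .= $chunk;
    }

    // Caller signals that no more chunks will arrive.
    public function finish(): void {
        $this->finished = true;
    }

    // True when the last null from next_tag() meant "need more input".
    public function is_paused(): bool {
        return $this->paused;
    }

    // Returns the next raw tag, or null if none is available yet.
    public function next_tag(): ?string {
        $this->paused = false;
        $start = strpos( $this->buffer, '<', $this->pos );
        $end   = ( false === $start ) ? false : strpos( $this->buffer, '>', $start );
        if ( false === $end ) {
            // Out of complete tags: only "done" if the input is finished.
            $this->paused = ! $this->finished;
            return null;
        }
        $this->pos = $end + 1;
        return substr( $this->buffer, $start, $end - $start + 1 );
    }
}

$chunks  = array( '<feed><entry id="1"/><ent', 'ry id="2"/></feed>' );
$scanner = new ToyTagScanner();
$tags    = array();
$states  = array();
foreach ( $chunks as $i => $chunk ) {
    $scanner->append( $chunk );
    if ( count( $chunks ) - 1 === $i ) {
        $scanner->finish();
    }
    while ( null !== ( $tag = $scanner->next_tag() ) ) {
        $tags[] = $tag;
    }
    // A null from next_tag() is ambiguous; ask which kind it was.
    $states[] = $scanner->is_paused() ? 'paused' : 'done';
}
echo implode( ' ', $tags ), "\n", implode( ', ', $states ), "\n";
```

After the first chunk the scanner stops at the partial `<ent` and reports "paused"; only after the final chunk and `finish()` does it report "done".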
Bump every price in a catalog
Find each <book>, read its price, write a new one, emit the updated document.
<?php
require '/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$xml = <<<'XML'
<catalog>
<book sku="A1" price="29.99"><title>PHP Internals</title></book>
<book sku="A2" price="14.50"><title>WordPress at Scale</title></book>
</catalog>
XML;

$p = XMLProcessor::create_from_string( $xml );
while ( $p->next_tag( 'book' ) ) {
    // '' is the namespace argument for un-namespaced attributes.
    $old = (float) $p->get_attribute( '', 'price' );
    $new = number_format( $old * 1.10, 2, '.', '' );
    $p->set_attribute( '', 'price', $new );
}
// Emit the document with all recorded edits applied.
echo $p->get_updated_xml();
<catalog>
<book sku="A1" price="32.99"><title>PHP Internals</title></book>
<book sku="A2" price="15.95"><title>WordPress at Scale</title></book>
</catalog>
Read namespaced attributes from a WXR export
WordPress's WXR commonly uses wp:, dc:, and content: prefixes bound to namespace names such as http://wordpress.org/export/1.2/. Pass that expanded namespace name, not the prefix; the processor handles whichever prefix the document actually uses.
<?php
require '/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$wxr = <<<'XML'
<?xml version="1.0"?>
<rss xmlns:wp="http://wordpress.org/export/1.2/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><item>
<title>Hello World</title>
<dc:creator>admin</dc:creator>
<wp:post_id>42</wp:post_id>
<wp:status>publish</wp:status>
</item></channel></rss>
XML;

// Match on namespace URIs, not on the prefixes the document happens to use.
$WP = 'http://wordpress.org/export/1.2/';
$DC = 'http://purl.org/dc/elements/1.1/';

$p = XMLProcessor::create_from_string( $wxr );
while ( $p->next_tag( 'item' ) ) {
    // Walk the tokens inside this <item> until its closing tag.
    while ( $p->next_token() ) {
        if ( $p->is_tag_closer() && 'item' === $p->get_tag_local_name() ) {
            break;
        }
        if ( ! $p->is_tag_opener() ) {
            continue;
        }
        $ns     = $p->get_tag_namespace();
        $local  = $p->get_tag_local_name();
        $prefix = ( $WP === $ns ) ? 'wp/' : ( ( $DC === $ns ) ? 'dc/' : '' );
        echo "{$prefix}{$local}: ";
        // Advance to the element's text node and print it.
        while ( $p->next_token() && '#text' !== $p->get_token_name() ) {}
        echo trim( $p->get_modifiable_text() ) . "\n";
    }
}
title: Hello World
dc/creator: admin
wp/post_id: 42
wp/status: publish
Rewrite URLs across an entire WXR export
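Large WXR exports can hold many URLs in <link>, <guid>, and post content. Streaming the file lets you rewrite large exports without loading the whole XML document into memory. The example below uses create_from_string() for brevity, which does hold the whole string in memory; when feeding a file incrementally, also check is_paused_at_incomplete_input() as described above.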
<?php
require '/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$wxr = <<<'XML'
<?xml version="1.0"?><rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<wp:base_site_url>https://old.example.com</wp:base_site_url>
<item><link>https://old.example.com/2024/post-1</link>
<guid>https://old.example.com/?p=1</guid></item>
</channel></rss>
XML;

$from = 'https://old.example.com';
$to   = 'https://new.example.com';

$p = XMLProcessor::create_from_string( $wxr );
$rewritten = 0;
while ( $p->next_token() ) {
    // Only text nodes are inspected in this example.
    if ( '#text' !== $p->get_token_name() ) {
        continue;
    }
    $text = $p->get_modifiable_text();
    if ( false === strpos( $text, $from ) ) {
        continue;
    }
    $p->set_modifiable_text( str_replace( $from, $to, $text ) );
    $rewritten++;
}
echo "rewrote {$rewritten} text nodes\n\n";
echo $p->get_updated_xml();
rewrote 3 text nodes
<?xml version="1.0"?><rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<wp:base_site_url>https://new.example.com</wp:base_site_url>
<item><link>https://new.example.com/2024/post-1</link>
<guid>https://new.example.com/?p=1</guid></item>
</channel></rss>
Parse OPML to extract feed URLs
OPML is the format Feedly and many other readers use to import and export feed lists. It is flat, attribute-heavy XML, which is exactly what a tag processor handles best.
<?php
require '/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$opml = <<<'XML'
<?xml version="1.0"?><opml version="2.0"><head><title>My Feeds</title></head>
<body>
<outline text="Tech"><outline text="Hacker News" type="rss" xmlUrl="https://news.ycombinator.com/rss"/>
<outline text="LWN" type="rss" xmlUrl="https://lwn.net/headlines/rss"/></outline>
<outline text="WordPress" type="rss" xmlUrl="https://wordpress.org/news/feed/"/>
</body></opml>
XML;

$p = XMLProcessor::create_from_string( $opml );
while ( $p->next_tag( 'outline' ) ) {
    // Folder outlines such as "Tech" have no xmlUrl; skip them.
    $url = $p->get_attribute( '', 'xmlUrl' );
    if ( null === $url ) {
        continue;
    }
    echo $p->get_attribute( '', 'text' ) . "\t" . $url . "\n";
}
Hacker News https://news.ycombinator.com/rss
LWN https://lwn.net/headlines/rss
WordPress https://wordpress.org/news/feed/