wp-php-toolkit / blockparser
BlockParser component for WordPress.
Requires
- php: >=7.2
Requires (Dev)
- phpunit/phpunit: ^9.5
Versions
- dev-trunk
- v0.7.4
- v0.7.3
- v0.7.2
- v0.7.1
- v0.7.0
- v0.6.2
- v0.6.1
- v0.6.0
- v0.5.1
- v0.5.0
- v0.4.1
- v0.4.0
- v0.3.1
- v0.3.0
- v0.2.0
- v0.1.5
- v0.1.4
- v0.1.3
- v0.1.2
- v0.1.1
- v0.1.0
- 0.0.19
- 0.0.18
- 0.0.17
- 0.0.16
- 0.0.15
- v0.0.15-alpha
- 0.0.14
- 0.0.13
- 0.0.12
- 0.0.11
- v0.0.8-alpha
- 0.0.7
- v0.0.7-alpha
- 0.0.6
- v0.0.6-alpha
- v0.0.5-alpha
- v0.0.4-alpha
- v0.0.3-alpha
- v0.0.2-alpha
- v0.0.1-alpha
This package is auto-updated.
Last update: 2026-05-11 13:21:06 UTC
README
| slug | blockparser |
|---|---|
| title | BlockParser |
| install | wp-php-toolkit/blockparser |
| credit_title | WordPress core, packaged standalone |
| credit_body | <code>WP_Block_Parser</code> is WordPress core's block parser, packaged here so importers and linters can read <a href="https://developer.wordpress.org/block-editor/reference-guides/block-api/">block markup</a> without booting WordPress. Source: <a href="https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/class-wp-block-parser.php">WordPress/wordpress-develop</a>. |
| see_also | |
WordPress core's block parser, packaged as a standalone library. Turn block markup into a structured tree, lint posts for common authoring mistakes, and audit block usage — all without booting WordPress.
Why this exists
Block markup is not plain HTML. A post can contain HTML comments that identify blocks, JSON attributes inside those comments, freeform HTML between blocks, and nested blocks whose rendered HTML is interleaved with parent markup.
This component packages WordPress core's block parser so importers, linters, migration tools, and static analyzers can understand block content without loading WordPress. It deliberately mirrors core behavior — same array shape, same null blocks for freeform HTML, same core block names such as core/paragraph — so code written against this parser keeps working when run inside WordPress, and vice versa.
Reach for it when you need answers about the block tree: which blocks a post uses, which attributes they carry, where nested blocks appear, or whether content violates a rule your project cares about.
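As a rough sketch of answering "which blocks, which attributes, how deeply nested", a small generator can flatten a parsed tree into (depth, block) pairs. The `sample_block()` and `walk_blocks()` helpers are hypothetical names, not part of the library, and the tree is hand-written in the parser's array shape so the snippet runs without the autoloader.

```php
<?php
// Hypothetical helper: builds one block array in the shape parse() returns.
// innerHTML and innerContent are left empty because the walker never reads them.
function sample_block( $name, $attrs = array(), $inner_blocks = array() ) {
	return array(
		'blockName'    => $name,
		'attrs'        => $attrs,
		'innerBlocks'  => $inner_blocks,
		'innerHTML'    => '',
		'innerContent' => array(),
	);
}

// Hypothetical helper: yields array( $depth, $block ) for every named block,
// parents before children, in document order.
function walk_blocks( $blocks, $depth = 0 ) {
	foreach ( $blocks as $block ) {
		if ( null === $block['blockName'] ) {
			continue; // Freeform HTML has no name and no children.
		}
		yield array( $depth, $block );
		foreach ( walk_blocks( $block['innerBlocks'], $depth + 1 ) as $entry ) {
			yield $entry;
		}
	}
}

// Hand-written sample tree (an assumption for illustration).
$tree = array(
	sample_block( 'core/columns', array(), array(
		sample_block( 'core/column', array(), array(
			sample_block( 'core/heading', array( 'level' => 3 ) ),
			sample_block( 'core/paragraph' ),
		) ),
	) ),
);

foreach ( walk_blocks( $tree ) as $entry ) {
	list( $depth, $block ) = $entry;
	$attrs = $block['attrs'] ? ' ' . json_encode( $block['attrs'] ) : '';
	echo str_repeat( '  ', $depth ) . $block['blockName'] . $attrs . "\n";
}
```

In real use, the array passed to `walk_blocks()` would be the return value of `( new WP_Block_Parser() )->parse( $document )`.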
What you get back
WP_Block_Parser::parse() returns an array of blocks. Each block is an associative array with five keys: blockName, attrs, innerBlocks, innerHTML, and innerContent.
innerHTML is the HTML inside the block with inner blocks stripped out. innerContent is the interleaved version: an array of HTML strings with null placeholders marking where each inner block belongs.
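To make the difference concrete, the sketch below stitches a block's HTML back together by walking innerContent and splicing in each inner block where a null placeholder appears. `stitch_inner_content()` is a hypothetical helper, and the `$group` array is hand-written to match the five-key shape described above so the snippet runs without the library. It rebuilds only the HTML, not the block comment delimiters, so it is not a serializer.

```php
<?php
// Hand-written block in the parser's array shape (an assumption for
// illustration; normally this comes from WP_Block_Parser::parse()).
$group = array(
	'blockName'    => 'core/group',
	'attrs'        => array(),
	'innerBlocks'  => array(
		array(
			'blockName'    => 'core/paragraph',
			'attrs'        => array(),
			'innerBlocks'  => array(),
			'innerHTML'    => '<p>One.</p>',
			'innerContent' => array( '<p>One.</p>' ),
		),
	),
	'innerHTML'    => '<div class="wp-block-group"></div>',
	'innerContent' => array( '<div class="wp-block-group">', null, '</div>' ),
);

// Hypothetical helper: concatenate HTML chunks in document order, replacing
// each null placeholder with the stitched HTML of the next inner block.
function stitch_inner_content( $block ) {
	$html  = '';
	$index = 0; // Position of the next inner block to splice in.
	foreach ( $block['innerContent'] as $chunk ) {
		if ( null === $chunk ) {
			$html .= stitch_inner_content( $block['innerBlocks'][ $index ] );
			$index++;
		} else {
			$html .= $chunk;
		}
	}
	return $html;
}

echo $group['innerHTML'] . "\n";            // <div class="wp-block-group"></div>
echo stitch_inner_content( $group ) . "\n"; // <div class="wp-block-group"><p>One.</p></div>
```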
Most code starts by checking blockName, then reading attrs or innerHTML. When a post has container blocks such as Group, Columns, or Navigation, look inside innerBlocks too.
Footgun: Freeform HTML between blocks shows up as a block with blockName === null. Always skip that case before comparing names.
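Freeform blocks are not always noise. One possible lint, sketched here under the assumption that whitespace-only freeform blocks are harmless separators, reports freeform HTML that carries real content and was probably pasted without a block wrapper. `sample_leaf()` and `stray_freeform_html()` are hypothetical helpers, and the `$blocks` list is hand-written in the parser's array shape.

```php
<?php
// Hypothetical helper: builds a childless block array in the parser's shape.
// Passing null as the name produces a freeform block.
function sample_leaf( $name, $html ) {
	return array(
		'blockName'    => $name,
		'attrs'        => array(),
		'innerBlocks'  => array(),
		'innerHTML'    => $html,
		'innerContent' => array( $html ),
	);
}

// Hypothetical helper: return the trimmed HTML of every freeform block in
// the given list that contains something other than whitespace.
function stray_freeform_html( $blocks ) {
	$stray = array();
	foreach ( $blocks as $block ) {
		if ( null !== $block['blockName'] ) {
			continue; // Named blocks are not freeform.
		}
		$html = trim( $block['innerHTML'] );
		if ( '' !== $html ) {
			$stray[] = $html;
		}
	}
	return $stray;
}

// Hand-written sample list (an assumption for illustration).
$blocks = array(
	sample_leaf( 'core/heading', "\n<h2>Welcome</h2>\n" ),
	sample_leaf( null, "\n\n" ),
	sample_leaf( 'core/paragraph', "\n<p>Hello.</p>\n" ),
	sample_leaf( null, "\n<p>Pasted without a block wrapper.</p>" ),
);

foreach ( stray_freeform_html( $blocks ) as $html ) {
	echo 'Unwrapped HTML: ' . $html . "\n";
}
```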
Parse a document
The simplest possible use. Pass a string, get back a tree.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:heading {\"level\":2} -->\n<h2>Welcome</h2>\n<!-- /wp:heading -->\n\n"
	. "<!-- wp:paragraph -->\n<p>Hello from the block editor.</p>\n<!-- /wp:paragraph -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

foreach ( $blocks as $block ) {
	// Skip freeform HTML (here, the blank line between the two blocks).
	if ( null === $block['blockName'] ) {
		continue;
	}
	echo $block['blockName'] . ': ' . trim( strip_tags( $block['innerHTML'] ) ) . "\n";
}
```

```
core/heading: Welcome
core/paragraph: Hello from the block editor.
```
Count every block type in a post
A common audit task: "How many Paragraph, Image, and Gallery blocks does this post use?" A small queue keeps the example readable while still visiting nested blocks.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:group --><div class=\"wp-block-group\">"
	. "<!-- wp:heading --><h2>Title</h2><!-- /wp:heading -->"
	. "<!-- wp:paragraph --><p>One.</p><!-- /wp:paragraph -->"
	. "<!-- wp:paragraph --><p>Two.</p><!-- /wp:paragraph -->"
	. "<!-- wp:image {\"id\":1} --><figure><img src=\"a.jpg\"/></figure><!-- /wp:image -->"
	. "</div><!-- /wp:group -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

$counts = array();
$queue  = $blocks;
while ( ! empty( $queue ) ) {
	$block = array_shift( $queue );
	if ( null !== $block['blockName'] ) {
		$name            = $block['blockName'];
		$counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1;
	}
	// Enqueue children so nested blocks are counted too.
	foreach ( $block['innerBlocks'] as $inner_block ) {
		$queue[] = $inner_block;
	}
}

// Most-used block types first.
arsort( $counts );
foreach ( $counts as $name => $n ) {
	echo str_pad( (string) $n, 4, ' ', STR_PAD_LEFT ) . ' ' . $name . "\n";
}
```

```
   2 core/paragraph
   1 core/group
   1 core/heading
   1 core/image
```
Check whether a post uses a block
Useful for templates, audits, and migrations: answer one yes/no question without caring where the block appears in the tree.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:group --><div class=\"wp-block-group\">"
	. "<!-- wp:buttons --><div class=\"wp-block-buttons\">"
	. "<!-- wp:button --><div class=\"wp-block-button\"><a>Buy now</a></div><!-- /wp:button -->"
	. "</div><!-- /wp:buttons -->"
	. "</div><!-- /wp:group -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

// Breadth-first search that stops at the first block with a matching name.
function post_has_block( $blocks, $name ) {
	$queue = $blocks;
	while ( ! empty( $queue ) ) {
		$block = array_shift( $queue );
		if ( $name === $block['blockName'] ) {
			return true;
		}
		foreach ( $block['innerBlocks'] as $inner_block ) {
			$queue[] = $inner_block;
		}
	}
	return false;
}

echo post_has_block( $blocks, 'core/button' ) ? "has button\n" : "missing button\n";
echo post_has_block( $blocks, 'core/gallery' ) ? "has gallery\n" : "missing gallery\n";
```

```
has button
missing gallery
```
Lint headings for hierarchy mistakes
"Don't skip from H2 to H4" is a real accessibility rule. The helper below keeps headings in document order, including headings nested inside Group, Column, and Cover blocks.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:heading -->\n<h2>Intro</h2>\n<!-- /wp:heading -->"
	. "<!-- wp:heading {\"level\":4} -->\n<h4>Subsection</h4>\n<!-- /wp:heading -->"
	. "<!-- wp:heading {\"level\":3} -->\n<h3>Body</h3>\n<!-- /wp:heading -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

// Depth-first walk: each heading is recorded before its block's children,
// so the result stays in document order.
function collect_headings( $blocks, &$headings ) {
	foreach ( $blocks as $block ) {
		if ( 'core/heading' === $block['blockName'] ) {
			$headings[] = array(
				// This example treats a heading without a "level" attribute as H2.
				'level' => isset( $block['attrs']['level'] ) ? (int) $block['attrs']['level'] : 2,
				'text'  => trim( strip_tags( $block['innerHTML'] ) ),
			);
		}
		collect_headings( $block['innerBlocks'], $headings );
	}
}

$headings = array();
collect_headings( $blocks, $headings );

// Start at 1 so that an opening H2 is accepted.
$last = 1;
foreach ( $headings as $heading ) {
	$level = $heading['level'];
	$label = $heading['text'];
	if ( $level > $last + 1 ) {
		echo "WARN {$label}: jumped from H{$last} to H{$level}\n";
	} else {
		echo "ok {$label}: H{$level}\n";
	}
	$last = $level;
}
```

```
ok Intro: H2
WARN Subsection: jumped from H2 to H4
ok Body: H3
```
Find all instances of a custom block
When auditing an export for a block your plugin owns, collect every match and print the fields a human cares about.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:paragraph --><p>Reviews</p><!-- /wp:paragraph -->"
	. "<!-- wp:my-plugin/testimonial {\"author\":\"Jane\",\"rating\":5} -->"
	. "<blockquote>Loved it.</blockquote>"
	. "<!-- /wp:my-plugin/testimonial -->"
	. "<!-- wp:my-plugin/testimonial {\"author\":\"Joe\",\"rating\":4} -->"
	. "<blockquote>Pretty good.</blockquote>"
	. "<!-- /wp:my-plugin/testimonial -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

// Recursively collect every block with the given name, at any depth.
function find_blocks_by_name( $blocks, $name, &$matches ) {
	foreach ( $blocks as $block ) {
		if ( $name === $block['blockName'] ) {
			$matches[] = $block;
		}
		find_blocks_by_name( $block['innerBlocks'], $name, $matches );
	}
}

$testimonials = array();
find_blocks_by_name( $blocks, 'my-plugin/testimonial', $testimonials );

foreach ( $testimonials as $i => $b ) {
	echo ( $i + 1 ) . '. ' . $b['attrs']['author'] . ' (' . $b['attrs']['rating'] . '/5): '
		. trim( strip_tags( $b['innerHTML'] ) ) . "\n";
}
```

```
1. Jane (5/5): Loved it.
2. Joe (4/5): Pretty good.
```
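Exported content is rarely as tidy as the sample: a testimonial saved without JSON in its comment would have no author or rating to read. A minimal sketch of defensive attribute access follows, assuming such a block arrives with attrs as an empty array. `block_attr()` is a hypothetical helper, and the two blocks are hand-written in the parser's array shape so the snippet runs on its own.

```php
<?php
// Hypothetical helper: read one attribute, falling back to a default when
// the key is missing (or explicitly null).
function block_attr( $block, $key, $default = null ) {
	return isset( $block['attrs'][ $key ] ) ? $block['attrs'][ $key ] : $default;
}

// Hand-written sample blocks (an assumption for illustration).
$testimonials = array(
	array(
		'blockName'    => 'my-plugin/testimonial',
		'attrs'        => array( 'author' => 'Jane', 'rating' => 5 ),
		'innerBlocks'  => array(),
		'innerHTML'    => '<blockquote>Loved it.</blockquote>',
		'innerContent' => array( '<blockquote>Loved it.</blockquote>' ),
	),
	array(
		'blockName'    => 'my-plugin/testimonial',
		'attrs'        => array(), // No JSON in the block comment.
		'innerBlocks'  => array(),
		'innerHTML'    => '<blockquote>Fine.</blockquote>',
		'innerContent' => array( '<blockquote>Fine.</blockquote>' ),
	),
);

foreach ( $testimonials as $b ) {
	$author = block_attr( $b, 'author', 'Anonymous' );
	$rating = block_attr( $b, 'rating' );
	echo $author . ' (' . ( null === $rating ? 'unrated' : $rating . '/5' ) . ")\n";
}
```

Run as written, this prints `Jane (5/5)` followed by `Anonymous (unrated)`.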
Detect blocks with stale embed URLs
A real-world content audit: find every core/embed whose URL points at a domain you have retired.
```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = <<<'HTML'
<!-- wp:embed {"url":"https://twitter.com/wordpress/status/1","providerNameSlug":"twitter"} /-->
<!-- wp:embed {"url":"https://youtube.com/watch?v=abc","providerNameSlug":"youtube"} /-->
<!-- wp:embed {"url":"https://vine.co/v/xyz","providerNameSlug":"vine"} /-->
HTML;

$retired = array( 'vine.co', 'plus.google.com' );

foreach ( ( new WP_Block_Parser() )->parse( $document ) as $b ) {
	// Also skips the freeform newlines between blocks, whose name is null.
	if ( 'core/embed' !== $b['blockName'] ) {
		continue;
	}
	$url  = isset( $b['attrs']['url'] ) ? $b['attrs']['url'] : '';
	$host = parse_url( $url, PHP_URL_HOST );
	// Exact, case-sensitive host match against the retired list.
	$bad  = $host && in_array( $host, $retired, true );
	echo ( $bad ? 'STALE ' : 'ok ' ) . $url . "\n";
}
```

```
ok https://twitter.com/wordpress/status/1
ok https://youtube.com/watch?v=abc
STALE https://vine.co/v/xyz
```
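The exact host comparison above would miss `https://www.vine.co/...` or a host written in mixed case. One possible refinement, sketched with a hypothetical `host_is_retired()` helper, lowercases the host and also accepts subdomains of a retired domain. Whether subdomains should count as retired is a project decision, so treat that as an assumption.

```php
<?php
// Hypothetical helper: true when the URL's host is a retired domain or a
// subdomain of one. Comparison is case-insensitive.
function host_is_retired( $url, $retired ) {
	$host = parse_url( $url, PHP_URL_HOST );
	if ( ! is_string( $host ) || '' === $host ) {
		return false; // Malformed or relative URL: nothing to compare.
	}
	$host = strtolower( $host );
	foreach ( $retired as $domain ) {
		$suffix = '.' . $domain;
		// Match "vine.co" itself or anything ending in ".vine.co".
		if ( $host === $domain || substr( $host, -strlen( $suffix ) ) === $suffix ) {
			return true;
		}
	}
	return false;
}

$retired = array( 'vine.co', 'plus.google.com' );
$urls    = array(
	'https://www.vine.co/v/xyz',
	'https://Vine.co/v/xyz',
	'https://notvine.co/v/xyz',
	'https://youtube.com/watch?v=abc',
);

foreach ( $urls as $url ) {
	echo ( host_is_retired( $url, $retired ) ? 'STALE ' : 'ok ' ) . $url . "\n";
}
```

The leading dot in the suffix check is what keeps `notvine.co` from matching `vine.co`.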