manychois / simdom
A simple-to-use PHP library for processing DOM documents.
0.2.1
2023-02-15 12:26 UTC
Requires
- php: >=7.4
Requires (Dev)
- ext-dom: *
- doctrine/instantiator: ^1.3.1
- phpunit/phpunit: ^9.0
- slevomat/coding-standard: ^8.8
- squizlabs/php_codesniffer: ^3.7
This package is auto-updated.
Last update: 2024-11-15 05:26:02 UTC
README
Simdom is a lightweight PHP library designed to make parsing and manipulating DOM documents as straightforward as possible.
This library requires no external dependencies or extensions, such as `libxml` or `dom`.
Though not a full DOM implementation, for most use cases Simdom proves to be more than sufficient.
Regular expressions are used extensively in the parsing logic. It is OK if you don't like this approach; we can't please everyone.
Feel free to try its parsing ability at the demo site.
Features
- Depends on no extensions or external libraries.
- Conversion to and from PHP's native DOM objects for integration with existing code.
- Pretty-prints HTML5 documents.
- Type hints are used throughout.
- Meaningless properties (e.g. `childNodes`) and methods (e.g. `appendChild()`) are removed from `Comment`, `DocumentType`, and `Text` for a cleaner interface.
- Extra convenience methods are added to `Document`, `DocumentFragment`, and `Element`, e.g. `dfs()` for depth-first search on descendant nodes.
- Exceptions with richer context are thrown when inserting or replacing nodes would result in an invalid HTML document.
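To illustrate the traversal order that a depth-first search over descendants produces, here is a minimal sketch built on PHP's native DOM extension rather than on Simdom. The helper name `descendantsDepthFirst()` is hypothetical, and pre-order (parent before children) is an assumption; Simdom's own `dfs()` may be implemented differently.

```php
<?php
declare(strict_types=1);

/**
 * Yields every descendant of $node in pre-order (parent before children).
 * Hypothetical helper using PHP's native DOM (ext-dom); this is NOT Simdom's dfs().
 */
function descendantsDepthFirst(DOMNode $node): Generator
{
    // Walk siblings via firstChild/nextSibling so leaf nodes need no special case.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        yield $child;
        yield from descendantsDepthFirst($child);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root><a><b/></a><!--note--><c/></root>');

$visited = [];
foreach (descendantsDepthFirst($doc) as $n) {
    $visited[] = $n->nodeName;
}

echo implode(' ', $visited), "\n"; // prints: root a b #comment c
```

Each node is visited before its children, so `b` appears right after `a` and before the comment that follows `a` in the document.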
Getting Started
Installation
To use this library in your project, run:
    composer require manychois/simdom
Major differences from the DOM standard
- You do not need to use `Document::importNode()` to import nodes from other documents. Simdom has no concept of node document.
- XML documents are still parsed as if they were HTML5.
- Handling of the deprecated tags `frame`, `frameset`, and `plaintext` is not implemented. When encountered, they are treated as ordinary tags like `div`.
- `Attr` does not inherit from `Node`, so it never participates in the DOM tree hierarchy.
- Parsing `<template>` does not create a `DocumentFragment` inside the `template` element. Its content is treated as raw text.
- The DOM standard has complicated logic for handling misaligned end tags. In Simdom we try to find a matching start tag up to 3 levels up, and discard the end tag if none is found.
- Fixing of incorrect tag hierarchy, e.g. `<li><ul></ul></li>`, is not implemented.
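The rule for misaligned end tags can be sketched as a small stack operation. This is only an illustration under stated assumptions, not Simdom's actual parser code: the function `handleEndTag()` is hypothetical, the innermost open element is counted as level 1, and a match is assumed to also close everything opened inside the matched element.

```php
<?php
declare(strict_types=1);

/**
 * Handles an end tag against a stack of open element names.
 * Hypothetical sketch; Simdom's real parser may behave differently.
 *
 * @param string[] $openTags Open element names, outermost first.
 * @return string[] Open element names after the end tag is processed.
 */
function handleEndTag(array $openTags, string $endTag, int $maxDepth = 3): array
{
    $top = count($openTags) - 1;
    $lowest = max(0, $top - $maxDepth + 1);
    for ($i = $top; $i >= $lowest; $i--) {
        if ($openTags[$i] === $endTag) {
            // Assumption: the matched element and anything opened inside it are closed.
            return array_slice($openTags, 0, $i);
        }
    }
    // No matching start tag within the limit: the end tag is discarded.
    return $openTags;
}

$open = ['html', 'body', 'div', 'p', 'span'];
$afterP = handleEndTag($open, 'p');       // ['html', 'body', 'div']
$afterBody = handleEndTag($open, 'body'); // unchanged: 'body' is 4 levels up
```

Capping the search keeps the behaviour predictable: a stray `</body>` deep inside nested markup is ignored instead of closing every open element on the way up.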
Usage
Parsing HTML
    $parser = \Manychois\Simdom\Dom::createParser();
    $doc = $parser->parseFromString('<p>Hello, world!</p>');
    // $doc is an instance of \Manychois\Simdom\Document
Traversing and manipulating the DOM tree
    // Standard DOM methods for traversal and manipulation are available
    $html = $doc->documentElement();
    $body = $html->children()->item($html->children()->length() - 1);
    $body->append(\Manychois\Simdom\Dom::createElement('div'));

    // Simdom also provides extra convenient methods like dfs (Depth First Search)
    foreach ($doc->dfs() as $node) {
        if ($node instanceof \Manychois\Simdom\Comment) {
            echo $node->data() . "\n";
        }
    }
Outputting HTML
    $option = new \Manychois\Simdom\PrettyPrintOption();
    $option->indent = "\t";
    echo \Manychois\Simdom\Dom::print($doc, $option);
Conversion to and from PHP's native DOM objects
    $converter = new \Manychois\Simdom\DomNodeConverter();
    $domDoc = new \DOMDocument();
    // Convert DOMElement to Element and you can start playing with Simdom
    $element = $converter->convertToElement($domDoc->createElement('html'));
    // Convert Element back to DOMElement and you can import it to DOMDocument
    $domElement = $converter->importElement($element, $domDoc);
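For context on the native side of this conversion, the sketch below uses only PHP's built-in `DOMDocument` (no Simdom) to show that native nodes are bound to their owner document. Appending a node created by another document throws, and `importNode()` is needed first. This is presumably why `importElement()` takes the target `DOMDocument` as an argument, though that is an inference rather than something the README states. All variable names are illustrative.

```php
<?php
declare(strict_types=1);

// Native DOM only (ext-dom); none of this is Simdom API.
$source = new DOMDocument();
$target = new DOMDocument();

$root = $target->appendChild($target->createElement('div'));
$foreign = $source->createElement('p', 'Hello');

// Appending a node owned by another document is rejected by native DOM.
$threw = false;
try {
    $root->appendChild($foreign);
} catch (DOMException $e) {
    $threw = ($e->getCode() === DOM_WRONG_DOCUMENT_ERR);
}

// importNode() makes a copy owned by $target, which can then be appended.
$imported = $target->importNode($foreign, true);
$root->appendChild($imported);

echo $threw ? "direct append was rejected\n" : "direct append was accepted\n";
echo $root->firstChild->nodeName, ': ', $root->textContent, "\n";
```

In Simdom this step does not exist, since nodes there have no owner document.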