dlindberg/dom-document-factory

Simple DOMDocument factory with html purify and string output

1.0.0 2019-03-13 15:50 UTC

This package is auto-updated.

Last update: 2024-04-14 04:43:05 UTC


README


The DOMDocument extension in PHP is very powerful and incredibly useful for manipulating HTML. However, it requires just enough boilerplate code that a small utility factory makes things a little easier. A workflow I use frequently is to run some crufty HTML input through HTMLPurifier, turn the result into a DOMNode, manipulate it, and then convert it back into a string. This is a simple factory to help with that workflow. It takes a string containing an HTML fragment, purifies it, and turns it into a <body> DOMNode. Getting that string back out also takes a little boilerplate, so the factory handles that too, optionally with a different HTMLPurifier pass on the way out (not often necessary, but occasionally helpful).
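For context, here is roughly the boilerplate the factory is meant to save you from, as a minimal sketch using only PHP's built-in DOM extension. The HTMLPurifier step and encoding handling are left out, and the helper names fragmentToBody() and nodeToString() are hypothetical; they are not part of this package.

```php
<?php
// Plain-DOM sketch of the string -> DOMNode -> string round trip.
// fragmentToBody() and nodeToString() are hypothetical helpers; HTMLPurifier
// and encoding handling are intentionally left out.

function fragmentToBody(string $html): DOMElement
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    // Suppress libxml errors and warnings about imperfect input markup.
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    // libxml wraps a bare fragment in implied <html><body> elements,
    // so for a fragment like the one below the <body> element exists.
    return $doc->getElementsByTagName('body')->item(0);
}

function nodeToString(DOMNode $node): string
{
    // Passing a node to saveHTML() serializes only that node and its children.
    return $node->ownerDocument->saveHTML($node);
}

$body = fragmentToBody('<p>This is some Text</p>');

// Manipulate the DOM: add a class to the first paragraph.
$body->firstChild->setAttribute('class', 'intro');

echo nodeToString($body->firstChild), PHP_EOL;
```

The factory wraps these steps (plus purification) behind a single invocation and a stringify() call.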

The factory is set up so that you can simply initialize and invoke it to get back a DOMNode. Manipulate the DOM however you need to, and then stringify your DOMNode back out. Defaults are good, but flexibility is important, so you can inject a DOMDocumentFactoryConfig to adjust the settings as needed. This is useful when your use case would have required you to implement a factory anyway.

Install

Via Composer

$ composer require dlindberg/dom-document-factory

Usage

Basic Invocation

If all you need to do is take a string of HTML and get a quickly usable DOMNode out of it, simply create an instance of the DOMDocumentFactory class and invoke it.

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

$html = '<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>'; // Etc. Etc.

$docFactory = new dlindberg\DOMDocumentFactory();

$DOMNode = $docFactory($html); // Returns the <body> node

/* Do something with your DOMNodes */

echo $docFactory->stringify($DOMNode->firstChild);

For an input of <p>This is some Text</p>, if you made no further changes to your DOMNodes, the result would also be <p>This is some Text</p>.

Alternatively, there are two additional methods for invoking the factory: getNode(string $blob) and getDocument(string $blob). The getNode method does the same thing as invoking the class and returns the <body> node built from the fragment. The getDocument method returns the entire DOMDocument object. Even when using getNode, you can always reach the parent DOMDocument through the node's ownerDocument property.
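The route from a node back to its document can be seen with plain PHP DOM, independent of this package. This is a minimal sketch; the markup is illustrative, and the <body> node here only stands in for what getNode() is described as returning.

```php
<?php
// Plain PHP DOM: every DOMNode exposes its document via the ownerDocument property.
$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>', LIBXML_NOERROR | LIBXML_NOWARNING);

// The <body> node, comparable to what getNode() is described as returning.
$body = $doc->getElementsByTagName('body')->item(0);

// ownerDocument is a property, not a method.
$owner = $body->ownerDocument;

var_dump($owner === $doc);                         // bool(true)
echo $owner->saveHTML($body->firstChild), PHP_EOL; // <p>Hello</p>
```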

If you have a section of HTML that has multiple immediate child nodes, for example:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p>In vel nibh eget turpis sagittis posuere ut vitae purus.</p>
<p>Donec in libero mauris. Aenean eu consectetur tortor.</p>
<p>Sed dolor neque, maximus et est eu, ultricies interdum libero.</p>
<p>Cras sed feugiat ante. Suspendisse ultrices eros at arcu feugiat dictum.</p>

Simply using:

$DOMElement = $docFactory($html);
echo $docFactory->stringify($DOMElement->firstChild);

would result in:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>

And using:

echo $docFactory->stringify($DOMElement);

results in:

<body>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p>In vel nibh eget turpis sagittis posuere ut vitae purus.</p>
<p>Donec in libero mauris. Aenean eu consectetur tortor.</p>
<p>Sed dolor neque, maximus et est eu, ultricies interdum libero.</p>
<p>Cras sed feugiat ante. Suspendisse ultrices eros at arcu feugiat dictum.</p>
</body>

To get out the same thing you put in, use the stringifyFromList method. It returns an array containing one string for each child node in a NodeList. Use the array as is if that is what you need, or flatten it back into a single string with implode.

echo \implode(\PHP_EOL, $docFactory->stringifyFromList($DOMElement));
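As a rough illustration of the "one string per child node" idea, the sketch below walks a DOMNodeList with plain PHP DOM and joins the result with implode. nodeListToStrings() is a hypothetical helper, not the package's stringifyFromList() implementation, and skipping whitespace-only text nodes is an assumption of this sketch.

```php
<?php
// Hypothetical helper: serialize each node in a DOMNodeList to its own string.
function nodeListToStrings(DOMNodeList $nodes): array
{
    $out = [];
    foreach ($nodes as $node) {
        // Assumption: whitespace-only text nodes between elements are skipped.
        if ($node instanceof DOMText && trim($node->textContent) === '') {
            continue;
        }
        $out[] = $node->ownerDocument->saveHTML($node);
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML("<p>One</p>\n<p>Two</p>\n<p>Three</p>", LIBXML_NOERROR | LIBXML_NOWARNING);
$body = $doc->getElementsByTagName('body')->item(0);

// One string per <p>; implode() turns the array back into a single block.
$parts = nodeListToStrings($body->childNodes);
echo implode(PHP_EOL, $parts), PHP_EOL;
```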

Custom Invocation

Sometimes you want to do something a little more complex, so the DOMDocumentFactory constructor can take an instance of the DOMDocumentFactoryConfig class as its sole argument.

To create an instance of DOMDocumentFactoryConfig:

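// Constructor signature (all three arguments are optional):
// DOMDocumentFactoryConfig(array $settings = [], \HTMLPurifier $inputPurifier = null, \HTMLPurifier $outputPurifier = null)
$DOMDocumentFactoryConfig = new DOMDocumentFactoryConfig($settings, $inputPurifier, $outputPurifier);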

If you pass an instance of HTMLPurifier as $inputPurifier, it is used in place of the default HTMLPurifier object. By default, no output purification is performed; should you want to purify the output, pass an additional HTMLPurifier as $outputPurifier.

The configuration can also be modified after creating it:

$DOMDocumentFactoryConfig->setInputPurifier($purifier);  // $purifier must be an \HTMLPurifier instance
$DOMDocumentFactoryConfig->setOutputPurifier($purifier); // $purifier must be an \HTMLPurifier instance
$DOMDocumentFactoryConfig->version = '1.0';

The $settings array defaults to:

$settings = [
    'version'             => '1.0',
    'encoding'            => 'UTF-8',
    'recover'             => true,
    'preserveWhiteSpace'  => false,
    'formatOutput'        => true,
    'DOMOptions'          => LIBXML_NOERROR | LIBXML_NOWARNING,
];
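These defaults correspond to real DOMDocument constructor arguments, DOMDocument properties, and libxml option flags. The sketch below shows one plausible way such settings could be applied to a plain DOMDocument; the exact mapping inside DOMDocumentFactoryConfig is an assumption, not taken from the package source.

```php
<?php
// Assumed mapping: 'version' and 'encoding' go to the DOMDocument constructor,
// the boolean flags are DOMDocument properties, and 'DOMOptions' is passed as
// the libxml options argument of loadHTML().
$settings = [
    'version'            => '1.0',
    'encoding'           => 'UTF-8',
    'recover'            => true,
    'preserveWhiteSpace' => false,
    'formatOutput'       => true,
    'DOMOptions'         => LIBXML_NOERROR | LIBXML_NOWARNING,
];

$doc = new DOMDocument($settings['version'], $settings['encoding']);
$doc->recover            = $settings['recover'];
$doc->preserveWhiteSpace = $settings['preserveWhiteSpace'];
$doc->formatOutput       = $settings['formatOutput'];

// LIBXML_NOERROR | LIBXML_NOWARNING suppress libxml's error and warning
// reports, so malformed input loads quietly.
$ok = $doc->loadHTML('<p>Unclosed paragraph', $settings['DOMOptions']);
var_dump($ok); // bool(true)
```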

As a Static Function

You can also use this factory as a one-off static function, optionally passing a $DOMDocumentFactoryConfig. Internally, the static methods spin up an instance of the DOMDocumentFactory class to do their work, so this style of use is mostly a shortcut for when you do not want to integrate the factory into a project.

// getDomNode(string $blob, DOMDocumentFactoryConfig $config = null)
$node = DOMDocumentFactory::getDomNode($blob, $config);

// stringifyNode(\DOMNode $node, DOMDocumentFactoryConfig $config = null)
$string = DOMDocumentFactory::stringifyNode($node, $config);

// stringifyNodeList(\DOMNodeList $nodes, DOMDocumentFactoryConfig $config = null)
$array = DOMDocumentFactory::stringifyNodeList($nodes, $config);
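The "static shortcut backed by an instance" pattern can be sketched with a small stand-alone class. TinyFactory and its methods are hypothetical and use plain PHP DOM without HTMLPurifier or a config object; they only mirror the shape of DOMDocumentFactory (an invokable instance plus static helpers that delegate to a throwaway instance).

```php
<?php
// Hypothetical stand-in for the invokable-instance-plus-static-helpers shape.
final class TinyFactory
{
    // Invoking an instance turns an HTML fragment into its <body> node.
    public function __invoke(string $blob): DOMNode
    {
        $doc = new DOMDocument();
        $doc->loadHTML($blob, LIBXML_NOERROR | LIBXML_NOWARNING);
        // The returned node keeps its document alive via ownerDocument.
        return $doc->getElementsByTagName('body')->item(0);
    }

    public function stringify(DOMNode $node): string
    {
        return $node->ownerDocument->saveHTML($node);
    }

    // Each static helper builds a throwaway instance and delegates to it.
    public static function getDomNode(string $blob): DOMNode
    {
        return (new self())($blob);
    }

    public static function stringifyNode(DOMNode $node): string
    {
        return (new self())->stringify($node);
    }
}

$body = TinyFactory::getDomNode('<p>Static call</p>');
echo TinyFactory::stringifyNode($body->firstChild), PHP_EOL; // <p>Static call</p>
```

Because the static helpers only delegate, the instance methods stay the single place where the real work happens.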

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

The current tests are fairly basic; tests that more thoroughly probe edge cases or unexpected behavior would be welcome.

Contributing

Please see CONTRIBUTING and CODE_OF_CONDUCT for details.

Security

If you discover any security-related issues, please email dane@lindberg.xyz instead of using the issue tracker.

Credits

The boilerplate for this project is based on The League of Extraordinary Packages' Skeleton package repository.

License

The MIT License (MIT). Please see License File for more information.