typo3 / html-sanitizer
HTML sanitizer aiming to provide XSS-safe markup based on explicitly allowed tags, attributes and values.
Requires
- php: ^7.2 || ^8.0
- ext-dom: *
- masterminds/html5: ^2.7.6
- psr/log: ^1.0 || ^2.0 || ^3.0
Requires (Dev)
- phpunit/phpunit: ^8.5
This package is auto-updated.
Last update: 2024-11-12 16:32:17 UTC
README
TYPO3 HTML Sanitizer
ℹ️ Common safe HTML tags & attributes as given in \TYPO3\HtmlSanitizer\Builder\CommonBuilder still might be adjusted, extended or rearranged to more specific builders.
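For orientation, here is a minimal sketch of using the common preset as-is. It assumes that CommonBuilder can be constructed without arguments and that its build() method returns a ready-to-use Sanitizer; the input markup is made up for illustration, and the exact output depends on the tags and flags the preset currently declares.

```php
<?php
use TYPO3\HtmlSanitizer\Builder\CommonBuilder;

require_once 'vendor/autoload.php';

// build a sanitizer from the common preset (assumed: no constructor arguments)
$sanitizer = (new CommonBuilder())->build();

// illustrative input: <script> is not expected to survive as markup
$result = $sanitizer->sanitize('<p class="intro">Hello <script>alert(1)</script></p>');
echo $result;
```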
In a Nutshell
The typo3/html-sanitizer package aims to be a standalone component that can be used by any PHP-based project or library. Although it is released within the TYPO3 namespace, it is agnostic to the specifics of TYPO3 CMS.
- \TYPO3\HtmlSanitizer\Behavior contains declarative settings for a particular process of sanitizing HTML.
- \TYPO3\HtmlSanitizer\Visitor\VisitorInterface implementations (multiple different visitors can exist at the same time) do the actual work based on the declared Behavior. Visitors can modify nodes or mark them for deletion.
- \TYPO3\HtmlSanitizer\Sanitizer can be considered the working instance, invoking visitors, parsing and serializing HTML. In general, this instance does not contain much logic on how to handle particular nodes, attributes or values.
- \TYPO3\HtmlSanitizer\Builder\BuilderInterface can be used to create multiple different builder instances - in terms of "presets" - which combine declaring a particular Behavior, initializing VisitorInterface instances, and finally returning a ready-to-use Sanitizer instance.
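The interplay of these four parts can be sketched as a small custom preset. The class name LinkOnlyBuilder and its tag selection are hypothetical and not part of the package; the sketch also assumes that BuilderInterface declares a single build(): Sanitizer method.

```php
<?php
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Builder\BuilderInterface;
use TYPO3\HtmlSanitizer\Sanitizer;
use TYPO3\HtmlSanitizer\Visitor\CommonVisitor;

require_once 'vendor/autoload.php';

// hypothetical preset: allows only <p>, <br> and <a href="http(s)://...">
final class LinkOnlyBuilder implements BuilderInterface
{
    public function build(): Sanitizer
    {
        // 1. declare the behavior (tags, attributes, values, flags)
        $hrefAttr = (new Behavior\Attr('href'))
            ->addValues(new Behavior\RegExpAttrValue('#^https?://#'));
        $behavior = (new Behavior())
            ->withFlags(Behavior::ENCODE_INVALID_TAG)
            ->withTags(
                new Behavior\Tag('p', Behavior\Tag::ALLOW_CHILDREN),
                new Behavior\Tag('br'),
                (new Behavior\Tag('a', Behavior\Tag::ALLOW_CHILDREN))
                    ->addAttrs($hrefAttr)
            );
        // 2. initialize the visitors that apply the behavior
        $visitors = [new CommonVisitor($behavior)];
        // 3. return the ready-to-use sanitizer
        return new Sanitizer($behavior, ...$visitors);
    }
}

$sanitizer = (new LinkOnlyBuilder())->build();
$result = $sanitizer->sanitize(
    '<p>Visit <a href="https://typo3.org/" onclick="alert(1)">TYPO3</a></p>'
);
echo $result;
```

Keeping the declaration inside a builder means callers only deal with build() and sanitize(), while the allow-list stays in one place.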
Installation
composer req typo3/html-sanitizer
Example & API
<?php
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Behavior\NodeInterface;
use TYPO3\HtmlSanitizer\Sanitizer;
use TYPO3\HtmlSanitizer\Visitor\CommonVisitor;

require_once 'vendor/autoload.php';

$commonAttrs = [
    new Behavior\Attr('id'),
    new Behavior\Attr('class'),
    new Behavior\Attr('data-', Behavior\Attr::NAME_PREFIX),
];
$hrefAttr = (new Behavior\Attr('href'))
    ->addValues(new Behavior\RegExpAttrValue('#^https?://#'));

// attention: only `Behavior` implementation uses immutability
// (invoking `withFlags()` or `withTags()` returns new instance)
$behavior = (new Behavior())
    ->withFlags(Behavior::ENCODE_INVALID_TAG | Behavior::ENCODE_INVALID_COMMENT)
    ->withoutNodes(new Behavior\Comment())
    ->withNodes(new Behavior\CdataSection())
    ->withTags(
        (new Behavior\Tag('div', Behavior\Tag::ALLOW_CHILDREN))
            ->addAttrs(...$commonAttrs),
        (new Behavior\Tag('a', Behavior\Tag::ALLOW_CHILDREN))
            ->addAttrs(...$commonAttrs)
            ->addAttrs($hrefAttr->withFlags(Behavior\Attr::MANDATORY)),
        (new Behavior\Tag('br'))
    )
    ->withNodes(
        (new Behavior\NodeHandler(
            new Behavior\Tag('typo3'),
            new Behavior\Handler\ClosureHandler(
                static function (NodeInterface $node, ?DOMNode $domNode): ?DOMNode {
                    return $domNode === null
                        ? null
                        : new DOMText(sprintf(
                            '%s says: "%s"',
                            strtoupper($domNode->nodeName),
                            $domNode->textContent
                        ));
                }
            )
        ))
    );

$visitors = [new CommonVisitor($behavior)];
$sanitizer = new Sanitizer($behavior, ...$visitors);

$html = <<< EOH
<div id="main">
    <typo3>Inspiring People To Share</typo3>
    <!-- will be encoded, due to Behavior::ENCODE_INVALID_COMMENT -->
    <a class="no-href">invalidated, due to missing mandatory `href` attr</a>
    <a href="https://typo3.org/" data-type="url" wrong-attr="is-removed">TYPO3</a><br>
    (the <span>SPAN, SPAN, SPAN</span> tag shall be encoded to HTML entities)
</div>
EOH;

echo $sanitizer->sanitize($html);
will result in the following sanitized output
<div id="main">
    TYPO3 says: "Inspiring People To Share"
    &lt;!-- will be encoded, due to Behavior::ENCODE_INVALID_COMMENT --&gt;
    &lt;a class="no-href"&gt;invalidated, due to missing mandatory `href` attr&lt;/a&gt;
    <a href="https://typo3.org/" data-type="url">TYPO3</a><br>
    (the &lt;span&gt;SPAN, SPAN, SPAN&lt;/span&gt; tag shall be encoded to HTML entities)
</div>
ℹ️ Changes
- since v2.1.0 the newly introduced nodes Behavior\Comment and Behavior\CdataSection are enabled by default for backward compatibility reasons; use e.g. $behavior->withoutNodes(new Behavior\Comment()) to remove them (later versions of this package won't have this fallback anymore)
- since v2.1.0 it is suggested to provide a \TYPO3\HtmlSanitizer\Behavior when creating a new instance of \TYPO3\HtmlSanitizer\Sanitizer, e.g. new Sanitizer($behavior, ...$visitors)
Find more details on all changes in UPGRADING.md.
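Both v2.1.0 notes can be combined in a short sketch: the default Comment and CdataSection nodes are opted out explicitly, and the Behavior is passed to the Sanitizer together with its visitor. The single allowed p tag and the input string are assumptions chosen for illustration only.

```php
<?php
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Sanitizer;
use TYPO3\HtmlSanitizer\Visitor\CommonVisitor;

require_once 'vendor/autoload.php';

// opt out of the backward-compatibility defaults explicitly
$behavior = (new Behavior())
    ->withoutNodes(new Behavior\Comment())
    ->withoutNodes(new Behavior\CdataSection())
    ->withTags(new Behavior\Tag('p', Behavior\Tag::ALLOW_CHILDREN));

// pass the behavior first, followed by the visitors
$sanitizer = new Sanitizer($behavior, new CommonVisitor($behavior));

// the comment is no longer an allowed node, so it should not pass through as-is
$result = $sanitizer->sanitize('<p>text<!-- comment --></p>');
echo $result;
```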
Behavior flags
- Behavior::ENCODE_INVALID_TAG keeps invalid tags, but "disarms" them (see <span> in example)
- Behavior::ENCODE_INVALID_ATTR keeps invalid attributes, but "disarms" the whole(!) tag
- Behavior::ENCODE_INVALID_COMMENT "disarms" unexpected HTML comments by completely encoding them
- Behavior::ENCODE_INVALID_CDATA_SECTION "disarms" unexpected HTML CDATA sections by completely encoding them
- Behavior::REMOVE_UNEXPECTED_CHILDREN removes children for Tag entities that were created without explicitly using Tag::ALLOW_CHILDREN, but actually contained child nodes
- Behavior::ALLOW_CUSTOM_ELEMENTS allows using custom elements (having a hyphen -) - however, it is suggested to explicitly name all known and allowed tags and avoid using this flag
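Flags are bit masks, so they can be combined with the bitwise OR operator, and withFlags() returns a new Behavior instead of changing the existing one. The following sketch compares the same input with and without Behavior::ENCODE_INVALID_TAG; the variable names and the input markup are illustrative assumptions.

```php
<?php
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Sanitizer;
use TYPO3\HtmlSanitizer\Visitor\CommonVisitor;

require_once 'vendor/autoload.php';

$html = '<p>keep <span>me</span></p>';

// same allow-list, once without flags and once with flags combined via `|`
$plain = (new Behavior())
    ->withTags(new Behavior\Tag('p', Behavior\Tag::ALLOW_CHILDREN));
$encoding = $plain->withFlags(
    Behavior::ENCODE_INVALID_TAG | Behavior::ENCODE_INVALID_COMMENT
);

// `$plain` is left untouched, `$encoding` is a separate instance
var_dump($plain === $encoding);

$results = [];
foreach (['plain' => $plain, 'encoding' => $encoding] as $label => $behavior) {
    $sanitizer = new Sanitizer($behavior, new CommonVisitor($behavior));
    $results[$label] = $sanitizer->sanitize($html);
    printf("%s: %s\n", $label, $results[$label]);
}
```

In neither variant should the disallowed span reach the output as real markup; with the flag set it is kept as encoded text.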
License
In general the TYPO3 core is released under the GNU General Public License version 2 or any later version (GPL-2.0-or-later). In order to avoid licensing issues and incompatibilities, this package is licensed under the MIT License. In case you duplicate or modify source code, credits are not required but really appreciated.
Local Testing
Composer project oliverhader/html-sanitizer-demo offers a local development server to ease manual testing for potentially vulnerable XSS payloads.
Security Contact
In case of finding additional security issues in the TYPO3 project or in this package in particular, please get in touch with the TYPO3 Security Team, or directly report a vulnerability via GitHub.