rosell-dk/dom-util-for-webp

Replace image URLs found in HTML

0.7.1 2023-10-20 12:10 UTC

This package is auto-updated.

Last update: 2024-04-06 17:20:35 UTC


README


Replace image URLs found in HTML

This library can do two things:

  1. Replace image URLs in HTML
  2. Replace <img> tags with <picture> tags, adding webp versions to sources

To set up with Composer, run composer require rosell-dk/dom-util-for-webp.

1. Replacing image URLs in HTML

The ImageUrlReplacer::replace($html) method accepts a piece of HTML and returns HTML where all image URLs have been replaced, including those in inline styles.

Usage:

$modifiedHtml = ImageUrlReplacer::replace($html);

Example replacements:

input:

<img src="image.jpg">
<img src="1.jpg" srcset="2.jpg 1000w">
<picture>
    <source srcset="1.jpg" type="image/webp">
    <source srcset="2.png" type="image/webp">
    <source src="3.gif"> <!-- gifs are skipped in default behaviour -->
    <source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg')"></div>
<style>
#hero {
    background: lightblue url("image.png") no-repeat fixed center;
}
</style>
<input type="button" src="1.jpg">
<img data-src="image.jpg"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). Useful for lazy-loading -->

output:

<img src="image.jpg.webp">
<img src="1.jpg.webp" srcset="2.jpg.webp 1000w">
<picture>
    <source srcset="1.jpg.webp" type="image/webp">
    <source srcset="2.png.webp" type="image/webp">
    <source src="3.gif"> <!-- gifs are skipped in default behaviour -->
    <source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg.webp')"></div>
<style>
#hero {
    background: lightblue url("image.png.webp") no-repeat fixed center;
}
</style>
<input type="button" src="1.jpg.webp">
<img data-src="image.jpg.webp"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). Useful for lazy-loading -->
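The style replacements in the example above (the inline style attribute and the <style> block) can be pictured with a small standalone sketch. This is not the library's code: the function name replaceCssImageUrlsSketch is hypothetical, and the regular expression is only an assumption about one reasonable way to find url(...) values.

```php
<?php
/**
 * Illustrative sketch only (hypothetical helper, not part of the library).
 * Appends ".webp" to url(...) values in a piece of CSS when the URL ends
 * with "png", "jpg" or "jpeg".
 */
function replaceCssImageUrlsSketch($css)
{
    // url( ... ) with optional single or double quotes around the URL
    $pattern = '#url\(\s*([\'"]?)([^\'")\s]+)\1\s*\)#i';

    $result = preg_replace_callback($pattern, function ($m) {
        $quote = $m[1];
        $url = $m[2];

        // Same kind of extension check as in the README's custom example
        if (preg_match('#(png|jpe?g)$#', $url)) {
            $url .= '.webp';
        }
        return 'url(' . $quote . $url . $quote . ')';
    }, $css);

    // preg_replace_callback returns null on error; fall back to the input
    return $result === null ? $css : $result;
}

echo replaceCssImageUrlsSketch("background-image: url('image.jpeg')"), "\n";
```

Anything that does not end with one of the three extensions, such as a GIF, passes through with its URL unchanged.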

Default behaviour of ImageUrlReplacer::replace:

  • The modified URL is the same as the original, with ".webp" appended (to change, override the replaceUrl function)
  • Only URLs that end with "png", "jpg" or "jpeg" are replaced; URLs with a query string are skipped (to change, override the replaceUrl function)
  • Attribute search/replace is limited to these tags: <img>, <source>, <input> and <iframe> (to change, override the $searchInTags property)
  • Attribute search/replace is limited to these attributes: "src", "srcset" and any attribute starting with "data-" (to change, override the attributeFilter function)
  • URLs inside styles are replaced too (background-image and background properties)
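The URL rule described in the list above can be summarised in a few lines of standalone PHP. This is a sketch of the documented behaviour, not the library's actual implementation; defaultReplaceUrlSketch is a hypothetical name, and returning null to mean "leave this URL alone" mirrors the bare return in the README's custom example.

```php
<?php
/**
 * Illustrative sketch only (hypothetical helper, not part of the library).
 * Returns the replacement URL, or null when the URL should be left as-is.
 */
function defaultReplaceUrlSketch($url)
{
    // URLs with a query string are skipped
    if (strpos($url, '?') !== false) {
        return null;
    }

    // Only URLs ending with "png", "jpg" or "jpeg" are accepted
    if (!preg_match('#(png|jpe?g)$#', $url)) {
        return null;
    }

    // Append ".webp" after the existing extension
    return $url . '.webp';
}

foreach (array('image.jpg', '3.gif', '4.jpg?width=200') as $url) {
    var_dump(defaultReplaceUrlSketch($url));
}
```

With the three sample URLs, only image.jpg yields a replacement; the GIF and the URL with a query string both come back as null.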

The behaviour can be modified by extending ImageUrlReplacer and overriding public methods such as replaceUrl.

ImageUrlReplacer uses the Sunra\PhpSimple\HtmlDomParser library for parsing and modifying HTML. That library wraps simplehtmldom, which tolerates invalid HTML (it does not touch the invalid parts).

Example: Customized behaviour

class ImageUrlReplacerCustomReplacer extends ImageUrlReplacer
{
    public function replaceUrl($url) {
        // Only accept urls ending with "png", "jpg", "jpeg"  and "gif"
        if (!preg_match('#(png|jpe?g|gif)$#', $url)) {
            return;
        }

        // Only accept full urls (beginning with http:// or https://)
        if (!preg_match('#^https?://#', $url)) {
            return;
        }

        // PS: You probably want to filter out external images too...

        // Simply append ".webp" after the current extension.
        // This strategy ensures that "logo.jpg" and "logo.gif" get counterparts with unique names
        return $url . '.webp';
    }

    public function attributeFilter($attrName) {
        // Don't allow just any "data-" attribute; limit to attributes that look like they are used for images.
        // The following rule matches all attributes used for lazy loading images that we know of
        return preg_match('#^(src|srcset|(data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i', $attrName);

        // If you want to limit it further, only allowing attributes known to be used for lazy load,
        // use the following regex instead:
        //return preg_match('#^(src|srcset|data-(src|srcset|cvpsrc|cvpset|thumb|bg-url|large_image|lazyload|source-url|srcsmall|srclarge|srcfull|slide-img|lazy-original))$#i', $attrName);
    }
}

$modifiedHtml = ImageUrlReplacerCustomReplacer::replace($html);
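To see which attribute names the first regular expression in attributeFilter lets through, the pattern can be tried on its own. The wrapper function below is hypothetical and exists only so the pattern can be run outside the class; the pattern itself is copied from the example above.

```php
<?php
/**
 * Illustrative sketch only (hypothetical helper).
 * Applies the attribute-name pattern from the example above and
 * returns true when the attribute would be searched for image URLs.
 */
function looksLikeImageAttributeSketch($attrName)
{
    $pattern = '#^(src|srcset|(data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i';

    // preg_match returns 1 on a match and 0 otherwise
    return preg_match($pattern, $attrName) === 1;
}

foreach (array('src', 'srcset', 'data-lazy-src', 'data-id', 'href') as $name) {
    echo $name, ': ', looksLikeImageAttributeSketch($name) ? 'accepted' : 'rejected', "\n";
}
```

Here src, srcset and data-lazy-src are accepted, while data-id and href are rejected. Because the pattern only looks for keywords anywhere after "data-", it is deliberately loose; the stricter commented-out regex in the example trades coverage for precision.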

2. Replacing <img> tags with <picture> tags

The PictureTags::replace($html) method accepts a piece of HTML and returns HTML where all <img> tags have been wrapped in <picture> tags, with webp versions added as sources.

Usage:

$modifiedHtml = PictureTags::replace($html);

Example replacements:

Input:

<img src="1.png">
<img srcset="3.jpg 1000w" src="3.jpg">
<img data-lazy-src="9.jpg" style="border:2px solid red" class="something">
<figure class="wp-block-image">
    <img src="12.jpg" alt="" class="wp-image-6" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px">
</figure>

Output:

<picture><source srcset="1.png.webp" type="image/webp"><img src="1.png" class="webpexpress-processed"></picture>
<picture><source srcset="3.jpg.webp 1000w" type="image/webp"><img srcset="3.jpg 1000w" src="3.jpg" class="webpexpress-processed"></picture>
<picture><source data-lazy-src="9.jpg.webp" type="image/webp"><img data-lazy-src="9.jpg" style="border:2px solid red" class="something webpexpress-processed"></picture>
<figure class="wp-block-image">
  <picture><source srcset="12.jpg.webp 492w, 12-300x265.jpg.webp 300w" sizes="(max-width: 492px) 100vw, 492px" type="image/webp"><img src="12.jpg" alt="" class="wp-image-6 webpexpress-processed" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px"></picture>
</figure>
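The srcset values in the output above are converted entry by entry: each URL gets ".webp" appended while its width descriptor is kept. A minimal standalone sketch of that idea follows. The function name appendWebpToSrcsetSketch is hypothetical, and splitting on commas is a simplifying assumption (it would misbehave for URLs that themselves contain commas).

```php
<?php
/**
 * Illustrative sketch only (hypothetical helper, not part of the library).
 * Appends ".webp" to every png/jpg/jpeg URL in a srcset value,
 * keeping descriptors such as "492w" untouched.
 */
function appendWebpToSrcsetSketch($srcset)
{
    $converted = array();

    foreach (explode(',', $srcset) as $candidate) {
        // Each candidate looks like "<url>" or "<url> <descriptor>"
        $pieces = preg_split('/\s+/', trim($candidate), 2);
        $url = $pieces[0];
        $descriptor = isset($pieces[1]) ? ' ' . $pieces[1] : '';

        if (preg_match('#(png|jpe?g)$#', $url)) {
            $url .= '.webp';
        }
        $converted[] = $url . $descriptor;
    }

    return implode(', ', $converted);
}

echo appendWebpToSrcsetSketch('12.jpg 492w, 12-300x265.jpg 300w'), "\n";
```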

Note that with the picture tags, it is still the img tag that shows the selected image. The picture tag is just a wrapper. So it is correct behaviour not to copy the style, width, class or any other attributes to the picture tag. See issue #9.

As with ImageUrlReplacer, you can override the replaceUrl function. There are, however, currently no other methods to override.

PictureTags currently uses regular expressions to do the replacing. There are plans to change the implementation to use Sunra\PhpSimple\HtmlDomParser, as the ImageUrlReplacer class does.
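To give a feel for the regex-based approach, here is a deliberately simplified standalone sketch. It is not the code PictureTags uses: the function name wrapImgInPictureSketch and the pattern are assumptions, it only handles a double-quoted src attribute, and it does not add the webpexpress-processed class or handle srcset, sizes or data- attributes as the real class does.

```php
<?php
/**
 * Illustrative sketch only (hypothetical helper, not the library's code).
 * Wraps <img> tags whose src ends with png/jpg/jpeg in a <picture> tag
 * with a webp <source>. The original <img> tag is kept unchanged.
 */
function wrapImgInPictureSketch($html)
{
    // "\s" before "src" avoids matching attributes such as data-lazy-src
    $pattern = '#<img\b[^>]*?\ssrc="([^"]*(?:png|jpe?g))"[^>]*>#';

    $result = preg_replace_callback($pattern, function ($m) {
        $source = '<source srcset="' . $m[1] . '.webp" type="image/webp">';
        return '<picture>' . $source . $m[0] . '</picture>';
    }, $html);

    // preg_replace_callback returns null on error; fall back to the input
    return $result === null ? $html : $result;
}

echo wrapImgInPictureSketch('<img src="1.png">'), "\n";
```

Tags that the pattern does not recognise, for example an <img> with only a data-lazy-src attribute or a GIF source, are returned untouched, which is also why a DOM parser is the more robust long-term choice.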

Platforms

Works on (at least):

  • OS: Ubuntu (22.04, 20.04, 18.04), Windows (2022, 2019), macOS (13, 12, 11, 10.15)
  • PHP: 5.6 - 8.2 (also tested 8.3 and 8.4 development versions in October 2023)

Each new release will be tested on all combinations of OSs and PHP versions that are supported by GitHub-hosted runners, except that we do not test below PHP 5.6.

Testing consists of running the unit tests. The code in this library is almost completely covered by tests (~95% coverage).

We also test future versions of PHP monthly, in order to catch problems early.

Do you like what I do?

Perhaps you want to support my work, so I can continue doing it :)