mathsgod/html-image-extractor

There is no license information available for the latest version (dev-main) of this package.

Extract embedded data: URI images from HTML and replace with placeholders

Package info

github.com/mathsgod/html-image-extractor

pkg:composer/mathsgod/html-image-extractor

dev-main 2026-04-16 08:28 UTC

This package is auto-updated.

Last update: 2026-04-16 08:28:53 UTC


README

A PHP library to extract embedded data: URI images from HTML, replace them with temporary placeholders, and restore them with real URLs once the images have been saved or uploaded.

Requirements

  • PHP 8.1+

Installation

composer require mathsgod/html-image-extractor

How it works

HTML (with data: URIs)
    ↓ extract()
HTML with __IMG_xxx__ placeholders  +  image data map
    ↓ save / upload images, get URLs
    ↓ restore()
Final HTML (with real URLs)
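
The flow above can be illustrated with a small standalone sketch. It does not use the library and is not its implementation: the function names sketchExtract and sketchRestore, the regular expression, and the counter-based placeholder IDs are all assumptions made for illustration. Only the __IMG_xxx__ placeholder shape and the mimeType/data keys come from this README.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: swap each base64 data: URI image for a placeholder.
 * Returns [HTML with placeholders, map of placeholder ID => image info].
 */
function sketchExtract(string $html): array
{
    $images = [];
    $pattern = '~data:(image/[a-z0-9.+-]+);base64,([A-Za-z0-9+/=]+)~i';

    $out = preg_replace_callback($pattern, function (array $m) use (&$images): string {
        // Assumed ID scheme: a running counter. The library's real IDs may differ.
        $id = sprintf('__IMG_%03d__', count($images) + 1);
        $images[$id] = ['mimeType' => $m[1], 'data' => $m[2]];
        return $id;
    }, $html);

    return [$out, $images];
}

/** Illustrative only: substitute each placeholder with its final URL. */
function sketchRestore(string $html, array $urlMap): string
{
    return strtr($html, $urlMap);
}

$html = '<p><img src="data:image/png;base64,AAAA"> and <img src="data:image/gif;base64,BBBB"></p>';

[$withPlaceholders, $images] = sketchExtract($html);

// Pretend each image was uploaded and received a URL (hypothetical URLs).
$urlMap = [];
foreach (array_keys($images) as $i => $id) {
    $urlMap[$id] = "https://example.com/uploads/{$i}.bin";
}

echo sketchRestore($withPlaceholders, $urlMap), PHP_EOL;
```

Keeping the placeholder-to-URL map separate from the HTML is what lets the save or upload step happen anywhere between the two calls.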

Usage

Basic — save to local directory

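require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use HtmlImageExtractor\HtmlImageExtractor;

// $html is assumed to hold your source HTML containing data: URI images
$extractor = new HtmlImageExtractor();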

// Step 1: extract embedded images
$extractor->extract($html);

$modifiedHtml = $extractor->getHtml();   // HTML with __IMG_xxx__ placeholders
$images       = $extractor->getImages(); // image data map
echo $extractor->count() . ' image(s) found';

// Step 2: save to disk and get URL map
$urlMap = $extractor->saveToDir(
    saveDir: __DIR__ . '/uploads',
    baseUrl: 'https://example.com/uploads'
);

// Step 3: restore placeholders with real URLs
$finalHtml = $extractor->restore($urlMap);

Advanced — custom upload (e.g. cloud storage)

// Reuses $extractor and $html from the basic example
$extractor->extract($html);

// Build the URL map yourself after uploading
$urlMap = [];
foreach ($extractor->getImages() as $id => $info) {
    // $info['mimeType']  — e.g. "image/png"
    // $info['data']      — base64 encoded image data
    // $info['extension'] — e.g. "png"
    // myCloudUpload() stands in for your own upload function; it should return the public URL
    $url = myCloudUpload(base64_decode($info['data']), $info['mimeType']);
    $urlMap[$id] = $url;
}

$finalHtml = $extractor->restore($urlMap);
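
To try the loop above without a real storage backend, myCloudUpload can be replaced by a local stand-in. The sketch below is hypothetical: the function body, the fake-bucket directory, the cdn.example.com host, and the content-hash file names are all assumptions, not part of the library. It also decodes in strict mode so that malformed base64 fails loudly instead of producing a corrupt file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a cloud upload: writes the bytes to a local
 * directory under a content-hash name and returns the URL it would have.
 */
function myCloudUpload(string $bytes, string $mimeType): string
{
    $dir = sys_get_temp_dir() . '/fake-bucket';
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create {$dir}");
    }

    // Derive the extension from the MIME subtype, e.g. "image/svg+xml" => "svg".
    $subtype = substr($mimeType, strlen('image/'));
    $ext = explode('+', $subtype)[0];

    $name = sha1($bytes) . '.' . $ext;
    if (file_put_contents("{$dir}/{$name}", $bytes) === false) {
        throw new RuntimeException("Cannot write {$name}");
    }

    return 'https://cdn.example.com/' . $name;
}

// Same shape as one getImages() entry: mimeType, data (base64), extension.
$info = [
    'mimeType'  => 'image/png',
    'data'      => base64_encode('not-a-real-png'),
    'extension' => 'png',
];

$bytes = base64_decode($info['data'], true); // strict mode: false on invalid input
if ($bytes === false) {
    throw new InvalidArgumentException('Invalid base64 image data');
}

$url = myCloudUpload($bytes, $info['mimeType']);
echo $url, PHP_EOL;
```

Naming files by content hash means the same image uploaded twice maps to the same URL, which is one reasonable choice among several.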

API

Method and description:

  • extract(string $html): static
    Extract all data: URI images and replace with placeholders. Returns $this for chaining.
  • getHtml(): string
    Get the HTML with placeholders (after extract()).
  • getImages(): array
    Get extracted image data keyed by placeholder ID. Each entry has mimeType, data (base64), extension.
  • count(): int
    Number of images found in the last extract() call.
  • saveToDir(string $saveDir, string $baseUrl): array
    Save images to a local directory. Returns a urlMap ready for restore().
  • restore(array $urlMap): string
    Replace placeholders with real URLs. Returns final HTML.

Supported image formats

jpeg, png, gif, webp, svg, bmp, tiff, avif
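
When writing a custom upload step, it can help to map MIME types to file extensions for these formats. The table below is a sketch under assumptions: the constant and function names are hypothetical, and the library's own mapping may differ (for example, it may use "jpeg" rather than "jpg"). The MIME types themselves are the standard registered ones.

```php
<?php
declare(strict_types=1);

// Assumed MIME type => extension mapping for the formats listed above.
const SKETCH_MIME_EXTENSIONS = [
    'image/jpeg'    => 'jpg',
    'image/png'     => 'png',
    'image/gif'     => 'gif',
    'image/webp'    => 'webp',
    'image/svg+xml' => 'svg',
    'image/bmp'     => 'bmp',
    'image/tiff'    => 'tiff',
    'image/avif'    => 'avif',
];

/** Returns the extension for a supported MIME type, or null if unsupported. */
function sketchExtensionFor(string $mimeType): ?string
{
    return SKETCH_MIME_EXTENSIONS[strtolower(trim($mimeType))] ?? null;
}

foreach (['image/png', 'image/svg+xml', 'text/html'] as $mime) {
    echo $mime, ' => ', sketchExtensionFor($mime) ?? '(unsupported)', PHP_EOL;
}
```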

License

MIT