timostamm/url-finder

Find and replace URLs in HTML and CSS documents

v1.0.2 2020-09-28 09:20 UTC


Last update: 2024-04-28 17:07:56 UTC


README


Find and replace URLs in HTML, CSS and Markdown documents.

Example input HTML:

<html>
<body>
  <img src="http://domain.tld/img/a.jpg" >
  <div style="background-image: url(./c.jpg);"></div>
  <style>
    .bg-img { background-image: url(../images/h.jpg); }
  </style>
  <script src="https://cdn.tld/angular.min.js"></script>
  <link rel="stylesheet" type="text/css" href="/styles/f.css" />
  ...

Find all JPEGs on our domain and move them to /images/:

$documentUrl = 'http://domain.tld/products/all.html';
$finder = UrlFinder::create($html, $documentUrl);

foreach ($finder->find('*domain.tld/*.jpg') as $url) {
  $newpath = '/images/' . $url->path->filename();
  $url
    ->replacePath($newpath)
    ->makeAbsolute()
    ->clearHost();
}

$finder->getDocument(); // returns the updated HTML string

The result:

<html>
<body>
  <img src="/images/a.jpg" >
  <div style="background-image: url(/images/c.jpg);"></div>
  <style>
    .bg-img { background-image: url(/images/h.jpg); }
  </style>
  <script src="https://cdn.tld/angular.min.js"></script>
  <link rel="stylesheet" type="text/css" href="/styles/f.css" />
  ...

The UrlFinder takes care of proper quoting of URLs in attributes, of url-notations in style attributes, and of url-notations within style tags.
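
In the example above, relative references such as ./c.jpg and ../images/h.jpg match a pattern that contains a host, which suggests that each URL is resolved against the document URL before matching. The following is a minimal sketch of that idea in plain PHP, not the library's implementation: resolveReference() is a hypothetical helper with simplified resolution rules, and fnmatch() is only an assumed stand-in for the wildcard matching.

```php
<?php
// Hypothetical helper for illustration; it is not part of url-finder.
// Resolves a reference against a base URL with simplified rules
// (query strings and fragments get no special treatment).
function resolveReference(string $base, string $ref): string
{
    // A reference that carries its own scheme is already absolute.
    if (is_string(parse_url($ref, PHP_URL_SCHEME))) {
        return $ref;
    }
    $b = parse_url($base);
    // Protocol-relative references only borrow the scheme.
    if (strpos($ref, '//') === 0) {
        return $b['scheme'] . ':' . $ref;
    }
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // Root-relative references replace the whole path; all others are
    // appended to the directory of the base document.
    $path = strpos($ref, '/') === 0
        ? $ref
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $ref;

    // Collapse "." and ".." segments.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }
    return $origin . '/' . implode('/', $segments);
}

$documentUrl = 'http://domain.tld/products/all.html';
$pattern = '*domain.tld/*.jpg';
$references = [
    'http://domain.tld/img/a.jpg',
    './c.jpg',
    '../images/h.jpg',
    'https://cdn.tld/angular.min.js',
    '/styles/f.css',
];

// Keep every reference whose absolute form matches the pattern.
// Without flags, the "*" in fnmatch() also spans "/" characters.
$matches = [];
foreach ($references as $ref) {
    $absolute = resolveReference($documentUrl, $ref);
    if (fnmatch($pattern, $absolute)) {
        $matches[$ref] = $absolute;
    }
}
```

With the five references from the example HTML, this sketch keeps the three .jpg URLs and skips the script and the stylesheet, which is consistent with the result shown above.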

Using the fluent collection interface:

$urls = $finder
  ->find('*') // matches the entire absolute URL
  ->matchHost('*')
  ->matchPath('*')
  ->onlyHttps()
  ->matchFilenameNot('*.less');
  // etc.
  
$urls->count();
$urls->toArray();
$urls->first();
foreach($urls as $url) {}
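
To illustrate how such a chain can narrow a result set step by step, here is a self-contained sketch of a fluent collection. UrlListSketch is a hypothetical class that works on plain URL strings; its method bodies, and the choice to return a new collection from every filter, are assumptions made for illustration rather than the library's actual behavior.

```php
<?php
// Hypothetical stand-in for the collection returned by find().
// It holds absolute URL strings instead of URL objects.
final class UrlListSketch implements IteratorAggregate, Countable
{
    /** @var string[] */
    private $urls;

    public function __construct(array $urls)
    {
        $this->urls = array_values($urls);
    }

    // Every filter returns a new, narrowed collection, which is what
    // makes the calls chainable.
    private function filter(callable $keep): self
    {
        return new self(array_filter($this->urls, $keep));
    }

    public function matchHost(string $pattern): self
    {
        return $this->filter(function (string $url) use ($pattern): bool {
            return fnmatch($pattern, (string) parse_url($url, PHP_URL_HOST));
        });
    }

    public function onlyHttps(): self
    {
        return $this->filter(function (string $url): bool {
            return parse_url($url, PHP_URL_SCHEME) === 'https';
        });
    }

    public function matchFilenameNot(string $pattern): self
    {
        return $this->filter(function (string $url) use ($pattern): bool {
            $filename = basename((string) parse_url($url, PHP_URL_PATH));
            return !fnmatch($pattern, $filename);
        });
    }

    public function count(): int
    {
        return count($this->urls);
    }

    public function toArray(): array
    {
        return $this->urls;
    }

    public function first(): ?string
    {
        return $this->urls[0] ?? null;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->urls);
    }
}

$all = new UrlListSketch([
    'https://cdn.tld/angular.min.js',
    'http://domain.tld/img/a.jpg',
    'https://domain.tld/styles/theme.less',
]);

$urls = $all
    ->matchHost('*.tld')
    ->onlyHttps()
    ->matchFilenameNot('*.less');
```

In this sketch only https://cdn.tld/angular.min.js survives all three filters, and $all still holds the three original URLs.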

Updating URLs:

$url->query->set('text', 'value');
$url->clear( Url::CREDENTIALS );

See https://github.com/timostamm/url-builder for documentation of the URL object.

Finding URLs in CSS works exactly the same:

$finder = UrlFinder::create($css, 'http://domain.tld/styles/main.css');
$finder->find()->first()->makeAbsolute();
$finder->getDocument();

Please note that import statements are not supported; you have to follow stylesheet links yourself.
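
For readers who want a feel for what rewriting url-notations in CSS involves, the following plain PHP sketch replaces every url(...) occurrence through a callback and keeps the original quoting. rewriteCssUrls() is a hypothetical helper built on a simple regular expression; it is not the parser used by UrlFinder and does not handle escapes or comments.

```php
<?php
// Hypothetical helper, not the library's parser: passes the content of
// every url(...) notation to $rewrite and puts the returned value back,
// wrapped in the same quote character (if any) as the original.
function rewriteCssUrls(string $css, callable $rewrite): string
{
    // Group 1: optional quote character, group 2: the URL itself.
    $pattern = '/url\(\s*(["\']?)(.*?)\1\s*\)/i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($rewrite): string {
            return 'url(' . $m[1] . $rewrite($m[2]) . $m[1] . ')';
        },
        $css
    );
}

$css = ".bg-img { background-image: url(../images/h.jpg); }\n"
     . ".logo { background: url(\"logo.png\"); }";

// Move every referenced file to /images/, keeping only the filename.
$updated = rewriteCssUrls($css, function (string $url): string {
    return '/images/' . basename($url);
});
```

Here $updated contains url(/images/h.jpg) and url("/images/logo.png"). An @import rule written with a bare string instead of url(...) is not touched by this sketch.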

Markdown support

Markdown support is experimental right now. Caveats:

  • Link and image titles are not supported and will raise an error
  • HTML containing links within Markdown is ignored
  • Markdown is not available through UrlFinder::create; use new MarkdownUrlFinder() instead