schenke-io / laravel-url-cleaner
Checks and cleans URLs of SEO or tracking data
Requires
- php: ^8.1|^8.2|^8.3
- ext-curl: *
- ext-json: *
- ext-simplexml: *
- archtechx/enums: ^1.1
- badges/poser: ^2.0|^3.0
- guzzlehttp/guzzle: ^7.0
- spatie/laravel-package-tools: ^1.0
Requires (Dev)
- laravel/framework: ^10.0|^11.0
- laravel/pint: ^1.18
- mockery/mockery: ^1.5
- orchestra/testbench: ^8.0|^9.0
- pestphp/pest: ^1.22|^2.0|^3.0
- phpstan/phpstan-phpunit: ^1.0
- spatie/ray: ^1.40
README
The Laravel URL Cleaner package sanitizes URLs by removing unnecessary SEO parameters, tracking information, and other clutter, ensuring clean and efficient URL handling in your Laravel applications.
To install, run:
composer require schenke-io/laravel-url-cleaner
Here is a code example:
<?php

use SchenkeIo\LaravelUrlCleaner\UrlCleaner;

$shortUrl = (new UrlCleaner)->handle($longUrl);
Operation principle
The core UrlCleaner class iteratively applies a series of specialized cleaner classes to a given URL. Each cleaner class performs one specific check or modification. Cleaning URLs this way serves the following purposes:
- Reducing URL clutter: Removes unnecessary SEO parameters and tracking information.
- Improving data storage efficiency: Stores cleaner, more concise URLs.
- Enhancing performance: Optimizes URL processing and caching.
- Securing sensitive information: Prevents exposure of tracking parameters.
- Enhancing data analysis: Simplifies data analysis by removing noise from URLs.
These cleaner classes are highly extensible, allowing for customization and the creation of new modification types.
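The iteration described above can be pictured with a minimal, framework-free sketch. The `Cleaner` interface, the two toy cleaners and the `runCleaners()` helper are hypothetical names used for illustration only. They are not the package's API, which passes a `UrlData` object to each cleaner rather than a plain string.

```php
<?php

declare(strict_types=1);

// Hypothetical interface, for illustration only (not the package's API).
interface Cleaner
{
    public function clean(string $url): string;
}

// Toy cleaner: removes the fragment ("#...") from the URL.
final class DropFragment implements Cleaner
{
    public function clean(string $url): string
    {
        // Keep everything before the first "#".
        return explode('#', $url, 2)[0];
    }
}

// Toy cleaner: removes every query parameter whose key starts with "utm_".
final class DropUtmParameters implements Cleaner
{
    public function clean(string $url): string
    {
        // Split off the fragment first so it survives the rewrite.
        [$withoutFragment, $fragment] = array_pad(explode('#', $url, 2), 2, null);
        [$base, $query] = array_pad(explode('?', $withoutFragment, 2), 2, '');

        parse_str($query, $parameters);
        $kept = array_filter(
            $parameters,
            static fn ($key): bool => !str_starts_with((string) $key, 'utm_'),
            ARRAY_FILTER_USE_KEY
        );

        $result = $base;
        if ($kept !== []) {
            $result .= '?' . http_build_query($kept);
        }
        if ($fragment !== null) {
            $result .= '#' . $fragment;
        }

        return $result;
    }
}

// Applies each cleaner in order; the output of one is the input of the next.
function runCleaners(string $url, array $cleaners): string
{
    foreach ($cleaners as $cleaner) {
        $url = $cleaner->clean($url);
    }

    return $url;
}

$clean = runCleaners(
    'https://example.com/page?utm_source=mail&id=5#top',
    [new DropUtmParameters(), new DropFragment()]
);

echo $clean, PHP_EOL; // https://example.com/page?id=5
```

Because every cleaner has the same small contract, the order of the list decides the result, and a new modification type only needs one more class in that list.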
Config
A default configuration file can be installed and modified later. Install it with:
php artisan url-cleaner:install
A typical result could be:
[
    'cleaners' => [
        MarketingBroad::class,
        RemoveLongValues::class,
        PreventInvalidHost::class
    ],
    'max_length_value' => 40,
    'masks' => ['dd3','vv67'],
    'protected_keys' => ['search']
]
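How 'max_length_value' and 'protected_keys' interact is not spelled out here. One plausible reading, shown purely as an assumption, is that parameters with values longer than the limit are dropped unless their key is protected. The `removeLongValues()` helper below is hypothetical and does not reproduce the package's `RemoveLongValues` class.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of the package: drops query parameters whose
// string value is longer than $maxLength, unless the key is protected.
function removeLongValues(array $parameters, int $maxLength, array $protectedKeys): array
{
    return array_filter(
        $parameters,
        static fn ($value, $key): bool =>
            in_array((string) $key, $protectedKeys, true)
            || !is_string($value)
            || strlen($value) <= $maxLength,
        ARRAY_FILTER_USE_BOTH
    );
}

$parameters = [
    'id'     => '42',
    'token'  => str_repeat('a', 64), // longer than the limit, not protected
    'search' => str_repeat('b', 64), // longer than the limit, but protected
];

// Values mirror the sample config: limit 40, protected key "search".
$kept = removeLongValues($parameters, 40, ['search']);

print_r(array_keys($kept)); // keys "id" and "search" remain
```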
List of cleaner classes
The use of masks
The core process of URL parameter removal utilizes specific masks.
Some examples are outlined in the table below.
Build your own cleaner by extending special classes
To extend the list of cleaners, build your own cleaner classes and add them to the config file config/url-cleaner.php.
The following cleaners are prepared to be extended for custom applications:
Prevent domain names
Extend PreventLocalhost and overwrite the $hostRegExes array with regular expressions matching unwanted hostnames.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventLocalhost;

class MyCleaner extends PreventLocalhost
{
    protected array $hostRegExes = [
        '/test\.com/',
        '/test\.net/',
    ];
}
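When writing such patterns, it helps to see how an unanchored expression behaves against a hostname. The following standalone sketch uses a hypothetical `hostIsBlocked()` helper built on `parse_url()` and `preg_match()`. It only illustrates regular-expression matching and makes no claim about how PreventLocalhost evaluates its list internally.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: true if the host of $url matches any of the patterns.
function hostIsBlocked(string $url, array $hostRegExes): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host to test
    }

    foreach ($hostRegExes as $pattern) {
        if (preg_match($pattern, $host) === 1) {
            return true;
        }
    }

    return false;
}

$loose  = ['/test\.com/'];          // matches "test.com" anywhere in the host
$strict = ['/(^|\.)test\.com$/'];   // matches test.com and its subdomains only

var_dump(hostIsBlocked('https://www.test.com/a', $loose));         // bool(true)
var_dump(hostIsBlocked('https://mytest.community.org/', $loose));  // bool(true)
var_dump(hostIsBlocked('https://mytest.community.org/', $strict)); // bool(false)
```

The second call shows why anchoring matters: "mytest.community.org" contains the substring "test.com", so an unanchored pattern matches it as well.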
Prevent schemes
Extend PreventNonHttps and overwrite the $allowedSchemes array with the schemes you allow to pass.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventNonHttps;

class MyCleaner extends PreventNonHttps
{
    protected array $allowedSchemes = [
        'https',
        'http',
        'sftp',
    ];
}
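The idea of a scheme allow-list can be tried out in isolation. The `schemeIsAllowed()` function below is a hypothetical stand-in written for this illustration; the lower-casing step is an assumption of the sketch, not a documented behavior of PreventNonHttps.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: true if the URL's scheme is in the allow-list.
function schemeIsAllowed(string $url, array $allowedSchemes): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // URLs without a scheme (for example relative paths) are rejected.
    return is_string($scheme)
        && in_array(strtolower($scheme), $allowedSchemes, true);
}

$allowed = ['https', 'http', 'sftp'];

var_dump(schemeIsAllowed('sftp://files.example.com/report.csv', $allowed)); // bool(true)
var_dump(schemeIsAllowed('ftp://files.example.com/report.csv', $allowed));  // bool(false)
```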
Use your own masks
Extend RemoveSearch and overwrite the $masks array with the masks you want to exclude.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\RemoveSearch;

class MyCleaner extends RemoveSearch
{
    protected array $masks = [
        'utm_*',
        'test*',
        'q@test.net'
    ];
}
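The masks in the example above suggest two forms: a key with a `*` wildcard, and a `key@host` pair. The exact matching rules are not given here, so the sketch below rests on two assumptions: `*` matches any run of characters in the parameter key, and an `@host` suffix limits the mask to that exact host. `maskMatches()` is a hypothetical helper, not a function of the package.

```php
<?php

declare(strict_types=1);

// Hypothetical matcher under the assumptions stated above.
function maskMatches(string $mask, string $key, string $host): bool
{
    // An optional "@host" suffix restricts the mask to one host (assumed).
    [$keyMask, $hostMask] = array_pad(explode('@', $mask, 2), 2, null);
    if ($hostMask !== null && $hostMask !== $host) {
        return false;
    }

    // Build an anchored regex in which "*" stands for any run of characters.
    $regex = '/^' . str_replace('\*', '.*', preg_quote($keyMask, '/')) . '$/';

    return preg_match($regex, $key) === 1;
}

var_dump(maskMatches('utm_*', 'utm_source', 'example.com')); // bool(true)
var_dump(maskMatches('q@test.net', 'q', 'test.net'));        // bool(true)
var_dump(maskMatches('q@test.net', 'q', 'example.com'));     // bool(false)
```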
Rewrite urls
Extend ShortAmazonProductUrl and overwrite the clean() method, using the class as an example.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\ShortAmazonProductUrl;
// UrlData must be imported from the package as well

class MyCleaner extends ShortAmazonProductUrl
{
    public function clean(UrlData &$urlData): void
    {
        // check if the hostname is right
        if (preg_match(/* regular expression */, $urlData->host)) {
            // check for the path to be replaced
            if (preg_match(/* regular expression */, $urlData->path, $matches)) {
                // your code
                $urlData->path = /* new path */;
                $urlData->fragment = '';   // clean if applicable
                $urlData->query = '';      // clean if applicable
                $urlData->parameter = [];  // clean if applicable
            }
        }
    }
}
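To make the two placeholder expressions in the template more concrete, here is a standalone sketch of the host check and the path rewrite on plain strings. Both regular expressions and the `shortenProductPath()` function are assumptions chosen for illustration, based on the commonly seen "/dp/" plus ten-character product id pattern. They are not taken from ShortAmazonProductUrl.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: returns a shortened path for product URLs, or the
// original path when host or path do not match the assumed patterns.
function shortenProductPath(string $host, string $path): string
{
    // Host check: amazon.<tld> with optional subdomains (assumed pattern).
    if (preg_match('/(^|\.)amazon\.[a-z.]+$/i', $host) !== 1) {
        return $path;
    }

    // Path check: capture the ten-character id that follows "/dp/".
    if (preg_match('#/dp/([A-Z0-9]{10})(/|$)#', $path, $matches) === 1) {
        return '/dp/' . $matches[1];
    }

    return $path;
}

echo shortenProductPath('www.amazon.de', '/Some-Product-Name/dp/B08N5WRWNW/ref=sr_1_1'), PHP_EOL;
// /dp/B08N5WRWNW
```

Inside a real cleaner, the returned value would be assigned to `$urlData->path`, with fragment, query and parameters cleared where that makes sense, as in the template above.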
Data sources
Currently, the following sources are used:
- https://docs.flyingpress.com/en/article/ignore-query-parameters-yfejfj/
- https://support.cloudways.com/en/articles/8437462-how-to-enable-ignore-query-string-for-varnish-cache
- https://github.com/mpchadwick/tracking-query-params-registry
- https://github.com/spekulatius/url-parameter-tracker-list
- https://github.com/Smile4ever/Neat-URL
- https://github.com/henkisdabro/platform-url-click-id-parameters
- https://data.iana.org/TLD/tlds-alpha-by-domain.txt