schenke-io/laravel-url-cleaner

Checks URLs and cleans them of SEO or tracking data

v1.0.0 2024-11-19 16:20 UTC

This package is auto-updated.

Last update: 2024-11-19 16:32:14 UTC


README


The Laravel URL Cleaner package sanitizes URLs by removing unnecessary SEO parameters, tracking information, and other clutter, ensuring clean and efficient URL handling in your Laravel applications.

To install just run:

  composer require schenke-io/laravel-url-cleaner

Here is a code example:

<?php

use SchenkeIo\LaravelUrlCleaner\UrlCleaner;


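// $longUrl is any URL string, for example one carrying tracking parameters
$longUrl = 'https://www.example.com/page?utm_source=newsletter&id=5';

$shortUrl = (new UrlCleaner)->handle($longUrl);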

Operation principle

The core UrlCleaner class applies a series of specialized cleaner classes to a given URL, one after another. Each cleaner class checks or modifies one aspect of the URL. Cleaning serves the following purposes:

  • Reducing URL clutter: Removes unnecessary SEO parameters and tracking information.
  • Improving data storage efficiency: Stores cleaner, more concise URLs.
  • Enhancing performance: Optimizes URL processing and caching.
  • Securing sensitive information: Prevents exposure of tracking parameters.
  • Enhancing data analysis: Simplifies data analysis by removing noise from URLs.

These cleaner classes are highly extensible, allowing for customization and the creation of new modification types.
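The principle can be sketched in plain PHP as follows. This is only an illustration under assumptions: the interface SketchCleaner, the two cleaner classes, and the function sketchCleanUrl() are hypothetical names, and the package's real classes and signatures may differ.

```php
<?php

// Hypothetical stand-ins for illustration; the package's real classes
// and method signatures may differ.
interface SketchCleaner
{
    /** Receives the URL parts from parse_url() and returns the modified parts. */
    public function clean(array $parts): array;
}

// Drops every query parameter whose key starts with "utm_".
final class DropUtmParameters implements SketchCleaner
{
    public function clean(array $parts): array
    {
        parse_str($parts['query'] ?? '', $parameters);
        $kept = array_filter(
            $parameters,
            fn ($key) => !str_starts_with((string) $key, 'utm_'),
            ARRAY_FILTER_USE_KEY
        );
        $parts['query'] = http_build_query($kept);

        return $parts;
    }
}

// Removes the fragment ("#...") from the URL.
final class DropFragment implements SketchCleaner
{
    public function clean(array $parts): array
    {
        unset($parts['fragment']);

        return $parts;
    }
}

/** Applies each cleaner in order and rebuilds the URL from its parts. */
function sketchCleanUrl(string $url, array $cleaners): string
{
    // Assumes a well-formed absolute URL with scheme and host.
    $parts = parse_url($url);
    foreach ($cleaners as $cleaner) {
        $parts = $cleaner->clean($parts);
    }

    $result = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');
    if (($parts['query'] ?? '') !== '') {
        $result .= '?' . $parts['query'];
    }
    if (($parts['fragment'] ?? '') !== '') {
        $result .= '#' . $parts['fragment'];
    }

    return $result;
}

echo sketchCleanUrl(
    'https://example.com/shop?utm_source=mail&id=7#top',
    [new DropUtmParameters(), new DropFragment()]
), PHP_EOL;
// prints "https://example.com/shop?id=7"
```

Because every cleaner only sees the result of the previous one, the order of the list matters, and a new modification type is just one more class in that list.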

Config

A default configuration file can be installed and later modified. Install it with:

php artisan url-cleaner:install

A typical result could be:

[
    'cleaners' => [
        MarketingBroad::class,
        RemoveLongValues::class,
        PreventInvalidHost::class
    ],
    'max_length_value' => 40,
    'masks' => ['dd3','vv67'],
    'protected_keys' => ['search']   
]
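How max_length_value and protected_keys might interact can be sketched as below. The semantics are an assumption made for illustration (values longer than the limit are dropped unless their key is protected), and sketchRemoveLongValues() is a hypothetical helper, not part of the package.

```php
<?php

// Assumed semantics, for illustration only: a query value longer than
// 'max_length_value' is dropped unless its key is in 'protected_keys'.
$config = [
    'max_length_value' => 40,
    'protected_keys' => ['search'],
];

/** Hypothetical helper; not part of the package. */
function sketchRemoveLongValues(string $query, array $config): string
{
    parse_str($query, $parameters);
    $kept = [];
    foreach ($parameters as $key => $value) {
        $isProtected = in_array((string) $key, $config['protected_keys'], true);
        // Non-string values (nested arrays) are kept untouched in this sketch.
        $isShort = !is_string($value) || strlen($value) <= $config['max_length_value'];
        if ($isProtected || $isShort) {
            $kept[$key] = $value;
        }
    }

    return http_build_query($kept);
}

$query = 'id=7&token=' . str_repeat('a', 64) . '&search=' . str_repeat('b', 64);
echo sketchRemoveLongValues($query, $config), PHP_EOL;
// The 64-character "token" is removed; the equally long "search" survives.
```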

List of cleaner classes

The use of masks

The core process of URL parameter removal utilizes specific masks.

Some examples are outlined in the table below.

Build your own cleaner by extending special classes

To extend the list of cleaners, build your own cleaner classes and register them in the config file config/url-cleaner.php.

The following cleaners are prepared to be extended for custom applications:

Prevent domain names

Extend PreventLocalhost and override the $hostRegExes array with regular expressions that match unwanted hostnames.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventLocalhost;

class MyCleaner extends PreventLocalhost {

    protected array $hostRegExes = [
        '/test\.com/',
        '/test\.net/',
    ];
    
}
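To see what such patterns match, the check can be reproduced in plain PHP. This is a minimal sketch under assumptions: sketchIsBlockedHost() is a hypothetical helper, and how the package reacts to a matching host is not covered here.

```php
<?php

/** Hypothetical helper: true when the URL's host matches any pattern. */
function sketchIsBlockedHost(string $url, array $hostRegExes): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // no usable host: treated as blocked in this sketch
    }
    foreach ($hostRegExes as $pattern) {
        if (preg_match($pattern, $host) === 1) {
            return true;
        }
    }

    return false;
}

// An unanchored pattern also matches hosts that merely contain the text.
var_dump(sketchIsBlockedHost('https://mytest.com/', ['/test\.com/']));          // bool(true)

// Anchoring limits the match to the domain itself and its subdomains.
var_dump(sketchIsBlockedHost('https://mytest.com/', ['/(^|\.)test\.com$/i']));   // bool(false)
var_dump(sketchIsBlockedHost('https://www.test.com/', ['/(^|\.)test\.com$/i'])); // bool(true)
```

If only one domain and its subdomains should be blocked, an anchored pattern such as the second one is the safer choice.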

Prevent schemes

Extend PreventNonHttps and override the $allowedSchemes array with the schemes you allow to pass.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventNonHttps;

class MyCleaner extends PreventNonHttps {

    protected array $allowedSchemes = [
        'https',
        'http',
        'sftp',
    ];
    
}
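The effect of such an allow-list can be sketched with PHP's parse_url(). The helper sketchHasAllowedScheme() is hypothetical and only illustrates the comparison; the package's own handling of a rejected scheme may differ.

```php
<?php

/** Hypothetical helper: true when the URL's scheme is in the allow-list. */
function sketchHasAllowedScheme(string $url, array $allowedSchemes): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Schemes are case-insensitive, so compare in lower case.
    return is_string($scheme)
        && in_array(strtolower($scheme), $allowedSchemes, true);
}

$allowedSchemes = ['https', 'http', 'sftp'];

var_dump(sketchHasAllowedScheme('sftp://files.example.com/a', $allowedSchemes)); // bool(true)
var_dump(sketchHasAllowedScheme('ftp://example.com/file', $allowedSchemes));     // bool(false)
```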

Use your own masks

Extend RemoveSearch and override the $masks array with the masks you want to exclude.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\RemoveSearch;

class MyCleaner extends RemoveSearch {

    protected array $masks = [
        'utm_*',
        'test*',
        'q@test.net'
    ];
    
}
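The three masks above suggest two forms: a key with an optional * wildcard, and key@domain for a key that only applies to one host. The following sketch matches masks under exactly that assumption. The function sketchMaskMatches() is hypothetical, and the package's real matching rules (for example regarding subdomains) may differ.

```php
<?php

/**
 * Hypothetical matcher, assuming "*" is a wildcard in the key part and
 * "key@domain" restricts the mask to one exact host.
 */
function sketchMaskMatches(string $mask, string $key, string $host): bool
{
    [$keyMask, $hostMask] = array_pad(explode('@', $mask, 2), 2, null);

    // A host part, when present, must equal the URL's host.
    if ($hostMask !== null && strcasecmp($hostMask, $host) !== 0) {
        return false;
    }

    // Turn the key mask into an anchored regular expression.
    $pattern = '/^' . str_replace('\*', '.*', preg_quote($keyMask, '/')) . '$/';

    return preg_match($pattern, $key) === 1;
}

var_dump(sketchMaskMatches('utm_*', 'utm_source', 'example.com')); // bool(true)
var_dump(sketchMaskMatches('q@test.net', 'q', 'test.net'));        // bool(true)
var_dump(sketchMaskMatches('q@test.net', 'q', 'example.com'));     // bool(false)
```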

Rewrite urls

Extend ShortAmazonProductUrl and override the clean() method, using the class itself as an example.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\ShortAmazonProductUrl;
// also import the package's UrlData class (its namespace is not shown here)

class MyCleaner extends ShortAmazonProductUrl {

    public function clean(UrlData &$urlData): void
    {
        // check if the hostname is right
        if (preg_match(/* regular expression   */, $urlData->host)) {
            // check for the path to be replaced
            if (preg_match(/* regular expression */, $urlData->path, $matches)) {
                
                // your code 

                $urlData->path = /* new path */;
                $urlData->fragment = '';  // clean if applicable
                $urlData->query = '';     // clean if applicable
                $urlData->parameter = []; // clean if applicable
            }
        }
    } 
}
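A filled-in version of this template could look like the following standalone sketch. Everything in it is an assumption for illustration: SketchUrlData is a reduced stand-in for the package's UrlData, and the host shop.example.com and the /item/<number> path layout are invented.

```php
<?php

// Stand-in for the package's UrlData, reduced to the properties used above.
final class SketchUrlData
{
    public function __construct(
        public string $host,
        public string $path,
        public string $query = '',
        public string $fragment = '',
        public array $parameter = [],
    ) {
    }
}

// Hypothetical cleaner: shortens ".../item/<number>/..." paths of one shop.
final class SketchShortItemUrl
{
    public function clean(SketchUrlData &$urlData): void
    {
        // check if the hostname is right (the domain or one of its subdomains)
        if (preg_match('/(^|\.)shop\.example\.com$/', $urlData->host) !== 1) {
            return;
        }
        // check for the path to be replaced
        if (preg_match('@/item/(\d+)@', $urlData->path, $matches) === 1) {
            $urlData->path = '/item/' . $matches[1];
            $urlData->fragment = '';
            $urlData->query = '';
            $urlData->parameter = [];
        }
    }
}

$urlData = new SketchUrlData(
    host: 'www.shop.example.com',
    path: '/blue-widget-deluxe/item/12345/ref-homepage',
    query: 'tracking=abc',
    parameter: ['tracking' => 'abc'],
);
(new SketchShortItemUrl())->clean($urlData);

echo $urlData->path, PHP_EOL; // prints "/item/12345"
```

Returning early on a non-matching host keeps URLs of other sites untouched, which matters because every cleaner in the list sees every URL.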

Data sources

Currently, the following sources are used: