edulazaro/urlnormalizer

There is no license information available for the latest version (v1.0.2) of this package.

A package to normalize and filter URLs.

v1.0.2 2024-04-15 22:57 UTC

This package is auto-updated.

Last update: 2024-05-15 23:14:19 UTC


README


Introduction

URL Normalizer normalizes URLs. A normalized URL is one that has been rewritten into a standard format according to a set of rules. The purpose of URL normalization (or URL canonicalization) is to transform a URL into a canonical form, so that URLs that are essentially identical but written differently can be recognized as equal by web servers and applications.

Normalization helps reduce duplication when multiple URLs point to the same content but are written differently. For example, http://edulazaro.com?a=1&b=2 and http://edulazaro.com?b=2&a=1 differ only in the order of their query parameters, so both can be normalized to http://edulazaro.com?a=1&b=2.
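As a rough illustration of the query-ordering rule, the plain-PHP sketch below sorts the raw name=value pairs of a query string. The function sortQueryParameters is hypothetical and is not part of this package's API; it assumes pairs are separated by & and does not decode them, and the package may order or encode parameters differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of edulazaro/urlnormalizer): returns the URL
 * with its raw "name=value" query pairs sorted as strings, so that
 * "?b=2&a=1" and "?a=1&b=2" produce the same result.
 */
function sortQueryParameters(string $url): string
{
    // Set the fragment ("#...") aside; it follows the query string.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $queryPos = strpos($url, '?');
    if ($queryPos === false) {
        return $url . $fragment; // no query string, nothing to sort
    }

    $base = substr($url, 0, $queryPos);
    // Split on "&" without decoding, so each pair keeps its original encoding.
    $pairs = explode('&', substr($url, $queryPos + 1));
    sort($pairs, SORT_STRING);

    return $base . '?' . implode('&', $pairs) . $fragment;
}

echo sortQueryParameters('http://edulazaro.com?b=2&a=1'), PHP_EOL;
// prints http://edulazaro.com?a=1&b=2
```

Sorting the undecoded pairs avoids parse_str, which would rewrite some parameter names (for example, dots become underscores).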

The same applies to http://edulazaro.com/ and http://edulazaro.com, where the trailing slash makes no difference.
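A minimal sketch of the trailing-slash rule, assuming the RFC 3986 (section 6.2.3) convention that an empty path after the host is normalized to /. The helper addRootSlash is hypothetical; the package itself may strip the slash instead. The sketch only handles the empty-path case, because paths such as /docs and /docs/ can refer to different resources.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: when "scheme://authority" is followed directly by a
 * query, a fragment or the end of the string (i.e. the path is empty),
 * insert "/" as the path. URLs with a non-empty path are returned unchanged.
 */
function addRootSlash(string $url): string
{
    $result = preg_replace(
        '~^([a-z][a-z0-9+.\-]*://[^/?#]*)(?=[?#]|$)~i',
        '$1/',
        $url,
        1
    );
    return $result ?? $url; // preg_replace returns null only on error
}

echo addRootSlash('http://edulazaro.com'), PHP_EOL;      // prints http://edulazaro.com/
echo addRootSlash('http://edulazaro.com/docs'), PHP_EOL; // prints http://edulazaro.com/docs
```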

It also applies to dot segments such as /../ and /./, and to percent-encoded unreserved characters such as %61, which can be written simply as a.
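Both rules are defined in RFC 3986: the remove_dot_segments algorithm (section 5.2.4) and the decoding of percent-encoded unreserved characters (letters, digits, -, ., _, ~). The sketch below implements them for a path string. Both functions are hypothetical helpers written for illustration, not this package's implementation, and they assume PHP 8 (for str_starts_with).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: RFC 3986 section 5.2.4 "remove_dot_segments",
 * applied to the path component of a URL only.
 */
function removeDotSegments(string $path): string
{
    $input = $path;
    $output = '';

    // Drops the last segment (and the "/" before it) from the output buffer.
    $popSegment = static function (string $out): string {
        $pos = strrpos($out, '/');
        return $pos === false ? '' : substr($out, 0, $pos);
    };

    while ($input !== '') {
        if (str_starts_with($input, '../')) {
            $input = substr($input, 3);
        } elseif (str_starts_with($input, './')) {
            $input = substr($input, 2);
        } elseif (str_starts_with($input, '/./')) {
            $input = '/' . substr($input, 3);
        } elseif ($input === '/.') {
            $input = '/';
        } elseif (str_starts_with($input, '/../')) {
            $input = '/' . substr($input, 4);
            $output = $popSegment($output);
        } elseif ($input === '/..') {
            $input = '/';
            $output = $popSegment($output);
        } elseif ($input === '.' || $input === '..') {
            $input = '';
        } else {
            // Move the first segment (with its leading "/", if any) to the output.
            $next = strpos($input, '/', $input[0] === '/' ? 1 : 0);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }

    return $output;
}

/**
 * Hypothetical helper: decodes %XX escapes only when they encode an
 * unreserved character; all other escapes are left exactly as written.
 */
function decodeUnreserved(string $s): string
{
    $result = preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        static function (array $m): string {
            $char = chr((int) hexdec($m[1]));
            return preg_match('/^[A-Za-z0-9._~-]$/', $char) === 1 ? $char : $m[0];
        },
        $s
    );
    return $result ?? $s;
}

echo removeDotSegments('/a/b/c/./../../g'), PHP_EOL; // prints /a/g
echo decodeUnreserved('/%61bc'), PHP_EOL;            // prints /abc
```

Reserved characters such as %2F (an encoded /) are deliberately left encoded, since decoding them could change how the path is split into segments.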

Installation

To install the package, run:

composer require edulazaro/urlnormalizer

Usage

Import the URLNormalizer class and call its normalize method:

use Edulazaro\URLNormalizer\URLNormalizer;

$normalizer = new URLNormalizer();

$normalizedURL = $normalizer->normalize('http://edulazaro.com?a=1&b=2');

You can also get the top-level domain of a URL with the getURLTopLevelDomain method:

use Edulazaro\URLNormalizer\URLNormalizer;

$normalizer = new URLNormalizer();

$topDomain = $normalizer->getURLTopLevelDomain('http://test.edulazaro.com?a=1&b=2');
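The README does not say whether getURLTopLevelDomain returns the strict top-level domain (com) or the registrable domain (edulazaro.com). As a rough sketch of the second interpretation only, the hypothetical helper below keeps the last two labels of the host. It is deliberately naive: it gives the wrong answer for multi-label public suffixes such as co.uk, where a robust implementation would consult the Public Suffix List.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, deliberately naive helper: returns the last two labels of
 * the URL's host (e.g. "test.edulazaro.com" -> "edulazaro.com"), or null
 * when the URL has no host. Wrong for suffixes such as "co.uk".
 */
function naiveTopDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Lowercase the host and ignore a trailing root dot ("example.com.").
    $labels = explode('.', strtolower(rtrim($host, '.')));
    return implode('.', array_slice($labels, -2));
}

echo naiveTopDomain('http://test.edulazaro.com?a=1&b=2'), PHP_EOL; // prints edulazaro.com
```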

The class URLCounter is also included so you can count the number of unique normalized URLs in an array:

use Edulazaro\URLNormalizer\URLCounter;

$counter = new URLCounter();

$normalizedURLs = $counter->countUniqueUrls([
    'http://edulazaro.com?a=1&b=2',
    'http://edulazaro.com?a=2&b=1'
]);
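Conceptually, counting unique normalized URLs means normalizing each URL and counting the distinct results. The sketch below does this with a PHP array used as a set. countUniqueBy and the stand-in normalizer $sortQuery are hypothetical; in practice the normalization step would be the package's own.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: counts how many distinct values $normalize produces
 * for the given URLs, using the normalized strings as array keys (a set).
 */
function countUniqueBy(array $urls, callable $normalize): int
{
    $seen = [];
    foreach ($urls as $url) {
        $seen[$normalize($url)] = true;
    }
    return count($seen);
}

// Hypothetical stand-in normalizer: sorts the "&"-separated query pairs.
// It ignores fragments and every other normalization rule.
$sortQuery = static function (string $url): string {
    [$base, $query] = array_pad(explode('?', $url, 2), 2, null);
    if ($query === null) {
        return $url;
    }
    $pairs = explode('&', $query);
    sort($pairs, SORT_STRING);
    return $base . '?' . implode('&', $pairs);
};

echo countUniqueBy([
    'http://edulazaro.com?a=1&b=2',
    'http://edulazaro.com?b=2&a=1', // same pairs, different order
    'http://edulazaro.com?a=2&b=1', // different values
], $sortQuery), PHP_EOL; // prints 2
```

Note that a=1&b=2 and a=2&b=1 carry different values, so reordering parameters alone does not make them duplicates.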

You can also count them per top-level domain:

use Edulazaro\URLNormalizer\URLCounter;

$counter = new URLCounter();

$normalizedURLsPerDomain = $counter->countUniqueUrlsPerTopLevelDomain([
    'http://test.edulazaro.com?a=1&b=2',
    'http://edulazaro.com?a=2&b=1',
    'http://neoguias.com'
]);
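Counting per domain adds a grouping step: each URL is assigned to a domain, and unique normalized URLs are counted within each group. The hypothetical function countUniquePerGroup below takes both the grouping and the normalization as callables. The naive last-two-labels grouping and the identity normalizer are illustrative assumptions, not the package's behavior, and the package's return format may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: groups URLs with $groupOf, then counts the distinct
 * normalized URLs inside each group. Returns [group => count].
 */
function countUniquePerGroup(array $urls, callable $groupOf, callable $normalize): array
{
    $groups = [];
    foreach ($urls as $url) {
        // Nested array keys act as a set of normalized URLs per group.
        $groups[$groupOf($url)][$normalize($url)] = true;
    }
    return array_map('count', $groups);
}

// Naive grouping: last two labels of the host (wrong for suffixes like co.uk).
$lastTwoLabels = static function (string $url): string {
    $host = (string) parse_url($url, PHP_URL_HOST);
    return implode('.', array_slice(explode('.', strtolower($host)), -2));
};

// Identity "normalizer", standing in for a real one.
$identity = static fn (string $url): string => $url;

print_r(countUniquePerGroup([
    'http://test.edulazaro.com?a=1&b=2',
    'http://edulazaro.com?a=2&b=1',
    'http://neoguias.com',
], $lastTwoLabels, $identity));
```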

Testing

To run the package's test suite, execute composer test.