ispserverfarm/domain-extraction-tool

A PHP library that accurately extracts the effective top-level domain, registered domain and subdomains from a URL or domain name.

v1.0.1 2018-12-12 11:56 UTC


Domain Extraction Tool (DET) is a PHP library that accurately extracts the effective top-level domain, registered domain and subdomains from a URL or domain name. For example, you can use it to get the domain name “google” from “http://www.google.com”, or the TLD “co.uk” from “http://www.bbc.co.uk/”.

Example:

<?php

require_once(__DIR__ . "/vendor/autoload.php");

use \ISPServerfarm\Library\TLDExtract\TLDExtract;

// Invocation
$extract = new TLDExtract(true, __DIR__ . "/.cache");
$components = $extract($_GET['domain']);

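// Output (the example values assume the input "www.bbc.co.uk").
// The input comes from the query string, so escape it before printing.
echo "<pre>";
echo "SUBDOMAIN: " . htmlspecialchars($components->subdomain) . PHP_EOL; // www
echo "DOMAIN: " . htmlspecialchars($components->domain) . PHP_EOL;       // bbc
echo "TLD: " . htmlspecialchars($components->tld) . PHP_EOL;             // co.uk
echo "</pre>";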

?>

Introduction

Most people try to do this by splitting the domain name on ‘.’ and assuming that the last component is the TLD, the next-to-last one is the domain, and so on. This works in theory – by definition, the last label in a domain name is the top-level domain. However, in practice you usually want what is known as the “public suffix” or “effective TLD” – the part of the domain name under which Internet users can directly register names.

For example, consider a URL like “http://www.darwinhigh.nt.edu.au/”. In this case, “www” is the sub-domain, “darwinhigh” is the domain registered by the user (Darwin High School), and “nt.edu.au” is the domain suffix controlled by the registrar. Splitting on the dot character will give you “au” as the TLD and “edu” as the domain name, which is usually not what you want.
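To make the problem concrete, here is a minimal sketch of the naive approach described above. The function name naiveSplit() is hypothetical and not part of this library; it only illustrates why splitting on dots gives the wrong answer when the suffix has more than one label.

```php
<?php

/**
 * Naive extraction: treat the last label as the TLD and the
 * next-to-last label as the domain. Hypothetical helper for
 * illustration only; not part of this library.
 */
function naiveSplit($url)
{
    // parse_url() yields no host when the input has no scheme,
    // so fall back to treating the input as a bare host name.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        $host = $url;
    }

    $labels = explode('.', $host);
    $tld = array_pop($labels);
    $domain = array_pop($labels);

    return array(
        'subdomain' => implode('.', $labels),
        'domain'    => (string) $domain,
        'tld'       => $tld,
    );
}

$parts = naiveSplit('http://www.darwinhigh.nt.edu.au/');
echo $parts['tld'] . PHP_EOL;       // au
echo $parts['domain'] . PHP_EOL;    // edu
echo $parts['subdomain'] . PHP_EOL; // www.darwinhigh.nt
```

The registered name “darwinhigh” ends up buried in the subdomain, which is exactly the failure described above.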

This library, on the other hand, uses the Public Suffix List maintained by Mozilla to see which gTLDs, ccTLDs and domain suffixes are actually in use, and to find out about any TLD- or country-specific exceptions, so it gives you the right answer.
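The idea behind suffix-list matching can be sketched roughly as follows. This is not the library's actual code: splitWithSuffixes() is a hypothetical helper, the rule set is a tiny hand-picked sample, and the wildcard and exception rules of the real Public Suffix List are deliberately ignored.

```php
<?php

/**
 * Split a host name using a set of known public suffixes.
 * Hypothetical helper: the real Public Suffix List also contains
 * wildcard and exception rules, which this sketch does not handle.
 */
function splitWithSuffixes($host, array $suffixes)
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Try the longest candidate suffix first, dropping one leading label each time.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            $rest = array_slice($labels, 0, $i);
            $domain = (string) array_pop($rest);
            return array(
                'subdomain' => implode('.', $rest),
                'domain'    => $domain,
                'tld'       => $candidate,
            );
        }
    }

    // No known suffix: fall back to treating the last label as the TLD.
    $tld = array_pop($labels);
    $domain = (string) array_pop($labels);
    return array(
        'subdomain' => implode('.', $labels),
        'domain'    => $domain,
        'tld'       => $tld,
    );
}

// Tiny sample rule set for the demo (assumption); the real list has thousands of entries.
$suffixes = array_flip(array('com', 'uk', 'co.uk', 'au', 'edu.au', 'nt.edu.au'));

$parts = splitWithSuffixes('www.darwinhigh.nt.edu.au', $suffixes);
echo $parts['subdomain'] . PHP_EOL; // www
echo $parts['domain'] . PHP_EOL;    // darwinhigh
echo $parts['tld'] . PHP_EOL;       // nt.edu.au
```

Checking the longest candidate first matters: both “au” and “edu.au” are in the sample set, but only “nt.edu.au” leaves “darwinhigh” as the registered domain.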

This library is a PHP port of the tldextract Python module.

Installation

Install the library with Composer, then load vendor/autoload.php at the top of your script (as in the example above):

foo@bar:~$ composer require ispserverfarm/domain-extraction-tool

Requirements

This library requires PHP 5.3 or later. It will not run on PHP 5.2, because it relies on namespaces and invokable objects (__invoke), both of which were introduced in PHP 5.3.

Usage

$extract = new \ISPServerfarm\Library\TLDExtract\TLDExtract();
$components = $extract('www.bbc.co.uk');

echo $components->subdomain; // www
echo $components->domain;    // bbc
echo $components->tld;       // co.uk

You can also access the domain components using array syntax:


$extract = new \ISPServerfarm\Library\TLDExtract\TLDExtract();
$components = $extract('domain.org.kg');

echo $components['tld']; // org.kg

Note that the value returned by the extractor is an object, not a native PHP array, so most array manipulation functions (e.g. implode()) will not work on it. Use the toArray() method to get the components as an array:

$extract = new \ISPServerfarm\Library\TLDExtract\TLDExtract();
$components = $extract('www.bbc.co.uk');

print_r($components->toArray());

// Array ( [subdomain] => www [domain] => bbc [tld] => co.uk )

Caching And Advanced Usage

This library will automatically attempt to download the latest TLD list from the Public Suffix List when you first run it. It will then cache that list in /path/to/tldextractphp/.tld_set. The cache stays valid indefinitely, so it won’t download the list again unless you manually delete .tld_set.

To prevent this download or choose a different location for the cache file, you will need to create your own TLDExtract instance. The class constructor takes two optional arguments:

$fetch – set to true to enable TLD list download, or false to disable. If disabled, the library will fall back to using the included snapshot (.tld_set_snapshot).

$cacheFile – set an alternative file name for the TLD list cache.

Example:

// Disable live TLD rule set updates. The library will fall back to
// using the included snapshot.

$extract = new \ISPServerfarm\Library\TLDExtract\TLDExtract(false);
$components = $extract('http://example.com');

// Store the TLD cache elsewhere.

$extract = new \ISPServerfarm\Library\TLDExtract\TLDExtract(true, '/path/to/alternative/cache_file');
$components = $extract('http://example.com');
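The caching behaviour described in this section can be sketched roughly as below. This is an illustration under stated assumptions, not the library's implementation: loadSuffixList() and its parameters are hypothetical, and the sketch stores the raw list text, whereas the library's own cache format may differ. The URL is the public download location of the Public Suffix List.

```php
<?php

/**
 * Return the raw suffix list text, preferring the cache file, then a
 * download, then the bundled snapshot. Hypothetical helper that only
 * mirrors the behaviour described above; it is not the library's code.
 */
function loadSuffixList($fetch, $cacheFile, $snapshotFile, $url)
{
    // 1. A cache file, once written, is reused indefinitely.
    if (is_readable($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    // 2. Download the list and cache it, if downloads are enabled.
    if ($fetch) {
        $data = @file_get_contents($url);
        if ($data !== false && $data !== '') {
            // A failed cache write is ignored; the data is still usable.
            @file_put_contents($cacheFile, $data);
            return $data;
        }
    }

    // 3. Fall back to the snapshot.
    return file_get_contents($snapshotFile);
}

// Demo with temporary files and downloads disabled, so no network access is needed.
$snapshot = tempnam(sys_get_temp_dir(), 'snap');
file_put_contents($snapshot, "com\nco.uk\n");
$cache = $snapshot . '.cache'; // does not exist yet

$url = 'https://publicsuffix.org/list/public_suffix_list.dat';
$list = loadSuffixList(false, $cache, $snapshot, $url);
echo $list; // contents of the snapshot file
```

Because step 1 never checks the age of the cache file, deleting that file is the only way to force a fresh download, which matches the behaviour described above.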

Running the tests

This library includes a set of PHPUnit tests. To run the tests, open your favourite command-line terminal, navigate to the tldextractphp directory and enter:

foo@bar:~$ phpunit ./tests

Note that the full test suite can take a while to execute. That’s because, in addition to the normal unit tests, it also attempts to download the TLD list from the Public Suffix List and verify that the local snapshot is up to date. To skip that test, run this instead:

foo@bar:~$ cd tests
foo@bar:~$ phpunit ExtractorTest

Source: https://w-shadow.com/blog/2012/08/28/tldextract/