crowdtwist/tld-extract

TLDExtract, library for parsing and extracting the parts of a domain name

2.0.0 2016-12-14 22:13 UTC

README

TLDExtract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and subdomains of a URL; in other words, it is a domain parser. For example, say you want just the 'google' part of 'http://www.google.com'.

Everybody gets this wrong. Splitting on the '.' and taking the last 2 elements only works for simple suffixes such as .com. Consider parsing http://forums.bbc.co.uk: the naive splitting method gives you 'co' as the domain and 'uk' as the TLD, instead of 'bbc' and 'co.uk' respectively.
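To make the failure concrete, here is a minimal sketch of the naive approach described above. The function name naive_split is hypothetical and is not part of TLDExtract; it only uses PHP's built-in parse_url, explode and array_pop.

```php
<?php
// Naive approach: split the host on '.' and treat the last two labels
// as the domain and the TLD. Illustrative only; not part of TLDExtract.
function naive_split($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    $labels = explode('.', $host);

    $tld = array_pop($labels);    // last label
    $domain = array_pop($labels); // second-to-last label

    return ['domain' => $domain, 'tld' => $tld];
}

// Works for a simple .com host: domain 'google', tld 'com'.
print_r(naive_split('http://www.google.com'));

// Breaks on a multi-label suffix: domain 'co', tld 'uk'
// (the expected answer is 'bbc' and 'co.uk').
print_r(naive_split('http://forums.bbc.co.uk'));
```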

TLDExtract, on the other hand, knows what all gTLDs and ccTLDs look like by looking up the currently active ones in the Public Suffix List. So, given a URL, it can tell the subdomain from the domain, and the domain from the suffix.

$result = tld_extract('http://forums.news.cnn.com/');
var_dump($result);
    
object(CrowdTwist\TLDExtract\Result)#34 (3) {
  ["subdomain":"CrowdTwist\TLDExtract\Result":private]=>
  string(11) "forums.news"
  ["hostname":"CrowdTwist\TLDExtract\Result":private]=>
  string(3) "cnn"
  ["suffix":"CrowdTwist\TLDExtract\Result":private]=>
  string(3) "com"
}

Result implements the ArrayAccess interface, so you can read its fields with array syntax.

var_dump($result['subdomain']);
string(11) "forums.news"
var_dump($result['hostname']);
string(3) "cnn"
var_dump($result['suffix']);
string(3) "com"

You can also convert the result to JSON.

var_dump($result->toJson());
string(59) "{"subdomain":"forums.news","hostname":"cnn","suffix":"com"}"

Does TLDExtract make requests to the Public Suffix List website?

No. TLDExtract uses the database from TLDDatabase, which is generated from the Public Suffix List and updated regularly. It does not make any HTTP requests to parse or validate a domain.
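As a rough illustration of why no network access is needed, the sketch below matches a host against a tiny in-memory suffix table. This is not the library's implementation: the function split_host and the three-entry $suffixes array are hypothetical, and the real Public Suffix List is much larger and also contains wildcard and exception rules that this sketch ignores.

```php
<?php
// Hypothetical, heavily trimmed stand-in for a local suffix database.
$suffixes = ['com' => true, 'uk' => true, 'co.uk' => true];

function split_host($host, array $suffixes)
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Try the longest candidate suffix first, then shorter ones.
    // Starting at 1 always leaves at least one label for the hostname.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));

        if (isset($suffixes[$candidate])) {
            $subdomain = implode('.', array_slice($labels, 0, $i - 1));

            return [
                'subdomain' => $subdomain === '' ? null : $subdomain,
                'hostname' => $labels[$i - 1],
                'suffix' => $candidate,
            ];
        }
    }

    return null; // no known suffix found
}

print_r(split_host('forums.bbc.co.uk', $suffixes));
print_r(split_host('forums.news.cnn.com', $suffixes));
```

Checking the longest candidate first is what lets 'co.uk' win over 'uk' for forums.bbc.co.uk.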

Requirements

The following versions of PHP are supported.

  • PHP 5.5
  • PHP 5.6
  • PHP 7.0
  • PHP 7.1
  • HHVM

Install

Via Composer

$ composer require crowdtwist/tld-extract

Additional result methods

The CrowdTwist\TLDExtract\Result class provides several useful methods:

use CrowdTwist\TLDExtract\Extract as DomainExtractor;
$extract = new DomainExtractor;

# For domain 'shop.github.com'

$result = $extract->parse('shop.github.com');
$result->getFullHost(); // will return (string) 'shop.github.com'
$result->getRegistrableDomain(); // will return (string) 'github.com'
$result->getPublicDomain(); // will return (string) 'github.com'
$result->isValidDomain(); // will return (bool) true
$result->isIp(); // will return (bool) false

# For IP '192.168.0.1'

$result = $extract->parse('192.168.0.1');
$result->getFullHost(); // will return (string) '192.168.0.1'
$result->getRegistrableDomain(); // will return null
$result->getPublicDomain(); // will return null
$result->isValidDomain(); // will return (bool) false
$result->isIp(); // will return (bool) true

Custom database

By default, the package uses the database from the TLDDatabase package, but you can override this behaviour by passing the path to your own database file:

new CrowdTwist\TLDExtract\Extract(__DIR__ . '/cache/mydatabase.php');

For more details and information on how to keep the database updated, see TLDDatabase.

Implementing your own result

By default, parsing returns a CrowdTwist\TLDExtract\Result object, but sometimes you need your own methods or additional functionality.

You can create your own class that implements CrowdTwist\TLDExtract\ResultInterface and use it as the parse result.

class CustomResult implements CrowdTwist\TLDExtract\ResultInterface
{
    // Implement the methods required by ResultInterface here.
}

new CrowdTwist\TLDExtract\Extract(null, CustomResult::class);

Parsing modes

The package has three modes of parsing:

  • allow ICANN suffixes (suffixes delegated by ICANN or part of the IANA root zone database);
  • allow private suffixes (amendments submitted to the Public Suffix List by domain holders, as an expression of how they operate their domain security policy);
  • allow custom suffixes (suffixes that are not in the list but may still be usable, for example: example, mycompany, etc).

To stay compatible with the intent of the Public Suffix List, the package runs with all three modes enabled by default, but you can easily change this behavior:

use CrowdTwist\TLDExtract\Extract;

new Extract(null, null, Extract::MODE_ALLOW_ICANN);
new Extract(null, null, Extract::MODE_ALLOW_PRIVATE);
new Extract(null, null, Extract::MODE_ALLOW_NOT_EXISTING_SUFFIXES);
new Extract(null, null, Extract::MODE_ALLOW_ICANN | Extract::MODE_ALLOW_PRIVATE);
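The last line above combines two modes with the bitwise OR operator, which suggests the mode constants are bit flags. The following sketch shows how such flags combine and how a single flag can be tested. The constant names and integer values here are assumptions chosen for illustration; the actual values defined on Extract may differ.

```php
<?php
// Hypothetical flag values; the constants on Extract may use other integers.
const ALLOW_ICANN = 1;
const ALLOW_PRIVATE = 2;
const ALLOW_NOT_EXISTING_SUFFIXES = 4;

// Hypothetical helper: true when every bit of $flag is set in $mode.
function mode_allows($mode, $flag)
{
    return ($mode & $flag) === $flag;
}

// Combine two modes, as in the last Extract example.
$mode = ALLOW_ICANN | ALLOW_PRIVATE;

var_dump(mode_allows($mode, ALLOW_ICANN));                 // bool(true)
var_dump(mode_allows($mode, ALLOW_PRIVATE));               // bool(true)
var_dump(mode_allows($mode, ALLOW_NOT_EXISTING_SUFFIXES)); // bool(false)
```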

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING and CONDUCT for details.

License

This library is released under the Apache 2.0 license. Please see License File for more information.