riimu/kit-urlparser

RFC 3986 compliant URL parsing library

Language: PHP

v1.1.0 2015-01-12 20:08 UTC

UrlParser is a PHP library that provides an RFC 3986 compliant URL parser. The purpose of this library is to parse information from URLs according to the ABNF definition given in the RFC. In other words, this library simplifies extracting the different components of a URL.

PHP already provides a built-in function, parse_url(). However, that function deviates from the RFC definition, since it is more lenient towards URLs that do not exactly fit the specification. This library provides a more accurate implementation for parsing, and even for validating, URLs.
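As a small illustration of that leniency, the sketch below passes a URL with an unencoded space in its path to the built-in parse_url(). RFC 3986 does not allow a raw space in a path, so a strictly conforming parser would be expected to reject this input, while parse_url() still splits it into components. The URL is made up for the example.

```php
<?php

// A raw space is not permitted in a URI path by RFC 3986; it would have to
// be percent-encoded as %20. This URL is a made-up example.
$url = 'http://www.example.com/some path';

// parse_url() does not validate its input, it only splits it into parts.
$parts = parse_url($url);

// Shows whichever components the built-in function managed to extract.
var_dump($parts);
```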

While this library is called URL parser, it does, in fact, conform to the generic URI syntax. Thus, it is entirely possible to parse any kind of URIs using this library. Some of the functionality is simply more useful when dealing with URLs.

The API documentation, which can be generated using Apigen, can be read online at: http://kit.riimu.net/api/urlparser/

Requirements

In order to use this library, the following requirements must be met:

  • PHP version 5.4

Installation

This library can be installed via Composer. To do this, download the composer.phar and require this library as a dependency. For example:

$ php -r "readfile('https://getcomposer.org/installer');" | php
$ php composer.phar require riimu/kit-urlparser:1.*

Alternatively, you can add the dependency to your composer.json and run composer install. For example:

{
    "require": {
        "riimu/kit-urlparser": "1.*"
    }
}

Any library that has been installed via Composer can be loaded by including the vendor/autoload.php file that was generated by Composer.

It is also possible to install this library manually. To do this, download the latest release and extract the src folder to your project folder. To load the library, include the provided src/autoload.php file.

Usage

Using this library is relatively straightforward. The class UrlParser provides two methods for parsing URLs: parseUrl() and parseRelative(). Both methods take the URL as a parameter and return an instance of UrlInfo, or null if the URL cannot be parsed.

For example:

<?php

require 'vendor/autoload.php';
$parser = new \Riimu\Kit\UrlParser\UrlParser();
$info = $parser->parseUrl('http://jane:pass123@www.example.com:8080/site/index.php?action=login&prev=index#form');

// The following outputs: http://jane:pass123@www.example.com:8080/site/index.php?action=login&prev=index#form
echo $info->getUrl() . PHP_EOL;

echo $info->getScheme() . PHP_EOL;        // outputs: http
echo $info->getUsername() . PHP_EOL;      // outputs: jane
echo $info->getPassword() . PHP_EOL;      // outputs: pass123
echo $info->getHostname() . PHP_EOL;      // outputs: www.example.com
echo $info->getIpAddress() . PHP_EOL;     // outputs the resolved IP, e.g. 93.184.216.34
echo $info->getPort() . PHP_EOL;          // outputs: 8080
echo $info->getDefaultPort() . PHP_EOL;   // outputs: 80
echo $info->getPath() . PHP_EOL;          // outputs: /site/index.php
echo $info->getFileExtension() . PHP_EOL; // outputs: php
echo $info->getQuery() . PHP_EOL;         // outputs: action=login&prev=index
echo $info->getFragment() . PHP_EOL;      // outputs: form

// The following would dump the array ['action' => 'login', 'prev' => 'index']
var_dump($info->getVariables());

The difference between parseUrl() and parseRelative() is that the former conforms to the URI definition, while the latter conforms to the relative-ref definition. In other words, URLs parsed by parseUrl() must have a scheme part, but URLs parsed by parseRelative() cannot have a scheme.
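When the input may be either an absolute URL or a relative reference, the two methods can be tried in turn. The following is only a sketch: parseAbsoluteOrRelative() is a hypothetical helper, not part of the library, and it relies solely on the null return value described above.

```php
<?php

/**
 * Hypothetical helper: parses $url as an absolute URL first and falls back
 * to treating it as a relative reference.
 *
 * $parser is expected to be a \Riimu\Kit\UrlParser\UrlParser instance.
 * Returns the UrlInfo instance, or null if neither form matches.
 */
function parseAbsoluteOrRelative($parser, $url)
{
    // parseUrl() requires a scheme, e.g. 'http://www.example.com/page'.
    $info = $parser->parseUrl($url);

    if ($info === null) {
        // parseRelative() only accepts references without a scheme,
        // e.g. '/site/index.php?action=login'.
        $info = $parser->parseRelative($url);
    }

    return $info;
}
```

With the library loaded through vendor/autoload.php, the helper could be called as parseAbsoluteOrRelative(new \Riimu\Kit\UrlParser\UrlParser(), '/site/index.php?action=login').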

Retrieving information

Both of the parsing methods return an instance of the UrlInfo class. This class provides the following methods for retrieving information about the URL:

  • getUrl() returns the entire parsed URL in its original form.

  • getScheme() returns the scheme from the URL or false if it's not present.

  • getUsername() returns the username from the URL or false if it's not present.

  • getPassword() returns the password from the URL or false if it's not present.

  • getHostname() returns the hostname from the URL (or the IP address, if URL had an IP address instead of a hostname) or false if it's not present.

  • getIpAddress($resolve = true) returns the IP address for the hostname. If the parameter is set to false, the method only returns the IP address if it was present in the URL in the first place. False is returned if the IP address cannot be determined.

  • getPort($useDefault = true) returns the port from the URL. If the parameter is set to true and no port is present in the URL, the default port for the scheme is returned instead. Otherwise, false is returned when no port is present.

  • getDefaultPort() returns the default port for the scheme or false if it is not known.

  • getPath() returns the path from the URL or an empty string if no path is present.

  • getFileExtension() returns the file extension from the path or false if there is no file extension.

  • getQuery() returns the query from the URL as a string or false if it is not present.

  • getVariables() returns the query parsed into variables or an empty array if there are no variables to parse in the query.

  • getFragment() returns the fragment from the URL or false if it is not present.
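Because most of these methods return false for a missing component, strict comparisons are the safest way to check the results. The sketch below shows one way to do that; formatHostAndPort() is a hypothetical helper that uses only getHostname() and getPort() as documented above.

```php
<?php

/**
 * Hypothetical helper: builds a "host" or "host:port" string from a parsed URL.
 *
 * $info is expected to be a UrlInfo instance returned by the parser.
 * Returns null when the URL has no host part.
 */
function formatHostAndPort($info)
{
    $host = $info->getHostname();

    // getHostname() returns false when the URL has no host part.
    if ($host === false) {
        return null;
    }

    // Passing false asks only for a port written in the URL itself,
    // so the default port for the scheme is not filled in.
    $port = $info->getPort(false);

    return $port === false ? $host : $host . ':' . $port;
}
```

For the URL used in the usage example, this would be expected to give www.example.com:8080.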

URL validation

Since both of the parsing methods return null if the URL is not valid, it is possible to use this library for validating URLs. However, it should be noted that the URI specification is very generic. For example, 'a:' is a valid URI. The parser only makes sure that the URL follows the correct format.

Thus, if you want to use this library for validating URLs, it is highly recommended to also use methods such as getScheme() and getHostname() to make sure that the URL contains something valid.
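A minimal sketch of such a check follows. The helper name isHttpUrl() and the list of accepted schemes are assumptions made for this example; the library itself only provides the parsing and getter methods.

```php
<?php

/**
 * Hypothetical helper: accepts only http and https URLs that have a host.
 *
 * $parser is expected to be a \Riimu\Kit\UrlParser\UrlParser instance.
 */
function isHttpUrl($parser, $url)
{
    $info = $parser->parseUrl($url);

    // null means the string does not follow the URI syntax at all.
    if ($info === null) {
        return false;
    }

    // Schemes are case-insensitive in RFC 3986, so normalize before comparing.
    // A missing scheme (false) becomes an empty string and fails the check.
    $scheme = strtolower((string) $info->getScheme());

    if (!in_array($scheme, array('http', 'https'), true)) {
        return false;
    }

    // Syntactically valid URIs such as 'a:' have no hostname, so reject them.
    $host = $info->getHostname();

    return $host !== false && $host !== '';
}
```

With the library loaded, isHttpUrl(new \Riimu\Kit\UrlParser\UrlParser(), $url) would then reject inputs that parse as URIs but lack an acceptable scheme or a hostname.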

Credits

This library is copyright 2013 - 2015 to Riikka Kalliomäki.

See LICENSE for license and copying information.