mensbeam/microformats

A microformats parser

0.1.1 2023-06-28 19:59 UTC

This package is auto-updated.

Last update: 2024-10-27 19:09:47 UTC


README

A generic Microformats parser for PHP. While it similar to php-mf2, it combines a more accurate HTML parser with more consistent performance characteristics, and it is believed to have fewer bugs.

Usage

Functionality is provided for parsing from an HTTP URL, from a file, from a string, and from an HTML element (a \DOMElement object), as well as for serializing to JSON. A static method of the MensBeam\Microformats class is provided for each task.

The parsing methods all return a Microformats structure as an array. The Microformats wiki includes some sample structures in JSON format.

Parsing from a URL

\MensBeam\Microformats::fromUrl(string $url, array $options = []): ?array

The $url argument is an HTTP(S) URL to an HTML resource; redirections will be followed if neceesary. If the resource cannot be fetched null will be returned.

The $options argument is a list of options for the Microformats parser. See below for details.

Parsing from a file

\MensBeam\Microformats::fromFile(string $file, string $contentType, string $url, array $options = []): ?array

The $file argument is the path to a local file. If the file cannot be opened for reading null will be returned.

The $contentType argument is a string containing the value of the file's HTTP Content-Type header, if known. This may be used to provide the HTML parser with character encoding information.

The $url argument is a string containing the file's effective URL. This is used to resolve any relative URLs in the input.

The $options argument is a list of options for the Microformats parser. See below for details.

Parsing from a string

\MensBeam\Microformats::fromString(string $input, string $contentType, string $url, array $options = []): array

The $input argument is the string to parse for micrformats.

The $contentType argument is a string containing the value of the string's HTTP Content-Type header, if known. This may be used to provide the HTML parser with character encoding information.

The $url argument is a string containing the string's effective URL. This is used to resolve any relative URLs in the input.

The $options argument is a list of options for the Microformats parser. See below for details.

Parsing from an HTML element

\MensBeam\Microformats::fromHtmlElement(\DOMElement $input, string $url, array $options = []): array

The $input argument is the element to parse for micrformats. Typically this would be the documentElement, but any element may be parsed.

The $url argument is a string containing the string's effective URL. This is used to resolve any relative URLs in the input.

The $options argument is a list of options for the Microformats parser. See below for details.

Serializing to JSON

\MensBeam\Microformats::toJson(array $data, int $flags = 0, int $depth = 512): string

Since Microformats data is represented as a structure of nested arrays, some of which are associative ("objects" in JSON parlance) and may be empty, it is necessary to convert such empty array into PHP stdClass objects before they are serialized to JSON. This method performs these conversions before passing the result to the json_encode function. Its parameters are the same as that of json_encode.

Options

The parsing methods all optionally take an $options array as an argument. These options are all flags, either for experimental features, or for backwards-compatible features no longer used by default. The options are as followings: