nojimage/twitter-text-php

A library of PHP classes that provide auto-linking and extraction of usernames, lists, hashtags and URLs from tweets.

v3.3.1 2023-07-12 03:31 UTC

README

A library of PHP classes that provide auto-linking and extraction of usernames, lists, hashtags and URLs from tweets. Originally created from twitter-text-rb and twitter-text-java projects by Matt Sanford and ported to PHP by Mike Cochrane, this library has been improved and made more complete by Nick Pope.

Requirements

  • PHP 7.4 or higher
  • ext-mbstring
  • ext-intl

If the required extensions are not installed on the server, please install them or use symfony/polyfill.

Install

You can install this library into your application using Composer.

composer require nojimage/twitter-text-php

Note for Older Servers

This library uses intl/libICU. Some combinations of older servers and PHP 7.2+ may emit deprecation warnings because of older ICU versions (refs #32).

If you are using RHEL/CentOS 6, installing PHP from the remi repository is the best choice, because remi lets you use a newer ICU.

Features

Autolink

  • Add links to all matching Twitter usernames (no account verification).
  • Add links to all user lists (of the form @username/list-name).
  • Add links to all valid hashtags.
  • Add links to all URLs.
  • Support for international character sets.

Extractor

  • Extract mentioned Twitter usernames (from anywhere in the tweet).
  • Extract replied-to Twitter usernames (from the start of the tweet).
  • Extract all user lists (of the form @username/list-name).
  • Extract all valid hashtags.
  • Extract all URLs.
  • Support for international character sets.

Hit Highlighter

  • Highlight text specified by a range by surrounding it with a tag.
  • Support for highlighting when tweet has already been autolinked.
  • Support for international character sets.

Validation

  • Validate different Twitter text elements.
  • Support for international character sets.

Parser

  • Parses a given tweet text with the weighted character count configuration.

Length validation

twitter-text 3.0 adds the emojiParsingEnabled option to the config file. When it is true, twitter-text-php parses and discounts emoji supported by Unicode Emoji 11.0 (NOTE: the original twitter-text supports the emoji in the twemoji library). Each of these emoji counts with the default weight (200, or two characters), even if it contains multiple code points combined by zero-width joiners. This means that emoji with skin tone and gender modifiers no longer count as more characters than those without such modifiers.

twitter-text 2.0 introduced configuration files that define how Tweets are parsed for length. This allows for backwards compatibility and flexibility going forward. Old-style traditional 140-character parsing is defined by the v1.json configuration file, whereas v2.json is updated for "weighted" Tweets where ranges of Unicode code points can have independent weights aside from the default weight. The sum of all code points, each weighted appropriately, should not exceed the max weighted length.
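The weighting rule described above can be illustrated with a small standalone sketch. It does not use this library: the function name sketchWeightedLength and the SKETCH_CONFIG array are hypothetical, and the numbers in the array are assumed values modelled on the kind of fields a v2-style configuration holds (check the real configuration file before relying on them). The sketch also ignores URL shortening, emoji parsing and text normalization.

```php
<?php
// Hypothetical config array; the numeric values are assumptions for illustration.
const SKETCH_CONFIG = [
    'maxWeightedTweetLength' => 280,
    'scale' => 100,
    'defaultWeight' => 200,
    'ranges' => [
        ['start' => 0, 'end' => 4351, 'weight' => 100],
        ['start' => 8192, 'end' => 8205, 'weight' => 100],
        ['start' => 8208, 'end' => 8223, 'weight' => 100],
        ['start' => 8242, 'end' => 8247, 'weight' => 100],
    ],
];

/**
 * Sum the weight of every code point, then divide by the scale.
 * Simplified: no URL handling, no emoji parsing, no normalization.
 */
function sketchWeightedLength(string $text, array $config = SKETCH_CONFIG): int
{
    $total = 0;
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $codePoint = mb_ord($char, 'UTF-8');
        // Code points outside every listed range fall back to the default weight.
        $weight = $config['defaultWeight'];
        foreach ($config['ranges'] as $range) {
            if ($codePoint >= $range['start'] && $codePoint <= $range['end']) {
                $weight = $range['weight'];
                break;
            }
        }
        $total += $weight;
    }
    return intdiv($total, $config['scale']);
}
```

Under these assumed values, sketchWeightedLength('hello') gives 5 and a five-character hiragana string gives 10. A thumbs-up emoji followed by a skin tone modifier is two code points, so this naive sketch counts it as 4, whereas emoji parsing as described above would count it with the default weight only.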

Some old methods from twitter-text-php 1.0 have been marked deprecated, such as Twitter\Text\Validator::isValidTweetText() and Twitter\Text\Validator::getTweetLength(). The new API is based on the method Twitter\Text\Parser::parseTweet():

use Twitter\Text\Parser;

$text = 'Hello, Twitter!'; // any tweet text to measure
$parser = new Parser();
$result = $parser->parseTweet($text); // a Twitter\Text\ParseResults object

This method takes a string as input and returns a results object that contains information about the string. The Twitter\Text\ParseResults object includes:

  • weightedLength: the overall length of the tweet with code points weighted per the ranges defined in the configuration file.

  • permillage: indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.

  • valid: indicates if input text length corresponds to a valid result.

  • displayRangeStart, displayRangeEnd: A pair of Unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet. For more information, see the description of display_text_range here: Tweet updates

  • validRangeStart, validRangeEnd: A pair of Unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet. For more information on the extended Tweet payload, see Tweet updates
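To make the relationship between weightedLength, permillage and valid concrete, here is a minimal standalone sketch. It is not the library's implementation: the helper name sketchLengthSummary is hypothetical, the default maximum of 280 is an assumed value, and both the floor rounding and the rule that an empty text is invalid are assumptions of this sketch.

```php
<?php
/**
 * Hypothetical helper: derive permillage and validity from a weighted length.
 * Floor rounding and the "must be non-empty" rule are assumptions.
 */
function sketchLengthSummary(int $weightedLength, int $maxWeightedLength = 280): array
{
    // Proportion of the maximum, expressed per thousand.
    $permillage = intdiv($weightedLength * 1000, $maxWeightedLength);

    return [
        'weightedLength' => $weightedLength,
        'permillage' => $permillage,
        'valid' => $weightedLength > 0 && $weightedLength <= $maxWeightedLength,
    ];
}
```

With the assumed maximum of 280, a weighted length of 140 yields a permillage of 500, and a weighted length of 281 yields 1003, which is above 1000 and therefore marks text that is too long.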

Examples

For examples, please see tests/example.php which you can view in a browser or run from the command line.

Conformance

You'll need the test data which is in YAML format from the following repository:

https://github.com/twitter/twitter-text

twitter/twitter-text is already included in composer.json, so you should only need to run:

curl -s https://getcomposer.org/installer | php
php composer.phar install

There are a couple of options for testing conformance:

  • Run phpunit from the root folder of the project.

Thanks & Contributions

The bulk of this library is from the heroic efforts of: