makarms / text-probe
Simple and extensible PHP library for text analysis and pattern matching, designed to help developers probe, parse, and manipulate text efficiently.
Requires
- php: >=8.1
- ext-mbstring: *
Requires (Dev)
- phpunit/phpunit: ^10
README
TextProbe
TextProbe is a simple and extensible PHP library for text analysis and pattern matching. It is designed to help developers probe, parse, and manipulate text efficiently using customizable rules and matchers.
Features
- Easy-to-use API for text matching and parsing
- Extensible architecture: write your own matchers and rules
- Suitable for parsing logs, user input, or any structured text
Installation
You can install the library via Composer:
composer require makarms/text-probe
Available Probes
The library comes with several built-in probes to detect common patterns in text:
Contact & Identity
- DiscordNewUsernameProbe: extracts Discord usernames in the new format (e.g., @username), enforcing Discord's updated naming rules (length, characters, no consecutive dots).
- DiscordOldUsernameProbe: extracts classic Discord usernames in the format username#1234, ensuring proper structure and a valid discriminator.
- EmailProbe: extracts email addresses.
- PhoneProbe: extracts phone numbers (supports various formats).
- SlackUsernameProbe: extracts Slack usernames (e.g., @username), supporting Slack-specific username rules such as allowed characters, length limits, and no consecutive dots.
- TelegramUserLinkProbe: extracts t.me links pointing to Telegram users.
- TelegramUsernameProbe: extracts Telegram usernames (e.g., @username).
Date & Time
- DateProbe: extracts dates in various formats (e.g., YYYY-MM-DD, DD/MM/YYYY, 2nd Jan 2023).
- DateTimeProbe: extracts combined date and time in multiple common formats.
- TimeProbe: extracts times (e.g., 14:30, 14:30:15, optional AM/PM).
Finance
- BankCardNumberProbe: extracts bank card numbers in common formats: plain digits (e.g., 4111111111111111), digits separated by spaces (e.g., 4111 1111 1111 1111) or dashes (e.g., 4111-1111-1111-1111). By default, only Luhn-valid numbers are returned.
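To make "Luhn-valid" concrete, here is a minimal, self-contained sketch of the Luhn checksum. The function name isLuhnValid is made up for illustration, and this is not the library's validator. It only shows the kind of check that the default validation is described as performing.

```php
<?php

/**
 * Luhn checksum sketch (illustrative only; not the library's code).
 * Spaces and dashes are stripped before the digits are checked.
 */
function isLuhnValid(string $cardNumber): bool
{
    $digits = preg_replace('/[\s-]/', '', $cardNumber) ?? '';

    // Reject anything that is not a non-empty run of digits.
    if (preg_match('/^\d+$/D', $digits) !== 1) {
        return false;
    }

    $sum = 0;
    $double = false; // the rightmost digit is not doubled

    // Walk from right to left, doubling every second digit.
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $digit = (int) $digits[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $digit;
        $double = !$double;
    }

    return $sum % 10 === 0;
}

var_dump(isLuhnValid('4111 1111 1111 1111')); // bool(true)
var_dump(isLuhnValid('4111-1111-1111-1112')); // bool(false)
```

The checksum catches single-digit typos and most adjacent transpositions. It does not prove that a card exists or identify the issuer.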
Geolocation
- GeoCoordinatesProbe: extracts geographic coordinates in various formats (decimal or degrees/minutes/seconds, N/S/E/W).
Social & Tags
- HashtagProbe: extracts hashtags from text (e.g., #example), supporting Unicode letters, numbers, and underscores, and detecting hashtags at any position in the text.
UUID & Identifiers
- UUIDProbe: extracts any valid UUID (v1 to v6) without checking the specific version. Supports standard UUID formats with hyphens.
- UUIDv1Probe: extracts UUID version 1, matching the format xxxxxxxx-xxxx-1xxx-xxxx-xxxxxxxxxxxx, commonly used for time-based identifiers.
- UUIDv2Probe: extracts UUID version 2, matching the format xxxxxxxx-xxxx-2xxx-xxxx-xxxxxxxxxxxx, typically used in DCE Security contexts.
- UUIDv3Probe: extracts UUID version 3, matching the format xxxxxxxx-xxxx-3xxx-xxxx-xxxxxxxxxxxx, generated using MD5 hashing of names and namespaces.
- UUIDv4Probe: extracts UUID version 4, matching the format xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx, randomly generated and commonly used for unique identifiers.
- UUIDv5Probe: extracts UUID version 5, matching the format xxxxxxxx-xxxx-5xxx-xxxx-xxxxxxxxxxxx, generated using SHA-1 hashing of names and namespaces.
- UUIDv6Probe: extracts UUID version 6, matching the format xxxxxxxx-xxxx-6xxx-xxxx-xxxxxxxxxxxx, an ordered version for better indexing and sorting.
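The version-specific formats above differ only in the first character of the third group. The sketch below reads that digit with a plain regular expression. The function name uuidVersion and the pattern are assumptions for illustration, not the library's implementation. The pattern accepts versions 1 to 6 to mirror the list above and does not inspect the variant bits.

```php
<?php

/**
 * Returns the version digit of a hyphenated UUID, or null when the
 * input is not in the 8-4-4-4-12 hexadecimal layout with a version
 * digit between 1 and 6. Illustrative sketch only.
 */
function uuidVersion(string $uuid): ?int
{
    $pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-([1-6])[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$/iD';

    if (preg_match($pattern, $uuid, $matches) !== 1) {
        return null;
    }

    // The captured group is the first character of the third block.
    return (int) $matches[1];
}

var_dump(uuidVersion('550e8400-e29b-41d4-a716-446655440000')); // int(4)
var_dump(uuidVersion('6ba7b810-9dad-11d1-80b4-00c04fd430c8')); // int(1)
var_dump(uuidVersion('not-a-uuid'));                           // NULL
```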
Web & Network
- DomainProbe: extracts domain names, including internationalized (Unicode) domains.
- IPv4Probe: extracts IPv4 addresses, supporting standard formats and excluding reserved/bogus ranges if necessary.
- IPv6Probe: extracts IPv6 addresses, including compressed formats, IPv4-mapped addresses, and zone indexes (e.g., %eth0).
- LinkProbe: extracts hyperlinks, including ones with IP addresses, ports, or without a protocol.
- MacAddressProbe: extracts MAC addresses in standard formats using colons or hyphens (e.g., 00:1A:2B:3C:4D:5E or 00-1A-2B-3C-4D-5E), accurately detecting valid addresses while excluding invalid patterns.
- UserAgentProbe: extracts User-Agent strings from text, supporting complex structures like multiple product tokens, OS information, and browser identifiers.
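The IPv4Probe entry mentions excluding reserved ranges. How the library decides this is not described here, so the following is only a sketch of one way to express such a rule, using PHP's built-in filter_var. The helper name isPublicIpv4 is hypothetical.

```php
<?php

/**
 * Returns true for syntactically valid IPv4 addresses that fall outside
 * PHP's private and reserved ranges. Illustrative sketch only; not the
 * library's validation logic.
 */
function isPublicIpv4(string $candidate): bool
{
    return filter_var(
        $candidate,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) !== false;
}

var_dump(isPublicIpv4('8.8.8.8'));      // bool(true)
var_dump(isPublicIpv4('192.168.1.10')); // bool(false), private range
var_dump(isPublicIpv4('127.0.0.1'));    // bool(false), reserved (loopback)
var_dump(isPublicIpv4('256.1.1.1'));    // bool(false), not a valid address
```

A check like this could be a starting point for a custom validator when extracted addresses must be publicly routable.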
You can implement your own probes by creating classes that implement the IProbe interface.
Each probe also supports using a different validator for the returned values by passing an instance of a class implementing the IValidator interface to the probe's constructor. This allows you to override the default validation logic.
For example, BankCardNumberProbe uses a default validator based on the Luhn algorithm, but you can provide your own validator if you want to enforce additional rules, such as limiting to specific card issuers or formats.
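As a rough illustration of an issuer-limiting rule, the sketch below accepts only numbers that look like Visa cards. The method signature of the library's IValidator is not shown in this README, so the ValidatorSketch interface is a hypothetical stand-in defined locally. Check the real interface before adapting the class.

```php
<?php

// Hypothetical stand-in, defined here only so the sketch runs on its own.
// It is NOT the library's IValidator; the real method name and signature
// may differ.
interface ValidatorSketch
{
    public function validate(string $value): bool;
}

// Accepts only numbers that look like Visa cards: a leading 4 and
// 13, 16, or 19 digits once spaces and dashes are removed.
final class VisaOnlyValidator implements ValidatorSketch
{
    public function validate(string $value): bool
    {
        $digits = preg_replace('/[\s-]/', '', $value) ?? '';

        return preg_match('/^4(\d{12}|\d{15}|\d{18})$/D', $digits) === 1;
    }
}

$validator = new VisaOnlyValidator();
var_dump($validator->validate('4111 1111 1111 1111')); // bool(true)
var_dump($validator->validate('5500-0000-0000-0004')); // bool(false)
```

This sketch does no Luhn check. If a custom validator replaces the default one, it would presumably need to repeat that check to keep it.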
Usage Example
require __DIR__ . '/vendor/autoload.php';

use TextProbe\TextProbe;
use TextProbe\Probes\Contact\EmailProbe;

$text = "Please contact us at info@example.com for more details.";

// Register the probes to run against the text.
$probe = new TextProbe();
$probe->addProbe(new EmailProbe());

$results = $probe->analyze($text);

// Print each match with its probe type and position in the text.
foreach ($results as $result) {
    echo sprintf(
        "[%s] %s (position %d-%d)\n",
        $result->getProbeType()->name,
        $result->getResult(),
        $result->getStart(),
        $result->getEnd()
    );
}
Expected output:
[EMAIL] info@example.com (position 21-37)
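The positions in the expected output can be cross-checked without the library. The sketch below uses a deliberately simplified email pattern (an assumption, not the library's pattern) and converts the byte offset from preg_match into a character offset with the mbstring functions. For this example the end position equals the start plus the match length, which means it points just past the last matched character.

```php
<?php

$text = "Please contact us at info@example.com for more details.";

// Simplified email pattern for illustration only; the library's own
// pattern is not shown in this README.
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === 1) {
    [$match, $byteStart] = $matches[0];

    // PREG_OFFSET_CAPTURE reports byte offsets; convert to character
    // offsets so multibyte text before the match is counted correctly.
    $start = mb_strlen(substr($text, 0, $byteStart), 'UTF-8');
    $end = $start + mb_strlen($match, 'UTF-8');

    printf("%s (position %d-%d)\n", $match, $start, $end);
    // prints: info@example.com (position 21-37)
}
```

For pure ASCII input such as this, byte and character offsets coincide. The conversion only matters when multibyte characters precede the match.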