henrik9999 / string-similarity
Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
Installs: 8 120
Dependents: 0
Suggesters: 0
Security: 0
Stars: 1
Watchers: 1
Forks: 2
Open Issues: 0
Requires
- php: ^7.4|^8.0
- ext-mbstring: *
Requires (Dev)
- phpunit/phpunit: ^9
This package is not auto-updated.
Last update: 2025-04-24 23:45:29 UTC
README
Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
This implementation actually treats multiple occurrences of a bigram as unique. The correctness of this behavior is most easily seen when getting the similarity between "GG" and "GGGGGGGG", which should obviously not be 1.
This is a PHP implemenation of the Node.js package string-similarity
Usage
Install using:
composer require henrik9999/string-similarity
In your code:
$stringSimilarity = new StringSimilarity(); $similarity = $stringSimilarity->compareTwoStrings("healed", "sealed"); $matches = $stringSimilarity->findBestMatch("healed", [ "edward", "sealed", "theatre", ]);
API
The package contains two methods:
compareTwoStrings(string $string1, string $string2, bool $casesensitive)
Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive by default.
Arguments
- string1 (string): The first string
- string2 (string): The second string
- casesensitive (bool): If the comparison should be case-sensitive
Order does not make a difference.
Returns
(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.
Examples
$stringSimilarity->compareTwoStrings("healed", "sealed"); // → 0.8 $stringSimilarity->compareTwoStrings( "Olive-green table for sale, in extremely good condition.", "For sale: table in very good condition, olive green in colour." ); // → 0.6060606060606061 $stringSimilarity->compareTwoStrings( "Olive-green table for sale, in extremely good condition.", "For sale: green Subaru Impreza, 210,000 miles" ); // → 0.2558139534883721 $stringSimilarity->compareTwoStrings( "Olive-green table for sale, in extremely good condition.", "Wanted: mountain bike with at least 21 gears." ); // → 0.1411764705882353
findBestMatch(string mainString, array targetStrings, bool $casesensitive)
Compares mainString
against each string in targetStrings
.
Arguments
- mainString (string): The string to match each target string against.
- targetStrings (array): Each string in this array will be matched against the main string.
- casesensitive (bool): If the comparison should be case-sensitive.
Returns
(Object): An object with a ratings
property, which gives a similarity rating for each target string, a bestMatch
property, which specifies which target string was most similar to the main string, and a bestMatchIndex
property, which specifies the index of the bestMatch in the targetStrings array.
Examples
$stringSimilarity->findBestMatch('Olive-green table for sale, in extremely good condition.', [ 'For sale: green Subaru Impreza, 210,000 miles', 'For sale: table in very good condition, olive green in colour.', 'Wanted: mountain bike with at least 21 gears.' ]); // → array(3) { ["ratings"]=> array(3) { [0]=> array(2) { ["target"]=> string(45) "For sale: green Subaru Impreza, 210,000 miles" ["rating"]=> float(0.2558139534883721) } [1]=> array(2) { ["target"]=> string(62) "For sale: table in very good condition, olive green in colour." ["rating"]=> float(0.6060606060606061) } [2]=> array(2) { ["target"]=> string(45) "Wanted: mountain bike with at least 21 gears." ["rating"]=> float(0.1411764705882353) } } ["bestMatch"]=> array(2) { ["target"]=> string(62) "For sale: table in very good condition, olive green in colour." ["rating"]=> float(0.6060606060606061) } ["bestMatchIndex"]=> int(1) }
Release Notes
1.0.1
- Made some perfomance improvements
1.0.0
- Initial Release