dd / php-library-ddtextcompare
A simple tool for comparing two texts
Requires
- php: >=5.3.0
This package is auto-updated.
Last update: 2024-11-08 19:47:06 UTC
README
A simple tool for comparing two strings. For now, the library works only with UTF-8 strings.
What this library is
ddTextCompare can be used in situations where PHP's native string comparison is not applicable. The library relies on the number of chars and their positions; therefore, it is suitable for comparing strings that contain typos or that consist of the same words in a different order.
What this library isn't
ddTextCompare has nothing to do with morphological or any other analysis you would expect from an intelligent search engine. The main purpose of the library is to stay simple.
How it works
By default, ddTextCompare performs a simple analysis. Each string being compared is represented as two n-dimensional vectors, where n is the number of unique chars used in both strings. The first vector, v1, records which chars occur in the string and how many times each occurs. The second vector, v2, records where those chars are positioned. Once both vectors have been built for each string, the cosine similarity is calculated for each pair of corresponding vectors: cos(v_{1,1}, v_{1,2}) and cos(v_{2,1}, v_{2,2}), where v_{i,j} denotes vector v_i of string j. The two cosines are then multiplied by their respective weights, and the result is adjusted so that it falls in the range from 0 to 1.
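The description above can be illustrated with a minimal sketch. It is not the library's code: every function name below is hypothetical, the position vector is assumed to hold the sum of each char's 1-based positions, and the two cosines are assumed to be combined as a weighted sum. The library's actual encoding and final adjustment may differ.

```php
<?php
// Hypothetical helpers for illustration only; none of these are part of ddTextCompare.

// Split a UTF-8 string into an array of single characters.
function sketchChars($text)
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Count vector: how many times each char of $alphabet occurs in $chars.
function sketchCountVector(array $chars, array $alphabet)
{
    $counts = array_fill_keys($alphabet, 0);
    foreach ($chars as $char) {
        $counts[$char]++;
    }
    return array_values($counts);
}

// Position vector (assumed encoding): sum of the 1-based positions of each char.
function sketchPositionVector(array $chars, array $alphabet)
{
    $positions = array_fill_keys($alphabet, 0);
    foreach ($chars as $index => $char) {
        $positions[$char] += $index + 1;
    }
    return array_values($positions);
}

// Cosine similarity of two equal-length numeric vectors; 0.0 if either is all zeros.
function sketchCosine(array $a, array $b)
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / sqrt($normA * $normB);
}

// Weighted combination of the two cosines (assumed to be a weighted sum).
function sketchSimilarity($first, $second, $countWeight = 0.5)
{
    $charsA = sketchChars($first);
    $charsB = sketchChars($second);
    // n-dimensional space: one dimension per unique char used in either string.
    $alphabet = array_values(array_unique(array_merge($charsA, $charsB)));

    $countCos = sketchCosine(
        sketchCountVector($charsA, $alphabet),
        sketchCountVector($charsB, $alphabet)
    );
    $positionCos = sketchCosine(
        sketchPositionVector($charsA, $alphabet),
        sketchPositionVector($charsB, $alphabet)
    );

    return $countWeight * $countCos + (1.0 - $countWeight) * $positionCos;
}
```

Because all vector components in this sketch are non-negative, each cosine already lies between 0 and 1, so no further rescaling is needed here. Identical strings score 1, and strings with no chars in common score 0.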
Installation
Composer
Just add the package to your composer.json.
composer require dd/php-library-ddtextcompare
Manually
Though it's convenient to use Composer, it is also possible to place the library wherever you want and include all the classes inside the “src” folder manually.
Basic usage
Here are some examples.
Comparing strings with typos
$compare = new DDTextCompare();
$similarity = $compare->compare("Text without any typos", "Text wihtout ayn typoes");
// $similarity = 0.99076390557741
Adjusting weights
By default, the weights of all criteria are equal, but this can be changed. When one weight is changed, the other weight is adjusted automatically so that their sum equals 1.
$compare = new DDTextCompare();
$comparator = new Comparator\Cosine();
// Change the char total criterion weight
$comparator->setCharTotalWeight(0.8);
$similarity = $compare->compare("Text without any typos", "Txet wihtout ayn typoes", $comparator);
// $similarity = 0.99310343750374
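The automatic adjustment described above can be sketched as a pair of setters that keep two weights summing to 1. The class and its method names are hypothetical stand-ins for illustration; they are not the library's Comparator\Cosine class, whose internals are not shown here.

```php
<?php
// Hypothetical stand-in for illustration; not part of ddTextCompare.
class SketchWeights
{
    private $countWeight = 0.5;
    private $positionWeight = 0.5;

    // Set the first weight; the second becomes its complement.
    public function setCountWeight($weight)
    {
        $this->countWeight = self::validate($weight);
        $this->positionWeight = 1.0 - $this->countWeight;
        return $this;
    }

    // Set the second weight; the first becomes its complement.
    public function setPositionWeight($weight)
    {
        $this->positionWeight = self::validate($weight);
        $this->countWeight = 1.0 - $this->positionWeight;
        return $this;
    }

    public function getCountWeight()
    {
        return $this->countWeight;
    }

    public function getPositionWeight()
    {
        return $this->positionWeight;
    }

    // Reject anything that is not a number in the range from 0 to 1.
    private static function validate($weight)
    {
        if (!is_numeric($weight) || $weight < 0 || $weight > 1) {
            throw new InvalidArgumentException('A weight must be a number between 0 and 1.');
        }
        return (float) $weight;
    }
}
```

With this sketch, calling setCountWeight(0.8) leaves the position weight at 0.2, so the caller never has to rebalance the weights by hand.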
Extending
A custom comparator class can be created by implementing the DDTextCompare\Comparator interface.
$compare = new DDTextCompare();
$comparator = new Comparator\YourCustomComparator();
$similarity = $compare->compare("Text without any typos", "Txet wihtout ayn typoes", $comparator);
Changelog
Version 0.9 (2015-11-07)
- + The first release.