tabuna / similar
Unlock the power of effortless grouping by identifying similar strings based on shared topics within a set of sentences.
Requires
- php: ^7.3|^8.0
- illuminate/collections: ~8.0|~9.0
Requires (Dev)
- phpunit/phpunit: ^9.0
README
This is a simple PHP library for identifying similar strings without machine learning. Given a set of sentences, it returns groups of sentences that share a topic. For example, it can combine news headlines from different publications, as Google News does.
Installation
Run this at the command line:
$ composer require esplora/similar
Usage
Create a Similar object, passing it a closure that decides whether two strings are similar:
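use Esplora\Similar\Similar;

$similar = new Similar(function (string $a, string $b) {
    // similar_text() writes the similarity percentage (0-100) into $copy.
    similar_text($a, $b, $copy);

    // Treat the strings as similar when the percentage is above 51.
    return 51 < $copy;
});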
Note that you don't need to use similar_text. You can use other implementations, such as soundex, or something else.
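As a sketch of swapping the comparison, the closures below rely on PHP's built-in levenshtein() and soundex() functions instead of similar_text(). The variable names and the 0.4 threshold are illustrative assumptions, not values recommended by the library.

```php
<?php

// Comparator based on edit distance, normalised by the longer string's length.
// The 0.4 threshold is an arbitrary assumption; tune it for your data.
// Note: before PHP 8.0, levenshtein() only accepts strings up to 255 characters.
$byLevenshtein = function (string $a, string $b): bool {
    $longest = max(strlen($a), strlen($b));

    if ($longest === 0) {
        return true; // two empty strings are trivially similar
    }

    return levenshtein($a, $b) / $longest < 0.4;
};

// Comparator based on phonetic keys. soundex() produces a four-character key,
// so this suits single words or short names better than whole sentences.
$bySoundex = function (string $a, string $b): bool {
    return soundex($a) === soundex($b);
};

var_dump($byLevenshtein('kitten', 'mitten')); // one edit out of six characters
var_dump($bySoundex('Robert', 'Rupert'));     // both names share the key R163
```

Either closure has the same shape as the similar_text example (two strings in, a boolean out), so it could be passed to the constructor in its place.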
Then call the findOut method, passing it a one-dimensional array of strings:
$similar->findOut([
    'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
    'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',

    // Unrelated headline that should not be grouped
    'Can Trump win with ‘fantasy’ electors bid? State GOP says no',
]);
The result contains a single group holding the two related headlines:
'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
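To make the grouping behaviour concrete, here is a minimal standalone sketch of one way such grouping could work. It is not the library's actual implementation: groupSimilar() and $sameFirstWord are hypothetical names, and the greedy strategy (compare each item with the first member of every existing group, then drop groups of one) is an assumption chosen for brevity.

```php
<?php

/**
 * Hypothetical helper, not part of the library: greedily groups items that
 * the comparator judges similar, preserving the keys of the input array.
 */
function groupSimilar(array $items, callable $isSimilar): array
{
    $groups = [];

    foreach ($items as $key => $item) {
        $placed = false;

        foreach ($groups as $index => $group) {
            // Compare the candidate with the first member of each group.
            if ($isSimilar((string) reset($group), (string) $item)) {
                $groups[$index][$key] = $item;
                $placed = true;
                break;
            }
        }

        if (!$placed) {
            // No existing group matched, so start a new one.
            $groups[] = [$key => $item];
        }
    }

    // Drop items that found no partner: keep groups of two or more.
    return array_values(array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    }));
}

// Deliberately simple comparator for demonstration: same first word.
$sameFirstWord = function (string $a, string $b): bool {
    return explode(' ', $a)[0] === explode(' ', $b)[0];
};

$groups = groupSimilar([
    'a' => 'Elon Musk gets mixed test results',
    'b' => 'Can Trump win with electors bid?',
    'c' => 'Elon Musk may have Covid-19',
], $sameFirstWord);

print_r($groups);
```

With this comparator, the two headlines starting with "Elon" end up in one group and the remaining headline is discarded because it has no partner.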
Keys
The keys of the input array are preserved in the result, so you can use them for additional processing:
$similar->findOut([
    'kos' => "Trump acknowledges Biden's win in latest tweet",
    'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
    'baz' => 'Trump says Biden won but again refuses to concede',
    'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
]);
The result will be two groups:
[
    'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
    'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
],
[
    'baz' => 'Trump says Biden won but again refuses to concede',
    'kos' => "Trump acknowledges Biden's win in latest tweet",
],
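Because the keys survive grouping, they can serve as identifiers that point back to richer records. The sketch below assumes a hypothetical $articles array keyed by the same IDs, with made-up publisher names, and writes $groups out as a plain array literal in the shape of the result above. How the library wraps its return value is outside the scope of this sketch.

```php
<?php

// Hypothetical source records, keyed by the same IDs passed to findOut().
$articles = [
    'kos' => ['source' => 'Publisher A'],
    'foo' => ['source' => 'Publisher B'],
    'baz' => ['source' => 'Publisher C'],
    'bar' => ['source' => 'Publisher D'],
];

// Groups written as a literal, with the keys preserved from the input array.
$groups = [
    [
        'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
        'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
    ],
    [
        'baz' => 'Trump says Biden won but again refuses to concede',
        'kos' => "Trump acknowledges Biden's win in latest tweet",
    ],
];

// Use the preserved keys to look up which publishers covered each topic.
$sourcesPerGroup = array_map(function (array $group) use ($articles): array {
    return array_map(function (string $key) use ($articles): string {
        return $articles[$key]['source'];
    }, array_keys($group));
}, $groups);

print_r($sourcesPerGroup);
```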
Objects
You can also pass objects, which lets you evaluate more complex conditions. Each object must be castable to a string via the __toString() method.
$similar->findOut([
    new FixtureStingObject('Lorem ipsum dolor sit amet, consectetur adipiscing elit.'),
]);
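For illustration, an object that satisfies this requirement might look like the following. The Headline class, its url() accessor, and the example URL are hypothetical and not shipped with the library; the only part the text above requires is the __toString() method.

```php
<?php

// Hypothetical value object: it carries extra data (a URL) but exposes
// its title whenever it is cast to a string.
final class Headline
{
    /** @var string */
    private $title;

    /** @var string */
    private $url;

    public function __construct(string $title, string $url)
    {
        $this->title = $title;
        $this->url = $url;
    }

    public function url(): string
    {
        return $this->url;
    }

    public function __toString(): string
    {
        return $this->title;
    }
}

$headline = new Headline(
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    'https://example.com/lorem'
);

// Casting yields the title, which is the text a string comparison would see.
echo (string) $headline, PHP_EOL;
```

Keeping the extra fields on the object means that, after grouping, each member can still be asked for its URL or other metadata.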
License
The MIT License (MIT). Please see License File for more information.