gpupo/similarity

Calculate the similarity between strings and numbers, working in a different way from a diff tool.

Language: PHP

Version 1.2, 2015-08-13 12:57 UTC

README

Calculate the similarity between strings or numbers

  • Supports stopwords
  • Works in a different way from a diff tool

Usage

Example 1, with stopwords:

    use Gpupo\Similarity\Similarity; // namespace assumed from the package name
    require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

    $stringA = 'Av. Padre Anchieta 1873 - Champagnat - Curitiba - Brasil';
    $stringB = 'Brasil - Parana - Curitiba - Champagnat - '
        .'Rua Padre Anchieta 1873 - Perto da Avenida';
    $stopwordsList = explode(',', 'Av,Rua,Avenida,perto,da,de,e,em,o');
    $s = new Similarity();
    $s->setValues($stringA, $stringB);
    $s->setAccuracy(80); // accuracy threshold, from 1 to 100
    $s->setStopwords($stopwordsList);
    $similar = $s->hasSimilarity(); // true
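
The README does not describe the algorithm behind hasSimilarity(). As a rough illustration of how two addresses can match even though their parts appear in a different order (something a line- or character-based diff handles poorly), here is a minimal sketch that drops stopwords and compares the remaining word sets with a Jaccard percentage. The functions tokenize() and jaccardPercent() are hypothetical and not part of gpupo/similarity; the library's real scoring may differ.

```php
<?php
// Hypothetical sketch, NOT gpupo/similarity's implementation:
// split into lowercase words, drop stopwords, compare the word sets.

function tokenize($text, array $stopwords)
{
    $stop = array_map('strtolower', $stopwords);
    // Replace every run of characters that are not letters or digits with a space
    $clean = preg_replace('/[^\p{L}\p{N}]+/u', ' ', strtolower($text));
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    // Remove stopwords and duplicate words, then reindex from 0
    return array_values(array_unique(array_diff($words, $stop)));
}

// Jaccard index (shared words / all distinct words) as a 0-100 percentage
function jaccardPercent(array $a, array $b)
{
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 100.0; // two empty inputs are treated as identical
    }
    return 100.0 * count(array_intersect($a, $b)) / $union;
}

$stringA = 'Av. Padre Anchieta 1873 - Champagnat - Curitiba - Brasil';
$stringB = 'Brasil - Parana - Curitiba - Champagnat - '
    .'Rua Padre Anchieta 1873 - Perto da Avenida';
$stopwordsList = explode(',', 'Av,Rua,Avenida,perto,da,de,e,em,o');

$score = jaccardPercent(
    tokenize($stringA, $stopwordsList),
    tokenize($stringB, $stopwordsList)
);
$similar = $score >= 80; // same 1-100 threshold idea as setAccuracy(80)
```

With these inputs, 6 of the 7 distinct words left after stopword removal are shared, so the sketch scores about 85.7 and passes the 80 threshold. Because words are compared as a set, the reversed order of the address parts does not lower the score.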

Example 2, chained method calls (reusing $stringA, $stringB and $stopwordsList from Example 1):

    $s = new Similarity();
    $result = $s->setValues($stringA, $stringB)->setAccuracy(60)
        ->setStopwords($stopwordsList)->hasSimilarity();
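
Chaining works when every setter returns the object itself. The sketch below shows that pattern with a hypothetical class, FluentSimilaritySketch. Its scoring uses PHP's built-in similar_text() after dropping stopwords, an assumption chosen for brevity; it does not try to reproduce the results of Example 1 or to mirror how gpupo/similarity scores text.

```php
<?php
// Hypothetical class illustrating fluent setters; not gpupo/similarity's code.
class FluentSimilaritySketch
{
    private $a = '';
    private $b = '';
    private $accuracy = 50;
    private $stopwords = array();

    public function setValues($a, $b)
    {
        $this->a = $a;
        $this->b = $b;
        return $this; // returning $this is what makes chaining possible
    }

    public function setAccuracy($accuracy)
    {
        // Clamp to the 1-100 range mentioned in the README
        $this->accuracy = max(1, min(100, (int) $accuracy));
        return $this;
    }

    public function setStopwords(array $stopwords)
    {
        $this->stopwords = array_map('strtolower', $stopwords);
        return $this;
    }

    public function hasSimilarity()
    {
        $percent = 0.0;
        // similar_text() writes a 0-100 percentage into its third argument
        similar_text($this->strip($this->a), $this->strip($this->b), $percent);
        return $percent >= $this->accuracy;
    }

    // Lowercase, split on whitespace and drop stopwords
    private function strip($text)
    {
        $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
        return implode(' ', array_diff($words, $this->stopwords));
    }
}

$result = (new FluentSimilaritySketch())
    ->setValues('Rua Curitiba', 'Avenida Curitiba')
    ->setStopwords(array('Rua', 'Avenida'))
    ->setAccuracy(90)
    ->hasSimilarity();
```

Because each setter returns the same instance, one object can also be reused for several comparisons, as Example 3 appears to do with $s.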

Example 3, numbers:

    $s = new Similarity();
    $resultA = $s->setNumberValues('1530D', 1510)->hasSimilarity(); // true
    $resultB = $s->setNumberValues('3D', 4)->hasSimilarity(); // true
    $resultC = $s->setNumberValues('100B', 205)->hasSimilarity(); // false
    $resultD = $s->setNumberValues('20', 2)->hasSimilarity(); // false
    $resultE = $s->setNumberValues('3 - D 4', 34)->hasSimilarity(); // true
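
The README does not state the rule behind these results. One rule that reproduces all five outcomes is to strip every non-digit character ('3 - D 4' becomes 34) and then compare the difference relative to the larger value. The functions cleanNumber() and numbersAreClose() and the 0.3 tolerance below are hypothetical, picked only so the sketch agrees with the examples above; the library's actual rule and threshold may differ.

```php
<?php
// Hypothetical: keep only the digits, e.g. '1530D' -> 1530, '3 - D 4' -> 34.
// Note that this also drops a minus sign, so the sketch ignores signs.
function cleanNumber($value)
{
    return (int) preg_replace('/\D+/', '', (string) $value);
}

// Hypothetical proximity rule: |x - y| divided by the larger magnitude
// must not exceed $tolerance (0.3 means "within 30%").
function numbersAreClose($a, $b, $tolerance = 0.3)
{
    $x = cleanNumber($a);
    $y = cleanNumber($b);
    $larger = max(abs($x), abs($y));
    if ($larger === 0) {
        return true; // both values clean to zero
    }
    return abs($x - $y) / $larger <= $tolerance;
}

$resultE = numbersAreClose('3 - D 4', 34); // 34 vs 34
```

Using a relative rather than an absolute difference is what lets 1530 and 1510 (20 apart) count as close while 20 and 2 (18 apart) do not.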

Install

The recommended way to install is through Composer. Add the package to your composer.json:

    {
        "require": {
            "gpupo/similarity": "dev-master"
        }
    }

Tests

All tests run automatically on every commit via Travis, on OS X and Linux, against PHP 5.3, 5.4, 5.5, 5.6 and 7.0 and HHVM.

To run the test suite locally:

    $ phpunit

To see the testdox output:

    $ phpunit --testdox

License

MIT, see LICENSE.

Test Docs

Input\Decorator

  • Clean characters
  • Clean numbers

Input\InputNumber

  • Clean ignored characters

Input\InputString

  • Clean ignored characters
  • Clean stopwords

SimilarNumber

  • Success to find similarity
  • Success to find proximity
  • Success to find proximity with distant numbers

SimilarText

  • Success to find percentage similarity
  • Success to find percentage with texts with no similarity
  • Success to find the levenshtein distance
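
The SimilarText tests mention a percentage similarity and a Levenshtein distance. PHP ships built-in functions for both, similar_text() and levenshtein(). Whether SimilarText calls them directly is an assumption, but the snippet below shows what those two measures return for a one-character typo.

```php
<?php
// Illustrates the two measures with PHP built-ins only (assumed, not
// confirmed, to be what SimilarText uses internally).
$a = 'Padre Anchieta';
$b = 'Padre Anchietta'; // one extra "t"

// similar_text() returns the number of matching characters and writes
// matches * 2 * 100 / (strlen($a) + strlen($b)) into its third argument
$percent = 0.0;
$matching = similar_text($a, $b, $percent);

// levenshtein() counts the single-character insertions, deletions and
// substitutions needed to turn $a into $b
$distance = levenshtein($a, $b);

printf("%d matching chars, %.2f%%, distance %d\n", $matching, $percent, $distance);
// prints "14 matching chars, 96.55%, distance 1"
```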

Similarity

  • Success on assert similarities with strings
  • Success in asserting that the phrase is different
  • Success on assert similarities with numbers
  • Success on assert similarities with approximate numbers
  • Success on assert with different numbers
  • Ability to increase the accuracy
  • Ability to decrease the accuracy
  • Ability to inject stopwords