deevee15/text-similarity-php

PHP service for text similarity

Package info

github.com/deevee15/text-similarity-php

pkg:composer/deevee15/text-similarity-php

v1.0.0 2026-04-15 17:31 UTC

This package is auto-updated.

Last update: 2026-04-15 17:35:37 UTC


README

TextSimilarity is a pure-PHP library for detecting duplicate news articles using morphological analysis and entity-weighted scoring.

Features

  • Morphological text processing via phpMorphy
  • Named entity extraction (names, locations, organizations, abbreviations)
  • Weighted scoring system with configurable coefficients
  • File-based word cache for performance
  • English and Russian language support

Installation

TextSimilarity is installed via Composer. To add it as a dependency of your project, run the following command, which installs the latest stable version:

composer require deevee15/text-similarity-php

Requirements

Getting started

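use PHPTextSimilarity\TextSimilarity;

// Arguments: language code, text of the first article, text of the second
// article, and the titles of both articles.
$result = TextSimilarity::compare(
    'en',
    "First article's text...",
    "Second article's text...",
    ['first' => "Article's title", 'second' => "Article's title"]
);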

How it works

TextSimilarity splits the words of both texts and both article titles into entity types (proper names, common nouns, locations, abbreviations, organizations) and converts each word to the nominative case. It then keeps only the words that occur in both articles, assigns points for each match, and multiplies the points by the importance coefficients defined in src/Config/WeightConfig.php.
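
The scoring step described above can be illustrated with a minimal sketch. It is not the library's implementation: the `SKETCH_WEIGHTS` values, the entity-type keys, and the `sketchScore` function are all hypothetical, and the input is assumed to be already normalized and grouped by entity type (the work the library does with phpMorphy). The real coefficients are in src/Config/WeightConfig.php and may differ in both names and values.

```php
<?php
declare(strict_types=1);

// Hypothetical weights for illustration only; the real coefficients live in
// src/Config/WeightConfig.php and may differ in both names and values.
const SKETCH_WEIGHTS = [
    'name'         => 3.0,
    'location'     => 2.0,
    'organization' => 2.0,
    'abbreviation' => 1.5,
    'noun'         => 1.0,
];

/**
 * Scores two texts whose words were already normalized and grouped by entity type.
 *
 * @param array<string, list<string>> $first  entity type => normalized words
 * @param array<string, list<string>> $second entity type => normalized words
 */
function sketchScore(array $first, array $second): float
{
    $score = 0.0;
    foreach (SKETCH_WEIGHTS as $type => $weight) {
        // Keep only the words present in both texts for this entity type.
        $matches = array_intersect(
            array_unique($first[$type] ?? []),
            array_unique($second[$type] ?? [])
        );
        // One point per match, scaled by the importance of the entity type.
        $score += count($matches) * $weight;
    }
    return $score;
}

// Made-up sample input: one shared name, one shared location, one shared noun.
$a = ['name' => ['merkel'], 'location' => ['berlin', 'paris'], 'noun' => ['summit', 'talk']];
$b = ['name' => ['merkel'], 'location' => ['berlin'], 'noun' => ['summit', 'vote']];

echo sketchScore($a, $b), PHP_EOL; // prints 6
```

With these assumed weights, a shared proper name contributes three times as much as a shared common noun. That is the intent of entity-weighted scoring: two articles that mention the same people and places are more likely to be duplicates than two that merely share general vocabulary.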

Demo

Here is the link to the demo website

License

Apache 2.0