Library for mining

Installs: 1 226

Dependents: 0

Stars: 18

Watchers: 3

Forks: 1

1.0.0 2014-03-24 09:19 UTC


Build Status Code Coverage Scrutinizer Quality Score

This library should make it easier to find recommendations and similarities between different things. There are a couple of use cases that I developed it for:

  • Recommend a list of music albums/artists to a user
  • Recommend an article that is similar to the current one that a user is reading
  • Find other users that have the same values as another user (think matchmaking ;)


The easiest way to get this installed in your project is by using composer

$ composer require stojg/recommend


Assuming that we have some data where users have rated music artists within a scale of one to five:

$artistRatings = array(
    "Abe" => array(
        "Blues Traveler" => 3,
        "Broken Bells" => 2,
        "Norah Jones" => 4,
        "Phoenix" => 5,
        "Slightly Stoopid" => 1,
        "The Strokes" => 2,
        "Vampire Weekend" => 2
    "Blair" => array(
        "Blues Traveler" => 2,
        "Broken Bells" => 3,
        "Deadmau5" => 4,
        "Phoenix" => 2,
        "Slightly Stoopid" => 3,
        "Vampire Weekend" => 3
    "Clair" => array(
        "Blues Traveler" => 5,
        "Broken Bells" => 1,
        "Deadmau5" => 1,
        "Norah Jones" => 3,
        "Phoenix" => 5,
        "Slightly Stoopid" => 1

We then load this data into the Data class

$data = new \stojg\recommend\Data($artistRatings);

If we want to find artists that Blair might like, we execute the recommend method.

$recommendations = $data->recommend('Blair', new \stojg\recommend\strategy\Manhattan());

The result of that computation would be:

array (
  0 => array (
    'key' => 'Norah Jones',
    'value' => 4,
  1 => array (
    'key' => 'The Strokes',
    'value' => 2,

This means that Blair might like Norah Jones and not like The Strokes.

The Recommender works by finding someone in the $artistRatings that have rated artist similar to to Blair. In this case it turns out to be Abe, so it then tries to find artists that Abe have rated but not Blair and return them as a list of recommendations.

How the 'nearest neighbour' is found depends on which strategy that is chosen and how big and dense the dataset is.

The Dataset

The general rule is that the bigger the dataset is, the better. It have to be formatted as an array in the following format:

    'uniqueID' => array(
        'objectID' => (int)'rating'

Where in the case of the previous artist rating example

* uniqueID = Blair
* objectID = Music Artist
* rating = an numeric value


There are currently three strategies and which one to pick depends on how the data is organized and populated.


If the data is dense (almost all objectIDs in the full data set have a non null rating) and the magnitude (rating) of the attributes values are important, this is a good strategy.

I.e. all users have rated all music artists and they all agree on the same scale.


Use this strategy if the data is dense but the ratings are subject to grade-inflation.

I.e. if user A have rated all artists between 2-4 and user B have rated artists between 4-5 this strategy tries to compensate for the fact that the user A’s rating of 2 is equal to Users B’s 4.


This is the strategy to pick if the data is sparse.

I.e. If there is a list with ten thousand artists, it quite likely that the users only listened and rated a few of them.


There is a provided helper class for recommending articles that are similar to another article. The implementation is quite stupid, but it should give you a hint on how to expand this library with your own datasets.


$articleData = new \stojg\recommend\ArticleData();
$allArticles = getFromDatabase();
foreach($allArticles as $article) {
       $articleData->push($article->id, $article->content);
  $recommendedArticle = $articleData->recommend($articleID = 4);