php-science/pagerank

OOP Pagerank implementation with strict type.

1.0.1 2020-09-20 13:56 UTC

This package is auto-updated.

Last update: 2021-07-24 20:13:19 UTC


README

badge.svg 68747470733a2f2f636f6465636f762e696f2f67682f5048502d536369656e63652f5061676552616e6b2f6272616e63682f6d61737465722f67726170682f62616467652e737667 68747470733a2f2f6170692e636f6465636c696d6174652e636f6d2f76312f6261646765732f34386131306462323634366164346365383930632f6d61696e7461696e6162696c697479 68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f5048502d536369656e63652f5061676552616e6b2f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572 68747470733a2f2f706f7365722e707567782e6f72672f7068702d736369656e63652f7061676572616e6b2f762f737461626c652e737667

This source code is an OOP implementation of the PageRank algorithm.

About

This implementation is based on Larry Page's PageRank algorithm. It weights the connections between the nodes in a graph. In theory, the nodes can be websites, words, people, etc. As the number of the nodes are increasing the algorithm is becoming slower. To scale the size and handle millions of nodes, the accuracy of the calculation can be limited, and the long-running calculation can be scheduled in batches using the Strategy OOP pattern in the implementation.

Workflow

  • It calculates the initial ranking values. At the first iteration, all the nodes have the same rank.
  • Iterates the nodes using the power method/iteration technique over and over again until it reaches the maximum iteration number.
  • However, the iteration stops when the ranks are accurate enough even if the max iteration didn't reach its limit.
  • The accuracy measured by the float epsilon constant.
  • At the end, the algorithm normalizes the ranks between 0 and 1 and then scale them between 1 and 10. The scaling range is configurable.
  • Getting, setting, updating the nodes from the resource is a responsibility of the NodeDataSourceStrategyInterface.
  • The package provides a simple implementation of the NodeDataSourceStrategyInterface that only keeps the nodes in the memory. Another way of implementing the NodeDataSourceStrategyInterface could be a simple class that uses an ORM to handle the node collection.

Install

composer require php-science/pagerank

Example

$dataSource = $this->getYourDataSource();

$nodeBuilder = new NodeBuilder();
$nodeCollectionBuilder = new NodeCollectionBuilder();
$strategy = new MemorySourceStrategy(
    $nodeBuilder,
    $nodeCollectionBuilder,
    $dataSource
);

$rankComparator = new RankComparator();
$ranking = new Ranking(
    $rankComparator,
    $strategy
);

$normalizer = new Normalizer();

$pageRankAlgorithm = new PageRankAlgorithm(
    $ranking,
    $strategy,
    $normalizer
);

$maxIteration = 100;
$nodeCollection = $pageRankAlgorithm->run($maxIteration);

var_dump($nodeCollection->getNodes());

Functional Sample