php-science/pagerank

OOP PageRank implementation with strict types.

1.0.1 2020-09-20 13:56 UTC

This package is auto-updated.

Last update: 2024-07-25 03:31:41 UTC


README

PageRank

This source code is an OOP implementation of the PageRank algorithm.

About

This implementation is based on Larry Page's PageRank algorithm. It weights the connections between the nodes in a graph. In theory, the nodes can be websites, words, people, etc. As the number of nodes increases, the algorithm becomes slower. To scale up and handle millions of nodes, the accuracy of the calculation can be limited, and the long-running calculation can be scheduled in batches; the implementation uses the Strategy OOP pattern for this.
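To illustrate the batching idea, here is a minimal sketch of the Strategy pattern applied to rank storage. Everything in it (RankStore, InMemoryRankStore, processInBatches, and their method names) is hypothetical and invented for this example; it is not the package's NodeDataSourceStrategyInterface, whose actual methods may look quite different. The sketch assumes PHP 7.4 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical interface for illustration only. It is NOT the package's
// NodeDataSourceStrategyInterface; the real method names may differ.
interface RankStore
{
    /** @return array<string, float> ranks keyed by node id for one batch */
    public function loadBatch(int $offset, int $limit): array;

    /** @param array<string, float> $ranks */
    public function saveBatch(array $ranks): void;

    public function count(): int;
}

// Simplest possible strategy: keep every rank in a PHP array.
final class InMemoryRankStore implements RankStore
{
    /** @var array<string, float> */
    private array $ranks;

    public function __construct(array $ranks)
    {
        $this->ranks = $ranks;
    }

    public function loadBatch(int $offset, int $limit): array
    {
        // preserve_keys = true keeps the node ids as array keys.
        return array_slice($this->ranks, $offset, $limit, true);
    }

    public function saveBatch(array $ranks): void
    {
        foreach ($ranks as $id => $rank) {
            $this->ranks[$id] = $rank;
        }
    }

    public function count(): int
    {
        return count($this->ranks);
    }
}

// The caller depends only on the interface, so an ORM-backed or
// file-backed store could be swapped in without touching this loop.
function processInBatches(RankStore $store, callable $update, int $batchSize): void
{
    for ($offset = 0; $offset < $store->count(); $offset += $batchSize) {
        $batch = $store->loadBatch($offset, $batchSize);
        $store->saveBatch(array_map($update, $batch));
    }
}
```

Because the loop only ever holds one batch at a time, a store backed by a database could process a large node set in bounded memory, and each batch could in principle be scheduled as a separate job.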

Workflow

  • It calculates the initial ranking values. In the first iteration, all the nodes have the same rank.
  • It iterates over the nodes using the power method (power iteration) until it reaches the maximum iteration number.
  • However, the iteration stops early when the ranks are accurate enough, even if the maximum iteration number has not been reached.
  • The accuracy is measured by the float epsilon constant.
  • At the end, the algorithm normalizes the ranks between 0 and 1 and then scales them between 1 and 10. The scaling range is configurable.
  • Getting, setting, and updating the nodes from the resource is the responsibility of the NodeDataSourceStrategyInterface.
  • The package provides a simple implementation of the NodeDataSourceStrategyInterface that only keeps the nodes in the memory. Another way of implementing the NodeDataSourceStrategyInterface could be a simple class that uses an ORM to handle the node collection.
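The workflow above can be sketched as a single standalone function. This is an illustration under stated assumptions, not the package's code: the function name pageRankSketch, the adjacency-array input format, the damping factor of 0.85, and the handling of nodes without outgoing links are all choices made for this example. It also assumes that "float epsilon" refers to PHP's PHP_FLOAT_EPSILON constant, and that every link target also appears as a key of the input array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not part of php-science/pagerank.
 *
 * @param array<string, string[]> $links node id => ids of the nodes it links to
 * @return array<string, float> ranks scaled into [$min, $max]
 */
function pageRankSketch(
    array $links,
    int $maxIteration = 100,
    float $damping = 0.85, // assumption: the README does not name a damping factor
    float $min = 1.0,
    float $max = 10.0
): array {
    $ids = array_keys($links);
    $n = count($ids);
    if ($n === 0) {
        return [];
    }

    // Initial ranking: every node starts with the same rank.
    $ranks = array_fill_keys($ids, 1.0 / $n);

    // Power iteration, capped at $maxIteration.
    for ($i = 0; $i < $maxIteration; $i++) {
        $next = array_fill_keys($ids, (1.0 - $damping) / $n);
        foreach ($links as $from => $targets) {
            if ($targets === []) {
                // Assumption: a node without outgoing links spreads
                // its rank evenly over all nodes.
                foreach ($ids as $id) {
                    $next[$id] += $damping * $ranks[$from] / $n;
                }
                continue;
            }
            $share = $damping * $ranks[$from] / count($targets);
            foreach ($targets as $to) {
                $next[$to] += $share;
            }
        }

        // Early stop: no rank moved by more than the float epsilon.
        $delta = 0.0;
        foreach ($ids as $id) {
            $delta = max($delta, abs($next[$id] - $ranks[$id]));
        }
        $ranks = $next;
        if ($delta <= PHP_FLOAT_EPSILON) {
            break;
        }
    }

    // Min-max normalize into [0, 1], then scale into [$min, $max].
    $lo = min($ranks);
    $span = max($ranks) - $lo;
    foreach ($ranks as $id => $rank) {
        $unit = $span > 0.0 ? ($rank - $lo) / $span : 0.0;
        $ranks[$id] = $min + $unit * ($max - $min);
    }

    return $ranks;
}
```

For example, calling pageRankSketch(['a' => ['b'], 'b' => ['a'], 'c' => ['a']]) gives node a the top score of 10.0 and node c, which nothing links to, the bottom score of 1.0, with b in between.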

Install

composer require php-science/pagerank

Example

// Placeholder: supply your own node data here.
$dataSource = $this->getYourDataSource();

// In-memory implementation of NodeDataSourceStrategyInterface.
$nodeBuilder = new NodeBuilder();
$nodeCollectionBuilder = new NodeCollectionBuilder();
$strategy = new MemorySourceStrategy(
    $nodeBuilder,
    $nodeCollectionBuilder,
    $dataSource
);

$rankComparator = new RankComparator();
$ranking = new Ranking(
    $rankComparator,
    $strategy
);

$normalizer = new Normalizer();

$pageRankAlgorithm = new PageRankAlgorithm(
    $ranking,
    $strategy,
    $normalizer
);

// Upper bound on iterations; the run stops earlier once the ranks are accurate enough.
$maxIteration = 100;
$nodeCollection = $pageRankAlgorithm->run($maxIteration);

var_dump($nodeCollection->getNodes());

Functional Sample