jotaelesalinas / php-mapreduce
A local implementation of the map-reduce strategy in PHP
Requires
- php: >=8.0
Requires (Dev)
- phpunit/phpunit: ^9.5
- squizlabs/php_codesniffer: ^3.7
This package is auto-updated.
Last update: 2024-12-12 19:07:41 UTC
README
PHP PSR-4 compliant library to easily do non-distributed local map-reduce.
Install
Via Composer
$ composer require jotaelesalinas/php-mapreduce
Basic usage
require_once __DIR__ . '/vendor/autoload.php'; $source = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; $mapper = fn($item) => $item * 2; $reducer = fn($carry, $item) => ($carry ?? 0) + $item; $result = MapReduce\MapReduce::create() ->setInput($source) ->setMapper($mapper) ->setReducer($reducer) ->run(); print_r($result);
The output is:
Array
(
[0] => 110
)
Filters
$odd_numbers = fn($item) => $item % 2 === 0; $greater_than_10 = fn($item) => $item > 10; $result = MapReduce\MapReduce::create([ "input" => $source, "mapper" => $mapper, "reducer" => $reducer, ]) // only odd numbers are passed to the mapper function ->setPreFilter($odd_numbers) // only numbers greater than 10 are passed to the reducer function ->setPostFilter($greater_than_10) ->run(); print_r($result);
The output is:
Array
(
[0] => 48
)
Groups
Group by the value of a field (valid for arrays and objects):
$source = [ [ "first_name" => "Susanna", "last_name" => "Connor", "member" => "y", "age" => 20], [ "first_name" => "Adrian", "last_name" => "Smith", "member" => "n", "age" => 22], [ "first_name" => "Mike", "last_name" => "Mendoza", "member" => "n", "age" => 24], [ "first_name" => "Linda", "last_name" => "Duguin", "member" => "y", "age" => 26], [ "first_name" => "Bob", "last_name" => "Svenson", "member" => "n", "age" => 28], [ "first_name" => "Nancy", "last_name" => "Potier", "member" => "y", "age" => 30], [ "first_name" => "Pete", "last_name" => "Adams", "member" => "n", "age" => 32], [ "first_name" => "Susana", "last_name" => "Zommers", "member" => "y", "age" => 34], [ "first_name" => "Adrian", "last_name" => "Deville", "member" => "n", "age" => 36], [ "first_name" => "Mike", "last_name" => "Cole", "member" => "n", "age" => 38], [ "first_name" => "Mike", "last_name" => "Angus", "member" => "n", "age" => 40], ]; // mapper does nothing $mapper = fn($x) => $x; // number of persons and sum of ages $reduceAgeSum = function ($carry, $item) { if (is_null($carry)) { return [ 'count' => 1, 'age_sum' => $item['age'], ]; } $count = $carry['count'] + 1; $age_sum = $carry['age_sum'] + $item['age']; return compact('count', 'age_sum'); }; $result = MapReduce\MapReduce::create([ "input" => $source, "mapper" => $mapper, "reducer" => $reduceAgeSum, ]) // group by field 'member' ->setGroupBy('member') ->run(); print_r($result);
The output is:
Array
(
[y] => Array
(
[count] => 4
[age_sum] => 110
)
[n] => Array
(
[count] => 7
[age_sum] => 220
)
)
Group by a custom value generated from each item:
$closestTen = fn($x) => floor($x['age'] / 10) * 10; $result = MapReduce\MapReduce::create([ "input" => $source, "mapper" => $mapper, "reducer" => $reduceAgeSum, ]) // group by age ranges of 10 ->setGroupBy($closestTen) ->run(); print_r($result);
The output is:
Array
(
[20] => Array
(
[count] => 5
[age_sum] => 120
)
[30] => Array
(
[count] => 5
[age_sum] => 170
)
[40] => Array
(
[count] => 1
[age_sum] => 40
)
)
Input
MapReduce
accepts as input any data of type iterable
. That means, arrays and traversables, e.g. generators.
This is very handy when reading from big files that do not fit in memory.
$result = MapReduce\MapReduce::create([ "mapper" => $mapper, "reducer" => $reducer, ]) ->setInput(csvReadGenerator('myfile.csv')) ->run();
Multiple inputs can be specified, passing several arguments to setInput()
, as long as all of them are iterable:
$result = MapReduce\MapReduce::create([ "mapper" => $mapper, "reducer" => $reducer, ]) ->setInput($arrayData, csvReadGenerator('myfile.csv')) ->run();
Output
MapReduce
can be configured to write the final data to one or more destinations.
Each destination has to be a Generator
:
$result = MapReduce\MapReduce::create([ "mapper" => $mapper, "reducer" => $reducer, ]) ->setOutput(csvWriteGenerator('results.csv')) ->run();
Multiple outputs can be specified as well:
$result = MapReduce\MapReduce::create([ "mapper" => $mapper, "reducer" => $reducer, ]) ->setOutput(csvWriteGenerator('results.csv'), consoleGenerator()) ->run();
To help working with input and output generators, it is recommended to use the package jotaelesalinas/php-generators
, but it is not mandatory.
You can see more elaborated examples under the folder examples.
Change log
Please see CHANGELOG for more information what has changed recently.
Testing
$ composer test
Contributing
Please see CONTRIBUTING and CONDUCT for details.
Security
If you discover any security related issues, please DM me to @jotaelesalinas instead of using the issue tracker.
To do
- Add events to help see progress in large batches
- Add docs
- Insurance example
- adapt to new library
- add insured values
- improve kml output (info, markers)
Credits
License
The MIT License (MIT). Please see License File for more information.