cylab-be / wowa-training
Requires
- php: >=7.4
- cylab-be/php-roc: ^1.1.0
- psr/log: ^1.0
- webd/aggregation: *
Requires (Dev)
- monolog/monolog: ^1.23
- phpstan/phpstan: ^0.12.5
- phpunit/phpunit: ^7
- squizlabs/php_codesniffer: ^3.3
This package is auto-updated.
Last update: 2024-10-27 19:43:03 UTC
README
The WOWA operator (Torra) is a powerful aggregation operator that combines multiple input values into a single score. This is particularly interesting for detection and ranking systems that rely on multiple heuristics: such a system can use WOWA to produce a single meaningful score.
A PHP implementation of WOWA is available at https://github.com/tdebatty/php-aggregation-operators
The WOWA operator requires two sets of parameters: the p weights and the w weights. This project uses a genetic algorithm to search for the p and w weights that best fit a training dataset. The dataset consists of input vectors together with the expected aggregated score of each vector.
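To make the role of the two weight vectors more concrete, here is a minimal, self-contained sketch of a WOWA computation. It is not the code of php-aggregation-operators: the function names `interpolateW` and `wowa` are hypothetical, and the sketch assumes a piecewise-linear interpolation of the cumulative w weights, whereas the real implementation may interpolate differently.

```php
<?php
// Illustrative sketch only, not the implementation used by this package.

/**
 * Piecewise-linear interpolation through the points (i/n, w_1 + ... + w_i),
 * starting at (0, 0). Assumes the w weights are non-negative and sum to 1.
 */
function interpolateW(array $w, float $x): float
{
    $n = count($w);
    $x = max(0.0, min(1.0, $x));
    $position = $x * $n;             // position on the 0..n axis
    $full = (int) floor($position);  // number of fully covered w weights
    $sum = array_sum(array_slice($w, 0, $full));
    if ($full < $n) {
        $sum += $w[$full] * ($position - $full); // partially covered weight
    }
    return $sum;
}

/**
 * WOWA aggregation: values are visited in decreasing order, and each one
 * receives the increase of the interpolated function over its p weight.
 */
function wowa(array $w, array $p, array $values): float
{
    $order = array_keys($values);
    usort($order, fn($a, $b) => $values[$b] <=> $values[$a]);

    $result = 0.0;
    $cumulative = 0.0; // sum of the p weights seen so far
    $previous = 0.0;   // interpolated value at the previous step
    foreach ($order as $index) {
        $cumulative += $p[$index];
        $current = interpolateW($w, $cumulative);
        $result += ($current - $previous) * $values[$index];
        $previous = $current;
    }
    return $result;
}

$values = [0.1, 0.8, 0.3, 0.4];
$uniform = [0.25, 0.25, 0.25, 0.25];

// Uniform p: WOWA behaves like OWA, and w = [1, 0, 0, 0] returns the maximum.
echo wowa([1.0, 0.0, 0.0, 0.0], $uniform, $values), "\n"; // 0.8

// Uniform w: WOWA behaves like a weighted mean using p (about 0.42 here).
echo wowa($uniform, [0.1, 0.2, 0.3, 0.4], $values), "\n";
```

In this sketch the p weights express the importance of each input (each heuristic), while the w weights express the importance of each position once the values are sorted.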
Installation
composer require cylab-be/wowa-training
Usage
Example
require __DIR__ . "/vendor/autoload.php";
use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::DEBUG));
$parameters = new TrainerParameters(
$logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters);
// Input data
$data = [
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
];
// expected aggregated value for each data vector
$expected = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];
var_dump($trainer->run($data, $expected));
The example above will produce something like:
class RUCD\Training\Solution#56 (3) {
public $weights_w =>
array(4) {
[0] =>
double(0.31568310640557)
[1] =>
double(0.37517587135019)
[2] =>
double(0.23165073663557)
[3] =>
double(0.077490285608666)
}
public $weights_p =>
array(4) {
[0] =>
double(0.67852325915809)
[1] =>
double(0.0083157109614166)
[2] =>
double(0.082353710617992)
[3] =>
double(0.2308073192625)
}
public $distance =>
double(0.51636277259465)
}
The run method returns a solution object that holds the p weights and w weights to use with the WOWA operator, together with the total distance between the expected aggregated values passed as argument and the aggregated values computed by WOWA with these weights.
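As an illustration of what such a distance measures, the sketch below compares a list of computed scores with the expected ones. The helper `totalDistance` is hypothetical, and the use of the Euclidean distance is an assumption made for this example: the criterion actually implemented by the package may be defined differently.

```php
<?php
// Hypothetical helper, not part of cylab-be/wowa-training.
// Assumption: the distance is Euclidean; the package may use another formula.

/**
 * Euclidean distance between the computed and the expected scores.
 */
function totalDistance(array $computed, array $expected): float
{
    if (count($computed) !== count($expected)) {
        throw new InvalidArgumentException('Both arrays must have the same size');
    }

    $sum = 0.0;
    foreach ($computed as $i => $value) {
        $sum += ($value - $expected[$i]) ** 2; // squared error of one vector
    }
    return sqrt($sum);
}

// Scores that a set of weights might produce, compared with the expected ones.
$computed = [0.25, 0.40, 0.35];
$expected = [0.10, 0.20, 0.30];
echo totalDistance($computed, $expected), "\n";
```

The smaller the distance, the better the weights reproduce the expected scores on the training data.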
Parameters description
- populationSize : size of the population in the algorithm. Suggested value : 100
- crossoverRate : percentage of the population generated by crossover. Must be between 1 and 100. Suggested value : 60
- mutationRate : defines the amount of random changes (mutations) applied to elements of the population. Must be between 1 and 100. Suggested value : 15
- selectionMethod : method used to select the elements of the population that generate the next generation. Use SELECTION_METHOD_RWS for Roulette Wheel Selection or SELECTION_METHOD_TOS for Tournament Selection.
- maxGeneration : maximum number of iterations of the algorithm.
- populationInitializationMethod : method used to generate the initial population. Use INITIAL_POPULATION_GENERATION_RANDOM for a random generation or INITIAL_POPULATION_GENERATION_QUASI_RANDOM for a "quasi"-random generation.
- solutionType : class of the Solution object. Solution objects must extend the SolutionAbstract class. The class of the Solution object defines the criterion used to evaluate the performance of the individuals in the population. SolutionDistance implements a distance criterion, while SolutionAUC uses a criterion based on the Area Under the Curve (AUC). Note that SolutionAUC is designed for binary classification: the expected vector can only contain 0 or 1 values.
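The two selection methods named above can be sketched as follows. These helpers are hypothetical and are not the package's implementation. They assume that a higher fitness is better, so a distance criterion would first have to be converted, for example with 1 / (1 + distance), which is also an assumption. The random inputs are passed as arguments to keep the functions deterministic.

```php
<?php
// Hypothetical helpers, not part of cylab-be/wowa-training.

/**
 * Roulette Wheel Selection: index i is chosen with probability
 * fitness[i] / sum(fitness). $r is a random number in [0, 1).
 */
function rouletteWheelSelect(array $fitness, float $r): int
{
    $threshold = $r * array_sum($fitness);
    $cumulative = 0.0;
    foreach ($fitness as $index => $value) {
        $cumulative += $value;
        if ($threshold < $cumulative) {
            return $index;
        }
    }
    return array_key_last($fitness); // guard against rounding errors
}

/**
 * Tournament Selection: among the randomly drawn candidate indices,
 * keep the one with the highest fitness.
 */
function tournamentSelect(array $fitness, array $candidates): int
{
    $best = $candidates[0];
    foreach ($candidates as $index) {
        if ($fitness[$index] > $fitness[$best]) {
            $best = $index;
        }
    }
    return $best;
}

$fitness = [1.0, 3.0, 6.0];

// Deterministic calls: 0.35 * 10 = 3.5 falls in the slice of element 1.
echo rouletteWheelSelect($fitness, 0.35), "\n"; // 1
echo tournamentSelect($fitness, [0, 1]), "\n";  // 1

// Random draw in [0, 1) for an actual selection.
$r = mt_rand() / (mt_getrandmax() + 1);
echo rouletteWheelSelect($fitness, $r), "\n";
```

Roulette wheel selection favours fit elements in proportion to their fitness, while tournament selection only depends on how the drawn candidates rank against each other.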
Cross validation
Example
require __DIR__ . "/vendor/autoload.php";
use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use RUCD\Training\SolutionDistance;
use RUCD\Training\SolutionAUC;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
$solutionType = new SolutionDistance();
// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::WARNING));
$parameters = new TrainerParameters(
$logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters, $solutionType);
// Input data
$data = [
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
];
// expected aggregated value for each data vector
$expected = [1,0,0,1,0,1,0,0,0,1,0,0];
var_dump($trainer->runKFold($data, $expected, 3));
The runKFold method runs a k-fold cross-validation. Concretely, it splits the dataset into k folds. In each round, a single fold is retained as the validation data for testing the model, and the remaining k − 1 folds are used as training data. The process is repeated k times, so that each of the k folds is used exactly once as the validation data. The k results can then be averaged to produce a single estimate. For each tested fold, the Area Under the Curve (AUC) is also computed to evaluate the classification performance (this only works when the expected vector contains only 0 and 1 values).
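The splitting step can be sketched as follows. The helper `kFoldIndices` is hypothetical, and the round-robin assignment of vectors to folds is an assumption made for the example: the package may shuffle or partition the dataset differently.

```php
<?php
// Hypothetical helper, not part of cylab-be/wowa-training.

/**
 * Split the indices 0..$count-1 into $k folds (round-robin) and return,
 * for each fold, the indices used for training and for validation.
 */
function kFoldIndices(int $count, int $k): array
{
    $folds = [];
    for ($fold = 0; $fold < $k; $fold++) {
        $training = [];
        $validation = [];
        for ($i = 0; $i < $count; $i++) {
            if ($i % $k === $fold) {
                $validation[] = $i; // this fold validates on index $i
            } else {
                $training[] = $i;   // the other k - 1 folds train on it
            }
        }
        $folds[] = ['training' => $training, 'validation' => $validation];
    }
    return $folds;
}

// 12 vectors and 3 folds, as in the example above.
foreach (kFoldIndices(12, 3) as $number => $fold) {
    printf(
        "fold %d: validate on [%s], train on %d vectors\n",
        $number,
        implode(', ', $fold['validation']),
        count($fold['training'])
    );
}
// fold 0: validate on [0, 3, 6, 9], train on 8 vectors
// fold 1: validate on [1, 4, 7, 10], train on 8 vectors
// fold 2: validate on [2, 5, 8, 11], train on 8 vectors
```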
As output, the method generates an array that contains the w and p vectors and the AUC value for each fold.
The runKFold example above produces a result similar to:
array(3) {
[0]=>
array(2) {
["auc"]=>
float(0.5)
["solution"]=>
object(RUCD\Training\SolutionDistance)#133 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.16573697533351)
[1]=>
float(0.76165292950897)
[2]=>
float(0.024253730247718)
[3]=>
float(0.048356364909798)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.20097150002833)
[1]=>
float(0.020364990979043)
[2]=>
float(0.17636230606784)
[3]=>
float(0.60230120292479)
}
["distance"]=>
float(1.7892117370011)
}
}
[1]=>
array(2) {
["auc"]=>
float(0)
["solution"]=>
object(RUCD\Training\SolutionDistance)#146 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.18742088232865)
[1]=>
float(0.57233147854378)
[2]=>
float(0.22507083815429)
[3]=>
float(0.015176800973267)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.076670559592882)
[1]=>
float(0.019193144442706)
[2]=>
float(0.18316950831007)
[3]=>
float(0.72096678765435)
}
["distance"]=>
float(1.3403524893715)
}
}
[2]=>
array(2) {
["auc"]=>
float(1)
["solution"]=>
object(RUCD\Training\SolutionDistance)#12 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.16274887804484)
[1]=>
float(0.527446888854)
[2]=>
float(0.21225455965351)
[3]=>
float(0.097549673447646)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.10891441031576)
[1]=>
float(0.023649196569852)
[2]=>
float(0.24106562811561)
[3]=>
float(0.62637076499877)
}
["distance"]=>
float(2.0314776184856)
}
}
}
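The auc values in this output range from 0 (every negative vector is scored above every positive one) to 1 (perfect ranking), with 0.5 corresponding to a ranking no better than chance. The following sketch shows one common way to compute such a value, by counting correctly ordered (positive, negative) pairs. The function `auc` is hypothetical: the package relies on cylab-be/php-roc, whose computation may be organised differently.

```php
<?php
// Hypothetical helper, not the code of cylab-be/php-roc.

/**
 * AUC as the fraction of (positive, negative) pairs in which the positive
 * example gets the higher score. Ties count for one half.
 */
function auc(array $scores, array $labels): float
{
    $positives = [];
    $negatives = [];
    foreach ($scores as $i => $score) {
        if ($labels[$i] == 1) {
            $positives[] = $score;
        } else {
            $negatives[] = $score;
        }
    }
    if (count($positives) === 0 || count($negatives) === 0) {
        throw new InvalidArgumentException('Both classes must be present');
    }

    $wins = 0.0;
    foreach ($positives as $positive) {
        foreach ($negatives as $negative) {
            if ($positive > $negative) {
                $wins += 1.0;
            } elseif ($positive == $negative) {
                $wins += 0.5;
            }
        }
    }
    return $wins / (count($positives) * count($negatives));
}

// Three of the four (positive, negative) pairs are correctly ordered.
echo auc([0.9, 0.6, 0.4, 0.1], [1, 0, 1, 0]), "\n"; // 0.75
```

With only four validation vectors per fold, as in the example above, the AUC can take only a few distinct values, which explains the extreme results 0, 0.5 and 1.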