richjenks / stats
Statistics library for non-statistical people
Requires
- php: >=7.2.0
This package is auto-updated.
Last update: 2024-10-29 05:47:23 UTC
Introduction
If you're into statistics, PHP will not be your language of choice (try R instead). But if for any reason you, a non-statistician, need to do some statistics, this library aims to provide a simple set of methods for common statistical functions.
By design, functions generally accept a single series of data at a time (statistical tests are the exception). This keeps the library simple to use.
Many of the methods in this library are also available from the Statistics Extension; however, that extension is not included in PHP by default. If you can, I'd recommend using the extension rather than my stats library.
Installation
- Install with Composer:
composer require richjenks/stats
- Include autoloader:
require 'vendor/autoload.php';
- All static methods are available from the RichJenks\Stats\Stats class
Quickstart
<?php
require 'vendor/autoload.php';
use RichJenks\Stats\Stats;
echo Stats::mean([1, 2, 3]); // 2
Stats will generally return either a float or an array, whichever is most appropriate for the function.
Usage
Mean/Average
Calculates the mean/average of given data:
Stats::mean([1, 2, 3]); // 2
Stats::mean([15, 1000, 68.5, 9]); // 273.125
The average function aliases mean, e.g. Stats::average([1, 2, 3]); also returns 2.
Median
Calculates the median (middle value) of given data:
Stats::median([1, 2, 3, 4]); // 2.5
Stats::median([3.141, 1.618, 1.234]); // 1.618
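As a rough illustration of what "middle value" means, here is a minimal plain-PHP sketch. It assumes the usual definition: sort the data, then take the middle value, or the mean of the two middle values when the count is even. The function median_sketch is hypothetical and is not part of the library.

```php
<?php
// Hypothetical helper illustrating the median; not the library's implementation.
function median_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('The median of an empty series is undefined');
    }
    sort($data);                    // order values ascending
    $mid = intdiv($n, 2);
    if ($n % 2 === 1) {
        return (float) $data[$mid]; // odd count: the single middle value
    }
    // even count: mean of the two middle values
    return ($data[$mid - 1] + $data[$mid]) / 2;
}

echo median_sketch([1, 2, 3, 4]), PHP_EOL;          // 2.5
echo median_sketch([3.141, 1.618, 1.234]), PHP_EOL; // 1.618
```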
Mode
Calculates the mode(s) — most common value(s) — of given data:
Stats::mode([1, 2, 2, 3]); // [2]
Stats::mode([1, 2, 2, 3, 3]); // [2, 3]
This function always returns an array because it can handle multi-modal data; an empty array means there is no mode.
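The following sketch shows how a multi-modal result can arise: count each value, then keep every value that shares the highest count. It assumes integer or string values, because PHP's array_count_values() only counts those. The function name mode_sketch is hypothetical. This sketch returns an empty array only for empty input; the rule the library uses to decide that data has no mode is not shown here.

```php
<?php
// Hypothetical sketch of a multi-modal mode; not the library's implementation.
// Assumes integer or string values (array_count_values() skips other types).
function mode_sketch(array $data): array
{
    if ($data === []) {
        return [];
    }
    $counts = array_count_values($data); // value => number of occurrences
    $max = max($counts);                 // highest occurrence count
    // every value that reaches the highest count is a mode
    return array_keys($counts, $max, true);
}

echo json_encode(mode_sketch([1, 2, 2, 3])), PHP_EOL;    // [2]
echo json_encode(mode_sketch([1, 2, 2, 3, 3])), PHP_EOL; // [2,3]
```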
Frequencies
Constructs an array of frequencies for each value in a series, sorted from most to least frequent:
Stats::frequencies([1, 2, 3]); // [1 => 1, 2 => 1, 3 => 1]
Stats::frequencies([10, 20, 20]); // [20 => 2, 10 => 1]
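A minimal sketch of the same idea in plain PHP: count occurrences, then sort by count in descending order while keeping each value as the key. Like any approach built on array_count_values(), it assumes integer or string values. The function name frequencies_sketch is hypothetical. On PHP 8.0 and later, sorting is stable, so tied counts keep the order in which values first appeared.

```php
<?php
// Hypothetical sketch: count occurrences, then sort by count (highest first).
// Assumes integer or string values (a limitation of array_count_values()).
function frequencies_sketch(array $data): array
{
    $counts = array_count_values($data); // value => occurrences, in first-seen order
    arsort($counts);                     // sort by count descending, keeping the keys
    return $counts;
}

echo json_encode(frequencies_sketch([10, 20, 20])), PHP_EOL; // {"20":2,"10":1}
```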
Range
Determines the range (highest minus lowest) of given data:
Stats::range([1, 9]); // 8
Stats::range([-41, 1.61803]); // 42.61803
Variance & Standard Deviation
These functions calculate:
- Variance: the average of the squared deviations from the mean
- Standard Deviation: the square root of the variance, i.e. roughly how far values typically lie from the mean
$data = [1, 2, 3, 4, 5];
Stats::variance($data); // 2.5
Stats::sd($data); // 1.5811388301
Individual Deviations
The deviations function is also available if you need the deviation of each individual value. It returns each value's squared deviation from the mean, keyed by value, for example:
Stats::deviations([1, 2, 3, 4, 5]); // [1 => 4, 2 => 1, 3 => 0, 4 => 1, 5 => 4]
Stats::deviations([42, 75, 101, 22.5, 18]); // [42 => 94.09, 75 => 542.89, 101 => 2430.49, 22.5 => 852.64, 18 => 1135.69]
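The example outputs match squaring each value's distance from the mean. For instance, the second series has a mean of 51.7, and (42 - 51.7)² = 94.09. Here is a minimal sketch under that reading. The function name deviations_sketch is hypothetical. It uses string keys because PHP truncates float array keys to integers, which would turn 22.5 into 22. Values that appear more than once share a single key.

```php
<?php
// Hypothetical sketch reproducing the documented output: each value's
// squared deviation from the mean, keyed by the value itself.
function deviations_sketch(array $data): array
{
    $mean = array_sum($data) / count($data);
    $result = [];
    foreach ($data as $value) {
        // A string key keeps fractional values such as 22.5 intact;
        // PHP would truncate a float key to an integer.
        $result[(string) $value] = ($value - $mean) ** 2;
    }
    return $result;
}

echo json_encode(deviations_sketch([1, 2, 3, 4, 5])), PHP_EOL; // {"1":4,"2":1,"3":0,"4":1,"5":4}
```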
Sample or Population
Sample is the default mode for Variance and Standard Deviation. If you're unsure what effect this choice has on your data, you probably don't need to change it and can skip this section.
Definitions
- Population: every applicable subject, e.g. people who wear glasses or non-extinct species of frog
- Sample: the subset of subjects for which data is available, e.g. 100 glasses-wearing subjects or a dozen species of frog
You can optionally pass the constant Stats::SAMPLE or Stats::POPULATION as the second argument to specify whether your data is a sample or a whole population:
$data = [1, 2, 3, 4, 5];
Stats::variance($data, Stats::POPULATION); // 2
Stats::sd($data, Stats::POPULATION); // 1.4142135624
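The only difference between the two modes is the divisor. For a sample, the sum of squared deviations is divided by n - 1 (Bessel's correction). For a population, it is divided by n. For [1, 2, 3, 4, 5] that sum is 10, which gives 10 / 4 = 2.5 and 10 / 5 = 2, matching the examples above. A minimal sketch follows. The function names and the SKETCH_* constants are hypothetical stand-ins for the library's own API.

```php
<?php
// Hypothetical sketch of variance and standard deviation.
// Sample mode divides by n - 1 (Bessel's correction); population mode divides by n.
const SKETCH_SAMPLE = 'sample';
const SKETCH_POPULATION = 'population';

function variance_sketch(array $data, string $mode = SKETCH_SAMPLE): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $value) {
        $sumSquares += ($value - $mean) ** 2; // squared deviation from the mean
    }
    $divisor = ($mode === SKETCH_POPULATION) ? $n : $n - 1;
    return $sumSquares / $divisor;
}

function sd_sketch(array $data, string $mode = SKETCH_SAMPLE): float
{
    return sqrt(variance_sketch($data, $mode)); // SD is the square root of the variance
}

$data = [1, 2, 3, 4, 5];
echo variance_sketch($data), PHP_EOL;                    // 2.5
echo variance_sketch($data, SKETCH_POPULATION), PHP_EOL; // 2
```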
Standard Error of the Mean
Estimates how well the sample mean approximates the population mean:
Stats::sem([1, 2, 3, 4, 5]); // 0.70710678118655
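The documented value equals the sample standard deviation divided by the square root of the number of values: sqrt(2.5) / sqrt(5) = sqrt(0.5) ≈ 0.7071. Here is a minimal sketch of that textbook formula. The function name sem_sketch is hypothetical.

```php
<?php
// Hypothetical sketch: SEM = sample standard deviation / sqrt(n).
function sem_sketch(array $data): float
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $value) {
        $sumSquares += ($value - $mean) ** 2;
    }
    $sampleSd = sqrt($sumSquares / ($n - 1)); // sample (n - 1) standard deviation
    return $sampleSd / sqrt($n);              // shrinks as more data is collected
}

echo sem_sketch([1, 2, 3, 4, 5]), PHP_EOL;
```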
Quartiles, Interquartile Range & Outliers
These functions calculate the data required to construct a Box Plot which, when you understand what each data point means, is a concise way of displaying and comparing data sets.
Quartiles
Calculates Quartiles 0 to 4, where:
- 0 is the lowest data point
- 1 is Q1
- 2 is Q2 (the median)
- 3 is Q3
- 4 is the highest data point
Stats::quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]); // [0 => 1, 1 => 3.5, 2 => 6.5, 3 => 9.5, 4 => 12]
Stats::quartiles([839, 560, 607, 828, 875, 805, 646, 450, 930, 443]); // [0 => 443, 1 => 560, 2 => 725.5, 3 => 839, 4 => 930]
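Both quartile examples can be reproduced by sorting the data and taking Q1 and Q3 as the medians of the lower and upper halves. In the second example, the lower half [443, 450, 560, 607, 646] has a median of 560. Both examples have an even count. For an odd count, this sketch assumes the middle value is left out of both halves; that rule is an assumption, not something the examples confirm. The function names quartiles_sketch and median_of are hypothetical.

```php
<?php
// Hypothetical sketch: quartiles as medians of the lower and upper halves.
// Assumption: with an odd count, the middle value is excluded from both halves.
// Requires at least two values so that each half is non-empty.
function median_of(array $sorted): float
{
    $n = count($sorted);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

function quartiles_sketch(array $data): array
{
    sort($data);
    $n = count($data);
    $half = intdiv($n, 2);
    $lower = array_slice($data, 0, $half);   // values below the median
    $upper = array_slice($data, $n - $half); // values above the median
    return [
        0 => $data[0],           // lowest data point
        1 => median_of($lower),  // Q1
        2 => median_of($data),   // Q2, the median
        3 => median_of($upper),  // Q3
        4 => $data[$n - 1],      // highest data point
    ];
}

echo implode(', ', quartiles_sketch([839, 560, 607, 828, 875, 805, 646, 450, 930, 443])), PHP_EOL;
// 443, 560, 725.5, 839, 930
```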
Interquartile Range
Calculates the range between Q1 and Q3 (the middle 50% of data):
Stats::iqr([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]); // 6
Stats::iqr([839, 560, 607, 828, 875, 805, 646, 450, 930, 443]); // 279
Outliers
Determines which values in a series are outliers, i.e. values so far from the rest that they are sometimes omitted from the data set (for example, because of experimental error):
Stats::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]); // []
Stats::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 999]); // [999]
Inliers
Determines which values in a series are not outliers, i.e. removes outliers:
Stats::inliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 999]); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Whiskers
Determines the lower and upper limit for identifying outliers:
Stats::whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 999]); // ['lower' => -6, 'upper' => 18]
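The whisker limits in this example fit the common 1.5 * IQR rule (Tukey's fences). Here Q1 = 3 and Q3 = 9, so IQR = 6, which gives 3 - 9 = -6 and 9 + 9 = 18. Outliers fall outside those limits, and inliers are everything else. A minimal sketch under that assumption follows. Quartiles are taken as medians of the lower and upper halves, with the median excluded for an odd count. The names half_median, whiskers_sketch, outliers_sketch and inliers_sketch are hypothetical.

```php
<?php
// Hypothetical sketch of Tukey's fences: values outside
// [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers.
function half_median(array $sorted): float
{
    $n = count($sorted);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

function whiskers_sketch(array $data): array
{
    sort($data);
    $n = count($data);
    $half = intdiv($n, 2);
    $q1 = half_median(array_slice($data, 0, $half));  // lower half (median excluded)
    $q3 = half_median(array_slice($data, $n - $half)); // upper half (median excluded)
    $iqr = $q3 - $q1;
    return ['lower' => $q1 - 1.5 * $iqr, 'upper' => $q3 + 1.5 * $iqr];
}

function outliers_sketch(array $data): array
{
    $w = whiskers_sketch($data);
    // array_values() re-indexes the filtered result from 0
    return array_values(array_filter($data, function ($v) use ($w) {
        return $v < $w['lower'] || $v > $w['upper'];
    }));
}

function inliers_sketch(array $data): array
{
    $w = whiskers_sketch($data);
    return array_values(array_filter($data, function ($v) use ($w) {
        return $v >= $w['lower'] && $v <= $w['upper'];
    }));
}

$data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 999];
echo implode(', ', whiskers_sketch($data)), PHP_EOL; // -6, 18
echo implode(', ', outliers_sketch($data)), PHP_EOL; // 999
```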
Percentiles
All percentile functions accept an optional extra parameter for rounding, which works as follows:
- If omitted, percentages are rounded to the nearest whole number
- If a positive integer, percentages are rounded to that many decimal places
- If a negative integer (e.g. -1), percentages are not rounded
All Percentiles
Determines the percentile of each value:
// Closest Rank
Stats::percentiles([15, 20, 35, 40, 50]); // [15 => 0, 20 => 14, 35 => 57, 40 => 71, 50 => 100]
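The sample output above is reproduced by placing each value linearly between the minimum (0) and the maximum (100). For example, (20 - 15) / (50 - 15) * 100 ≈ 14.29, which rounds to 14. This formula is inferred from the example only and may not match the library's code. The sketch also applies the rounding rule described in the list above. The function name percentiles_sketch is hypothetical, and the sketch assumes at least two distinct values.

```php
<?php
// Hypothetical sketch: a formula that reproduces the sample output above.
// It is inferred from the example and may differ from the library's code.
// $round follows the documented rule: null -> whole numbers,
// non-negative -> that many decimal places, negative -> no rounding.
function percentiles_sketch(array $data, ?int $round = null): array
{
    $min = min($data);
    $max = max($data); // assumes $max > $min
    $result = [];
    foreach ($data as $value) {
        // position of the value between the minimum (0) and maximum (100)
        $pct = ($value - $min) / ($max - $min) * 100;
        if ($round === null) {
            $pct = round($pct);
        } elseif ($round >= 0) {
            $pct = round($pct, $round);
        }
        $result[(string) $value] = $pct; // string key keeps fractional values intact
    }
    return $result;
}

echo implode(', ', percentiles_sketch([15, 20, 35, 40, 50])), PHP_EOL; // 0, 14, 57, 71, 100
```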
Single Percentile
Determines the value closest to the given percentile:
Stats::percentile([15, 20, 35, 40, 50], 75); // ['value' => 40, 'percentile' => 71]
Intra-Percentile
Determines the values that fall in the given percentile, i.e. the lowest x% of all values:
Stats::intrapercentile([15, 20, 35, 40, 50], 60); // [15 => 0, 20 => 14, 35 => 57]
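Both functions can be read as selections over the per-value percentiles. Stats::percentile picks the value whose percentile is nearest the target: 71 is closest to 75, so the result is 40. Stats::intrapercentile keeps the values at or below the limit. The sketch below assumes the min-max percentile formula that reproduces the All Percentiles example, that ties go to the first match, and that "within" means "at or below". None of these details is confirmed by the library source. The names value_percentiles, percentile_sketch and intrapercentile_sketch are hypothetical.

```php
<?php
// Hypothetical sketches built on a min-max percentile formula inferred from
// the examples; they may differ from the library's code.
function value_percentiles(array $data): array
{
    $min = min($data);
    $max = max($data); // assumes at least two distinct values
    $result = [];
    foreach ($data as $value) {
        $result[(string) $value] = round(($value - $min) / ($max - $min) * 100);
    }
    return $result;
}

// Value whose percentile is closest to $target (the first match wins on a tie)
function percentile_sketch(array $data, float $target): array
{
    $best = null;
    foreach (value_percentiles($data) as $value => $pct) {
        if ($best === null || abs($pct - $target) < abs($best['percentile'] - $target)) {
            $best = ['value' => $value, 'percentile' => $pct];
        }
    }
    return $best;
}

// Values whose percentile is at or below $limit, i.e. the lowest $limit%
function intrapercentile_sketch(array $data, float $limit): array
{
    return array_filter(value_percentiles($data), function ($pct) use ($limit) {
        return $pct <= $limit;
    });
}

$p = percentile_sketch([15, 20, 35, 40, 50], 75);
echo $p['value'], ' at percentile ', $p['percentile'], PHP_EOL; // 40 at percentile 71
echo implode(', ', array_keys(intrapercentile_sketch([15, 20, 35, 40, 50], 60))), PHP_EOL; // 15, 20, 35
```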
CLI
CLI usage is supported via the included scli (Stats Command Line Interface) file, which simply expects the name of the required method followed by its arguments:
./scli mean 1 2 3 # 2
./scli inliers 1 2 3 4 5 999 # 1,2,3,4,5
When the result is a set (i.e. an array), it is printed as comma-separated values.
Unit Tests
phpunit --bootstrap Stats.php tests/StatsTest