scienide/okbloomer

An autoscaling Bloom filter with ultra-low memory usage for PHP.

1.0.0 2021-12-08 21:44 UTC

Last update: 2022-01-08 22:01:21 UTC


An autoscaling Bloom filter with an ultra-low memory footprint for PHP. Ok Bloomer employs a layered filtering strategy that allows it to expand while maintaining an upper bound on the false positive rate. Each layer consists of a bitmap that remembers the hash signatures of the items inserted so far. If an item gets caught in the filter, then it has probably been seen before. However, if an item passes through the filter, then it has definitely never been seen before. Bloom filters are used in caching systems, stream deduplication, DNA sequence counting, and many other applications.

  • Ultra-low memory footprint
  • Autoscaling works on streaming data
  • Bounded maximum false positive rate
  • Open-source and free to use commercially
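
The layered strategy described above can be illustrated with a toy filter. This is a minimal sketch and not the Ok Bloomer implementation: the class name ToyLayeredFilter, the use of crc32() as the hash, the sparse array standing in for a bitmap, and the simple item-count rule for adding a layer are all assumptions made for illustration. It also does not split each layer into slices as the real package does.

```php
<?php

declare(strict_types=1);

/** Toy layered Bloom filter (hypothetical, for illustration only). */
final class ToyLayeredFilter
{
    /** @var array<int, array<int, bool>> One sparse bitmap per layer. */
    private array $layers = [[]];

    /** Number of items inserted into the newest layer. */
    private int $count = 0;

    private int $layerSize;
    private int $numHashes;
    private int $capacity;

    public function __construct(int $layerSize = 1024, int $numHashes = 4, int $capacity = 100)
    {
        $this->layerSize = $layerSize; // bits per layer
        $this->numHashes = $numHashes; // hash functions per item
        $this->capacity = $capacity;   // assumed rule: items per layer before a new one is added
    }

    /** @return int[] Bit offsets for an item (assumes 64-bit PHP, where crc32() is non-negative). */
    private function offsets(string $item): array
    {
        $offsets = [];
        for ($i = 0; $i < $this->numHashes; ++$i) {
            $offsets[] = crc32($i . ':' . $item) % $this->layerSize;
        }
        return $offsets;
    }

    public function exists(string $item): bool
    {
        $offsets = $this->offsets($item);
        foreach ($this->layers as $layer) {
            $hit = true;
            foreach ($offsets as $offset) {
                if (!isset($layer[$offset])) {
                    $hit = false;
                    break;
                }
            }
            if ($hit) {
                return true; // caught by a layer: probably seen before
            }
        }
        return false; // passed every layer: definitely never seen
    }

    public function insert(string $item): void
    {
        if ($this->count >= $this->capacity) {
            $this->layers[] = []; // newest layer is full, so add a fresh one
            $this->count = 0;
        }
        $last = count($this->layers) - 1;
        foreach ($this->offsets($item) as $offset) {
            $this->layers[$last][$offset] = true;
        }
        ++$this->count;
    }

    public function numLayers(): int
    {
        return count($this->layers);
    }
}

$filter = new ToyLayeredFilter();
$filter->insert('foo');
var_dump($filter->exists('foo')); // bool(true): inserted items are always found
```

Because a lookup consults every layer, items inserted before the filter grew are still found afterwards. Only the "probably seen" answer can be wrong.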

Installation

Install into your project using Composer:

$ composer require scienide/okbloomer

Requirements

  • PHP 7.4 or above

Bloom Filter

A probabilistic data structure that estimates the prior occurrence of a given item with a maximum false positive rate.

Parameters

| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | maxFalsePositiveRate | 0.01 | float | The false positive rate to remain below. |
| 2 | numHashes | 4 | int, null | The number of hash functions used, i.e. the number of slices per layer. Set to null for auto. |
| 3 | layerSize | 32000000 | int | The size of each layer of the filter in bits. |
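
To get a feel for what the defaults imply, the arithmetic can be sketched as follows. The helper names layerBytes and estimateFalsePositiveRate are hypothetical, and the formula is the standard textbook approximation for a single layer split into equal slices. It is not necessarily how Ok Bloomer estimates its own rate internally.

```php
<?php

declare(strict_types=1);

/** Bytes needed to hold one layer's bitmap of $layerSize bits. */
function layerBytes(int $layerSize): int
{
    return intdiv($layerSize + 7, 8);
}

/**
 * Textbook estimate of the false positive rate of one layer that is split
 * into $numHashes equal slices, after $numItems distinct insertions.
 */
function estimateFalsePositiveRate(int $numItems, int $layerSize, int $numHashes): float
{
    $sliceSize = intdiv($layerSize, $numHashes);

    // Expected fraction of bits set in each slice.
    $fill = 1.0 - exp(-$numItems / $sliceSize);

    // A false positive needs a set bit in every slice.
    return $fill ** $numHashes;
}

printf("%d bytes per layer\n", layerBytes(32000000)); // 4000000 bytes, about 4 MB
printf("%.6f\n", estimateFalsePositiveRate(1000000, 32000000, 4));
```

Under these assumptions, a default layer holding one million items has an estimated false positive rate of roughly 0.0002, well under the default bound of 0.01. The estimate rises as more items are added to the layer.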

Example

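use OkBloomer\BloomFilter;

// Arguments: maximum false positive rate, number of hash functions, layer size in bits.
$filter = new BloomFilter(0.01, 4, 32000000);

$filter->insert('foo');

// var_dump() is used because echo renders true as "1" and false as an empty string.
var_dump($filter->exists('foo'));

// Reports whether the item was already present, inserting it if it was not.
var_dump($filter->existsOrInsert('bar'));

var_dump($filter->exists('bar'));

Output:

bool(true)
bool(false)
bool(true)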
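
The existsOrInsert() call shown above maps naturally onto stream deduplication, one of the uses mentioned earlier. The following is a minimal sketch: the dedupe() function and the ExactSetFilter class are hypothetical names introduced here, and ExactSetFilter is an exact stand-in included only so that the example runs without the package installed.

```php
<?php

declare(strict_types=1);

/**
 * Yield only the items the filter has not seen before.
 *
 * @param iterable<string> $items
 * @param object $filter Any object exposing existsOrInsert(string): bool
 */
function dedupe(iterable $items, object $filter): Generator
{
    foreach ($items as $item) {
        // False means the item was new and has just been recorded.
        if (!$filter->existsOrInsert($item)) {
            yield $item;
        }
    }
}

/** Hypothetical exact stand-in for a Bloom filter (no false positives). */
final class ExactSetFilter
{
    /** @var array<string, bool> */
    private array $seen = [];

    public function existsOrInsert(string $item): bool
    {
        if (isset($this->seen[$item])) {
            return true;
        }
        $this->seen[$item] = true;
        return false;
    }
}

$stream = ['a', 'b', 'a', 'c', 'b'];
$unique = iterator_to_array(dedupe($stream, new ExactSetFilter()), false);

print_r($unique); // the values a, b, c in first-seen order
```

With the real package, an OkBloomer\BloomFilter instance would be passed in place of the stand-in. In that case a false positive occasionally drops an item that was never seen, at a rate bounded by maxFalsePositiveRate.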

Testing

To run the unit tests:

$ composer test

Static Analysis

To run static code analysis:

$ composer analyze

Benchmarks

To run the benchmarks:

$ composer benchmark

References

  • [1] P. S. Almeida et al. (2007). Scalable Bloom Filters.