andrewdalpino/okbloomer

An autoscaling Bloom filter with ultra-low memory usage for PHP.

1.0.0 2022-01-24 03:41 UTC

Last update: 2024-04-05 04:16:00 UTC


An autoscaling Bloom filter with an ultra-low memory footprint for PHP. Ok Bloomer employs a novel layered filtering strategy that allows it to expand while maintaining an upper bound on the false positive rate. Each layer is composed of a bitmap that remembers the hash signatures of the items inserted so far. If an item gets caught in the filter, it has probably been seen before. If an item passes through the filter, it has definitely never been seen before.

  • Ultra-low memory footprint
  • Autoscaling works on streaming data
  • Bounded maximum false positive rate
  • Open-source and free to use commercially
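
To make the "caught" versus "passes through" behaviour concrete, here is a minimal sketch of a single-layer Bloom filter. It is not Ok Bloomer's implementation: the class name ToyBloomFilter, its constructor arguments, and the salted crc32 hashing are assumptions made for illustration, and layering and autoscaling are left out entirely. A PHP array stands in for the bitmap, so the sketch does not share the library's memory characteristics.

```php
<?php

declare(strict_types=1);

/**
 * Toy single-layer Bloom filter (hypothetical, for illustration only).
 * The layer is split into one slice per hash function.
 */
final class ToyBloomFilter
{
    /** @var array<int, bool> Sparse stand-in for a bitmap: offset => true. */
    private array $bits = [];

    private int $numHashes;

    private int $sliceSize;

    public function __construct(int $numHashes = 4, int $size = 1024)
    {
        $this->numHashes = $numHashes;
        $this->sliceSize = intdiv($size, $numHashes);
    }

    public function insert(string $token): void
    {
        foreach ($this->offsets($token) as $offset) {
            $this->bits[$offset] = true;
        }
    }

    public function exists(string $token): bool
    {
        foreach ($this->offsets($token) as $offset) {
            if (!isset($this->bits[$offset])) {
                // A single unset bit proves the token was never inserted.
                return false;
            }
        }

        // Every bit is set: the token has probably been seen before.
        return true;
    }

    /** @return int[] One bit offset per slice. */
    private function offsets(string $token): array
    {
        $offsets = [];

        for ($slice = 0; $slice < $this->numHashes; ++$slice) {
            // Salt the token with the slice index; mask to stay non-negative.
            $hash = crc32("{$slice}:{$token}") & 0x7FFFFFFF;

            $offsets[] = $slice * $this->sliceSize + $hash % $this->sliceSize;
        }

        return $offsets;
    }
}

$toy = new ToyBloomFilter();

var_dump($toy->exists('foo')); // bool(false): nothing has been inserted yet

$toy->insert('foo');

var_dump($toy->exists('foo')); // bool(true)
```

A negative answer is always exact because inserting a token sets every one of its bits. A positive answer can be wrong when other tokens happen to have set the same bits, which is the false positive rate the library keeps bounded.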

Installation

Install into your project using Composer:

$ composer require andrewdalpino/okbloomer

Requirements

  • PHP 7.4 or above

Bloom Filter

A probabilistic data structure that estimates the prior occurrence of a given item with a maximum false positive rate.

Parameters

# | Name | Default | Type | Description
1 | maxFalsePositiveRate | 0.01 | float | The false positive rate to remain below.
2 | numHashes | 4 | int, null | The number of hash functions used, i.e. the number of slices per layer. Set to null for auto.
3 | layerSize | 32000000 | int | The size of each layer of the filter in bits.
4 | hashFn | 'crc32' | callable | The hash function that accepts a string token and returns an integer.
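
As a rough guide to what the default parameters imply, the sketch below works out the size of one layer in bytes and an approximate number of items a single layer can hold before its false positive rate reaches the target. The capacity figure relies on the textbook approximation for a Bloom filter partitioned into k slices, fp ≈ (1 - exp(-k * n / m)) ^ k, which is an assumption here. How Ok Bloomer itself decides when to add a layer is not described in this document, so treat the result as an order-of-magnitude estimate only.

```php
<?php

declare(strict_types=1);

// Default parameter values from the table above.
$maxFalsePositiveRate = 0.01;
$numHashes = 4;
$layerSize = 32000000; // bits

// Size of one layer's bitmap, assuming bits are packed 8 per byte.
$bytesPerLayer = intdiv($layerSize, 8);

// Assumed textbook model: each of the k slices has m / k bits, and the
// false positive rate is the slice fill ratio raised to the power k.
// Solving for n gives n = -(m / k) * ln(1 - fp^(1 / k)).
$fillRatio = $maxFalsePositiveRate ** (1.0 / $numHashes);

$capacity = (int) floor(-($layerSize / $numHashes) * log(1.0 - $fillRatio));

printf("Bytes per layer: %d\n", $bytesPerLayer);
printf("Approximate items per layer: %d\n", $capacity);
```

With the defaults, one layer occupies 4,000,000 bytes and the estimate comes out at roughly three million items.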

Example

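use OkBloomer\BloomFilter;

$filter = new BloomFilter(0.01, 4, 32000000);

$filter->insert('foo');

// var_dump() is used because echo prints true as "1" and false as "".
var_dump($filter->exists('foo')); // bool(true)

var_dump($filter->existsOrInsert('bar')); // bool(false)

var_dump($filter->exists('bar')); // bool(true)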
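
The hashFn parameter accepts any callable that maps a string token to an integer. The following sketch shows one way that might look, swapping in 32-bit FNV-1a from PHP's hash extension, which has the same output range as crc32. The choice of FNV-1a, the 0.001 rate, and the vendor/autoload.php path are assumptions made for illustration. Whether the library places further constraints on the callable is not stated in this document.

```php
<?php

declare(strict_types=1);

// Hypothetical alternative hash function: 32-bit FNV-1a.
// Assumes a 64-bit PHP build, where 8 hex digits always fit in an int.
$hashFn = static function (string $token): int {
    return (int) hexdec(hash('fnv1a32', $token));
};

// Assumed Composer layout: the autoloader sits next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    require $autoload;
}

if (class_exists('OkBloomer\BloomFilter')) {
    // Arguments follow the documented parameter order:
    // maxFalsePositiveRate, numHashes (null for auto), layerSize, hashFn.
    $filter = new OkBloomer\BloomFilter(0.001, null, 32000000, $hashFn);

    $filter->insert('foo');

    var_dump($filter->exists('foo'));
} else {
    echo 'Ok Bloomer is not installed; run composer require first.', PHP_EOL;
}
```

Whichever function is used, it needs to be deterministic, since a token must map to the same bits at insert time and at lookup time.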

Testing

To run the unit tests:

$ composer test

Static Analysis

To run static code analysis:

$ composer analyze

Benchmarks

To run the benchmarks:

$ composer benchmark

References

  • [1] P. S. Almeida et al. (2007). Scalable Bloom Filters.