coral-media / ext-ir
Information retrieval primitives for PHP
Package info
Language: C
Type: php-ext
Ext name: ext-ir
pkg:composer/coral-media/ext-ir
v0.2.0
2026-03-10 15:04 UTC
Requires
- php: >=8.1
README
Information retrieval primitives for PHP.
Build requirements:
- System OpenBLAS is required.
- Linux/macOS: install the OpenBLAS development package (for example libopenblas-dev).
- Windows: set OPENBLAS_ROOT to a prefix containing include and lib\openblas.lib.
- Snowball is vendored under lib/libstemmer and built as part of the extension.
Install with PIE:
# from repository root
cd ext-ir
pie install -j"$(nproc)"

# from packagist
pie install coral-media/ext-ir
If OpenBLAS is installed in a non-default location:
cd ext-ir
OPENBLAS_PREFIX=/custom/prefix pie install -j"$(nproc)"
Build from source (manual):
cd ext-ir
phpize
./configure --with-php-config="$(command -v php-config)"
make -j"$(nproc)"
make test
sudo make install
Enable extension:
extension=ir
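To confirm that the extension is actually picked up after editing php.ini, a quick check along the following lines may help. It is a minimal sketch: it assumes the module registers under the name ir (inferred from the extension=ir line above) and that ir_version() returns something printable as a string; irStatus() is a hypothetical helper name, not part of the extension.

```php
<?php
// Assumption: the module name seen by extension_loaded() is "ir",
// inferred from the `extension=ir` ini line; adjust if your build differs.
function irStatus(): string
{
    if (!extension_loaded('ir')) {
        return 'ext-ir is not loaded; check which ini files `php --ini` reports';
    }
    // ir_version() is listed in this README; its return value is assumed
    // to be usable in string concatenation.
    return 'ext-ir loaded, version ' . ir_version();
}

echo irStatus(), PHP_EOL;
```

Running the script with the same PHP binary (CLI or FPM) that is meant to use the extension matters, since each SAPI can load a different php.ini.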
Windows notes:
- Ensure OpenBLAS is installed and OPENBLAS_ROOT points to the prefix.
- Build with the same PHP toolchain version/arch/thread-safety as the target PHP.
- Prebuilt libraries are available under the Releases section.
Current scaffold (0.1.0) includes:
- ir_version()
- CoralMedia\IR\LinearAlgebra::{dot,normL2}()
- CoralMedia\IR\Text::{tokenize,stem}()
- CoralMedia\IR\Vectorizer::{frequency,vocabulary,fit,transform,fitTransform,tfIdf}()
- CoralMedia\IR\Similarity::{pearson,cosine,euclidean,nearest,topK}()
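For orientation, the quantities behind LinearAlgebra::dot, LinearAlgebra::normL2 and Similarity::cosine can be written in plain PHP as below. This is an illustrative sketch of the standard textbook definitions, not the extension's implementation. The names refDot, refNormL2 and refCosine are hypothetical, and the extension's exact signatures, error handling and zero-vector behaviour are not documented in this README.

```php
<?php
declare(strict_types=1);

/** Dot product of two equal-length numeric vectors. */
function refDot(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

/** Euclidean (L2) norm: square root of the vector's dot product with itself. */
function refNormL2(array $v): float
{
    return sqrt(refDot($v, $v));
}

/** Cosine similarity: dot product divided by the product of the L2 norms. */
function refCosine(array $a, array $b): float
{
    $denominator = refNormL2($a) * refNormL2($b);
    // Returning 0.0 for a zero vector is a choice made for this sketch only.
    return $denominator == 0.0 ? 0.0 : refDot($a, $b) / $denominator;
}

var_dump(refDot([1, 2, 3], [4, 5, 6])); // float(32)
var_dump(refNormL2([3, 4]));            // float(5)
```

A pure-PHP reference like this can also serve as a cross-check in tests when comparing against the native results, within a floating-point tolerance.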
Usage example:
<?php

$items = [
    "The quick brown fox jumps",
    "A quick fox is running",
    "Neural search with sparse vectors",
];

$tokenized = CoralMedia\IR\Text::tokenize(
    $items,
    pattern: '/\s+/u',
    lowercase: true,
    stripDiacritics: true,
    stem: true,
    language: 'english',
    stopwords: ['the', 'a', 'is', 'with']
);

$model = CoralMedia\IR\Vectorizer::fit($tokenized);
$matrix = CoralMedia\IR\Vectorizer::transform($tokenized, $model);

$query = $matrix[0];
$best = CoralMedia\IR\Similarity::nearest($query, $matrix, 'cosine');
$top2 = CoralMedia\IR\Similarity::topK($query, $matrix, 2, 'cosine');
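The fit/transform pair in the example presumably follows the usual bag-of-words pattern: fitting builds a vocabulary from the tokenized documents, and transforming maps each document to a count vector over that vocabulary. The plain-PHP sketch below illustrates that pattern only. sketchFit and sketchTransform are hypothetical names, and the extension's actual model structure, term ordering, weighting and return types may differ.

```php
<?php
declare(strict_types=1);

/**
 * Build a term => column-index map from tokenized documents.
 * First-seen ordering is a choice of this sketch.
 */
function sketchFit(array $docs): array
{
    $vocabulary = [];
    foreach ($docs as $tokens) {
        foreach ($tokens as $token) {
            // Assign the next free column index to terms not seen before.
            $vocabulary[$token] ??= count($vocabulary);
        }
    }
    return $vocabulary;
}

/** Map each tokenized document to a row of raw term counts. */
function sketchTransform(array $docs, array $vocabulary): array
{
    $matrix = [];
    foreach ($docs as $tokens) {
        $row = array_fill(0, count($vocabulary), 0);
        foreach ($tokens as $token) {
            // Tokens that were not seen during fitting are ignored.
            if (isset($vocabulary[$token])) {
                $row[$vocabulary[$token]]++;
            }
        }
        $matrix[] = $row;
    }
    return $matrix;
}

$docs = [['quick', 'fox'], ['quick', 'quick', 'dog']];
$vocab = sketchFit($docs);               // ['quick' => 0, 'fox' => 1, 'dog' => 2]
$counts = sketchTransform($docs, $vocab); // [[1, 1, 0], [2, 0, 1]]
print_r($counts);
```

Keeping fit and transform separate lets a vocabulary learned on one corpus be reused to vectorize later queries, which is why unseen tokens are simply dropped in this sketch.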
Next planned steps:
- add ranking helpers (for example bm25)
- add packed-vector paths for dense workloads
- expand test coverage for edge cases and large inputs