coral-media/ext-ir

Information retrieval primitives for PHP

Package info

github.com/coral-media/ext-ir

Language: C

Type: php-ext

Ext name: ext-ir

pkg:composer/coral-media/ext-ir

v0.2.0 2026-03-10 15:04 UTC

Last update: 2026-03-10 21:32:11 UTC


README

Build requirements:

  • A system-wide OpenBLAS installation is required.
  • Linux/macOS: install the OpenBLAS development package (for example, libopenblas-dev on Debian/Ubuntu).
  • Windows: set OPENBLAS_ROOT to a prefix containing include and lib\openblas.lib.
  • Snowball is vendored under lib/libstemmer and built as part of the extension.

Install with PIE:

# from repository root
cd ext-ir
pie install -j"$(nproc)"
# or, from Packagist
pie install coral-media/ext-ir

If OpenBLAS is installed in a non-default location:

cd ext-ir
OPENBLAS_PREFIX=/custom/prefix pie install -j"$(nproc)"

Build from source (manual):

cd ext-ir
phpize
./configure --with-php-config="$(command -v php-config)"
make -j"$(nproc)"
make test
sudo make install

Enable the extension in php.ini:

extension=ir
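
To confirm that the extension actually loaded after editing php.ini, a quick check along these lines may help. This is only a sketch: it assumes the module registers under the name `ir` (inferred from the `extension=ir` line), it relies on the `ir_version()` function listed in this README, and the helper name `irStatus` is hypothetical.

```php
<?php

// Hypothetical helper: reports whether the extension appears to be loaded.
// Assumes the module name is "ir", inferred from the `extension=ir` line.
function irStatus(): string
{
    if (!extension_loaded('ir')) {
        return 'ir extension not loaded; check php.ini and the output of `php -m`';
    }

    // ir_version() is listed in this README; guard the call in case it is absent.
    $version = function_exists('ir_version') ? (string) ir_version() : 'unknown';

    return "ir extension loaded, version: {$version}";
}

echo irStatus(), PHP_EOL;
```

Running the same script through the web server's PHP (not only the CLI) can be worthwhile, since the two often read different php.ini files.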

Windows notes:

  • Ensure OpenBLAS is installed and OPENBLAS_ROOT points to its prefix.
  • Build with the same toolchain version, architecture, and thread-safety mode as the target PHP build.
  • Prebuilt libraries are available under the Releases section.

Current scaffold (0.1.0) includes:

  • ir_version()
  • CoralMedia\IR\LinearAlgebra::{dot,normL2}()
  • CoralMedia\IR\Text::{tokenize,stem}()
  • CoralMedia\IR\Vectorizer::{frequency,vocabulary,fit,transform,fitTransform,tfIdf}()
  • CoralMedia\IR\Similarity::{pearson,cosine,euclidean,nearest,topK}()
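
For orientation, the `dot`, `normL2`, and `cosine` names above correspond to standard textbook definitions, sketched below in plain PHP. This is not the extension's implementation (which, per the build requirements, links against OpenBLAS). The function names `refDot`, `refNormL2`, and `refCosine` are hypothetical, and the zero-vector handling is an assumption rather than documented behavior.

```php
<?php

// Reference definitions in plain PHP, for illustration only.
// These are NOT the extension's implementation; all names here are hypothetical.

/** Dot product of two equal-length numeric vectors. */
function refDot(array $a, array $b): float
{
    // Re-index so both vectors use positions 0..n-1.
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }

    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

/** Euclidean (L2) norm: square root of the sum of squares. */
function refNormL2(array $a): float
{
    return sqrt(refDot($a, $a));
}

/** Cosine similarity: dot(a, b) / (||a|| * ||b||). */
function refCosine(array $a, array $b): float
{
    $denominator = refNormL2($a) * refNormL2($b);
    // Assumption: return 0.0 when either vector is all zeros to avoid dividing by zero.
    return $denominator == 0.0 ? 0.0 : refDot($a, $b) / $denominator;
}

echo refDot([1, 2, 3], [4, 5, 6]), PHP_EOL; // 32
echo refNormL2([3, 4]), PHP_EOL;            // 5
echo refCosine([1, 0], [0, 1]), PHP_EOL;    // 0
```

A reference like this can serve as a slow but readable cross-check when testing results returned by the extension on small inputs.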

Usage example:

<?php

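// Sample corpus: one string per document.
$items = [
    "The quick brown fox jumps",
    "A quick fox is running",
    "Neural search with sparse vectors",
];

// Split on whitespace, normalize case and diacritics, stem with the
// English stemmer, and drop the listed stopwords.
$tokenized = CoralMedia\IR\Text::tokenize(
    $items,
    pattern: '/\s+/u',
    lowercase: true,
    stripDiacritics: true,
    stem: true,
    language: 'english',
    stopwords: ['the', 'a', 'is', 'with']
);

// Fit a model on the tokenized corpus, then vectorize the same documents with it.
$model = CoralMedia\IR\Vectorizer::fit($tokenized);
$matrix = CoralMedia\IR\Vectorizer::transform($tokenized, $model);

// Use the first document's vector as the query and search the matrix
// by cosine similarity: single best match, then the top 2 matches.
$query = $matrix[0];
$best = CoralMedia\IR\Similarity::nearest($query, $matrix, 'cosine');
$top2 = CoralMedia\IR\Similarity::topK($query, $matrix, 2, 'cosine');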
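
The `fit` and `transform` calls above follow the usual vectorizer pattern: fitting learns a vocabulary from a tokenized corpus, and transforming maps each document onto that vocabulary. The plain-PHP sketch below illustrates that general idea with simple term counts. It is an assumption-laden illustration, not the extension's behavior: the names `sketchFit` and `sketchTransform` are hypothetical, and the real model format, weighting, and return types may differ.

```php
<?php

// Illustrative bag-of-words vectorizer in plain PHP.
// Hypothetical names; the extension's real model format and output may differ.

/** Build a vocabulary: each distinct token gets the next column index. */
function sketchFit(array $tokenizedDocs): array
{
    $vocabulary = [];
    foreach ($tokenizedDocs as $tokens) {
        foreach ($tokens as $token) {
            if (!isset($vocabulary[$token])) {
                $vocabulary[$token] = count($vocabulary);
            }
        }
    }
    return $vocabulary;
}

/** Turn each token list into a dense term-count vector over the vocabulary. */
function sketchTransform(array $tokenizedDocs, array $vocabulary): array
{
    $matrix = [];
    foreach ($tokenizedDocs as $tokens) {
        $row = array_fill(0, count($vocabulary), 0);
        foreach ($tokens as $token) {
            // Tokens not seen during fitting are ignored.
            if (isset($vocabulary[$token])) {
                $row[$vocabulary[$token]]++;
            }
        }
        $matrix[] = $row;
    }
    return $matrix;
}

$docs = [['quick', 'fox'], ['quick', 'quick', 'dog']];
$vocabulary = sketchFit($docs);                // ['quick' => 0, 'fox' => 1, 'dog' => 2]
$matrix = sketchTransform($docs, $vocabulary); // [[1, 1, 0], [2, 0, 1]]
print_r($matrix);
```

Rows produced this way can be compared with any vector similarity measure, which is the role `Similarity::nearest` and `Similarity::topK` play in the usage example.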

Next planned steps:

  • add ranking helpers (for example, BM25)
  • add packed-vector paths for dense workloads
  • expand test coverage for edge cases and large inputs