rizanola/draconic

A drop-in package for full-text searches. Supports typo correction and word stemming.

1.1.1 2024-01-15 00:00 UTC

This package is auto-updated.

Last update: 2024-04-15 00:35:30 UTC


README

Draconic is a simple and reasonably lightweight full-text search system for websites. The only PHP extension it requires is SQLite, which is enabled in PHP by default.

Installation

Draconic is available via Composer:

composer require rizanola/draconic

Usage

Usage is fairly straightforward:

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Entry;
use Rizanola\Draconic\Section;

// Create a new entry to track
$entry = new Entry
(
    // The ID uniquely identifies the entry. It can be a string, int or float.
    "test-entry",

    // The entry type. Results can be filtered by type, e.g. you might have a "product" or an "article".
    "test",

    // Sections consist of their text, a priority and an optional label. Sections with a higher priority are weighted
    // higher in search results, so if a query matches the title of one article and the content of another, the
    // article with the matched title is displayed higher in the search results.
    new Section("Test Heading", 2, "heading"), 
    new Section("Test content", 1, "content")
);

// Make a new draconic object
$draconic = new Draconic("/path/to/store/the/sqlite/database.db");

// Insert the new entry. If an entry with that ID already exists in the database, it will be replaced with the new
// entry.
$draconic->addOrUpdateEntries([$entry]);

// Do a search. Draconic handles typos and stemming automatically, and also supports quoted words for exact
// matches. The second parameter is the type, which you can use to restrict the search to one type of entry.
$results = $draconic->search('"test" content', "test");

// Remove the entry from the database
$draconic->removeEntries([$entry->id]);

Notes

Draconic is designed to extract words from plain text. If you're inserting HTML, consider stripping tags and decoding entities first, e.g. html_entity_decode(strip_tags($content)).
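As a minimal sketch of that clean-up step, the helper below wraps the two calls mentioned above. The function name htmlToPlainText() is hypothetical and not part of Draconic. It also inserts a space before each tag, on the assumption that adjacent block elements such as "</h1><p>" should not fuse two words together.

```php
<?php
// Hypothetical helper (not part of Draconic): reduce an HTML fragment to plain text
function htmlToPlainText(string $html): string
{
    // Put a space before every tag so that "</h1><p>" doesn't join two words
    $spaced = str_replace("<", " <", $html);

    // Strip the tags first, then decode entities such as &amp; and &eacute;
    $text = html_entity_decode(strip_tags($spaced), ENT_QUOTES | ENT_HTML5, "UTF-8");

    // Collapse the runs of whitespace left behind by the removed tags
    return trim(preg_replace('/\s+/u', " ", $text));
}

echo htmlToPlainText("<h1>Caf&eacute; menu</h1><p>Tea &amp; cake</p>"), PHP_EOL;
// Café menu Tea & cake
```

One trade-off of this approach: inline tags inside a word (for example "wo<b>rd</b>") will also split that word in two, so adjust the spacing step if your content relies on that.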

Draconic detects typos by removing individual characters from each word in the inserted content and the searched query. This allows us to catch the four kinds of typo:

Additional characters: If the user types gfram and the content contains gram, then one of the variants of the search query word will be gram.

Missing characters: If the user types gam and the content contains gram, then one of the variants of the content word will be gam.

Substituted characters: If the user types fram and the content contains gram, then one of the variants of the search query word will be ram, and one of the variants of the content word will also be ram.

Transposed characters: If the user types garm and the content contains gram, then one of the variants of the search query word will be gam, and one of the variants of the content word will also be gam.
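The four cases above can be reproduced with a short sketch. This is an illustration of the deletion-variant idea only, and Draconic's internal implementation may differ. The function names deletionVariants() and sharedVariants() are hypothetical, and the sketch assumes that each word counts as one of its own variants.

```php
<?php
// Illustrative only: not Draconic's actual code

/** Returns the word itself plus every variant with exactly one character removed. */
function deletionVariants(string $word): array
{
    // Split into UTF-8 characters without relying on the mbstring extension
    $characters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $variants = [$word];

    foreach(array_keys($characters) as $index)
    {
        $remaining = $characters;
        unset($remaining[$index]);
        $variants[] = implode("", $remaining);
    }

    return array_values(array_unique($variants));
}

/** Lists the variants that a query word and a content word have in common. */
function sharedVariants(string $queryWord, string $contentWord): array
{
    return array_values(array_intersect(deletionVariants($queryWord), deletionVariants($contentWord)));
}

// One typo of each kind, compared against the content word "gram"
foreach(["gfram", "gam", "fram", "garm"] as $typo)
{
    echo $typo, " matches gram via: ", implode(", ", sharedVariants($typo, "gram")), PHP_EOL;
}
```

A query word and a content word with no variant in common, such as dog and gram, produce an empty list and would not be treated as a typo of each other under this scheme.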

Draconic supports quoted words, excluded words and alternative words:

  • "test search": Results for this query must contain the words "test" and "search" in that order, and spelled exactly the same way.
  • test -search: Results for this query must contain "test", and must not contain "search".
  • test|search: Results for this query must contain either "test" or "search".
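To make the three forms concrete, here is a rough sketch of how such a query could be split into its parts. It is a hypothetical tokeniser written for illustration, not Draconic's own parser, and the function name describeQuery() and the shape of the returned array are assumptions.

```php
<?php
// Hypothetical tokeniser for illustration; this is not Draconic's own parser
function describeQuery(string $query): array
{
    $parsed = ["phrases" => [], "excluded" => [], "required" => []];

    // Try, in order: a quoted phrase, a word prefixed with "-", or any other term
    preg_match_all('/"([^"]+)"|-(\S+)|(\S+)/u', $query, $matches, PREG_SET_ORDER);

    foreach($matches as $match)
    {
        if(($match[1] ?? "") !== "")
        {
            // "test search": an exact phrase
            $parsed["phrases"][] = $match[1];
        }
        elseif(($match[2] ?? "") !== "")
        {
            // -search: a word that must not appear
            $parsed["excluded"][] = $match[2];
        }
        else
        {
            // test|search: a group of alternatives, any one of which may match
            $parsed["required"][] = explode("|", $match[3]);
        }
    }

    return $parsed;
}

print_r(describeQuery('"test search" draconic -slow fast|quick'));
```

In this sketch a plain word is simply a group with a single alternative, so "draconic" and "fast|quick" end up in the same list.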

Customisation

Draconic has default logic for filtering and sorting results, but you can override both when you need custom behaviour:

Metadata

You can add metadata to an Entry, which can be used for custom filtering and sorting:

<?php
use Rizanola\Draconic\Entry;
use Rizanola\Draconic\Section;

$entry = new Entry("test-entry", null, 
    new Section("test-section")
);

// Metadata is stored as a JSON object, so the second argument to setMetadata() can be most values that encode to JSON
$entry->setMetadata("important", true);

Filtering

By default, Draconic filters out results that don't contain all the searched words, results that don't contain all the subphrases, and results that contain any excluded words. You may, however, wish to write your own filter logic:

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Matching\Result;

$draconic = new Draconic(":memory:");

// Add a filter that will hide results that aren't important
$draconic->filterCallable = function(array $words, Result $result) use($draconic): bool
{
    // If a result isn't important (or has no "important" metadata at all), return false
    if(!($result->metadata->important ?? false)) return false;
    
    // Otherwise, use Draconic's native filtering to filter out poor matches
    return $draconic->filterResult($words, $result);
};

Sorting

By default, Draconic sorts results first by how close together the matched words are, then by the priority of the sections that the words are found in. You may, however, wish to write your own sorting logic:

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Matching\Result;

$draconic = new Draconic(":memory:");

// Add a sort callback that will display important results first
$draconic->sortCallable = function(array $words, Result $first, Result $second) use($draconic): int
{
    // If one result is more important, then that result should come first. Entries without the metadata count as
    // not important.
    $importanceComparison = ($second->metadata->important ?? false) <=> ($first->metadata->important ?? false);
    if($importanceComparison !== 0) return $importanceComparison;
    
    // Otherwise, use Draconic's default sorting to sort matches
    return $draconic->sortResults($words, $first, $second);
};
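The operands of <=> are deliberately reversed in the callback above: comparing the second result to the first sorts true before false. The sketch below shows the effect in isolation, using plain stand-in objects rather than real Result instances (the "id" and "important" properties here are assumptions made for the demonstration).

```php
<?php
// Stand-in objects for illustration; real results are Rizanola\Draconic\Matching\Result instances
$results = [
    (object)["id" => "a", "important" => false],
    (object)["id" => "b", "important" => true],
    (object)["id" => "c", "important" => false],
];

// Comparing $second to $first (rather than $first to $second) puts true before false
usort($results, fn(object $first, object $second): int => $second->important <=> $first->important);

// The important entry "b" is now first
echo implode(", ", array_map(fn(object $result): string => $result->id, $results)), PHP_EOL;
```

Because the comparison returns 0 for two results of equal importance, that is the point where the real callback falls through to Draconic's default sorting.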