goodm4ven/arabicable

A unified Arabic language support package crafted in pure PHP with smooth Laravel integration

github.com/GoodM4ven/PACKAGE_LARAVEL_arabicable

pkg:composer/goodm4ven/arabicable


In the name of Allah, the Most Gracious, the Most Merciful

Arabicable

Practical Arabic text processing for Laravel, focused on fast and predictable Arabic search with database-backed searchable variants.


Description

Arabicable primarily stores Arabic field variants in dedicated database columns, so indexing and querying remain consistent and fast through normal Eloquent workflows.

For each Arabicable text column, the package maintains:

  • <column>
  • <column>_with_harakat
  • <column>_searchable
  • <column>_stemmed
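To make the idea of stored variants concrete, here is a minimal pure-PHP sketch of how one input value could fan out into a vocalized form and a search-friendly form. The function names, the `name_*` keys, and the normalization rules (harakat removal, alef folding) are assumptions for illustration only; the package's own observer and filters may apply different rules.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not the package's algorithm): removes the common
// harakat U+064B..U+0652 and the dagger alef U+0670.
function sketch_without_harakat(string $text): string
{
    return preg_replace('/[\x{064B}-\x{0652}\x{0670}]/u', '', $text) ?? $text;
}

// Hypothetical "searchable" form: harakat removed, hamza/madda alef forms
// (U+0623, U+0625, U+0622) folded into bare alef (U+0627), spaces collapsed.
function sketch_searchable(string $text): string
{
    $plain = sketch_without_harakat($text);
    $plain = str_replace(["\u{0623}", "\u{0625}", "\u{0622}"], "\u{0627}", $plain);

    return trim(preg_replace('/\s+/u', ' ', $plain) ?? $plain);
}

// One input value fans out into several stored forms (illustrative keys).
$input = 'أَحْمَد';
$row = [
    'name_with_harakat' => $input,
    'name_searchable' => sketch_searchable($input),
];
```

Storing the derived form in its own column means a query can compare against an indexed value directly, without normalizing every row at query time.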

In addition, the package provides features in these areas:

  • Analysis and Search
  • Al-Qur'an
  • Date
  • Numbers
  • Voice

Installation

  1. Install via Composer:
composer require goodm4ven/arabicable
  2. Run the installer:
php artisan arabicable:install --seed

This publishes the config and migrations, runs the migrations (with a prompt unless --testing is passed), and imports the dictionaries when --seed is provided.

  3. Publish the assets:
php artisan vendor:publish --tag=arabicable-assets

The raw-data path defaults can be customized:

  • arabicable.raw_data_path is resolved automatically: vendor/goodm4ven/arabicable/resources/raw-data is tried first, then resources/raw-data.
  • To publish the package's raw datasets to your app, run:
php artisan vendor:publish --tag="arabicable-raw-data" --force

Upgrading

Refresh published package files:

php artisan vendor:publish --tag="arabicable-config" --force
php artisan vendor:publish --tag="arabicable-migrations" --force
php artisan vendor:publish --tag="arabicable-raw-data" --force
php artisan migrate

If local dictionary sources changed:

php artisan arabicable:compile-data
php artisan arabicable:seed --all --truncate

Usage

Arabicable Model & Migration

  1. Use an Arabicable migration macro in your table definition:

    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;
    
    Schema::create('notes', function (Blueprint $table): void {
        $table->id();
        $table->arabicText('content');
        $table->timestamps();
    });
  2. Use the Arabicable trait on your model:

    use GoodMaven\Arabicable\Traits\Arabicable;
    use Illuminate\Database\Eloquent\Model;
    
    class Note extends Model
    {
        use Arabicable;
    
        protected $fillable = ['content'];
    }

The observer-managed columns are then updated automatically whenever content changes:

  • $note->content
  • $note->content_with_harakat
  • $note->content_searchable
  • $note->content_stemmed

If you use Spatie Translatable, the migration macros accept a ?bool $isTranslatable argument:

$table->arabicText('content', isTranslatable: true);

Note: The Arabicable trait also provides getSearchableTranslations() for flattened keys such as content_searchable_ar and content_searchable_en.
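The flattened key naming can be illustrated with a small standalone sketch. The function below is hypothetical and only shows how per-locale values might map onto keys like content_searchable_ar; the actual return shape of getSearchableTranslations() is not documented here, so treat the input structure as an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical flattener: turns
 *   ['content_searchable' => ['ar' => '...', 'en' => '...']]
 * into
 *   ['content_searchable_ar' => '...', 'content_searchable_en' => '...'].
 *
 * @param array<string, array<string, string>> $translations
 * @return array<string, string>
 */
function sketch_flatten_translations(array $translations): array
{
    $flat = [];

    foreach ($translations as $column => $byLocale) {
        foreach ($byLocale as $locale => $value) {
            // Join column name and locale with an underscore.
            $flat["{$column}_{$locale}"] = $value;
        }
    }

    return $flat;
}
```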

Search Scopes

The Arabicable trait includes these query scopes:

  • scopeSearchArabic
  • scopeWhereArabicLike
  • scopeSearchArabicComprehensive
  • scopeWhereArabicComprehensive
  • scopeOrderByArabicRelevance

Example:

Note::query()
    ->searchArabic('content', $query)
    ->limit(20)
    ->get();

Comprehensive mode builds terms from:

  • normalized query text
  • tokens
  • stop-word removal
  • stems
  • lexical variants (roots, stems, original_words, or all)
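The first three steps of that list can be sketched in plain PHP as follows. This is an illustrative approximation, not the package's search plan builder: the function name, the returned keys, and the caller-supplied stop-word list are assumptions, and stemming and lexical variant expansion are left out because they depend on the package's dictionaries.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical term builder covering normalization, tokenization,
 * and stop-word removal only.
 *
 * @param list<string> $stopWords
 * @return array{normalized: string, tokens: list<string>, keywords: list<string>}
 */
function sketch_build_terms(string $query, array $stopWords): array
{
    // Normalize: drop harakat (U+064B..U+0652, U+0670) and collapse whitespace.
    $normalized = preg_replace('/[\x{064B}-\x{0652}\x{0670}]/u', '', $query) ?? $query;
    $normalized = trim(preg_replace('/\s+/u', ' ', $normalized) ?? $normalized);

    // Tokenize on single spaces (whitespace was collapsed above).
    $tokens = $normalized === '' ? [] : explode(' ', $normalized);

    // Remove stop words and reindex the remaining tokens.
    $keywords = array_values(array_diff($tokens, $stopWords));

    return [
        'normalized' => $normalized,
        'tokens' => $tokens,
        'keywords' => $keywords,
    ];
}
```

Keeping both the full token list and the filtered keywords lets a caller fall back to the unfiltered tokens when a query consists only of stop words.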

Text Processing Helpers

use GoodMaven\Arabicable\Facades\Arabic;
use GoodMaven\Arabicable\Facades\ArabicFilter;

$searchable = ArabicFilter::forSearch($text);
$stemmed = ArabicFilter::forStem($text);
$withoutHarakat = ArabicFilter::withoutHarakat($text);
$clean = Arabic::stripWeirdCharacters($text, keepHarakat: true, keepPunctuation: true);
$keywords = Arabic::extractKeywords($text);
$plan = Arabic::buildComprehensiveSearchPlan($query);
$variants = Arabic::expandWordVariants($tokens, mode: 'all', stripStopWords: true);

ArabicFilter Facade

  • withHarakat(string $text): string
  • withoutHarakat(string $text): string
  • withoutDiacritics(string $text, bool $keepShadda = false): string
  • forSearch(string $text): string
  • forStem(string $text): string
  • forMemorizationComparison(string $text, bool $stripCommons = true, bool $stripConnectors = true): string

Arabic Facade (Common Methods)

  • Harakat/diacritics: removeHarakat, removeDiacritics, addHarakat
  • Normalization: normalizeHuroof, normalizeNumeralsForSearch, stripWeirdCharacters
  • Keywords/search: tokenize, stemWord, stemWords, removeStopWords, extractKeywords, buildComprehensiveSearchPlan, expandWordVariants
  • Commons/cache: removeCommons, clearConceptCache
  • Punctuation/spacing: toTightPunctuationStyle, toLoosePunctuationStyle, removeAllPunctuationMarks, normalizeSpaces
  • Date conversion: gregorianToHijri, hijriToGregorian

Quran Indexing Tables

Enable Quran features in config when needed:

'features' => [
    'quran' => true,
],

After running migrations, Arabicable creates and imports:

  • quran_verses: one row per ayah with surah_number, ayah_number, ayah_index, text_uthmani, text_searchable, text_sanitized, text_without_harakat, text_without_diacritics, text_normalized_huroof.
  • quran_words: one row per word occurrence with verse_id, word_position, global_word_index, token_uthmani, token_sanitized, token_searchable, token_without_harakat, token_without_diacritics, token_normalized_huroof, token_stem, token_root, token_lemma.
  • quran_verse_explanations: ayah-linked tafsir/i'rab records from SQLite sources (source_key, content_kind, content_html, content_text).
  • quran_word_annotations: optional word-level notes/translation payloads linked to exact quran_words rows.

For repeated words, target a specific occurrence with:

  • verse_id + word_position for a stable position inside the ayah.
  • global_word_index for a single canonical word pointer across the full Quran.
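The difference between the two pointers can be seen in a small standalone sketch that assigns both while walking a list of verses. The function name, the verse_index key, and the 1-based numbering are assumptions made for illustration; the package's importer may number rows differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical indexer: gives every word a position inside its verse
 * and a running index across all verses (both 1-based here by assumption).
 *
 * @param list<string> $verses
 * @return list<array{verse_index: int, word_position: int, global_word_index: int, token: string}>
 */
function sketch_index_words(array $verses): array
{
    $rows = [];
    $global = 0;

    foreach (array_values($verses) as $verseIndex => $verse) {
        // Split on any whitespace run, skipping empty pieces.
        $tokens = preg_split('/\s+/u', trim($verse), -1, PREG_SPLIT_NO_EMPTY) ?: [];

        foreach ($tokens as $offset => $token) {
            $rows[] = [
                'verse_index' => $verseIndex + 1,
                'word_position' => $offset + 1,
                'global_word_index' => ++$global,
                'token' => $token,
            ];
        }
    }

    return $rows;
}
```

A repeated token then gets a distinct row per occurrence, so a note can be attached to exactly one of them.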

This structure is ready for later tafsir/translation attachments at ayah level or exact word occurrence level.

Default source config keys:

  • arabicable.raw_data_path
  • arabicable.data_sources.quran_othmani_surahs_dir
  • arabicable.data_sources.quran_exegesis_databases_dir
  • arabicable.data_sources.quran_layout_databases_dir
  • arabicable.data_sources.quran_lexicon_databases_dir
  • arabicable.data_sources.quran_fonts_dir
  • arabicable.data_sources.quran_surah_headers_fonts_dir

For tafsir / i'rab SQLite data:

  • Put the required exegesis SQLite files, such as ar-tafsir-al-tabari.db and al-i-rab-al-muyassar.db, in <raw_data_path>/quran/exegesis (or your configured data source path).
  • Explanations are stored for display and retrieval, and are not part of Arabicable search indexing.

Gregorian/Hijri Date Helpers

use GoodMaven\Arabicable\Facades\Arabic;

$hijri = Arabic::gregorianToHijri(2025, 1, 1); // ['year' => ..., 'month' => ..., 'day' => ...]
$gregorian = Arabic::hijriToGregorian($hijri['year'], $hijri['month'], $hijri['day']);

JavaScript Number Helper

Published asset: public/vendor/arabicable/arabicable.js

window.ArabicableNumbers.toArabicIndic('123'); // "١٢٣"
window.ArabicableNumbers.toAscii('١٢٣'); // "123"
window.ArabicableNumbers.normalizeForBackendSearch('رقم ١٢٣', 'arabic'); // "رقم 123"
window.ArabicableNumbers.normalizeForBackendSearch('رقم 123', 'indian'); // "رقم ١٢٣"
window.ArabicableNumbers.normalizeForBackendSearch('123', 'both'); // "123 ١٢٣"
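For comparison, the same digit mapping can be sketched on the PHP side. The functions below are hypothetical stand-ins written for illustration and are not the package's own numeral helpers; they only cover the Arabic-Indic digits U+0660..U+0669.

```php
<?php

declare(strict_types=1);

const SKETCH_ASCII_DIGITS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];

// Arabic-Indic digits U+0660..U+0669, written as escapes to avoid
// confusion with the visually similar Extended Arabic-Indic digits.
const SKETCH_ARABIC_INDIC_DIGITS = [
    "\u{0660}", "\u{0661}", "\u{0662}", "\u{0663}", "\u{0664}",
    "\u{0665}", "\u{0666}", "\u{0667}", "\u{0668}", "\u{0669}",
];

// Hypothetical helper: ASCII digits to Arabic-Indic digits.
function sketch_to_arabic_indic(string $text): string
{
    return str_replace(SKETCH_ASCII_DIGITS, SKETCH_ARABIC_INDIC_DIGITS, $text);
}

// Hypothetical helper: Arabic-Indic digits to ASCII digits.
function sketch_to_ascii(string $text): string
{
    return str_replace(SKETCH_ARABIC_INDIC_DIGITS, SKETCH_ASCII_DIGITS, $text);
}
```

Characters that are not digits pass through unchanged, so mixed text such as a label followed by a number can be normalized in one call.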

DigitalKhatt / Quran Font Setup

Arabicable can be paired with DigitalKhatt-style Quran rendering in your app frontend.

  1. Publish package assets (and optionally Quran raw-data files if you want app-local copies):
php artisan vendor:publish --tag=arabicable-assets --force
php artisan vendor:publish --tag=arabicable-raw-data --force
  2. Use the included Quran font file (published under public/vendor/arabicable/madina.woff2), or replace it with your preferred DigitalKhatt-compatible font build.

    Surah header fonts are also bundled and available in both locations:

    • resources/raw-data/quran/fonts/surah-headers/QCF_SurahHeader_COLOR-Regular.woff2
    • resources/raw-data/quran/fonts/surah-headers/surah-name-v2.woff2
    • public/vendor/arabicable/QCF_SurahHeader_COLOR-Regular.woff2
    • public/vendor/arabicable/surah-name-v2.woff2
  3. Define your Quran text class:

@font-face {
  font-family: 'MadinaQuran';
  src: url('/vendor/arabicable/madina.woff2') format('woff2');
  font-display: swap;
}

.font-quran {
  font-family: 'MadinaQuran', 'Amiri', serif;
}

.font-quran-surah-header {
  font-family: 'QcfSurahHeaderColor', 'SurahNameV2', 'MadinaQuran', 'Amiri', serif;
}
  4. Render Uthmani text with .font-quran, and use the text_searchable/token_searchable fields for search queries.

  5. Optionally, configure surah header font selection in the package config:

'quran_fonts' => [
    'surah_headers' => [
        'preferred' => 'qcf-surah-header-color-regular',
        'available' => [
            'qcf-surah-header-color-regular' => [
                'family' => 'QcfSurahHeaderColor',
                'filename' => 'QCF_SurahHeader_COLOR-Regular.woff2',
                'format' => 'woff2',
            ],
            'surah-name-v2' => [
                'family' => 'SurahNameV2',
                'filename' => 'surah-name-v2.woff2',
                'format' => 'woff2',
            ],
        ],
    ],
],

If you integrate the external digitalkhatt.js stack, keep Arabicable as the search/index layer and use DigitalKhatt purely for display shaping.

RunAnywhere STT/TTS Companion

Install frontend packages:

npm install @runanywhere/web @runanywhere/web-onnx

Initialize and register your bridge:

import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';
import { ONNX } from '@runanywhere/web-onnx';

await RunAnywhere.initialize({ environment: SDKEnvironment.Production, debug: false });
await ONNX.register();

window.ArabicableRunAnywhere.setBridge({
  async speechToText(audioInput, options = {}) {
    // Return string or object containing transcript text
    return { text: '' };
  },
  async textToSpeech(text, options = {}) {
    // Return Float32Array, Blob, or { audio/blob/url, sampleRate }
    return { audio: new Float32Array(), sampleRate: 24000 };
  },
});

Bridge API methods:

  • speechToText(audioInput, options)
  • textToSpeech(text, options)
  • voiceToText(audioInput, options)
  • textToVoice(text, options)
  • voiceToVoice(audioInput, options)
  • transformArabic({ text, audio, target }, options)
  • decodeAudioBlob(blob, targetSampleRate?)
  • playFloat32Audio(audio, sampleRate?)
  • playAudio(result)

Dictionary Workflow (arabicable.raw_data_path)

compiled-* files are runtime dictionaries. source-* files are local raw assets.

Compile:

php artisan arabicable:compile-data
php artisan arabicable:compile-data --raw-data-path=/absolute/path/to/raw-data
php artisan arabicable:compile-data --without-extra-stopwords

Seed DB dictionaries:

php artisan arabicable:seed --all
php artisan arabicable:seed --common-texts --stop-words
php artisan arabicable:seed --all --truncate

Current DB imports:

  • common_arabic_texts
  • arabic_stop_words

Current file-backed lexical expansion sources:

  • <raw_data_path>/verbs/compiled-word-variants.tsv
  • <raw_data_path>/quran/compiled-quran-word-index.tsv

All source paths are configurable via config/arabicable.php using:

  • arabicable.raw_data_path (global base path)
  • arabicable.data_sources.* (per-file/per-directory overrides)

API

Validation Rules

  • GoodMaven\Arabicable\Rules\Arabic
  • GoodMaven\Arabicable\Rules\ArabicWithSpecialCharacters
  • GoodMaven\Arabicable\Rules\UncommonArabic
  • GoodMaven\Arabicable\Rules\UniqueArabicWithSpecialCharacters

Artisan Commands

Command | Purpose
arabicable:install | Publish config/migrations, migrate, optional seed (--testing, --seed)
arabicable:compile-data | Compile local datasets (--raw-data-path, --without-extra-stopwords)
arabicable:seed | Import configured dictionaries (--all, --common-texts, --stop-words, --truncate)

Migration Macros

Macro | Purpose
indianDate($columnName, $isNullable = false, $isUnique = false) | Creates a date column and <column>_indian
arabicString($columnName, $length = 255, $isNullable = false, $isUnique = false, $supportsFullSearch = false, $isTranslatable = null) | String + Arabicable variant columns
arabicTinyText($columnName, $isNullable = false, $isUnique = false, $supportsFullSearch = false, $isTranslatable = null) | TinyText + variants
arabicText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | Text + variants
arabicMediumText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | MediumText + variants
arabicLongText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | LongText + variants

Global Functions

  • ar_indian(string $property): string
  • ar_with_harakat(string $property): string
  • ar_searchable(string $property): string
  • ar_stem(string $property): string
  • ar_expand_variants(...)
  • arabicable_special_characters(...)
  • camel_tools() and camel_* helpers

CamelTools Facade (Pure-PHP Utility Port)

Key utilities:

  • Builtin mapping/transliteration: mapWithBuiltin, transliterateWithBuiltin, arclean
  • Unicode/orthographic normalization: normalizeUnicode, normalizeAlef*, normalizeAlefMaksura*, normalizeTehMarbuta*, normalizeOrthography
  • Dediacritization: dediac*
  • Tokenization: simpleWordTokenize

Contribution

  • Always target the dev branch for your PRs.

License

This package is open-sourced software licensed under the MIT license.

Credits


And praise be to Allah, Lord of the worlds