goodm4ven / arabicable
A unified Arabic language support package crafted in pure PHP with smooth Laravel integration
Requires
- php: ^8.4||^8.3
- goodm4ven/anvil: ^1.0
- illuminate/contracts: ^11.0||^12.0
- illuminate/support: ^11.0||^12.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- goodm4ven/tailwind-merge: ^1.0
- larastan/larastan: ^3.0
- laravel/boost: ^2.0
- laravel/pint: ^1.14
- livewire/livewire: ^4.1
- nunomaduro/collision: ^8.8
- orchestra/testbench: ^10.0.0||^9.0.0
- pestphp/pest: ^4.0
- pestphp/pest-plugin-arch: ^4.0
- pestphp/pest-plugin-browser: ^4.0
- pestphp/pest-plugin-laravel: ^4.0
- phpstan/extension-installer: ^1.4
- phpstan/phpstan-deprecation-rules: ^2.0
- phpstan/phpstan-phpunit: ^2.0
README
In the name of Allah, the Most Gracious, the Most Merciful
Arabicable
Practical Arabic text processing for Laravel, focused on fast and predictable Arabic search with database-backed searchable variants.
Description
Arabicable primarily stores Arabic field variants in dedicated database columns, so indexing and querying remain consistent and fast through normal Eloquent workflows.
For each Arabicable text column, the package maintains:
- <column>
- <column>_with_harakat
- <column>_searchable
- <column>_stemmed
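As a rough illustration of why a separate searchable variant helps, the sketch below strips the common harakat from a vocalized word so it can match plain user input. The helper name `strip_harakat_sketch` is hypothetical and this is not the package's implementation; Arabicable's own normalization is likely broader than this single step.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of Arabicable: removes the common harakat
// (U+064B through U+0652) and leaves the base letters untouched.
function strip_harakat_sketch(string $text): string
{
    // preg_replace returns null on failure, so fall back to the input.
    return preg_replace('/[\x{064B}-\x{0652}]/u', '', $text) ?? $text;
}

$withHarakat = 'كَتَبَ';                       // vocalized form, as a user might store it
$plain = strip_harakat_sketch($withHarakat);   // form suited to matching plain queries

echo $plain, PHP_EOL; // كتب
```

Storing the stripped form in its own column means the work happens once at write time, and queries can compare against an indexed column instead of normalizing every row at read time.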
The package provides the following features:
Analysis and Search
- Generate Arabic-ready database columns and keep searchable variants in sync automatically with Arabicable model and migration setup.
- Run exact, like, and relevance-ranked Arabic matching with search scopes and text processing helpers.
- Build comprehensive query plans using normalization, tokenization, stop-word filtering, stemming, and lexical expansion via text processing helpers and search scopes.
- Control text with or without harakat and diacritics using the ArabicFilter facade, Arabic facade common methods, and CamelTools facade.
- Normalize letters, punctuation, spacing, and keywords using the Arabic facade common methods and CamelTools facade.
- Compile and seed local lexical dictionaries for variants and stop-words using the dictionary workflow.
- Use pure-PHP transliteration, mapping, normalization, dediacritization, and tokenization through the CamelTools facade.
Al-Qur'an
- Query Quran data at both ayah and exact word-occurrence levels with Quran indexing tables.
- Render Quran text with DigitalKhatt-compatible fonts while searching normalized fields via DigitalKhatt / Quran font setup.
Date
- Convert dates between Gregorian and Hijri calendars with Gregorian/Hijri date helpers and Arabic facade common methods.
Numbers
- Convert numerals between ASCII digits (123) and Arabic-Indic digits (١٢٣) from backend and browser using Arabic facade common methods and the JavaScript number helper.
Voice
- Add browser speech-to-text, text-to-speech, and voice transforms with the RunAnywhere STT/TTS companion.
Installation
- Install via Composer:
composer require goodm4ven/arabicable
- Run installer:
php artisan arabicable:install --seed
This publishes the config and migrations, runs the migrations (with a prompt unless --testing is passed), and imports dictionaries when --seed is provided.
- Publish the assets:
php artisan vendor:publish --tag=arabicable-assets
The raw data path has defaults that you can customize:
- arabicable.raw_data_path is auto-resolved from vendor/goodm4ven/arabicable/resources/raw-data, then resources/raw-data.
- You can publish package raw datasets to your app with:
php artisan vendor:publish --tag="arabicable-raw-data" --force
Upgrading
Refresh published package files:
php artisan vendor:publish --tag="arabicable-config" --force
php artisan vendor:publish --tag="arabicable-migrations" --force
php artisan vendor:publish --tag="arabicable-raw-data" --force
php artisan migrate
If local dictionary sources changed:
php artisan arabicable:compile-data
php artisan arabicable:seed --all --truncate
Usage
Arabicable Model & Migration
- Add an Arabicable migration macro:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::create('notes', function (Blueprint $table): void {
    $table->id();
    $table->arabicText('content');
    $table->timestamps();
});
- Use the Arabicable trait on your model:
use GoodMaven\Arabicable\Traits\Arabicable;
use Illuminate\Database\Eloquent\Model;

class Note extends Model
{
    use Arabicable;

    protected $fillable = ['content'];
}
Observer-managed columns are now updated automatically when content changes:
- $note->content
- $note->content_with_harakat
- $note->content_searchable
- $note->content_stemmed
If using Spatie Translatable, migration macros accept ?bool $isTranslatable:
$table->arabicText('content', isTranslatable: true);
Note
Arabicable trait also provides getSearchableTranslations() for flattened keys like content_searchable_ar, content_searchable_en, etc.
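A minimal usage sketch of the setup above, assuming the Note model lives in `App\Models` (the namespace is an assumption) and the `notes` migration has been run. It only reads the columns the package is described as maintaining; the actual values depend on Arabicable's filters, so none are shown.

```php
<?php

use App\Models\Note; // assumed namespace for the Note model defined above

// Saving the model lets the package observer fill the variant columns.
$note = Note::create(['content' => 'العِلْمُ نُورٌ']);

// Reload from the database so we read what was actually persisted.
$fresh = $note->fresh();

if ($fresh !== null) {
    dump([
        'content'      => $fresh->content,
        'with_harakat' => $fresh->content_with_harakat,
        'searchable'   => $fresh->content_searchable,
        'stemmed'      => $fresh->content_stemmed,
    ]);
}
```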
Search Scopes
The Arabicable trait includes:
- scopeSearchArabic
- scopeWhereArabicLike
- scopeSearchArabicComprehensive
- scopeWhereArabicComprehensive
- scopeOrderByArabicRelevance
Example:
Post::query()
    ->searchArabic('content', $query)
    ->limit(20)
    ->get();
Comprehensive mode builds terms from:
- normalized query text
- tokens
- stop-word removal
- stems
- lexical variants (roots, stems, original_words, or all)
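The first three steps of that list can be sketched in plain PHP. This is an illustration under stated assumptions, not Arabicable's plan builder: the function name `build_terms_sketch`, the tiny stop-word list, and the alef-only normalization are all hypothetical, and stemming and lexical expansion are left out entirely.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: normalize, tokenize, and drop stop words.
 *
 * @param  list<string> $stopWords already-normalized stop words
 * @return list<string>
 */
function build_terms_sketch(string $query, array $stopWords): array
{
    // 1. Normalize: drop harakat and unify hamzated alef forms to bare alef.
    $normalized = preg_replace('/[\x{064B}-\x{0652}]/u', '', $query) ?? $query;
    $normalized = strtr($normalized, ['أ' => 'ا', 'إ' => 'ا', 'آ' => 'ا']);

    // 2. Tokenize on runs of whitespace.
    $tokens = preg_split('/\s+/u', trim($normalized), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // 3. Remove stop words, then de-duplicate while keeping first-seen order.
    $terms = array_filter(
        $tokens,
        fn (string $token): bool => ! in_array($token, $stopWords, true),
    );

    return array_values(array_unique($terms));
}

$terms = build_terms_sketch('كَتَبَ أحمد في الدفتر', ['في', 'من']);
print_r($terms); // three terms remain: the verb, the name, and the noun
```

Each surviving term would then be a candidate for stemming and variant expansion before the query is built.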
Text Processing Helpers
use GoodMaven\Arabicable\Facades\Arabic;
use GoodMaven\Arabicable\Facades\ArabicFilter;

$searchable = ArabicFilter::forSearch($text);
$stemmed = ArabicFilter::forStem($text);
$withoutHarakat = ArabicFilter::withoutHarakat($text);

$clean = Arabic::stripWeirdCharacters($text, keepHarakat: true, keepPunctuation: true);
$keywords = Arabic::extractKeywords($text);
$plan = Arabic::buildComprehensiveSearchPlan($query);
$variants = Arabic::expandWordVariants($tokens, mode: 'all', stripStopWords: true);
ArabicFilter Facade
- withHarakat(string $text): string
- withoutHarakat(string $text): string
- withoutDiacritics(string $text, bool $keepShadda = false): string
- forSearch(string $text): string
- forStem(string $text): string
- forMemorizationComparison(string $text, bool $stripCommons = true, bool $stripConnectors = true): string
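One way these signatures might be used is to compare a typed answer against a stored, fully vocalized text. The sketch below relies only on the method names and parameters listed above; how each filter treats a given string is not documented here, so no result is claimed, and the sample strings are illustrative.

```php
<?php

use GoodMaven\Arabicable\Facades\ArabicFilter;

$expected = 'إِنَّمَا الْأَعْمَالُ بِالنِّيَّاتِ'; // stored, vocalized text
$typed    = 'انما الاعمال بالنيات';               // what a user typed

// Keep common phrases in place (stripCommons: false) so neither side is
// reduced by the commons filter before the comparison.
$matches = ArabicFilter::forMemorizationComparison($expected, stripCommons: false)
    === ArabicFilter::forMemorizationComparison($typed, stripCommons: false);

// Named argument from the signature list: drop diacritics but keep shadda.
$withShadda = ArabicFilter::withoutDiacritics($expected, keepShadda: true);
```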
Arabic Facade (Common Methods)
- Harakat/diacritics: removeHarakat, removeDiacritics, addHarakat
- Normalization: normalizeHuroof, normalizeNumeralsForSearch, stripWeirdCharacters
- Keywords/search: tokenize, stemWord, stemWords, removeStopWords, extractKeywords, buildComprehensiveSearchPlan, expandWordVariants
- Commons/cache: removeCommons, clearConceptCache
- Punctuation/spacing: toTightPunctuationStyle, toLoosePunctuationStyle, removeAllPunctuationMarks, normalizeSpaces
- Date conversion: gregorianToHijri, hijriToGregorian
Quran Indexing Tables
Enable Quran features in config when needed:
'features' => [
    'quran' => true,
],
After running migrations, Arabicable creates and imports:
- quran_verses: one row per ayah with surah_number, ayah_number, ayah_index, text_uthmani, text_searchable, text_sanitized, text_without_harakat, text_without_diacritics, text_normalized_huroof.
- quran_words: one row per word occurrence with verse_id, word_position, global_word_index, token_uthmani, token_sanitized, token_searchable, token_without_harakat, token_without_diacritics, token_normalized_huroof, token_stem, token_root, token_lemma.
- quran_verse_explanations: ayah-linked tafsir/i'rab records from SQLite sources (source_key, content_kind, content_html, content_text).
- quran_word_annotations: optional word-level notes/translation payloads linked to exact quran_words rows.
For repeat words, target the occurrence with:
- verse_id + word_position for a stable position inside the ayah.
- global_word_index for a single canonical word pointer across the full Quran.
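Both addressing styles can be sketched with Laravel's query builder against the columns listed above. The table and column names come from this README; the concrete id and position values are assumptions chosen for illustration, and the Quran feature must be enabled and imported for the rows to exist.

```php
<?php

use Illuminate\Support\Facades\DB;

// Example coordinates; these particular values are illustrative assumptions.
$verseId = 1;
$wordPosition = 2;

// Address one occurrence by its position inside a given ayah.
$word = DB::table('quran_words')
    ->where('verse_id', $verseId)
    ->where('word_position', $wordPosition)
    ->first();

if ($word !== null) {
    // Reach the same occurrence again through its Quran-wide pointer.
    $sameWord = DB::table('quran_words')
        ->where('global_word_index', $word->global_word_index)
        ->first();
}
```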
This structure is ready for later tafsir/translation attachments at ayah level or exact word occurrence level.
Default source config keys:
- arabicable.raw_data_path
- arabicable.data_sources.quran_othmani_surahs_dir
- arabicable.data_sources.quran_exegesis_databases_dir
- arabicable.data_sources.quran_layout_databases_dir
- arabicable.data_sources.quran_lexicon_databases_dir
- arabicable.data_sources.quran_fonts_dir
- arabicable.data_sources.quran_surah_headers_fonts_dir
For tafsir / i'rab SQLite data:
- Put files like ar-tafsir-al-tabari.db and al-i-rab-al-muyassar.db in <raw_data_path>/quran/exegesis.
- Keep required exegesis SQLite files inside <raw_data_path>/quran/exegesis (or your configured data source path).
- Explanations are stored for display and retrieval, and are not part of Arabicable search indexing.
Gregorian/Hijri Date Helpers
use GoodMaven\Arabicable\Facades\Arabic;

$hijri = Arabic::gregorianToHijri(2025, 1, 1); // ['year' => ..., 'month' => ..., 'day' => ...]
$gregorian = Arabic::hijriToGregorian($hijri['year'], $hijri['month'], $hijri['day']);
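The returned array shape lends itself to a small display formatter. The sketch below is a hypothetical helper, not part of Arabicable; the month names are common English transliterations, and the sample date is typed by hand rather than produced by the package.

```php
<?php
declare(strict_types=1);

// Common English transliterations of the Hijri month names, keyed 1..12.
const HIJRI_MONTHS_SKETCH = [
    1 => 'Muharram', 2 => 'Safar', 3 => "Rabi' al-Awwal", 4 => "Rabi' al-Thani",
    5 => 'Jumada al-Ula', 6 => 'Jumada al-Akhirah', 7 => 'Rajab', 8 => "Sha'ban",
    9 => 'Ramadan', 10 => 'Shawwal', 11 => "Dhu al-Qa'dah", 12 => 'Dhu al-Hijjah',
];

/**
 * Hypothetical formatter for the ['year', 'month', 'day'] array shape.
 *
 * @param array{year: int, month: int, day: int} $hijri
 */
function format_hijri_sketch(array $hijri): string
{
    $month = HIJRI_MONTHS_SKETCH[$hijri['month']]
        ?? throw new InvalidArgumentException('Month must be between 1 and 12.');

    return sprintf('%d %s %d AH', $hijri['day'], $month, $hijri['year']);
}

// Sample values typed by hand, not computed by the package.
echo format_hijri_sketch(['year' => 1446, 'month' => 7, 'day' => 1]), PHP_EOL;
// prints "1 Rajab 1446 AH"
```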
JavaScript Number Helper
Published asset: public/vendor/arabicable/arabicable.js
window.ArabicableNumbers.toArabicIndic('123'); // "١٢٣"
window.ArabicableNumbers.toAscii('١٢٣'); // "123"
window.ArabicableNumbers.normalizeForBackendSearch('رقم ١٢٣', 'arabic'); // "رقم 123"
window.ArabicableNumbers.normalizeForBackendSearch('رقم 123', 'indian'); // "رقم ١٢٣"
window.ArabicableNumbers.normalizeForBackendSearch('123', 'both'); // "123 ١٢٣"
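The digit mapping behind helpers like these is a one-to-one substitution, which can be sketched in plain PHP. The function and constant names below are hypothetical and this is not the package's backend implementation; for real backend use, the Arabic facade methods listed earlier are the documented route.

```php
<?php
declare(strict_types=1);

const ASCII_DIGITS_SKETCH = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
const ARABIC_INDIC_DIGITS_SKETCH = ['٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩'];

// Hypothetical helper: ASCII digits to Arabic-Indic digits (U+0660..U+0669).
function to_arabic_indic_sketch(string $text): string
{
    return str_replace(ASCII_DIGITS_SKETCH, ARABIC_INDIC_DIGITS_SKETCH, $text);
}

// Hypothetical helper: Arabic-Indic digits back to ASCII digits.
function to_ascii_sketch(string $text): string
{
    return str_replace(ARABIC_INDIC_DIGITS_SKETCH, ASCII_DIGITS_SKETCH, $text);
}

echo to_arabic_indic_sketch('123'), PHP_EOL; // ١٢٣
echo to_ascii_sketch(to_arabic_indic_sketch('2025')), PHP_EOL; // 2025
```

Because the replacement works on whole UTF-8 sequences, non-digit text around the numbers passes through unchanged.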
DigitalKhatt / Quran Font Setup
Arabicable can be paired with DigitalKhatt-style Quran rendering in your app frontend.
- Publish package assets (and optionally Quran raw-data files if you want app-local copies):
php artisan vendor:publish --tag=arabicable-assets --force
php artisan vendor:publish --tag=arabicable-raw-data --force
- Use the included Quran font file (published under public/vendor/arabicable/madina.woff2), or replace it with your preferred DigitalKhatt-compatible font build.
  Surah header fonts are also bundled and available in both locations:
  - resources/raw-data/quran/fonts/surah-headers/QCF_SurahHeader_COLOR-Regular.woff2
  - resources/raw-data/quran/fonts/surah-headers/surah-name-v2.woff2
  - public/vendor/arabicable/QCF_SurahHeader_COLOR-Regular.woff2
  - public/vendor/arabicable/surah-name-v2.woff2
- Define your Quran text class:
@font-face {
    font-family: 'MadinaQuran';
    src: url('/vendor/arabicable/madina.woff2') format('woff2');
    font-display: swap;
}

.font-quran {
    font-family: 'MadinaQuran', 'Amiri', serif;
}

.font-quran-surah-header {
    font-family: 'QcfSurahHeaderColor', 'SurahNameV2', 'MadinaQuran', 'Amiri', serif;
}
- Render Uthmani text with .font-quran, while using text_searchable / token_searchable fields for search queries.
- Optional package config for surah header font selection:
'quran_fonts' => [
    'surah_headers' => [
        'preferred' => 'qcf-surah-header-color-regular',
        'available' => [
            'qcf-surah-header-color-regular' => [
                'family' => 'QcfSurahHeaderColor',
                'filename' => 'QCF_SurahHeader_COLOR-Regular.woff2',
                'format' => 'woff2',
            ],
            'surah-name-v2' => [
                'family' => 'SurahNameV2',
                'filename' => 'surah-name-v2.woff2',
                'format' => 'woff2',
            ],
        ],
    ],
],
If you integrate the external digitalkhatt.js stack, keep Arabicable as the search/index layer and use DigitalKhatt purely for display shaping.
RunAnywhere STT/TTS Companion
Install frontend packages:
npm install @runanywhere/web @runanywhere/web-onnx
Initialize and register your bridge:
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';
import { ONNX } from '@runanywhere/web-onnx';

await RunAnywhere.initialize({ environment: SDKEnvironment.Production, debug: false });
await ONNX.register();

window.ArabicableRunAnywhere.setBridge({
    async speechToText(audioInput, options = {}) {
        // Return string or object containing transcript text
        return { text: '' };
    },
    async textToSpeech(text, options = {}) {
        // Return Float32Array, Blob, or { audio/blob/url, sampleRate }
        return { audio: new Float32Array(), sampleRate: 24000 };
    },
});
Bridge API methods:
- speechToText(audioInput, options)
- textToSpeech(text, options)
- voiceToText(audioInput, options)
- textToVoice(text, options)
- voiceToVoice(audioInput, options)
- transformArabic({ text, audio, target }, options)
- decodeAudioBlob(blob, targetSampleRate?)
- playFloat32Audio(audio, sampleRate?)
- playAudio(result)
Dictionary Workflow (arabicable.raw_data_path)
compiled-* files are runtime dictionaries. source-* files are local raw assets.
Compile:
php artisan arabicable:compile-data
php artisan arabicable:compile-data --raw-data-path=/absolute/path/to/raw-data
php artisan arabicable:compile-data --without-extra-stopwords
Seed DB dictionaries:
php artisan arabicable:seed --all
php artisan arabicable:seed --common-texts --stop-words
php artisan arabicable:seed --all --truncate
Current DB imports:
- common_arabic_texts
- arabic_stop_words
Current file-backed lexical expansion sources:
- <raw_data_path>/verbs/compiled-word-variants.tsv
- <raw_data_path>/quran/compiled-quran-word-index.tsv
All source paths are configurable via config/arabicable.php using:
- arabicable.raw_data_path (global base path)
- arabicable.data_sources.* (per-file/per-directory overrides)
API
Validation Rules
- GoodMaven\Arabicable\Rules\Arabic
- GoodMaven\Arabicable\Rules\ArabicWithSpecialCharacters
- GoodMaven\Arabicable\Rules\UncommonArabic
- GoodMaven\Arabicable\Rules\UniqueArabicWithSpecialCharacters
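A hedged sketch of attaching one of these rules to a validator. It assumes the rule class can be constructed without arguments and implements Laravel's validation rule contract; neither detail is stated in this README, so check the class before relying on it. The field name and sample value are illustrative.

```php
<?php

use GoodMaven\Arabicable\Rules\Arabic;
use Illuminate\Support\Facades\Validator;

$validator = Validator::make(
    ['title' => 'مرحبا بالعالم'],
    // Assumption: the rule takes no constructor arguments.
    ['title' => ['required', 'string', new Arabic()]],
);

if ($validator->fails()) {
    // Collect the first message per field for display.
    $errors = $validator->errors()->all();
}
```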
Artisan Commands
| Command | Purpose |
|---|---|
| arabicable:install | Publish config/migrations, migrate, optional seed (--testing, --seed) |
| arabicable:compile-data | Compile local datasets (--raw-data-path, --without-extra-stopwords) |
| arabicable:seed | Import configured dictionaries (--all, --common-texts, --stop-words, --truncate) |
Migration Macros
| Macro | Purpose |
|---|---|
| indianDate($columnName, $isNullable = false, $isUnique = false) | Creates date column and <column>_indian |
| arabicString($columnName, $length = 255, $isNullable = false, $isUnique = false, $supportsFullSearch = false, $isTranslatable = null) | String + Arabicable variant columns |
| arabicTinyText($columnName, $isNullable = false, $isUnique = false, $supportsFullSearch = false, $isTranslatable = null) | TinyText + variants |
| arabicText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | Text + variants |
| arabicMediumText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | MediumText + variants |
| arabicLongText($columnName, $isNullable = false, $isUnique = false, $isTranslatable = null) | LongText + variants |
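The macro signatures above can be combined in one migration. The sketch below uses only the parameter names from the table, passed as named arguments in the same style as the isTranslatable example earlier; the table and column names are illustrative assumptions.

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Illustrative table; 'articles' and its column names are assumptions.
Schema::create('articles', function (Blueprint $table): void {
    $table->id();

    // Short, unique title with the variant columns and full-search support.
    $table->arabicString('title', length: 160, isUnique: true, supportsFullSearch: true);

    // Optional long body with its variant columns.
    $table->arabicLongText('body', isNullable: true);

    // Date column plus its <column>_indian companion.
    $table->indianDate('published_on', isNullable: true);

    $table->timestamps();
});
```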
Global Functions
- ar_indian(string $property): string
- ar_with_harakat(string $property): string
- ar_searchable(string $property): string
- ar_stem(string $property): string
- ar_expand_variants(...)
- arabicable_special_characters(...)
- camel_tools() and camel_* helpers
CamelTools Facade (Pure-PHP Utility Port)
Key utilities:
- Builtin mapping/transliteration: mapWithBuiltin, transliterateWithBuiltin, arclean
- Unicode/orthographic normalization: normalizeUnicode, normalizeAlef*, normalizeAlefMaksura*, normalizeTehMarbuta*, normalizeOrthography
- Dediacritization: dediac*
- Tokenization: simpleWordTokenize
Contribution
- Always target the dev branch for your PRs.
License
This package is open-sourced software licensed under the MIT license.
Credits
- Youssif Shaaban Alsager (yshalsager)
- Linuxscout
- CAMeL Tools
- ar-php
- Qul by Tarteel
- Nuqaya
And all praise is due to Allah, Lord of the worlds