octoberfa / virastar
Virastar is a Persian text cleaner.
pkg:composer/octoberfa/virastar
Requires
- php: >=7.1
Requires (Dev)
- phpunit/phpunit: 6.*
This package is auto-updated.
Last update: 2025-10-10 22:57:43 UTC
README
A PHP port of juvee/virastar.
Install
composer require octoberfa/virastar
Usage
require "./vendor/autoload.php";
echo virastar("فارسي را كمی درست تر می نويسيم");
// Outputs: "فارسی را کمی درست‌تر می‌نویسیم"
virastar([text] [,options])
text
Type: string
String of Persian source to be cleaned.
options
Type: array
virastar("سلام 123", ["fix_english_numbers" => false]);
// Outputs: "سلام 123"
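Because every option is enabled by default, the options array only needs to name the options you want to switch off. The sketch below uses a small helper, `virastar_disable`, which is hypothetical and not part of the package, to build such an array. The call to `virastar()` is guarded so the snippet also runs when the package is not installed.

```php
<?php
// Hypothetical helper (NOT part of octoberfa/virastar): builds an options
// array that sets each named option to false and leaves the rest untouched.
function virastar_disable(string ...$names): array
{
    return array_fill_keys($names, false);
}

// Option names are taken from the list in this README.
$options = virastar_disable('fix_english_numbers', 'cleanup_kashidas');

// Only call the cleaner when the package has been loaded through Composer.
if (function_exists('virastar')) {
    echo virastar("سلام 123", $options), "\n";
}
```

Passing the array literal directly, as in the example above, works just as well; the helper only saves typing when several options are disabled at once.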
Options and Specifications
Virastar comes with a list of options to control its behavior.
All options are enabled by default.
- normalize_eol - replace Windows end of lines with Unix EOL (\n)
- decode_htmlentities - convert all HTML entities into their original characters
- fix_dashes - replace double dash with en dash and triple dash with em dash
- fix_three_dots - replace three dots with an ellipsis
- fix_english_quotes_pairs - replace English quote pairs (“”) with their Persian equivalent («»)
- fix_english_quotes - replace English quotes, commas and semicolons with their Persian equivalents
- fix_hamzeh - convert ه ی to هٔ
- cleanup_rlm - convert right-to-left marks followed by Persian characters to zero-width non-joiners (ZWNJ)
- cleanup_zwnj
  - remove repeated ZWNJ characters
  - remove unnecessary ZWNJ characters that are preceded or followed by a space
  - remove ZWNJ characters after Persian characters that don't connect to the next letter
  - remove ZWNJ characters before English characters
  - remove ZWNJ characters before and after punctuation
- fix_arabic_numbers - replace Arabic numbers with their Persian equivalents
- fix_english_numbers
  - replace English numbers with their Persian equivalents
  - do not replace English numbers inside English phrases
- skip_markdown_ordered_lists_numbers_conversion - skip converting the English numbers of ordered lists in Markdown
- fix_misc_non_persian_chars - replace Arabic kaf and yeh with their Persian equivalents
- fix_question_mark - replace question marks with their Persian equivalent
- fix_perfix_spacing - put a ZWNJ between a word and its prefix (mi*, nemi*)
- fix_suffix_spacing - put a ZWNJ between a word and its suffix (*tar, *tarin, *ha, *haye)
- fix_spacing_for_braces_and_quotes
  - fix spacing for () [] {} “” «» (one space outside, no space inside)
  - correct :;,.?! spacing (one space after and no space before)
- cleanup_spacing - replace more than one space with a single one
- cleanup_begin_and_end - remove spaces, tabs, and new lines from the beginning and end of the text
- cleanup_extra_marks - replace more than one ! or ? mark with just one
- kashidas_as_parenthetic - replace kashidas with en dashes in parenthetic use
- cleanup_kashidas - remove all kashidas
- preserve_HTML - preserve all HTML tags
- preserve_URIs - preserve all URI links in the text
- preserve_brackets - preserve strings inside square brackets ([])
- preserve_braces - preserve strings inside curly braces ({})
- preserve_code - preserve strings inside the HTML code tag and Markdown "```"
- preserve_pre - preserve strings inside the HTML pre tag
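To make a few of these options more concrete, here is a minimal standalone sketch of the kind of substitution that fix_english_numbers, fix_arabic_numbers, fix_misc_non_persian_chars, fix_dashes, fix_three_dots and fix_perfix_spacing describe. It is an illustration under simplifying assumptions and is not the package's implementation: all `sketch_*` function names are made up here, the number conversion ignores the "English phrases" exception, and the prefix rule will also match a standalone word that happens to be spelled like the prefix.

```php
<?php
// Illustrative sketch only; NOT the code used by octoberfa/virastar.

/** Replace ASCII and Arabic-Indic digits with Persian digits. */
function sketch_fix_numbers(string $text): string
{
    $english = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    $arabic  = ['٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩'];
    $persian = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
    return str_replace($arabic, $persian, str_replace($english, $persian, $text));
}

/** Replace Arabic kaf (U+0643) and yeh (U+064A) with their Persian forms. */
function sketch_fix_kaf_yeh(string $text): string
{
    return str_replace(['ك', 'ي'], ['ک', 'ی'], $text);
}

/** Triple dash to em dash, double dash to en dash, three dots to ellipsis. */
function sketch_fix_punctuation(string $text): string
{
    // Order matters: '---' must be handled before '--'.
    return str_replace(['---', '--', '...'], ['—', '–', '…'], $text);
}

/** Join the prefixes "می" / "نمی" to the next word with a ZWNJ (U+200C). */
function sketch_fix_prefix_spacing(string $text): string
{
    $result = preg_replace('/(?<!\p{L})(ن?می) (?=\p{L})/u', "$1\u{200C}", $text);
    return $result ?? $text; // preg_replace returns null on invalid input
}

echo sketch_fix_kaf_yeh('كتاب علي'), "\n";
echo sketch_fix_numbers('12 ٣٤'), "\n";
echo sketch_fix_punctuation('a -- b --- c...'), "\n";
echo sketch_fix_prefix_spacing('می نویسیم'), "\n";
```

In the real package these steps are applied together and in a fixed order by `virastar()`, and each one can be turned off through the options array.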
License
This software is licensed under the MIT License.