Parsing Arabic text in Dart. A (partial) port of PyArabic: http://github.com/linuxscout/pyarabic
Import 'package:dartarabic/dartarabic.dart' and access the static members of the DartArabic class and the Arabic class.
Get Arabic characters as strings through the Arabic class and Arabic.Symbols. For example:
print(Arabic.ALEF);
print(Arabic.BEH);
print(Arabic.TEH);
print(Arabic.Symbols.QUESTION);
print(Arabic.Symbols.SEMICOLON);
print(Arabic.Symbols.SHADDA);
Outputs:
ا
ب
ت
؟
؛
ّ
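The last output line looks empty or detached because SHADDA is a combining mark with no base letter to sit on. A minimal sketch that prints the Unicode code point of each character shown above; the lowercase constants and the `codePoint` helper are hypothetical local definitions for illustration, not part of the package:

```dart
// Hypothetical local constants; the package exposes its own values
// through Arabic and Arabic.Symbols.
const alef = '\u0627'; // ALEF
const beh = '\u0628'; // BEH
const teh = '\u062A'; // TEH
const question = '\u061F'; // ARABIC QUESTION MARK
const semicolon = '\u061B'; // ARABIC SEMICOLON
const shadda = '\u0651'; // SHADDA (combining mark)

/// Formats the first code point of [s] as U+XXXX.
String codePoint(String s) =>
    'U+${s.runes.first.toRadixString(16).toUpperCase().padLeft(4, '0')}';

void main() {
  for (final c in [alef, beh, teh, question, semicolon, shadda]) {
    print('$c ${codePoint(c)}');
  }
}
```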
Some helper lists and maps are also provided:
print(Arabic.NAMES);
Output:
{ا: ألف, ب: باء, ت: تاء, ة: تاء مربوطة, ث: ثاء, ج: جيم, ح: حاء, خ: خاء, د: دال, ذ: ذال, ر: راء, ز: زاي, س: سين, ش: شين, ص: صاد, ض: ضاد, ط: طاء, ظ: ظاء, ع: عين, غ: غين, ف: فاء, ق: قاف, ك: كاف, ل: لام, م: ميم, ن: نون, ه: هاء, و: واو, ي: ياء, ء: همزة, ـ: تطويل, آ: ألف ممدودة, ى: ألف مقصورة, أ: همزة على الألف, ؤ: همزة على الواو, إ: همزة تحت الألف, ئ: همزة على الياء, ً: فتحتان, ٌ: ضمتان, ٍ: كسرتان, َ: فتحة, ُ: ضمة, ِ: كسرة, ّ: شدة, ْ: سكون}
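A map of this shape can be used to spell out the characters of a word. The sketch below uses a small hypothetical excerpt (`names`) copied from the output above rather than the package's own map, and `describe` is an illustrative helper name:

```dart
// Hypothetical three-entry excerpt, copied from the printed map above.
const names = <String, String>{
  '\u0627': 'ألف', // ALEF
  '\u0628': 'باء', // BEH
  '\u0651': 'شدة', // SHADDA
};

/// Returns the name of every character of [word] that has an entry in [names].
List<String> describe(String word) {
  final result = <String>[];
  for (final rune in word.runes) {
    final name = names[String.fromCharCode(rune)];
    if (name != null) result.add(name);
  }
  return result;
}

void main() {
  // BEH + SHADDA + ALEF
  print(describe('\u0628\u0651\u0627'));
}
```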
See the API documentation for all available fields.
See all string literals in the DartArabic Showcase.
- stripHarakat
- stripTashkeel
- stripDiacritics
- stripTatweel
- stripShadda
- normalizeLigature
- normalizeHamzaUniform
- normalizeHamzaTasheel
- normalizeAlef
- normalizeLetters
stripHarakat strips the harakat from an Arabic word, except Shadda. The stripped marks are:
- FATHA, DAMMA, KASRA
- SUKUN
- FATHATAN, DAMMATAN, KASRATAN
Example:
print(DartArabic.stripHarakat("الْعَرَبِيّةُ"));
Outputs: العربيّة
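To make the rule concrete, here is a minimal sketch of the same idea with a regular expression over the Unicode code points of the listed marks. It is not the package's implementation; `harakatPattern` and `stripHarakatSketch` are hypothetical names:

```dart
// FATHATAN..KASRA are U+064B..U+0650 and SUKUN is U+0652.
// SHADDA (U+0651) is deliberately left out of the character class.
final harakatPattern = RegExp(r'[\u064B-\u0650\u0652]');

/// Removes the short-vowel marks but keeps SHADDA.
String stripHarakatSketch(String text) => text.replaceAll(harakatPattern, '');

void main() {
  print(stripHarakatSketch('الْعَرَبِيّةُ'));
}
```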
stripTashkeel strips the vowel marks from a text, including Shadda. The stripped marks are:
- FATHA, DAMMA, KASRA
- SUKUN
- SHADDA
- FATHATAN, DAMMATAN, KASRATAN
Example:
print(DartArabic.stripTashkeel("الْعَرَبِيّةُُ"));
Outputs: العربية
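The list above maps to eight consecutive code points. As a sketch of an alternative to a regular expression, the marks can be kept in a set and filtered out code point by code point. The names `tashkeel` and `stripTashkeelSketch` are hypothetical and the package may work differently:

```dart
// The eight marks listed above, by Unicode code point.
const tashkeel = <int>{
  0x064B, // FATHATAN
  0x064C, // DAMMATAN
  0x064D, // KASRATAN
  0x064E, // FATHA
  0x064F, // DAMMA
  0x0650, // KASRA
  0x0651, // SHADDA
  0x0652, // SUKUN
};

/// Drops every code point of [text] that is in [tashkeel].
String stripTashkeelSketch(String text) =>
    String.fromCharCodes(text.runes.where((r) => !tashkeel.contains(r)));

void main() {
  print(stripTashkeelSketch('الْعَرَبِيّةُُ'));
}
```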
stripDiacritics strips Arabic diacritics from a text. The stripped marks are:
- Small Alef
- Harakat + Shadda
- Quranic marks
- Extended arabic diacritics
Example:
print(DartArabic.stripDiacritics("الْعَرَبِيّةُُ"));
Outputs: العربية
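The exact set of code points behind each category is not listed here. The sketch below assumes the combining marks of the Unicode Arabic block (U+064B to U+065F, U+0670, and the Quranic annotation marks between U+06D6 and U+06ED); both the ranges and the names `diacritics` and `stripDiacriticsSketch` are assumptions for illustration:

```dart
// Assumed ranges, built from adjacent raw string literals.
final diacritics = RegExp(
  r'['
  r'\u064B-\u0652' // harakat and shadda
  r'\u0653-\u065F' // extended Arabic marks
  r'\u0670' // superscript (small) alef
  r'\u06D6-\u06DC\u06DF-\u06E4\u06E7\u06E8\u06EA-\u06ED' // Quranic marks
  r']',
);

/// Removes every code point matched by [diacritics].
String stripDiacriticsSketch(String text) => text.replaceAll(diacritics, '');

void main() {
  print(stripDiacriticsSketch('الْعَرَبِيّةُُ'));
}
```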
stripTatweel strips tatweel from a text and returns the result. Example:
print(DartArabic.stripTatweel("العـــــربيةُ"));
Outputs: العربيةُ
stripShadda strips Shadda from a text and returns the result.
Example:
print(DartArabic.stripShadda("الشّمسيّة"));
Outputs: الشمسية
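Stripping tatweel and stripping Shadda each remove a single code point (U+0640 and U+0651 respectively), so a plain string replacement is enough to sketch both. The function names below are hypothetical and do not come from the package:

```dart
const tatweel = '\u0640'; // ARABIC TATWEEL
const shadda = '\u0651'; // ARABIC SHADDA

/// Removes every tatweel from [text].
String stripTatweelSketch(String text) => text.replaceAll(tatweel, '');

/// Removes every shadda from [text].
String stripShaddaSketch(String text) => text.replaceAll(shadda, '');

void main() {
  print(stripTatweelSketch('العـــــربيةُ'));
  print(stripShaddaSketch('الشّمسيّة'));
}
```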
normalizeLigature normalizes Lam Alef ligatures into two letters (LAM and ALEF) and returns the result. Some systems present the Lam Alef ligature as a single character; this function converts it into two letters. The ligatures converted into LAM and ALEF are:
- LAM_ALEF
- LAM_ALEF_HAMZA_ABOVE
- LAM_ALEF_HAMZA_BELOW
- LAM_ALEF_MADDA_ABOVE
Example:
print(DartArabic.normalizeLigature("ﻻنحالي ﻷﻹﻵ"));
Outputs: لانحالي لالالا
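In Unicode, the four ligatures occupy U+FEF5 to U+FEFC (an isolated and a final form each). A minimal sketch of the conversion follows, assuming, as the documented output suggests, that the hamza and madda variants also become plain LAM + ALEF. The names are hypothetical:

```dart
// U+FEF5..U+FEFC: isolated and final forms of the four LAM-ALEF ligatures.
final lamAlefLigatures = RegExp(r'[\uFEF5-\uFEFC]');

const lam = '\u0644';
const alef = '\u0627';

/// Replaces each ligature with the two letters LAM and ALEF.
String normalizeLigatureSketch(String text) =>
    text.replaceAll(lamAlefLigatures, '$lam$alef');

void main() {
  print(normalizeLigatureSketch('ﻻنحالي ﻷﻹﻵ'));
}
```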
normalizeHamzaUniform standardizes the Hamzat into one form of hamza (uniform method) and replaces Madda by hamza and alef. It also replaces the LamAlefs by simplified letters.
Example:
print(DartArabic.normalizeHamzaUniform("جاء سؤال الأئمة عن الإسلام آجلا"));
Outputs: جاء سءال الءءمة عن الءسلام ءءجلا
normalizeHamzaTasheel standardizes the Hamzat into one form of hamza (Tasheel method) and replaces Madda by hamza and alef. It also replaces the LamAlefs by simplified letters.
Example:
print(DartArabic.normalizeHamzaTasheel("جاء سؤال الأئمة عن الإسلام آجلا"));
Outputs: جاء سوال الايمة عن الاسلام اجلا
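Judging by the two documented outputs, the uniform and Tasheel methods differ only in what each hamza carrier is replaced with. The sketch below encodes that difference as two replacement tables inferred from those outputs. The tables and `applyTable` are hypothetical, and the Lam Alef handling is left out:

```dart
// Tables inferred from the documented outputs; not taken from the package.
const uniformTable = <String, String>{
  '\u0622': '\u0621\u0621', // ALEF WITH MADDA ABOVE -> HAMZA HAMZA
  '\u0623': '\u0621', // ALEF WITH HAMZA ABOVE -> HAMZA
  '\u0624': '\u0621', // WAW WITH HAMZA ABOVE -> HAMZA
  '\u0625': '\u0621', // ALEF WITH HAMZA BELOW -> HAMZA
  '\u0626': '\u0621', // YEH WITH HAMZA ABOVE -> HAMZA
};

const tasheelTable = <String, String>{
  '\u0622': '\u0627', // ALEF WITH MADDA ABOVE -> ALEF
  '\u0623': '\u0627', // ALEF WITH HAMZA ABOVE -> ALEF
  '\u0624': '\u0648', // WAW WITH HAMZA ABOVE -> WAW
  '\u0625': '\u0627', // ALEF WITH HAMZA BELOW -> ALEF
  '\u0626': '\u064A', // YEH WITH HAMZA ABOVE -> YEH
};

/// Replaces every hamza carrier (U+0622..U+0626) using [table].
String applyTable(String text, Map<String, String> table) =>
    text.replaceAllMapped(
        RegExp(r'[\u0622-\u0626]'), (m) => table[m[0]] ?? m[0]!);

void main() {
  const sample = 'جاء سؤال الأئمة عن الإسلام آجلا';
  print(applyTable(sample, uniformTable));
  print(applyTable(sample, tasheelTable));
}
```

A standalone hamza (U+0621) falls outside the matched range, so it is left untouched by both tables.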
normalizeAlef converts all alefs to ALEF_MAMDODA, with the exception of Alef maksura.
Example:
print(DartArabic.normalizeAlef("بِٱلْهُدَىٰ"));
Outputs: بِالْهُدَا
normalizeLetters converts non-standard letter characters (presentation forms) to standard single letters, e.g. HEH_START ﻫ is converted to ه.
Example:
print(DartArabic.normalizeLetters("ﻫﻞ"));
Outputs: هل
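Positional variants such as ﻫ live in the Unicode Arabic Presentation Forms-B block, where each letter has up to four forms (isolated, final, initial, medial). The sketch below maps them back with a lookup table. The table is a hypothetical partial excerpt that covers only HEH and LAM, and the names are not from the package:

```dart
// Partial, hypothetical table: presentation forms of HEH and LAM only.
const presentationForms = <int, int>{
  0xFEE9: 0x0647, // HEH isolated form
  0xFEEA: 0x0647, // HEH final form
  0xFEEB: 0x0647, // HEH initial form
  0xFEEC: 0x0647, // HEH medial form
  0xFEDD: 0x0644, // LAM isolated form
  0xFEDE: 0x0644, // LAM final form
  0xFEDF: 0x0644, // LAM initial form
  0xFEE0: 0x0644, // LAM medial form
};

/// Maps each known presentation form to its base letter, leaving
/// every other code point unchanged.
String normalizeLettersSketch(String text) =>
    String.fromCharCodes(text.runes.map((r) => presentationForms[r] ?? r));

void main() {
  print(normalizeLettersSketch('\uFEEB\uFEDE'));
}
```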