Skip to Main Content
This paper introduces an Arabic text annotation tool called Fassiehreg. Via a sophisticated interactive GUI application, Fassiehreg makes it easy to build structured large standard written Arabic corpora, then allows the production of fundamental linguistic analyses; i.e., language factorizations, at high coverage and accuracy rates over such corpora. Arabic morphological analysis, part-of-speech (PoS)-tagging, full phonetic transcription (diacritization), and lexical semantics analysis are the most significant Arabic language factorizations currently supported by Fassiehreg. The high inherent ambiguity of these analyses is statistically resolved in Fassiehreg which also affords a multitude of auxiliary features enabling a guided, normalized, and efficient proofreading of any part of the factorized corpus. The paper first reviews the highly inflective and derivative nature of Arabic language, our Arabic language factorization models, and the associated statistical disambiguation methodology. Afterwards, we present Fassiehreg which is not only a text annotation tool, but is also an evaluation, demonstrative, and tutorial means of Arabic natural language processing (NLP).