Preprocessing is the most important stage in an Arabic OCR system because it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Arabic script is cursive, and each character takes between two and four shapes depending on its position in the word. An Arabic word typically consists of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main tasks in preprocessing for Arabic OCR, since the baseline can be used for both skew normalization and character segmentation. This paper lists and clarifies the challenges faced by Arabic baseline detection methods and gives a brief comparison of these methods. The comparison considers the nature of written Arabic, diacritics such as dots and zigzags, word slope, and the presence of subwords.
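To make the idea of a baseline concrete, here is a minimal sketch of one widely used approach, the horizontal projection profile. It is not necessarily any of the specific methods compared in this paper. The sketch assumes a binary image of a single, unskewed text line stored as a list of rows, with 1 marking ink and 0 marking background. Because Arabic characters join along the baseline, the row with the most ink pixels is taken as the baseline estimate. The function names `horizontal_projection` and `detect_baseline` and the sample `word` image are hypothetical and are used only for illustration.

```python
def horizontal_projection(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def detect_baseline(image):
    """Estimate the baseline row of one unskewed text line (hypothetical helper).

    Assumes `image` is a list of rows of 0/1 values, where 1 is ink.
    Returns the index of the row with the highest pixel count; on ties,
    the topmost such row is returned.
    """
    profile = horizontal_projection(image)
    if not profile:
        raise ValueError("empty image")
    # list.index returns the first (topmost) row holding the maximum count
    return profile.index(max(profile))


# Toy 4x8 "word": two vertical strokes joined by a horizontal connection in row 2,
# with a small descender below it in row 3.
word = [
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

print(horizontal_projection(word))  # [2, 2, 8, 1]
print(detect_baseline(word))        # 2
```

This simple estimate breaks down in exactly the cases the comparison above considers. Dots and other diacritics add pixels to rows away from the baseline. Sloped words spread the horizontal connections across several rows. Short subwords contribute little ink to any single row. For these reasons, practical methods typically add skew correction or more robust features on top of the raw profile.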