This paper presents a new method for localizing digit strings with a specific syntax in Farsi/Arabic document images. First, features are extracted from every connected component in each text line. These features are designed for Farsi/Arabic scripts and can differentiate digit from non-digit connected components. The features are then classified, and each connected component is assigned a probability of belonging to each of four classes: digit, slash, double-digit, and non-digit. Next, a discrete hidden Markov model, acting as a syntactic analyzer, localizes digit strings with the desired syntax. The results, presented separately for handwritten and machine-printed text lines, are very promising.
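
The syntactic analysis step can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden states (`text`, `num`, `sep`), every probability value, and the target syntax (digit groups separated by slashes) are assumptions chosen for illustration. As a further simplification, each connected component is reduced to its single most probable class label before being passed to a discrete hidden Markov model, which is then decoded with the standard Viterbi algorithm.

```python
import math

# Hidden states (hypothetical): background text, numeric part, slash separator.
STATES = ["text", "num", "sep"]

# All probabilities below are made-up illustrative values, not taken from the paper.
START = {"text": 0.8, "num": 0.2, "sep": 0.0}
TRANS = {
    "text": {"text": 0.8, "num": 0.2, "sep": 0.0},
    "num":  {"text": 0.2, "num": 0.5, "sep": 0.3},
    "sep":  {"text": 0.0, "num": 1.0, "sep": 0.0},
}
# Observation symbols are the four classes named in the abstract.
EMIT = {
    "text": {"digit": 0.10, "slash": 0.05, "double-digit": 0.05, "non-digit": 0.80},
    "num":  {"digit": 0.60, "slash": 0.02, "double-digit": 0.33, "non-digit": 0.05},
    "sep":  {"digit": 0.05, "slash": 0.90, "double-digit": 0.00, "non-digit": 0.05},
}


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most likely hidden-state sequence for a list of class labels."""
    if not observations:
        return []
    score = {s: _log(START[s]) + _log(EMIT[s][observations[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(TRANS[p][s]))
            new_score[s] = score[prev] + _log(TRANS[prev][s]) + _log(EMIT[s][obs])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def digit_string_spans(path):
    """Return (start, end) pairs, end exclusive, for runs of non-"text" states."""
    spans, start = [], None
    for i, s in enumerate(path):
        if s != "text" and start is None:
            start = i
        elif s == "text" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(path)))
    return spans


if __name__ == "__main__":
    # One label per connected component in a text line (hypothetical input).
    line = ["non-digit", "non-digit", "digit", "double-digit", "slash",
            "digit", "slash", "double-digit", "non-digit"]
    states = viterbi(line)
    print(states)
    print(digit_string_spans(states))
```

The spans returned by `digit_string_spans` are index ranges over the connected components of one text line; mapping them back to bounding boxes in the image would give the localized digit strings. A model that consumes the full class probabilities instead of hard labels would need a different emission term, which this sketch does not attempt.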