In this paper, we propose a new approach for detecting and recognizing numerical strings in Farsi/Arabic handwritten or machine-printed document images. Each connected component is assigned a label indicating whether or not it belongs to a numerical string. First, to differentiate between digit and non-digit connected components, some simple features are extracted from every connected component in each text line. These features are then classified with a fuzzy rule-based classifier to extract candidate strings. After a digit recognizer is applied to the candidates, the syntax of each numerical string is validated by a syntactic verifier. Experimental results show an acceptable detection rate with a low false-positive rate.
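The stages described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the two features (aspect ratio and height relative to the text line), the triangular membership breakpoints, the single fuzzy rule, the 0.5 threshold, and the numeric grammar are all assumptions chosen for illustration, and the `symbol` field stands in for the output of a digit recognizer that is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Component:
    """A connected component in a text line (hypothetical representation)."""
    width: float   # bounding-box width in pixels
    height: float  # bounding-box height in pixels
    symbol: str    # assumed output of a digit recognizer for this component


def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def digit_membership(c, line_height):
    """Illustrative rule: IF aspect ratio is digit-like AND relative
    height is digit-like THEN the component is a digit candidate."""
    aspect = triangular(c.width / c.height, 0.2, 0.6, 1.2)      # assumed breakpoints
    rel_height = triangular(c.height / line_height, 0.3, 0.7, 1.1)
    return min(aspect, rel_height)  # fuzzy AND taken as the minimum


def candidate_strings(components, line_height, threshold=0.5):
    """Group consecutive components whose membership reaches the threshold."""
    runs, current = [], []
    for c in components:
        if digit_membership(c, line_height) >= threshold:
            current.append(c)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Assumed grammar: digit groups optionally joined by '.' or '/'.
# For str patterns, \d also matches Arabic-Indic and Persian digits.
NUMERIC_SYNTAX = re.compile(r"\d+(?:[./]\d+)*")


def verify(run):
    """Return the recognized string if it is syntactically valid, else None."""
    text = "".join(c.symbol for c in run)
    return text if NUMERIC_SYNTAX.fullmatch(text) else None


line = [
    Component(60, 30, "x"),  # wide word-like component, rejected by the rule
    Component(15, 28, "1"),
    Component(15, 28, "2"),
    Component(60, 30, "y"),
    Component(15, 28, "3"),
    Component(15, 28, "/"),  # trailing separator makes the string invalid
]
runs = candidate_strings(line, line_height=40)
results = [verify(r) for r in runs]
```

In this toy text line, the two wide components split the narrow ones into two candidate strings. The verifier accepts the first and rejects the second because it ends in a separator, which shows how syntactic checking can remove false positives that survive the component-level classification.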