This paper addresses script identification in handwritten document images, which supports important applications such as sorting and transcription of multilingual documents and indexing of large collections of such images, and which can serve as a precursor to optical character recognition (OCR). The proposed script identification scheme has two phases. The first phase identifies the script of text words in three major scripts used in India (Kannada, Roman and Devnagari) using global and local features extracted by morphological filters and regional descriptors. The second phase identifies the script of Kannada and Roman handwritten numerals. A K nearest neighbour algorithm is used to classify both text words and numerals. The proposed algorithm achieves average maximum recognition accuracies of 96.05% and 99% for text words and numerals, respectively, under five-fold cross-validation. The data set contains 3000 text words and 400 numerals collected from 250 writers. The novelty of the proposed algorithm is its robustness to noise, writer style, size, ink, etc.
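The classification and evaluation step described above (K nearest neighbour with five-fold cross-validation) can be illustrated with a minimal sketch. Everything specific in it is an assumption for illustration only: the two-dimensional feature vectors are synthetic stand-ins for the paper's morphological and regional features, and the Euclidean distance, the value k = 3, the fold-splitting strategy and the helper names (`knn_predict`, `cross_validate`, `make_toy_data`) are not taken from the paper.

```python
import random
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=3, folds=5, seed=0):
    """Return the mean accuracy over `folds` disjoint held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for f in range(folds):
        test_idx = idx[f::folds]  # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


def make_toy_data(per_class=20, seed=1):
    """Synthetic, well-separated 2-D clusters (NOT the paper's features)."""
    rng = random.Random(seed)
    centres = {
        "Kannada": (0.0, 0.0),
        "Roman": (10.0, 10.0),
        "Devnagari": (20.0, 0.0),
    }
    X, y = [], []
    for label, (cx, cy) in centres.items():
        for _ in range(per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean 5-fold accuracy:", cross_validate(X, y, k=3, folds=5))
```

Each sample is held out exactly once, so the reported figure is the mean of five per-fold accuracies. On real data the feature vectors would come from the feature extraction stage, and k would typically be chosen by comparing accuracies over several values.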