Though the design of classifiers for Indic script handwriting recognition has been well researched, the use of language models has so far received little attention. This paper develops a weighted finite-state transducer (WFST) based language model to improve current recognition accuracy. The recognition hypothesis (i.e., the segmentation lattice) and the lexicon are each modeled as a WFST. Concatenation of these two FSTs accepts the valid word(s) present in the recognition lattice. A third FST, called the error FST, is introduced to retrieve certain words that were missed by that concatenation operation. The proposed model has been tested on online Bangla handwriting recognition, though the underlying principle applies equally to the recognition of offline or printed words. Experiments on a part of the ISI-Bangla handwriting database show that while the present classifiers (without any language model) recognize about 73% of words, the recognition and lexicon FSTs improve this result by about 9 percentage points, giving an average word-level accuracy of 82%. Introducing the error FST further improves this accuracy to 93%. This improvement in word recognition accuracy from an FST-based language model is a significant finding for research in handwriting recognition in general and Indic script handwriting recognition in particular.
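The idea of filtering a recognition lattice through a lexicon, and then relaxing the match with an error model, can be illustrated with a toy sketch. This is not the paper's implementation: it is unweighted, uses plain Python sets instead of transducers, uses Latin letters instead of Bangla, and assumes the error model tolerates exactly one substituted character, which the abstract does not specify. All function names and data are hypothetical.

```python
from itertools import product


def lattice_paths(lattice):
    """Enumerate every string accepted by a toy lattice.

    The lattice is a list of candidate-character sets, one per position.
    """
    return {"".join(chars) for chars in product(*lattice)}


def accept_with_lexicon(lattice, lexicon):
    """Keep only the lattice strings that are also lexicon words."""
    return lattice_paths(lattice) & set(lexicon)


def within_one_substitution(a, b):
    """True if a and b have equal length and differ in at most one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def accept_with_error_model(lattice, lexicon):
    """Also recover lexicon words one substitution away from a lattice string.

    Assumption: a single-substitution tolerance stands in for the error FST.
    """
    paths = lattice_paths(lattice)
    return {
        word
        for word in lexicon
        if any(within_one_substitution(path, word) for path in paths)
    }


# Hypothetical recognizer output: two candidate characters per position.
lattice = [{"c", "e"}, {"a", "o"}, {"t", "l"}]
lexicon = {"cat", "cot", "eat", "car", "dog"}

strict = sorted(accept_with_lexicon(lattice, lexicon))
relaxed = sorted(accept_with_error_model(lattice, lexicon))
print(strict)   # ['cat', 'cot', 'eat']
print(relaxed)  # ['car', 'cat', 'cot', 'eat']
```

In this toy run the word "car" is absent from the lattice because the recognizer never proposed "r" in the last position, yet the relaxed match recovers it, which mirrors the role the abstract assigns to the error FST.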
Date of Conference: 18-21 Sept. 2011