Improving Classical OCRs for Brahmic Scripts Using Script Grammar Learning


Abstract:

Classical OCRs based on isolated character (symbol) recognition were the fundamental way of generating textual representations, particularly for Indian scripts, until transcription-based approaches gained momentum. Though the former have been criticized as failure-prone, their accuracy has nevertheless been fairly decent in comparison with the newer transcription-based approaches. An analysis of isolated-character-recognition OCRs for Hindi and Bangla revealed that most errors arose in converting the classifier output to valid Unicode sequences, i.e., in script grammar generation. The linguistic rules for generating scripts are inadequately integrated, resulting in a rigid Unicode generation scheme that is cumbersome to understand and error-prone when adapted to new Indian scripts. In this paper we propose a machine-learning-based scheme for generating Unicode from classifier symbols, which outperforms the existing generation scheme and improves accuracy for the Devanagari and Bangla scripts.
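
To make the "script grammar generation" step concrete, the following is a minimal sketch of the kind of hand-written rule a rigid scheme relies on. It is an illustration only, not the scheme evaluated in the paper: the function name `visual_to_unicode` and the one-rule table are hypothetical. It assumes the classifier emits symbols in left-to-right visual order, and uses the fact that the Devanagari vowel sign I (U+093F) is drawn to the left of its consonant but stored after it in Unicode.

```python
# Hypothetical rule table: vowel signs drawn to the LEFT of their consonant.
# U+093F is DEVANAGARI VOWEL SIGN I.
PRE_BASE_MATRAS = {"\u093F"}


def visual_to_unicode(symbols):
    """Reorder visually ordered classifier symbols into Unicode logical order.

    Assumption: each element of `symbols` is already a single Unicode
    character, and a pre-base vowel sign attaches to the one symbol that
    immediately follows it (conjunct clusters are NOT handled).
    """
    out = []
    i = 0
    while i < len(symbols):
        current = symbols[i]
        if current in PRE_BASE_MATRAS and i + 1 < len(symbols):
            # Emit the following consonant first, then the vowel sign.
            out.append(symbols[i + 1])
            out.append(current)
            i += 2
        else:
            out.append(current)
            i += 1
    return "".join(out)


# Visual order: vowel sign I, then KA. Logical order: KA, then vowel sign I.
ki = visual_to_unicode(["\u093F", "\u0915"])
```

The single rule already breaks when the vowel sign precedes a conjunct cluster rather than one consonant, which hints at why rule sets of this kind grow rigid and are hard to carry over to another script.
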
Date of Conference: 09-15 November 2017
Date Added to IEEE Xplore: 29 January 2018
ISBN Information:
Electronic ISSN: 2379-2140
Conference Location: Kyoto, Japan
