Converting Myanmar printed document image into machine understandable text format

3 Author(s)
Htwe Pa Pa Win (University of Computer Studies, Yangon, Myanmar); Phyo Thu Thu Khine; Khin Nwe Ni Tun

Digital libraries are archiving large numbers of Myanmar document images, so an efficient strategy is needed to convert these images into machine-understandable text. State-of-the-art OCR systems cannot handle Myanmar script, because the language poses many challenges for document understanding. This paper therefore proposes an OCR system for Myanmar Printed Documents (OCRMPD) that combines several proposed methods to convert Myanmar printed text into machine-understandable text automatically. First, the input image is enhanced by correcting several kinds of noise. Next, the characters are segmented with a novel segmentation method. The features of the isolated characters are then extracted with a hybrid feature extraction method that addresses the similarity between Myanmar characters. Finally, an SVM classifier arranged hierarchically recognizes each character image. Experiments on a variety of Myanmar printed documents show the efficiency of the proposed algorithms.
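To make the segmentation stage of the pipeline concrete, the sketch below splits a binarized page into text lines and then into glyph boxes using projection profiles. This is a generic textbook technique swapped in for illustration; it is not the novel segmentation method the abstract refers to, which is not described here. The names `runs`, `segment`, and `page` are hypothetical, and the input is assumed to be already enhanced and binarized (1 = ink, 0 = background).

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binarized: 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is non-zero."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img: Image) -> List[Tuple[int, int, int, int]]:
    """Illustrative projection-profile segmentation (not the paper's method).

    Rows are split into text lines by per-row ink counts; each line is then
    split into boxes by per-column ink counts.
    Returns (top, bottom, left, right) boxes with exclusive bottom/right.
    """
    boxes = []
    row_profile = [sum(row) for row in img]
    for top, bottom in runs(row_profile):
        col_profile = [
            sum(img[r][c] for r in range(top, bottom))
            for c in range(len(img[0]))
        ]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


# Tiny synthetic page: two blobs on the first text line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(page)
print(boxes)
```

Myanmar script attaches vowel signs and medials above, below, and beside a base consonant, so a plain projection split like this one is likely too coarse on real pages, which is presumably part of why the paper proposes its own segmentation method.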

Published in:

Digital Information Management (ICDIM), 2011 Sixth International Conference on

Date of Conference:

26-28 Sept. 2011