Rule-based middle-level character detection for simplifying Thai document layout analysis

2 Author(s)
Yingsaeree, C. ; Dept. of Comput. Eng., Kasetsart Univ., Bangkok, Thailand ; Kawtrakul, A.

Although machine-printed Thai character recognition has been an intense research area over the past decade, only a few results are available for Thai document layout analysis. In addition, methods proposed for other languages cannot be applied directly to Thai documents, because Thai documents have a unique characteristic: Thai characters can be placed at four different levels. This paper proposes an approach that eliminates this characteristic by removing nonmiddle-level characters from the image, using heuristic rules derived from properties of the Thai language: nonmiddle-level characters are usually smaller than middle-level characters, and the gap between levels is smaller than the gap between two consecutive lines. Once these characters are removed, any existing method can be used with Thai documents without modification. The experimental results show that the proposed method removes nonmiddle-level characters from 200 test images with 99.46% accuracy, even when the image contains various font sizes.
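The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, because the abstract does not give the actual rules or thresholds. It assumes that connected components have already been extracted as bounding boxes; the names `Box`, `group_into_lines`, `keep_middle_level`, `max_level_gap`, and `min_height_ratio`, and their default values, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box of one connected component (hypothetical input format)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def group_into_lines(boxes: List[Box], max_level_gap: int) -> List[List[Box]]:
    """Group boxes into text lines.

    Gap heuristic: a vertical gap of at most max_level_gap is treated as a gap
    between levels of the same line; a larger gap starts a new line.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.y):
        if lines and box.y - lines[-1]["bottom"] <= max_level_gap:
            lines[-1]["boxes"].append(box)
            lines[-1]["bottom"] = max(lines[-1]["bottom"], box.y + box.h)
        else:
            lines.append({"bottom": box.y + box.h, "boxes": [box]})
    return [line["boxes"] for line in lines]


def keep_middle_level(boxes: List[Box],
                      max_level_gap: int = 3,
                      min_height_ratio: float = 0.7) -> List[Box]:
    """Drop components that look like nonmiddle-level characters.

    Size heuristic: within each line, keep only boxes whose height is at least
    min_height_ratio times the tallest box of that line. Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    kept = []
    for line in group_into_lines(boxes, max_level_gap):
        tallest = max(b.h for b in line)
        kept.extend(b for b in line if b.h >= min_height_ratio * tallest)
    return kept


# Two text lines: each has tall middle-level boxes plus small marks above or below.
sample = [
    Box(0, 2, 5, 6),     # small mark above line 1
    Box(0, 10, 10, 20),  # middle-level character, line 1
    Box(12, 10, 10, 20), # middle-level character, line 1
    Box(12, 32, 5, 5),   # small mark below line 1
    Box(0, 52, 5, 6),    # small mark above line 2
    Box(0, 60, 10, 20),  # middle-level character, line 2
]
middle = keep_middle_level(sample)
```

Comparing each box against the tallest box of its own line, rather than against a page-wide value, is one simple way to tolerate lines set in different font sizes; whether the paper does this is not stated in the abstract.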

Published in:

Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on

Date of Conference:

29 Aug.-1 Sept. 2005