Improving Chinese/English OCR performance by using MCE-based character-pair modeling and negative training

2 Author(s)
Qiang Huo ; Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China ; Zhi-Dan Feng

Over the past several years, we have been developing a high-performance OCR engine for machine-printed Chinese/English documents. We have previously reported (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer: (1) using MCE-trained character-pair models to avoid error-prone character-level segmentation in some troublesome cases, and (2) performing MCE-based negative training to improve the ability of the recognition models to reject hypothesized garbage images during the recognition process. The efficacy of the proposed techniques is confirmed by experiments on a benchmark test.
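For readers unfamiliar with MCE training, the following is a minimal sketch of the generic smoothed MCE loss for a single sample, in its common textbook form. It is not taken from this paper: the function name `mce_loss`, the parameters `eta` and `gamma`, and the choice of discriminant scores are all assumptions made for illustration, and the abstract does not state how the authors formulate their character-pair or negative-training losses.

```python
import math

def mce_loss(scores, true_idx, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one sample (illustrative, hypothetical helper).

    scores   : discriminant values, one per class (larger = better match)
    true_idx : index of the correct class
    eta      : sharpness of the soft maximum over rival classes (assumed)
    gamma    : slope of the sigmoid that smooths the 0/1 error (assumed)
    """
    g_true = scores[true_idx]
    rivals = [s for i, s in enumerate(scores) if i != true_idx]

    # Soft maximum of the rival scores: (1/eta) * log(mean(exp(eta * s))),
    # computed relative to the largest rival for numerical stability.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max_rival = m + math.log(mean_exp) / eta

    # Misclassification measure: positive when rivals outscore the true class.
    d = soft_max_rival - g_true

    # A sigmoid turns the measure into a differentiable approximation
    # of the classification error count.
    return 1.0 / (1.0 + math.exp(-gamma * d))

if __name__ == "__main__":
    print(mce_loss([5.0, 1.0, 0.0], true_idx=0))  # true class wins: loss below 0.5
    print(mce_loss([5.0, 1.0, 0.0], true_idx=2))  # true class loses: loss above 0.5
```

Minimizing the average of this loss over training samples adjusts the model parameters so that the correct class outscores its rivals. How garbage images enter the objective during negative training is specific to the paper and is not reproduced here.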

Published in:

Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on

Date of Conference:

3-6 Aug. 2003