High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training

3 Author(s)
Qiang Huo (Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China); Yong Ge; Zhi-Dan Feng

We have developed a Chinese OCR engine for machine-printed documents. Currently, the engine supports a vocabulary of 6921 characters, comprising 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, Hei, Yuan, LiShu, WeiBei, XingKai, etc. The average character recognition accuracy is above 99% for newspaper-quality documents, with a recognition speed of about 250 characters per second on a 450 MHz Pentium III PC while consuming less than 2 MB of memory. We describe the key technologies used to construct this recognizer. Among them, we highlight three techniques that contribute to the high recognition accuracy: the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as a criterion for model training.
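
To make the first of these techniques concrete, here is a minimal sketch of Gabor feature extraction from a character image. It is not the authors' implementation: the abstract gives no filter parameters, so the kernel size, wavelength, Gaussian width, number of orientations, sampling grid, and all function names (`gabor_kernel`, `gabor_response`, `gabor_features`) are assumptions chosen for illustration only.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor kernel: Gaussian envelope times a cosine carrier.

    size is assumed to be odd so the kernel has a well-defined center.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def gabor_response(image, kernel, cy, cx):
    """Filter response at pixel (cy, cx); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * kernel[dy + half][dx + half]
    return total

def gabor_features(image, orientations=4, grid=4, size=7, wavelength=4.0, sigma=2.0):
    """Absolute responses sampled on a grid x grid lattice for each orientation.

    All default parameter values are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    feats = []
    for k in range(orientations):
        kernel = gabor_kernel(size, wavelength, k * math.pi / orientations, sigma)
        for gy in range(grid):
            for gx in range(grid):
                # Sample at the center of each grid cell.
                cy = (2 * gy + 1) * h // (2 * grid)
                cx = (2 * gx + 1) * w // (2 * grid)
                feats.append(abs(gabor_response(image, kernel, cy, cx)))
    return feats

# Toy 16x16 "character": a single vertical stroke two pixels wide.
image = [[1.0 if 7 <= x <= 8 else 0.0 for x in range(16)] for _ in range(16)]
features = gabor_features(image)
```

With these assumed settings, each character image yields one feature per orientation per grid cell (4 × 4 × 4 values here). In a pipeline like the one the abstract outlines, such a vector would presumably be passed on to the discriminative feature extraction and model training stages, which this sketch does not cover.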

Published in:

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Proceedings, Volume 3

Date of Conference:

2001