Skip to Main Content
In the past several years, we've been developing a high performance OCR engine for machine printed Chinese/ English documents. We have reported previously (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve the high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve the high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer. They are (1) to use MCE-trained character-pair models to avoid error-prone character-level segmentation for some trouble cases, and (2) to perform a MCE-based negative training to improve the rejection capability of the recognition models on the hypothesized garbage images during recognition process. The efficacy of the proposed techniques is confirmed by experiments in a benchmark test.