Some document image retrieval methods are robust to character recognition errors. Some of them tolerate recognition errors by keeping multiple candidates for each character image, but they do not tolerate character segmentation errors. In addition, these methods cannot retrieve documents whose recognition results do not contain the correct character code. We propose a method that overcomes these problems. For uncertain characters, the method uses both multiple candidates and a “shape-feature”, which describes the outline of the character shape. Documents are retrieved using the “shape-feature” and multiple-candidate techniques together. Our experimental results show that the method achieves a higher recall rate than conventional methods.
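As a rough illustration of how multiple candidates and a shape-based fallback might be combined at query time, the following Python sketch matches a query against recognized text in which each character position holds a set of candidates and, for uncertain characters, a coarse shape code. It is not the implementation described in the paper: the names `Slot`, `shape_code`, `find_matches`, and `retrieve`, and the ascender/descender coding used as a stand-in for the “shape-feature”, are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical stand-in for the "shape-feature": a coarse outline class
# (ascender / descender / x-height). The paper's actual feature is not
# specified in the abstract.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(ch: str) -> str:
    """Map a character to a coarse outline class (assumed coding)."""
    if ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "D"
    return "x"


@dataclass(frozen=True)
class Slot:
    """One recognized character position."""
    candidates: FrozenSet[str]      # multiple OCR candidates
    shape: Optional[str] = None     # set only for uncertain characters


def slot_matches(slot: Slot, ch: str) -> bool:
    """A query character matches if it is among the candidates, or if the
    slot is uncertain and its shape code agrees with the character's."""
    if ch in slot.candidates:
        return True
    return slot.shape is not None and slot.shape == shape_code(ch)


def find_matches(query: str, slots: List[Slot]) -> List[int]:
    """Return every start index at which the whole query matches."""
    n, m = len(slots), len(query)
    return [
        i for i in range(n - m + 1)
        if all(slot_matches(slots[i + j], query[j]) for j in range(m))
    ]


def retrieve(query: str, docs: Dict[str, List[Slot]]) -> List[str]:
    """Return the ids of documents containing at least one match."""
    return [doc_id for doc_id, slots in docs.items()
            if find_matches(query, slots)]
```

In this sketch, a position whose candidates all miss the correct character can still match through its shape code, which is the situation that candidate-only matching cannot handle. Segmentation errors are not modeled here.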
Published in: Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999 (ICDAR '99)
Date of Conference: 20-22 Sep 1999