A new method is described and tested for using an unreliable character recognition device to produce a reliable index for a collection of documents. All highly likely substitution errors of the recognition device are handled by mapping characters that are readily confused with one another to the same pseudocharacter. The method is analyzed to give its expected recall (the fraction of words present that are correctly found) and precision (the fraction of retrieved words that were retrieved properly). Published substitution error matrices, along with a large file of words and word frequencies, were used to evaluate the method. Performance was surprisingly good. Suggestions for further enhancements are given.
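The pseudocharacter idea can be illustrated with a minimal sketch. The confusion classes below are hypothetical and chosen only for illustration; they are not taken from the published substitution error matrices. The function names (`to_key`, `build_index`, `lookup`) are likewise assumptions and not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical confusion classes: the characters in each group are assumed
# to be readily substituted for one another by the recognition device.
CONFUSION_CLASSES = ["ce", "il1", "o0", "s5"]

# Map each character to a pseudocharacter (here, the first member of its class).
PSEUDO = {ch: group[0] for group in CONFUSION_CLASSES for ch in group}


def to_key(word):
    """Replace every character with its pseudocharacter; others pass through."""
    return "".join(PSEUDO.get(ch, ch) for ch in word.lower())


def build_index(docs):
    """Index recognized words by pseudocharacter key (key -> set of doc ids)."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[to_key(word)].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents whose index holds the query's key."""
    return index.get(to_key(query), set())


# Document 1 contains "clear" misread as "c1ear"; document 2 was read correctly.
docs = {1: ["c1ear", "text"], 2: ["clear"], 3: ["dear"]}
index = build_index(docs)
matches = lookup(index, "clear")
```

In this sketch the misread "c1ear" and the correct "clear" share one key, so a query for "clear" still finds both documents. The cost is that distinct words can collide (with these classes, "eat" and "cat" map to the same key), which is the kind of loss in precision that the analysis is meant to quantify.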