A fast approach to Chinese document image filtering is presented. Garbage models are built by keyword clustering before keyword searching. The retrieval process is accelerated by the Boyer-Moore algorithm. Each character is accepted or rejected according to its distance from the garbage models, and a confidence measure ensures precision. Document vectors are built from the keywords spotted in the document image, and the score of the document image is obtained by means of a vector space model. Experimental results confirm the robustness of the proposed approach over a wide range of degradations.
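The search and scoring steps named above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the page is taken to be already reduced to a sequence of character symbols (image-level matching, garbage models, and the confidence measure are not modeled), the search uses the Horspool simplification of Boyer-Moore (bad-character shifts only), the document vector holds raw keyword counts, and the score is a cosine similarity against a hypothetical query vector. All function names are illustrative.

```python
import math


def build_shift_table(pattern):
    """Bad-character shifts for the Horspool simplification of Boyer-Moore.

    For every symbol of the pattern except the last, store the distance
    from its rightmost occurrence to the end of the pattern.
    """
    m = len(pattern)
    return {sym: m - 1 - i for i, sym in enumerate(pattern[:-1])}


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return 0
    shift = build_shift_table(pattern)
    count = 0
    pos = 0
    while pos <= n - m:
        # Slice equality stands in for the right-to-left comparison loop.
        if text[pos:pos + m] == pattern:
            count += 1
        # Shift by the symbol aligned with the pattern's last position;
        # symbols absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return count


def document_vector(text, keywords):
    """Assumed weighting: one raw occurrence count per keyword."""
    return [count_occurrences(text, kw) for kw in keywords]


def cosine_score(doc_vec, query_vec):
    """Assumed scoring rule: cosine similarity, 0.0 for a zero vector."""
    dot = sum(d * q for d, q in zip(doc_vec, query_vec))
    norm = math.sqrt(sum(d * d for d in doc_vec)) * math.sqrt(
        sum(q * q for q in query_vec)
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    keywords = ["金融", "市场", "体育"]   # hypothetical keyword list
    page = "金融市场金融"                 # stand-in for a coded page
    vec = document_vector(page, keywords)
    print(vec, cosine_score(vec, [1, 1, 0]))
```

A filtering decision would then compare the score against a threshold, which is likewise an assumption here. The skip table is what gives the speed-up: a symbol that does not occur in the keyword lets the search jump a full keyword length in one step.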