Skip to Main Content
With the growth of on-line handwriting technologies, managing facilities for handwritten documents, such as retrieval of documents by topic, are required. These documents can contain graphics, equations or text for instance. This work reports experiments on categorization of on-line handwritten documents based on their textual contents. We assume that handwritten text blocks have been extracted from the documents, and as a first step of the proposed system, we process them with an existing handwritten recognition engine. We analyse the effect of the word recognition rate on the categorization performances, and we compare them with those obtained with the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared in this work. The handwritten texts are a subset of the Reuters-21578 corpus collected from more than 1500 writers. Results show that there is no significant categorization performance loss when the word error rate stands below 22%.
Date of Conference: 16-19 Sept. 2008