Interactive Definition and Tuning of One-Class Classifiers for Document Image Classification (IEEE Conference Publication)

Abstract:

With the growing mass of data, document image classification systems face new requirements, such as processing heterogeneous data streams efficiently. Generally, when processing data streams, little is known in advance about the content of the possible streams. Furthermore, as labelled data is costly to obtain, the classification model has to be learned from a few available labelled examples. To handle this specific context, we think that combining one-class classifiers could be a very interesting alternative for quickly defining and tuning classification systems dedicated to different document streams. The main interest of one-class classifiers is that no interdependence exists between the classifier models, which allows easy removal, addition or modification of document classes. Such a reconfiguration has no impact on the other classifiers. It is also notable that each classifier can use a different set of features from the others, whether to handle the same class or different classes. In return, as only one class is well specified during the learning step, one-class classifiers have to be defined carefully to obtain good performance. It is more difficult to select representative training examples and discriminative features with only positive examples. To overcome these difficulties, we have defined a complete framework offering different methods that help a system designer define and tune one-class classifier models. The aim is to ease the selection of good training examples and of suitable features, depending on the class to be recognized in the document stream. For that purpose, the proposed methods compute different measures to evaluate the relevance of the available features and training examples. Moreover, a visualization of the decision space according to the selected examples and features is proposed to support this choice, and an automatic tuning of the model parameters is proposed according to the class to recognize when a val...
Date of Conference: 11-14 April 2016
Date Added to IEEE Xplore: 13 June 2016
Electronic ISBN: 978-1-5090-1792-8
Conference Location: Santorini, Greece

I. Introduction

For several years, companies have been interested in document dematerialization, for reasons such as information sharing, ecological concerns and reduced storage space. A typical example of a dematerialization procedure is the digitization of administrative documents to facilitate the processing and indexing of incoming mail streams. In this context, this article focuses on facilitating the creation, adaptation and tuning of document image classification (DIC) methods.
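
To make the idea from the abstract concrete (one independent one-class model per document class, each free to use its own feature subset), the following is a minimal sketch. It is not the framework proposed in the paper. The classifier is a deliberately simple centroid-and-radius model swapped in for illustration, and the class names (`CentroidOneClass`, `OneClassEnsemble`), the `slack` parameter and the three toy features (ink density, number of text lines, aspect ratio) are all hypothetical.

```python
import math
from statistics import mean


class CentroidOneClass:
    """Toy one-class model (illustrative stand-in, not the paper's method).

    It is trained on positive examples only and accepts a sample when the
    sample lies within a radius around the centroid of those examples.
    """

    def __init__(self, feature_idx, slack=1.5):
        self.feature_idx = feature_idx  # indices of the features this model uses
        self.slack = slack              # hypothetical tolerance factor on the radius

    def _project(self, x):
        # Keep only the features selected for this class.
        return [x[i] for i in self.feature_idx]

    def fit(self, positives):
        pts = [self._project(x) for x in positives]
        self.centroid = [mean(col) for col in zip(*pts)]
        farthest = max(math.dist(p, self.centroid) for p in pts)
        # Small floor avoids a zero radius when all examples coincide.
        self.radius = max(self.slack * farthest, 1e-9)
        return self

    def score(self, x):
        # Relative margin: 1 at the centroid, 0 on the boundary, negative outside.
        return 1.0 - math.dist(self._project(x), self.centroid) / self.radius


class OneClassEnsemble:
    """Combines independent one-class models; classes can be added or removed freely."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, model):
        self.models[label] = model

    def remove_class(self, label):
        del self.models[label]

    def predict(self, x):
        # Each model votes independently; a sample rejected by all models is unknown.
        accepted = {}
        for label, model in self.models.items():
            s = model.score(x)
            if s >= 0:
                accepted[label] = s
        return max(accepted, key=accepted.get) if accepted else None


# Toy feature vectors: [ink_density, n_text_lines, aspect_ratio] (hypothetical).
invoices = [[0.30, 40.0, 1.41], [0.32, 42.0, 1.41], [0.28, 38.0, 1.41]]
letters = [[0.15, 20.0, 1.41], [0.17, 22.0, 1.41], [0.14, 19.0, 1.41]]

ensemble = OneClassEnsemble()
ensemble.add_class("invoice", CentroidOneClass(feature_idx=(0, 1)).fit(invoices))
ensemble.add_class("letter", CentroidOneClass(feature_idx=(1,)).fit(letters))

print(ensemble.predict([0.31, 41.0, 1.41]))
print(ensemble.predict([0.90, 90.0, 0.70]))
```

Because no model depends on another, calling `ensemble.remove_class("letter")` leaves the invoice model and its decisions unchanged. In this sketch the scores are normalized by each model's own radius so that models built on different feature subsets can be compared; that normalization is a design choice of the example, not something stated in the source.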

