In this paper we describe a model for classifying binary data using classifiers based on Bernoulli mixture models. We show how Bernoulli mixtures can be used for feature extraction and dimensionality reduction of raw input data. The extracted features are then used to train a classifier for supervised labeling of individual sample points. We have applied this method to two different types of datasets: one from the text mining domain and one from the handwriting recognition area. Experiments demonstrate that we can obtain up to a 99.9% reduction in the dimensionality of the original feature set for sparse binary features. Classification accuracy also increases considerably when the combined model is used. This paper compares the performance of different classification algorithms when used in conjunction with the new feature set generated by Bernoulli mixtures. Using this hybrid model of learning, we have achieved one of the best accuracy rates on the NOVA and GINA datasets of the "agnostic vs. prior knowledge" competition held by the International Joint Conference on Neural Networks in 2007.
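The abstract does not say exactly which quantities are taken from the mixture as features, so the following is only a minimal sketch under one common assumption: a Bernoulli mixture is fitted with EM, and each sample's posterior component probabilities (responsibilities) serve as its reduced feature vector. The function names `fit_bernoulli_mixture` and `responsibilities`, the number of components, and the initialization are all hypothetical and are not taken from the paper.

```python
import numpy as np


def responsibilities(X, weights, means, eps=1e-9):
    """Posterior probability of each mixture component for each binary row of X."""
    means = np.clip(means, eps, 1 - eps)
    # Log-likelihood of every sample under every component, plus the log prior.
    log_p = X @ np.log(means).T + (1 - X) @ np.log(1 - means).T
    log_p = log_p + np.log(weights + eps)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_bernoulli_mixture(X, n_components, n_iter=50, seed=0, eps=1e-9):
    """Fit a Bernoulli mixture with plain EM (assumed setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.uniform(0.25, 0.75, size=(n_components, n_features))
    for _ in range(n_iter):
        # E-step: soft assignment of samples to components.
        resp = responsibilities(X, weights, means, eps)
        # M-step: re-estimate mixing weights and per-feature Bernoulli means.
        nk = resp.sum(axis=0)
        weights = nk / n_samples
        means = np.clip((resp.T @ X) / (nk[:, None] + eps), eps, 1 - eps)
    return weights, means


# Toy usage: 40 sparse binary samples with 20 features reduced to 3 features.
rng = np.random.default_rng(1)
X = (rng.random((40, 20)) < 0.3).astype(float)
weights, means = fit_bernoulli_mixture(X, n_components=3)
features = responsibilities(X, weights, means)  # shape (40, 3)
```

Under this assumption, the `features` matrix would replace the raw binary inputs when training whichever downstream classifier is being compared. The size of the reduction is set by the ratio of the number of components to the original feature count.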