Linear discriminant analysis (LDA) is a traditional dimension reduction method that finds projective directions maximizing the separability between classes. However, when the number of labeled data points is small, the performance of LDA degrades severely. In this paper, we propose two improved LDA methods that utilize abundant unlabeled data. Instead of using all the unlabeled data, as most semi-supervised dimension reduction methods do, we select confident unlabeled data and develop extended LDA algorithms. In the first method, a graph-based LDA method uses confidence scores for the chosen unlabeled data, so that unlabeled data with a low confidence score contribute less than unlabeled data with a high confidence score. In the second method, the selected unlabeled data points are used to modify the class centroids in the objective function of LDA. Extensive experimental results in text classification demonstrate the effectiveness of the proposed methods compared with other semi-supervised dimension reduction methods.
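The idea of modifying class centroids with confident unlabeled points can be illustrated with a minimal sketch. This is not the algorithm from the paper: the abstract does not specify how confidence is scored or how the centroids are updated, so the softmax-over-distances score, the `threshold` parameter, the confidence-weighted mean, and all function names below are assumptions chosen for illustration only.

```python
from math import dist, exp


def centroid(points, weights=None):
    """Weighted mean of equal-length vectors (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[i] for p, w in zip(points, weights)) / total
            for i in range(dim)]


def confidence(x, centroids):
    """Return (predicted class, confidence) for point x.

    Assumed scoring rule: softmax over negative Euclidean distances
    to the class centroids. The paper may use a different score.
    """
    scores = {c: exp(-dist(x, m)) for c, m in centroids.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def adjusted_centroids(labeled, unlabeled, threshold=0.8):
    """Recompute class centroids using only confident unlabeled points.

    labeled:   dict mapping class -> list of points
    unlabeled: list of points
    threshold: hypothetical cutoff for accepting an unlabeled point
    """
    base = {c: centroid(pts) for c, pts in labeled.items()}
    points = {c: list(pts) for c, pts in labeled.items()}
    weights = {c: [1.0] * len(pts) for c, pts in labeled.items()}
    for x in unlabeled:
        label, conf = confidence(x, base)
        if conf >= threshold:            # discard low-confidence points
            points[label].append(x)
            weights[label].append(conf)  # assumed: weight by confidence
    return {c: centroid(points[c], weights[c]) for c in points}
```

Under these assumptions, an unlabeled point lying midway between two class centroids receives a confidence of 0.5 and is ignored, while a point close to one centroid pulls that centroid slightly toward itself. The adjusted centroids would then replace the labeled-only class means when building the scatter matrices of an LDA objective.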
Published in:
Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on
Date of Conference: Aug. 31 2011-Sept. 2 2011