This paper proposes clustering support vector machines (CSVM) for classifying unlabeled data. We often need to process, for example classify, large amounts of data that are wholly unlabeled, and labeling such data manually is impractical. Clustering algorithms can be used to generate labels for this kind of data. In this paper, the global k-means clustering algorithm, the fast global k-means algorithm, and a global k-means variant that uses k-d trees are each combined with a statistical method based on the F-distribution to generate labels for wholly unlabeled data; the labeled data are then used to train an SVM for classification. The proposed approach (CSVM) is tested on four synthetically generated data sets, all of which were wholly unlabeled. The experimental results show that CSVM is efficient at classifying wholly unlabeled data.
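The cluster-then-classify idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the sketch assumes a fixed number of clusters (k = 2) in place of the F-distribution step, uses a basic global k-means (each new center is seeded from every data point in turn and the best result is kept), and swaps in a linear SVM trained by plain subgradient descent on the regularized hinge loss. All function names and parameter values are hypothetical.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]

def kmeans(points, centers, iters=50):
    # Lloyd iterations from the given initial centers.
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean(members) if members else c)
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    error = sum(sqdist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, error

def global_kmeans(points, k):
    # Assumption: k is given; the paper's F-distribution step is not modeled.
    centers = [mean(points)]
    for _ in range(1, k):
        best = None
        for p in points:  # try every data point as the seed of the new center
            cand = kmeans(points, centers + [p])
            if best is None or cand[2] < best[2]:
                best = cand
        centers = best[0]
    return assign(points, centers), centers

def train_linear_svm(points, ys, lam=0.01, lr=0.005, epochs=5000):
    # Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, ys):
            if y * (dot(w, x) + b) < 1:  # margin violated
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Toy, wholly unlabeled data: two well-separated groups (illustrative only).
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
data = group_a + group_b

labels, centers = global_kmeans(data, k=2)   # step 1: generate labels
ys = [1 if l == 1 else -1 for l in labels]   # map cluster ids to +1 / -1
w, b = train_linear_svm(data, ys)            # step 2: train SVM on those labels
```

After training, `predict(w, b, x)` assigns a new point `x` to one of the two clusters discovered in step 1. Which cluster receives the label +1 depends on the order in which the clustering finds the centers, so only the grouping is meaningful, not the sign.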
Published in:
Test and Measurement, 2009. ICTM '09. International Conference on (Volume: 2)
Date of Conference: 5-6 Dec. 2009