Clustering techniques are widely used in data mining applications such as gene microarray analysis and natural language processing. When there are sufficient data samples and good representations, traditional clustering algorithms such as K-means work well. However, when the number of samples is small and the data representation is poor, applying clustering directly may yield poor results. In this paper we propose a new algorithm, TCTC (Topic-Constraint Transfer Clustering), an instance of unsupervised transfer learning that clusters a small number of unlabeled target samples with the help of abundant, better-represented auxiliary data. First, several latent topics are extracted from the clusters of the auxiliary data. Then, the affinities between the target samples and these topics are identified and used to “guide” the clustering of the disseminated target data. Finally, a semi-supervised clustering algorithm is applied to the target data. Experiments demonstrate that our method is effective for clustering disseminated and ill-represented data.
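The abstract does not say how topics are extracted, how affinities are measured, or which semi-supervised clustering algorithm is used. The following is a minimal Python sketch of the three-stage pipeline, not the authors' implementation, and it rests on several assumptions: auxiliary and target data share one feature space and the same number of clusters `k`; each "topic" is simply a K-means centroid of the auxiliary data; affinity is cosine similarity; target samples whose strongest topic affinity is the same and exceeds a threshold are joined by must-link constraints; and the semi-supervised step is a constrained K-means in which each must-link group is assigned as a unit. The function names (`tctc_sketch`, `topic_constraints`, `constrained_kmeans`) and the threshold `0.9` are hypothetical.

```python
"""Hypothetical sketch of a topic-constrained transfer-clustering pipeline.

Assumptions (not from the paper): topics = K-means centroids of auxiliary
data, affinity = cosine similarity, must-link groups = target samples that
share a dominant topic with affinity >= threshold.
"""
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; the centroids serve as the assumed 'topics'."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        # Keep the old centroid if a cluster becomes empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def topic_constraints(target, topics, threshold=0.9):
    """Return must-link groups: samples sharing a strongly affine topic."""
    groups = {}
    singletons = []
    for i, x in enumerate(target):
        aff = [cosine(x, t) for t in topics]
        best = max(range(len(topics)), key=lambda j: aff[j])
        if aff[best] >= threshold:
            groups.setdefault(best, []).append(i)
        else:
            singletons.append([i])  # unconstrained sample
    return list(groups.values()) + singletons


def constrained_kmeans(target, groups, k, iters=50, seed=0):
    """K-means where every must-link group is assigned to one cluster."""
    rng = random.Random(seed)
    # Seed centroids from the largest groups, which carry the topic prior.
    ordered = sorted(groups, key=len, reverse=True)
    centroids = [mean([target[i] for i in g]) for g in ordered[:k]]
    while len(centroids) < k:  # pad if there are fewer groups than clusters
        centroids.append(list(rng.choice(target)))
    labels = [0] * len(target)
    for _ in range(iters):
        for g in groups:
            # Group cost = summed squared distance of its members.
            m = min(range(k),
                    key=lambda c: sum(sq_dist(target[i], centroids[c]) for i in g))
            for i in g:
                labels[i] = m
        for c in range(k):
            members = [target[i] for i in range(len(target)) if labels[i] == c]
            if members:
                centroids[c] = mean(members)
    return labels


def tctc_sketch(auxiliary, target, k, threshold=0.9):
    topics = kmeans(auxiliary, k)
    groups = topic_constraints(target, topics, threshold)
    return constrained_kmeans(target, groups, k)


if __name__ == "__main__":
    auxiliary = [[5.0, 0.0], [5.2, 0.1], [4.8, -0.1],
                 [0.0, 5.0], [0.1, 5.2], [-0.1, 4.8]]
    target = [[4.0, 1.0], [5.0, 0.5], [0.5, 4.0], [1.0, 5.0]]
    print(tctc_sketch(auxiliary, target, k=2))
```

Treating whole must-link groups as single assignment units is one simple way to enforce such constraints inside K-means; the paper may derive and apply its topic constraints differently.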