Domain adaptation is the process of transferring knowledge from a source domain to a different but related target domain. In this paper, we first apply a "Consensus Regularization" based algorithm to merge multiple source domains into a single source domain. We then propose multi-domain adaptation for document clustering using Seeds Affinity Propagation and the Consensus Regularization algorithm. Seeds Affinity Propagation (SAP), a semi-supervised document clustering algorithm, builds on Affinity Propagation (AP), an effective clustering algorithm. The labeled and unlabeled documents given as input are preprocessed by removing stop words, stemming words, and computing word frequencies, which yields structured documents. Tri-set Computation, a feature extraction technique, then identifies features through the Co-feature set, Unilateral feature set, and Significant Co-feature set. Next, the similarity between documents is computed, and a label is assigned to documents that match. Finally, the clustered documents are obtained by running seeds affinity propagation on these similarity measurements. The performance of the algorithm can then be evaluated and improved.
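The preprocessing and Tri-set Computation steps can be illustrated with a minimal Python sketch. The abstract does not give exact definitions, so everything below is an assumption: a tiny hypothetical stop-word list, a crude suffix stripper standing in for a real stemmer, the co-feature set taken as terms shared by both documents, the unilateral feature set as terms found in exactly one, the significant co-feature set as shared terms that rank among each document's `top_k` most frequent terms, and a hypothetical `tri_set_similarity` score that rewards co-features, weights significant ones extra by `bonus`, and is diluted by unilateral ones. It is not the paper's similarity formula.

```python
import re
from collections import Counter
from typing import Set, Tuple

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(word: str) -> str:
    """Strip one common English suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> Counter:
    """Lower-case, tokenize, remove stop words, stem, and count word frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def top_terms(tf: Counter, k: int) -> Set[str]:
    """The k most frequent terms, ties broken alphabetically so the result is deterministic."""
    return set(sorted(tf, key=lambda w: (-tf[w], w))[:k])


def tri_sets(tf_i: Counter, tf_j: Counter, top_k: int = 3) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (co-feature set, unilateral feature set, significant co-feature set). ASSUMED definitions."""
    co = set(tf_i) & set(tf_j)    # terms present in both documents
    uni = set(tf_i) ^ set(tf_j)   # terms present in exactly one document
    significant = co & top_terms(tf_i, top_k) & top_terms(tf_j, top_k)
    return co, uni, significant


def tri_set_similarity(tf_i: Counter, tf_j: Counter, top_k: int = 3, bonus: float = 0.5) -> float:
    """Hypothetical score in [0, 1]: co-features count, significant ones count extra, unilateral ones dilute."""
    co, uni, significant = tri_sets(tf_i, tf_j, top_k)
    total = len(co) + len(uni)
    if total == 0:
        return 0.0
    return (len(co) + bonus * len(significant)) / ((1 + bonus) * total)


doc_a = preprocess("Clustering documents with affinity propagation")
doc_b = preprocess("Seeds improve affinity propagation for clustering")
co, uni, significant = tri_sets(doc_a, doc_b)
score = tri_set_similarity(doc_a, doc_b)  # 3 co-features, 3 unilateral, 2 significant -> 4/9
```

Because every significant co-feature is also a co-feature, the numerator never exceeds `(1 + bonus) * total`, which keeps the score in [0, 1] and lets it fill a document-by-document similarity matrix directly.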
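The final clustering step can be sketched as standard Affinity Propagation message passing (responsibilities and availabilities) with labeled seed documents injected. This abstract does not say how SAP uses its seeds, so the sketch assumes one simple mechanism: each seed's preference s(k, k) is raised by a hypothetical `seed_boost` above the median off-diagonal similarity, making seeds likelier exemplars, and each cluster then inherits the label of a seed that shares its exemplar. The function names and the `seed_boost`, `damping`, and `iterations` parameters are illustrative, not the paper's.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def seeds_affinity_propagation(
    S: List[List[float]],
    seeds: Sequence[int],
    seed_boost: float = 1.0,
    damping: float = 0.7,
    iterations: int = 200,
) -> List[int]:
    """Affinity Propagation on an n x n similarity matrix (n >= 2); returns each point's exemplar.

    ASSUMPTION: seeds are injected only by raising their preference s(k, k) by seed_boost.
    """
    n = len(S)
    S = [row[:] for row in S]  # work on a copy so the caller's matrix is untouched
    base = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    seed_set = set(seeds)
    for k in range(n):
        S[k][k] = base + (seed_boost if k in seed_set else 0.0)

    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    # Each non-exemplar joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]


def propagate_seed_labels(exemplar_of: List[int], seed_labels: Dict[int, str]) -> List[Optional[str]]:
    """Give every document the label of a seed sharing its exemplar (None if no seed does)."""
    cluster_label = {exemplar_of[d]: label for d, label in seed_labels.items()}
    return [cluster_label.get(e) for e in exemplar_of]


# Toy usage: six 1-D points in two well-separated groups, one labeled seed per group.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
S = [[-(a - b) ** 2 for b in xs] for a in xs]  # similarity = negative squared distance
exemplar_of = seeds_affinity_propagation(S, seeds=[0, 3])
labels = propagate_seed_labels(exemplar_of, {0: "sports", 3: "politics"})
```

Raising a seed's preference only biases exemplar selection; an unlabeled document still joins a cluster according to its similarities, so this stays a clustering procedure rather than a nearest-seed classifier. Documents whose cluster contains no seed are left unlabeled (`None`).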
Date of Conference: 19-21 April 2012