This paper proposes a three-step algorithm. First, co-training is employed to filter the likely positive data out of the unlabeled dataset U. Second, document vectors for the positive set are obtained by semantic-based feature extraction, and the strong positives are then identified within the likely positive set produced in the first step; the documents picked out in this way are added to the positive dataset P. Finally, a linear one-class SVM learns from the purified U as negative examples and the expanded P as positive examples. Because the algorithm automatically expands the positive dataset, it performs especially well when the given positive dataset P is insufficient. Comprehensive experiments show that the proposed algorithm compares favorably with existing ones.
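The data flow of the three steps can be illustrated with a minimal sketch. It is not the paper's implementation: cosine similarity to the centroid of P stands in for both co-training (step 1) and semantic-based feature matching (step 2), and a nearest-centroid rule stands in for the linear SVM (step 3). The function names, the thresholds `likely_thr` and `strong_thr`, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def expand_and_purify(P, U, likely_thr=0.5, strong_thr=0.9):
    """Return (expanded P, purified U). Thresholds are hypothetical."""
    c = centroid(P)
    scores = [cosine(u, c) for u in U]
    # Step 1 (stand-in for co-training): flag likely positives in U.
    likely = [i for i, s in enumerate(scores) if s >= likely_thr]
    # Step 2 (stand-in for semantic feature matching): keep only strong positives.
    strong = [i for i in likely if scores[i] >= strong_thr]
    expanded_P = P + [U[i] for i in strong]
    # Purified U: everything that was NOT flagged as likely positive.
    likely_set = set(likely)
    purified_U = [u for i, u in enumerate(U) if i not in likely_set]
    return expanded_P, purified_U


def train_classifier(pos, neg):
    """Step 3 (stand-in for the SVM): nearest-centroid rule by cosine similarity."""
    cp, cn = centroid(pos), centroid(neg)

    def predict(x):
        return cosine(x, cp) >= cosine(x, cn)

    return predict


# Toy 2-D "document vectors" (made up for illustration).
P = [[1.0, 0.0], [0.9, 0.1]]
U = [[1.0, 0.05], [0.6, 0.6], [0.0, 1.0], [0.1, 0.9]]

expanded_P, purified_U = expand_and_purify(P, U)
predict = train_classifier(expanded_P, purified_U)
```

In this toy run the first unlabeled vector is close enough to P to be added to it, the second is only a likely positive and is therefore dropped from U without being added to P, and the remaining two serve as negatives. A real pipeline would replace each stand-in with the corresponding component named in the abstract.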