Skip to Main Content
The goal of semi-supervised learning (SSL) methods is to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost (2007) proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data. It is similar to many recent graph-based SSL methods and is shown to be essentially the same as the Gaussian-field harmonic functions classifier proposed by Zhu et al. (2003) and proves to be very effective on some benchmark network datasets. We describe another simple and intuitive semi-supervised learning method based on random graph walk that outperforms wvRN by a large margin on several benchmark datasets when very few labels are available. Additionally, we show that using authoritative instances as training seeds --- instances that arguably cost much less to label --- dramatically reduces the amount of labeled data required to achieve the same classification accuracy. For some existing state-of-the-art semi-supervised learning methods the labeled data needed is reduced by a factor of 50.