Skip to Main Content
We study the visual learning models that could work efficiently with little ground-truth annotation and a mass of noisy unlabeled data for large scale Web image applications, following the subroutine of semi-supervised learning (SSL) that has been deeply investigated in various visual classification tasks. However, most previous SSL approaches are not able to incorporate multiple descriptions for enhancing the model capacity. Furthermore, sample selection on unlabeled data was not advocated in previous studies, which may lead to unpredictable risk brought by real-world noisy data corpse. We propose a learning strategy for solving these two problems. As a core contribution, we propose a scalable semi-supervised multiple kernel learning method (S3MKL) to deal with the first problem. The aim is to minimize an overall objective function composed of log-likelihood empirical loss, conditional expectation consensus (CEC) on the unlabeled data and group LASSO regularization on model coefficients. We further adapt CEC into a group-wise formulation so as to better deal with the intrinsic visual property of real-world images. We propose a fast block coordinate gradient descent method with several acceleration techniques for model solution. Compared with previous approaches, our model better makes use of large scale unlabeled images with multiple feature representation with lower time complexity. Moreover, to address the issue of reducing the risk of using unlabeled data, we design a multiple kernel hashing scheme to identify the “informative” and “compact” unlabeled training data subset. Comprehensive experiments are conducted and the results show that the proposed learning framework provides promising power for real-world image applications, such as image categorization and personalized Web image re-ranking with very little user interaction.