In many classification tasks, expert-labeled training data is prohibitively expensive. Weakly labeled data is an attractive alternative, but it introduces label noise. Multiple instance learning, in which training samples are grouped into “bags” rather than treated as singletons, offers a possible way to mitigate the effects of that noise. In this paper, we propose using MILBoost to handle noisy training data in a large-scale video taxonomic classification system composed of hundreds of binary classifiers. We test on data with both artificial and real-world noise and compare against state-of-the-art classifiers based on AdaBoost. We also explore how different bag sizes affect final classifier performance at different noise levels. Experiments show that when classifiers are trained on noisy data, MILBoost improves performance.
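As an illustration of the bagging idea only, the sketch below groups same-label samples into bags and scores a bag with the noisy-OR rule, which is one common formulation of MILBoost. The abstract does not say which bag-probability model or bagging procedure the system uses, so the noisy-OR choice, the random grouping, the independent-noise assumption, and all function names here are assumptions made for the example.

```python
import math
import random


def instance_probability(score):
    """Logistic link: map a real-valued classifier score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def noisy_or_bag_probability(scores):
    """Probability that a bag is positive under the noisy-OR model.

    The bag counts as positive if at least one instance is positive.
    (Assumed formulation; the abstract does not specify one.)
    """
    prob_all_negative = 1.0
    for s in scores:
        prob_all_negative *= 1.0 - instance_probability(s)
    return 1.0 - prob_all_negative


def make_bags(samples, bag_size, seed=0):
    """Hypothetical helper: randomly group same-label samples into bags."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i:i + bag_size] for i in range(0, len(shuffled), bag_size)]


def clean_bag_fraction(noise_rate, bag_size):
    """Chance that a positively labeled bag holds at least one true positive.

    Assumes each positive label is wrong independently with probability
    noise_rate, which is a simplification of real-world label noise.
    """
    return 1.0 - noise_rate ** bag_size
```

Under these assumptions, a 30% label-noise rate leaves a singleton positive correct 70% of the time, while a bag of four contains at least one true positive about 99% of the time. Larger bags tolerate more noise but give weaker supervision per instance, which is the trade-off the bag-size experiments explore.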