Most learning-based video semantic analysis methods require a large training set to achieve good performance. However, annotating a large video collection is labor-intensive. This paper describes how to construct the training set while reducing user involvement. Four selection schemes are proposed: clustering-based, spatial dispersiveness, temporal dispersiveness, and sample-based, which can be used to construct a small yet effective training set. If the selected training data represent the characteristics of the whole video data, classification performance can remain strong even when the training set is much smaller than the full data set. To determine which selected samples best train a semantic model, we use an SVM to classify the test samples into five categories: person, landscape, cityscape, map, and others. Experimental results show that these methods are effective for training set selection in video annotation and outperform random selection.
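To make the clustering-based scheme concrete, the sketch below selects representative training samples by running a simple k-means over frame feature vectors and keeping the sample nearest each final centroid. This is an illustrative assumption, not the paper's exact algorithm: the feature representation, distance metric, and cluster count here are placeholders.

```python
import math
import random

def select_by_clustering(samples, k, iters=20, seed=0):
    """Pick k representative samples: run a basic k-means over the
    feature vectors, then return the index of the sample closest to
    each final centroid (a sketch of clustering-based selection)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)          # initial centroids
    for _ in range(iters):
        # assign each sample to its nearest centroid
        groups = [[] for _ in range(k)]
        for s in samples:
            j = min(range(k), key=lambda j: math.dist(s, centers[j]))
            groups[j].append(s)
        # recompute centroids as the mean of each group
        for j, g in enumerate(groups):
            if g:
                centers[j] = tuple(sum(c) / len(g) for c in zip(*g))
    # representative sample = nearest neighbor of each centroid
    reps = {min(range(len(samples)), key=lambda i: math.dist(samples[i], c))
            for c in centers}
    return sorted(reps)

# toy usage: 60 random 2-D "frame features", pick 4 representatives
rng = random.Random(1)
pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
idx = select_by_clustering(pts, k=4)
```

The selected indices would then label only those frames by hand and train the SVM on them, which is the intended payoff of the selection schemes: comparable accuracy from far fewer annotated samples.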
Date of Conference: 14-17 Nov. 2010