In this paper we propose a novel approach for introducing semantic relations into the bag-of-words framework for recognizing human actions. We represent visual words in two different views: the original feature space and a document co-occurrence representation. The latter view conveys semantic relations but is large, sparse, and noisy. We apply canonical correlation analysis (CCA) between the two views to find a subspace in which the words are distributed more semantically, and then run k-means clustering in this subspace to obtain semantically meaningful clusters, which serve as the semantic visual vocabulary. Quantizing the features against this vocabulary yields more discriminative histograms, which are finally classified with an SVM. We have tested our approach on the KTH action dataset and achieved promising results.