Abstract:
Action recognition is an important precursor for understanding human activities in videos. The current paradigm of action recognition is to classify a video sequence as a whole. However, an action usually occupies only part of a video sequence, leaving the rest of the video irrelevant for action recognition. In this paper, we propose a method for learning a subsequence classifier that detects and classifies the part of a video that corresponds to the action. The subsequence classifier is trained from weakly labeled videos: their subsequence labels are not provided and must be inferred during learning. We use the framework of multiple instance learning (MIL) to solve two problems jointly: i) finding the action subsequences in the training videos, and ii) training the subsequence classifier on the inferred action subsequences. To obtain a robust solution to the MIL problem, we propose a sequential algorithm that progressively decreases the number of inferred action subsequences per video and trims their length until only one short subsequence per video is used as the action representative. We evaluate the combination of the automatically trained subsequence classifier and the full-sequence classifier on the challenging Hollywood2 benchmark and observe a significant performance gain over the baseline full-sequence classifier. Moreover, on two categories of the Hollywood2 dataset, the subsequence classifier performs favorably at temporally localizing actions in videos.
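The abstract does not specify the features, the classifier, or the trimming schedule, so the following is only a minimal sketch, under stated assumptions, of the general idea: an alternating MIL loop that infers the top-k windows in each positive video, retrains on them, and shrinks both k and the window length each round until one short window per video remains. Everything here is hypothetical and chosen for illustration: the function names (`sequential_mil`, `pool`, `train_linear`, `localize`), mean-pooled frame features, a difference-of-class-means linear classifier standing in for whatever classifier the paper actually trains, and the toy `(k, length)` schedule.

```python
from statistics import fmean


def windows(seq, length):
    """All contiguous (start, end) spans of the given length in a video."""
    return [(s, s + length) for s in range(len(seq) - length + 1)]


def pool(seq, span):
    """Assumption: a subsequence is represented by the mean of its frame features."""
    s, e = span
    return [fmean(frame[d] for frame in seq[s:e]) for d in range(len(seq[0]))]


def train_linear(pos, neg):
    """Hypothetical stand-in classifier: w points from the negative to the positive mean."""
    mp = [fmean(col) for col in zip(*pos)]
    mn = [fmean(col) for col in zip(*neg)]
    w = [a - c for a, c in zip(mp, mn)]
    # Bias puts the decision boundary midway between the two class means.
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, mp, mn))
    return w, b


def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sequential_mil(pos_videos, neg_videos, schedule):
    """schedule: hypothetical (k, length) pairs, both non-increasing, ending with k == 1."""
    # Start from a full-sequence classifier (every frame treated as relevant).
    model = train_linear([pool(v, (0, len(v))) for v in pos_videos],
                         [pool(v, (0, len(v))) for v in neg_videos])
    selected = []
    for k, length in schedule:
        # Negative videos contain no action, so all of their windows are negatives.
        neg = [pool(v, sp) for v in neg_videos for sp in windows(v, length)]
        # Infer step: keep the k highest-scoring windows in each positive video.
        selected = [sorted(windows(v, length),
                           key=lambda sp: score(model, pool(v, sp)),
                           reverse=True)[:k]
                    for v in pos_videos]
        pos = [pool(v, sp) for v, spans in zip(pos_videos, selected) for sp in spans]
        # Train step: refit the subsequence classifier on the inferred windows.
        model = train_linear(pos, neg)
    return model, selected


def localize(model, video, length):
    """Temporal localization: return the highest-scoring window of a test video."""
    return max(windows(video, length), key=lambda sp: score(model, pool(video, sp)))


# Toy data: 1-D frame features; the "action" is a burst of 1.0 values.
pos_videos = [[[0.0]] * 6 + [[1.0]] * 3 + [[0.0]] * 3,
              [[1.0]] * 3 + [[0.0]] * 9]
neg_videos = [[[0.0]] * 12, [[0.0]] * 12]
model, selected = sequential_mil(pos_videos, neg_videos, [(3, 6), (2, 4), (1, 3)])
print(selected)  # [[(6, 9)], [(0, 3)]]
print(localize(model, [[0.0]] * 4 + [[1.0]] * 3 + [[0.0]] * 5, 3))  # (4, 7)
```

Each round rescores windows with the classifier from the previous round, so early rounds that keep several long windows can tolerate mistakes, and only the final round commits to a single short window per positive video.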
Date of Conference: 02-08 December 2013
Date Added to IEEE Xplore: 06 March 2014
Electronic ISBN: 978-1-4799-3022-7