By Topic

Learning a hierarchy of discriminative space-time neighborhood features for human action recognition

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Adriana Kovashka ; Department of Computer Science, University of Texas at Austin ; Kristen Grauman

Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.

Published in:

Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on

Date of Conference:

13-18 June 2010