Spatio-Temporal Frames in a Bag-of-Visual-Features Approach for Human Actions Recognition

Authors: Lopes, A. (Computer Science Department, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil); Oliveira, R.S.; de Almeida, J.M.; De A Araujo, A.

The recognition of human actions from videos has several interesting and important applications, and a large number of approaches have been proposed for this task in different settings. Such approaches can be broadly categorized into model-based and model-free. Typically, model-based approaches work only in very constrained settings; because of that, a number of model-free approaches have appeared in recent years. Among them, those based on bag-of-visual-features (BoVF) have proven to be the most consistently successful, having been used by several independent authors. For videos to be represented by BoVFs, though, an important issue that arises is how to represent dynamic information. Most existing proposals consider the video as a spatio-temporal volume and then describe "volumetric patches" around 3D interest points. In this work, we propose to build a BoVF representation for videos by collecting 2D interest points directly. The basic idea is to gather such points not only from the traditional frames (xy planes), but also from the planes along the time axis (xt and yt planes), which we call spatio-temporal frames. Our assumption is that such features are able to capture dynamic information from the videos, and are therefore well suited to recognizing human actions, without the need for 3D extensions of the descriptors. In our experiments, this approach achieved state-of-the-art recognition rates on a well-known human actions database, even when compared to more sophisticated schemes.
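A minimal sketch of the idea, assuming a grayscale video stored as a NumPy array of shape (T, H, W). Slicing at a fixed time t gives the ordinary xy frames; slicing at a fixed row y or column x gives xt and yt planes, which is how the "spatio-temporal frames" are read here. Ordinary 2D interest points are then detected on every plane and pooled into one BoVF histogram. The Harris detector, the 9x9 normalized raw-patch descriptor, the k-means codebook and every function name (`spatio_temporal_frames`, `harris_points`, `video_descriptors`, `kmeans`, `bovf_histogram`) are hypothetical choices for illustration, not the authors' actual detector, descriptor or pipeline.

```python
import numpy as np


def spatio_temporal_frames(video, step=1):
    """Yield (kind, plane) pairs from a grayscale video of shape (T, H, W).

    "xy" planes are the ordinary frames; "xt" (fixed row y) and "yt"
    (fixed column x) planes contain the time axis.
    """
    T, H, W = video.shape
    for t in range(0, T, step):
        yield "xy", video[t, :, :]   # H x W
    for y in range(0, H, step):
        yield "xt", video[:, y, :]   # T x W
    for x in range(0, W, step):
        yield "yt", video[:, :, x]   # T x H


def box_sum(a, r=1):
    """Sum of `a` over a (2r+1) x (2r+1) window, with edge padding."""
    p = np.pad(a, r, mode="edge")
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out


def harris_points(plane, k=0.04, top=10, border=4):
    """Hypothetical stand-in 2D detector: strongest Harris responses off the border."""
    gy, gx = np.gradient(plane.astype(float))  # derivatives along rows, then columns
    sxx, syy, sxy = box_sum(gx * gx), box_sum(gy * gy), box_sum(gx * gy)
    R = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    # Exclude a border so every kept point has a full (2*border+1)^2 patch.
    R[:border, :] = R[-border:, :] = -np.inf
    R[:, :border] = R[:, -border:] = -np.inf
    idx = np.argsort(R, axis=None)[::-1][:top]
    idx = idx[np.isfinite(R.ravel()[idx])]
    return np.column_stack(np.unravel_index(idx, R.shape))  # (row, col) pairs


def video_descriptors(video, top=10, r=4):
    """Hypothetical descriptor: zero-mean, unit-norm raw patches from all planes."""
    feats = []
    for _, plane in spatio_temporal_frames(video):
        for i, j in harris_points(plane, top=top, border=r):
            patch = plane[i - r:i + r + 1, j - r:j + r + 1].astype(float)
            patch -= patch.mean()
            feats.append(patch.ravel() / (np.linalg.norm(patch) + 1e-8))
    return np.array(feats)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means to learn a visual codebook (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C


def bovf_histogram(descriptors, codebook):
    """Map each descriptor to its nearest visual word; return an L1-normalized histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    words = np.argmin(d2, axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    video = rng.random((16, 24, 24))  # synthetic stand-in for a grayscale clip
    D = video_descriptors(video)
    codebook = kmeans(D, k=8)
    print(D.shape, bovf_histogram(D, codebook))
```

Because the xt and yt planes are plain 2D images, the same 2D detector and descriptor run on them unchanged, which is the point of avoiding 3D extensions. In a typical BoVF setup the codebook would be learned from training videos and the per-video histograms passed to a classifier; this sketch learns it from a single synthetic clip only to stay self-contained.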

Published in: 2009 XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI)

Date of Conference: 11-15 Oct. 2009