By Topic

Comparing alternatives for capturing dynamic information in Bag-of-Visual-Features approaches applied to human actions recognition

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Lopes, A. ; Exact & Technol. Sci. Dept., State Univ. of Santa Cruz, Santa Cruz, Brazil ; Oliveira, R.S. ; de Almeida, J.M. ; de A Araujo, A.

Bag-of-Visual-Features (BoVF) representations have achieved a great success when used for object recognition, mainly because of their robustness to several kinds of variations and occlusion. Recently, a number of BoVF approaches has been proposed also for recognition of human actions from videos. One important issue that arises when using BoVF for videos is how to take dynamic information into account, and most proposals rely on 3D extensions of 2D visual descriptors for this. However, we envision alternative approaches based on 2D descriptors applied to the spatio-temporal video planes, instead of to the traditionally explored by previous work. Thus, in this paper, we address the following question: what is the cost-effectiveness of a BoVF approach built from such 2D descriptors when compared to one based on the state-of-the-art 3D Spatio-Temporal Interest Points (STIPs) descriptor? We evaluate the recognition rate and time complexity of alternative 2D descriptors applied to different sets of spatio-temporal planes, and the state-of-the-art STIPs. Experimental results show that, with proper settings, 2D descriptors can yield the same recognition results as those provided by STIP, but at a significantly higher time complexity.

Published in:

Multimedia Signal Processing, 2009. MMSP '09. IEEE International Workshop on

Date of Conference:

5-7 Oct. 2009