
Realistic Human Action Recognition With Multimodal Feature Selection and Fusion


5 Author(s): Qiuxia Wu (College of Automation Science and Engineering, South China University of Technology, Guangzhou, China); Zhiyong Wang; Feiqi Deng; Zheru Chi; et al.

Although promising results have been achieved for human action recognition under well-controlled conditions, recognizing human actions in realistic scenarios remains very challenging due to difficulties such as dynamic backgrounds. In this paper, we propose, for the first time, to take the multimodal (i.e., audiovisual) characteristics of realistic human action videos into account for human action recognition, since in realistic scenarios the audio signals accompanying an action generally provide a cue to its nature, such as a phone ringing before the phone is answered. To cope with the diverse audio cues of an action in realistic scenarios, we identify effective features from a large number of audio features with the generalized multiple kernel learning algorithm. The widely used space-time interest point descriptors serve as visual features, and support vector machines are employed for both audio- and video-based classification. At the final stage, a fuzzy integral fuses the recognition results of the audio and visual modalities. Experimental results on the challenging Hollywood-2 Human Action data set demonstrate that the proposed approach achieves a larger recognition performance improvement than integrating scene context. Our comprehensive experiments also reveal how audio context influences realistic action recognition.
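The fusion stage described above can be illustrated with a small sketch. The code below is a minimal Sugeno fuzzy-integral fusion of two per-class classifier confidences, one from the audio SVM and one from the visual SVM; the function names, the density values, and the use of a lambda-fuzzy measure with the Sugeno (rather than Choquet) integral are illustrative assumptions, not details taken from the paper:

```python
# Hedged sketch: Sugeno fuzzy-integral fusion of two classifier scores.
# All names and numeric values are illustrative, not from the paper.

def lambda_measure(g1, g2):
    """Solve (1 + lam*g1)(1 + lam*g2) = 1 + lam for the nonzero root,
    so the lambda-fuzzy measure of the full modality set equals 1."""
    if abs(g1 + g2 - 1.0) < 1e-12:
        return 0.0  # densities already sum to 1: measure is additive
    return (1.0 - g1 - g2) / (g1 * g2)

def sugeno_fuse(score_audio, score_visual, g_audio, g_visual):
    """Fuse two per-class confidence scores in [0, 1] with the Sugeno
    integral over a lambda-fuzzy measure built from the two densities."""
    lam = lambda_measure(g_audio, g_visual)
    # Rank modalities by descending score; A_i is the set of the top i.
    ranked = sorted([(score_audio, g_audio), (score_visual, g_visual)],
                    reverse=True)
    (h1, g_top), (h2, _) = ranked
    g_A1 = g_top  # measure of the single top-scoring modality
    g_A2 = g_audio + g_visual + lam * g_audio * g_visual  # = 1 by construction
    return max(min(h1, g_A1), min(h2, g_A2))

# Example: audio is confident (0.8) but deemed less reliable (density 0.3),
# so the fused decision leans on the visual score.
fused = sugeno_fuse(0.8, 0.4, g_audio=0.3, g_visual=0.6)
print(round(fused, 4))  # 0.4
```

In a late-fusion setup like this, the densities play the role of per-modality reliabilities, which could be estimated on a validation set; the integral then rewards agreement between a modality's confidence and its reliability rather than simply averaging the two scores.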

Published in:

IEEE Transactions on Systems, Man, and Cybernetics: Systems (Volume: 43, Issue: 4)