
BDNet: A Method-based Forward and Backward Convolutional Networks for Action Recognition in Videos


Abstract:

Human action recognition analyses the behaviour in a scene according to the spatial-temporal features carried in a series of image sequences. The critical challenge is to extract informative spatial-temporal features from a limited-length video, which frequently constrains the receptive field of the 3D Convolutional Neural Network (CNN). However, present methods mainly model an action's spatial-temporal features along a single direction and ignore the information in the opposite direction. Moreover, a fixed-weight fusion of spatial and temporal features does not distinguish their importance for each action sequence. To address these problems, we propose a bi-directional network (BDNet) that combines features from both directions of an action for recognition. Two CNNs are set up to extract spatial-temporal features along the forward and backward action sequences, respectively. Then, a dynamic fusion strategy is adopted to measure the importance of spatial and temporal features for each action. We conducted extensive experiments on the commonly used action recognition dataset UCF101. Compared with other work, the proposed method achieves promising performance in accuracy and efficiency.
Date of Conference: 20-22 May 2023
Date Added to IEEE Xplore: 01 December 2023
Conference Location: Yichang, China


I. Introduction

The rapid development of imaging equipment has resulted in massive video generation, creating a need to analyze human actions in videos for searching, ranking, and intelligent recommendation tasks. The primary action recognition methods can be categorized as deep learning methods and hand-crafted feature methods. In the last decade, deep learning methods have been widely used in human action recognition because they can automatically extract spatiotemporal features from image sequences and significantly improve recognition accuracy compared with traditional methods. The commonly used deep network structures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Graph Convolution Networks (GCNs), among which CNNs are widely used in action recognition because they can directly extract features from images.
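
To make the bi-directional design described in the abstract concrete, the following is a minimal PyTorch sketch of the idea: two 3D-CNN branches process a clip in its original and temporally reversed order, and a small gating module predicts per-sample fusion weights before classification. The backbone, layer sizes, and all names (BiDirectionalActionNet, fusion_gate) are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a bi-directional (forward/backward) 3D-CNN with dynamic
# fusion, loosely following the description in the abstract. Hypothetical
# backbone and dimensions; not the paper's actual architecture.
import torch
import torch.nn as nn


class BiDirectionalActionNet(nn.Module):
    def __init__(self, num_classes: int = 101, feat_dim: int = 128):
        super().__init__()

        # Lightweight 3D-CNN backbone; the paper's backbone is not specified here.
        def backbone():
            return nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),
                nn.Conv3d(32, feat_dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),  # -> (B, feat_dim, 1, 1, 1)
            )

        self.forward_branch = backbone()   # sees the clip in original temporal order
        self.backward_branch = backbone()  # sees the temporally reversed clip
        # Dynamic fusion: predicts per-sample weights for the two branches.
        self.fusion_gate = nn.Sequential(
            nn.Linear(2 * feat_dim, 2),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W)
        f_fwd = self.forward_branch(clip).flatten(1)                  # (B, feat_dim)
        f_bwd = self.backward_branch(clip.flip(dims=[2])).flatten(1)  # reversed time
        w = self.fusion_gate(torch.cat([f_fwd, f_bwd], dim=1))        # (B, 2)
        fused = w[:, 0:1] * f_fwd + w[:, 1:2] * f_bwd
        return self.classifier(fused)


if __name__ == "__main__":
    model = BiDirectionalActionNet()
    dummy_clip = torch.randn(2, 3, 16, 112, 112)  # batch of 16-frame RGB clips
    print(model(dummy_clip).shape)  # torch.Size([2, 101])
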

