Abstract:
In this paper, we address the problem of cross-view action recognition from a monocular RGB camera. This topic has been considered extremely challenging due to the lack of 3D information in 2D images. Exploiting advances in 3D pose estimation from a single RGB camera, we propose a new framework, termed DeepVI, for cross-view action recognition without the need for pose alignment. Virtual viewpoints are used to augment the variability of the training data, in combination with an end-to-end Deep Neural Network (DNN). The proposed network is composed of two modules. The first, called SmoothNet, implicitly smooths skeleton joint trajectories using revisited temporal convolution in order to reduce the noise in the estimated 3D skeletons. The second module is a state-of-the-art approach to action recognition based on Spatial Temporal Graph Convolutional Networks (ST-GCN [40]). Experiments were conducted in cross-view settings on two datasets, namely NTU RGB-D and Northwestern-UCLA. The results show the effectiveness of the proposed framework.
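The idea of augmenting training data with virtual viewpoints can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a skeleton sequence is a list of frames, that each frame is a list of (x, y, z) joint coordinates, that the y axis is vertical, and that a virtual viewpoint is simulated by rotating the whole sequence about that axis. The function names and the set of yaw angles are hypothetical.

```python
import math


def rotate_skeleton_sequence(sequence, yaw_degrees):
    """Rotate every joint of every frame about the vertical (y) axis.

    Assumption: y is the vertical axis; the abstract does not specify this.
    """
    theta = math.radians(yaw_degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [
        [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
        for frame in sequence
    ]


def augment_with_virtual_views(sequence, yaw_angles=(-45.0, 0.0, 45.0)):
    """Return one rotated copy of the sequence per virtual viewpoint.

    The default angles are illustrative only, not taken from the paper.
    """
    return [rotate_skeleton_sequence(sequence, angle) for angle in yaw_angles]


# Usage: a two-frame sequence with two joints per frame.
sequence = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(1.0, 0.1, 0.0), (0.0, 1.1, 0.0)],
]
views = augment_with_virtual_views(sequence)
```

Each element of `views` is the same action seen from a different simulated camera angle, so a classifier trained on all of them is exposed to more viewpoint variability than the original capture provides.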
Published in: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
Date of Conference: 16-20 November 2020
Date Added to IEEE Xplore: 18 January 2021