DeepVI: A Novel Framework for Learning Deep View-Invariant Human Action Representations using a Single RGB Camera | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we address the problem of cross-view action recognition from a monocular RGB camera. This problem is considered extremely challenging because 2D images lack 3D information. Exploiting recent advances in 3D pose estimation from a single RGB camera, we propose a new framework, termed DeepVI, for cross-view action recognition that does not require pose alignment. Virtual viewpoints are used to increase the variability of the training data, in combination with an end-to-end Deep Neural Network (DNN). The proposed network is composed of two modules. The first, called SmoothNet, implicitly smooths skeleton joint trajectories using a revisited temporal convolution in order to reduce the noise in the estimated 3D skeletons. The second module is a state-of-the-art action recognition approach based on Spatial Temporal Graph Convolutional Networks (ST-GCN [40]). Experiments have been conducted in cross-view settings on two datasets, namely NTU RGB-D and Northwestern-UCLA. The results demonstrate the effectiveness of the proposed framework.
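The abstract does not give the virtual-viewpoint angles or the SmoothNet architecture, so the following is only a rough sketch of the two ideas under stated assumptions. It assumes 3D skeletons are stored as per-frame lists of (x, y, z) joints with y as the vertical axis. Virtual views are generated by rotating the whole sequence about that axis, and each joint trajectory is smoothed by a fixed temporal kernel applied depthwise over time. The function names (`rotate_about_vertical`, `virtual_views`, `smooth_trajectories`) and the hand-chosen kernel are hypothetical. SmoothNet, as described, learns its temporal convolution end-to-end with the ST-GCN classifier instead of using fixed weights.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z); y assumed to be the vertical axis
Frame = List[Joint]                 # all joints of one skeleton at one time step
Skeleton = List[Frame]              # a sequence of T frames


def rotate_about_vertical(seq: Skeleton, angle_deg: float) -> Skeleton:
    """Rotate every joint about the (assumed vertical) y axis to mimic a
    camera placed at a different azimuth around the subject."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # R_y = [[c, 0, s], [0, 1, 0], [-s, 0, c]] applied to each (x, y, z)
    return [[(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
            for frame in seq]


def virtual_views(seq: Skeleton, angles_deg: Sequence[float]) -> List[Skeleton]:
    """Hypothetical augmentation: one rotated copy of the sequence per angle."""
    return [rotate_about_vertical(seq, a) for a in angles_deg]


def smooth_trajectories(seq: Skeleton, kernel: Sequence[float]) -> Skeleton:
    """Apply the same 1D temporal kernel to every joint coordinate
    independently (a depthwise temporal filter). Edge frames are replicated
    so the output keeps T frames. This computes a correlation, which equals
    a convolution when the kernel is symmetric."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    r = len(kernel) // 2
    T = len(seq)
    out: Skeleton = []
    for t in range(T):
        frame: Frame = []
        for j in range(len(seq[t])):
            acc = [0.0, 0.0, 0.0]
            for i, w in enumerate(kernel):
                tt = min(max(t + i - r, 0), T - 1)  # clamp to a valid frame index
                for d in range(3):
                    acc[d] += w * seq[tt][j][d]
            frame.append((acc[0], acc[1], acc[2]))
        out.append(frame)
    return out
```

Rotating about the vertical axis changes the apparent viewpoint while preserving body height and each joint's distance from that axis. A learned module such as SmoothNet would fit the temporal weights during training rather than use a fixed averaging kernel like the one assumed here.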
Date of Conference: 16-20 November 2020
Date Added to IEEE Xplore: 18 January 2021
Conference Location: Buenos Aires, Argentina
