A Unified Deep Framework for Hand Pose Estimation and Dynamic Hand Action Recognition from First-Person RGB Videos | IEEE Conference Publication | IEEE Xplore


Abstract:

Understanding hand actions from first-person video has recently attracted attention thanks to its wide range of potential applications, such as hand rehabilitation and augmented reality. Most existing works rely mainly on RGB images. Compared with RGB images, hand joints have certain advantages, as they are robust to illumination and appearance variations. However, previous works on hand action recognition usually employed hand joints that were manually determined. This paper presents a unified framework for both hand pose estimation and hand action recognition from first-person RGB images. First, our framework estimates 3D hand joints from every RGB image using a combination of ResNet and a graph convolutional network. Then, PA-ResGCN, a state-of-the-art method designed for the human skeleton, is adapted to recognize hand actions from the estimated hand joints. Our framework takes advantage of efficient graph networks to model the graph-like structure of the human hand in both phases: hand pose estimation and hand action recognition. We evaluate the proposed framework on the First Person Hand Action Benchmark (FPHAB). The experiments show that the proposed framework outperforms several state-of-the-art methods on both the hand pose estimation and hand action recognition tasks.
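
To illustrate what modelling the hand as a graph can look like, the following is a minimal sketch of a single generic graph-convolution layer applied to per-joint features. It is not the network described in the paper: the 21-joint layout (one wrist joint plus four joints per finger), the joint ordering, the feature sizes, and the function names are all assumptions made for illustration, and the layer uses a standard symmetric-normalized adjacency with self-loops.

```python
import numpy as np

NUM_JOINTS = 21  # assumption: 1 wrist joint + 5 fingers x 4 joints


def hand_edges():
    """Bones of a hypothetical hand skeleton.

    Assumed ordering: joint 0 is the wrist, followed by four
    consecutive joints for each of the five fingers.
    """
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to first joint of the finger
        for k in range(3):
            edges.append((base + k, base + k + 1))  # along the finger
    return edges


def normalized_adjacency(edges, n=NUM_JOINTS):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


rng = np.random.default_rng(0)
a_hat = normalized_adjacency(hand_edges())
features = rng.standard_normal((NUM_JOINTS, 3))  # e.g. one 3D coordinate per joint
weights = rng.standard_normal((3, 16))           # illustrative random weights
out = graph_conv(features, a_hat, weights)       # shape: (21, 16)
```

Each joint's output mixes its own features with those of the joints it is connected to by a bone, which is the property that makes graph layers a natural fit for skeleton data in both the pose estimation and action recognition stages.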
Date of Conference: 15-16 October 2021
Date Added to IEEE Xplore: 29 October 2021
Conference Location: Hanoi, Vietnam

I. Introduction

The hand is one of the most crucial means by which humans interact with the world. Hence, estimating human hand pose and understanding hand actions from images or video play an important role in the field of computer vision. These tasks have many applications, such as controlling smart home devices [1] and rehabilitation assessment in medicine [2].
