Global Co-occurrence Feature Learning and Active Coordinate System Conversion for Skeleton-based Action Recognition


Abstract:

Skeleton-based action recognition has attracted increasing attention in recent years, and the rapid development of deep learning has greatly improved its performance. However, the exploration of action co-occurrence is still not comprehensive. Most existing works mine co-occurrence features from the temporal or the spatial domain separately and typically combine them only at the end. In contrast, our approach learns temporal and spatial co-occurrence features jointly and globally, which we call spatio-temporal-unit feature enhancement (STUFE). To better align the skeleton data, we introduce a novel preprocessing method called active coordinate system conversion (ACSC), in which a coordinate system is learned automatically and used to transform skeleton samples for alignment. The proposed methods are compatible with the two current types of mainstream models, CNN-based and GCN-based. We validate our methods with both types of model on two benchmarks, NTU-RGB+D and SBU Kinect Interaction. The results show that our methods achieve state-of-the-art performance.
Date of Conference: 01-05 March 2020
Date Added to IEEE Xplore: 14 May 2020
Conference Location: Snowmass, CO, USA

1. Introduction

In the past few years, human action recognition has become an active area of research owing to its wide range of applications, from surveillance to human-computer interaction and virtual reality. Human pose, also known as the skeleton, can serve as a data modality for action recognition. Unlike RGB video, human skeleton sequences provide highly informative cues with only a limited amount of data. [9] first verified, from a biological perspective, that skeleton sequences are effective for discriminating actions. Many devices can now output skeleton sequences directly in real time, of which Intel RealSense [11] and Microsoft Kinect [36] are the most commonly used. The popularity of these devices has greatly enhanced the practicality of skeleton-based action recognition.
