1. Introduction
In the past few years, human action recognition has become an active area of research because of its wide range of applications, from surveillance to human-computer interaction and virtual reality. Human pose, also known as the skeleton, can serve as a data modality for action recognition. Unlike RGB video, human skeleton sequences can provide highly effective information with only a small amount of data. The work in [9] first verified, from a biological perspective, that skeleton sequences are sufficient for discriminating actions. Many devices can now output skeleton sequences directly in real time, of which Intel RealSense [11] and Microsoft Kinect [36] are the most commonly used. The popularity of these devices has greatly enhanced the practicality of skeleton-based action recognition.