I. Introduction
Recently, with the development of low-cost motion-sensing devices such as the Nintendo Wii and PlayStation Move controllers and, in particular, depth sensors such as the Microsoft Kinect, depth information can be readily obtained. This has facilitated a new trend of research on 3D action recognition. In fact, the early work of Johansson [1] suggested that the motion of the human skeleton is discriminative enough to identify different human gestures. In particular, Shotton et al. [2] proposed a method that estimates the 3D positions of body joints from depth maps, from which discriminative features can be extracted to describe skeletal motion. Building on this method, much work on 3D human action recognition from depth maps has been conducted [3]–[13]. These methods are driven by the pursuit of high recognition accuracy, and some of them must observe the entire data stream before recognition can be performed reliably.

However, most applications of depth sensors are oriented toward interactive systems, such as human-computer interaction, electronic entertainment, and smart-home technologies, which usually require a prompt response to user actions for system control. That is to say, we need to build a low-latency system for recognizing human actions. Here, we consider latency from two aspects: computational latency and observational latency. Unlike computational latency, which depends on the performance of the computing hardware, observational latency is caused by the algorithm itself when the recognition system must observe too much of the data stream before making a decision. High latency causes system lag, which not only significantly degrades the interactivity of the user experience but also makes such interactive systems unattractive. Therefore, the success of these technologies requires flexible algorithms that satisfy two fundamental properties: high recognition accuracy and low latency. Only a few systems have paid attention to observational latency and made an effort to identify actions accurately well before they end [3], [9], [14].