
Effective Active Skeleton Representation for Low Latency Human Action Recognition



Abstract:

With the development of depth sensors, low-latency 3D human action recognition has become increasingly important in various interaction systems, where responding with minimal latency is critical. High latency not only significantly degrades the user's interaction experience, but also makes certain interaction systems, e.g., gesture control or electronic gaming, unattractive. In this paper, we propose a novel active skeleton representation for low-latency human action recognition. First, we encode each limb of the human skeleton into a state through a Markov random field. The active skeleton is then represented by aggregating the encoded features of the individual limbs. Finally, we propose multi-channel multiple-instance learning with a maximum-pattern-margin formulation to further boost the performance of the model. Our method is robust in computing features from joint positions and effective in handling unsegmented sequences. Experiments on the MSR Action3D, MSR DailyActivity3D, and Huawei/3DLife-2013 datasets demonstrate the effectiveness of the model with the proposed representation, and its superiority over state-of-the-art low-latency recognition approaches.
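To make the pipeline concrete, below is a minimal sketch of what the limb-level encoding and aggregation could look like. It is an assumption-laden illustration, not the authors' implementation: the joint-to-limb grouping, the nearest-codeword state assignment (standing in for the Markov random field inference in the paper), and the histogram aggregation are all placeholders.

```python
# Hypothetical sketch of the limb-state encoding and aggregation steps
# summarized in the abstract. The joint grouping, the codebooks, and the
# nearest-codeword rule used in place of the paper's MRF inference are
# illustrative assumptions only.
import numpy as np

# Assumed grouping of Kinect-style joint indices into five limbs.
LIMBS = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
    "torso":     [0, 1, 2, 3],
}

def encode_limb_states(frames, codebooks):
    """Map each limb in each frame to a discrete state id.

    frames: (T, J, 3) array of 3D joint positions over T frames.
    codebooks: {limb_name: (K, d) array} of per-limb pose codewords;
        a simple nearest-codeword rule stands in for the MRF inference
        used in the paper.
    """
    states = {}
    for limb, joints in LIMBS.items():
        # Flatten the limb's joint coordinates into one pose vector per frame.
        poses = frames[:, joints, :].reshape(len(frames), -1)          # (T, d)
        dists = np.linalg.norm(
            poses[:, None, :] - codebooks[limb][None, :, :], axis=2)   # (T, K)
        states[limb] = dists.argmin(axis=1)                            # (T,)
    return states

def active_skeleton_descriptor(states, num_states):
    """Aggregate per-limb state occurrences into one descriptor by
    concatenating normalized state histograms (one plausible aggregation)."""
    hists = []
    for limb in LIMBS:
        h = np.bincount(states[limb], minlength=num_states).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return np.concatenate(hists)

# Toy usage: 60 frames of 20 joints, 8 hypothetical states per limb.
rng = np.random.default_rng(0)
frames = rng.normal(size=(60, 20, 3))
codebooks = {l: rng.normal(size=(8, len(j) * 3)) for l, j in LIMBS.items()}
desc = active_skeleton_descriptor(encode_limb_states(frames, codebooks), 8)
```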
Published in: IEEE Transactions on Multimedia (Volume: 18, Issue: 2, February 2016)
Page(s): 141 - 154
Date of Publication: 03 December 2015


I. Introduction

Recently, with the development of depth sensors such as the Nintendo Wii, Microsoft Kinect, and PlayStation Move controllers, depth information can be readily obtained. This has facilitated a new trend in research on 3D action recognition. In fact, the early work of Johansson [1] suggested that the motion of the human skeleton is discriminative enough to identify different human gestures. In particular, Shotton et al. [2] proposed a method to estimate joints from depth maps and provide their 3D positions, from which discriminative features can be extracted to describe the motion of the human skeleton. Building on this method, much work has focused on 3D human action recognition using depth maps [3]–[13]. Those methods are driven by high recognition accuracy, and some of them need to access the entire observation data stream for reliable recognition. However, most applications of depth sensors are oriented toward interaction systems, such as human-computer interaction, electronic entertainment, and smart home technologies, which usually require a prompt response to user actions for system control. In other words, we need to build a low-latency system for recognizing human actions. Here, we discuss low latency from two aspects: computational latency and observational latency. Unlike computational latency, which is determined by the performance of the computer, observational latency is caused by the algorithm itself when the recognition system must observe too much of the data stream before making a decision. High latency causes system lag, which not only significantly degrades the user's interaction experience but also makes certain interaction systems unattractive. The success of these technologies therefore requires flexible algorithms that satisfy two fundamental properties: high recognition accuracy and low latency. Only a few systems have paid attention to observational latency and made an effort to identify an action accurately long before it ends [3], [9], [14].
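To make this distinction concrete, the sketch below shows an online recognizer that scores a growing prefix of an unsegmented stream and commits to a label as soon as it is confident. Everything here is illustrative rather than the paper's system: the scorer, the confidence threshold, and the toy data are assumptions; the point is only how the two latencies are measured.

```python
# Illustrative sketch (not the paper's system) of the two latency notions:
# observational latency is how much of the stream must be seen before a
# confident decision; computational latency is the wall-clock compute time.
import time
import numpy as np

def recognize_online(stream, score, threshold=0.8):
    """Classify an unsegmented skeleton stream from a growing prefix.

    stream: (T, J, 3) array of 3D joint positions over T frames.
    score:  assumed callable mapping a frame prefix to class scores.
    Returns (label, observational latency in frames, compute seconds).
    """
    compute_time = 0.0
    for t in range(1, len(stream) + 1):
        start = time.perf_counter()
        probs = score(stream[:t])           # scores for the prefix seen so far
        compute_time += time.perf_counter() - start
        if probs.max() >= threshold:        # confident enough: answer early
            return int(probs.argmax()), t, compute_time
    return int(probs.argmax()), len(stream), compute_time

# Toy usage: a placeholder scorer whose confidence grows with the evidence,
# so the recognizer answers after 60 of the 100 frames (observational
# latency = 60 frames) instead of waiting for the whole sequence.
rng = np.random.default_rng(0)
stream = rng.normal(size=(100, 20, 3))      # 100 frames, 20 joints, 3D
score = lambda p: np.array([0.5 + len(p) / 200.0, 0.5 - len(p) / 200.0])
label, obs_latency, comp_latency = recognize_online(stream, score)
```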

References
1. G. Johansson, "Visual motion perception", Sci. Amer., vol. 232, no. 6, pp. 76-88, 1975.
2. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, et al., "Real-time human pose recognition in parts from single depth images", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 1297-1304, Jun. 2011.
3. C. Ellis, S. Z. Masood, M. F. Tappen, J. J. LaViola and R. Sukthankar, "Exploring the trade-off between accuracy and observational latency in action recognition", Int. J. Comput. Vis., vol. 101, no. 3, pp. 420-436, 2013.
4. X. Cai, W. Zhou and H. Li, "An effective representation for action recognition with human skeleton joints", Proc. SPIE 9273, Optoelectron. Imaging Multimedia Technol. III, 2014.
5. L. Xia, C. Chen and J. K. Aggarwal, "View invariant human action recognition using histograms of 3D joints", Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshop, pp. 20-27, Jun. 2012.
6. Y. Song, J. Tang, F. Liu and S. Yan, "Body surface context: A new robust feature for action recognition from depth videos", IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 6, pp. 952-964, Jun. 2014.
7. J. Wang, Z. Liu, Y. Wu and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 1290-1297, Jun. 2012.
8. B. Ni, P. Moulin and S. Yan, "Order-preserving sparse coding for sequence classification", Proc. Eur. Conf. Comput. Vis., pp. 173-187, 2012.
9. M. Zanfir, M. Leordeanu and C. Sminchisescu, "The moving pose: An efficient 3D kinematics descriptor for low-latency action recognition and detection", Proc. IEEE Int. Conf. Comput. Vis., pp. 2752-2759, Dec. 2013.
10. O. Oreifej and Z. Liu, "HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 716-723, Jun. 2013.
11. Y. Pang, S. Wang and Y. Yuan, "Learning regularized LDA by clustering", IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 12, pp. 2191-2201, Dec. 2014.
12. W. Zhou, M. Yang, H. Li, X. Wang, Y. Lin and Q. Tian, "Towards codebook-free: Scalable cascaded hashing for mobile image search", IEEE Trans. Multimedia, vol. 16, no. 3, pp. 601-611, Apr. 2014.
13. C. Wang, Y. Wang and A. L. Yuille, "An approach to pose-based action recognition", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 915-922, Jun. 2013.
14. X. Zhao, X. Li, C. Pang, X. Zhu and Q. Z. Sheng, "Online human gesture recognition from motion data streams", Proc. ACM Int. Conf. Multimedia, pp. 23-32, 2013.
15. W. Li, Z. Zhang and Z. Liu, "Action recognition based on a bag of 3D points", Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshop, pp. 9-14, Jun. 2010.
16. J. Luo, W. Wang and H. Qi, "Group sparsity and geometry constrained dictionary learning for action recognition from depth maps", Proc. IEEE Int. Conf. Comput. Vis., pp. 1809-1816, Dec. 2013.
17. M. Raptis, D. Kirovski and H. Hoppe, "Real-time classification of dance gestures from skeleton animation", Proc. ACM Int. Conf. Multimedia, pp. 147-156, 2013.
18. X. Zhao, Y. Liu and Y. Fu, "Exploring discriminative pose sub-patterns for effective action classification", Proc. ACM Int. Conf. Multimedia, pp. 273-282, 2013.
19. J. Zhu, B. Wang, X. Yang, W. Zhang and Z. Tu, "Action recognition with actons", Proc. IEEE Int. Conf. Comput. Vis., pp. 3559-3566, Dec. 2013.
20. S. Andrews, I. Tsochantaridis and T. Hofmann, "Support vector machines for multiple-instance learning", Proc. Adv. Neural Inf. Process. Syst., pp. 561-568, 2003.
21. J. K. Aggarwal and M. S. Ryoo, "Human activity analysis: A review", ACM Comput. Surveys, vol. 43, no. 3, pp. 1-43, 2011.
22. M. Blank, L. Gorelick, E. Shechtman, M. Irani and R. Basri, "Actions as space-time shapes", Proc. IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1395-1402, Oct. 2005.
23. W. Zhou, H. Li, R. Hong, Y. Lu and Q. Tian, "BSIFT: Towards data-independent codebook for large-scale image search", IEEE Trans. Image Process., vol. 24, no. 3, pp. 967-979, Mar. 2015.
24. X. Wu, D. Xu, L. Duan and J. Luo, "Action recognition using context and appearance distribution features", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 489-496, Jun. 2011.
25. X. Zhen, L. Shao, D. Tao and X. Li, "Embedding motion and structure features for action recognition", IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 7, pp. 1182-1190, Jul. 2013.
26. I. N. Junejo, E. Dexter, I. Laptev and P. Pérez, "View-independent action recognition from temporal self-similarities", IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 172-185, Jan. 2011.
27. M. Raptis, I. Kokkinos and S. Soatto, "Discovering discriminative action parts from mid-level video representations", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 1242-1249, Jun. 2012.
28. W. Zhou, M. Yang, X. Wang and H. Li, "Scalable feature matching by dual cascaded scalar quantization for image retrieval", IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 159-171, Jan. 2016.
29. H. Wang, A. Klaser, C. Schmid and C. Liu, "Action recognition by dense trajectories", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 3169-3176, Jun. 2011.
30. Q. V. Le, W. Y. Zou, S. Y. Yeung and A. Y. Ng, "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis", Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 3361-3368, Jun. 2011.
