We present an approach to recognizing single-actor human actions against complex backgrounds. We adopt a joint tracking and recognition approach, which tracks the actor's pose by sampling from 3D action models. Most existing approaches of this kind require large amounts of training data or MoCap to handle multiple viewpoints, and often rely on clean actor silhouettes. The action models in our approach are obtained by annotating keyposes in 2D, lifting them to 3D stick figures, and then computing the transformation matrices between the 3D keypose figures. Poses sampled from coarse action models may not fit the observations well; to overcome this difficulty, we propose an approach for efficiently localizing a pose by generating a Pose-Specific Part Model (PSPM), which captures appropriate kinematic and occlusion constraints in a tree structure. In addition, our approach does not require pose silhouettes. We show improvements over previous results on two publicly available datasets, as well as on a novel, augmented dataset with dynamic backgrounds.
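As an illustration of one ingredient mentioned above, a transformation between two 3D stick-figure keyposes can be estimated by least-squares rigid alignment of their joint positions (the Kabsch/SVD method). This is a hedged sketch under assumed conventions, not the paper's actual model-building pipeline; the function name `keypose_transform` and the joint layout are hypothetical.

```python
import numpy as np

def keypose_transform(P, Q):
    """Estimate the rigid transform (R, t) mapping keypose P onto keypose Q.

    P, Q: (J, 3) arrays of J joint positions for two 3D stick-figure keyposes.
    Uses the Kabsch (SVD-based) least-squares alignment; assumed here as one
    possible way to relate 3D keypose figures, not the authors' exact method.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)        # joint centroids
    H = (P - cP).T @ (Q - cQ)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # optimal rotation
    t = cQ - R @ cP                                # translation
    return R, t

# Toy example: a 4-joint stick figure rotated 90 degrees about the z-axis.
pose_a = np.array([[0., 0., 0.], [0., 0., 1.], [0.5, 0., 1.], [-0.5, 0., 1.]])
Rz = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
pose_b = pose_a @ Rz.T
R, t = keypose_transform(pose_a, pose_b)
```

Because the second pose is an exact rigid copy of the first, the recovered rotation matches `Rz` and the translation is zero; with noisy annotated joints the same routine returns the least-squares best fit.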