Skip to Main Content
3D human pose recovery is considered as a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view usually is slow due to the large number of parameters that need to be estimated and recovered poses are often ambiguous due to the perspective projection. We present an approach that does not explicitly infer 3D pose at each frame. Instead, from existing action models we search for a series of actions that best match the input sequence. In our approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transition of the synthetic poses is represented by a graph model called Action Net. Given the input, silhouette matching between the input frames and the key poses is performed first using an enhanced Pyramid Match Kernel algorithm. The best matched sequence of actions is then tracked using the Viterbi algorithm. We demonstrate this approach on a challenging video sets consisting of 15 complex action classes.