Action recognition has attracted much attention in human behavior analysis in recent years, and local spatial-temporal (ST) features are widely adopted. However, most existing methods represent an action video as a histogram of ST words and, because these features are local, fail to capture the fine structure of actions. In this paper, we propose a novel method that simultaneously localizes and recognizes action units (AUs) by treating them as 3D (x, y, t) objects. First, we record all local ST features in a codebook, together with their action class labels and their positions relative to the respective AU centers. This approximates the probability distribution of class label and relative position in a non-parametric manner. Given a novel video, we match its ST features to the codebook entries and cast votes for the positions of its AU centers. We then use the localization result to recognize these AUs. Experiments on a public dataset demonstrate that our method performs well.
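The codebook-and-voting procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_codebook`, `cast_votes`, `localize_and_recognize`), the Euclidean nearest-neighbor matching, the uniform vote weights, and the dense integer voting grid are all assumptions made for illustration, and the toy descriptors stand in for real ST features.

```python
import numpy as np

def build_codebook(descriptors, positions, au_centers, labels):
    """Store every training feature with its class label and its
    (x, y, t) offset to the center of the AU it belongs to."""
    offsets = np.asarray(au_centers, dtype=float) - np.asarray(positions, dtype=float)
    return {
        "descriptors": np.asarray(descriptors, dtype=float),
        "offsets": offsets,
        "labels": np.asarray(labels, dtype=int),
    }

def cast_votes(codebook, descriptors, positions, n_classes, grid_shape, k=1):
    """Match each test feature to its k nearest codebook entries (Euclidean
    distance, an assumption) and vote for the AU centers they predict."""
    votes = np.zeros((n_classes, *grid_shape))
    desc = np.asarray(descriptors, dtype=float)
    pos = np.asarray(positions, dtype=float)
    for d, p in zip(desc, pos):
        dist = np.linalg.norm(codebook["descriptors"] - d, axis=1)
        for idx in np.argsort(dist)[:k]:
            # Predicted center = feature position + stored relative offset.
            center = np.rint(p + codebook["offsets"][idx]).astype(int)
            inside = np.all(center >= 0) and np.all(center < np.asarray(grid_shape))
            if inside:
                label = codebook["labels"][idx]
                votes[label, center[0], center[1], center[2]] += 1.0 / k
    return votes

def localize_and_recognize(votes):
    """Take the strongest peak in the vote space: its class index is the
    recognized AU, its (x, y, t) cell is the localized center."""
    label, x, y, t = np.unravel_index(np.argmax(votes), votes.shape)
    return int(label), (int(x), int(y), int(t))

# Toy training data: two features of class 0 around center (3, 3, 3),
# one feature of class 1 around center (2, 2, 2).
codebook = build_codebook(
    descriptors=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    positions=[[2, 2, 2], [4, 4, 4], [1, 1, 1]],
    au_centers=[[3, 3, 3], [3, 3, 3], [2, 2, 2]],
    labels=[0, 0, 1],
)

# Toy test video: the same class-0 pattern shifted by (3, 3, 3).
votes = cast_votes(
    codebook,
    descriptors=[[1.0, 0.0], [0.0, 1.0]],
    positions=[[5, 5, 5], [7, 7, 7]],
    n_classes=2,
    grid_shape=(10, 10, 10),
)
label, center = localize_and_recognize(votes)
print(label, center)
```

In this toy case both test features vote for the same cell, so the peak gives the class and the center together. A practical system would likely smooth the vote space and weight votes by match quality, which this sketch omits.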