Automatic human action recognition remains a challenging problem in machine vision. Some high-level features such as SIFT perform well for action recognition but are computationally expensive. To address this problem, we construct features based on the Distance Transform of body contours, which is relatively simple and computationally efficient, to represent human actions in video. After extracting the features from videos, we adopt a Conditional Random Field to model the temporal action sequences. The proposed method is tested on an available standard dataset. We also evaluate the robustness of our method under various realistic conditions, such as body occlusion or intersection.
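To make the feature idea concrete, the following is a minimal sketch of a distance transform computed from a body contour. It is not the authors' implementation: the abstract does not state the distance metric, the contour extraction method, or how the per-frame features are assembled, so the city-block metric, the 4-neighbour contour rule, and the names `contour_mask`, `distance_transform`, and `frame_feature` are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[int]]
INF = 10**9  # stand-in for "no contour pixel reached yet"


def contour_mask(silhouette: Grid) -> Grid:
    """Mark foreground pixels that touch background or the image border.

    Assumption: a binary silhouette (1 = body, 0 = background) is already
    available, e.g. from background subtraction.
    """
    h, w = len(silhouette), len(silhouette[0])
    contour = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not silhouette[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not silhouette[ny][nx]:
                    contour[y][x] = 1
                    break
    return contour


def distance_transform(contour: Grid) -> Grid:
    """Two-pass city-block distance from each pixel to the nearest contour pixel."""
    h, w = len(contour), len(contour[0])
    dist = [[0 if contour[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def frame_feature(silhouette: Grid) -> List[int]:
    """Flatten the distance map into one feature vector per frame (assumed layout)."""
    dist = distance_transform(contour_mask(silhouette))
    return [d for row in dist for d in row]
```

In a pipeline like the one described, one such vector per frame would form the observation sequence passed to the temporal model. The two-pass scheme visits each pixel twice, which is consistent with the efficiency argument made for contour-based features.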