Skip to Main Content
Action recognition with cluttered and moving background is a challenging problem. One main difficulty lies in the fact that the motion field in an action region is contaminated by the background motions. We propose a hierarchical filtered motion (HFM) method to recognize actions in crowded videos by the use of motion history image (MHI) as basic representations of motion because of its robustness and efficiency. First, we detect interest points as the two-dimensional Harris corners with recent motion, e.g., locations with high intensities in the MHI. Then, a global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate isolated unreliable or noisy motions. At each interest point, a local motion field filter is applied to the smoothed gradients of the MHI by computing structure proximity between any pixel in the local region and the interest point. Thus, the motion at a pixel is enhanced or weakened based on its structure proximity with the interest point. To validate its effectiveness, we characterize the spatial and temporal features by histograms of oriented gradient in the intensity image and the MHI, respectively, and use a Gaussian-mixture-model-based classifier for action recognition. The performance of the proposed approach achieves the state-of-the-art results on the KTH dataset that has clean background. More importantly, we perform cross-dataset action classification and detection experiments, where the KTH dataset is used for training, while the microsoft research (MSR) action dataset II that consists of crowded videos with people moving in the background is used for testing. Our experiments show that the proposed HFM method significantly outperforms existing techniques.