Spatio-temporal interest points (STIPs) have been widely used for human action recognition. However, the performance of STIP-based methods is still limited on realistic datasets, which often include large variations in illumination, viewpoint, and camera motion. One reason for the low performance is that STIPs reflect only local changes in a video, which is not enough to obtain stable, informative features for representing actions in realistic scenes. To tackle this problem, we propose an approach that selects "stable STIPs" using the spatio-temporal distribution of STIPs in a neighboring region. A bag-of-words (BoW) feature is then constructed from the selected points to represent actions. Experimental results on the KTH dataset and HMDB (the largest realistic human action dataset) demonstrate that the proposed approach clearly improves recognition rates on realistic data.
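
The two-stage idea above (filter STIPs by their neighborhood, then build a BoW histogram from the survivors) can be illustrated with a minimal sketch. The abstract does not state the actual stability criterion, so the rule used here, keeping a point only if enough other STIPs fall inside a spatio-temporal box around it, is an assumption for illustration and not necessarily the authors' method. The function names, the radii, the `min_neighbors` threshold, and the pre-computed `codebook` are all hypothetical.

```python
import numpy as np


def select_stable_stips(points, radius_xy=10.0, radius_t=5.0, min_neighbors=3):
    """Return a boolean mask marking STIPs with enough nearby STIPs.

    points: (N, 3) array of (x, y, t) locations.
    ASSUMPTION: "stable" means at least `min_neighbors` other points lie
    within a box of half-width radius_xy (space) and radius_t (time).
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 3)
    # Pairwise absolute differences along each axis, shape (N, N, 3).
    diff = np.abs(pts[:, None, :] - pts[None, :, :])
    near = (
        (diff[..., 0] <= radius_xy)
        & (diff[..., 1] <= radius_xy)
        & (diff[..., 2] <= radius_t)
    )
    # Subtract 1 so a point does not count itself as a neighbor.
    counts = near.sum(axis=1) - 1
    return counts >= min_neighbors


def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of nearest-codeword assignments.

    descriptors: (M, D) array; codebook: (K, D) array (hypothetical,
    e.g. obtained beforehand by clustering training descriptors).
    """
    cb = np.asarray(codebook, dtype=float)
    desc = np.asarray(descriptors, dtype=float)
    if desc.size == 0:
        return np.zeros(len(cb))
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((desc[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(cb)).astype(float)
    return hist / hist.sum()
```

In use, the mask from `select_stable_stips` would index the descriptor array before calling `bow_histogram`, so that isolated detections (for example, those caused by camera shake or background clutter) do not contribute to the action representation. The pairwise comparison is quadratic in the number of points; a spatial index would be a natural replacement for long videos.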