Skip to Main Content
Automatic categorization of human actions in the real world is very challenging due to the great intra-class differences. In this paper, we present a new method for robust recognition of human actions. We first cluster each video in the training set into temporal semantic segments by a dense descriptor. Each segment in the training set is represented by a concatenated histogram of sparse and dense descriptors. These histograms of segments are used to train a classifier. In the recognition stage, a query video is also divided into temporal semantic segments by clustering. Each segment will obtain a confidence evaluated by the trained classifier. Combining the confidence of each segment, we classify this query video. To evaluate our approach, we perform experiments on two challenging datasets, i.e., the Olympic Sports Dataset (OSD) and Hollywood Human Action dataset (HOHA). We also test our method on the benchmark KTH human action dataset. Experimental results confirm that our algorithm performs better than the state-of-the-art methods.