In this paper, we propose a framework that fuses global and local features for action recognition in video sequences. Combining multiple features is important because a single-feature method cannot capture both imaging variations (illumination changes, viewpoint orientation, etc.) and attributes of individuals (age, size, etc.). Hence, we use two types of features: i) quantized local spatio-temporal (ST) volumes (or cuboids), and ii) quantized global features, which aim to capture the shape deformation of the actor by treating actions as 3D objects (x, y, t). For the ST features, we extract 100 interest cuboids from each video and use the k-means algorithm to generate codebooks of sizes 200 and 2,000. For the global features, we uniformly sample interest points from each action volume and apply k-means again to quantize the resulting feature vectors. Finally, all classification experiments are carried out using a K-Nearest Neighbor (KNN) classifier. The performance of the proposed framework is tested on a publicly available dataset. The results demonstrate that fusing multiple features improves performance and allows recognition of meaningful daily-life actions.
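The quantize-then-classify pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, the descriptors are plain lists of floats, distances are assumed to be Euclidean, and fusion is assumed to be simple concatenation of the local and global histograms, since the abstract does not specify the fusion rule. Codebook sizes here would be 200 or 2,000 in the setting described, but any value of k works.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vec, centers):
    """Index of the center closest to vec (first one wins on ties)."""
    return min(range(len(centers)), key=lambda j: sq_dist(vec, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centers that form the codebook."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_index(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fuse(local_hist, global_hist):
    """ASSUMPTION: fusion by concatenation; the source does not state the rule."""
    return list(local_hist) + list(global_hist)


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors nearest to query."""
    order = sorted(range(len(train_vecs)), key=lambda i: sq_dist(query, train_vecs[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

In use, one would build a local codebook with `kmeans` over cuboid descriptors pooled from the training videos and a global codebook over descriptors sampled from the action volumes, then convert each video to `fuse(bow_histogram(local, local_codebook), bow_histogram(glob, global_codebook))` and pass the fused vectors to `knn_predict`.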