Skip to Main Content
In this paper, we propose a graph theoretic technique for recognizing human actions at a distance in a video by modeling the visual senses associated with poses. The proposed methodology follows a bag-of-word approach that starts with a large vocabulary of poses (visual words) and derives a refined and compact codebook of key poses using centrality measure of graph connectivity. We introduce a “meaningful” threshold on centrality measure that selects key poses for each action type. Our contribution includes a novel pose descriptor based on histogram of oriented optical flow evaluated in a hierarchical fashion on a video frame. This pose descriptor combines both pose information and motion pattern of the human performer into a multidimensional feature vector. We evaluate our methodology on four standard activity-recognition datasets demonstrating the superiority of our method over the state-of-the-art.