This paper presents a novel approach that locates action objects in video and recognizes their action types simultaneously using an associative memory model. A preprocessing procedure extracts key-frames from a video sequence to provide a compact representation of the video. Each training key-frame is partitioned into multiple overlapping patches, from which image and motion features are extracted to generate an appearance-motion codebook. The training procedure also constructs a two-directional associative memory from the learned codebook, enabling the system to detect and recognize video action events using salient fragments, i.e., patch groups that share common motion vectors. Our approach adopts the recently developed Hough voting model as a framework for human action learning and memory. For each key-frame, the Hough voting framework employs the Generalized Hough Transform (GHT), which constructs a graphical structure from the key-frame codewords to learn the mapping between action objects and a Hough space. To determine which patches explicitly represent an action object, the system detects salient fragments whose member patches are used to query the associative memory and retrieve matched patches from the Hough model. These model patches are then used to locate the target action object and classify the action type simultaneously via a probabilistic Hough voting scheme. Results show that the proposed method performs well on several publicly available datasets in terms of detection accuracy and recognition rate.
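The core localization step described above can be illustrated with a minimal sketch of probabilistic Hough voting: matched patches cast weighted votes for a hypothesized object center, and the accumulator peak gives the detected location. The patch format, offsets, and weights below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def hough_vote(patches, frame_shape):
    """Accumulate weighted object-center votes from matched patches.

    patches: list of (x, y, dx, dy, weight) tuples, where (x, y) is the
             patch location, (dx, dy) a learned offset to the object
             center, and weight the match confidence (all hypothetical).
    frame_shape: (height, width) of the key-frame.
    Returns the Hough accumulator and the most-voted center (x, y).
    """
    h, w = frame_shape
    acc = np.zeros((h, w))
    for x, y, dx, dy, weight in patches:
        cx, cy = x + dx, y + dy            # hypothesized object center
        if 0 <= cx < w and 0 <= cy < h:
            acc[cy, cx] += weight          # cast a weighted vote
    cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, (cx, cy)

# Three patches agreeing on center (50, 40), plus one outlier vote.
patches = [
    (30, 30, 20, 10, 1.0),
    (70, 50, -20, -10, 1.0),
    (50, 20, 0, 20, 0.8),
    (10, 10, 5, 5, 0.3),   # outlier vote at (15, 15)
]
acc, center = hough_vote(patches, (100, 100))
print(center)  # strongest peak at (50, 40)
```

In the full method the accumulator would be extended with an action-class dimension, so the same voting pass yields both the object location and the action label; this sketch keeps only the spatial part.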