Skip to Main Content
Because actions can be small video objects, it is a challenging problem to search for similar actions in crowded and dynamic scenes when a single query example is provided. We propose a fast action search method that can efficiently locate similar actions spatiotemporally. Both the query action and the video datasets are characterized by spatio-temporal interest points. Instead of using a unified visual vocabulary to index all interest points in the database, we propose randomized visual vocabularies to enable fast and robust interest point matching. To accelerate action localization, we have developed a coarse-to-fine video subvolume search scheme, which is several orders of magnitude faster than the existing spatio-temporal branch and bound search. Our experiments on cross-dataset action search show promising results when compared with the state of the arts. Additional experiments on a 5-h versatile video dataset validate the efficiency of our method, where an action search can be finished in just 37.6 s on a regular desktop machine.