This paper addresses the problem of recognizing human group activities in surveillance videos. The task has great practical potential, but it has rarely been studied because of the lack of benchmark databases and the difficulties caused by large intra-class variations. Our contributions are two-fold. First, we propose to encode group activities with three types of localized causalities, namely self-causality, pair-causality, and group-causality, which characterize the local interaction/reasoning relations within, between, and among the motion trajectories of different humans, respectively. Each type of causality is expressed as a specific digital filter, whose frequency responses constitute the feature representation space. Each video clip of a given group activity is then encoded as a bag of localized causalities/filters. Second, we collect a human group-activity video database, which covers six popular group activity categories with about 80 video clips each on average, captured in five different sessions with varying numbers of participants. Extensive experiments on this database with our proposed features and different classifiers show promising results on this challenging task.
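To make the idea of "a causality expressed as a digital filter" concrete, the following is a minimal sketch, not the method of this paper. It assumes, purely for illustration, that self-causality of a single 1-D trajectory coordinate can be modeled as an autoregressive (all-pole) filter fitted by least squares, and that the magnitude of its frequency response at a few sampled frequencies serves as the feature vector. The function names `fit_ar` and `ar_magnitude_response`, the model order, and the number of frequency samples are all hypothetical choices.

```python
import numpy as np

def fit_ar(x, order=2):
    """Least-squares fit of x[t] ~ sum_k a[k] * x[t-1-k] (assumed AR model)."""
    x = np.asarray(x, dtype=float)
    # Each row holds the `order` samples preceding the target x[t], most recent first.
    X = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def ar_magnitude_response(a, n_freqs=8):
    """|H(e^{jw})| for H(z) = 1 / (1 - sum_k a[k] z^-(k+1)), w sampled in [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freqs)
    k = np.arange(1, len(a) + 1)
    denom = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return 1.0 / np.abs(denom)

# Toy trajectory coordinate that decays geometrically, so x[t] = 0.5 * x[t-1].
trajectory = 0.5 ** np.arange(20)
coeffs = fit_ar(trajectory, order=1)
feature = ar_magnitude_response(coeffs, n_freqs=3)
```

Under these assumptions, a pair-causality filter could be sketched analogously by regressing one trajectory on the past samples of another, and a clip-level descriptor could be formed by quantizing many such local response vectors into a histogram, in the spirit of a bag-of-features encoding.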