In this paper, we present a novel human activity recognition approach that requires only a single video example per activity. We introduce the paradigm of active video composition, which enables one-example recognition of complex activities. The idea is to automatically create a large number of semi-artificial training videos, called composed videos, by manipulating an original human activity video. We describe a methodology for automatically composing activity videos with different backgrounds, translations, scales, actors, and movement structures. Furthermore, we design an active learning algorithm that models the temporal structure of the human activity, preventing the generation of composed training videos that violate the activity's structural constraints. The goal is to generate composed videos with a correct structure and use them to train the recognition system. In contrast to previous passive recognition systems, which rely only on the given training videos, our methodology actively composes the training videos that the system is expected to observe in its environment. Experimental results show that a single fully labeled video per activity is sufficient for our methodology to reliably recognize human activities using composed training videos.
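The abstract does not specify how a composed video is produced. As a rough illustration only, the sketch below assumes the actor has already been segmented into a per-frame binary mask and pastes the masked actor onto a new background with a pixel translation and a nearest-neighbour scale, then enumerates every (background, shift, scale) combination. The names `compose_frame`, `compose_video`, and `compose_many` are hypothetical. The sketch is not the authors' method: it does not model actor replacement, changes in movement structure, or the active-learning check on temporal structure.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Frame = List[List[int]]   # grayscale frame indexed as frame[y][x]
Mask = List[List[bool]]   # hypothetical segmentation: True where the actor is


def compose_frame(fg: Frame, mask: Mask, bg: Frame,
                  dx: int, dy: int, scale: float) -> Frame:
    """Paste the masked actor pixels of `fg` onto a copy of `bg`.

    The actor is scaled by `scale` (nearest neighbour, about the top-left
    corner) and then shifted by (dx, dy) pixels.
    """
    out = [row[:] for row in bg]          # never modify the background in place
    fh, fw = len(fg), len(fg[0])
    for y in range(len(bg)):
        for x in range(len(bg[0])):
            # Inverse mapping: find the source pixel that lands on (x, y).
            sx = math.floor((x - dx) / scale)
            sy = math.floor((y - dy) / scale)
            if 0 <= sx < fw and 0 <= sy < fh and mask[sy][sx]:
                out[y][x] = fg[sy][sx]
    return out


def compose_video(fg_frames: Sequence[Frame], masks: Sequence[Mask], bg: Frame,
                  dx: int, dy: int, scale: float) -> List[Frame]:
    """Apply the same background, translation and scale to every frame."""
    return [compose_frame(f, m, bg, dx, dy, scale)
            for f, m in zip(fg_frames, masks)]


def compose_many(fg_frames: Sequence[Frame], masks: Sequence[Mask],
                 backgrounds: Sequence[Frame],
                 shifts: Sequence[Tuple[int, int]],
                 scales: Sequence[float]) -> List[List[Frame]]:
    """Return one composed video per (background, shift, scale) combination."""
    return [compose_video(fg_frames, masks, bg, dx, dy, s)
            for bg, (dx, dy), s in itertools.product(backgrounds, shifts, scales)]


if __name__ == "__main__":
    actor = [[9, 9], [9, 9]]
    mask = [[True, False], [False, False]]
    bg = [[0] * 4 for _ in range(4)]
    videos = compose_many([actor], [mask], [bg], [(0, 0), (1, 1)], [1.0, 2.0])
    print(len(videos))  # 4
```

Enumerating the Cartesian product shows why a single example can yield many training videos: the count grows multiplicatively with each varied factor. A real system would also need to reject compositions that break the activity's temporal structure, which this sketch does not attempt.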