Skip to Main Content
This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones. We present several unsupervised methods that learn these scene elements and present results that show the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.