Processing of image sequences has progressed from the simple structure-from-motion paradigm to the recognition of actions and interactions as events. Understanding human activities in video has many potential applications, including automated surveillance, video archival and retrieval, medical diagnosis, sports analysis, and human-computer interaction. It involves several steps of low-level vision processing, such as segmentation, tracking, pose recovery, and trajectory estimation, as well as high-level processing tasks such as body modeling and representation of action. While low-level processing has been actively studied, high-level processing is just beginning to receive attention. This is partly because high-level processing depends on the results of low-level processing. However, high-level processing also requires its own independent approaches and methodologies. We focus on the following aspects of high-level processing: (1) human body modeling, (2) the level of detail needed to understand human actions, (3) approaches to human action recognition, and (4) high-level recognition schemes with domain knowledge. The review is illustrated by examples from each of the areas discussed, including recent developments in our work on understanding human activities.