Understanding video events, i.e., translating low-level content in video sequences into high-level semantic concepts, is a research topic that has received much interest in recent years. Important applications of this work include smart surveillance systems, semantic video database indexing, and interactive systems. The technology can be applied to several video domains, including airport terminals, parking lots, traffic, subway stations, aerial surveillance, and sign language data. In this paper, we identify the two main components of the event understanding process: abstraction and event modeling. Abstraction is the process of molding the data into informative units to be used as input to the event model. Due to space restrictions, we limit our discussion of abstraction; see the study by Lavee et al. (Understanding video events: A survey of methods for automatic interpretation of semantic occurrences in video, Technion-Israel Inst. Technol., Haifa, Israel, Tech. Rep. CIS-2009-06, 2009) for a more complete treatment. Event modeling is devoted to describing events of interest formally and enabling recognition of these events as they occur in the video sequence. Event modeling can be further decomposed into three categories: pattern-recognition methods, state event models, and semantic event models. In this survey, we present this proposed taxonomy of the literature, offer a unifying terminology, and discuss popular event modeling formalisms (e.g., the hidden Markov model) and their use in video event understanding, using extensive examples from the literature. Finally, we consider the application domain of video event understanding in light of the proposed taxonomy and propose future directions for research in this field.