Skip to Main Content
Video processing and computer vision communities usually employ shot-based or object-based structural video models and associate low-level (color, texture, shape, and motion) and semantic descriptions (textual annotations) with these structural (syntactic) elements. Database and information retrieval communities, on the other hand, employ entity-relation or object-oriented models to model the semantics of multimedia documents. This paper proposes a new generic integrated semantic-syntactic video model to include all of these elements within a single framework to enable structured video search and browsing combining textual and low-level descriptors. The proposed model includes semantic entities (video objects and events) and the relations between them. We introduce a new "actor" entity to enable grouping of object roles in specific events. This context-dependent classification of attributes of an object allows for more efficient browsing and retrieval. The model also allows for decomposition of events into elementary motion units and elementary reaction/interaction units in order to access mid-level semantics and low-level video features. The instantiations of the model are expressed as graphs. Users can formulate flexible queries that can be translated into such graphs. Alternatively, users can input query graphs by editing an abstract model (model template). Search and retrieval is accomplished by matching the query graph with those instantiated models in the database. Examples and experimental results are provided to demonstrate the effectiveness of the proposed integrated modeling and querying framework.