In this paper we investigate the semi-automatic construction of multimedia ontologies using a data-driven approach. We start with a collection of videos for which we wish to build an ontology (an explicit specification of a domain). Each video is pre-processed: scene cut detection, automatic speech recognition (ASR), and metadata extraction are performed. In addition, we automatically index the videos based on visual content by extracting syntactic features (e.g., color, texture) and semantic features (e.g., face, landscape). We then combine standard tools for ontology engineering with tools for content-based retrieval to semi-automatically build ontologies. In the first stage we process the text information available with the videos (ASR, metadata, and annotations, if any). Stop words (e.g., a, on, the) are eliminated, and statistics (e.g., frequency, TF-IDF, and entropy) are computed for all terms. Based on these data we manually select concepts and relationships to include in the ontology. We then use content-based retrieval tools to assign multimedia entities (e.g., shots, videos, collections of videos) to concepts, properties, or relationships in the ontology, and to select multimedia entities as concepts, relationships, or properties in the ontology. We apply this methodology to construct multimedia ontologies from 24 hours of educational films from the 1940s-1960s used in the TREC video retrieval benchmark, and we discuss the problems encountered and future directions.
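The text-processing stage described above (stop-word removal followed by per-term frequency, TF-IDF, and entropy statistics) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the toy transcripts, the stop-word list, and the exact statistic definitions (raw document frequency for IDF, Shannon entropy of a term's distribution over documents) are assumptions for the sake of the example.

```python
import math
from collections import Counter

# Hypothetical ASR transcripts, one per video (stand-ins for real data).
docs = [
    "the atom is the smallest unit of matter",
    "energy flows from the sun to the plants",
    "the plants convert energy into matter",
]

# A tiny illustrative stop-word list; a real system would use a larger one.
STOP_WORDS = {"a", "an", "on", "the", "is", "of", "to", "from", "into"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

tokenized = [tokenize(d) for d in docs]
N = len(tokenized)

# Collection-wide term frequency.
freq = Counter(t for doc in tokenized for t in doc)

# Document frequency, then IDF = log(N / df).
df = Counter()
for doc in tokenized:
    df.update(set(doc))
idf = {t: math.log(N / df[t]) for t in df}

def entropy(term):
    """Shannon entropy of the term's count distribution over documents:
    terms spread evenly across videos score high; terms concentrated
    in a single video score low."""
    counts = [doc.count(term) for doc in tokenized]
    total = sum(counts)
    probs = [c / total for c in counts if c]
    return -sum(p * math.log2(p) for p in probs)

for term, n in freq.most_common(3):
    print(term, n, round(idf[term], 3), round(entropy(term), 3))
```

Statistics like these give an ontology engineer a ranked view of candidate terms: high-frequency, low-entropy terms tend to characterize specific videos, while high-entropy terms are spread across the collection.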