In this paper, we present an approach to detecting semantic events in broadcast sports video through collaborative multimedia analysis, which we call intermodal collaboration. Broadcast video can be viewed as a set of multimodal streams: visual, auditory, and textual (closed caption: CC). By exploiting the temporal dependency among these streams, we aim to improve both the reliability and the efficiency of event detection. The method consists of three procedural stages: CC stream analysis, auditory stream analysis, and visual stream analysis. From the CC stream, we learn keywords that frequently appear in connection with the target event, and from the auditory stream, we learn feature parameters characterizing cheering and shouting. Experimental results on broadcast video of American football games indicate that our approach is effective for event detection.
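As an illustration only (not the authors' implementation), the intermodal idea of cross-checking CC keyword hits against crowd noise in the audio track could be sketched as follows; the keyword set, timestamps, and `audio_score` values are hypothetical placeholders:

```python
# Sketch: flag a candidate event when a caption contains a learned
# event keyword AND the audio "cheering" score peaks near that time.
# All data below is illustrative, not from the paper's experiments.

EVENT_KEYWORDS = {"touchdown", "interception", "fumble"}  # hypothetical learned keywords

def detect_events(cc_entries, audio_score, threshold=0.7, window=5):
    """cc_entries: list of (time_sec, caption_text);
    audio_score: dict mapping time_sec -> cheering level in [0, 1]."""
    events = []
    for t, text in cc_entries:
        words = {w.strip(".,!?").lower() for w in text.split()}
        if words & EVENT_KEYWORDS:
            # Intermodal confirmation: require loud crowd reaction
            # within +/- window seconds of the caption timestamp.
            peak = max(audio_score.get(t + d, 0.0)
                       for d in range(-window, window + 1))
            if peak >= threshold:
                events.append(t)
    return events

cc = [(10, "What a touchdown!"), (40, "They head to the sideline.")]
audio = {9: 0.2, 11: 0.9, 40: 0.1}
print(detect_events(cc, audio))  # -> [10]
```

Only the keyword hit that coincides with strong crowd noise is kept, which reflects how combining modalities can filter out spurious detections that either stream alone would produce.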