Skip to Main Content
Sports video semantic event detection is essential for sports video summarization and retrieval. Extensive research efforts have been devoted to this area in recent years. However, the existing sports video event detection approaches heavily rely on either video content itself, which face the difficulty of high-level semantic information extraction from video content using computer vision and image processing techniques, or manually generated video ontology, which is domain specific and difficult to be automatically aligned with the video content. In this paper, we present a novel approach for sports video semantic event detection based on analysis and alignment of Webcast text and broadcast video. Webcast text is a text broadcast channel for sports game which is co-produced with the broadcast video and is easily obtained from the Web. We first analyze Webcast text to cluster and detect text events in an unsupervised way using probabilistic latent semantic analysis (pLSA). Based on the detected text event and video structure analysis, we employ a conditional random field model (CRFM) to align text event and video event by detecting event moment and event boundary in the video. Incorporation of Webcast text into sports video analysis significantly facilitates sports video semantic event detection. We conducted experiments on 33 hours of soccer and basketball games for Webcast analysis, broadcast video analysis and text/video semantic alignment. The results are encouraging and compared with the manually labeled ground truth.