We describe efficient audio/visual features and their multimodal combination to detect highlights in soccer video. A novel audio feature first detects dominant speech portions in the commentary that coincide with segments of high excitement in the game. Verification is then performed in the visual domain by detecting the presence of the goal-mouth in the current shot and a high frequency of camera shot changes in the subsequent shots. This cascaded process filters spurious candidate highlights out of the noisy audio. The impressive results obtained on a large video test set belie the technical simplicity of the system, which may now enable rapid generation of highlights on low-cost devices such as household set-top boxes.
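As an illustration only, the audio-then-visual cascade described above might be sketched as follows in Python. The `Shot` type, the `has_goal_mouth` flag, and the `window_s` and `min_cuts` thresholds are assumptions introduced for this sketch and are not taken from the paper. The audio stage is represented only by a list of candidate timestamps, and the goal-mouth detector is assumed to have already run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    start: float          # shot start time in seconds
    end: float            # shot end time in seconds
    has_goal_mouth: bool  # hypothetical output of a visual goal-mouth detector


def find_shot(shots: List[Shot], t: float) -> Optional[int]:
    """Return the index of the shot containing time t, or None."""
    for i, shot in enumerate(shots):
        if shot.start <= t < shot.end:
            return i
    return None


def count_cuts_after(shots: List[Shot], i: int, window_s: float) -> int:
    """Count shots that begin within window_s seconds after shot i ends.

    Each such shot start corresponds to one camera shot change.
    """
    limit = shots[i].end + window_s
    return sum(1 for shot in shots[i + 1:] if shot.start < limit)


def verify_candidates(
    audio_candidates: List[float],
    shots: List[Shot],
    window_s: float = 20.0,  # assumed look-ahead window, not from the paper
    min_cuts: int = 3,       # assumed shot-change threshold, not from the paper
) -> List[float]:
    """Keep only audio candidates confirmed by both visual cues."""
    highlights = []
    for t in audio_candidates:
        i = find_shot(shots, t)
        if i is None:
            continue  # candidate falls outside every known shot
        if not shots[i].has_goal_mouth:
            continue  # visual check 1: goal-mouth in the current shot
        if count_cuts_after(shots, i, window_s) >= min_cuts:
            highlights.append(t)  # visual check 2: rapid shot changes follow
    return highlights
```

Running the cheap audio stage first and applying the visual checks only to its candidates keeps the per-frame visual work small, which fits the low-cost-device motivation stated above.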