In this paper, we propose a new two-level framework that analyzes the high-level structure of video and automatically detects useful events based on visual keywords. The first level extracts low-level features such as motion, color, and texture to detect video segment boundaries and to label each segment with a visual keyword. At the second level, we apply an event detection grammar to the visual keyword sequence to detect video segments that match a predefined event model. The exact position of the segment in which the event occurs can also be located. We applied the proposed approach to the detection of goal and corner-kick events in portions of four FIFA World Cup 2002 soccer videos (1666 segments), achieving more than 80% accuracy.
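As a rough illustration of the second level, the sketch below treats each segment's visual keyword as a single symbol and matches an event pattern against the resulting sequence. The keyword names, the one-letter alphabet, and the goal pattern are all hypothetical and are not taken from the paper; a regular expression stands in for the event detection grammar, whose actual form is not given here.

```python
import re

# Hypothetical visual-keyword alphabet (an assumption, not the paper's keyword set).
KEYWORDS = {
    "far_view": "F",
    "goal_mouth": "G",
    "close_up": "C",
    "audience": "A",
    "replay": "R",
}

# Hypothetical event model: a goal-mouth segment, then one or more
# close-up or audience segments, then one or more replay segments.
EVENT_PATTERNS = {"goal": re.compile(r"G[CA]+R+")}


def detect_events(labels):
    """Return (event, first_segment, last_segment) for each pattern match."""
    # One symbol per segment, so string offsets equal segment indices.
    symbols = "".join(KEYWORDS[label] for label in labels)
    events = []
    for name, pattern in EVENT_PATTERNS.items():
        for match in pattern.finditer(symbols):
            events.append((name, match.start(), match.end() - 1))
    return events


labels = ["far_view", "far_view", "goal_mouth", "close_up",
          "audience", "replay", "far_view"]
print(detect_events(labels))  # [('goal', 2, 5)]
```

Because every segment maps to exactly one symbol, the match offsets give the positions of the segments in which the event occurs, mirroring the localization step described above.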
Published in:
Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and Fourth Pacific Rim Conference on Multimedia (Volume: 3)
Date of Conference: 15-18 Dec. 2003