Presents a method for accurately identifying human behaviours for content-based retrieval using both audio and video information. Conventional content-based retrieval identifies target events by analyzing positional information about objects extracted from video, such as loci, relative positions, and their transitions. However, methods that rely on indices obtained only from video inevitably miss important time points and positions, because occlusion causes tracking errors that lead to recognition failures and oversights of target events. Our approach combines audio information with conventional video methods in an integrated reasoning module that can recognize events conventional methods cannot. Based on the proposed method, we implemented a content-based retrieval system that identifies several actions in real tennis video. A player's basic actions, such as a forehand swing or an overhead swing, are identified using information about the court and net lines, the players' positions, the ball positions, and the moments when the players hit the ball, which are called "impact points". Simulation results show that the impact-point detection rate affects the recognition rate for the players' basic actions, and that using audio information avoids some of these recognition problems.
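The abstract does not specify how impact points are detected from audio or how basic actions are classified; the following is a minimal illustrative sketch, not the authors' method. It assumes a simple short-time-energy peak detector for racket hits (hits are brief, loud bursts) and a toy geometric rule that labels a swing from the ball's position relative to the player's centroid at the impact time. All function names, thresholds, and the image-coordinate convention (smaller y = higher in the frame) are hypothetical.

```python
import numpy as np

def detect_impact_points(audio, sr, window=1024, ratio=10.0):
    # Hypothetical impact-point detector: compute short-time energy per
    # window and flag windows whose energy greatly exceeds the median
    # background level. Returns candidate impact times in seconds.
    n = len(audio) // window
    energy = np.array([np.sum(audio[i * window:(i + 1) * window] ** 2)
                       for i in range(n)])
    background = np.median(energy) + 1e-12  # robust to the peaks themselves
    peaks = np.where(energy > ratio * background)[0]
    return peaks * window / sr

def classify_swing(ball_xy, player_xy, handedness="right"):
    # Toy rule: call it an overhead swing if the ball is well above the
    # player's centroid at impact; otherwise forehand/backhand depending
    # on which side of the player the ball is on (assumed handedness).
    dx = ball_xy[0] - player_xy[0]
    dy = ball_xy[1] - player_xy[1]   # image coords: smaller y = higher
    if dy < -80:                      # ball far above the player (pixels)
        return "overhead swing"
    forehand_side = dx > 0 if handedness == "right" else dx < 0
    return "forehand swing" if forehand_side else "backhand swing"
```

In the spirit of the paper, the audio-derived impact times would be intersected with the video-derived player and ball tracks: at each detected impact time, the tracker's ball and player positions feed a rule (or learned classifier) like `classify_swing`, so an action can still be recognized even when pure video tracking momentarily fails around the hit.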