I. Introduction
With the ever-increasing amount of multimedia information available on the Internet and in digital video libraries, efficient video analysis, summarization, indexing, and retrieval are urgently needed. Current video content management systems support retrieval using low-level audio-visual features. However, the semantic gap poses a challenging problem for the management and use of such systems. The semantic gap arises from the mismatch between low-level media content descriptors and the high-level interpretation of human viewers: the content descriptions extracted from videos are too shallow compared with the meaning that users expect to capture [1]. To meet the various needs of different users during video retrieval, it is popular to segment video sequences into semantic types [1], [4] and to mine the vast relationships and concepts at various granularities from video data [1], [2].