I. Introduction
Representing a surveillance video as a long sequence of consecutive frames limits fast content-based search, retrieval, navigation, and storage. It is therefore important to partition the video into segments that are homogeneous in content space and then to describe each segment by a small but sufficient number of frames, which yields a video summary [1] [2]. Efficient video summarization thus requires a method for selecting the most significant frames.