Skip to Main Content
Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of key-frames is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the key-frames of each shot and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV-series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good tradeoff between recall and precision.