Skip to Main Content
In this paper, an interactive framework for navigating video sequences is presented using an optimal content-based video decomposition scheme. In particular, each video sequence is analyzed at different content resolution levels, creating a hierarchy from the lowest (coarse) to the highest (fine) resolution. This content hierarchy is represented as a tree structure, each level of which corresponds to a particular content resolution, while the tree nodes indicate the temporal video segments that the sequence content is partitioned at a given resolution. A criterion is introduced to measure the efficiency of the proposed scheme in organizing the video visual content and to compare it with other hierarchical video content representations and navigation schemes. The efficiency is measured as the difficulty for a user to locate a video segment of interest, while moving through different levels of hierarchy. In our case, video is decomposed so that the best efficiency is accomplished. However, the efficiency of a nonlinear video decomposition scheme depends on: 1) the number of paths required for a user to locate a relevant video segment and 2) the number of shot/frame classes (i.e., content representatives) extracted to represent the visual content. Both issues are addressed in this paper. In the first case, the probability of selecting a relevant video segment in the first path is maximized by extracting optimal content representatives through a minimization of a cross-correlation criterion. For the minimization, a genetic algorithm (GA) is adopted, since application of an exhaustive search to obtain the minimum value is too large to be implemented. The cross-correlation criterion is evaluated on the feature domain by extracting appropriate global and object-based descriptors for each video frame so that a better representation of the visual content is achieved. The second aspect (e.g., the number of content representatives) is addressed by minimizing the average transmitted information and simultaneously taking into consideration the temporal video segment complexity. More content representatives are extracted for video segments of high complexity, whereas a low number is required for low-complexity segments. In addition, a degree of interest is assigned to each- video shot (or frame) to address the fact that, from the user's perception, the visual content of a set of shots (frames) satisfies his/her information needs. Finally, a computationally efficient algorithm is proposed to regulate the degree of detail (i.e., the number of shot/frames representatives) in case the visual content is not efficiently represented from the user's perceptive view. Experimental results on real-life video sequences indicate the performance of the proposed GA-based video decomposition scheme compared to other hierarchical video organization methods.