We define and present a new model for semantic video compression and searching, and apply it to several video genres, with special emphasis on instructional videos. In this model, a genre-dependent semantic distance is computed between adjacent frames in a dynamic video buffer of predetermined size, and redundant "unkey" frames leak from the buffer interior into a data structure, the "leakage history directed acyclic graph" (LHDAG), which records the relative importance of these frames. What exits the buffer forms a highly compressed video stream consisting of only the most semantically significant frames, whereas the LHDAG permits efficient semantic exploration of the video interior. This novel hierarchical but context-sensitive data structure permits searching the video at display rates proportional to visual significance, at levels of semantic density selectable by the user. The data structure is simple to create and query, and appears to be more psychologically plausible than more straightforward fixed-sampling indexing schemes. We empirically display and mathematically analyze the relationships among the video buffer's leaking rate, buffer size, and exit delay, and demonstrate the model's performance on several extended videos. The flexibility of the method is indicated by its demonstration on two very different definitions of semantic significance: color similarity and content ("ink pixel") similarity.
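The leaking-buffer idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each frame is reduced to a small feature vector, an L1 difference stands in for the genre-specific semantic distance, and the interior frame closest to one of its neighbors is the one that leaks whenever the buffer overflows. The exit path and leaking rate analyzed in the paper are not modeled. The names `leaky_buffer` and `l1_distance` and the layout of the leak record are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]  # assumption: a frame reduced to a feature vector


def l1_distance(a: Frame, b: Frame) -> float:
    """Stand-in semantic distance (assumption): L1 difference of features."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leaky_buffer(
    frames: Sequence[Frame],
    capacity: int,
    distance: Callable[[Frame, Frame], float] = l1_distance,
) -> Tuple[List[int], Dict[int, Tuple[int, int, int]]]:
    """Return (surviving frame indices, leak record).

    The leak record maps each leaked frame index to
    (left neighbor, right neighbor, leak order). Edges from a leaked frame
    to the neighbors that outlived it form a directed acyclic graph,
    because a frame can only point at frames that leak later or survive.
    """
    if capacity < 2:
        raise ValueError("capacity must be at least 2 to have an interior")

    buffer: List[int] = []  # indices of frames currently held
    leaks: Dict[int, Tuple[int, int, int]] = {}

    def redundancy(pos: int) -> float:
        # Distance from the frame at `pos` to its nearer adjacent frame.
        here = frames[buffer[pos]]
        return min(
            distance(frames[buffer[pos - 1]], here),
            distance(here, frames[buffer[pos + 1]]),
        )

    for idx in range(len(frames)):
        buffer.append(idx)
        if len(buffer) <= capacity:
            continue
        # Leak the most redundant interior frame; the two ends never leak.
        pos = min(range(1, len(buffer) - 1), key=redundancy)
        leaked = buffer.pop(pos)
        leaks[leaked] = (buffer[pos - 1], buffer[pos], len(leaks))

    return buffer, leaks


if __name__ == "__main__":
    toy = [[0], [1], [50], [52], [53]]  # made-up one-dimensional features
    keys, history = leaky_buffer(toy, capacity=3)
    print(keys, history)
```

In this sketch, frames that leak early are the most redundant ones, so the leak order gives a rough importance ranking: replaying only frames with a late leak order, plus the survivors, yields a sparser summary, while including earlier leaks yields a denser one.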
Date of Conference: 2001