We present a compressed-domain video object segmentation method for MPEG-encoded video sequences. At a fraction of the cost of raw-domain analysis, compressed-domain segmentation provides essential a priori information for many vision tasks, from surveillance to transcoding, that require fast processing of large volumes of data but do not require pixel-resolution boundary extraction. Our method generates accurate segmentation maps at block resolution over hierarchically varying object levels, which enables an application to select the most pertinent partition of an image. It exploits the block structure of the compressed video to minimize the amount of data to be processed. All the motion flow available within a group of pictures (GOP) is projected onto a single layer, which also contains the frequency decomposition of the color pattern. Then, starting from the blocks with small spatial energy, the method expands homogeneous regions while automatically adapting the local similarity criteria. We also formulate an alternative solution that applies kernel-based clustering (mean shift), in which separate spatial, transform, and motion kernels establish the affinity between blocks. We show that both region expansion and mean shift produce results similar to those of the computationally expensive raw-domain segmentation. Finally, binary clustering iteratively merges the most similar regions to generate a hierarchical partition tree.
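The region-expansion and hierarchical-merging steps described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes that each block's features (for example DC color terms, an AC-energy summary, and GOP-accumulated motion) have already been extracted into a grid. The helper names `grow_regions` and `merge_hierarchy`, the parameters `base_tau` and `alpha`, the adaptive threshold rule, 4-connectivity, and the use of Euclidean distance between region means as the similarity measure are all assumptions. The mean-shift alternative is not shown.

```python
from collections import deque
import math

# Hypothetical sketch: block features are assumed to be precomputed from the
# compressed stream (e.g. DC color, AC energy, GOP-accumulated motion vectors).


def dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grow_regions(features, energy, base_tau=1.0, alpha=0.5):
    """Label a block grid by region growing, seeding from low-energy blocks.

    features[r][c] is a block's feature vector and energy[r][c] its spatial
    (AC) energy. Returns (labels, number_of_regions).
    """
    rows, cols = len(features), len(features[0])
    labels = [[-1] * cols for _ in range(rows)]
    # Visit candidate seeds in ascending spatial-energy order.
    seeds = sorted((energy[r][c], r, c) for r in range(rows) for c in range(cols))
    next_label = 0
    for _, r0, c0 in seeds:
        if labels[r0][c0] != -1:
            continue
        labels[r0][c0] = next_label
        total = list(features[r0][c0])  # running feature sum of the region
        count = 1
        dist_sum = 0.0  # running sum of distances of accepted blocks
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                    mean = [t / count for t in total]
                    d = dist(features[nr][nc], mean)
                    # Hypothetical adaptive criterion: the threshold loosens
                    # as the region's average spread grows.
                    tau = base_tau + alpha * (dist_sum / count)
                    if d <= tau:
                        labels[nr][nc] = next_label
                        total = [t + f for t, f in zip(total, features[nr][nc])]
                        count += 1
                        dist_sum += d
                        queue.append((nr, nc))
        next_label += 1
    return labels, next_label


def merge_hierarchy(features, labels, n_regions):
    """Repeatedly merge the most similar adjacent pair of regions.

    Returns a list of (a, b, new_id) merges, i.e. the internal nodes of a
    binary partition tree. Similarity is the distance between region means.
    """
    rows, cols = len(labels), len(labels[0])
    sums, counts = {}, {k: 0 for k in range(n_regions)}
    adj = {k: set() for k in range(n_regions)}
    for r in range(rows):
        for c in range(cols):
            k, f = labels[r][c], features[r][c]
            sums[k] = [s + x for s, x in zip(sums[k], f)] if k in sums else list(f)
            counts[k] += 1
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols and labels[nr][nc] != k:
                    adj[k].add(labels[nr][nc])
                    adj[labels[nr][nc]].add(k)

    def mean(k):
        return [s / counts[k] for s in sums[k]]

    merges, next_id = [], n_regions
    while True:
        pairs = [(dist(mean(a), mean(b)), a, b) for a in adj for b in adj[a] if a < b]
        if not pairs:
            break
        _, a, b = min(pairs)
        sums[next_id] = [x + y for x, y in zip(sums[a], sums[b])]
        counts[next_id] = counts[a] + counts[b]
        adj[next_id] = (adj[a] | adj[b]) - {a, b}
        for k in adj[next_id]:
            adj[k] -= {a, b}
            adj[k].add(next_id)
        for k in (a, b):
            del adj[k], sums[k], counts[k]
        merges.append((a, b, next_id))
        next_id += 1
    return merges


# Toy 4x4 grid: columns 0-1, 2 and 3 have distinct features; energy is
# lowest in column 3, so that column is seeded first.
features = [[(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (30.0, 30.0)] for _ in range(4)]
energy = [[3 - c for c in range(4)] for _ in range(4)]
labels, n = grow_regions(features, energy)
print(labels[0])  # [2, 2, 1, 0]
print(merge_hierarchy(features, labels, n))  # [(1, 2, 3), (0, 3, 4)]
```

Visiting seeds in ascending energy order lets the smoothest blocks claim regions first, which is why the lowest-energy column becomes region 0 in the toy grid. Each merge tuple `(a, b, new_id)` is one internal node of the resulting partition tree.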