I. Introduction
A supervoxel is commonly defined as a subset of the 3-D grid of a video volume that contains voxels similar in color, texture [13], and other features. It is regarded as a perceptually meaningful atomic volume in the spatial–temporal domain. Like superpixels [33], supervoxels carry richer information than the raw elements they group, such as spatial–temporal consistency, and they are often more manageable than individual voxels. Supervoxels have therefore been widely used in video analysis and processing applications [19], [35], [36], such as video tracking, object recognition [32], motion analysis [31], and scene reconstruction [28], [34], [37]. A good supervoxel should have a regular shape and a uniform size and should adhere well to object boundaries; these properties improve the performance and reduce the computational cost of video processing applications.