The 3D structure tensor is an effective representation of the local motion information of video objects (VOs) and has been exploited for VO segmentation. However, existing 3D structure tensor-based VO segmentation approaches often yield inaccurate object boundaries and require costly estimation of a dense motion field. To address these concerns, this paper proposes a new scheme that generates spatially constrained motion masks without computing a dense motion field. To this end, scale-adaptive spatio-temporal filtering steered by the condition number is developed to handle multiple motions contributed by different VOs. Because rigid and nonrigid VO motions must be handled differently during mask generation, rigidity analysis based on the standard deviation of correlation coefficients over a range of successive video frames is conducted to identify whether each frame of the video sequence contains rigid or nonrigid motion. Various masks, such as eigenmaps, coherency-measurement maps, and change-detection maps, are produced and fused to generate the final VO motion masks. With boundary refinement by graph-based spatial segmentation, experimental results on different kinds of test sequences show accurately segmented moving VOs.
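As a rough illustration of the quantities named above, the following minimal sketch computes a 3D structure tensor for one small spatio-temporal block, together with its eigenvalues, a coherency measure, and a condition number. It is not the paper's method: the function names `structure_tensor_3d` and `tensor_measures`, the plain averaging over the block interior, and the specific coherency and condition-number formulas are all assumptions made for illustration, and the scale-adaptive filtering, rigidity analysis, and mask fusion are not reproduced.

```python
import numpy as np


def structure_tensor_3d(volume):
    """Return a 3x3 structure tensor averaged over a (t, y, x) block.

    Assumption: gradients are plain finite differences and the averaging
    window is the whole block interior (border voxels are dropped so that
    only central differences contribute).
    """
    gt, gy, gx = np.gradient(np.asarray(volume, dtype=float))
    interior = (slice(1, -1),) * 3
    grads = np.stack([g[interior].ravel() for g in (gx, gy, gt)])  # 3 x N
    return grads @ grads.T / grads.shape[1]


def tensor_measures(tensor, eps=1e-12):
    """Return (eigenvalues in descending order, coherency, condition number).

    Assumption: coherency is ((l1 - l3) / (l1 + l3))^2 and the condition
    number is l1 / l3; other definitions exist in the literature.
    """
    eigvals = np.clip(np.linalg.eigvalsh(tensor)[::-1], 0.0, None)
    l1, _, l3 = eigvals
    coherency = ((l1 - l3) / (l1 + l3 + eps)) ** 2
    condition = l1 / (l3 + eps)
    return eigvals, coherency, condition


# Synthetic block: a textured pattern translating one pixel per frame in x.
t, y, x = np.meshgrid(np.arange(8), np.arange(16), np.arange(16), indexing="ij")
block = np.sin(0.3 * (x - t)) + np.cos(0.4 * y)

eigvals, coherency, condition = tensor_measures(structure_tensor_3d(block))
print(eigvals, coherency, condition)
```

For a block containing a single coherent translation, the smallest eigenvalue is close to zero, so the coherency is close to one and the condition number is large. A block mixing several motions would be expected to have a noticeably larger smallest eigenvalue, which is the kind of cue a condition-number-steered filter could use.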