The MPEG-4 standard enables the representation of video as a collection of objects. This paper describes an automatic system that exploits such a representation. Our system consists of two parts: real-time content extraction algorithms and a real-time multi-object rate control method. We present two approaches to content extraction: foreground segmentation based on two cameras and face segmentation based on a single camera. The main contributions of this paper are: 1) under a stereo camera setup, we improve a disparity estimation algorithm to obtain crisp and smooth boundaries of foreground objects; 2) for a single camera scenario, we propose a novel algorithm for face detection and tracking that combines facial color and structure information; and 3) we develop a constant-quality variable bit-rate (CQ-VBR) control algorithm that guarantees the quality specification for each object obtained from the two content extraction methods. Both segmentation algorithms run in real time on a low-cost media processor and have been tested extensively in various indoor environments. The CQ-VBR control algorithm is a useful tool for the evaluation of object-based coding. For low-bit-rate applications, we can achieve a significant reduction in the overall bit rate while maintaining the same visual quality of the foreground or face object as conventional frame-based coding. In tests conducted on several sequences of different complexity levels, the bit-rate savings reach up to 48%. The foreground segmentation results presented are of sufficient quality to permit porting a live foreground object into arbitrary scenes to create composite video.