A procedure is described for the segmentation, content-based coding, and visualization of videoconference image sequences. First, image sequence analysis is used to estimate the shape and motion parameters of the person facing the camera. A spatiotemporal filter, which takes into account the intensity differences between consecutive frames, is applied to separate the moving person from the static background. The foreground is then segmented into a number of regions in order to identify the face. For this purpose, we propose a novel K-means with connectivity constraint algorithm, a general segmentation algorithm that combines several types of information, including intensity, motion, and compactness. The algorithm introduces spatiotemporal regions: because a number of frames are analyzed simultaneously, the same region is present in consecutive frames. Based on this information, a 3-D ellipsoid is fitted to the person's face using an efficient and robust algorithm. The rigid 3-D motion is estimated next using a least median of squares approach. Finally, a virtual reality modeling language (VRML) file is created containing all the above information; this file may be viewed using any VRML 2.0 compliant browser.
Published in: Circuits and Systems for Video Technology, IEEE Transactions on (Volume: 10, Issue: 8)
Date of Publication: Dec 2000
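
The least median of squares idea mentioned in the abstract can be illustrated with a deliberately simplified sketch. It does not reproduce the paper's rigid 3-D motion estimator: it assumes a pure 2-D translation between matched points, and the function name `lmeds_translation` and the sample data are hypothetical. Each single correspondence serves as a minimal sample that proposes a candidate translation, and the candidate with the smallest median squared residual is kept, so a minority of bad matches does not affect the estimate.

```python
from statistics import median


def lmeds_translation(src, dst):
    """Estimate a 2-D translation (tx, ty) mapping src points onto dst points
    by least median of squares. Illustrative only: the paper estimates rigid
    3-D motion, which needs a larger minimal sample and a different model."""
    if len(src) != len(dst) or not src:
        raise ValueError("src and dst must be non-empty and of equal length")

    best_t, best_med = None, float("inf")
    # For a pure translation one correspondence is a minimal sample,
    # so every pair yields one candidate hypothesis.
    for (xs, ys), (xd, yd) in zip(src, dst):
        tx, ty = xd - xs, yd - ys
        # Squared residual of every correspondence under this candidate.
        residuals = [
            (x2 - (x1 + tx)) ** 2 + (y2 - (y1 + ty)) ** 2
            for (x1, y1), (x2, y2) in zip(src, dst)
        ]
        med = median(residuals)
        # Keep the candidate whose median squared residual is smallest.
        if med < best_med:
            best_t, best_med = (tx, ty), med
    return best_t, best_med


if __name__ == "__main__":
    # Four points moved by (2, 3) plus one gross outlier (hypothetical data).
    src = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
    dst = [(2, 3), (3, 3), (2, 4), (3, 4), (10, -5)]
    print(lmeds_translation(src, dst))
```

Because the score is a median rather than a sum, the estimate stays correct as long as fewer than half of the correspondences are outliers. For models with larger minimal samples, such as rigid 3-D motion, candidates are usually drawn by random sampling instead of being enumerated exhaustively.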