An unsupervised framework for face analysis aimed at lip tracking is presented in this paper. A colour video sequence of a speaker's face is acquired with a desktop camera under natural lighting conditions and without any special make-up. After a logarithmic colour transform, a statistical segmentation process regularizes motion and hue information within a spatio-temporal neighbourhood. The hierarchical segmentation labels the different regions of the face. These results are then used to define a region of interest for each facial feature, particularly the lip contours. Lip corners and associated characteristic points are extracted to initialise an active contour stage. Finally, the speaker's lip shape, with inner and outer borders, is tracked without user tuning. This unsupervised framework provides geometrical features of the face while assuming no specific model of the speaker's face.
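The abstract does not give exact formulations, but the early stages of the pipeline (a logarithmic colour transform, then regularization of hue and motion cues over a spatio-temporal neighbourhood) can be illustrated with a minimal sketch. The per-channel log form, the frame-difference motion cue, the box-filter/temporal-mean smoothing, and all function names below are illustrative assumptions, not the paper's method; OpenCV (`cv2`) and NumPy serve only as convenient stand-ins.

```python
import numpy as np
import cv2  # OpenCV: camera capture and colour-space conversion


def log_colour_transform(frame_bgr, eps=1.0):
    """Per-channel logarithmic transform to compress illumination
    variation (assumed form; the paper's exact transform is not stated)."""
    return np.log(frame_bgr.astype(np.float32) + eps)


def hue_cue(frame_bgr):
    """Hue channel from an HSV conversion, used as the chromatic cue."""
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)[..., 0].astype(np.float32)


def motion_cue(prev_gray, curr_gray):
    """Frame-difference magnitude as a simple stand-in motion cue."""
    return cv2.absdiff(curr_gray, prev_gray).astype(np.float32)


def spatio_temporal_regularize(cue, history, k=5, depth=3):
    """Average a cue over a k x k spatial window and the last `depth`
    frames, i.e. over a spatio-temporal neighbourhood."""
    history.append(cv2.blur(cue, (k, k)))
    if len(history) > depth:
        history.pop(0)
    return np.mean(history, axis=0)


cap = cv2.VideoCapture(0)  # desktop camera, as in the paper's setup
hue_hist, motion_hist, prev_gray = [], [], None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Illumination compensation, then rescale so hue can be extracted
    # from the log-transformed image (the abstract applies segmentation
    # after the log transform).
    log_frame = log_colour_transform(frame)
    log_u8 = cv2.normalize(log_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    hue = spatio_temporal_regularize(hue_cue(log_u8), hue_hist)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev_gray is not None:
        motion = spatio_temporal_regularize(motion_cue(prev_gray, gray), motion_hist)
        # `hue` and `motion` would feed the statistical / hierarchical
        # segmentation stage, which the abstract does not detail.
    prev_gray = gray
```

Here the temporal running mean plays the role of the spatio-temporal regularization named in the abstract; the paper's statistical segmentation, region labelling, and active contour stages are beyond what the abstract specifies and are not sketched.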