This paper presents a three-stage method for extracting visual pronunciation features from lip movements. First, an approach combining red exclusion with a Fisher transformation is used to enhance the chromatic images in the video sequences, and an adaptive-thresholding algorithm segments the enhanced gray images to obtain bounding boxes of the lip regions. Second, the lip sub-images inside these boxes are classified according to their visual pronunciation features; two formulae are presented to normalize the dimensions and gray values of the sub-images, and an SVD-based method then extracts features from the normalized images. Finally, template matching based on the Mahalanobis distance is applied to recognize lip shapes. Experimental results show that, compared with existing methods, the features extracted by this method have lower dimensionality, carry more information, and remain applicable under natural conditions.
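The enhancement and segmentation stage can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the common "red exclusion" formulation that maps each pixel to log(G/B) (lips and skin differ little in red), and it substitutes Otsu's method as a concrete stand-in for the paper's adaptive-thresholding algorithm, whose details are not given in the abstract.

```python
import numpy as np

def red_exclusion(rgb):
    """Enhance lip/skin contrast using only the green and blue channels.

    Sketch of the red-exclusion idea: gray = log(G / B), rescaled to
    [0, 255]. The paper's exact enhancement formula may differ.
    """
    g = rgb[..., 1].astype(np.float64) + 1.0  # +1 avoids log(0)
    b = rgb[..., 2].astype(np.float64) + 1.0
    gray = np.log(g / b)
    gray -= gray.min()
    rng = gray.max()
    if rng > 0:
        gray = gray / rng * 255.0
    return gray.astype(np.uint8)

def otsu_threshold(gray):
    """Adaptive global threshold via Otsu's method (a stand-in for the
    paper's adaptive-thresholding step): maximize between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    probs = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = probs[:t].sum()
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * probs[:t]).sum() / w0
        mu1 = (levels[t:] * probs[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Thresholding the enhanced image yields a binary lip mask, from which the bounding box of the lip region can be taken as the smallest rectangle enclosing the foreground pixels.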
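The feature-extraction and matching stages can likewise be sketched. The abstract does not specify how the SVD is applied or how templates are built, so the following assumes one plausible reading: the leading singular values of each normalized lip sub-image serve as a low-dimensional feature vector (the cutoff `k=4` is a hypothetical choice), and each lip-shape class is represented by a template of its feature mean and inverse covariance for Mahalanobis matching.

```python
import numpy as np

def svd_features(image, k=4):
    """Top-k singular values of a normalized lip sub-image as a compact
    feature vector (k=4 is an illustrative choice, not from the paper)."""
    s = np.linalg.svd(image.astype(np.float64), compute_uv=False)
    return s[:k]

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance between feature vector x and a class template."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify(x, templates):
    """Return the lip-shape class whose (mean, inverse-covariance) template
    is nearest to x under the Mahalanobis distance."""
    return min(templates, key=lambda c: mahalanobis(x, *templates[c]))
```

In practice the per-class mean and covariance would be estimated from training sub-images of each lip shape; singular values are invariant to image rotation by orthogonal transforms and degrade gracefully with noise, which is one reason SVD features are a common choice for this kind of matching.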