Lip motion is a relevant visual feature for active speaker detection and speech recognition. In this paper, a new approach to lip detection and visual voice activity detection is proposed. First, the algorithm performs skin segmentation to reduce the search area for lip extraction, and the most likely lip and non-lip regions within the delimited area are detected using a Bayesian approach. Then, the final lip segmentation is obtained by thresholding the calculated probability regions and applying simple morphological operators. Finally, the temporal motion of the lips is analyzed using Hidden Markov Models (HMMs) to detect the likely occurrence of active speech within a temporal window.
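The following is a minimal sketch of two of the steps described above: Bayesian lip/non-lip classification followed by thresholding, and an HMM-based speech decision over a temporal window. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not specify the pixel features, the class distributions, or the HMM topology, so the scalar colour feature, the Gaussian parameters, the prior, the binary "moving/still" observation symbols, and all function names below are hypothetical. Skin segmentation and the morphological clean-up are omitted.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def lip_posterior(x, lip=(0.7, 0.01), non_lip=(0.4, 0.01), prior_lip=0.3):
    """P(lip | x) by Bayes' rule for one scalar colour feature x.

    The (mean, variance) pairs and the prior are made-up values.
    """
    p_lip = gaussian_pdf(x, *lip) * prior_lip
    p_non = gaussian_pdf(x, *non_lip) * (1.0 - prior_lip)
    return p_lip / (p_lip + p_non)

def segment(features, threshold=0.5):
    """Threshold the per-pixel posterior into a binary lip mask."""
    return [[1 if lip_posterior(x) > threshold else 0 for x in row]
            for row in features]

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete symbols."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalized state distribution one step forward.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like

# Hypothetical models. Observation symbols: 0 = lips still, 1 = lips moving.
SPEECH = dict(start=[0.5, 0.5],
              trans=[[0.6, 0.4], [0.4, 0.6]],
              emit=[[0.7, 0.3], [0.2, 0.8]])
SILENCE = dict(start=[1.0], trans=[[1.0]], emit=[[0.9, 0.1]])

def is_speech(obs):
    """Label a temporal window as speech if the speech HMM explains it better."""
    return (forward_log_likelihood(obs, **SPEECH)
            > forward_log_likelihood(obs, **SILENCE))

if __name__ == "__main__":
    print(segment([[0.7, 0.4], [0.68, 0.45]]))
    print(is_speech([1, 0, 1, 1, 0, 1]), is_speech([0, 0, 0, 0, 0, 0]))
```

Comparing the likelihoods of a speech model and a silence model is one common way to turn HMMs into a detector; the paper may use a different decision rule.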