A new voice activity detection (VAD) algorithm with soft-decision output in the Mel-frequency domain is developed based on a hidden Markov model (HMM) and is incorporated into an HMM-based speech enhancement system. The proposed VAD uses a two-state ergodic HMM whose states represent speech presence and speech absence. The states are constructed from the noisy speech and noise HMMs used in the speech enhancement system. This composite model provides robust detection of speech segments in the presence of noise and obviates the need for extra modeling in HMM-based speech enhancement applications. As the main purpose of the proposed VAD is to detect speech segments accurately, a hang-over mechanism is proposed and applied to the output of the VAD to improve the speech detection rate. The VAD is integrated into the HMM-based speech enhancement system in the Mel-frequency spectral (MFS) and cepstral (MFC) domains. The performance of the proposed VAD, the effectiveness of the hang-over mechanism, and the performance of the VAD-integrated speech enhancement system are evaluated on four noise types at different SNR levels. The experimental results confirm the superiority of the proposed VAD over the reference methods, particularly in speech detection rate under noise-dominant conditions.
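
As a rough illustration of the two ideas named above, the following Python sketch runs a forward recursion over a two-state ergodic HMM to produce a soft speech-presence probability per frame, then applies a simple hang-over to the thresholded decisions. It is not the algorithm of this work: the abstract gives neither the transition probabilities, the way state likelihoods are derived from the noisy speech and noise HMMs, nor the hang-over rule. The function names `soft_vad` and `hang_over`, the parameters `p_stay_speech`, `p_stay_noise`, `prior_speech`, and `hang_frames`, and the use of precomputed per-frame likelihoods as input are all assumptions made for the example.

```python
from typing import List, Sequence


def soft_vad(
    lik_speech: Sequence[float],
    lik_noise: Sequence[float],
    p_stay_speech: float = 0.9,   # assumed self-transition probability
    p_stay_noise: float = 0.9,    # assumed self-transition probability
    prior_speech: float = 0.5,    # assumed initial speech probability
) -> List[float]:
    """Return P(speech present | frames 1..t) for each frame t.

    lik_speech[t] and lik_noise[t] are hypothetical per-frame likelihoods
    of the observation under the speech-presence and speech-absence states.
    """
    # State 0 = speech absent, state 1 = speech present.
    # Ergodic model: every state can be reached from every state.
    trans = [
        [p_stay_noise, 1.0 - p_stay_noise],
        [1.0 - p_stay_speech, p_stay_speech],
    ]
    alpha = [1.0 - prior_speech, prior_speech]
    posteriors: List[float] = []
    for t, (ls, ln) in enumerate(zip(lik_speech, lik_noise)):
        if t == 0:
            pred = alpha
        else:
            # Predict step: propagate the previous posterior through the transitions.
            pred = [alpha[0] * trans[0][j] + alpha[1] * trans[1][j] for j in range(2)]
        # Update step: weight by the frame likelihoods and normalise.
        unnorm = [pred[0] * ln, pred[1] * ls]
        total = unnorm[0] + unnorm[1]
        alpha = [u / total for u in unnorm] if total > 0.0 else pred
        posteriors.append(alpha[1])
    return posteriors


def hang_over(decisions: Sequence[bool], hang_frames: int = 3) -> List[bool]:
    """Keep the speech flag raised for hang_frames frames after speech ends."""
    out: List[bool] = []
    remaining = 0
    for is_speech in decisions:
        if is_speech:
            remaining = hang_frames
            out.append(True)
        elif remaining > 0:
            remaining -= 1
            out.append(True)
        else:
            out.append(False)
    return out


if __name__ == "__main__":
    probs = soft_vad([0.9, 0.8, 0.2, 0.1, 0.1], [0.1, 0.2, 0.8, 0.9, 0.9])
    hard = [p > 0.5 for p in probs]
    print(probs)
    print(hang_over(hard, hang_frames=1))
```

The soft output can be used directly as a speech-presence weight, while the hang-over trades a higher false-alarm rate for fewer clipped speech tails, which matches the stated priority of detecting speech segments accurately.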