Hidden-Markov-model-based voice activity detector with high speech detection rate for speech enhancement

2 Author(s)
Veisi, H.; Dept. of Comput. Eng., Sharif Univ. of Technol., Tehran, Iran; Sameti, H.

A new voice activity detection (VAD) algorithm with soft-decision output in the Mel-frequency domain is developed based on a hidden Markov model (HMM) and incorporated into an HMM-based speech enhancement system. The proposed VAD uses a two-state ergodic HMM whose states represent speech presence and speech absence. The states are constructed from the noisy-speech and noise HMMs already used in the speech enhancement system. This composite model provides robust detection of speech segments in the presence of noise and obviates the need for extra modeling in HMM-based speech enhancement applications. Because the main purpose of the proposed VAD is to detect speech segments accurately, a hang-over mechanism is proposed and applied to the VAD output to improve the speech detection rate. The VAD is integrated into the HMM-based speech enhancement system in the Mel-frequency spectral (MFS) and cepstral (MFC) domains. The performance of the proposed VAD, the effectiveness of the hang-over mechanism, and the performance of the VAD-integrated speech enhancement system are evaluated on four noise types at different SNR levels. The experimental results confirm the superiority of the proposed VAD over the reference methods, particularly in speech detection rate under noise-dominant conditions.
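The general idea of a two-state HMM VAD with soft output followed by a hang-over stage can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the transition probabilities, the way state likelihoods are derived from the noisy-speech and noise HMMs, or the exact hang-over rule. The function names `soft_vad` and `hang_over`, the parameters `p_stay`, `prior_speech`, and `hold`, and the use of precomputed per-frame log-likelihoods as inputs are all assumptions made for illustration.

```python
import numpy as np


def soft_vad(loglik_speech, loglik_noise, p_stay=0.9, prior_speech=0.5):
    """Filtered posterior P(speech at frame t | frames 0..t) for a two-state ergodic HMM.

    loglik_speech / loglik_noise are per-frame log-likelihoods under the
    speech-presence and speech-absence states (assumed to be computed elsewhere).
    p_stay is an assumed self-transition probability shared by both states.
    """
    # Column 0 = speech absence, column 1 = speech presence.
    ll = np.stack(
        [np.asarray(loglik_noise, dtype=float), np.asarray(loglik_speech, dtype=float)],
        axis=1,
    )
    trans = np.array([[p_stay, 1.0 - p_stay],
                      [1.0 - p_stay, p_stay]])
    alpha = np.array([1.0 - prior_speech, prior_speech])
    post = np.empty(len(ll))
    for t, frame_ll in enumerate(ll):
        if t > 0:
            alpha = alpha @ trans  # predict: propagate through the transitions
        # Update with the frame likelihoods; subtracting the max avoids underflow.
        alpha = alpha * np.exp(frame_ll - frame_ll.max())
        alpha /= alpha.sum()  # normalize so alpha is a posterior
        post[t] = alpha[1]  # soft decision: probability of speech presence
    return post


def hang_over(decisions, hold=3):
    """Keep the speech label active for `hold` frames after each speech frame."""
    out = np.zeros(len(decisions), dtype=bool)
    counter = 0
    for t, is_speech in enumerate(decisions):
        if is_speech:
            counter = hold  # re-arm the hang-over counter
            out[t] = True
        elif counter > 0:
            counter -= 1  # still inside the hang-over window
            out[t] = True
    return out
```

A hard decision can be obtained by thresholding the soft output, for example `hang_over(soft_vad(ll_speech, ll_noise) > 0.5, hold=3)`. Extending speech runs in this way trades some false alarms for fewer missed speech frames, which matches the stated goal of a high speech detection rate.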

Published in:

Signal Processing, IET (Volume: 6, Issue: 1)