Recently, it has been proposed to estimate the noise power spectral density by means of minimum mean-square error (MMSE) optimal estimation. We show that the resulting estimator can be interpreted as a voice activity detector (VAD)-based noise power estimator, in which the noise power is updated only when speech absence is signaled and which requires a bias compensation. We show that the bias compensation is unnecessary when we replace the VAD by a soft speech presence probability (SPP) with fixed priors. Choosing fixed priors also has the benefit of decoupling the noise power estimator from subsequent steps in a speech enhancement framework, such as the estimation of the speech power and the estimation of the clean speech. We show that the proposed SPP approach maintains the quick noise tracking performance of the bias-compensated MMSE-based approach while exhibiting less overestimation of the spectral noise power and an even lower computational complexity.
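The idea of replacing a hard VAD decision with a soft SPP weight can be illustrated with a minimal sketch for a single frequency bin. This is not the authors' implementation: it assumes a complex Gaussian model for speech and noise coefficients, and the function name `spp_noise_update`, the fixed a priori SNR of 15 dB, the equal priors, and the smoothing constant 0.8 are all illustrative assumptions that the abstract does not state.

```python
import math


def spp_noise_update(periodogram, noise_psd, xi_db=15.0, p_h1=0.5, alpha=0.8):
    """One-frame noise power update for one frequency bin (illustrative sketch).

    periodogram: |Y|^2 of the current noisy frame in this bin
    noise_psd:   previous noise power estimate (must be > 0)
    xi_db:       fixed a priori SNR assumed under speech presence (assumption)
    p_h1:        fixed prior probability of speech presence (assumption)
    alpha:       recursive smoothing constant (assumption)
    """
    xi = 10.0 ** (xi_db / 10.0)
    prior_ratio = (1.0 - p_h1) / p_h1

    # Posterior speech presence probability under the Gaussian model.
    # The exponent is never positive, so exp() cannot overflow.
    exponent = -(periodogram / noise_psd) * xi / (1.0 + xi)
    spp = 1.0 / (1.0 + prior_ratio * (1.0 + xi) * math.exp(exponent))

    # Soft decision: blend the observation (trusted when speech is absent)
    # with the previous estimate (trusted when speech is present).
    noise_periodogram = (1.0 - spp) * periodogram + spp * noise_psd

    # Recursive smoothing of the noise power estimate over time.
    new_noise_psd = alpha * noise_psd + (1.0 - alpha) * noise_periodogram
    return new_noise_psd, spp
```

With a hard VAD, `spp` would be forced to 0 or 1; here it varies continuously, so a frame with a strong observation leaves the estimate almost untouched while a weak one pulls it toward the observed power. Because the priors and `xi_db` are fixed, the update needs no input from the speech power or clean speech estimators that follow it.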
Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 20, Issue: 4)
Date of Publication: May 2012