This article addresses the problem of estimating the instantaneous signal-to-noise ratio (SNR) during speech activity, with the aim of improving the performance of speech enhancement algorithms. It is shown that, when speech is divided into narrow bands, the kurtosis of the noisy speech can be used to estimate the speech and noise energies individually. Based on this concept, a novel method is proposed that continuously estimates the SNR across the frequency bands without the need for a speech detector. The derivations are based on a sinusoidal model for speech and a Gaussian assumption about the noise. Experimental results using recorded speech and noise show that the model and the derivations are valid, though not entirely accurate across the whole spectrum. It is also found that many noise types encountered in mobile telephony are not far from Gaussian as far as higher-order statistics are concerned, which makes the scheme quite effective.
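To illustrate why a fourth-order statistic can separate speech and noise energy within one narrow band, the following is a minimal sketch, not the article's actual derivation. It assumes the band contains a single real sinusoid of power Ps plus independent zero-mean Gaussian noise of power Pn. Under that assumption the fourth-order cumulant of the mixture equals -1.5 * Ps^2, because the Gaussian part contributes nothing, and the second moment equals Ps + Pn. The function name `estimate_band_snr` and the synthetic test signal are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_band_snr(y):
    """Estimate (speech power, noise power, SNR in dB) for one narrow band.

    Assumption (illustrative only): y = single real sinusoid + independent
    zero-mean Gaussian noise.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                      # remove any DC offset
    m2 = float(np.mean(y ** 2))           # total power: Ps + Pn
    m4 = float(np.mean(y ** 4))           # fourth moment
    c4 = m4 - 3.0 * m2 ** 2               # fourth-order cumulant; zero for Gaussian noise

    # For a real sinusoid of power Ps the fourth cumulant is -1.5 * Ps**2.
    ps = float(np.sqrt(max(-c4, 0.0) / 1.5))
    ps = min(ps, m2)                      # speech power cannot exceed total power
    pn = m2 - ps

    if ps <= 0.0:
        snr_db = float("-inf")
    elif pn <= 0.0:
        snr_db = float("inf")
    else:
        snr_db = 10.0 * np.log10(ps / pn)
    return ps, pn, snr_db


# Synthetic check: sinusoid of power 1.0 in Gaussian noise of power 0.5.
rng = np.random.default_rng(0)
n = 400_000
t = np.arange(n)
sinusoid = np.sqrt(2.0) * np.cos(2.0 * np.pi * 0.05 * t + 0.3)
noise = rng.normal(0.0, np.sqrt(0.5), n)
ps_hat, pn_hat, snr_hat = estimate_band_snr(sinusoid + noise)
print(f"Ps ~ {ps_hat:.3f}, Pn ~ {pn_hat:.3f}, SNR ~ {snr_hat:.2f} dB")
```

No speech detector is involved in this sketch: the estimate comes from the second and fourth moments of the noisy band signal alone. Real speech bands only approximate a single sinusoid and real noise only approximates a Gaussian, so estimates on recorded data would be less accurate than on this synthetic signal.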