This paper presents a new algorithm for pitch extraction from noisy speech signals that uses both temporal and spectral representations. As the temporal representation, we derive a harmonic sinusoidal correlation (HSC) model of clean speech. Given only a noisy speech frame, a noise-robust least-squares minimization technique is proposed to obtain the parameters of the HSC model, which are used directly to estimate a pitch-harmonic (PH) accurately. Using the extracted PH together with a spectral representation, namely an enhanced spectrum in the discrete cosine transform domain, a two-fold criterion is developed to determine the true consecutive number corresponding to the PH, which is finally used for pitch detection in the presence of noise. Simulation results on the Keele pitch extraction reference database show that, by combining the multiple cues obtained from the temporal and spectral representations, the proposed algorithm outperforms some existing methods at signal-to-noise ratio (SNR) levels ranging from high to very low.
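To make the general idea concrete, the following is a minimal sketch, not the HSC model or the two-fold criterion of the paper. It assumes a much simpler stand-in: a single cosine is fitted by least squares to the autocorrelation of a noisy frame to locate one dominant harmonic, and the pitch is then obtained by dividing that frequency by a harmonic number. The function names (`autocorrelation`, `fit_dominant_harmonic`, `pitch_from_harmonic`), the synthetic signal, and the fact that the harmonic number is simply supplied by hand are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation(frame, max_lag):
    """Biased autocorrelation of `frame` for lags 0 .. max_lag - 1."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) / n for k in range(max_lag)])


def fit_dominant_harmonic(r, fs, candidates):
    """Grid-search least-squares fit of a single cosine to the autocorrelation.

    Lag 0 is skipped because additive white noise mainly inflates that lag.
    For each candidate frequency the amplitude has a closed-form
    least-squares solution; the frequency with the smallest residual wins.
    """
    lags = np.arange(1, len(r))
    target = r[1:]
    best_freq, best_err = None, np.inf
    for f in candidates:
        basis = np.cos(2.0 * np.pi * f * lags / fs)
        amp = np.dot(basis, target) / np.dot(basis, basis)
        err = np.sum((target - amp * basis) ** 2)
        if err < best_err:
            best_freq, best_err = f, err
    return best_freq


def pitch_from_harmonic(f_ph, harmonic_number):
    """Pitch is the harmonic frequency divided by its harmonic number."""
    return f_ph / harmonic_number


# Synthetic voiced frame (hypothetical values): 120 Hz pitch, strongest
# component is the 2nd harmonic, white noise added at roughly 0 dB SNR.
fs = 8000.0
f0_true = 120.0
t = np.arange(512) / fs
clean = (0.5 * np.sin(2 * np.pi * f0_true * t)
         + 1.0 * np.sin(2 * np.pi * 2 * f0_true * t)
         + 0.3 * np.sin(2 * np.pi * 3 * f0_true * t))
rng = np.random.default_rng(0)
noisy = clean + 0.8 * rng.standard_normal(len(t))

r = autocorrelation(noisy, max_lag=200)
f_ph = fit_dominant_harmonic(r, fs, np.arange(60.0, 600.0, 1.0))
f0_est = pitch_from_harmonic(f_ph, harmonic_number=2)  # harmonic number assumed known here
print(f_ph, f0_est)
```

In this sketch the harmonic number is given, whereas the abstract describes determining it from an enhanced spectrum in the discrete cosine transform domain; that step is not reproduced here.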