Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique

Authors: Feng Huang (Dept. of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China); Tan Lee

Pitch estimation from acoustic signals is a fundamental problem in many areas of speech research. For noise-corrupted speech, reliable pitch estimation is difficult. This paper presents a study of pitch estimation in noisy speech based on a robust temporal-spectral representation and sparse reconstruction. We propose to accumulate spectral peaks over consecutive time frames. Because the harmonic structure of speech changes much more slowly than the noise spectrum, spectral peaks related to pitch harmonics stand out over the noise through the accumulation. Experimental results show that the accumulated peak spectrum is indeed a robust representation of pitch harmonics. The accumulated peak spectrum is then expressed as a sparse linear combination of a large set of clean peak-spectrum exemplars, and a Gaussian mixture density is used to model the noise spectrum peaks. The weights of the linear combination are estimated so as to maximize the likelihood of the accumulated peak spectrum under a sparsity constraint, and robust pitch estimation is performed based on the sparse weights and the corresponding peak-spectrum exemplars. The use of a Gaussian mixture model makes the objective function for sparse weight estimation non-convex; by approximation and reformulation, two convex optimization approaches are developed to estimate the weights. Extensive experiments evaluate the performance of the proposed pitch estimation algorithms under a wide variety of noise conditions. The results clearly show that the proposed methods significantly and consistently outperform conventional methods, particularly at very low signal-to-noise ratios (e.g., SNR < -5 dB).
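The two stages described in the abstract (accumulating spectral peaks over frames, then explaining the accumulated spectrum as a sparse, non-negative combination of clean exemplars whose weights point to the pitch) can be illustrated with a minimal numpy sketch. It is not the authors' algorithm, and several pieces are assumptions: peaks are marked as binary local maxima above a relative threshold; the exemplars are hypothetical synthetic harmonic combs built by `harmonic_exemplars` rather than peak spectra taken from clean speech; and the paper's Gaussian-mixture noise likelihood and its two convex reformulations are replaced by a plain squared-error residual with a non-negative L1 penalty, solved by projected ISTA. All function names and parameters (`rel_threshold`, `lam`, `f_max`, the f0 grid) are illustrative.

```python
import numpy as np


def peak_spectrum(mag, rel_threshold=0.1):
    """Binary peak spectrum of one frame: 1.0 at local maxima of `mag` that
    exceed rel_threshold * max(mag), 0.0 elsewhere (assumed peak-picking rule)."""
    peaks = np.zeros(len(mag))
    centre = mag[1:-1]
    is_peak = (centre > mag[:-2]) & (centre >= mag[2:]) & (centre > rel_threshold * mag.max())
    peaks[1:-1][is_peak] = 1.0
    return peaks


def accumulated_peak_spectrum(frames):
    """Sum the peak spectra of consecutive frames (one frame per row).
    Slowly varying harmonic peaks add up; scattered noise peaks do not."""
    window = np.hanning(frames.shape[1])
    mags = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.sum([peak_spectrum(m) for m in mags], axis=0)


def harmonic_exemplars(f0_grid, n_fft, fs, f_max):
    """Hypothetical clean exemplars: one binary column per candidate f0, with
    ones at the FFT bins nearest to k*f0 for every k*f0 <= f_max (< fs/2)."""
    D = np.zeros((n_fft // 2 + 1, len(f0_grid)))
    for j, f0 in enumerate(f0_grid):
        harmonics = np.arange(f0, f_max + 1e-9, f0)
        D[np.rint(harmonics * n_fft / fs).astype(int), j] = 1.0
    return D


def sparse_weights(x, D, lam=0.5, n_iter=5000):
    """Projected ISTA for  min 0.5*||x - D w||^2 + lam*sum(w)  s.t. w >= 0.
    Stand-in objective: squared error replaces the paper's GMM noise likelihood."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, L = largest eigenvalue of D^T D
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        # gradient step, then prox of lam*sum(w) restricted to w >= 0
        w = np.maximum(w - step * (grad + lam), 0.0)
    return w


def estimate_pitch(frames, fs, f0_grid, f_max, lam=0.5):
    """Return (f0 estimate, weights): the f0 of the exemplar with the largest weight."""
    x = accumulated_peak_spectrum(frames)
    D = harmonic_exemplars(f0_grid, frames.shape[1], fs, f_max)
    w = sparse_weights(x, D, lam)
    return float(f0_grid[np.argmax(w)]), w


if __name__ == "__main__":
    fs, n_fft, n_frames = 8000, 512, 8
    t = np.arange(n_fft * n_frames) / fs
    # 200 Hz harmonic signal with 15 equal-amplitude harmonics (up to 3000 Hz)
    clean = sum(np.cos(2 * np.pi * k * 200.0 * t) for k in range(1, 16))
    rng = np.random.default_rng(0)
    noise = rng.normal(size=clean.shape)
    noise *= np.sqrt(np.mean(clean**2) / np.mean(noise**2))  # scale to 0 dB SNR
    frames = (clean + noise).reshape(n_frames, n_fft)
    f0_hat, _ = estimate_pitch(frames, fs, np.arange(80, 405, 5), f_max=3000.0)
    print(f"estimated f0 at 0 dB SNR: {f0_hat:.0f} Hz")
```

For a stationary harmonic signal, every harmonic bin in the accumulated peak spectrum reaches the frame count, which is what lets the sparse fit prefer the true f0 over its subharmonic (whose exemplar adds unmatched odd-harmonic bins) and its multiples (which leave harmonics unexplained). More frames strengthen this contrast at the cost of time resolution, and a larger `lam` yields sparser weights. The `__main__` block adds white noise at 0 dB SNR as a starting point for experiments.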

Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume 21, Issue 1)