Use of spectral autocorrelation in spectral envelope linear prediction for speech recognition

2 Author(s)
Hong Kook Kim; Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul, South Korea; Hwang Soo Lee

This paper proposes a linear predictive (LP) analysis method where sample autocorrelations are estimated from the spectral envelope of a speech signal on the basis of the spectral autocorrelation. The spectral autocorrelation is defined as discrete quantities of the speech spectrum with spectral resolution identical to the discrete Fourier transform (DFT) used to obtain the speech spectrum. From analytical and empirical derivation of its properties, we can estimate the fundamental frequency and the maximally correlated frequency for voiced and unvoiced speech, respectively, and then obtain the spectral envelope by sampling at a rate of the estimated frequency. A frequency normalization can be applied to the estimated spectral envelope because the number of samples of the spectral envelope usually differs from frame to frame. The spectral envelope is warped into the mel-frequency scale and the inverse DFT is applied to extract the estimate of sample autocorrelations. From the result of LP analysis on the sample autocorrelations, we finally obtain the spectral envelope cepstral coefficients (SECC). Hidden Markov model (HMM) recognition experiments show that SECC significantly improves the performance of a recognizer at low signal-to-noise ratios (SNRs) over several other representations.
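The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the autocorrelation normalization, the 60 to 400 Hz search range, the use of linear interpolation for frequency normalization, the common mel formula 2595 log10(1 + f/700), and all function and parameter names (`secc_sketch`, `n_env`, `f_lo`, `f_hi`, and so on) are assumptions made for the example. The same peak-picking step is used for every frame, whereas the paper distinguishes voiced and unvoiced speech.

```python
import numpy as np

def spectral_autocorrelation(mag):
    """Normalized autocorrelation of a magnitude spectrum over DFT-bin lags."""
    x = mag - mag.mean()
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1:]              # keep lags 0 .. len(x)-1
    return r / r[0]

def peak_lag(r, min_lag, max_lag):
    """Lag (in DFT bins) with the largest value in [min_lag, max_lag]."""
    return min_lag + int(np.argmax(r[min_lag:max_lag + 1]))

def levinson_durbin(r, order):
    """LP coefficients a[0..order] (a[0] = 1) and residual energy from autocorrelations."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1/A(z), with A(z) = 1 + sum_k a[k] z^-k."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        a_n = a[n] if n <= p else 0.0
        acc = sum(k * c[k] * a[n - k] for k in range(1, n) if n - k <= p)
        c[n] = -a_n - acc / n
    return c[1:]

def secc_sketch(frame, fs, n_fft=1024, n_env=65, order=12,
                f_lo=60.0, f_hi=400.0):
    """Hypothetical end-to-end sketch: frame -> envelope-based cepstral coefficients."""
    # 1. Magnitude spectrum of the windowed frame.
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    # 2. Spectral autocorrelation and its peak lag (assumed search range).
    r_spec = spectral_autocorrelation(mag)
    min_lag = max(1, int(f_lo * n_fft / fs))
    max_lag = int(f_hi * n_fft / fs)
    lag = peak_lag(r_spec, min_lag, max_lag)
    # 3. Sample the power spectrum at multiples of the estimated frequency.
    idx = np.arange(lag, len(mag), lag)
    env_hz = idx * fs / n_fft
    env_pow = mag[idx] ** 2 + 1e-10    # small floor keeps the envelope positive
    # 4. Frequency normalization and mel warping: resample the envelope at
    #    n_env points spaced uniformly on the mel scale (assumed formula).
    mel_max = 2595.0 * np.log10(1.0 + (fs / 2.0) / 700.0)
    target_hz = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_env) / 2595.0) - 1.0)
    warped = np.interp(target_hz, env_hz, env_pow)
    # 5. Inverse DFT of the warped power envelope gives autocorrelation estimates.
    r = np.fft.irfft(warped)[:order + 1]
    # 6. LP analysis, then conversion to cepstral coefficients.
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, order)
```

For a quick check, `secc_sketch(frame, fs=8000)` can be called on a 512-sample frame; it returns `order` coefficients per frame. Combining the frequency normalization and the mel warping into a single interpolation is a simplification chosen for brevity, and the paper may treat them as separate steps.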

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 7, Issue: 5)