In this article, we concentrate on spectral estimation techniques that are useful for extracting the features used by an automatic speech recognition (ASR) system. To aid understanding of the spectral estimation process for speech signals, we adopt the source-filter model of speech production as presented in X. Huang et al. (2001), in which speech is divided into two broad classes: voiced and unvoiced. Voiced speech is quasi-periodic: it consists of a fundamental frequency, corresponding to the speaker's pitch, together with its harmonics. Unvoiced speech is stochastic in nature and is best modeled as white noise convolved with an infinite impulse response (IIR) filter.
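The two excitation classes can be illustrated with a minimal sketch of the source-filter model. This is an illustration under stated assumptions, not the formulation from Huang et al. The voiced source is assumed to be a unit impulse train at the fundamental frequency F0. The unvoiced source is assumed to be Gaussian white noise. Both sources pass through the same all-pole (IIR) filter, here a single two-pole resonator used as a crude, hypothetical stand-in for the vocal tract. All function names (`impulse_train`, `white_noise`, `all_pole_filter`, `resonator_coeffs`, `dft_bin_magnitude`) and all parameter values are illustrative choices. The DFT helper shows that the impulse train's spectral energy sits only at multiples of F0, which is the harmonic structure the text describes.

```python
import cmath
import math
import random


def impulse_train(n_samples, fs, f0):
    """Voiced excitation (assumed): one unit impulse per pitch period of fs / f0 samples."""
    period = fs / f0
    x = [0.0] * n_samples
    t = 0.0
    while int(round(t)) < n_samples:
        x[int(round(t))] = 1.0
        t += period
    return x


def white_noise(n_samples, seed=0):
    """Unvoiced excitation (assumed): zero-mean, unit-variance Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(x, a):
    """IIR all-pole filter: y[n] = x[n] - sum_{k=1}^{p} a[k-1] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def resonator_coeffs(fc, bw, fs):
    """Hypothetical vocal-tract stand-in: two-pole resonator at fc Hz with bandwidth bw Hz.

    The denominator is A(z) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2, with r < 1 so the
    filter is stable.
    """
    r = math.exp(-math.pi * bw / fs)
    theta = 2.0 * math.pi * fc / fs
    return [-2.0 * r * math.cos(theta), r * r]


def dft_bin_magnitude(x, k):
    """Magnitude of DFT bin k of x (bin k corresponds to k * fs / len(x) Hz)."""
    n_total = len(x)
    s = sum(xn * cmath.exp(-2j * math.pi * k * n / n_total) for n, xn in enumerate(x))
    return abs(s)


if __name__ == "__main__":
    fs, n = 8000, 800                    # 8 kHz sampling, 0.1 s of signal
    tract = resonator_coeffs(500.0, 100.0, fs)
    pulses = impulse_train(n, fs, 100.0)  # F0 = 100 Hz -> period of 80 samples
    voiced = all_pole_filter(pulses, tract)
    unvoiced = all_pole_filter(white_noise(n), tract)
    # With n = 800 and fs = 8000, bin 10 is 100 Hz (F0) and bin 5 is 50 Hz (not a harmonic).
    print("excitation |X| at F0 bin:", dft_bin_magnitude(pulses, 10))
    print("excitation |X| at 50 Hz bin:", dft_bin_magnitude(pulses, 5))
```

Both classes share the filter, and only the excitation differs. This separation is what lets spectral estimation methods treat the filter (the spectral envelope) apart from the source (pitch harmonics or noise).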