Adaptive Kalman Filtering and Smoothing for Tracking Vocal Tract Resonances Using a Continuous-Valued Hidden Dynamic Model

Authors: Li Deng (Microsoft Research, Redmond, WA); L. J. Lee; H. Attias; A. Acero

A novel Kalman filtering/smoothing algorithm is presented for efficient and accurate estimation of vocal tract resonances, or formants, in fluent speech. These are the natural frequencies and bandwidths of the resonator from larynx to lips. The algorithm uses a hidden dynamic model in a state-space formulation, where the resonance frequency and bandwidth values are treated as continuous-valued hidden state variables. The observation equation of the model is built from an analytical predictive function that maps the resonance frequencies and bandwidths to LPC cepstra, which serve as the observation vectors. This nonlinear function is adaptively linearized, and an adaptively trained residual (bias) term is added to it to represent the iteratively reduced piecewise linear approximation error. Details of the piecewise linearization design process are described. An iterative tracking algorithm is presented, which embeds both the adaptive residual training and the piecewise linearization design in the Kalman filtering/smoothing framework. Experiments on estimating resonances in Switchboard speech data show accurate estimation results. In particular, the effectiveness of the adaptive residual training is demonstrated. Our approach provides a solution to the traditional "hidden formant problem," and produces meaningful results even during consonantal closures when the supra-laryngeal source may cause no spectral prominences in speech acoustics.
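To make the state-space idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard all-pole identity in which each resonance contributes a conjugate pole pair, so that the n-th LPC cepstral coefficient is the sum over resonances of (2/n) exp(-pi n b / fs) cos(2 pi n f / fs). It also assumes random-walk state dynamics and a single first-order (extended Kalman) linearization per frame, standing in for the piecewise linearization and adaptive residual term described in the abstract. All function names, the sampling rate, and the noise covariances are hypothetical.

```python
import numpy as np

def formants_to_cepstra(freqs, bws, n_ceps=12, fs=8000.0):
    """Map resonance frequencies/bandwidths (Hz) to LPC cepstra (all-pole identity)."""
    n = np.arange(1, n_ceps + 1)[:, None]           # cepstral orders, shape (n_ceps, 1)
    f = np.asarray(freqs, dtype=float)[None, :]     # shape (1, K)
    b = np.asarray(bws, dtype=float)[None, :]       # shape (1, K)
    terms = (2.0 / n) * np.exp(-np.pi * n * b / fs) * np.cos(2 * np.pi * n * f / fs)
    return terms.sum(axis=1)

def cepstra_jacobian(freqs, bws, n_ceps=12, fs=8000.0):
    """Analytic Jacobian of the mapping w.r.t. the state [f_1..f_K, b_1..b_K]."""
    n = np.arange(1, n_ceps + 1)[:, None]
    f = np.asarray(freqs, dtype=float)[None, :]
    b = np.asarray(bws, dtype=float)[None, :]
    decay = np.exp(-np.pi * n * b / fs)
    d_f = -(4 * np.pi / fs) * decay * np.sin(2 * np.pi * n * f / fs)
    d_b = -(2 * np.pi / fs) * decay * np.cos(2 * np.pi * n * f / fs)
    return np.hstack([d_f, d_b])                    # shape (n_ceps, 2K)

def ekf_step(x, P, obs, Q, R, fs=8000.0):
    """One predict/update step. ASSUMPTION: random-walk dynamics x_t = x_{t-1} + w."""
    k = x.size // 2
    x_pred, P_pred = x, P + Q                       # predict
    H = cepstra_jacobian(x_pred[:k], x_pred[k:], n_ceps=obs.size, fs=fs)
    innov = obs - formants_to_cepstra(x_pred[:k], x_pred[k:], n_ceps=obs.size, fs=fs)
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T            # Kalman gain P H^T S^{-1}
    x_new = x_pred + K @ innov
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

A tracker built on this sketch would call `ekf_step` once per frame of observed cepstra, carrying the state `x` and covariance `P` forward; a smoothing pass and the residual training described above are outside what this example covers.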

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 15, Issue: 1)