Model-Based Speech Enhancement With Improved Spectral Envelope Estimation via Dynamics Tracking

Authors: Ruofei Chen, Cheung-Fat Chan, and Hing Cheung So (Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China)

In this work, we present a model-based approach to enhancing noisy speech within an analysis-synthesis framework. The target speech is reconstructed from model parameters estimated from the noisy observations. In particular, the spectral envelope is estimated by tracking its temporal trajectory in order to improve the noise-distorted short-time spectral amplitude. We first propose an analysis-synthesis framework for speech enhancement based on the harmonic-plus-noise model (HNM). Acoustic parameters such as pitch, spectral envelope, and spectral gain are extracted from the HNM analysis. Spectral envelope estimation is then improved by tracking the envelope's line spectral frequency trajectories with a Kalman filter. System identification for the Kalman filter is achieved via a combined design of a codebook mapping scheme and a maximum-likelihood estimator trained on parallel data. The complete system design and experimental validation are described in detail. Performance evaluation based on spectrograms, objective measures, and a subjective listening test demonstrates that the proposed approach achieves significant improvement over conventional methods under various conditions. A distinct advantage of the proposed method is that it successfully tackles the "musical tones" problem.
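The core idea of tracking line spectral frequency (LSF) trajectories with a Kalman filter can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it treats the state transition matrix `A` and the noise covariances `Q` and `R` as already known, whereas the paper obtains the system parameters via a codebook mapping scheme and maximum-likelihood estimation on parallel training data.

```python
import numpy as np

def kalman_track_lsf(observations, A, Q, R, x0, P0):
    """Smooth noisy per-frame LSF vectors with a standard Kalman filter.

    observations : (T, d) array of noisy LSF vectors, one row per frame.
    A, Q         : assumed-known state transition matrix and process
                   noise covariance (in the paper, learned from training data).
    R            : observation noise covariance.
    x0, P0       : initial state estimate and its covariance.
    Returns a (T, d) array of filtered LSF estimates.
    """
    d = A.shape[0]
    H = np.eye(d)          # LSFs are assumed observed directly, plus noise
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    estimates = []
    for z in observations:
        # Predict step: propagate the state through the dynamics model
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: correct with the current noisy observation
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(d) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

As a usage sketch, feeding in a slowly varying synthetic LSF trajectory corrupted by observation noise, the filtered output tracks the underlying trajectory with noticeably lower error than the raw observations, which is the mechanism the paper exploits to stabilize the spectral envelope across frames.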

Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 4)