A time-domain approach is developed for enhancing speech signals degraded by statistically independent, additive, nonstationary noise, with no a priori information. The autoregressive (AR) hidden filter model (HFM) with gain contour is proposed for modeling the statistical characteristics of the clean speech signal. Given the HFM parameter set of the speech, speech enhancement becomes a joint problem of signal estimation for the clean speech and system identification for the gain contour and the time-varying noise parameters. The expectation-maximization (EM) algorithm is then applied to the signal estimation and system identification. In the E-step, the signal estimate becomes a weighted sum of conditional mean estimators, computed by multiple Kalman filters with Markovian switching coefficients, where the weights equal the a posteriori probabilities of the specific state sequence history given the noisy speech. These probabilities are computed by the Viterbi algorithm (VA). In the M-step, the gain contour and noise parameters are recursively updated by an adaptive algorithm modified from the gradient-based algorithm. The proposed method does not require framing of the speech signal in the training and enhancement procedures. The method is tested on noisy speech signals degraded by nonstationary noise at various input signal-to-noise ratios (SNRs). An SNR improvement of approximately 4.5-6.0 dB is achieved at input SNRs of 10 and 15 dB.
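The E-step idea of combining several Kalman filters by posterior weights can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes fixed AR coefficient sets per state, a constant excitation variance `q` and noise variance `r` (no gain contour, no time-varying noise parameters, no M-step), and it replaces the Viterbi-based state-sequence probabilities with a simple per-sample likelihood update. The names `companion`, `kalman_step`, `enhance`, `ar_sets`, and `trans` are hypothetical.

```python
import numpy as np

def companion(ar_coeffs):
    """State-transition matrix of an AR(p) process in companion form."""
    p = len(ar_coeffs)
    F = np.zeros((p, p))
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    return F

def kalman_step(x, P, F, q, r, y):
    """One predict/update step for the observation y = x[0] + noise (variance r)."""
    p = len(x)
    Q = np.zeros((p, p))
    Q[0, 0] = q                       # excitation drives only the newest sample
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    innov = y - x_pred[0]             # innovation
    s = P_pred[0, 0] + r              # innovation variance
    K = P_pred[:, 0] / s              # Kalman gain
    x_new = x_pred + K * innov
    P_new = P_pred - np.outer(K, P_pred[0, :])
    lik = np.exp(-0.5 * innov**2 / s) / np.sqrt(2.0 * np.pi * s)
    return x_new, P_new, lik

def enhance(y, ar_sets, q, r, trans):
    """Weighted sum of per-state Kalman estimates (assumed simplification).

    trans[i, j] is the assumed probability of moving from state i to state j.
    """
    n_states = len(ar_sets)
    p = len(ar_sets[0])
    Fs = [companion(a) for a in ar_sets]
    xs = [np.zeros(p) for _ in range(n_states)]
    Ps = [np.eye(p) for _ in range(n_states)]
    w = np.full(n_states, 1.0 / n_states)
    out = np.empty(len(y))
    for t, yt in enumerate(y):
        prior = trans.T @ w           # propagate weights through the Markov chain
        liks = np.empty(n_states)
        for k in range(n_states):
            xs[k], Ps[k], liks[k] = kalman_step(xs[k], Ps[k], Fs[k], q, r, yt)
        w = prior * liks + 1e-300     # tiny floor guards against underflow to zero
        w = w / w.sum()
        out[t] = sum(w[k] * xs[k][0] for k in range(n_states))
    return out, w
```

Calling `enhance(y, [[1.5, -0.8], [0.5, 0.0]], q=1.0, r=1.0, trans=np.array([[0.95, 0.05], [0.05, 0.95]]))` on a noisy sequence `y` returns the sample-by-sample estimate and the final state weights. Like the method described above, the sketch processes the signal one sample at a time without framing.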
Published in: Speech and Audio Processing, IEEE Transactions on (Volume: 8, Issue: 3)
Date of Publication: May 2000