Missing feature theory (MFT) has shown great potential for improving noise robustness in speech recognition. MFT has mostly been applied in the log-spectral domain, since this is also the representation in which the reliability masks have a simple formulation. However, with diagonal covariance matrices in the log-spectral domain, recognition performance can only be maintained by drastically increasing the number of Gaussians. This paper shows how MFT can be applied to static and dynamic features in any feature domain that is a linear transform of log-spectra. A crucial part of MFT systems is the computation of reliability masks from noisy data. The proposed system operates either on binary masks, where hard decisions are made about the reliability of the data, or on fuzzy masks, which use a soft decision criterion. For real-life deployments, compensation for convolutional noise is also required. Channel compensation in speech recognition typically involves estimating an additive shift in the log-spectral or cepstral domain. Because some features are considered unreliable, a maximum-likelihood estimation technique is integrated into the back-end recognition process of the MFT system to estimate the channel. The resulting MFT-based recognizer can therefore deal with both additive and convolutional noise, and it shows promising results on the Aurora4 large-vocabulary database.
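The abstract gives no formulas, so the following is only a rough sketch of the three ingredients it names, not the authors' method. It works directly in the log-spectral domain (not the transformed domains the paper targets) and assumes: a spectral-subtraction-style local SNR estimate from a known noise estimate, a 0 dB mask threshold, a sigmoid with a hypothetical `slope` for fuzzy masks, a simple soft-decision interpolation for fuzzy likelihoods, and a fixed frame-to-Gaussian alignment for the channel estimate. All function names are hypothetical.

```python
import math


def local_snr_db(noisy_log, noise_log, floor=1e-10):
    """Per-channel local SNR (dB) from natural-log power spectra (assumed estimator)."""
    snr = []
    for y, n in zip(noisy_log, noise_log):
        noisy_pow, noise_pow = math.exp(y), math.exp(n)
        # Spectral-subtraction style clean-speech power estimate, floored to stay positive.
        speech_pow = max(noisy_pow - noise_pow, floor)
        snr.append(10.0 * math.log10(speech_pow / noise_pow))
    return snr


def binary_mask(snr_db, threshold_db=0.0):
    """Hard decision: 1.0 = reliable, 0.0 = unreliable."""
    return [1.0 if s > threshold_db else 0.0 for s in snr_db]


def fuzzy_mask(snr_db, threshold_db=0.0, slope=0.5):
    """Soft decision: sigmoid of the SNR margin, giving a reliability in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-slope * (s - threshold_db))) for s in snr_db]


def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def masked_log_likelihood(y, mask, mu, var):
    """Missing-data log-likelihood of one diagonal Gaussian.

    Reliable components (mask 1) use the Gaussian; unreliable ones (mask 0)
    are marginalized out, contributing a factor of 1. Fractional masks
    interpolate between those two factors (one simple soft-decision heuristic).
    """
    total = 0.0
    for yd, m, md, vd in zip(y, mask, mu, var):
        if m >= 1.0:
            total += log_gauss(yd, md, vd)
        elif m > 0.0:
            total += math.log(m * math.exp(log_gauss(yd, md, vd)) + (1.0 - m))
        # m == 0: fully marginalized, adds log(1) = 0
    return total


def estimate_channel_shift(frames, masks, means, variances):
    """ML estimate of an additive log-spectral channel shift h, assuming y_t = x_t + h.

    Each frame t is assumed aligned to one diagonal Gaussian (means[t],
    variances[t]). Setting the derivative of the mask-weighted log-likelihood
    to zero gives h_d = sum_t m_td (y_td - mu_td) / var_td  /  sum_t m_td / var_td,
    so unreliable components do not influence the estimate.
    """
    dims = len(frames[0])
    num = [0.0] * dims
    den = [0.0] * dims
    for y, m, mu, var in zip(frames, masks, means, variances):
        for d in range(dims):
            num[d] += m[d] * (y[d] - mu[d]) / var[d]
            den[d] += m[d] / var[d]
    # Channels never observed as reliable get no shift.
    return [n / dd if dd > 0.0 else 0.0 for n, dd in zip(num, den)]
```

For example, `estimate_channel_shift([[1.0, 2.0], [3.0, 0.0]], [[1, 1], [1, 0]], [[0.5, 1.0], [2.5, 1.0]], [[1.0, 1.0], [1.0, 1.0]])` returns `[0.5, 1.0]`: the second channel of the second frame is masked out, so only the first frame determines its shift. With binary masks this is the exact ML solution under the fixed alignment; weighting by fuzzy masks is a further heuristic.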
IEEE Transactions on Audio, Speech, and Language Processing (Volume 19, Issue 1)
Date of Publication: Jan. 2011