Feature fusion is a popular approach for improving the accuracy of speech recognition systems in noisy environments. Although feature fusion performs well at high and moderate signal-to-noise ratios (SNRs), its performance degrades rapidly at low SNRs. Moreover, auxiliary features that are robust to acoustic noise may themselves be unreliable, for example because of sensor misplacement, which can further degrade performance. Such noisy sensor signals may be present not only in the test data but also in the training data. Here, the feature fusion method is combined with a missing data technique to improve noise robustness at low SNRs. An auxiliary feature is used both for feature fusion and for detecting unreliable speech frames. Noisy auxiliary features are handled with a missing data approach in both decoding and training. In the experiments, the combined approach substantially improves on the feature fusion method, especially at low SNRs. The proposed missing data-based training strategy is also shown to improve accuracy significantly.
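The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the detection rule, or the acoustic model. The sketch assumes per-frame feature matrices, a simple threshold on the auxiliary feature as the reliability detector, and a diagonal-Gaussian score in which unreliable dimensions are marginalized out, which is one common missing data technique. The function names, the threshold, and the toy values are all hypothetical.

```python
import numpy as np

def fuse_features(acoustic, auxiliary):
    """Feature fusion: concatenate acoustic and auxiliary features per frame."""
    return np.concatenate([acoustic, auxiliary], axis=1)

def frame_mask(auxiliary, threshold, n_acoustic):
    """Build a boolean reliability mask over the fused feature vector.

    Assumption: a frame's acoustic dimensions are treated as unreliable when
    the first auxiliary value falls at or below `threshold`. The auxiliary
    dimensions are kept as reliable in this sketch.
    """
    n_frames, n_aux = auxiliary.shape
    reliable = auxiliary[:, 0] > threshold
    mask = np.ones((n_frames, n_acoustic + n_aux), dtype=bool)
    mask[:, :n_acoustic] = reliable[:, None]
    return mask

def masked_log_likelihood(frames, mask, mean, var):
    """Diagonal-Gaussian log-likelihood with unreliable dimensions marginalized.

    For a diagonal covariance, integrating out a dimension removes its term,
    so only the dimensions flagged reliable contribute to the sum.
    """
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)
    return np.sum(np.where(mask, per_dim, 0.0), axis=1)

# Toy data: two frames, two acoustic dimensions, one auxiliary dimension.
acoustic = np.array([[1.0, 2.0], [3.0, 4.0]])
auxiliary = np.array([[0.9], [0.1]])

fused = fuse_features(acoustic, auxiliary)
mask = frame_mask(auxiliary, threshold=0.5, n_acoustic=acoustic.shape[1])
ll = masked_log_likelihood(fused, mask, mean=np.zeros(3), var=np.ones(3))
```

In this toy case the second frame is flagged unreliable, so only its auxiliary dimension contributes to the score. Handling an unreliable auxiliary feature, as the abstract describes for both decoding and training, would amount to clearing the auxiliary columns of the mask in the same way.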