A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | IEEE Journals & Magazine | IEEE Xplore

Abstract:

A new cross-entropy-guided measure (CEGM) is proposed to indirectly assess the accuracy of automatic speech recognition (ASR) on degraded speech processed by a speech enhancement front-end, without directly performing ASR experiments. The proposed CEGM is calculated in three steps: (1) extraction of low-level representations via feature extraction, (2) a high-level nonlinear mapping using an acoustic model, and (3) a final CEGM calculation between the high-level representations of clean and enhanced speech. Specifically, the state posterior probabilities output by the conventional hybrid acoustic model of the target ASR system are adopted as the high-level representations, and a cross-entropy criterion is used to calculate the CEGM. Because CEGM is differentiable, it can also replace the conventional minimum mean squared error (MMSE) criterion as the objective function for deep neural network (DNN)-based speech enhancement, so that the front-end enhancement model is optimized towards improving the accuracy of the back-end ASR system. Experiments on the single-channel CHiME-4 Challenge task show that CEGM consistently yields the highest correlations with word error rate (WER), which is often costly to calculate, and achieves the most accurate assessment of ASR performance when compared to the perceptual evaluation metrics commonly used for assessing speech enhancement performance. Furthermore, for ASR systems with fixed multi-condition acoustic models in various deep architectures, CEGM-optimized speech enhancement effectively reduces the WER on the CHiME-4 real test set when compared to both unprocessed noisy speech and speech enhanced with MMSE-optimized enhancement.
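The third step of the calculation can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following is an assumption: the clean-speech posteriors are treated as the target distribution, the enhanced-speech posteriors as the prediction, and the per-frame cross-entropy is averaged over frames. The function name `cegm`, the `eps` smoothing term, and the toy posterior matrices are hypothetical and stand in for the outputs of a real acoustic model.

```python
import numpy as np


def cegm(clean_post, enhanced_post, eps=1e-12):
    """Frame-averaged cross-entropy between two posterior sequences.

    Assumption (not stated in the abstract): clean posteriors act as the
    target and enhanced posteriors as the prediction. Both inputs have
    shape (num_frames, num_states) and each row sums to 1.
    """
    clean_post = np.asarray(clean_post, dtype=float)
    enhanced_post = np.asarray(enhanced_post, dtype=float)
    if clean_post.shape != enhanced_post.shape:
        raise ValueError("posterior matrices must have the same shape")
    # Cross-entropy per frame: -sum_s p_clean(s) * log p_enhanced(s).
    # eps guards against log(0) for states with zero predicted probability.
    per_frame = -(clean_post * np.log(enhanced_post + eps)).sum(axis=1)
    return float(per_frame.mean())


# Toy posteriors for 2 frames and 2 states (hypothetical values).
clean = np.array([[1.0, 0.0], [0.0, 1.0]])
matched = clean.copy()                          # enhancement preserves the states
ambiguous = np.array([[0.5, 0.5], [0.5, 0.5]])  # enhancement blurs the states

score_matched = cegm(clean, matched)
score_ambiguous = cegm(clean, ambiguous)
```

Under this assumed definition a lower value means the enhanced speech drives the acoustic model closer to its clean-speech behaviour, so `score_matched` is smaller than `score_ambiguous`. Since the expression is built only from sums, products, and logarithms, the same computation written in an automatic-differentiation framework could serve as a training loss, which is the property the abstract relies on when replacing MMSE.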
Page(s): 106 - 117
Date of Publication: 12 November 2020
