A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement

Abstract:

A new cross-entropy-guided measure (CEGM) is proposed to indirectly assess the accuracy of automatic speech recognition (ASR) on degraded speech processed by a speech enhancement front-end, without directly performing ASR experiments. The proposed CEGM is calculated in three steps: (1) computing low-level representations via feature extraction, (2) applying a high-level nonlinear mapping using an acoustic model, and (3) calculating the final CEGM between the high-level representations of clean and enhanced speech. Specifically, the state posterior probabilities output by the conventional hybrid acoustic model of the target ASR system are adopted as the high-level representations, and a cross-entropy criterion is used to calculate the CEGM. Because CEGM is differentiable, it can also replace the conventional minimum mean squared error (MMSE) criterion as the objective function for deep neural network (DNN)-based speech enhancement. The front-end enhancement model can therefore be optimized towards improving the accuracy of the back-end ASR system. Experiments on the single-channel CHiME-4 Challenge show that CEGM consistently yields the highest correlations with word error rate (WER), which is often costly to calculate, and achieves the most accurate assessment of ASR performance when compared to the perceptual evaluation metrics commonly used for assessing speech enhancement performance. Furthermore, for ASR systems with fixed multi-condition acoustic models of various deep architectures, CEGM-optimized speech enhancement can effectively reduce the WER on the CHiME-4 real test set compared with both unprocessed noisy speech and speech enhanced with MMSE-optimized enhancement.
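The third step of the calculation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the direction of the cross-entropy (clean posteriors as the target distribution, enhanced posteriors as the estimate), the averaging over frames, and the `eps` floor are all assumptions. The toy logits stand in for the outputs of an acoustic model, and every name here (`softmax`, `cross_entropy_measure`, the example arrays) is hypothetical.

```python
import math


def softmax(logits):
    """Turn one frame of (hypothetical) acoustic-model logits into state posteriors."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_measure(clean_posteriors, enhanced_posteriors, eps=1e-12):
    """Frame-averaged cross-entropy between two sequences of state posteriors.

    Assumption: clean-speech posteriors act as the target distribution and
    enhanced-speech posteriors as the estimate. Lower values mean the
    acoustic model responds to the enhanced speech more like it does to
    the clean speech.
    """
    if not clean_posteriors or len(clean_posteriors) != len(enhanced_posteriors):
        raise ValueError("both inputs need the same non-zero number of frames")
    total = 0.0
    for p_clean, p_enh in zip(clean_posteriors, enhanced_posteriors):
        # Floor each estimated probability at eps so log() is always defined.
        total -= sum(p * math.log(max(q, eps)) for p, q in zip(p_clean, p_enh))
    return total / len(clean_posteriors)


# Toy data: two frames, three states. These numbers are made up for illustration.
clean_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
close_logits = [[3.5, 0.7, 0.2], [0.3, 3.0, 0.5]]  # enhancement that preserves the states
far_logits = [[0.5, 0.6, 3.0], [2.5, 0.4, 0.3]]    # enhancement that confuses the states

clean = [softmax(frame) for frame in clean_logits]
score_close = cross_entropy_measure(clean, [softmax(f) for f in close_logits])
score_far = cross_entropy_measure(clean, [softmax(f) for f in far_logits])
score_self = cross_entropy_measure(clean, clean)  # equals the entropy of the clean posteriors

print(f"close: {score_close:.3f}  far: {score_far:.3f}  self: {score_self:.3f}")
```

In this sketch the enhanced signal whose posteriors stay close to the clean ones receives the lower score, which is the behaviour the measure relies on. Because the function is built only from smooth operations on the posteriors, the same expression written in an automatic-differentiation framework could serve as a training loss in place of MMSE, as the abstract describes.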

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 29)

Page(s): 106 - 117

Date of Publication: 12 November 2020

DOI: 10.1109/TASLP.2020.3036783
