Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers | IEEE Conference Publication | IEEE Xplore

Abstract:

Although end-to-end (E2E) automatic speech recognition (ASR) systems excel at general tasks, they frequently struggle to recognize personal rare words accurately. Leveraging contextual information to bias the internal states of an E2E ASR model has proven to be an effective solution. However, most existing work focuses on biasing for a single domain, and it remains challenging to extend such contextualization mechanisms to many domains. To address this limitation, we propose a hierarchical attention architecture that scales contextual biasing to a wide range of domains simultaneously. Given multiple catalogs of contextual information, the high-level attention determines which catalog to focus on, and the low-level attention learns to attend to the most relevant entity within the focused catalog. Experiments on diverse domains demonstrate that the proposed architecture yields 35% to 60% relative word error rate (WER) improvements on personal rare words and outperforms existing approaches.
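
The two attention levels described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the scoring functions, so the code below assumes scaled dot-product attention at both levels, represents each catalog by the mean of its entity embeddings for the high-level step, and mixes the per-catalog contexts softly. The names `attend`, `hierarchical_bias`, and the toy catalogs `contacts` and `songs` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Scaled dot-product attention where the keys also serve as values.

    Returns the attention weights and the weighted sum of the keys.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    context = [
        sum(w * k[i] for w, k in zip(weights, keys))
        for i in range(len(query))
    ]
    return weights, context


def hierarchical_bias(query, catalogs):
    """Two-level attention over several catalogs of entity embeddings.

    Assumption: each catalog is summarized by the mean of its entities.
    """
    names = list(catalogs)
    dim = len(query)

    # High-level attention: decide how much to focus on each catalog.
    summaries = [
        [sum(e[i] for e in catalogs[n]) / len(catalogs[n]) for i in range(dim)]
        for n in names
    ]
    catalog_weights, _ = attend(query, summaries)

    # Low-level attention: pick the most relevant entity inside each catalog.
    entity_weights = {}
    bias = [0.0] * dim
    for name, cw in zip(names, catalog_weights):
        ew, context = attend(query, catalogs[name])
        entity_weights[name] = ew
        # Mix the per-catalog contexts by the high-level weights.
        bias = [b + cw * c for b, c in zip(bias, context)]

    return dict(zip(names, catalog_weights)), entity_weights, bias


# Toy example: the query is closest to the first "contacts" entity.
query = [1.0, 0.0]
catalogs = {
    "contacts": [[1.0, 0.0], [0.8, 0.2]],
    "songs": [[0.0, 1.0], [-1.0, 0.0]],
}
catalog_w, entity_w, bias = hierarchical_bias(query, catalogs)
```

In this toy setup the high-level weights favor `contacts`, and within that catalog the low-level weights favor the first entity. A real system would operate on learned encoder or predictor states and learned projections, and would likely need a way to attend to no catalog at all; those details are outside what the abstract states.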
Date of Conference: 16-20 December 2023
Date Added to IEEE Xplore: 19 January 2024
Conference Location: Taipei, Taiwan
