Contextual ASR with Retrieval Augmented Large Language Model | IEEE Conference Publication | IEEE Xplore

Abstract:

Automatic speech recognition (ASR) systems can benefit from incorporating contextual information to improve recognition accuracy, especially for uncommon words or phrases. Current approaches, such as custom vocabularies or prompting with previous transcript segments, provide limited contextual control. To address this, we propose leveraging large language models (LLMs) and retrieval-augmented generation (RAG) to enhance the contextual capabilities of ASR systems. Compared to existing context-biasing methods, RAG promises more flexible and scalable contextual control by drawing on the broad knowledge of LLMs. Specifically, we propose systems based on text and audio LLMs that perform contextual error correction, using context retrieved by querying a text-based retriever with the ASR module's first-pass hypotheses and a frequency-based custom vocabulary (CV) list. Our experiments reveal that the fine-tuned system has effectively learned to extract the relevant context to perform error correction while maintaining robustness against noise.
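The abstract does not specify the retriever, how the frequency-based custom vocabulary is built, or the prompt format, so the following is only a minimal sketch of the retrieval step under stated assumptions. A simple bag-of-words cosine-similarity retriever stands in for the paper's text-based retriever. The custom vocabulary is assumed to be the most frequent domain words that are absent from a general word list. The functions `build_cv_list`, `retrieve`, and `build_prompt` and the prompt wording are hypothetical, and the LLM error-correction call itself is omitted.

```python
from collections import Counter
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_cv_list(domain_texts: list[str], common_words: set[str], top_n: int) -> list[str]:
    # Hypothetical frequency-based custom vocabulary (an assumption, not the
    # paper's rule): the top_n most frequent domain tokens that are not in a
    # general-purpose word list.
    counts = Counter(
        tok for text in domain_texts for tok in tokenize(text) if tok not in common_words
    )
    return [word for word, _ in counts.most_common(top_n)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(hypothesis: str, cv_list: list[str], passages: list[str], k: int) -> list[str]:
    # Hypothetical query: first-pass ASR hypothesis tokens plus the CV terms.
    query = Counter(tokenize(hypothesis) + cv_list)
    scored = [(cosine(query, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    # Highest score first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for _, i in scored[:k]]


def build_prompt(hypothesis: str, contexts: list[str]) -> str:
    # Hypothetical prompt template for the LLM error-correction step.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Context:\n" + context_block + "\n\n"
        "Correct any recognition errors in this ASR transcript:\n" + hypothesis
    )


if __name__ == "__main__":
    domain = ["Kubernetes pods schedule containers", "kubectl applies Kubernetes manifests"]
    cv = build_cv_list(domain, {"applies", "schedule", "containers", "manifests"}, top_n=2)
    passages = [
        "Kubernetes pods run on nodes",
        "The weather is sunny today",
        "kubectl manages Kubernetes clusters",
    ]
    # A first-pass hypothesis where "kubectl" was misrecognized.
    hyp = "cube control manages clusters"
    print(build_prompt(hyp, retrieve(hyp, cv, passages, k=1)))
```

In this toy run the passage about `kubectl` ranks first because it shares both hypothesis words ("manages", "clusters") and a CV term with the query, which is the kind of context an LLM could use to correct "cube control". A dense or hybrid retriever would be a natural substitute for the bag-of-words scorer.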
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India