Automatic Speech Recognition Tuned for Child Speech in the Classroom | IEEE Conference Publication | IEEE Xplore

Automatic Speech Recognition Tuned for Child Speech in the Classroom


Abstract:

K-12 school classrooms have proven to be a challenging environment for Automatic Speech Recognition (ASR) systems, due both to background noise and conversation and to the linguistic and acoustic differences between child speech and adult speech, on which the majority of ASR systems are trained and evaluated. We report on experiments to improve ASR for child speech in the classroom by training and fine-tuning transformer models on public corpora of adult and child speech augmented with classroom background noise. By tuning OpenAI’s Whisper model we achieve a 38% relative reduction in word error rate (WER) to 9.2% on the public MyST dataset of child speech (the lowest yet reported) and a 7% relative reduction to reach 54% WER on a more challenging classroom speech dataset (ISAT). We also introduce a novel beam hypothesis rescoring method that selects among hypotheses using a speed-aware term, which captures prior knowledge of human speaking rates, together with a large language model. We demonstrate the effectiveness of this technique on both publicly available datasets and a classroom speech dataset.
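As a reading aid for the WER figures quoted in the abstract, the following is a minimal sketch of how word error rate and a relative reduction are conventionally computed. It is not the authors' evaluation code: the function names are hypothetical, text normalization is omitted, and the "implied baseline" values are back-calculated from the rounded percentages in the abstract, so they are approximations rather than reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline error removed by tuning."""
    return (baseline_wer - tuned_wer) / baseline_wer


def implied_baseline(tuned_wer: float, reduction: float) -> float:
    """Baseline WER implied by a tuned WER and a relative reduction."""
    return tuned_wer / (1.0 - reduction)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Approximate baselines implied by the abstract's rounded figures
    # (assumption: not stated in the abstract itself).
    print(round(implied_baseline(9.2, 0.38), 1))   # MyST
    print(round(implied_baseline(54.0, 0.07), 1))  # ISAT
```

Under these assumptions, a 38% relative reduction ending at 9.2% WER corresponds to a starting point of roughly 15% WER on MyST, and a 7% relative reduction ending at 54% corresponds to roughly 58% on ISAT.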
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Republic of Korea
