Bone Conducted Signal Guided Speech Enhancement For Voice Assistant on Earbuds | IEEE Conference Publication | IEEE Xplore

Abstract:

In this work, we present a multi-modal, streaming enhancement network to improve speech recognition for voice assistants on earbuds. The proposed model is guided by a bone conducted signal (BCS) to separate interfering sources from the target speaker's signal. We train the model on a simulated speech enhancement training set with a simulated BCS and fine-tune it on a small earbud-specific training set of about 6 hours of speech. To account for a distorted BCS, the enhancement module is complemented by a voice-activity-based decision that discards the enhanced output when the BCS carries no speech information. We also evaluate preprocessing the BCS, exploiting the low-pass characteristic of bone conduction, to lower the transmission bandwidth required from the earbuds to the recognition device. The results show that the BCS bandwidth can be reduced to 500 Hz with only a small increase in word error rate (WER). Compared with a larger state-of-the-art multi-channel enhancement method, the proposed systems, with and without bandwidth reduction, perform better on most of the considered realistic test sets.
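
The abstract names two signal-processing steps around the network without specifying them: reducing the BCS bandwidth to 500 Hz before transmission, and a voice-activity-based decision that discards the enhanced output when the BCS carries no speech. The following is a minimal sketch of both ideas, not the authors' method. Assumptions: a 16 kHz BCS, bandwidth reduction done as windowed-sinc low-pass filtering plus decimation to a 1 kHz sample rate, a simple frame-energy threshold standing in for the (unspecified) voice activity detector, an utterance-level decision, and a fallback to the unprocessed microphone signal. The function names `reduce_bcs_bandwidth`, `bcs_has_speech` and `select_asr_input` and all thresholds are hypothetical.

```python
import numpy as np


def reduce_bcs_bandwidth(bcs, fs=16000, cutoff_hz=500.0, num_taps=255):
    """Hypothetical preprocessing: low-pass filter the BCS and decimate it so
    that its Nyquist frequency equals cutoff_hz (16 kHz -> 1 kHz for 500 Hz)."""
    bcs = np.asarray(bcs, dtype=float)
    target_fs = int(2 * cutoff_hz)
    if fs % target_fs != 0:
        raise ValueError("fs must be an integer multiple of 2 * cutoff_hz")
    q = fs // target_fs  # integer decimation factor
    # Windowed-sinc FIR low-pass (Hamming window), normalized to unit DC gain.
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    h /= h.sum()
    filtered = np.convolve(bcs, h, mode="same")
    # Keep every q-th sample; content in the filter's transition band
    # around cutoff_hz can still alias slightly.
    return filtered[::q], target_fs


def bcs_has_speech(bcs, bcs_fs, frame_ms=20, threshold_db=-40.0,
                   min_speech_ratio=0.1):
    """Energy-based stand-in for a voice activity detector on the BCS
    (assumption: the paper's actual VAD is not described in the abstract)."""
    bcs = np.asarray(bcs, dtype=float)
    frame_len = int(bcs_fs * frame_ms / 1000)
    n_frames = len(bcs) // frame_len
    frames = bcs[: n_frames * frame_len].reshape(n_frames, frame_len)
    energies_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    if energies_db.size == 0:
        return False
    # Declare speech if enough frames exceed the energy threshold.
    return bool(np.mean(energies_db > threshold_db) >= min_speech_ratio)


def select_asr_input(unprocessed, enhanced, bcs, bcs_fs):
    """Discard the enhanced output when the BCS carries no speech;
    falling back to the unprocessed signal is an assumption."""
    return enhanced if bcs_has_speech(bcs, bcs_fs) else unprocessed


fs = 16000
t = np.arange(fs) / fs  # one second of samples
# Toy "BCS": a 200 Hz component plus a 3 kHz component to be removed.
bcs = 0.5 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
low, low_fs = reduce_bcs_bandwidth(bcs, fs)
print(low_fs, len(low))  # 1000 1000
```

Decimating to 1 kHz cuts the number of BCS samples to transmit by a factor of 16 relative to 16 kHz, which is one plausible reading of "lower the required transmission bandwidth". The gate operates on the reduced-rate BCS, so it needs no extra data from the earbuds.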
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India