Abstract:
Keyword Spotting (KWS) is crucial for hands-free voice-activated systems, which require a balance between accuracy and computational complexity, especially in noisy environments. While Speech Enhancement (SE) can improve KWS accuracy, existing methods often fail to make effective use of the rich features produced during enhancement. In this paper, we design a low-complexity network to address the challenges of KWS in noisy environments. We integrate SE and KWS into a unified network that learns a representation shared by both tasks. The proposed network contains two blocks: a Residual Full-band and Sub-band Fusion (RFSF) block and a Deformable Transition (DT) block. Our dual-task network achieves higher accuracy than existing KWS models while keeping complexity low, making it suitable for deployment on edge devices.
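The abstract describes a single network trained on both SE and KWS so that one shared representation serves both tasks. The paper's actual training objective is not given here. A common way to train such a dual-task model, assumed purely for illustration, is to minimize a weighted sum of an enhancement loss (here, mean squared error against clean features) and a keyword-classification loss (here, softmax cross-entropy). The function names and the weighting factor `alpha` below are hypothetical and do not describe the authors' RFSF or DT blocks.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between an enhanced and a clean feature frame."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single keyword-classification example."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    kws_logits: Sequence[float],
    keyword_label: int,
    alpha: float = 1.0,
) -> float:
    """Hypothetical dual-task objective: SE loss + alpha * KWS loss.

    In a real network, gradients from both terms would flow back into the
    shared encoder, which is what makes the representation "shared".
    """
    return mse(enhanced, clean) + alpha * cross_entropy(kws_logits, keyword_label)


if __name__ == "__main__":
    # Toy values: a 2-bin enhanced frame, its clean target, and 2 keyword logits.
    print(joint_loss([0.5, 0.5], [0.0, 1.0], [2.0, 0.0], 0, alpha=0.5))
```

Raising `alpha` shifts training toward keyword accuracy, while lowering it favors enhancement quality; how the paper weighs the two tasks, if at all, is not stated in the abstract.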
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025