37.8 A 13.5µW 35-Keyword End-to-End Keyword Spotting System Featuring Personalized On-Chip Training in 28nm CMOS | IEEE Conference Publication | IEEE Xplore


Abstract:

Modern IT devices offer personalized features that can be customized to individual users. However, keyword spotting (KWS), a feature that is gaining widespread adoption in many personal devices, remains largely non-user-configurable because it is designed for the general public. While user-specific training can enable a personalized KWS system tailored to an individual user's accent, the large power consumption and high complexity of training have hindered its adoption. As a result, existing KWS systems often miss commands, especially from users with accents, which increases energy consumption and degrades the user experience. Furthermore, existing KWS systems usually recognize only a few words for waking up the device, whereas future devices are expected to understand a much broader vocabulary to execute various commands. Several KWS systems emphasizing low power consumption have recently been proposed [1]–[9]. Unfortunately, most of them recognize at most ten keywords, and although [1] supports 35 keywords, its increased complexity results in low accuracy (78%) and high power consumption (>150µW). More importantly, none of these systems offers user-specific trainability, which leads to poor accuracy for users with accents. In this work, we propose a low-power 35-keyword end-to-end KWS system featuring personalized on-chip training that ensures high accuracy for speakers with different accents. On-chip training also allows the use of a compact convolutional neural network (CNN) and a small memory, thereby saving power and area. The proposed chip also features a compact sine-based convolution [11] with an approximate logarithm function that serves as a low-complexity feature extractor (FE).
Fabricated in 28nm CMOS, the proposed KWS system achieves an average accuracy of 92.2% on a 35-keyword accented dataset that is trained and tested on speakers from nine different countries, while consuming 10.93µW during inference and 13.46µW during training...
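The abstract does not describe the feature extractor's internals, so the following is only a rough illustration of why a sine-based convolution followed by an approximate logarithm is cheap. It assumes (1) the "sine-based convolution" is a bank of short sinusoidal kernels under a half-sine window, and (2) the "approximate logarithm" is Mitchell's piecewise-linear log2. Neither assumption is confirmed by the abstract or by [11], and the names `mitchell_log2`, `sine_kernel`, and `frame_features` are hypothetical.

```python
import math
from typing import List


def mitchell_log2(x: int) -> float:
    """Approximate log2 of a positive integer (Mitchell-style; an assumption).

    Writes x = 2**k * (1 + f) with 0 <= f < 1 and returns k + f, i.e. it
    replaces log2(1 + f) with the straight line f. In hardware this needs
    only a leading-one detector and a shift; the worst-case error is about
    0.086.
    """
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1            # index of the leading one bit
    f = (x - (1 << k)) / (1 << k)     # bits below the leading one, as a fraction in [0, 1)
    return k + f


def sine_kernel(fc: float, fs: float, length: int) -> List[float]:
    """Hypothetical band-pass kernel: a sinusoid at fc under a half-sine window."""
    return [
        math.sin(math.pi * (n + 0.5) / length) * math.sin(2 * math.pi * fc * n / fs)
        for n in range(length)
    ]


def frame_features(frame: List[float], kernels: List[List[float]]) -> List[float]:
    """One approximate log-energy feature per kernel for a single audio frame."""
    feats = []
    for h in kernels:
        taps = len(h)
        energy = 0.0
        # 'Valid' sliding dot product (correlation, as CNN layers compute it).
        for i in range(len(frame) - taps + 1):
            y = sum(frame[i + j] * h[j] for j in range(taps))
            energy += y * y
        # Quantize the energy to an integer (floored at 1) before the cheap log.
        feats.append(mitchell_log2(max(1, int(energy))))
    return feats


if __name__ == "__main__":
    fs = 16000.0
    frame = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]  # 1 kHz tone
    kernels = [sine_kernel(1000.0, fs, 32), sine_kernel(4000.0, fs, 32)]
    print(frame_features(frame, kernels))  # the 1 kHz channel should dominate
```

The energy is converted to an integer before the logarithm because the leading-one trick operates on fixed-point values, which is the form a low-power datapath would naturally produce. A real design would likely use integer kernel coefficients and many more channels; this sketch only shows the data flow.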
Date of Conference: 16-20 February 2025
Date Added to IEEE Xplore: 06 March 2025
Conference Location: San Francisco, CA, USA
