Abstract:
In mobile and edge devices, always-on keyword spotting (KWS) is an essential function to detect wake-up words. Recent works achieved extremely low power dissipation down ...Show MoreMetadata
Abstract:
In mobile and edge devices, always-on keyword spotting (KWS) is an essential function to detect wake-up words. Recent works achieved extremely low power dissipation down to ~500nW. However, most of them adopt noise-dependent training, i.e. training for a specific signal-to-noise ratio (SNR) and noise type, and therefore their accuracies degrade for different SNR levels and noise types that are not targeted in the training (Fig. 9.9.1, top left). To improve robustness, so-called noise-independent training can be considered, which is to use the training data that includes all the possible SNR levels and noise types. But, this approach is challenging for an ultra-low-power device since it demands a large neural network to learn all the possible features. A neural network of a fixed size has its own memory capacity limit and reaches a plateau in accuracy if it has to learn more than its limit (Fig. 9.9.1, top right). On the other hand, it is known that biological acoustic systems employ a simpler process, called divisive energy normalization (DN), to maintain accuracy even in varying noise conditions. In this work, therefore, by adopting such a DN, we prototype a normalized acoustic feature extractor chip (NAFE) in 65nm. The NAFE can take an acoustic signal from a microphone and produce spike-rate coded features. We pair NAFE with a spiking neural network (SNN) classifier chip, creating the end-to-end KWS system. The proposed system achieves 89-to-94% accuracy across -5 to 20dB SNRs and four different noise types on HeySnips, while the baseline without DN achieves a much lower accuracy of 71-87%. NAFE consumes up to 109nW and the KWS system 570nW.
Date of Conference: 13-22 February 2021
Date Added to IEEE Xplore: 03 March 2021
ISBN Information: