Abstract:
In wearable and mobile devices, speech interfaces are increasingly equipped with keyword-spotting (KWS) functions. Because KWS is always on, it must achieve ultra-low power while maintaining good accuracy, which is a major concern for KWS ASICs. For the frontend, most commercial MEMS microphones can consume more than 100 µW, which undermines the low-power efforts of state-of-the-art (SoTA) works [1, 2] that lack a fully integrated, near-microphone single-chip solution. For the feature extractor (FEx), analog FExs have achieved power as low as 9.3 µW [3] and 109 nW [4], but they degrade detection accuracy because of low-quality features. Scaling-friendly digital FExs [1, 5] can extract high-quality features, but computational complexity and memory optimization remain key issues. For the classifier, convolutional neural networks (CNNs) are commonly applied to KWS and achieve superior accuracy. However, their complex networks incur redundant computation and hardware cost at the edge.
Published in: 2023 IEEE Custom Integrated Circuits Conference (CICC)
Date of Conference: 23-26 April 2023
Date Added to IEEE Xplore: 11 May 2023