Abstract:
Ultra-low-power keyword-spotting (KWS) chips are pivotal for edge devices to provide speech-triggered interaction. Recent KWS chips [1]–[4] have improved system accuracy and power efficiency through algorithm-hardware co-optimization. Yet their false alarm rate (FAR) remains high, between 7.2% and 13% [1]–[4], leading to an unsatisfactory user experience (Fig. 17.9.1, left). Aiding the KWS with speaker verification (SV) can substantially reduce the FAR, since most KWS interactions come from the target users enrolled in the device [5]. However, jointly computing KWS + SV can drastically increase the model parameter count and power budget [2] while prolonging the decision latency. This paper reports a KWS chip achieving a 12-class accuracy of 91.8% and a 1.8% FAR within a 2 ms decision latency. As shown in Fig. 17.9.1 (right), the key techniques are: 1) transfer-computing SV-assisted KWS, which compresses the required parameters and enhances the KWS+SV computation efficiency; 2) hybrid-domain computing, which handles both analog and digital input features (IFs), alleviating the tradeoff between computation power and system accuracy in the first layer; 3) a scalable 5T-SRAM array that supports upscaling with reduced leakage power and read power.
Date of Conference: 18-22 February 2024
Date Added to IEEE Xplore: 13 March 2024