An Audio Frequency Unfolding Framework for Ultra-Low Sampling Rate Sensors


Abstract:

Recent audio super-resolution work has achieved significant success in improving audio quality by raising a sensor’s effective sampling rate, e.g., from 8 kHz to 48 kHz. However, these methods fail to maintain their performance when the sampling rate at the sensor is ultra-low and the audio suffers from severe frequency aliasing. In this paper, we propose an audio frequency unfolding framework that efficiently reconstructs aliased audio so that it is perceptually recognizable. The intuition is that human-generated audio exhibits regular patterns in its spectrum; by learning these patterns, our framework can reconstruct audio that sounds similar to real human voices. We evaluate our framework perceptually: an automatic speech recognition (ASR) system judges whether the words in the reconstructed audio can be correctly recognized. In our implementation based on AudioMNIST, when reconstructing audio from a 2 kHz sampling rate to 16 kHz, the recognition accuracy of the reconstructed audio reaches 77.1%.
Date of Conference: 06-07 April 2022
Date Added to IEEE Xplore: 29 June 2022
Conference Location: Santa Clara, CA, USA

I. Introduction

Audio of human voices has become an essential data source in many real-world applications, such as speech recognition [7], [2], user identification [5], and human localization [15], [19]. Acquiring such audio typically requires a microphone with a high sampling rate. In general, audio captured at a sampling rate above 8 kHz is considered speech-recognizable, and audio captured at 48 kHz is considered good quality [10]. Such a high sampling rate usually incurs high power consumption, which limits the microphone’s deployment on low-power devices. Conversely, a low-power microphone implies a low sampling rate, and by the Nyquist sampling theorem, signal components above half the sampling rate are then aliased. Beyond the microphone’s power consumption, recent works [12], [18] focus on extracting audio from inertial measurement units (IMUs). Compared with microphones, IMUs mainly capture sound that travels through solid media, so the extracted audio is less affected by distant noise sources. However, the sampling rate of an IMU is much lower than that of a microphone [18], so it also suffers from frequency aliasing. Given the benefits of low-power microphones and IMUs over traditional microphones, is it possible to address the frequency aliasing problem, i.e., to reconstruct high-sampling-rate audio from their low-sampling-rate recordings?
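
To make the aliasing problem concrete, the following minimal Python sketch shows how a tone above the Nyquist frequency folds back into the baseband when sampled at 2 kHz. It is only an illustration of the sampling theorem, not part of the proposed framework; the 1300 Hz test tone and the helper names `aliased_frequency` and `sample_tone` are assumptions chosen for this example.

```python
import math


def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz."""
    r = f_hz % fs_hz           # wrap into [0, fs)
    return min(r, fs_hz - r)   # reflect into [0, fs/2]


def sample_tone(f_hz: float, fs_hz: float, n_samples: int) -> list:
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


fs = 2000.0      # ultra-low sampling rate (Nyquist frequency: 1000 Hz)
tone = 1300.0    # hypothetical speech component above the Nyquist frequency

alias = aliased_frequency(tone, fs)

# After sampling, the original tone and its alias give the same sample values.
x_true = sample_tone(tone, fs, 16)
x_alias = sample_tone(alias, fs, 16)
max_diff = max(abs(a - b) for a, b in zip(x_true, x_alias))

print(f"{tone:.0f} Hz sampled at {fs:.0f} Hz appears at {alias:.0f} Hz")
print(f"largest sample difference: {max_diff:.2e}")
```

Because the two sample sequences coincide up to floating-point error, the samples alone cannot tell the sensor whether the source contained 1300 Hz or 700 Hz. Resolving that ambiguity requires prior knowledge about the signal, such as the regular spectral patterns of human speech.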
