I. Introduction
Audio of the human voice has become an essential data source in many real-world applications, such as speech recognition [7], [2], user identification [5], and human localization [15], [19]. Acquiring such audio conventionally requires a microphone with a high sampling rate. In general, a sampling rate above 8 kHz is considered sufficient for recognizable speech, and a sampling rate of 48 kHz is considered good quality [10]. Such a high sampling rate usually incurs high power consumption, which limits the deployment of microphones on low-power devices. Conversely, a low-power microphone implies a low sampling rate, which suffers from frequency aliasing according to the Nyquist sampling theorem: signal components above half the sampling rate fold back into lower frequencies. Apart from the power consumption issue of microphones, recent works [12], [18] focus on extracting audio from inertial measurement units (IMUs). Compared with microphones, the audio extracted from IMUs concentrates on sound that travels through solid media and is less affected by distant noise sources. However, the sampling rate of an IMU is much lower than that of a microphone [18], so it also suffers from frequency aliasing. Hence, given the benefits of low-power microphones and IMUs over traditional microphones, is it possible to address the frequency aliasing problem, i.e., to reconstruct high-sampling-rate audio from their low-sampling-rate signals?
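To make the aliasing problem concrete, the following minimal Python sketch illustrates the folding effect implied by the Nyquist sampling theorem. It is only an illustration under assumed values: the 4 kHz sampling rate, the 3 kHz tone, and the helper names `alias_frequency` and `sample_cosine` are hypothetical and do not correspond to any device or method discussed in this paper.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz.

    Frequencies fold around multiples of fs_hz, so the result always
    lies in the range [0, fs_hz / 2].
    """
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


def sample_cosine(f_hz, fs_hz, n_samples):
    """Sample cos(2*pi*f*t) at the instants t = n / fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


# Assumed, illustrative values: a 3 kHz tone captured at only 4 kHz,
# which is below the 6 kHz rate that the Nyquist criterion would require.
fs = 4000.0
tone = 3000.0

alias = alias_frequency(tone, fs)      # frequency the tone folds onto
high = sample_cosine(tone, fs, 16)     # samples of the true 3 kHz tone
low = sample_cosine(alias, fs, 16)     # samples of a tone at the alias

# Largest sample-wise difference between the two sequences.
max_diff = max(abs(a - b) for a, b in zip(high, low))
print(alias, max_diff)
```

In this sketch the two sample sequences agree up to floating-point error, so the sampled data alone cannot tell the original tone from its alias. Recovering the high-frequency content therefore needs information beyond the low-rate samples themselves.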