VM-ASR: A Lightweight Dual-Stream U-Net Model for Efficient Audio Super-Resolution | IEEE Journals & Magazine | IEEE Xplore


Abstract:

Audio super-resolution (ASR), also known as bandwidth extension (BWE), aims to enhance the quality of low-resolution audio by recovering high-frequency components. However, existing methods often struggle to model harmonic relationships accurately and to balance inference speed and computational complexity. In this paper, we propose VM-ASR, a novel lightweight ASR model that leverages the Visual State Space (VSS) block to capture global and local contextual information within audio spectrograms. This enables VM-ASR to model harmonic relationships more accurately, improving audio quality. Our experiments on the VCTK dataset demonstrate that VM-ASR consistently outperforms state-of-the-art methods in spectral reconstruction across various input-output sample rate pairs, achieving significantly lower Log-Spectral Distance (LSD) while maintaining a smaller model size (3.01 M parameters) and lower computational complexity (2.98 GFLOPS). This makes VM-ASR a promising solution for real-time applications and resource-constrained environments, and it opens up possibilities in telecommunications, speech synthesis, and audio restoration.
Page(s): 666 - 677
Date of Publication: 24 January 2025
Electronic ISSN: 2998-4173
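
The abstract reports results in terms of Log-Spectral Distance (LSD) but does not define it. As a rough illustration, the sketch below computes LSD in the form commonly used in the audio super-resolution literature: the root-mean-square difference of log-power spectra over frequency, averaged over frames. The function names, the FFT size, the hop length, and the `eps` floor are assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np


def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram from a Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """LSD between two equal-length waveforms (lower is better).

    n_fft, hop, and eps are illustrative defaults (assumptions), not the
    settings used in the VM-ASR experiments.
    """
    ref_log = np.log10(stft_power(reference, n_fft, hop) + eps)
    est_log = np.log10(stft_power(estimate, n_fft, hop) + eps)
    # RMS over frequency bins for each frame, then mean over frames.
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)  # stand-in for a high-resolution clip
    print(log_spectral_distance(target, target))        # identical signals
    print(log_spectral_distance(target, 0.5 * target))  # uniformly attenuated copy
```

Under this definition, identical signals give an LSD of zero, and any mismatch in spectral energy, such as missing high-frequency content, increases the value. Both waveforms are assumed to share the same sample rate and length.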
