VM-ASR: A Lightweight Dual-Stream U-Net Model for Efficient Audio Super-Resolution

Abstract:

Audio super-resolution (ASR), also known as bandwidth extension (BWE), aims to enhance the quality of low-resolution audio by recovering high-frequency components. However, existing methods often struggle to model harmonic relationships accurately and to balance inference speed against computational complexity. In this paper, we propose VM-ASR, a novel lightweight ASR model that leverages the Visual State Space (VSS) block to capture global and local contextual information within audio spectrograms. This enables VM-ASR to model harmonic relationships more accurately, improving audio quality. Our experiments on the VCTK dataset demonstrate that VM-ASR consistently outperforms state-of-the-art methods in spectral reconstruction across various input-output sample rate pairs, achieving significantly lower Log-Spectral Distance (LSD) while maintaining a smaller model size (3.01 M parameters) and lower computational complexity (2.98 GFLOPS). This makes VM-ASR a promising solution for real-time applications and resource-constrained environments, and opens up possibilities in telecommunications, speech synthesis, and audio restoration.
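
To make the Log-Spectral Distance (LSD) metric mentioned in the abstract concrete, the following is a minimal sketch that assumes the definition commonly used in the bandwidth-extension literature: the root-mean-square difference of log power spectra over frequency, averaged over time frames. The abstract does not give the paper's exact formulation, so the function names, the STFT settings (`n_fft`, `hop`), the `eps` floor, and the synthetic test signals are all illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram from a Hann-windowed STFT (assumed settings)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One row per frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_spectral_distance(reference, estimate, eps=1e-10):
    """LSD under the assumed definition: RMS over frequency, mean over frames."""
    ref = np.log10(stft_power(reference) + eps)
    est = np.log10(stft_power(estimate) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))

# Hypothetical signals: a reference with a low and a high tone, and a
# band-limited version that is missing the high tone.
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
band_limited = np.sin(2 * np.pi * 440 * t)

print(log_spectral_distance(reference, reference))
print(log_spectral_distance(reference, band_limited))
```

Under this definition, identical signals give a distance of zero, and missing high-frequency content increases the distance, which is why a lower LSD indicates better spectral reconstruction.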

Published in: IEEE Transactions on Audio, Speech and Language Processing (Volume: 33)

Page(s): 666 - 677

Date of Publication: 24 January 2025

Electronic ISSN: 2998-4173

DOI: 10.1109/TASLPRO.2025.3533365
