Towards an End-to-End Visual-to-Raw-Audio Generation With GAN | IEEE Journals & Magazine | IEEE Xplore



Abstract:

Automatically synthesizing sounds for varied visual content is challenging, and there is a strong need for tools that support the direct creation of realistic sounds. Unlike previous works, this paper proposes a novel deep-learning-based approach that formulates sound simulation as a regression problem. This formulation, realized by a novel general-purpose neural sound synthesis network named V2RA (visual-to-raw-audio), allows us to circumvent the complexity of acoustic theory. Moreover, the end-to-end architecture of V2RA can be trained fully without any extra inputs, which greatly improves scalability and reusability over previous works. In contrast to conventional visual-to-audio generation methods, the V2RA problem is formulated and solved with generative adversarial networks (GANs). Furthermore, unlike most existing approaches, which process audio through spectrograms, our network directly predicts synchronized raw audio signals and generates sound in real time. To evaluate the performance of the neural network generator, we introduce two quantitative scores. Various experiments demonstrate that V2RA produces compelling sound results, providing a viable solution for applications such as sound design and dubbing.
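The abstract describes V2RA only at a high level. As a hedged illustration of what "sound synthesis as a regression problem solved with a GAN" can look like on raw waveforms, the sketch below combines a generic non-saturating adversarial loss with an L1 regression term computed directly over waveform samples (no spectrogram). This is not the paper's objective: the function names, the choice of L1, and the weight `lambda_reg` are hypothetical, borrowed from common conditional-GAN practice.

```python
import math
from typing import Sequence


def l1_regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth raw waveform samples."""
    if len(pred) != len(target) or not pred:
        raise ValueError("waveforms must be non-empty and have the same number of samples")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def nonsaturating_adversarial_loss(disc_logits_on_fake: Sequence[float]) -> float:
    """Generator-side GAN loss: mean of -log(sigmoid(D(G(v)))).

    Uses the identity -log(sigmoid(z)) == softplus(-z) to avoid overflow.
    """
    if not disc_logits_on_fake:
        raise ValueError("need at least one discriminator logit")
    return sum(_softplus(-z) for z in disc_logits_on_fake) / len(disc_logits_on_fake)


def generator_loss(
    pred_wave: Sequence[float],
    target_wave: Sequence[float],
    disc_logits_on_fake: Sequence[float],
    lambda_reg: float = 100.0,  # hypothetical weight, not taken from the paper
) -> float:
    """Adversarial term plus a weighted waveform regression term (illustrative only)."""
    return nonsaturating_adversarial_loss(disc_logits_on_fake) + lambda_reg * l1_regression_loss(
        pred_wave, target_wave
    )


if __name__ == "__main__":
    pred = [0.0, 0.5, -0.5]     # toy generated waveform samples
    target = [0.0, 0.25, -0.25]  # toy ground-truth waveform samples
    print(generator_loss(pred, target, disc_logits_on_fake=[0.0], lambda_reg=1.0))
```

The regression term pulls each generated sample toward the recorded sound, while the adversarial term rewards waveforms the discriminator cannot tell apart from real audio; working on raw samples means no spectrogram inversion is needed at synthesis time.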
Page(s): 1299 - 1312
Date of Publication: 13 May 2021
