Abstract:
Acoustic scene generation (ASG) is the task of generating waveforms for acoustic scenes. ASG can be used to create audio scenes for movies and computer games. Recently, neural networks such as SampleRNN have been applied to speech and music generation. However, ASG is more challenging because of the wide variety of acoustic scenes, and evaluating a generative model is also difficult. In this paper, we propose a conditional SampleRNN model that generates acoustic scenes conditioned on the input classes. We also propose objective criteria, based on classification accuracy, to evaluate the quality and diversity of the generated samples. Experiments on the DCASE 2016 Task 1 acoustic scene data show that with the generated audio samples, a classification accuracy of 65.5% can be achieved, compared with 6.7% for samples generated by a random model and 83.1% for samples from real recordings. A classifier trained only on generated samples achieves an accuracy of 51.3%, as opposed to 6.7% with samples generated by a random model.
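The classification-accuracy criterion described in the abstract can be sketched with a toy stand-in: a nearest-centroid classifier on synthetic 2-D features (not the paper's actual classifier or features). A classifier fit on "real" features is scored on "generated" features (quality), and a classifier fit only on "generated" features is scored on "real" ones (diversity/usefulness). All data here is synthetic and purely illustrative.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Compute one mean feature vector (centroid) per class.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    # Assign each sample to the class of its nearest centroid.
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
# Hypothetical stand-in features: 3 well-separated "scene classes".
means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
real_X = np.concatenate([m + 0.3 * rng.standard_normal((50, 2)) for m in means])
real_y = np.repeat(np.arange(3), 50)

# In the paper's setting, these would come from the conditional generator;
# here they are just more synthetic draws from the same distribution.
gen_X = np.concatenate([m + 0.3 * rng.standard_normal((20, 2)) for m in means])
gen_y = np.repeat(np.arange(3), 20)

# Quality: classifier trained on real data, evaluated on generated samples.
real_model = nearest_centroid_fit(real_X, real_y)
quality_acc = (nearest_centroid_predict(real_model, gen_X) == gen_y).mean()

# Diversity/usefulness: classifier trained only on generated samples,
# evaluated on real data.
gen_model = nearest_centroid_fit(gen_X, gen_y)
diversity_acc = (nearest_centroid_predict(gen_model, real_X) == real_y).mean()
```

With well-separated synthetic classes both accuracies are near 1.0; in the paper's setting, the gap between these scores and the real-data baseline (83.1%) is what quantifies how realistic and diverse the generated scenes are.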
Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 12-17 May 2019
Date Added to IEEE Xplore: 17 April 2019