Abstract:
Deep neural networks have recently been applied to the task of automatic synthesizer programming, i.e., finding optimal values of sound synthesis parameters in order to reproduce a given input sound. This paper focuses on generative models, which can infer parameters as well as generate new sets of parameters or perform smooth morphing effects between sounds. We introduce new models to ensure scalability and to increase performance by using heterogeneous representations of parameters as numerical and categorical random variables. Moreover, a spectral variational autoencoder architecture with multi-channel input is proposed in order to improve inference of parameters related to the pitch and intensity of input sounds. Model performance was evaluated according to several criteria, such as parameter estimation error and audio reconstruction accuracy. Training and evaluation were performed using a dataset of 30k presets, which is published with this paper. The results demonstrate significant improvements in terms of parameter inference and audio accuracy and show that the presented models can be used with subsets or full sets of synthesizer parameters.
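To make the idea of heterogeneous parameter representations concrete, the following is a minimal sketch of a mixed loss that treats some synthesizer parameters as numerical values and others as categorical classes. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `heterogeneous_loss`, the equal weighting of the two terms, and the choice of squared error plus cross-entropy are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def heterogeneous_loss(num_pred, num_true, cat_logits, cat_true):
    """Hypothetical mixed loss (an assumption, not the paper's loss).

    num_pred, num_true: 1-D arrays of numerical parameters, e.g. scaled to [0, 1]
    cat_logits: list of 1-D arrays, one vector of class scores per categorical parameter
    cat_true: list of integer class indices, one per categorical parameter
    """
    # Numerical parameters: mean squared error.
    num_loss = float(np.mean((num_pred - num_true) ** 2)) if len(num_true) else 0.0
    # Categorical parameters: cross-entropy, averaged over parameters.
    cat_losses = [-np.log(softmax(l)[t]) for l, t in zip(cat_logits, cat_true)]
    cat_loss = float(np.mean(cat_losses)) if cat_losses else 0.0
    # Equal weighting of both terms is an arbitrary choice for this sketch.
    return num_loss + cat_loss


# Example: two numerical parameters and one 3-class categorical parameter
# (e.g. a hypothetical waveform selector).
loss = heterogeneous_loss(
    np.array([0.5, 0.2]),
    np.array([0.5, 0.2]),
    [np.array([10.0, 0.0, 0.0])],
    [0],
)
```

Handling a discrete control such as a waveform selector as a class rather than as a point on a continuous scale avoids penalizing predictions by an arbitrary ordering of its options, which is one plausible motivation for mixing the two representations.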
Date of Conference: 08-10 September 2021
Date Added to IEEE Xplore: 11 May 2022