Data Augmentation for Monaural Singing Voice Separation Based on Variational Autoencoder-Generative Adversarial Network | IEEE Conference Publication | IEEE Xplore

Abstract:

Random mixing and circular shifting are used to augment the training set and thereby improve the separation performance of deep neural network (DNN)-based monaural singing voice separation (MSVS). However, these manual methods rest on the unrealistic assumption that the two sources in a mixture are independent of each other, which limits separation performance. This paper proposes a data augmentation method based on a variational autoencoder (VAE) and a generative adversarial network (GAN), called VAE-GAN. The VAE models the observed spectra of the sources (vocal and music) separately and reconstructs new spectra from the latent space. The GAN's discriminator is introduced to measure the correlation between the latent variables of the vocal and the music generated by the VAE's probabilistic encoder. This adversarial mechanism in the VAE's latent space can learn the synthetic likelihood and ultimately decode high-quality spectral samples, which further improves the separation performance of general MSVS networks.
Date of Conference: 08-12 July 2019
Date Added to IEEE Xplore: 05 August 2019
Conference Location: Shanghai, China

1. Introduction

The purpose of source separation is to recover the individual sources from a mixture signal. Separating vocal and music from a monaural mixture signal, i.e., monaural singing voice separation (MSVS), is a long-standing, active research area within source separation. Because only a single channel of information is available, MSVS is considered more challenging than separation from multichannel recordings. Recently, deep neural network (DNN)-based methods have been applied to MSVS and have produced better separation results than traditional methods [1].
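
To make the baseline concrete, the two manual augmentations named in the abstract (random mixing and circular shifting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats each source as a plain list of time-domain samples, whereas the paper may operate on spectra, and the function names `circular_shift` and `random_mix` are hypothetical.

```python
import random


def circular_shift(samples, offset):
    """Rotate a sequence to the right by `offset` positions, wrapping around."""
    if not samples:
        return []
    n = len(samples)
    k = offset % n
    # The last k samples wrap around to the front.
    return list(samples[n - k:]) + list(samples[:n - k])


def random_mix(vocals, accompaniments, rng):
    """Build one synthetic training example (assumed procedure).

    Picks a vocal track and an accompaniment track independently at random,
    trims both to the shorter length, circularly shifts each by its own
    random offset, and sums them sample by sample.
    Assumes every track is non-empty.
    """
    vocal = rng.choice(vocals)
    music = rng.choice(accompaniments)
    n = min(len(vocal), len(music))
    vocal = circular_shift(vocal[:n], rng.randrange(n))
    music = circular_shift(music[:n], rng.randrange(n))
    mixture = [v + m for v, m in zip(vocal, music)]
    return mixture, vocal, music


if __name__ == "__main__":
    rng = random.Random(0)
    vocals = [[0.1, 0.2, 0.3, 0.4]]
    accompaniments = [[1.0, 2.0, 3.0, 4.0, 5.0]]
    mixture, vocal, music = random_mix(vocals, accompaniments, rng)
    print(len(mixture), len(vocal), len(music))
```

Because the vocal and the accompaniment are drawn and shifted independently, the sketch builds in exactly the independence assumption that the abstract identifies as unrealistic.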
