Data Augmentation for Monaural Singing Voice Separation Based on Variational Autoencoder-Generative Adversarial Network

Abstract:

Random mixing and circular shifting are used to augment the training set and improve the separation performance of deep neural network (DNN)-based monaural singing voice separation (MSVS). However, these manual methods rest on the unrealistic assumption that the two sources in a mixture are independent of each other, which limits the separation performance. This paper proposes a data augmentation method based on a variational autoencoder (VAE) and a generative adversarial network (GAN), called VAE-GAN. The VAE models the observed spectra of the sources (vocal and music) separately and reconstructs new spectra from the latent space. The GAN's discriminator is introduced to measure the correlation between the latent variables of the vocal and the music produced by the VAE's probabilistic encoder. This adversarial mechanism in the VAE's latent space can learn the synthetic likelihood and ultimately decode high-quality spectral samples, which further improves the separation performance of general MSVS networks.
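The manual baseline named in the abstract, random mixing with circular shifting, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `circular_shift` and `augment_pair`, the use of plain Python lists of time-domain samples, and the choice to shift only the accompaniment are all assumptions made for illustration. The sketch also makes the independence assumption visible, since the vocal and the accompaniment are drawn without regard to each other.

```python
import random


def circular_shift(signal, shift):
    """Rotate a sequence of samples to the right by `shift`, wrapping around."""
    samples = list(signal)
    if not samples:
        return samples
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else samples


def augment_pair(vocals, accompaniments, rng):
    """Build one synthetic training example (hypothetical helper).

    A vocal track and an accompaniment track are drawn independently,
    the accompaniment is circularly shifted by a random offset, and the
    two are summed sample by sample to form a new mixture.
    """
    vocal = rng.choice(vocals)
    music = rng.choice(accompaniments)
    n = min(len(vocal), len(music))
    if n == 0:
        raise ValueError("tracks must be non-empty")
    vocal = list(vocal[:n])
    music = circular_shift(music[:n], rng.randrange(n))
    mixture = [v + m for v, m in zip(vocal, music)]
    return mixture, vocal, music


if __name__ == "__main__":
    rng = random.Random(0)
    vocals = [[0.1, 0.2, 0.3, 0.4]]
    accompaniments = [[1.0, 2.0, 3.0, 4.0]]
    mixture, vocal, music = augment_pair(vocals, accompaniments, rng)
    print(mixture, vocal, music)
```

Because the pairing and the shift offset are chosen at random, any correlation between a singer and the backing track in a real recording is discarded, which is the limitation the proposed VAE-GAN is meant to address.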

Published in: 2019 IEEE International Conference on Multimedia and Expo (ICME)

Date of Conference: 08-12 July 2019

Date Added to IEEE Xplore: 05 August 2019

DOI: 10.1109/ICME.2019.00235

Conference Location: Shanghai, China

1. Introduction

The purpose of source separation is to recover the individual sources from a mixture signal. Separating vocals and music from a monaural mixture, i.e., monaural singing voice separation (MSVS), is a long-standing and active research area in source separation. Because only a single channel of information is available, MSVS is considered more challenging than the multichannel case. Recently, deep neural network (DNN)-based methods have been applied to MSVS and achieve better separation than traditional methods [1].
