1. Introduction
The purpose of source separation is to recover the individual sources from a mixture signal. Separating the singing voice from the musical accompaniment in a monaural mixture, i.e., monaural singing voice separation (MSVS), is a long-standing and active research topic in source separation. Because only a single channel is available, MSVS cannot exploit inter-channel spatial cues and is therefore considered more challenging than its multichannel counterpart. Recently, deep neural network (DNN)-based methods have been applied to MSVS and have achieved better separation performance than traditional methods [1].