1. INTRODUCTION
Speech source separation and dereverberation [1]–[3] are fundamental techniques in automatic speech recognition (ASR) and teleconferencing systems. Dereverberation techniques based on statistical modeling, e.g., Weighted Prediction Error (WPE) [4], have been actively studied. Simultaneous optimization of speech source separation and dereverberation has also been actively studied within the statistical modeling framework [5]–[8]. These techniques rely on speech source models based on super-Gaussian distributions, e.g., the Laplacian distribution [9], [10] and the time-varying Gaussian distribution [11]. However, the expressive capability of these source models is insufficient for representing complicated speech source spectra. Recently, deep neural networks (DNNs) have been utilized to model such complicated speech source spectra [12]–[20]. The expressive capability of DNN-based speech source models is higher than that of the statistical models above.
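As a point of reference, the kind of source model underlying these statistical approaches can be sketched as follows; the notation is illustrative and chosen here, not taken verbatim from [4] or [11].

% Sketch: WPE-style dereverberation with a time-varying Gaussian source model.
% Symbols (x_{t,f}, g_f, D, lambda_{t,f}) are chosen for illustration only.
\begin{align}
  \hat{d}_{t,f} &= x_{t,f} - \mathbf{g}_{f}^{\mathsf{H}}\, \bar{\mathbf{x}}_{t-D,f},
  &
  d_{t,f} &\sim \mathcal{N}_{\mathbb{C}}\!\bigl(0,\, \lambda_{t,f}\bigr),
  \\
  % Prediction filter obtained by maximum likelihood under the
  % time-varying variance \lambda_{t,f} (estimated alternately in practice)
  \hat{\mathbf{g}}_{f} &= \operatorname*{arg\,min}_{\mathbf{g}_{f}}
  \sum_{t} \frac{\bigl| x_{t,f} - \mathbf{g}_{f}^{\mathsf{H}}\, \bar{\mathbf{x}}_{t-D,f} \bigr|^{2}}{\lambda_{t,f}},
\end{align}

where $x_{t,f}$ is the observed STFT coefficient, $\bar{\mathbf{x}}_{t-D,f}$ stacks past observations delayed by $D$ frames, and $\lambda_{t,f}$ is the time-frequency-dependent source variance. The limitation noted above is that such hand-crafted variance models capture the spectral structure of speech only coarsely, which motivates replacing them with DNN-based source models.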