In this paper, we present a new technique for separating two speech signals from a single recording. To this end, we decompose each speech signal into an excitation signal and a vocal tract function, and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal tract functions. Then, given the mixed signal, the mean vectors of the PDFs of the vocal tract functions are obtained using a maximum likelihood estimator. Finally, the estimated vocal tract functions, along with the extracted pitch values, are used to reconstruct estimates of the individual speech signals. We compare our model with both an underdetermined blind source separation method and a computational auditory scene analysis (CASA) method. The experimental results show that our model outperforms both techniques in terms of signal-to-noise ratio (SNR) improvement and the percentage of crosstalk suppression.
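
The abstract does not state how the excitation signal and the vocal tract function are obtained from a frame of speech. As a rough illustration only, the sketch below uses cepstral liftering, a common stand-in technique that is not claimed to be the method of this paper, to split one frame's log magnitude spectrum into a smooth envelope (a proxy for the vocal tract function) and a fine-structure residual (a proxy for the excitation). The function name `split_log_spectrum`, the parameter `n_lifter`, and the Hann window are all assumptions made for this example.

```python
import numpy as np


def split_log_spectrum(frame, n_lifter=30):
    """Split a frame's log magnitude spectrum into envelope + residual.

    Illustrative only: cepstral liftering is assumed here, not taken
    from the paper. `frame` must have even length and n_lifter >= 1.
    """
    n = len(frame)
    windowed = frame * np.hanning(n)
    # Log magnitude spectrum (small offset avoids log(0)).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)

    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n)

    # Symmetric low-quefrency lifter keeps only the slowly varying part.
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0

    # Back to the log spectral domain: smooth envelope.
    envelope = np.fft.rfft(cepstrum * lifter).real
    # Whatever the envelope does not explain is the fine structure.
    excitation = log_mag - envelope
    return log_mag, envelope, excitation


# Usage on a synthetic voiced-like frame (hypothetical values).
fs = 8000                       # sampling rate in Hz
t = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 11))
log_mag, envelope, excitation = split_log_spectrum(frame)
```

By construction the two parts add back to the original log spectrum, so the split is lossless in the log spectral domain. The cutoff `n_lifter` trades envelope smoothness against how much harmonic detail leaks into it.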