The joint approximate diagonalization of a set of time-varying cross-spectral matrices is a method well suited to blind separation of speech signals. The elements of the cross-spectral matrices are most commonly estimated using Welch's method of weighted overlapped segment averaging. The segment length is dictated by the length of the mixing-channel impulse responses, which may be very long depending on the reverberation characteristics of the room. However, long effective data lengths risk averaging over non-stationary segments, increase estimation errors and processing delays, and hinder the exploitation of non-stationarity across adjoining frames. This paper discusses cross-spectral matrix estimation in the orthogonal multitaper framework, in which each cross-spectral matrix is computed from a single segment of data by averaging direct estimates obtained with multiple orthogonal windows, or tapers. Four nonparametric cross-spectrum estimators that fall within this framework are compared in numerical simulations of convolutively mixed speech signals using premeasured room impulse responses.
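To make the single-segment multitaper idea concrete, here is a minimal NumPy sketch. It computes one cross-spectral matrix per frequency bin from a single multichannel segment. Each orthonormal taper yields a direct (tapered DFT) estimate, and the outer products of these estimates are averaged over the tapers. As an assumption for illustration, it uses sine tapers (Riedel and Sidorenko), which have a closed form. Slepian (DPSS) tapers, for example from `scipy.signal.windows.dpss`, could be substituted. This is not claimed to be any of the four estimators compared in the paper, and the function names `sine_tapers` and `multitaper_csm` are hypothetical.

```python
from typing import Optional

import numpy as np


def sine_tapers(n_samples: int, n_tapers: int) -> np.ndarray:
    """Orthonormal sine tapers, shape (n_tapers, n_samples).

    Assumed taper family for this sketch; any orthonormal set (e.g. DPSS)
    fits the same multitaper framework.
    """
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def multitaper_csm(x: np.ndarray, n_tapers: int, nfft: Optional[int] = None) -> np.ndarray:
    """Multitaper cross-spectral matrices estimated from ONE data segment.

    x: real array of shape (n_channels, n_samples)
    returns: complex array of shape (nfft // 2 + 1, n_channels, n_channels)
    """
    n_channels, n_samples = x.shape
    nfft = nfft or n_samples
    tapers = sine_tapers(n_samples, n_tapers)                      # (K, N)
    # Direct estimates: tapered DFT of every channel for every taper -> (K, M, F)
    X = np.fft.rfft(tapers[:, None, :] * x[None, :, :], n=nfft, axis=-1)
    # Average the outer products X_k(f) X_k(f)^H over the K tapers at each bin f
    return np.einsum("kmf,knf->fmn", X, X.conj()) / n_tapers


# Usage: two white-noise channels, one 1024-sample segment, five tapers
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1024))
S = multitaper_csm(x, n_tapers=5)
print(S.shape)
```

Each matrix `S[f]` is Hermitian and positive semidefinite with rank at most `min(n_tapers, n_channels)`. This is why several tapers are needed to obtain full-rank matrices for joint diagonalization without splitting the data into multiple segments, as Welch's method does.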