FastFCA: Joint Diagonalization Based Acceleration of Audio Source Separation Using a Full-Rank Spatial Covariance Model


Abstract:

Here we propose an accelerated version of one of the most promising methods for audio source separation proposed by Duong et al. [“Under-determined reverberant audio source separation using a full-rank spatial covariance model,” IEEE Trans. ASLP, vol. 18, no. 7, pp. 1830-1840, Sep. 2010]. We refer to this conventional method as full-rank spatial covariance analysis (FCA), and the proposed method as FastFCA. A major drawback of the conventional FCA is computational complexity: inversion and multiplication of covariance matrices are required at each time-frequency point and each EM iteration. To overcome this drawback, the proposed FastFCA diagonalizes the covariance matrices jointly based on the generalized eigenvalue problem. This leads to significantly reduced computational complexity of the FastFCA, because the complexity of matrix inversion and matrix multiplication for diagonal matrices is O(M) instead of O(M^3), where M is the matrix order. Furthermore, the FastFCA is rigorously equivalent to the FCA, and therefore the reduction in computational complexity is realized without degradation in source separation performance. An experiment showed that the FastFCA was over 250 times faster than the FCA with virtually no degradation in source separation performance. In this paper, we focus on the two-source case, while the case of more than two sources is treated in a separate paper.
Date of Conference: 03-07 September 2018
Date Added to IEEE Xplore: 02 December 2018
Conference Location: Rome, Italy

I. Introduction

The source separation method proposed by Duong et al. [1], which is called full-rank spatial covariance analysis (FCA) in this paper, can be considered one of the most promising source separation methods. In the FCA, the spatial characteristics of each source signal are modeled by a full-rank matrix called a spatial covariance matrix. The full-rank spatial covariance matrix enables the FCA to model not only point-source signals but also reverberant source signals and diffuse signals (e.g., background noise). This contrasts markedly with the conventional modeling of spatial characteristics by a steering vector, as is done, e.g., in independent component analysis (ICA) [2]. However, a major drawback of the FCA is its computational complexity, which may be prohibitive in applications with restricted computational resources (e.g., hearing aids) or with large datasets (e.g., the CHiME-3 dataset [3]). Indeed, the FCA requires matrix inversion and matrix multiplication (both of complexity O(M^3), where M is the number of microphones) of covariance matrices at each time-frequency point and each EM iteration.
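
The joint diagonalization at the heart of the FastFCA can be illustrated with a generalized Hermitian eigenvalue problem. The following Python sketch is illustrative only (not the authors' implementation): the matrices R1 and R2 stand in for the spatial covariance matrices of two sources at a single frequency bin, and M for the number of microphones. It shows that, once the two matrices are jointly diagonalized, inversion and multiplication reduce to element-wise operations on the diagonals, i.e., O(M) instead of O(M^3).

import numpy as np
from scipy.linalg import eigh

# Illustrative sketch (not the authors' code): jointly diagonalize two
# Hermitian positive-definite matrices via the generalized eigenvalue
# problem R1 v = lambda R2 v.

M = 4  # number of microphones (illustrative value)
rng = np.random.default_rng(0)

def random_spd(m):
    # Random Hermitian positive-definite matrix of order m.
    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return A @ A.conj().T + m * np.eye(m)

# Stand-ins for the two sources' spatial covariance matrices at one frequency bin.
R1, R2 = random_spd(M), random_spd(M)

# eigh solves the generalized Hermitian eigenvalue problem; the eigenvector
# matrix P satisfies P^H R1 P = diag(w) and P^H R2 P = I.
w, P = eigh(R1, R2)
D1 = P.conj().T @ R1 @ P  # approximately diag(w)
D2 = P.conj().T @ R2 @ P  # approximately the identity matrix

# With jointly diagonal matrices, inversion and multiplication become
# element-wise operations on the diagonals: O(M) instead of O(M^3).
d1, d2 = np.diag(D1).real, np.diag(D2).real
inv_sum = 1.0 / (d1 + d2)  # "inverting" R1 + R2 in the transformed domain
prod = d1 * d2             # "multiplying" R1 and R2 in the transformed domain

print(np.allclose(D1, np.diag(w)), np.allclose(D2, np.eye(M)))

The point of the sketch is only the complexity argument stated in the abstract: after joint diagonalization, the matrix operations performed at each time-frequency point and each EM iteration cost O(M) rather than O(M^3).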

References
[1] N. Q. K. Duong, E. Vincent, and R. Gribonval, “Under-determined reverberant audio source separation using a full-rank spatial covariance model,” IEEE Trans. ASLP, vol. 18, no. 7, pp. 1830–1840, Sep. 2010.
[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
[3] J. Barker, R. Marxer, E. Vincent, and S. Watanabe, “The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines,” in Proc. ASRU, Dec. 2015, pp. 504–511.
[4] R. Sakanashi, S. Miyabe, T. Yamada, and S. Makino, “Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter,” in Proc. APSIPA, Dec. 2012.
[5] N. Ito, S. Araki, T. Yoshioka, and T. Nakatani, “Relaxed disjointness based clustering for joint blind source separation and dereverberation,” in Proc. IWAENC, Sep. 2014, pp. 268–272.
[6] Ö. Yılmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking,” IEEE Trans. SP, vol. 52, no. 7, pp. 1830–1847, Jul. 2004.
[7] A. Yeredor, “Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation,” IEEE Trans. SP, vol. 50, no. 7, pp. 1545–1553, Jul. 2002.
[8] N. Ito and T. Nakatani, “FastFCA-AS: Joint diagonalization based acceleration of full-rank spatial covariance analysis for separating any number of sources,” arXiv preprint arXiv:1805.09498, May 2018.
[9] N. Ito, S. Araki, and T. Nakatani, “FastFCA: A joint diagonalization based fast algorithm for audio source separation using a full-rank spatial covariance model,” arXiv preprint arXiv:1805.06572, May 2018.
[10] H. Sawada, S. Araki, and S. Makino, “Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment,” IEEE Trans. ASLP, vol. 19, no. 3, pp. 516–527, Mar. 2011.
[11] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. ASLP, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
