Based on time-frequency masking, we propose a blind separation algorithm for speech mixtures that can separate an arbitrary number of sources using only two mixtures. The method is valid when the sources satisfy W-disjoint orthogonality, that is, when the supports of the windowed Fourier transforms of the signals in the mixture are disjoint. In the time-frequency domain, the algorithm first extracts the spatial cues of the speech signals, namely the relative attenuation-delay pairs. Then, motivated by the maximum likelihood mixing parameter estimators, we define a power-weighted two-dimensional (2-D) histogram, constructed from the ratio of the time-frequency representations of the mixtures, which is shown to have one peak for each source, with each peak located at that source's relative attenuation and delay mixing parameters. A weighted K-means clustering algorithm is presented as an alternative to gradient descent methods for peak tracking and is shown to achieve excellent performance without adversely affecting the computational load. Next, binary time-frequency masks are constructed and used to separate the sources in the time-frequency domain. Finally, the inverse short-time Fourier transform (ISTFT) converts each separated source back to the time domain. Performance is compared for floating-point and fixed-point implementations. In short, the proposed algorithm offers a new direction for research on blind speech separation.
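The pipeline described in the abstract (attenuation-delay cues, power-weighted 2-D histogram, weighted K-means peak estimation, binary masking) can be sketched as follows. This is a minimal illustration in the style of the well-known DUET formulation, not the authors' implementation, and several choices are assumptions: the helper names (duet_features, weighted_kmeans, duet_separate) are hypothetical; the weight |X1 X2| (power exponents p = 1, q = 0) is one common choice; weighted K-means is run directly on the weighted (alpha, delta) points, which is equivalent to clustering the histogram at full resolution, whereas the abstract does not say whether the paper clusters bins or points; seeding is deterministic farthest-point; and masks are assigned with the standard maximum-likelihood cost. The demo skips the STFT and builds synthetic, exactly W-disjoint orthogonal time-frequency data with made-up mixing parameters.

```python
import numpy as np

def duet_features(X1, X2, omega, eps=1e-12):
    """Per-bin symmetric attenuation alpha = a - 1/a and relative delay delta from X2/X1."""
    R = X2 / (X1 + eps)                   # eps guards against empty bins (assumption)
    a = np.abs(R)
    alpha = a - 1.0 / a
    delta = -np.angle(R) / omega          # unambiguous only while |omega * delta| < pi
    return alpha, delta

def weighted_kmeans(points, weights, k, iters=50):
    """Weighted Lloyd iterations with deterministic farthest-point seeding."""
    centers = [points[np.argmax(weights)]]            # start at the heaviest point
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])         # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new = np.array([np.average(points[assign == j], axis=0, weights=weights[assign == j])
                        if np.any(assign == j) else centers[j] for j in range(k)])
        done = np.allclose(new, centers)
        centers = new
        if done:
            break
    return centers

def duet_separate(X1, X2, omega, centers):
    """Binary TF masks from the ML cost, then demix each masked bin."""
    alpha_c, delta_c = centers[:, 0], centers[:, 1]
    a = (alpha_c + np.sqrt(alpha_c ** 2 + 4)) / 2     # invert alpha = a - 1/a (positive root)
    phase = [np.exp(-1j * omega * d) for d in delta_c]
    cost = np.stack([np.abs(a[j] * phase[j] * X1 - X2) ** 2 / (1 + a[j] ** 2)
                     for j in range(len(a))])
    owner = cost.argmin(axis=0)                       # exactly one source per TF bin
    estimates = [np.where(owner == j,
                          (X1 + a[j] * np.conj(phase[j]) * X2) / (1 + a[j] ** 2), 0)
                 for j in range(len(a))]
    return owner, estimates

# Synthetic W-disjoint orthogonal TF data (hypothetical parameters, not from the paper)
rng = np.random.default_rng(0)
F, T, nfft = 64, 100, 256
omega = (2 * np.pi * np.arange(1, F + 1) / nfft)[:, None]   # skip DC so omega > 0
labels = rng.integers(0, 2, size=(F, T))                   # which source owns each bin
a_true, d_true = np.array([0.7, 1.3]), np.array([1.0, -1.5])
X1 = rng.normal(size=(F, T)) + 1j * rng.normal(size=(F, T))
X2 = a_true[labels] * np.exp(-1j * omega * d_true[labels]) * X1

alpha, delta = duet_features(X1, X2, omega)
w = np.abs(X1 * X2)                                         # power weight, p = 1, q = 0
H, a_edges, d_edges = np.histogram2d(alpha.ravel(), delta.ravel(), bins=35, weights=w.ravel())
centers = weighted_kmeans(np.column_stack([alpha.ravel(), delta.ravel()]), w.ravel(), k=2)
owner, estimates = duet_separate(X1, X2, omega, centers)
```

With perfectly disjoint sources the histogram H has one nonzero bin per source and the cluster centers land on the true (alpha, delta) pairs; on real speech the peaks are spread out and the clustering does real work. In a full system each entry of estimates would then be passed through an inverse STFT (for example scipy.signal.istft, with the same window and hop as the forward transform) to obtain time-domain signals. The floating-point versus fixed-point comparison mentioned in the abstract is not modeled here.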
Date of Conference: 21-23 April 2012