In this paper, we address the issue of underdetermined source separation of I nonstationary audio sources from a J-channel linear instantaneous mixture (J < I). This problem is addressed with a specific coder-decoder configuration. At the coder, source signals are assumed to be available before the mixing is performed. A joint time-frequency (TF) analysis of each source signal and of the mixture signal makes it possible to select the subset of sources (among I) leading to the best separation results in each TF region. A corresponding source index code is imperceptibly embedded into the mix signal using a watermarking technique. At the decoder, where the original source signals are unknown, extracting the watermark makes it possible to invert the mixture in each TF region and recover the source signals. With such an informed approach, it is shown that five instruments and singing voice signals can be efficiently separated from two-channel stereo mixtures, with a quality that significantly exceeds that obtained by a semi-blind reference method and that enables separate manipulation of the source signals during stereo music restitution (i.e., remixing).
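The per-region selection and inversion described above can be illustrated with a minimal Python sketch for a single TF bin. It is not the authors' implementation. The abstract does not state the selection criterion, so squared reconstruction error is assumed here, and the sketch further assumes that exactly J sources are kept per bin and that the mixing matrix `A` is known at the decoder. The function names `select_best_subset` and `invert_with_subset` are hypothetical, and the watermark embedding and extraction steps are omitted.

```python
from itertools import combinations

import numpy as np


def invert_with_subset(A, x, subset):
    """Decoder side: recover the sources listed in `subset` from one TF bin.

    A is the J x I mixing matrix, x the J mixture coefficients of the bin.
    Sources outside `subset` are estimated as zero (assumption of this sketch).
    """
    J, I = A.shape
    cols = list(subset)
    s_hat = np.zeros(I, dtype=complex)
    # Solve the J x J system formed by the selected columns of A.
    s_hat[cols] = np.linalg.solve(A[:, cols], x)
    return s_hat


def select_best_subset(A, s):
    """Coder side: choose the J-source subset minimising squared error in one TF bin.

    s holds the I true source coefficients, which are available at the coder only.
    Returns the chosen subset (the index code to embed) and the mixture x = A s.
    """
    J, I = A.shape
    x = A @ s
    best_subset, best_err = None, np.inf
    for subset in combinations(range(I), J):
        s_hat = invert_with_subset(A, x, subset)
        err = np.sum(np.abs(s - s_hat) ** 2)
        if err < best_err:
            best_subset, best_err = subset, err
    return best_subset, x


# Toy example: I = 3 sources, J = 2 channels, source 1 silent in this bin.
A = np.array([[1.0, 0.8, 0.3],
              [0.2, 0.6, 1.0]])
s = np.array([1.0 + 1.0j, 0.0, 2.0 - 0.5j])
subset, x = select_best_subset(A, s)
s_hat = invert_with_subset(A, x, subset)
```

In this toy bin only two sources are active, so the selected 2 x 2 inversion recovers them exactly. When more than J sources are active in a bin, the chosen subset only minimises the error under the assumed criterion.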
Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 19, Issue: 6)
Date of Publication: Aug. 2011