Skip to Main Content
Sound source separation has become a topic of intensive research in the last years. The research effort has been specially relevant for the underdetermined case, where a considerable number of sparse methods working in the time-frequency (T-F) domain have appeared. In this context, although binary masking seems to be a preferred choice for source demixing, the estimated masks differ substantially from the ideal ones. This paper proposes a maximum a posteriori (MAP) framework for binary mask estimation. To this end, class-conditional source probabilities according to the observed mixing parameters are modeled via ratios of dependent Cauchy distributions while source priors are iteratively calculated from the observed histograms. Moreover, spatially smoothed posteriors in the T-F domain are proposed to avoid noisy estimates, showing that the estimated masks are closer to the ideal ones in terms of objective performance measures.
Audio, Speech, and Language Processing, IEEE Transactions on (Volume:20 , Issue: 7 )
Date of Publication: Sept. 2012