Skip to Main Content
Computational Auditory Scene Analysis (CASA) has attracted a lot of interest in segregating speech from monaural mixtures. In this paper, we propose a new method for single channel speech separation with frame-based pitch range estimation in modulation frequency domain. This range is estimated in each frame of modulation spectrum of speech by analyzing onsets and offsets. In the proposed method, target speaker is separated from interfering speaker by filtering the mixture signal with a mask extracted from the modulation spectrogram of mixture signal. Systematic evaluation shows an acceptable level of separation comparing with classic methods.