In this paper, a new method for multiple fundamental frequency estimation for speech and music signals is proposed. Many well-reviewed algorithms exist in audio and speech processing for estimating the fundamental frequency of monophonic speech and music signals. For polyphonic signals, it is more difficult to estimate each of the fundamental frequencies successfully, as reflected by the dearth of existing methods addressing this problem. The proposed method, based on a combination of the maximum likelihood and maximum a posteriori probability criteria, is derived for fundamental frequency tracking, where each fundamental frequency is modeled by a first-order Markov process. The dominant signal is modeled as a harmonic source with unknown deterministic amplitudes, while the remaining signals, including other harmonic signals, are modeled as Gaussian interference sources with an unknown covariance matrix. After the dominant source is estimated, it is removed by projecting the signal onto the null subspace of the estimated signal. This procedure is iterated for all the harmonic sources in the data. The algorithm is tested with speech, music, and synthetic signals; in each case, two harmonic sources of the same kind were mixed. The performance of the proposed algorithm is evaluated and compared to an existing reference method in terms of gross error rate as a function of signal-to-interference ratio.
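The estimate-then-project loop described above can be illustrated with a minimal sketch. It is not the estimator derived in the paper: the maximum likelihood and maximum a posteriori criteria, the first-order Markov tracking, and the unknown interference covariance are all omitted, and a plain least-squares grid search over candidate fundamental frequencies is swapped in to pick the dominant source. All function names, the candidate grid, and the fixed number of harmonics are assumptions made for illustration only.

```python
import numpy as np

def harmonic_basis(f0, n_harmonics, n_samples, fs):
    """Cosine and sine columns at the first n_harmonics multiples of f0."""
    t = np.arange(n_samples) / fs
    k = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)
    return np.hstack([np.cos(phase), np.sin(phase)])

def estimate_dominant_f0(x, candidates, n_harmonics, fs):
    """Assumed stand-in criterion: pick the candidate whose harmonic
    subspace captures the most signal energy in a least-squares fit."""
    best_f0, best_energy = None, -np.inf
    for f0 in candidates:
        A = harmonic_basis(f0, n_harmonics, len(x), fs)
        coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
        energy = np.sum((A @ coeffs) ** 2)
        if energy > best_energy:
            best_f0, best_energy = f0, energy
    return best_f0

def remove_source(x, f0, n_harmonics, fs):
    """Project x onto the orthogonal complement of the harmonic subspace."""
    A = harmonic_basis(f0, n_harmonics, len(x), fs)
    coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
    return x - A @ coeffs

def estimate_f0s(x, n_sources, candidates, n_harmonics, fs):
    """Estimate the dominant source, project it out, and repeat."""
    residual = np.asarray(x, dtype=float)
    found = []
    for _ in range(n_sources):
        f0 = estimate_dominant_f0(residual, candidates, n_harmonics, fs)
        found.append(float(f0))
        residual = remove_source(residual, f0, n_harmonics, fs)
    return found

# Synthetic mixture of two harmonic sources (illustrative values only).
fs, n_samples = 8000, 800
t = np.arange(n_samples) / fs
amps_a, amps_b = [1.0, 0.6, 0.4], [0.5, 0.3, 0.2]
source_a = sum(a * np.sin(2 * np.pi * 200.0 * (k + 1) * t + 0.3)
               for k, a in enumerate(amps_a))
source_b = sum(a * np.sin(2 * np.pi * 310.0 * (k + 1) * t + 1.1)
               for k, a in enumerate(amps_b))
mixture = source_a + source_b

candidates = np.arange(150.0, 401.0, 5.0)
found = estimate_f0s(mixture, n_sources=2, candidates=candidates,
                     n_harmonics=3, fs=fs)
print(found)
```

The projection step is the part that mirrors the text most directly: subtracting the least-squares fit leaves a residual that is orthogonal to the estimated harmonic subspace, so the next iteration sees only the remaining sources. The grid search is only a placeholder for the paper's estimation criterion and would be prone to octave errors on real recordings.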