A sound source localization system is implemented that uses only three microphones to capture sound signals. The system can estimate the azimuth and elevation of a sound source in real time and with sufficient accuracy. In addition to spectral entropy, an SNR measure is used to help detect voiced frames. Next, synchronous FFT phase copying is adopted, and the cross-power spectrum phase is calculated to estimate the TDOA (time delay of arrival) for each frame. To improve the accuracy of the TDOA estimate, parabolic interpolation is applied. Then, by comparing the estimated TDOA values with theoretical ones, the azimuth and elevation of the sound source are determined. Since one azimuth-elevation pair is estimated from each voiced frame, these estimates are then combined by a weighted sum to give a single final azimuth and elevation. According to the experimental results, the average errors in estimating azimuth and elevation are 4.02 and 2.18 degrees, respectively.
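As a rough illustration of the TDOA steps named above, the following Python sketch estimates a per-frame delay from the cross-power spectrum phase, refines the peak by parabolic interpolation, and then matches the measured delays against far-field theoretical values on an azimuth/elevation grid. It is not the authors' implementation: the function names, the planar triangular microphone layout, the 1-degree grid, the 0 to 90 degree elevation range, and the speed of sound are all assumptions made for the example. Voiced-frame detection and the weighted combination across frames are not shown.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def csp_tdoa(x1, x2, fs):
    """Delay of x2 relative to x1, in seconds, from the cross-power spectrum phase."""
    n = len(x1) + len(x2)                 # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12        # discard magnitude, keep phase only
    csp = np.fft.fftshift(np.fft.irfft(cross, n))  # zero lag moves to index n // 2
    k = int(np.argmax(csp))
    offset = 0.0
    if 0 < k < n - 1:                     # parabola through the peak and its two neighbours
        a, b, c = csp[k - 1], csp[k], csp[k + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            offset = 0.5 * (a - c) / denom
    return (k - n // 2 + offset) / fs


def theoretical_tdoas(mics, az_deg, el_deg):
    """Far-field TDOAs for mic pairs (0,1), (0,2), (1,2); az/el may be arrays."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    # Unit vector pointing from the array toward the source.
    u = np.stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1)
    arrival = -(u @ mics.T) / SPEED_OF_SOUND   # relative arrival time at each mic
    return np.stack([arrival[..., 1] - arrival[..., 0],
                     arrival[..., 2] - arrival[..., 0],
                     arrival[..., 2] - arrival[..., 1]], axis=-1)


def locate(tdoas, mics, step=1.0):
    """Grid search for the (azimuth, elevation) whose theoretical TDOAs fit best."""
    az = np.arange(0.0, 360.0, step)
    el = np.arange(0.0, 90.0 + step, step)   # planar array: upper half-space only
    az_grid, el_grid = np.meshgrid(az, el, indexing="ij")
    err = np.sum((theoretical_tdoas(mics, az_grid, el_grid) - tdoas) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(err), err.shape)
    return az[i], el[j]


# Hypothetical layout: equilateral triangle with 0.2 m sides in the z = 0 plane.
MICS = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.1, 0.1732, 0.0]])
```

In use, `csp_tdoa` would be called once per microphone pair on each voiced frame, in the same pair order as `theoretical_tdoas`, and the three delays passed to `locate`. A grid search is only one simple way to compare measured and theoretical delays; the abstract does not say how the comparison is carried out.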