Skip to Main Content
This paper presents the results of simulation and real room studies for localization of a moving speaker using information about the excitation source of speech production. The first step in localization is the estimation of time-delay from speech collected by a pair of microphones. Methods for time-delay estimation generally use spectral features that correspond mostly to the shape of vocal tract during speech production. Spectral features are affected by degradations due to noise and reverberation. This paper proposes a method for localizing a speaker using features that arise from the excitation source during speech production. Experiments were conducted by simulating different noise and reverberation conditions to compare the performance of the time-delay estimation and source localization using the proposed method with the results obtained using the spectrum-based generalized cross correlation (GCC) methods. The results show that the proposed method shows lower number of discrepancies in the estimated time-delays. The bias, variance and the root mean square error (RMSE) of the proposed method is consistently equal or less than the GCC methods. The location of a moving speaker estimated using the time-delays obtained by the proposed method are closer to the actual values, than those obtained by the GCC method.