Framewise Multiple Sound Source Localization and Counting Using Binaural Spatial Audio Signals | IEEE Conference Publication | IEEE Xplore

Abstract:

Sound source localization is the problem of estimating the positions of one or more sound sources. In binaural audio, localization is a paramount perceptual characteristic that can be assessed subjectively or objectively. For objective evaluation of binaural sound localization, typical methods exploit binaural or monaural cues to predict the directions of sound sources. Since multiple sound sources are often perceived simultaneously in everyday sound scenes, an objective sound localization model that can detect temporally overlapping sources is required. In this paper, we propose a binaural multiple sound source localization network (BMSSLnet), which predicts framewise azimuths in a binaural audio signal without prior knowledge of the number of sound sources. We formulate multiple azimuth prediction as a multi-label classification task and propose a loss function that combines a separated multi-label cross-entropy with the mean squared error. Experimental results show that, with up to three temporally overlapping sources, the proposed model achieves an average precision of 0.9 on the anechoic dataset and 0.75 on the reverberant dataset for spatial prediction. Framewise temporal prediction is achieved with an average accuracy of 38.3 ms.

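The abstract frames multi-source azimuth estimation as framewise multi-label classification, which also yields a source count. The sketch below illustrates that general idea under stated assumptions; it is not BMSSLnet's implementation. The 5° azimuth grid, the 0.5 decision threshold, the weight `lam`, and all function names are hypothetical. The abstract does not define "separated" cross-entropy, so here it is read, as an assumption, as averaging the positive-class and negative-class terms separately so that the few active azimuth classes are not swamped by the many inactive ones. The MSE term is likewise assumed to compare predicted probabilities with the multi-hot target.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical azimuth grid: 72 classes at 5-degree resolution (an assumption,
# not the paper's configuration).
AZ_STEP_DEG = 5
NUM_CLASSES = 360 // AZ_STEP_DEG


def encode_frame(azimuths_deg: Sequence[float]) -> List[float]:
    """Multi-hot target for one frame: 1.0 at each active source's azimuth class."""
    target = [0.0] * NUM_CLASSES
    for az in azimuths_deg:
        idx = int(round((az % 360) / AZ_STEP_DEG)) % NUM_CLASSES
        target[idx] = 1.0
    return target


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (independent per class, so labels can co-occur)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def separated_bce(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Assumed reading of 'separated': average BCE over positive and negative classes separately."""
    pos = [-math.log(max(p, eps)) for p, t in zip(probs, target) if t == 1.0]
    neg = [-math.log(max(1.0 - p, eps)) for p, t in zip(probs, target) if t == 0.0]
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


def mse(probs: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between per-class probabilities and the multi-hot target."""
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(target)


def frame_loss(logits: Sequence[float], target: Sequence[float], lam: float = 1.0) -> float:
    """Hypothetical combined loss: separated BCE plus lam-weighted MSE."""
    probs = [sigmoid(z) for z in logits]
    return separated_bce(probs, target) + lam * mse(probs, target)


def decode_frame(logits: Sequence[float], threshold: float = 0.5) -> Tuple[int, List[int]]:
    """Return (source count, azimuths in degrees); the count is not needed in advance."""
    active = [i * AZ_STEP_DEG for i, z in enumerate(logits) if sigmoid(z) >= threshold]
    return len(active), active


if __name__ == "__main__":
    target = encode_frame([30.0, 270.0])
    # Toy logits: strongly positive at the two true classes, negative elsewhere.
    logits = [4.0 if t == 1.0 else -4.0 for t in target]
    print(decode_frame(logits))  # (2, [30, 270])
    print(frame_loss(logits, target))
```

Because each class has its own sigmoid rather than sharing a softmax, any number of azimuth classes can be active in a frame, and counting reduces to how many classes exceed the threshold.
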
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece