Delimiting the most informative voice segments of an acoustic signal is often a crucial first step for any speech processing system. In this work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches built on various forms of voice-activity detection (VAD), the proposed approach exploits higher-level perceptual information about the signal's intelligibility. This intelligibility-based classification is integrated into a novel multistream framework for the automatic speaker recognition task. The multistream system processes the input acoustic signal along multiple independent streams reflecting different intelligibility levels and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed multistream system achieves significant improvements in both clean and noisy conditions compared with a baseline and a state-of-the-art voice-activity detection algorithm.
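As one illustration of the fusion step described above, a weighted combination of per-stream decision scores might be sketched as follows. This is a minimal sketch, not the paper's actual fusion rule: the function name, the linear weighting scheme, and the example scores and weights are all assumptions introduced for illustration.

```python
def fuse_scores(stream_scores, intelligibility_weights):
    """Combine per-stream speaker-recognition scores as a weighted sum.

    stream_scores: decision scores, one per intelligibility stream.
    intelligibility_weights: non-negative weights reflecting each stream's
        estimated intelligibility contribution (normalized internally).
    """
    total = sum(intelligibility_weights)
    if total == 0:
        raise ValueError("intelligibility weights must not all be zero")
    # Weighted average: streams judged more intelligible contribute more
    # to the final decision score.
    return sum(s * w for s, w in zip(stream_scores, intelligibility_weights)) / total

# Hypothetical example: three streams (high / medium / low intelligibility).
fused = fuse_scores([2.0, 1.0, -0.5], [0.6, 0.3, 0.1])
```

In practice the weights could themselves come from the perception-based intelligibility estimates computed during segmentation, so that low-intelligibility streams are down-weighted rather than discarded outright as a hard VAD decision would do.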