This paper addresses single-channel speech separation for robust speech recognition in noisy environments. We propose applying shape analysis techniques from image analysis to a computational auditory scene analysis (CASA)-based speech separation system. The CASA-based system extracts the desired speech signal from a mixture of speech and other sound sources. In the proposed method, we fill in the missing speech components by applying shape analysis techniques such as labeling and distance functions. In a speech separation experiment using artificial mixtures of speech and various noise sources, the proposed system increases the signal-to-noise ratio by 6.6 dB. In a speech recognition experiment using the Interspeech speech database, it improves recognition accuracy by 22% for speech corrupted by speech-shaped stationary noise.
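The abstract does not say how labeling and the distance function are applied, so the following is only an illustrative sketch, not the authors' method. It assumes the CASA stage yields a binary time-frequency mask (rows as frequency channels, columns as time frames, 1 where target speech was detected). The function names `label_regions`, `distance_to_foreground`, and `complete_mask`, and the parameters `min_size` and `max_dist`, are hypothetical. Labeling groups connected mask units into regions so that very small regions can be discarded, and a city-block distance function then marks nearby empty units as candidates for restoration.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected regions of a binary mask; returns (labels, region count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def distance_to_foreground(mask):
    """City-block distance from every unit to the nearest nonzero unit (two passes)."""
    rows, cols = len(mask), len(mask[0])
    far = rows + cols  # larger than any possible city-block distance in the grid
    dist = [[0 if v else far for v in row] for row in mask]
    for r in range(rows):  # forward pass: propagate from top and left
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    for r in reversed(range(rows)):  # backward pass: propagate from bottom and right
        for c in reversed(range(cols)):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def complete_mask(mask, min_size=2, max_dist=1):
    """Assumed rule: drop regions smaller than min_size, then fill units within max_dist."""
    labels, count = label_regions(mask)
    sizes = [0] * (count + 1)
    for row in labels:
        for lab in row:
            sizes[lab] += 1
    cleaned = [[1 if lab and sizes[lab] >= min_size else 0 for lab in row]
               for row in labels]
    dist = distance_to_foreground(cleaned)
    return [[1 if d <= max_dist else 0 for d in row] for row in dist]


# Toy mask: two speech-like regions split by a one-frame gap, plus one isolated unit.
example_mask = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
completed = complete_mask(example_mask)
```

In this toy case the isolated unit is treated as noise and removed, while the one-frame gap between the two larger regions is filled. The fill rule used here amounts to a plain dilation, which also grows the regions outward; a real system would presumably use the labels and distances more selectively.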