Abstract:
Wave-U-Net is an end-to-end single-channel source separation method that works in the time domain and can therefore take phase information into account during separation. It has shown high performance in tasks such as singing voice separation and speech enhancement. We previously proposed an extension of Wave-U-Net to online processing with a short input using teacher-student learning. Since online Wave-U-Net processes input signals frame by frame, where the frames are segmented by applying a window function, the window length is generally the lower bound of the algorithmic delay. In this paper, based on the fact that the separation performance of online Wave-U-Net is concentrated at the center of the segment, we propose to reduce the algorithmic delay by applying windows with zero regions near the edges to online Wave-U-Net. Experimental results showed that the proposed method reduced the algorithmic delay by 40% relative to the conventional method while maintaining high speech enhancement performance, with a source-to-distortion ratio improvement of about 15 dB, thus enabling low-delay and high-performance speech enhancement.
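The idea of a window with zero regions near the edges can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper names `zero_edge_window` and `algorithmic_delay`, the Hann taper in the center, the frame length of 1024 samples, and the delay model (a sample is treated as final once the last frame that gives it a nonzero weight has been fully received, so the delay is the distance from the first nonzero window sample to the end of the frame).

```python
import numpy as np


def zero_edge_window(frame_len: int, zero_len: int) -> np.ndarray:
    """Hypothetical helper: Hann taper in the center, zero_len zeros at each edge."""
    core_len = frame_len - 2 * zero_len
    if zero_len < 0 or core_len <= 0:
        raise ValueError("zero_len must satisfy 0 <= 2 * zero_len < frame_len")
    # np.hanning has zero-valued endpoints; drop them so the core is strictly positive.
    core = np.hanning(core_len + 2)[1:-1]
    window = np.zeros(frame_len)
    window[zero_len:zero_len + core_len] = core
    return window


def algorithmic_delay(window: np.ndarray) -> int:
    """Assumed delay model: samples from the first nonzero weight to the frame end."""
    first_nonzero = int(np.flatnonzero(window)[0])
    return len(window) - first_nonzero


# Illustrative sizes only; these are not the settings used in the paper.
frame_len = 1024
conventional = zero_edge_window(frame_len, 0)
proposed_like = zero_edge_window(frame_len, 256)

baseline = algorithmic_delay(conventional)
reduced = algorithmic_delay(proposed_like)
print(f"delay without zero edges: {baseline} samples")
print(f"delay with zero edges:    {reduced} samples")
```

Under this assumed model, zeroing 256 samples at each edge of a 1024-sample frame shortens the delay from 1024 to 768 samples, because the leading zero region no longer has to be waited for. How the zero regions interact with the network output and with overlap between frames is not specified here and would follow the paper itself.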
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
Conference Location: Tokyo, Japan