Abstract:
Time-domain speech separation methods based on deep learning have achieved impressive performance. However, computational complexity, model size, and performance remain challenges for implementation on real-time, low-resource devices. In this paper, we introduce a lightweight yet effective network for speech separation, named SeliNet. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. In the bottleneck modules, depth-wise separable convolution significantly decreases the model size and computational cost, while squeeze-and-excitation uses a context vector to interact with the entire hidden state vector. The atrous temporal pyramid pooling recognizes long-time sequences of various lengths and extracts context at different fields of view. This helps SeliNet obtain impressive performance while maintaining a small computational cost and model size.
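As a rough illustration of the two ideas in the abstract, the sketch below counts the weights in a standard 1-D convolution against a depth-wise separable one, and computes the span covered by a dilated (atrous) kernel at several dilation rates. The channel count, kernel size, and dilation rates are hypothetical values chosen for illustration; they are not taken from the SeliNet paper, and the helper function names are likewise assumptions.

```python
def conv1d_params(c_in, c_out, kernel):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * kernel


def depthwise_separable_params(c_in, c_out, kernel):
    """Weight count of a depth-wise conv followed by a 1x1 point-wise conv."""
    depthwise = c_in * kernel   # one length-`kernel` filter per input channel
    pointwise = c_in * c_out    # 1x1 convolution that mixes channels
    return depthwise + pointwise


def receptive_field(kernel, dilation):
    """Number of time steps spanned by a single dilated (atrous) kernel."""
    return dilation * (kernel - 1) + 1


# Hypothetical sizes, not values reported for SeliNet.
channels, kernel = 128, 3
standard = conv1d_params(channels, channels, kernel)
separable = depthwise_separable_params(channels, channels, kernel)

# Hypothetical dilation rates for parallel pyramid branches.
dilations = [1, 2, 4, 8]
fields = [receptive_field(kernel, d) for d in dilations]

print(standard, separable)  # 49152 16768
print(fields)               # [3, 5, 9, 17]
```

Under these assumed sizes, the separable layer uses roughly a third of the weights of the standard layer, and the parallel dilation rates let branches with the same kernel size cover spans of different lengths.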
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023