Improved Encoder-Decoder Architecture With Human-Like Perception Attention for Monaural Speech Enhancement


Abstract:

Speech enhancement (SE) models based on deep neural networks (DNNs) have shown excellent denoising performance. However, mainstream SE models often have high structural complexity and large numbers of parameters, requiring substantial computational resources, which limits their practical application. In this paper, a high-efficiency encoder-decoder structure, inspired by the top-down attention mechanism in human brain perception and named the human-like perception attention network (HPANet), is proposed for monaural speech enhancement; it is designed to emulate the brain's perceptual attention in noisy environments. In HPANet, the raw waveform is first encoded by an attention encoder to capture shallow global features. These features are then downsampled, and multi-scale information is aggregated by a top attention module to prevent the loss of crucial information. Next, a down attention module integrates features from neighboring layers to reconstruct the signal in a top-down manner. Finally, the decoder reconstructs the denoised clean signal. Experiments show that the proposed method effectively reduces model complexity while maintaining competitive performance.
Published in: IEEE Signal Processing Letters (Volume: 32)
Page(s): 1670-1674
Date of Publication: 07 April 2025
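The data flow described in the abstract (attention encoder, downsampling, multi-scale aggregation at the top, top-down fusion of neighboring layers, decoder) can be sketched in plain Python. This is a toy illustration under loudly stated assumptions, not HPANet: it has no learned weights, the "attention" is a hypothetical sigmoid self-gate, downsampling is average pooling, upsampling is nearest-neighbour, and the decoder is a hypothetical multiplicative mask. All function names (`self_gate`, `pool_to`, `upsample_to`, `enhance`) are invented for illustration.

```python
import math
from typing import List

Signal = List[float]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def self_gate(x: Signal) -> Signal:
    """Hypothetical 'attention encoder': each sample reweighted by a sigmoid of itself."""
    return [v * sigmoid(v) for v in x]


def pool_to(x: Signal, length: int) -> Signal:
    """Average-pool x down to `length` samples (len(x) must be a multiple of length)."""
    factor = len(x) // length
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(length)]


def upsample_to(x: Signal, length: int) -> Signal:
    """Nearest-neighbour upsampling of x to `length` samples."""
    return [x[i * len(x) // length] for i in range(length)]


def enhance(noisy: Signal, num_levels: int = 3) -> Signal:
    """Hypothetical top-down encoder-decoder data flow (an assumption, not HPANet itself)."""
    n = len(noisy)
    if n % (2 ** (num_levels - 1)) != 0:
        raise ValueError("length must be divisible by 2**(num_levels - 1)")

    # 1. Encoder: shallow features from the raw waveform.
    encoded = self_gate(noisy)

    # 2. Downsample into a pyramid with lengths n, n/2, n/4, ...
    pyramid = [pool_to(encoded, n // 2 ** k) for k in range(num_levels)]

    # 3. Top aggregation: pool every scale to the coarsest resolution and average,
    #    so information from all scales reaches the top of the pyramid.
    coarse_len = len(pyramid[-1])
    pooled = [pool_to(level, coarse_len) for level in pyramid]
    top = [sum(vals) / num_levels for vals in zip(*pooled)]

    # 4. Top-down fusion: walk from coarse to fine, gating each finer level
    #    with the upsampled state from the level above it.
    state = top
    for level in reversed(pyramid[:-1]):
        up = upsample_to(state, len(level))
        state = [f * sigmoid(u) + u for f, u in zip(level, up)]

    # 5. Decoder: apply the fused features as a multiplicative mask on the input.
    return [x * sigmoid(h) for x, h in zip(noisy, state)]


if __name__ == "__main__":
    noisy = [math.sin(0.3 * t) + 0.1 * (-1) ** t for t in range(16)]
    clean_estimate = enhance(noisy)
    print(len(clean_estimate))  # 16
```

Because every mask value lies between 0 and 1, this sketch can only attenuate samples; a trained SE model would instead learn its encoder, attention, and decoder parameters from data.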