Codec-ASV: Exploring Neural Audio Codec For Speaker Representation Learning | IEEE Conference Publication | IEEE Xplore

Abstract:

Discrete speech representations have achieved significant success in a variety of speech-related tasks. Among these, the Neural Audio Codec (NAC), which serves as a compressed form of audio signals, has proven effective in speech AIGC applications. Moreover, we believe that speaker information is largely preserved during compression, since the reconstructed voice sounds almost identical to the original to human listeners. In this paper, we explore various training strategies and codec types for NAC-based speaker representation learning. Using ECAPA-TDNN as the model backbone, our approach achieves state-of-the-art performance with a 2.08% EER in NAC-based speaker verification scenarios. To better retain speaker information in the early, more compressed layers, we introduce mask-layer augmentation and embedding fusion techniques during training. Experimental results show the effectiveness of our methods, particularly when inference uses a limited number of codec layers.
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
