Abstract:
Balancing emotion preservation and privacy protection in voice anonymization presents a significant challenge, particularly due to the difficulty of effectively handling prosody, a key feature in speech. While preserving prosodic features in anonymized speech enhances emotional expression, it also increases the risk of leaking speaker information. To address this conflict, we propose a lightweight Emotion-Preserving Prosody Anonymization (EPPA) network, which extracts speaker-independent prosodic features to preserve speech emotion while converting them into another speaker’s style for anonymization. By combining EPPA with timbre cloning for anonymization while retaining speech content, we achieve a more balanced voice conversion. Evaluated using the Voice Privacy Challenge (VPC) 2024 metrics, our proposed EPPA, utilizing the closest center distance (CCD) anonymization strategy, demonstrates strong performance across emotional expression, content clarity, and privacy protection, achieving the highest ranking in both average and weighted ranks compared to the six baseline solutions.
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025