Multi-stage attention for fine-grained expressivity transfer in multispeaker text-to-speech system | IEEE Conference Publication | IEEE Xplore

Abstract:

The main goal of this work is to provide fine-grained transfer of expressivity to the voices of various speakers for whom no expressive speech data is available. Our approach conditions a multispeaker Tacotron 2 system on latent embeddings extracted from the phoneme sequence, the speaker identity, and a reference expressive Mel spectrogram. The proposed system uses attention modules to discover local and global expressivity attributes. Additionally, location-sensitive attention is applied in the decoder to learn the alignment between the phoneme sequence and the Mel spectrogram. In addition to conventional objective metrics for speech synthesis, we used cosine similarity and character error rate (CER) measures to evaluate the transfer of expressivity and intelligibility. The results demonstrate that the presented cosine similarity metric for speaker and expressivity is consistent with the subjective evaluation. Thus, using multiple evaluation measures provides a way to estimate the strength of the emotion and of the speaker's voice when expressivity is transferred to the target speaker's voice. The results also show that the presented fine-grained TTS systems perform better than the Tacotron 2-based baseline systems.
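
The two evaluation measures named in the abstract, cosine similarity and character error rate, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes that speaker or expressivity embeddings are already available as plain lists of floats (the abstract does not say how they are extracted) and that CER is computed as the character-level Levenshtein distance divided by the reference length, which is the usual definition. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Dynamic-programming Levenshtein distance, kept to two rows.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)
```

In an evaluation of this kind, the cosine similarity would typically be taken between an embedding of the synthesized speech and an embedding of the reference (speaker or expressive) speech, and the CER between the input text and a transcript of the synthesized audio. Which embedding model and which speech recognizer to use are assumptions left open here.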
Date of Conference: 29 August 2022 - 02 September 2022
Date Added to IEEE Xplore: 18 October 2022
Conference Location: Belgrade, Serbia
