On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

Abstract:

Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explore several aspects of TTS pruning: amount of finetuning data versus sparsity, TTS-Augmentation to utilize unspoken text, and combining knowledge distillation and pruning. Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility, with similar prosody. All of our experiments are conducted on publicly available models, and findings in this work are backed by large-scale subjective tests and objective measures. Code and 200 pruned models are made available to facilitate future research on efficiency in TTS¹.

Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 23-27 May 2022

Date Added to IEEE Xplore: 27 April 2022

DOI: 10.1109/ICASSP43922.2022.9747728

Conference Location: Singapore, Singapore
