Controllable Prosody Generation with Partial Inputs

Abstract:

We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).

Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 14-19 April 2024

Date Added to IEEE Xplore: 18 March 2024

DOI: 10.1109/ICASSP48485.2024.10446859

Conference Location: Seoul, Republic of Korea
