Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models

IEEE Conference Publication, IEEE Xplore