UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | IEEE Conference Publication | IEEE Xplore