Optimization of Multimodal Inputs Based on Diffusion Models: Zero-Shot Semantic Image Generation | IEEE Conference Publication | IEEE Xplore