Spectrogram Analysis Via Self-Attention for Realizing Cross-Model Visual-Audio Generation | IEEE Conference Publication