Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals | IEEE Conference Publication | IEEE Xplore