Deep multimodal semantic embeddings for speech and images (IEEE Conference Publication, IEEE Xplore)