Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags (IEEE Conference Publication)