Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings | IEEE Conference Publication | IEEE Xplore