Vision-Language Relational Transformer for Video-to-Text Generation | IEEE Journals & Magazine | IEEE Xplore