Transformer Based Multimodal Scene Recognition in Soccer Videos | IEEE Conference Publication | IEEE Xplore