Shift Swin Transformer Multimodal Networks for Action Recognition in Videos | IEEE Conference Publication | IEEE Xplore