Multimodal Token Fusion for Vision Transformers | IEEE Conference Publication | IEEE Xplore