Multimodality Self-distillation for Fast Inference of Vision and Language Pretrained Models | IEEE Journals & Magazine | IEEE Xplore