Learning Social Relationship From Videos via Pre-Trained Multimodal Transformer | IEEE Journals & Magazine | IEEE Xplore