12-in-1: Multi-Task Vision and Language Representation Learning | IEEE Conference Publication | IEEE Xplore