Multi-Modal Representation Learning with Text-Driven Soft Masks | IEEE Conference Publication | IEEE Xplore