Enhancing Vision-Language Pre-Training with Rich Supervisions | IEEE Conference Publication