Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos | IEEE Conference Publication | IEEE Xplore