Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos | IEEE Conference Publication | IEEE Xplore