Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | IEEE Conference Publication