Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention | IEEE Journals & Magazine | IEEE Xplore