A Masked Reference Token Supervision based Iterative Visual-language Framework for Robust Visual Grounding | IEEE Journals & Magazine | IEEE Xplore