Referring Segmentation in Images and Videos With Cross-Modal Self-Attention Network | IEEE Journals & Magazine | IEEE Xplore