Cross-Aware Early Fusion With Stage-Divided Vision and Language Transformer Encoders for Referring Image Segmentation