Cross-modal Spectral Fusion Model for Referring Video Object Segmentation | IEEE Conference Publication | IEEE Xplore