Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation | IEEE Conference Publication | IEEE Xplore