Conditional Video-Text Reconstruction Network with Cauchy Mask for Weakly Supervised Temporal Sentence Grounding | IEEE Conference Publication | IEEE Xplore