Video-Text Representation Learning via Differentiable Weak Temporal Alignment | IEEE Conference Publication | IEEE Xplore