Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning | IEEE Conference Publication | IEEE Xplore