Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos | IEEE Conference Publication | IEEE Xplore