Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos | IEEE Conference Publication | IEEE Xplore