Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | IEEE Conference Publication | IEEE Xplore