Multimodal Representation Learning by Hybrid Transformer with Multi-level Fusion | IEEE Conference Publication