Detecting Expressions with Multimodal Transformers | IEEE Conference Publication | IEEE Xplore