Distributed Audio-Visual Parsing Based On Multimodal Transformer and Deep Joint Source Channel Coding | IEEE Conference Publication