DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-based Segmentation Networks | IEEE Conference Publication