MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output | IEEE Conference Publication | IEEE Xplore