Multi-Modal Large Language Models are Effective Vision Learners | IEEE Conference Publication | IEEE Xplore