InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

IEEE Conference Publication | IEEE Xplore