A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions | IEEE Conference Publication | IEEE Xplore