Is CLIP the main roadblock for fine-grained open-world perception? | IEEE Conference Publication | IEEE Xplore