Video OWL-ViT: Temporally-consistent open-world localization in video | IEEE Conference Publication | IEEE Xplore