VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | IEEE Conference Publication | IEEE Xplore