Sketch, Ground, and Refine: Top-Down Dense Video Captioning | IEEE Conference Publication | IEEE Xplore