Jointly Localizing and Describing Events for Dense Video Captioning | IEEE Conference Publication | IEEE Xplore