End-to-End Dense Video Captioning with Masked Transformer | IEEE Conference Publication | IEEE Xplore