Sequence-Aware Learnable Sparse Mask for Frame-Selectable End-to-End Dense Video Captioning for IoT Smart Cameras | IEEE Journals & Magazine | IEEE Xplore