Variational Stacked Local Attention Networks for Diverse Video Captioning | IEEE Conference Publication | IEEE Xplore