Dual-Scale Alignment-Based Transformer on Linguistic Skeleton Tags for Non-Autoregressive Video Captioning (IEEE Conference Publication, IEEE Xplore)