End-to-End Video Captioning Based on Multiview Semantic Alignment for Human–Machine Fusion | IEEE Journals & Magazine | IEEE Xplore