MVC: Multi-stage video caption generation model based on multi-modality (IEEE Conference Publication, IEEE Xplore)