Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering | IEEE Conference Publication | IEEE Xplore