Text-Guided Object Detector for Multi-modal Video Question Answering (IEEE Conference Publication)