MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering | IEEE Conference Publication