Leveraging Local Temporal Information for Multimodal Scene Classification | IEEE Conference Publication | IEEE Xplore