CLIP4VideoCap: Rethinking Clip for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge | IEEE Conference Publication | IEEE Xplore