Clover: Towards A Unified Video-Language Alignment and Fusion Model | IEEE Conference Publication | IEEE Xplore