Enhancing Video-Language Representations With Structural Spatio-Temporal Alignment | IEEE Journals & Magazine | IEEE Xplore