VicTR: Video-conditioned Text Representations for Activity Recognition | IEEE Conference Publication | IEEE Xplore