Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | IEEE Conference Publication | IEEE Xplore