Video Scene Searching using Modified Bi-Modal Transformer | IEEE Conference Publication | IEEE Xplore