Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features | IEEE Conference Publication | IEEE Xplore