Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking | IEEE Conference Publication | IEEE Xplore