I. Introduction
Text-to-image person retrieval aims to retrieve, from a large image gallery, the person of interest who best matches a given textual description query [1]. Textual descriptions provide a natural and comprehensive way to describe a person's attributes and are more easily accessible than images. As a result, text-to-image person retrieval has received increasing attention in recent years, benefiting a variety of applications ranging from personal photo album search to public security.