Protein sub-cellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites. Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a useful feature for localization prediction.