1. Introduction
Deep metric learning (DML) seeks embeddings that al-low a predefined distance metric to not only express se-mantic similarities between training samples, but to also transfers to unseen classes. The ability to learn compact image representations that generalize well and transfer in zero-shot manner to unseen test data distributions is crucial for a wide range of visual perception tasks such as visual retrieval [51], [63], image classification [44], [80], [88], clustering [7], [28], or person (re-)identification [11], [27], [69].