I. Introduction
The success of ImageNet has established a standard paradigm for image recognition. Specifically, a neural network is typically first pretrained on ImageNet to obtain a set of pretrained weights, which are then finetuned on a smaller, task-specific data set to obtain the final weights for each target task [see Fig. 1(a)]. This paradigm has led to state-of-the-art performance in a wide range of computer vision tasks, including person re-identification (re-ID) [1], human attribute recognition (e.g., age estimation and gender recognition) [2], and image classification [3].
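As a toy illustration of weight-only transfer (pretrain on a large source task, then finetune the same weights on a small target task), the sketch below uses a linear regression model trained with plain SGD in place of a deep network on ImageNet. Everything in it is a hypothetical stand-in chosen for the example, not the setup used in this work: the helpers `predict`, `make_task`, `train`, and `mse`, the ground-truth weights `SOURCE_W` and `TARGET_W`, the data set sizes, and the learning rate. A real pipeline would also usually replace the classification head before finetuning; the toy model has no separate head.

```python
import random
from typing import List, Tuple

Weights = List[float]                      # [w_1, ..., w_d, bias]
Dataset = List[Tuple[List[float], float]]

def predict(w: Weights, x: List[float]) -> float:
    """Linear model: dot(w[:-1], x) + bias."""
    return sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]

def make_task(true_w: Weights, n: int, seed: int) -> Dataset:
    """Hypothetical toy regression task generated from ground-truth weights."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(len(true_w) - 1)]
        data.append((x, predict(true_w, x) + rng.gauss(0.0, 0.01)))
    return data

def train(w_init: Weights, data: Dataset, lr: float, epochs: int) -> Weights:
    """Plain SGD on squared error, starting from w_init (copied, not mutated)."""
    w = list(w_init)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y        # gradient of 0.5 * err**2 w.r.t. the prediction
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w

def mse(w: Weights, data: Dataset) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical large source task (stands in for ImageNet) and a related, small target task.
SOURCE_W = [1.0, -2.0, 0.5, 0.3]
TARGET_W = [1.1, -1.9, 0.5, 0.4]
source = make_task(SOURCE_W, n=200, seed=0)
target_train = make_task(TARGET_W, n=10, seed=1)
target_test = make_task(TARGET_W, n=200, seed=2)

# Step 1: pretrain on the source task to obtain the weights that will be transferred.
pretrained = train([0.0] * 4, source, lr=0.1, epochs=20)

# Step 2: finetune the pretrained weights on the small target set, and compare with
# training from scratch under the same small training budget.
finetuned = train(pretrained, target_train, lr=0.1, epochs=5)
scratch = train([0.0] * 4, target_train, lr=0.1, epochs=5)

print(f"finetuned test MSE: {mse(finetuned, target_test):.4f}")
print(f"scratch   test MSE: {mse(scratch, target_test):.4f}")
```

Because the source and target tasks are related, the finetuned weights start close to the target solution and reach a lower held-out error than training from scratch with the same number of target updates. Only the weights move between tasks here; the model structure is fixed in advance.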
Fig. 1. Comparison between (a) WP&F and (b) the proposed NTAA framework. In WP&F, only the network weights are transferred from the source task to the downstream target tasks. In NTAA, both the network weights and the network architecture are transferred from the source task to the downstream target tasks.