A Few-Shot Learning Method Using Feature Reparameterization and Dual-Distance Metric Learning for Object Re-Identification

Many object re-identification (Re-ID) methods that depend on large-scale training datasets have been proposed in recent years. However, the performance of these methods degrades dramatically when insufficient training data are available. To address this challenging problem, we propose a few-shot object re-identification (FSOR) method that enhances the generalization and discrimination abilities of object Re-ID models trained on small datasets. This method applies two novel techniques: reparameterization for feature vectors and dual-distance metric learning. The reparameterization mechanism transforms the primary feature vector of each input image into a Gaussian distribution to enhance the robustness of the FSOR method when performing object Re-ID tasks. The dual-distance metric learning technique, called H&C learning, considers both the hard mining distance and the center-point distance between each query sample and each support set of different object identities. H&C learning extracts the characteristics of the entire training dataset more precisely than other approaches and thus improves the discriminative abilities of object Re-ID models. Extensive experiments on both person and vehicle Re-ID datasets, such as Market-1501, DukeMTMC-ReID, CUHK03, and VeRi-776, show that the FSOR method has improved performance and outperforms state-of-the-art methods when the amount of labeled training data is small.


I. INTRODUCTION
As the demand for intelligent video surveillance has increased, object re-identification (Re-ID), which retrieves an object of interest from a large image gallery dataset across multiple nonoverlapping cameras, has become an important computer vision task. This task is challenging due to different camera viewpoints [1], varying image resolutions [2], illumination changes, unconstrained poses [3], image occlusion, and significant background changes. Generally, building an object Re-ID system for a specific scenario requires five main steps [4]. The data collection step involves the collection of video data from multiple nonoverlapping cameras, but these raw data are likely to contain considerably complex and noisy background clutter. The object extraction step extracts object bounding boxes from the collected video data through an object detection or tracking algorithm. The data annotation step labels the extracted object images; this is usually a time-consuming process. The model training step constructs a discriminative and robust Re-ID model using the annotated object images. The object retrieval step generates a ranked list of object images from a large-scale gallery dataset for a given query regarding an object of interest by sorting based on the similarity between the query image and each gallery image. Note that the data annotation and model training steps are invoked only during the learning phase.
The associate editor coordinating the review of this manuscript and approving it for publication was Lefei Zhang.
Most existing object Re-ID models depend on large-scale labeled training data to learn how to distinguish between objects with different identities. Obtaining this large amount of labeled training data requires tedious data collection and time-consuming annotation processes, which lead to poor scalability in real-world Re-ID applications. It is challenging to annotate the identities of objects in a large-scale cross-camera dataset because many similar objects exist in the dataset and because indistinguishable object images are captured by the cameras under varying conditions. These factors make scaling a Re-ID system to a large camera network difficult. Because the majority of Re-ID datasets provide few images for each individual object, deep learning-based models usually suffer from a lack of training data and from performance degradation resulting from overfitting. Furthermore, when training data are scarce, some object IDs in the testing dataset are unlikely to appear in the training dataset. Therefore, a method for training object Re-ID models should possess excellent generalizability, allowing the models to identify object IDs that are not included in the training dataset.
To tackle these problems, one intuitive approach is to use transfer learning [5] to retrain an existing model for a different application with a new dataset. However, when applying transfer learning for object Re-ID, the constructed model can still easily overfit the new training data. Another approach is to use a few-shot learning scheme [6] that can rapidly generalize the trained model from only a few labeled samples for each target object class. However, an effective few-shot object Re-ID model must have sufficient generalization and discrimination abilities. Good generalizability can allow the model to avoid the overfitting problem when utilizing limited training data and enables the Re-ID model to identify objects that do not appear in the training set. Good discriminability allows the Re-ID model to learn discriminative features and handle drastic viewpoint changes with few training data.
The proposed few-shot object re-identification (FSOR) method is a few-shot learning approach that applies some novel techniques to construct object Re-ID models with superior generalization and discrimination abilities. First, a reparameterization approach, which is derived from the reparameterization trick proposed for variational auto-encoders (VAEs) [7], is adopted to transform the primary feature vector of each input image extracted by a convolutional neural network (CNN) (a ResNet-50 model) into a Gaussian distribution. This approach helps the constructed Re-ID model avoid overfitting and enhances its generalizability when re-identifying objects that were not seen during training. Second, we propose a dual-distance metric learning approach that evaluates both the hard-sample distance and the center-point distance between the support dataset of each object identity and the query sample. This approach, called H&C learning, is useful for enhancing the discrimination ability of the constructed Re-ID model. In addition, we apply two data augmentation approaches to increase the richness and diversity of the training data for FSOR. Padding and random crop approaches are used to make the trained Re-ID model more adaptable to the positions of identified objects in the images. A random erasing approach [8] is used to increase the robustness of the trained Re-ID model to object occlusion. We also adopt some training tricks to improve the learning results. A warmup learning rate strategy [9] is adopted to bootstrap the FSOR model by dynamically changing the learning rate during training.
A label smoothing strategy [10] is used to prevent the Re-ID model from overfitting the object IDs in the training dataset by changing the prediction logits term in the ID loss to reduce the weight of the ground-truth label. The ID prediction logits $q_i$, which are the output of the neural network, can be computed by $q_i = \exp(p_i) / \sum_{l=1}^{K} \exp(p_l)$, where $p_i$ denotes the predicted score for class $i$ and $K$ denotes the number of labels.
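To make the softmax expression and the effect of label smoothing concrete, the following minimal Python sketch computes $q_i$ from raw scores and evaluates a smoothed cross-entropy. The smoothing rule used here (weight $1 - \varepsilon$ on the true class plus $\varepsilon$ spread uniformly over all $K$ classes) is an assumption for illustration; the exact rule used by FSOR is not stated in this section, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores p_i into probabilities q_i."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(p - shift) for p in scores]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(true_class, num_classes, eps=0.1):
    """Assumed rule: 1 - eps on the true class plus eps spread uniformly."""
    base = eps / num_classes
    return [base + (1.0 - eps if k == true_class else 0.0)
            for k in range(num_classes)]

def id_loss(scores, true_class, eps=0.1):
    """Cross-entropy between the smoothed targets and softmax(scores)."""
    q = softmax(scores)
    y = smoothed_targets(true_class, len(scores), eps)
    return -sum(y_k * math.log(q_k) for y_k, q_k in zip(y, q))

scores = [2.0, 0.5, -1.0]          # toy scores for K = 3 identities
probs = softmax(scores)
loss_plain = id_loss(scores, true_class=0, eps=0.0)
loss_smooth = id_loss(scores, true_class=0, eps=0.1)
```

With smoothing, a confident and correct prediction still incurs some loss, which is the mechanism that discourages the classifier from overfitting the training identities.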
In summary, the main contributions of this paper include the following:
• The FSOR method can efficiently construct object Re-ID models without tedious data collection and time-consuming annotation processes.
• FSOR guarantees model discrimination and generalization abilities when performing object Re-ID tasks through a novel few-shot learning model that includes two performance-improving mechanisms: a reparameterization mechanism that causes the FSOR approach to be more adaptive to re-identification for objects not included in the training data and the H&C metric learning mechanism, which makes the FSOR method more discriminative.
• The superior performance of FSOR is confirmed with various widely-used person Re-ID and vehicle Re-ID datasets and a comparison with some state-of-the-art methods.

II. RELATED WORKS
Recent studies on object Re-ID have mostly focused on deep CNNs, which learn the identity-discriminative features of object images. The most commonly used feature representation methods can be classified into global and local representation schemes [11], [12]. Global schemes extract features that represent entire object images, while local schemes extract features that represent critical parts of object images.
The deep learning methods that use global features for object Re-ID can be roughly divided into two main types according to the loss functions that they use: classification loss or metric loss functions [13]. When using the classification loss function, a Re-ID model is trained with the same object identities as those in the image classification task. For example, ID-discriminative embedding (IDE) [14] combines both an identity model and a verification model to train a Re-ID model. CamStyle [15] uses a generative model to perform data augmentation, which changes the image style between different cameras. It also uses classification loss for the Re-ID model. Several learning methods that use metric loss functions have also been proposed to construct object Re-ID models. Wang et al. [16] proposed a network model to extract feature maps with multiple scales from different stages of the backbone network and utilized the acquired feature maps to obtain an advanced result. TriNet [17] proposed a batch selection method for hard triplet samples to train a person Re-ID model according to the triplet loss. This type of loss function learns the relationships between triplet samples, including an anchor sample, positive sample and negative sample, from a distance function that measures the similarity between a pair of samples [18]. In [19], a robust person Re-ID model was learned with a Fast-Approximated Triplet (FAT) loss that converts a point-wise triplet loss into a VOLUME 9, 2021  [20] evaluates the hard mining distances of hard samples in each identity to construct an object Re-ID model. The BagTricks [21], [22] model was proposed, which combines a classification loss and deep metric loss to achieve better performance. The LiftedStructure [23] is a method that lifts the vector of pairwise distances within the batch to the matrix of pairwise distances. 
It helps to learn a state-of-the-art feature embedding by optimizing a novel structured prediction objective on the lifted problem. More recently, Sun et al. [24] proposed a circle loss to maximize the within-class similarity and minimize the between-class similarity. Proxy anchor [25] combines the advantages of pair-based and proxy-based losses; it speeds up convergence and is robust against noisy labels and outliers. Khosla et al. [26] proposed a fully supervised contrastive method to effectively leverage label information. The asymmetric weighted logistic metric learning (AWLML) method [27] constructs a logistic metric-learning approach that uses an objective function with a positive semidefinite constraint to learn the metric matrix from a set of labeled samples. Then, an asymmetric weighted strategy is adopted to address the imbalance between the numbers of target and background samples.
To avoid the need for a large, labeled training dataset, Xin et al. [28] proposed a self-paced multi-view clustering (SPMVC) method, which is a semi-supervised person Re-ID model trained with a small amount of labeled data and a large amount of unlabeled data. SPMVC performs the object Re-ID task using a heterogeneous set of CNNs initialized by the labeled training samples. Then, these models assign pseudo labels to the unlabeled training data step by step to further fine-tune all the constructed CNNs together with the original labeled training samples. In contrast, few-shot Re-ID models use only a small labeled dataset for training. Few-shot learning [6] aims to enhance model generalizability and avoid overfitting while retaining a good discriminative capability. The existing few-shot learning methods can be roughly divided into model-based, metric-based, and optimization-based categories. For example, the memory-augmented neural network model [29], which uses external memory as short-term memory and slowly updated weights as long-term memory, is a model-based method. This model learns strategies for storing expressions in memory and learns how to use these expressions to make predictions. Metric-based methods learn the relationships between samples of different object classes by training an end-to-end few-shot classifier with a nonparametric scheme. In contrast, a parametric scheme must optimize tens of thousands of parameters in the neural network classifier; therefore, it will almost certainly overfit in situations with few data samples. Matching networks [30], prototypical networks [31] and relation networks [32] are some examples of metric-based methods. Unlike conventional transfer learning, optimization-based methods, such as model-agnostic meta-learning (MAML) [33], learn a beneficial common initialization for transfer learning. In summary, Table 1 lists the method type, main idea, and research gap of the existing methods.
Our method is proposed to close the research gaps in overfitting avoidance and discrimination ability left by the existing methods.

III. LEARNING FRAMEWORK FOR FEW-SHOT OBJECT RE-IDENTIFICATION (FSOR)
As mentioned in sections I and II, there are several problems with object Re-ID in cases with few labeled data. When the amount of training data decreases, the model overfits these few data, resulting in low generalization ability. In addition, most existing object Re-ID models suffer from low discrimination ability. Our FSOR method aims to solve these problems to create object Re-ID models with superior generalization and discrimination abilities. The whole flow chart of FSOR is illustrated in Figure 1. The learning framework of FSOR is illustrated in Figure 2, and its detailed learning procedure is illustrated in Figure 3. The input images are first augmented using random erasing, padding, and random crop techniques. Next, the backbone network (ResNet-50 in this case) extracts primary feature vectors, and then a reparameterization mechanism is used to transform these primary feature vectors so that they conform to a Gaussian distribution. This process forms a continuous feature space that allows the object Re-ID model to be more generalizable and cover object images not included in the training dataset. During the learning phase, the training samples in each batch are divided into a query set and support sets containing different object identities. The H&C metric learning mechanism is first invoked to acquire the relationships between each query sample and the support sets. This learning mechanism possesses the ability to make the feature vectors of objects with the same identity closer while making those with different identities further apart. Then, a batch normalization layer is used to separate the feature vectors used for the metric loss and classification loss (ID loss) [22] because they are inconsistent in a single embedding space. The batch normalization layer optimizes these two losses in two different embedding spaces. 
Finally, a fully connected layer with a softmax function is implemented as a classifier to learn the association between each sample and its identity. To improve the learning efficiency of the model, the ID loss with label smoothing is used to predict the identity of each image. The KL loss, metric learning loss, and ID loss are all used when fine-tuning the parameters of the feature extractor $F$ and the linear layers $l_{mean}$ and $l_{variance}$. However, when fine-tuning the parameters of the fully connected layer, only the ID loss is used. The testing procedure is introduced in Section IV.

A. FEATURE VECTOR REPARAMETERIZATION
To enhance the generalization ability of the model, we use the reparameterization trick proposed for variational auto-encoders (VAEs) [7], which forces the feature distribution to follow a normal distribution, so that the model adapts to data not seen in the training set. As shown in Figure 2, the FSOR method uses a ResNet-50 [34] model pretrained with ImageNet as the backbone network for feature extraction. This network receives a 256 × 128 input image and outputs a 2048-dimensional feature vector. To enrich the feature granularity, the last spatial down-sampling operation of the ResNet-50 backbone is removed, that is, the last stride is reduced from 2 to 1. This removal increases the spatial resolution of each feature map from 8 × 4 to 16 × 8. For each feature vector $f$ acquired from the ResNet-50 backbone, the proposed reparameterization mechanism ($R$) invokes two independent linear layers to generate the $\mu$ and $\sigma$ vectors using (1) and (2), respectively:

$$\mu = w_\mu f + b_\mu, \tag{1}$$

$$\sigma = w_\sigma f + b_\sigma, \tag{2}$$

where $w_\mu$ and $b_\mu$ are the trainable parameter and bias of the linear layer $l_{mean}$ used to generate $\mu$, respectively, while $w_\sigma$ and $b_\sigma$ denote the trainable parameter and bias of the linear layer $l_{variance}$ used to generate $\sigma$, respectively. Using $\mu$, $\sigma$ and an additional noise vector $v$ sampled from a normal distribution, a new feature vector $z$ is generated by (3):

$$z = \mu + \exp(\sigma / 2) \odot v, \tag{3}$$

where $\mu$ and $\exp(\sigma)$ denote the mean and variance of a Gaussian distribution devoted to $f$, respectively. The exponential operation on $\sigma$ is invoked to ensure a non-negative variance. All the $\mu$, $\sigma$, $v$, and $z$ vectors have 2,048 dimensions. To make all the $z$ vectors conform to a Gaussian distribution, we minimize the Kullback-Leibler (KL) divergence between the Gaussian distribution defined by $\mu$ and $\sigma$ and the standard normal distribution according to the KL divergence loss $L_{KL}$, also used in VAEs, as shown in (4a) and (4b):

$$L_{KL} = \frac{1}{bs} \sum_{i=1}^{bs} KL_i, \tag{4a}$$

$$KL_i = -\frac{1}{2} \sum_{j=1}^{J} \left( 1 + \sigma_{i,j} - \mu_{i,j}^2 - \exp(\sigma_{i,j}) \right), \tag{4b}$$

where $bs$ denotes the number of training samples and $J$ is the dimensionality of $z$, $\mu$ and $\sigma$.

FIGURE 4. The H&C metric learning method. The query-center distance is useful for conforming to the property of the whole support set, and the hard mining distance is useful for making the query sample closer to the "hard" sample, which can improve the convergence speed of the model.
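The reparameterization step and its KL penalty can be illustrated with a small pure-Python sketch. It assumes the standard VAE convention in which $\sigma$ is a log-variance, so the sampled feature is $\mu$ plus $\exp(\sigma/2)$ times standard normal noise; the toy weights, the 3-dimensional input, and all function names are hypothetical and stand in for the 2,048-dimensional layers described above.

```python
import math
import random

def linear(weights, bias, f):
    """Apply a linear layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, f)) + b
            for row, b in zip(weights, bias)]

def reparameterize(mu, sigma, rng):
    """Sample z = mu + exp(sigma / 2) * v with v ~ N(0, 1).

    sigma is treated as a log-variance (an assumption of this sketch).
    """
    return [m + math.exp(s / 2.0) * rng.gauss(0.0, 1.0)
            for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, exp(sigma)) || N(0, 1)), summed over all dimensions."""
    return -0.5 * sum(1.0 + s - m * m - math.exp(s)
                      for m, s in zip(mu, sigma))

rng = random.Random(0)
f = [0.2, -0.4, 1.0]                            # toy primary feature vector
w_mu = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # hypothetical l_mean weights
w_sigma = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # hypothetical l_variance weights
mu = linear(w_mu, [0.0, 0.0], f)
sigma = linear(w_sigma, [0.0, 0.0], f)
z = reparameterize(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)
```

The KL term is zero only when $\mu = 0$ and $\sigma = 0$, so minimizing it pulls every per-image distribution toward the standard normal and keeps the feature space continuous.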

B. H&C METRIC LEARNING
To enhance the discriminative ability of the FSOR method on the object Re-ID task, we develop the H&C metric learning method, which learns a distance metric that can precisely determine the similarities between objects. This method uses the negative log likelihood (NLL) loss, which simultaneously evaluates both the hard mining distance and the query-center distance.

1) HARD MINING DISTANCE
The hard mining distance is used to find hard samples in each batch to produce substantial gradients from very few data points. Using hard samples rather than randomly selected samples for model training accelerates convergence, as mentioned in [20], because the model obtains more useful information and is guided to reduce the loss value efficiently. For this reason, hard sample mining is an important aspect of several metric learning methods. For each query sample, the furthest support sample with the same ID is defined as a positive hard sample, while the closest support sample of each different ID is defined as a negative hard sample. Then, the hard mining distance between a query sample and the support set of a specific ID, $D_h(q_j^m, S^n)$, can be computed as follows:

$$D_h(q_j^m, S^n) = \begin{cases} \max_{i} \, dis(q_j^m, s_i^n), & m = n, \\ \min_{i} \, dis(q_j^m, s_i^n), & m \neq n, \end{cases} \tag{5}$$

where $q_j^m$ denotes the query sample with ID $m$, $S^n$ denotes the support set of ID $n$, $s_i^n \in S^n$, $n$ denotes the ID index, and $i$ denotes the index of a support sample in $S^n$. The distance function $dis(x, y)$ estimates the Euclidean distance between the feature vectors of $x$ and $y$ as follows:

$$dis(x, y) = d(z_x, z_y) = d(R(F(x)), R(F(y))), \tag{6}$$

where $d$ is the Euclidean distance, $R$ is the function in (3) and $F$ is a trainable ResNet-50 feature extractor.
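The selection rule for hard samples can be sketched in a few lines of Python. The sketch assumes that the feature vectors have already been extracted and reparameterized, so plain lists of floats stand in for the embedded images; the function names and toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hard_mining_distance(query, query_id, support, support_id):
    """Furthest support sample if the IDs match, closest one otherwise."""
    dists = [euclidean(query, s) for s in support]
    return max(dists) if query_id == support_id else min(dists)

# Toy 2-D embeddings for two identities.
support_sets = {
    "A": [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]],
    "B": [[5.0, 0.0], [3.0, 0.0]],
}
query, query_id = [0.0, 0.0], "A"
d_pos = hard_mining_distance(query, query_id, support_sets["A"], "A")
d_neg = hard_mining_distance(query, query_id, support_sets["B"], "B")
```

In this toy case the positive hard distance is 2.0 and the negative hard distance is 3.0, so the hardest positive is still closer to the query than the hardest negative.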

2) QUERY-CENTER DISTANCE
The query-center distance is a set-based distance from the query sample to the center point of the support set of a specific ID. Because the center point represents the properties of the entire support set, the query-center distance is useful for learning the overall relationship between a query sample and the support set for a specific ID. We define the mean of all the samples in the support set as its center point; this approach regards each support sample as having the same influence on the query sample.
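The query-center distance $D_c(q_j^m, S^n)$ is computed as follows:

$$D_c(q_j^m, S^n) = d(z_{q_j^m}, c^n),$$

where $d$ and $z$ are defined as in (6) and $c^n$ denotes the center feature vector of ID $n$ in each batch, which is computed as

$$c^n = \frac{1}{M} \sum_{i=1}^{M} z_{s_i^n},$$

where $M$ denotes the number of images of ID $n$ in each batch.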
In both the hard mining distance and query-center distance schemes, a query sample is assigned the same ID as that of the point closest to it.
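As an illustration of the query-center distance and the nearest-point assignment rule, the following sketch computes the center of each support set as the unweighted mean of its feature vectors and assigns the query to the ID with the closest center. Feature vectors are assumed to be already embedded, and the helper names and toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def center(support):
    """Mean of the support vectors; every sample has the same weight."""
    count = len(support)
    return [sum(column) / count for column in zip(*support)]

def query_center_distance(query, support):
    """Distance from the query to the center point of one support set."""
    return euclidean(query, center(support))

def predict_id(query, support_sets):
    """Assign the ID whose support-set center is closest to the query."""
    return min(support_sets,
               key=lambda pid: query_center_distance(query, support_sets[pid]))

support_sets = {
    "A": [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]],
    "B": [[6.0, 0.0], [8.0, 0.0]],
}
query = [1.0, 0.0]
center_a = center(support_sets["A"])
predicted = predict_id(query, support_sets)
```

Because the center averages all support samples, a single unusual sample shifts it only slightly, which is why this distance captures the overall relationship between the query and the set.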

3) DUAL-DISTANCE METRIC LEARNING
The H&C metric learning method combines the advantages of the hard mining distance and query-center distance, as shown in Figure 4. The hard mining distance improves the convergence speed, and the query-center distance learns the overall relationships between the query samples and the support sets.
The hard mining distance learns from the hard samples to distinguish similar samples with different IDs or dissimilar samples with the same ID. However, the hard samples cannot represent the properties of the entire set of support samples, so this approach might ignore the influence of other samples in the support set. To address this problem, the H&C method simultaneously learns the overall effect of all the support samples using the query-center distance and learns from the extreme samples based on the hard mining distance. The loss functions for learning with the hard mining distance, $L_{HM}$, and the query-center distance, $L_{CM}$, are NLL losses computed over the query samples in each batch. In these losses, $n_q$ is the number of query samples, $q_j^m$ denotes the $j$th of the $n_q$ query samples with ID $m$, $N$ is the number of IDs in each batch, and $n$ denotes the ID index from 1 to $N$. In addition, $\varepsilon$ is a hyper-parameter of the label smoothing process applied to the query-center distance, which prevents overfitting when learning with the query-center distance. Label smoothing is not necessary for the hard mining distance because the hard samples are clearly distinguished from the query samples.
In summary, the total loss of the H&C metric learning function, $L_{ML}$, is as follows:

$$L_{ML} = \lambda_{hm} L_{HM} + \lambda_{cm} L_{CM},$$

where $\lambda_{hm}$ and $\lambda_{cm}$ are hyper-parameters.
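One plausible way to turn the two distances into NLL losses is sketched below. It assumes a softmax over negative distances, as used in prototypical networks [31], and uniform label smoothing on the query-center branch only; both choices, the default weights, and all function names are assumptions of this sketch rather than the exact formulation of the method.

```python
import math

def log_softmax_neg(dists):
    """Log of softmax(-d): closer support sets get higher probability."""
    logits = [-d for d in dists]
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_norm for l in logits]

def nll(dists, true_index, eps=0.0):
    """Negative log-likelihood with optional uniform label smoothing."""
    logp = log_softmax_neg(dists)
    n = len(dists)
    targets = [eps / n + (1.0 - eps if k == true_index else 0.0)
               for k in range(n)]
    return -sum(t * lp for t, lp in zip(targets, logp))

def hc_loss(hard_dists, center_dists, true_index,
            lam_hm=1.0, lam_cm=1.0, eps=0.1):
    """Weighted sum of the hard-mining and query-center NLL terms."""
    l_hm = nll(hard_dists, true_index)          # no smoothing on this branch
    l_cm = nll(center_dists, true_index, eps)   # smoothing on this branch
    return lam_hm * l_hm + lam_cm * l_cm

# Distances from one query (true ID at index 0) to three support sets.
good = hc_loss([1.0, 4.0, 5.0], [0.5, 4.5, 6.0], true_index=0)
bad = hc_loss([4.0, 1.0, 5.0], [4.5, 0.5, 6.0], true_index=0)
```

The loss is small when the query is closest to its own support set under both distances and grows when a different identity is closer, which is the behavior the dual-distance objective is meant to reward.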
In addition, we remove the outliers from each support set to prevent noisy samples from being selected as hard samples, which might lead to an incorrect gradient. We calculate the mean and standard deviation of the distances between all support samples and the center point. Then, each support sample whose distance from the center point is greater than a threshold, as formulated in (11), is ignored when selecting the hard samples:

$$dis(s_i^n, c^n) > \bar{d}^n + \delta \, \sigma_d^n, \tag{11}$$

where $\bar{d}^n$ and $\sigma_d^n$ are the mean and standard deviation of the distances between the support samples of ID $n$ and the center point $c^n$, $dis(x, y)$ is the distance function in (6), and $\delta$ is a hyper-parameter.
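The outlier filter can be sketched as follows, assuming that the threshold is the mean distance to the center plus $\delta$ times the standard deviation. The use of the population standard deviation, the default value of `delta`, and the function names are assumptions for illustration.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_outliers(support, delta=1.0):
    """Keep support vectors within mean + delta * std of the center."""
    count = len(support)
    center = [sum(column) / count for column in zip(*support)]
    dists = [euclidean(s, center) for s in support]
    threshold = statistics.mean(dists) + delta * statistics.pstdev(dists)
    return [s for s, d in zip(support, dists) if d <= threshold]

# Four tightly grouped samples and one far-away noisy sample.
support = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
kept = filter_outliers(support, delta=1.0)
```

Only the filtered samples would then be considered when picking the furthest positive or closest negative sample, so a mislabeled or corrupted image is less likely to drive the gradient.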

C. OVERALL TRAINING LOSS
During FSOR training, the hybrid loss function is formulated as follows:

$$L = \lambda_{kl} L_{KL} + \lambda_{ml} L_{ML} + \lambda_{id} L_{ID}, \tag{12}$$

where $\lambda_{kl}$, $\lambda_{ml}$ and $\lambda_{id}$ are the hyper-parameters denoting the weights of the three partial losses. In (12), the KL divergence loss ($L_{KL}$) in the reparameterization approach is utilized to make the extracted feature vectors conform to a Gaussian distribution. The H&C metric learning loss ($L_{ML}$) allows the FSOR method to learn a precise distance measure between two feature vectors. Finally, the label smoothing ID loss ($L_{ID}$) acts as a classification loss for the FSOR model.
Figure 5 illustrates the detailed inference procedure of FSOR, and Figure 6 depicts the inference process of FSOR, which retrieves and sorts the input gallery images according to their similarity scores with respect to the query image. During the inference phase, the feature vectors of the query image and all the gallery images are extracted and reparameterized by the same mechanism used during the learning phase. Then, the similarity between the query image and each gallery image is estimated by the Euclidean distance between their feature vectors, as formulated in (6). Finally, a ranked list of candidate gallery images is obtained according to their similarity scores with respect to the query image. In Figure 6, the query image is shown in the red box, and the gallery images that have the same ID as that of the query image are shown in the blue boxes.
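The retrieval step at inference time amounts to sorting the gallery by Euclidean distance to the query. The sketch below assumes that feature vectors have already been extracted and reparameterized; the gallery names and the `rank_gallery` helper are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query_vec, gallery):
    """Return gallery image names sorted from most to least similar.

    gallery is a list of (image_name, feature_vector) pairs; a smaller
    distance means a higher similarity.
    """
    ordered = sorted(gallery, key=lambda item: euclidean(query_vec, item[1]))
    return [name for name, _ in ordered]

gallery = [
    ("g1", [3.0, 4.0]),   # distance 5 from the query
    ("g2", [0.0, 1.0]),   # distance 1
    ("g3", [6.0, 8.0]),   # distance 10
]
ranked = rank_gallery([0.0, 0.0], gallery)
```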

V. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, the performance of FSOR is evaluated for both person and vehicle Re-ID. In addition, we embed the FSOR method into two existing person Re-ID models to demonstrate its ability to improve the performance of existing Re-ID models in situations with insufficient training data.

A. DATASET
These experiments use three datasets for person Re-ID and one dataset for vehicle Re-ID. Table 2 lists detailed information regarding these four datasets. Figure 7 shows some examples of training samples, which are selected from the Market-1501, DukeMTMC-ReID, CUHK03 and VeRi-776 datasets. Market-1501 [35] is a person Re-ID dataset that contains 32,668 images of 1,501 identities captured by 6 cameras. It is divided into a training set with 12,936 images of 751 identities and a testing set with 19,732 gallery images of 750 identities and 3,368 hand-drawn query images of 750 identities.
DukeMTMC-ReID (Duke Multi-Tracking Multi-Camera ReIDentification) [36] is a subset of the DukeMTMC dataset for image-based person Re-ID. This dataset contains 34,183 images of 1,404 identities captured by 8 cameras. These images are divided into a training set with 16,522 images of 702 identities and a testing set with 17,661 gallery images of 702 identities and 2,228 hand-drawn query images of 702 identities.
CUHK03 (Chinese University of Hong Kong Re-identification) [37] is a person Re-ID dataset derived from two camera viewpoints. We use the CUHK03-labeled set in our experiments; it contains 12,696 images of 1,467 identities. The dataset is divided into a training set with 7,368 images of 767 identities and a testing set with 5,328 gallery images of 700 identities and 1,400 hand-drawn query images of 700 identities.
VeRi-776 (Vehicle Re-identification) [38] is a vehicle Re-ID dataset covering a 1.0 km² area over 24 hours. Each vehicle is captured by 2 to 18 cameras with different viewpoints, illumination conditions, resolutions, and occlusions. The VeRi-776 dataset contains 49,357 images of 776 different vehicles captured by 20 cameras. It is divided into a training set with 37,778 images of 576 vehicles and a testing set with 11,579 gallery images and 1,678 query images of 200 vehicles.

B. EVALUATION METRICS
As with most existing Re-ID methods, for our experiments with FSOR, we adopt two popular performance evaluation metrics: the cumulative matching characteristic (CMC) curve and mean average precision (mAP). The CMC metric checks the position of the first matching gallery image in the ranked list for each query image and obtains the rank-k accuracy. The rank-k accuracy indicates the probability of correct matching results appearing in the top k in the ranking list. For example, the rank-1 accuracy equals 1 when the label of the first image in the sorted gallery images matches the label of the query image. The mAP metric reflects the positions of all the gallery images that belong to the same object identity as that of the query image as a whole.
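The two metrics can be made concrete with a short sketch that scores ranked lists of gallery IDs. It is a simplified illustration: standard Re-ID evaluation protocols typically also exclude gallery images taken by the same camera as the query, which is omitted here, and the function names and toy rankings are hypothetical.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """1.0 if a gallery image with the query's ID appears in the top k."""
    return 1.0 if query_id in ranked_ids[:k] else 0.0

def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each correct match position."""
    hits, precisions = 0, []
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Each entry: (query ID, gallery IDs sorted by ascending distance).
rankings = [
    ("A", ["B", "A", "A", "C"]),
    ("C", ["C", "B", "C", "A"]),
]
rank1 = sum(rank_k_hit(r, q, 1) for q, r in rankings) / len(rankings)
rank2 = sum(rank_k_hit(r, q, 2) for q, r in rankings) / len(rankings)
mean_ap = sum(average_precision(r, q) for q, r in rankings) / len(rankings)
```

Here the rank-1 accuracy is 0.5 because only the second query has a correct match in first place, while mAP also rewards the additional correct matches found further down each list.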

C. IMPLEMENTATION DETAILS
We implement all our experiments in PyTorch and use ResNet-50 pretrained with ImageNet as the backbone network of the feature extractor [34]. When training the FSOR model, we use a 5-shot setting for few-shot learning. In the 5-shot setting, we select at most 5 samples for each ID from the training set of each dataset. As summarized in Table 3, we select 3,710, 3,510, 3,835 and 2,880 images from the Market-1501, DukeMTMC-ReID, CUHK03, and VeRi-776 datasets, respectively, as training samples. All the training and testing images input into the FSOR model are resized to 256 × 128. For H&C metric learning, the batch size is 80: each batch contains 16 IDs ($N$) with 5 samples per ID. We divide the 5 samples of each ID into two parts: 4 samples form the support set ($S$), and 1 sample acts as the query set ($Q$), which gives 16 query samples ($n_q$) per batch. In addition, the learning rate at epoch $t$ follows a warmup schedule [9]. When using the 5-shot setting, all the person Re-ID datasets and the vehicle Re-ID dataset share the same experimental settings described above.
Table 4 shows a comparison between the FSOR method and some state-of-the-art methods in terms of the mAP and rank-1 accuracy when training data are scarce. We repeat each experiment 5 times with different random seeds and report the mean and standard deviation of the results, which are close to each other across seeds. To verify the generalization ability of our FSOR approach, we perform experiments not only on the person Re-ID datasets but also on the vehicle Re-ID dataset. From this table, we can observe that the performances of most existing object Re-ID methods degrade substantially under the 5-shot setting. However, the FSOR model achieves the best performance when the training data are scarce, even better than that of SPMVC, which is a semi-supervised method that uses a similar amount of labeled data to that used in our method.
The CamStyle results on CUHK03-labeled and VeRi-776 are missing because CamStyle requires specifically generated data to train the model, and its authors did not provide these data, so the experiments could not be conducted. The SPMVC results on CUHK03-labeled and VeRi-776 are missing because no open-source code is available for that method; we can therefore report only the results given in the published paper.
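Returning to the batch construction given in the implementation details (16 IDs per batch, 5 samples per ID, split into 4 support samples and 1 query sample), a minimal sampling sketch could look as follows. The `build_episode` helper, the file-name pattern, and the random choice of the query sample are assumptions for illustration.

```python
import random

def build_episode(images_by_id, ids_per_batch=16, shots=5, rng=None):
    """Sample one training batch and split it into support and query sets.

    images_by_id maps an identity to its list of image names. For each
    chosen identity, shots - 1 images go to the support set and the
    remaining image becomes the query.
    """
    rng = rng or random.Random()
    eligible = [pid for pid, imgs in images_by_id.items() if len(imgs) >= shots]
    chosen = rng.sample(eligible, ids_per_batch)
    support, query = {}, {}
    for pid in chosen:
        picks = rng.sample(images_by_id[pid], shots)
        support[pid], query[pid] = picks[:-1], picks[-1]
    return support, query

# Toy dataset: 20 identities with 5 images each (hypothetical file names).
toy = {f"id{p:03d}": [f"id{p:03d}_{i}.jpg" for i in range(5)]
       for p in range(20)}
support, query = build_episode(toy, rng=random.Random(0))
batch_size = sum(len(imgs) + 1 for imgs in support.values())
```

With these defaults the batch holds 16 × 5 = 80 images, matching the stated batch size, and each identity contributes exactly one query sample.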

2) COMPARISON WITH BASELINE METRIC LEARNING METHODS
To validate the superiority of H&C metric learning when it is used in the construction of object Re-ID models, Table 5 shows a comparison between the H&C method and some baseline metric learning methods. To ensure a fair comparison, only the metric learning loss differs for all the  methods compared in this experiment. From the results listed in Table 5, we can observe that the H&C method achieves the best performance, outperforming the other metric learning methods in terms of its discriminative ability with few labeled training samples. We also visualize the feature distribution to verify the discrimination ability of the model trained with H&C metric learning on the Market-1501 dataset in Figure 8. The model trained with H&C metric learning makes the distances between feature vectors belonging to the same classes closer and the distance between feature vectors in different classes further than those yielded by the model trained without H&C metric learning.

3) ABLATION STUDY AND PARAMETER ANALYSIS
To verify the effectiveness of each component in the FSOR model, we perform an ablation experiment with a 5-shot setting on the same four datasets, as shown in Table 6. First, we build a baseline model that employs the label smoothing ID loss and metric learning based on the hard mining distance. Then, the data augmentation, reparameterization, and H&C metric learning mechanisms are added step by step to investigate the effect of each. The experimental results of the ablation study show that the reparameterization mechanism greatly improves both the rank-1 accuracy and mAP scores by more than 9% on all experimental datasets. The improvement on CUHK03 is much bigger than that on the other sets. This phenomenon may result from CUHK03 being the smallest among all testing datasets. This means that its data distribution is sparser than that of other datasets. Our reparameterization mechanism can greatly improve the situation.
In addition, the influence of the number of samples per ID is analyzed. As shown in Figure 9, the performance of FSOR improves as the number of training samples per ID increases. Furthermore, when the FSOR model uses only approximately half of the training data for training (the 9-shot setting in Market-1501 uses 48.6% of the training data, the 11-shot setting in DukeMTMC-ReID uses 46.6% of the training data, the 5-shot setting in CUHK03 uses 52.04% of the training data, and the 30-shot setting in VeRi-776 uses 44.0% of the training data), it achieves rank-1 accuracy scores close to those of other existing models using all the training data. In this experiment, the results of the proxy anchor method, which records proxy information to help the training, are unstable because similar training samples with different labels may mislead the proxy information in this method.

VI. CONCLUDING REMARKS
In this paper, a novel few-shot object Re-ID method, FSOR, is presented; it efficiently constructs object Re-ID models without the need for tedious data collection and time-consuming annotation processes. Moreover, it guarantees the discrimination and generalization abilities of object Re-ID models with an efficient few-shot learning model that employs a reparameterization mechanism and a dual-distance metric learning approach, named H&C metric learning. The reparameterization mechanism makes the constructed object Re-ID model more generalizable and adaptive, allowing it to re-identify objects not covered in the training data. The proposed H&C metric learning enhances the discrimination ability of the constructed model by combining the advantages of the query-center distance and the hard mining distance. According to our experimental results, the reparameterization and H&C metric learning mechanisms can increase the mAP by more than 17% on average. In addition, we employ several simple but effective techniques, such as data augmentation, a warmup learning rate, and label smoothing, during the construction and operation processes of the FSOR model.
The extensive experimental results and comparisons show that FSOR effectively improves model performances on object Re-ID tasks when the amount of training data is small. Our method even outperforms a semi-supervised Re-ID method when only a few labeled training data are available and without a large number of unlabeled data. The experimental results of the ablation study show that the reparameterization and H&C metric learning schemes significantly improve the performances of object Re-ID models. We also observe from the experimental results of the metric learning method comparison that the proposed H&C metric learning technique is most suitable for model training when the amount of training data is insufficient for satisfying other approaches. Table 7.