GMRNet: A Novel Geometric Mean Relation Network for Few-Shot Very Similar Object Classification

With the widespread adoption of Deep Learning (DL), DL has increasingly been applied to the problem of object recognition and classification. Beyond classifying many different types of objects, Deep Metric Learning (DML) is effective at classifying objects that are visually very similar to one another. In this study, a novel Relation Network (RN) based DML model has been designed to classify objects in two different datasets we created. We selected groups of objects with a high degree of mutual similarity; because these objects are so alike, they have been categorized using few-shot learning (FSL). The impact of changing the number of classes and samples in the dataset on the network's performance has been studied, and it is shown how the network's accuracy varies with the N-way (number of classes) and K-shot (number of samples) combinations used in its design. Additionally, the geometric mean module newly introduced in our study improves the RN's performance by an average of 15%. The accuracy of our proposed RN on the screw and spare-parts datasets is 96.1% and 92.3%, respectively. The first dataset consists of 1800 screw images in 18 classes, while the second consists of 4100 spare-part images in 20 classes. The effectiveness of our method is demonstrated by extensive experiments on these two datasets.

The associate editor coordinating the review of this manuscript and approving it for publication was Cheng Chin.

ria as criteria. The similarity between embedded features
in the images is calculated with the distance learning function [1]. Some studies on object classification using DML are mentioned below.

Image segmentation was performed with DML and few-shot learning [2]. A CNN based on the VGG16 network was used as the feature extractor, and contrastive loss was used as the loss function. With 120,000 iterations and SGD optimizer parameters, 1-way 1-shot accuracy was 45.8%. In another study [3], deep feature mapping learning was proposed for low-dimensional person-image embedding. The feature representation and the distance metric were combined in the proposed architecture, which used GoogLeNet. The significance of the proposed work lay in the addition of a global loss function to the objective function and in how positive and negative samples were compared.

The geometric mean module is added to the standard RN, which improves performance. The number of data samples available in FSL is limited. The designed network has been tested with combinations of N = 3, 5, 7, 10 and K = 5, 10.

The following are the study's main contributions:

• The GMRNet is designed to classify objects that have a high degree of similarity. The newly created network is innovative and effective. In terms of network performance, it is found to be more successful than [9], [10], [11], and [12].

• Two different real datasets have been created to test the performance of the designed network. Both datasets were generated from real-world industry data, whereas the majority of other studies in the literature utilized preexisting datasets.

• It will be practical for industrial processes to distinguish between highly similar objects in the datasets produced by our study using computer vision algorithms.

corresponding to the same class. This is how it intends to learn data feature vectors. The distance metric yields a new data representation that uses sample similarity to provide a more meaningful and more discriminative model.
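The N-way K-shot evaluation mentioned above is episodic: each episode samples N classes, K support images per class, and a handful of queries. The sketch below illustrates one such episode; the nearest-centroid classifier is a simple stand-in for the relation score (the paper's relation module is a learned network), and the synthetic features are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_episode(features, labels, n_way, k_shot, n_query=5):
    """Run one N-way K-shot episode and return query accuracy.

    A nearest-centroid rule stands in for the learned relation
    score; this is a sketch of the evaluation protocol, not the
    paper's network.
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    centroids = {}
    queries = []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        support, query = idx[:k_shot], idx[k_shot:k_shot + n_query]
        centroids[c] = features[support].mean(axis=0)  # class prototype
        queries.extend((q, c) for q in query)
    correct = 0
    for q, true_c in queries:
        pred = min(centroids,
                   key=lambda c: np.linalg.norm(features[q] - centroids[c]))
        correct += int(pred == true_c)
    return correct / len(queries)

# Synthetic, well-separated classes: 10 classes, 20 samples each.
labels = np.repeat(np.arange(10), 20)
features = rng.normal(labels[:, None] * 5.0, 0.3, size=(200, 8))
print(evaluate_episode(features, labels, n_way=5, k_shot=5))
```

Varying `n_way` and `k_shot` in this loop reproduces the kind of N/K sweep reported in the study (higher N makes the task harder; higher K improves the prototypes).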

Traditional machine learning approaches are limited in their ability to process raw data. Deep learning, by contrast, does not require separate preprocessing or feature extraction; it learns high-level features on its own. Euclidean, Mahalanobis, Matusita, Bhattacharyya, and Kullback-Leibler distances are the most commonly used similarity measures for data classification [1]. DML uses deep architectures to learn from raw data and finds the similarity of embedded features via nonlinear subspace learning [18], [19]; similarity is measured by the distance between samples. As shown in Figure 1, DML narrows the gap between similar samples while widening the gap between different objects. A metric loss function is used to accomplish this.
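The pull-together/push-apart behavior described above is exactly what a metric loss such as the contrastive loss (used in [2]) implements. A minimal sketch, not the paper's implementation, with the margin value chosen for illustration:

```python
import numpy as np

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Similar pairs are penalized by their squared Euclidean
    distance; dissimilar pairs are penalized only when they
    fall inside the margin, pushing them at least `margin` apart.
    """
    d = np.linalg.norm(z1 - z2)  # Euclidean distance between embeddings
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

z = np.array([0.2, 0.4, 0.6])
print(contrastive_loss(z, z, same_class=True))   # → 0.0 (no gap to close)
print(contrastive_loss(z, z, same_class=False))  # → 0.5 (full margin penalty)
```

Minimizing this loss over many pairs is what produces the geometry of Figure 1: small intra-class distances and inter-class distances of at least the margin.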
The aim of this study is to classify two different real datasets. The embedding module we designed, shown in Figure 3, produces the feature map f_ϕ(x) of a given image x. As the images are processed by the designed network, the relation module g_ϕ operates on the resulting embeddings.

The samples x_q from the query set Q and x_ij from the training set T are embedded using f_ϕ(x). Before the obtained x_q and x_ij are passed to the relation module, their geometric mean is calculated as in (5), yielding x_m. Geometric averages represent the central tendency of the set of points used as input; as a result, the relationship between the two inputs is captured. The relation module g_ϕ receives the x_m, x_q, and x_ij embeddings and generates a similarity scalar in the range 0-1. As shown in Figure 5, a relation score is produced for each query-support pair.
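The step above can be sketched as follows. Equation (5) is not reproduced in this excerpt, so the sketch assumes the geometric mean is taken elementwise over the two embeddings (which is well defined for non-negative, e.g. post-ReLU, features); the `eps` term and the concatenation order are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def geometric_mean_embedding(x_q, x_ij, eps=1e-8):
    """Elementwise geometric mean of query and support embeddings.

    Assumes non-negative features (e.g. after ReLU); `eps` avoids
    a zero product. A sketch of the role of Eq. (5), not its exact form.
    """
    return np.sqrt(np.clip(x_q, 0, None) * np.clip(x_ij, 0, None) + eps)

def relation_input(x_q, x_ij):
    """Stack x_m, x_q, and x_ij as the input fed to the relation module g_phi."""
    x_m = geometric_mean_embedding(x_q, x_ij)
    return np.concatenate([x_m, x_q, x_ij])

x_q = np.array([0.0, 1.0, 4.0])
x_ij = np.array([1.0, 1.0, 1.0])
print(relation_input(x_q, x_ij))  # x_m ≈ [0, 1, 2], followed by x_q and x_ij
```

In the network itself, this concatenated tensor is what the learned relation module maps to the scalar similarity in [0, 1].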

On the basis of DML and FSL, a new RN has been formed. We test the suggested network model using two different datasets of our own. The visual similarity of the objects in these datasets is their key characteristic. By computing the similarity score between objects, the built network has made it easier to classify them. In-depth comparisons are made.

VOLUME 10, 2022

Table 4 compares the performance of the existing networks and GMRNet on three different datasets. The first two of these

In Table 5, the GMRNet algorithm is compared with the

In addition, transfer learning algorithms that are frequently used in the literature have been applied to the two datasets, and the obtained experimental results are added to Table 5. Thus, the proposed method is compared with many existing methods. In Table 5, 'Acc', 'P', and 'R' represent accuracy, precision, and recall, respectively. When the table is examined, the fact that the accuracy rates and

is slightly lower than for the screw dataset, due to the higher number of classes. The higher AUC value in the spare-parts dataset indicates that its classes are better separated and less similar to one another than those in the screw dataset. Among the meta-learning methods (matching, prototypical, and relation networks), the relation network produced the best results, while the prototypical network produced the worst. The matching network compares image pairs, whereas the relation network achieves better results by establishing a relationship between image pairs. In cases where each image has different lighting and exposure, measuring distance by directly comparing images, as the matching and prototypical networks do, may result in incorrect labeling.
Unlike the matching network, the prototypical network compares each query image against a prototype image produced for each class; this prototype is typically the mean of the images in the class. The prototypical network, however, yields less successful results. GMRNet, on the other hand, performs better than the standard relation network, as detailed in Table 5.

formance by allowing the model to learn more effectively. We worked on N = 3, 5, 7, and 10 ways and K = 5 and 10 shots in this study, obtaining detailed experimental results. In this study, we developed a new geometric mean-based RN model. The addition of a new module, the Disposer module, distinguishes this model from the standard RN model. The developed GMRNet was also tested on our datasets after data augmentation. In these datasets, GMRNet achieved results of 92.3% and 96.1%, respectively, compared to the standard RN model's 72.8% and 86.5%. This demonstrates how well the model we created performs in classification. The classification behavior of GMRNet was also examined using t-SNE visualization: GMRNet separates data points into distinct classes more successfully than the regular RN when the generated images are inspected. The datasets used in this study also consist of actual images from industrial processes. In this respect, our study outperforms other studies.
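The t-SNE inspection described above can be reproduced with a few lines of scikit-learn. The embeddings below are a synthetic stand-in for GMRNet's learned features (three clusters), used only to show the projection step; the `perplexity` value is an illustrative choice and must stay below the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for learned embeddings: 3 classes, 10 samples each,
# 16-dimensional, with class-dependent means so the clusters are separable.
labels = np.repeat(np.arange(3), 10)
embeddings = rng.normal(labels[:, None] * 4.0, 0.2, size=(30, 16))

# Project to 2-D for visual inspection of class separation.
proj = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(embeddings)
print(proj.shape)  # (30, 2)
```

Plotting `proj` colored by `labels` gives the kind of figure used to compare GMRNet against the standard RN: well-trained embeddings appear as compact, well-separated clusters.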