Few-Shot PCB Surface Defect Detection Based on Feature Enhancement and Multi-Scale Fusion

In printed circuit board (PCB) defect detection, defect samples are difficult to collect, and detection performance suffers from the lack of data. Building on few-shot learning, a few-shot PCB defect detection model is proposed. The model introduces a feature enhancement module and a multi-scale fusion module. The feature enhancement module, based on an improved convolutional block attention module (CBAM), highlights the key areas of the received feature maps and suppresses the interference of useless information. To address the small size of PCB defects, a multi-scale feature fusion strategy is proposed. It extracts multi-scale feature maps of the PCB and fuses them into a high-quality feature map containing information at different scales, which improves the detection precision of the model for small object defects. Extensive experiments on a PCB dataset show that our few-shot PCB defect detection model outperforms state-of-the-art methods under different shot settings ($k = 1, 2, 3, 5, 10, 30$). Notably, the proposed model balances detection efficiency and precision, which gives it high practical application value.


I. INTRODUCTION
As the foundation of the modern information industry, the PCB is widely used in high-end equipment manufacturing fields such as computers, communication electronic equipment and military systems [1], [2]. As an important carrier of electrical connection and support, the PCB strongly influences the stability and safety of high-end equipment products [3]. Therefore, it is particularly important to study high-quality detection of PCB surface defects so that they can be eliminated in time.
The associate editor coordinating the review of this manuscript and approving it for publication was Mehul S. Raval.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the field of surface defect detection, detection performance has improved continuously, from the original manual visual inspection to traditional machine learning methods [4], [5]. However, the detection precision still cannot meet the needs of modern industrial development. Deep learning has brought unprecedented advances in surface defect detection [6], and convolutional neural networks (CNNs) are widely used. Various object detection algorithms based on deep learning, such as SSD [7], YOLO [8], Regions with CNN features (R-CNN) [9], Fast R-CNN [10], and Faster R-CNN [11], have raised surface defect detection performance to a higher level. In recent years, many excellent models have further improved the precision of object detection algorithms. For example, Zhao et al. [12] proposed an object detection method based on the Larger Scale 'You Only Look Once' Version 4 (LS-YOLOv4) algorithm for detecting insulators and drop fuses. Zhang et al. [13] proposed a novel backbone network named Deep-IRTarget to address poor texture information, low resolution and high noise levels in infrared images. This framework superimposes features in the frequency domain and the spatial domain to construct dual-domain features, and uses a Resource Allocation model for Features (RAF) to integrate the features efficiently. The network has achieved remarkable results in infrared target detection and further expanded the application range of object detection algorithms. These algorithms have laid a solid foundation for the development of defect detection. These surface defect detection models based on deep learning often require a large amount of labeled data for training.
However, PCB production lines are replaced on short cycles, and the number of defect samples is small. With so little data, traditional object detection models are prone to over-fitting or under-fitting and cannot achieve satisfactory detection accuracy. The limited training data poses great challenges to defect detection and also limits the adoption of defect detection methods. As industry develops, the demand for training high-precision models with limited samples is growing, and the few-shot problem has received great attention in the field of defect detection. Therefore, to address the lack of PCB defect samples and promote the development of industrial defect detection, a few-shot PCB surface defect detection algorithm is proposed in this article.
Meta-learning provides a promising solution for few-shot surface defect detection [14], [15]. By designing an effective meta-learner, the meta-learning method enables the model to achieve good detection performance after only a few update steps on a very small amount of data. Such a model has a strong generalization ability and adapts quickly to new tasks. To solve the problem of few-shot object detection, Kang et al. [16] and Yan et al. [17] designed meta-learners that reweight the feature maps of the novel classes, which improves the prediction precision for few-shot data. The meta-learning Few-Shot Re-Weighting model (FSRW) proposed by Kang et al. [16] is based on YOLOv2 [18]. It is composed of a meta-feature extraction module, a reweighting module, and a detection module. The feature extractor uses a DarkNet-19 network to extract the meta-features of the query images. The reweighting module is a lightweight CNN. Its input is the support images and the corresponding location masks, and its output is the reweighting vectors generated from the support feature maps. These vectors are used to reweight the features of the query images. Finally, the detection module uses the reweighted features of the query images to predict classes and bounding boxes.
Perez-Rua et al. [19] proposed a meta-learning method for center point prediction based on the structure and ideas of CenterNet [20], which can realize incremental learning. However, in PCB defect detection, most methods lack a strong ability to mine the feature information of key areas, which leads to low precision in few-shot defect detection. At the same time, as shown in Fig. 1, because of the particularity of PCB products, the various types of PCB defects are small and vary in size, which makes it difficult for the model to capture defects and judge their types during detection.
To address the above problems, we improve the few-shot detection model FSRW of Kang et al. [16] and propose a few-shot PCB surface defect detection model based on a feature enhancement module and a multi-scale fusion module (FPFM). The improved model adopts YOLOv3 [21] as the backbone network, and the feature extractor uses the DarkNet-53 network.
Experiments on the PCB dataset show that when the number of samples is small, our model achieves higher detection precision than the original model.
The main contributions of this article are as follows:
This article is organized as follows. In Section II, we introduce existing research on PCB defect detection and few-shot problems. In Section III, we propose our few-shot model and analyze it in detail. In Section IV, we conduct extensive experiments to verify the effectiveness of the model. In Section V, we summarize our work.

II. RELATED WORK
In this section, we introduce related work on PCB surface defect detection and few-shot problems.

A. PCB SURFACE DEFECT DETECTION
PCB surface defect detection has always been a challenging task, and detection methods based on deep learning have great development potential. To handle the different sizes of defective PCB solder paste, Park et al. [23] improved the traditional convolutional neural network. They proposed a double-layer defect detection point network that could detect defects at both the micro and macro semantic levels. Wu et al. [24] proposed a method to identify solder joint defects using Mask R-CNN, which could simultaneously realize the classification, localization and segmentation of solder joint defects. To address the high false detection rate and low efficiency of PCB defect detection, Ding et al. [25] proposed a tiny defect detection network (TDD-Net) and achieved good detection results on public PCB datasets. Then, to further improve the efficiency and precision of PCB detection, Adibhatla et al. [26] used the YOLOv5 large model to detect defects in PCBs, which reduced the manpower and time required for inspection. Their work laid a foundation for PCB defect detection and promoted follow-up research.
However, these models require a large number of samples for training. In industry, the number of PCB defect samples is insufficient. Therefore, it is very important to study few-shot PCB surface defect detection.

B. FEW-SHOT LEARNING
The goal of few-shot learning is to train a network with a very small number of samples and still obtain good performance [27], which helps to address the scarcity of PCB defect samples. In recent years, few-shot learning methods based on meta-learning have achieved remarkable results. Their focus is on enabling the network to learn quickly when labeled data are limited and to generalize to new tasks. These methods fall into three groups. 1) Methods based on fine-tuning use a small number of samples to fine-tune initialization parameters to achieve better results. Finn et al. [28] proposed Model-Agnostic Meta-Learning (MAML), an optimization-based method. They used a new weight optimization scheme to update the trained initial weights with a small amount of data, so that the model could fit the new data features quickly. 2) Methods based on recurrent neural networks use external memory to accumulate prior knowledge and then apply it to new tasks to complete classification. Santoro et al. [29] proposed one-shot learning with memory-augmented neural networks (MANN), which applied the neural Turing machine to the few-shot learning task. They designed an external memory module to save the information of the feature maps and combined it with the meta-learning idea to optimize the neural Turing machine for few-shot classification and regression. 3) Methods based on metrics learn an embedding function that maps the input images into a new space, where the images can be classified by similarity measurement. Li et al. [30] proposed an adaptive edge loss function for few-shot learning. They used the semantic information of the data to describe the distance between tasks or categories, so as to optimize the boundary distance. In addition, graph neural networks have proven effective in few-shot learning. Zhang et al. [31] proposed a feature distribution transformation to solve the problem of feature distribution mismatch in graph-based few-shot learning. By calculating the optimal class allocation matrix, the classification precision is further improved. This method further promotes the development of few-shot learning. However, most of these works use few-shot learning to complete classification tasks. Our work focuses on the application of few-shot learning to PCB defect detection.

C. FEW-SHOT OBJECT DETECTION
PCB surface defect detection includes defect classification and defect object detection, and we focus on the latter in this article. Work on few-shot object detection is therefore directly relevant to our research. The Low-Shot Transfer Detector (LSTD) [32] and Representative-Based Metric Learning (RepMet) [33] adopt a general transfer framework and adapt to few-shot scenes through pre-trained detection models. Meta R-CNN [17] and Few-Shot Object Detection and Viewpoint Estimation (FSDet) [38] add basic detectors to perform detection processes similar to Faster R-CNN to solve few-shot problems. Kang et al. [16] learned base class meta-features with a meta-learning method and reweighted the features to adapt to the novel classes. However, the above methods do not address the scale problem of samples. Wu et al. [40] therefore proposed a multi-scale sample thinning operation to solve the scale variance problem in the model and emphasized the necessity of dealing with scale change. Nevertheless, these models lack focus on the key areas of the images, so they cannot effectively learn the important image features. These problems prevent the models from performing well when detecting few-shot PCB surface defects. Therefore, according to the characteristics of PCB surface defects and the problems of existing few-shot object detection models, we design a few-shot detection model for PCB surface defects.

III. METHODOLOGY
In our work, a defect detection method based on meta-learning is adopted. We divide the dataset into the base classes $C_{base}$ and the novel classes $C_{novel}$. The base classes contain enough training samples and annotation information, while the number of samples in the novel classes is small. When organizing data, we follow the episode-based data organization proposed by Vinyals et al. [34]: the data fed into the network at each step is called a task $T_i$, and each task consists of a support set and a query set. The expression for $T_i$ is therefore
$$T_i = \{(I^{S}_{1,i}, M^{S}_{1,i}), \ldots, (I^{S}_{N,i}, M^{S}_{N,i})\} \cup \{(I^{q}_{i}, L_i)\},$$
where $I^{q}_{i}$ is the query image in the $i$-th detection task, and $L_i$ is its corresponding label. $I^{S}_{N,i}$ and $M^{S}_{N,i}$ are the $N$-th support image and its corresponding mask in the $i$-th task. The training follows the two-phase training strategy proposed by FSRW [16]. The first phase is the base training phase. In this phase, we use the base class data, which carry sufficient information, so that the network learns to use the support information to help prediction on the query. The second phase is few-shot fine-tuning. Following the $k$-shot setting, each novel class provides only $k$ annotations for fine-tuning the model.
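The episode organization and the $k$-shot restriction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' data loader: the function names `build_task` and `k_shot_subset`, the dictionary layout, and the string placeholders standing in for images, masks and labels are all hypothetical.

```python
import random


def k_shot_subset(support_pool, k, rng):
    """Keep only k annotated samples per class (the k-shot setting)."""
    return {cls: rng.sample(pairs, k) for cls, pairs in support_pool.items()}


def build_task(support_pool, query_pool, rng):
    """Assemble one task: one (image, mask) support pair per class plus one query.

    support_pool: dict mapping class name -> list of (image, mask) pairs
    query_pool:   list of (image, label) pairs
    Both layouts are hypothetical; real images and masks would be tensors.
    """
    support = [rng.choice(pairs) for _, pairs in sorted(support_pool.items())]
    query = rng.choice(query_pool)
    return {"support": support, "query": query}


rng = random.Random(0)
pool = {
    "short": [(f"short_{j}.jpg", f"short_{j}_mask") for j in range(10)],
    "spur": [(f"spur_{j}.jpg", f"spur_{j}_mask") for j in range(10)],
}
queries = [("query_0.jpg", "labels_0"), ("query_1.jpg", "labels_1")]

few = k_shot_subset(pool, k=3, rng=rng)   # 3-shot fine-tuning data
task = build_task(few, queries, rng)      # one episode drawn from it
```

Drawing the support pair from the reduced pool means that, during fine-tuning, no class ever contributes more than $k$ distinct annotations.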

A. OVERVIEW OF FPFM
In this article, the structure of FPFM, a few-shot PCB surface defect detection model based on feature enhancement and multi-scale fusion, is shown in Fig. 2.
Formally, the query image $I$ is input into the meta-feature extraction module $D$ to obtain meta-features $F \in \mathbb{R}^{w \times h \times c}$ with $c$ channels, which can be expressed by
$$F = D(I).$$
The produced meta-features $F$ are enhanced by the feature enhancement module to obtain the enhanced query features $F_c$.
At the same time, the support images and label information are input into the multi-scale feature fusion module to obtain a high-quality feature map $F_m$ containing information at different scales. The feature reweighting module, a lightweight CNN, shapes the high-quality feature maps $F_m$ into class-specific reweighting vectors $\omega_i \in \mathbb{R}^{c}$. The model fuses the query features $F_c$ with the support feature vector $\omega_i$ of novel class $i$ by
$$F_i = F_c \otimes \omega_i,$$
where $\otimes$ denotes channel-wise multiplication. The reweighted feature map $F_i$ is input into the detection module to predict the object confidence $o$, the position $(x, y, h, w)$ of the predicted bounding box, and the classification score $c$ of the object classes.
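The channel-wise reweighting step can be illustrated with plain nested lists. This is a minimal sketch, assuming channel-first indexing `[channel][row][col]` and toy values; the function name `reweight` and the class vectors are hypothetical, and in the model the vectors would be produced by the reweighting module.

```python
def reweight(query_features, class_vector):
    """Scale every channel of the query features by that class's channel weight.

    query_features: nested list indexed [channel][row][col]
    class_vector:   list with one weight per channel
    """
    if len(query_features) != len(class_vector):
        raise ValueError("one weight per channel is required")
    return [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(query_features, class_vector)
    ]


# Toy example: 2 channels, a 2 x 2 spatial grid, and 2 classes.
F_c = [[[1.0, 2.0], [3.0, 4.0]],
       [[0.5, 0.5], [0.5, 0.5]]]
omega = {"short": [2.0, 0.0], "spur": [1.0, 4.0]}

# One class-specific feature map per class, each fed to the detector in turn.
class_specific = {name: reweight(F_c, w) for name, w in omega.items()}
```

Because each class gets its own reweighted copy of the same query features, the detection module is run once per class rather than once per image.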

B. FEATURE ENHANCEMENT MODULE
In few-shot defect detection, the number of labeled samples for the novel classes is small, the diversity within each category is poor, and the detection precision of the model is low. Therefore, it is very important to fully mine the information contained in the samples themselves. To highlight the important information in the samples and suppress the interference of useless information, a feature enhancement module is introduced in this article. This module is based on an improved convolutional block attention module.

1) BASIC STRUCTURE OF CBAM
The basic structure of the convolutional block attention module (CBAM) [22] is shown in Fig. 3. The channel attention network filters the channels of the input features, while the spatial attention network focuses on the prominent areas in the feature maps.
Specifically, the input of the CBAM module is the meta-features $F \in \mathbb{R}^{w \times h \times c}$ extracted by the feature extractor. We use CBAM to derive, in turn, a one-dimensional channel attention map $M_c \in \mathbb{R}^{c \times 1 \times 1}$ and a two-dimensional spatial attention map $M_S \in \mathbb{R}^{1 \times h \times w}$. The overall process can be summarized by
$$F' = M_c(F) \otimes F, \qquad F_c = M_S(F') \otimes F',$$
where $\otimes$ denotes element-wise multiplication. During multiplication, the attention values are broadcast accordingly.

2) EL-CBAM
On the basis of the existing CBAM, this article improves the channel attention module and the spatial attention module to obtain Efficient and Lightweight CBAM (EL-CBAM), whose specific structure is shown in Fig. 4. Specific improvement strategies are described from two aspects: channel attention and spatial attention.

a: CHANNEL ATTENTION
In the traditional channel attention module, two fully connected layers are used to distribute the attention weights. However, this introduces many redundant computations and can degrade performance. Therefore, following [35], this article replaces the fully connected layers with a 1D convolution to reduce the number of parameters and achieve better results. For the input features $F$, max pooling and average pooling are performed per channel to obtain two different spatial context descriptors $F^{c}_{max}$ and $F^{c}_{avg}$. The 1D convolution, which replaces the original fully connected layers, aggregates the information of the $k$ channels in each channel's neighborhood; its kernel length is $k$. The two outputs are combined element-wise and passed through a sigmoid activation to generate the final $M_c \in \mathbb{R}^{c \times 1 \times 1}$. Finally, $M_c$ and the meta-feature map $F$ are multiplied element-wise to generate the intermediate feature $F'$. The improved channel attention is therefore computed as
$$M_c(F) = \sigma\big(f^{k}_{1D}(F^{c}_{avg}) + f^{k}_{1D}(F^{c}_{max})\big),$$
where $\sigma$ denotes the sigmoid function, and $f^{k}_{1D}$ represents a one-dimensional convolution with kernel length $k$. The value of $k$ is given by the equation in [35]:
$$k = \left| \frac{\log_2 c}{\gamma} + \frac{b}{\gamma} \right|_{odd},$$
where $c$ denotes the number of feature map channels and $|t|_{odd}$ denotes the nearest odd number to $t$. $\gamma$ and $b$ are hyperparameters, which are set to 2 and 1 in this article.
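The adaptive kernel length and the 1D-convolution channel attention can be sketched without a deep learning framework. The sketch below makes several assumptions that the text does not confirm: the odd-rounding follows the truncate-then-bump-to-odd convention commonly used for this formula, the average-pooled and max-pooled descriptors share one kernel and are summed before the sigmoid, and the kernel weights are untrained placeholders. All function names are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Kernel length k from log2(c)/gamma + b/gamma, forced to an odd integer."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1


def conv1d_same(values, kernel):
    """1D cross-correlation along the channel axis with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def channel_attention(avg_desc, max_desc, kernel):
    """sigmoid(conv(avg) + conv(max)), one attention weight per channel."""
    a = conv1d_same(avg_desc, kernel)
    m = conv1d_same(max_desc, kernel)
    return [1.0 / (1.0 + math.exp(-(x + y))) for x, y in zip(a, m)]


k = eca_kernel_size(256)              # log2(256)/2 + 1/2 = 4.5, truncated to 4, made odd
kernel = [1.0 / k] * k                # hypothetical, untrained weights
weights = channel_attention([0.2] * 8, [0.9] * 8, kernel)
```

With $\gamma = 2$ and $b = 1$, the kernel stays very short even for wide layers (length 5 for both 256 and 1024 channels), which is where the parameter saving over two fully connected layers comes from.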

b: SPATIAL ATTENTION
In traditional spatial attention, a standard convolution with a 7 × 7 receptive field is used to aggregate spatial features, which requires a large number of parameters and also omits much of the information between channels. Therefore, in this article, the standard convolution is replaced by a depth-wise separable convolution. This convolution has fewer parameters under the same receptive field size and enables information exchange between channels, which helps the attention mechanism capture rich and important information. For the intermediate feature $F'$, we apply max pooling and average pooling along the channel dimension at each spatial position to obtain two maps $F^{S}_{max}$ and $F^{S}_{avg}$, and then concatenate the two maps. The concatenated tensor is reduced to a single channel of size $w \times h \times 1$ by the depth-wise separable convolution. After the sigmoid function, $M_S \in \mathbb{R}^{1 \times h \times w}$ is generated. Finally, $M_S$ and the intermediate feature $F'$ are multiplied element-wise to generate the final feature map $F_c$. The improved spatial attention is computed as
$$M_S(F') = \sigma\big(f^{5 \times 5}_{depth}([F^{S}_{avg}; F^{S}_{max}])\big),$$
where $\sigma$ denotes the sigmoid function, and $f^{5 \times 5}_{depth}$ represents a depth-wise separable convolution with a filter size of 5 × 5.
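The parameter counts behind this replacement can be worked out directly. The sketch below assumes bias-free layers, a two-channel input (the concatenated average-pooled and max-pooled maps) and a single output channel; the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depth-wise convolution followed by a 1 x 1 point-wise one."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


standard_7x7 = standard_conv_params(2, 1, 7)          # 2 * 1 * 49 = 98
separable_5x5 = depthwise_separable_params(2, 1, 5)   # 2 * 25 + 2 = 52

# A wider layer, for comparison: 64 -> 64 channels at 3 x 3.
wide_standard = standard_conv_params(64, 64, 3)           # 36864
wide_separable = depthwise_separable_params(64, 64, 3)    # 576 + 4096 = 4672
```

For a two-channel input, most of the saving comes from the smaller 5 × 5 kernel; the factorization itself pays off more clearly as the number of channels grows, as the second pair of numbers shows.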

C. MULTI-SCALE FUSION MODULE
The deep convolutional network computes a feature hierarchy layer by layer and generates feature maps with different spatial resolutions. High-resolution feature maps have weak semantics but strong structure, while low-resolution feature maps have strong semantics. Because PCB defects are small, we adopt the bi-directional feature pyramid network (BiFPN) [37] and propose the BiFPN and fusion strategy (BI-FU) to address the small-object problem. The module structure is shown in Fig. 5.

1) BI
To extract the multi-scale features of the support images more effectively, the algorithm uses VGG16 [36] and a BiFPN layer [37] as the basic blocks of the multi-scale feature extraction part. VGG16 consists of 13 convolutional layers and 5 pooling layers organized into 5 blocks (Block1-Block5), and the feature map output by each block is half the size of its input. Compared with an ordinary FPN, BiFPN integrates bidirectional cross-scale connections and fast normalized fusion. During operation, the $N$ support images are fed into the multi-scale feature extraction part, which generates $5N$ feature maps from them. Finally, these feature maps at different scales are input into the fusion module FU.
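The halving per block and the $5N$ count are simple arithmetic, shown here as a short sketch. The 416 × 416 input resolution and the number of support images are hypothetical values chosen for illustration; the text does not state them.

```python
def block_output_sizes(input_size, num_blocks=5):
    """Side length of the feature map after each block, halving every time."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size //= 2
        sizes.append(size)
    return sizes


sizes = block_output_sizes(416)       # assumed square input of side 416
num_support_images = 6                # hypothetical N
total_feature_maps = num_support_images * len(sizes)
```

Under these assumptions the five scales have sides 208, 104, 52, 26 and 13, so the coarsest map is 32 times smaller than the input along each side.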

2) FU
In the feature fusion part, the network adaptively learns a feature compression vector $(a_1, a_2, a_3, a_4, a_5)$. Specifically, each element of the vector is obtained by convolving the corresponding feature map: the feature map is compressed into a tensor of size 1 × 1 to obtain $a_k$, $k = 1, 2, 3, 4, 5$. The BI part generates 5 feature maps from every image. We reweight the 5 feature maps with the feature compression vector, and the FU part then fuses the reweighted feature maps into one feature map by element-wise addition. The final feature map therefore contains information at different scales. It integrates low-resolution, strongly semantic features with high-resolution, strongly structural features, so that the network is better able to extract and detect the features of small object defects in PCBs.
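The weighted fusion can be written as $\sum_k a_k F_k$ over the five maps. The sketch below assumes the maps have already been resampled to a common resolution (the text does not say how this is done) and uses hand-picked weights in place of the learned compression vector; the function name `fuse` is hypothetical.

```python
def fuse(feature_maps, weights):
    """Weighted element-wise sum of equally sized 2D maps."""
    if len(feature_maps) != len(weights):
        raise ValueError("one weight per feature map is required")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, a in zip(feature_maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += a * fmap[r][c]
    return fused


# Five toy 2 x 2 maps filled with the constants 1.0 ... 5.0.
maps = [[[float(k)] * 2 for _ in range(2)] for k in range(1, 6)]
a = [0.1, 0.2, 0.3, 0.2, 0.2]         # stand-in for the learned (a_1, ..., a_5)
fused = fuse(maps, a)
```

Element-wise addition keeps the fused map the same size as a single input map, so the downstream reweighting module sees one map per support image instead of five.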

D. DETECTION MODULE
The detection module consists of convolutional layers and fully connected layers. It localizes objects and predicts confidence from the reweighted query features. The module also gives the classification scores of the novel classes, which are calibrated by the softmax function. Let $c_i$ be the classification score for class $i$. The calibrated classification score $\hat{c}_i$ is
$$\hat{c}_i = \frac{e^{c_i}}{\sum_{j} e^{c_j}}.$$
The loss function of the object category is
$$L_c = -\sum_{i} I(\cdot, i) \log(\hat{c}_i),$$
where $I(\cdot, i)$ is an indicator function for whether the current anchor box really belongs to class $i$. The loss function of bounding box regression is $L_{bbx}$, and the loss function of objectness is $L_{obj}$. These two loss functions are similar to the loss functions defined by YOLOv3.
In these loss functions, $S^2$ denotes the number of grid cells of the feature map, and $B$ represents the number of bounding boxes. Thus, the overall detection loss function is $L_{det} = L_c + L_{bbx} + L_{obj}$.
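The softmax calibration and the class loss can be sketched as follows. This assumes the class loss is the usual cross-entropy over the softmax-calibrated scores for the anchor's true class, as in FSRW; the function names and the toy scores are hypothetical, and the box and objectness terms are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over per-class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(scores, true_class):
    """Negative log of the calibrated score of the class the anchor belongs to."""
    probs = softmax(scores)
    return -math.log(probs[true_class])


scores = [2.0, 0.5, -1.0]             # toy scores c_i for three classes
probs = softmax(scores)
loss = classification_loss(scores, true_class=0)
```

Calibrating across classes matters here because each class is scored from its own reweighted feature map, so the raw scores are not otherwise comparable.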

E. OTHER MODULES

1) FEATURE EXTRACTOR
We use this module to extract the feature information of the query images. Its input is the query images, and its output is their feature maps $F$. We adopt the DarkNet-53 network in place of the DarkNet-19 network of the original model. Compared with DarkNet-19, DarkNet-53 introduces residual structures and uses convolutions instead of max pooling for downsampling, which increases the network depth and helps to extract deeper features.

2) REWEIGHTING MODULE
The module is a lightweight CNN. In our work, it receives the feature maps $F_m$ obtained from the BI-FU module and shapes them into vectors of size 1 × 1 × c. Each vector embeds the features of a specific class. These vectors are used to reweight the query feature maps $F_c$.

IV. EXPERIMENTS
In this section, we evaluate the performance of the FPFM model in few-shot PCB defect detection through comprehensive experiments. The model is compared with several state-of-the-art methods [16], [17], [38], [39], [40], [41]. The results are given and analyzed below to show that our model detects few-shot PCB defects more accurately.
The experiments in this article run on 64-bit Ubuntu 18.04 with Python 3.6 and the PyTorch deep learning framework. The CPU is an Intel Core i7-6850K with a base frequency of 3.60 GHz, and the GPU is an NVIDIA GeForce GTX 1080 Ti with 11 GB of memory. CUDA 10.1 is used to accelerate training.

A. DATASETS AND TRAINING SETTINGS

1) DATASETS INFORMATION
Our dataset consists of two parts, the base classes and the novel classes. In this article, the Few-Shot Object Detection Dataset (FSOD) [42] is used as our base classes. FSOD is a dedicated few-shot object detection dataset created by Tencent in 2020. It is rich in category diversity, covering 1,000 categories, with a total of 66,502 images and 182,000 annotated boxes. Using FSOD as the base classes supports better meta-learning and thereby improves detection performance. The novel classes are the PCB defect image dataset from the Intelligent Robot Open Laboratory of Peking University. The dataset consists of 693 PCB defect images with corresponding annotation files and includes 6 defect types; the types and quantities are shown in Table 1.

2) TRAINING SETUP
The model in this article adopts a two-phase training strategy. The first phase is the base training phase. In this phase, the support images and query images are drawn from the base classes, which have enough labeled information. We randomly sample images from the dataset to form the query set and the support set. A query set has only one image, and the support set has N images per class. Each support image keeps the object of only one class, and the pixels of other objects in the image are set to 0. In this process, we run the SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001, and the batch size is set to 8. The model uses FSOD as the base classes for meta-learning and is trained end to end. We train the model for 14k, 12k and 10k iterations at learning rates of 5e-3, 5e-4 and 5e-5, respectively. In the few-shot fine-tuning phase, we use the PCB defect dataset as the novel classes for fine-tuning, and we draw the data in the same way as in the first phase. The difference is that each novel class has only $k$ annotations ($k$-shot) for training. In this phase, we train the model for 8k, 6k and 4k iterations at learning rates of 5e-3, 5e-4 and 5e-5, respectively. To balance the difference in the number of samples at this stage, only $k$ annotations are also selected from each base class for fine-tuning. In this article, experiments are carried out for $k = 1, 2, 3, 5, 10, 30$.
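The stepped learning rates of the two phases can be expressed as a piecewise-constant schedule. This is a minimal sketch, assuming the three stages of each phase run back to back in the order listed; the function and constant names are hypothetical.

```python
def learning_rate(iteration, stages):
    """Piecewise-constant schedule; stages is a list of (num_iterations, lr)."""
    start = 0
    for length, lr in stages:
        if iteration < start + length:
            return lr
        start += length
    raise ValueError("iteration is beyond the end of the schedule")


# (iterations, learning rate) pairs taken from the training setup.
BASE_STAGES = [(14_000, 5e-3), (12_000, 5e-4), (10_000, 5e-5)]
FINETUNE_STAGES = [(8_000, 5e-3), (6_000, 5e-4), (4_000, 5e-5)]

base_total = sum(n for n, _ in BASE_STAGES)           # 36,000 iterations
finetune_total = sum(n for n, _ in FINETUNE_STAGES)   # 18,000 iterations
```

Under this reading, fine-tuning uses half as many iterations as base training while keeping the same three learning-rate levels.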

B. COMPARISON OF TESTING RESULTS
To verify that the FPFM model can detect PCB surface defects with higher precision when fewer training samples are available, the FPFM model and several strong few-shot object detection models are compared under different shot settings. The experimental results are shown in Table 2 and Table 3. For a fair comparison, the training strategies and settings of the comparison models are the same as those of our model.

1) COMPARISON OF FEW-SHOT RESULTS
In Table 2 and Table 3, AP represents the average precision averaged over IoU thresholds from 0.5 to 0.95 with a step size of 0.05. $AP_{50}$ and $AP_{75}$ refer to the precision at IoU thresholds of 0.5 and 0.75. $AP_L$ is the average precision for objects whose bounding box area is larger than 96 × 96, $AP_S$ is the average precision for objects whose bounding box area is less than 32 × 32, and $AP_M$ is the average precision for objects whose bounding box area lies between the two.
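The size thresholds that separate $AP_S$, $AP_M$ and $AP_L$ can be made concrete with a small helper. This is an illustrative sketch; the function name `size_bucket` is hypothetical, and areas are measured in pixels.

```python
def size_bucket(width, height):
    """Assign a bounding box to the small / medium / large group by its area."""
    area = width * height
    if area < 32 * 32:
        return "small"
    if area > 96 * 96:
        return "large"
    return "medium"


buckets = [size_bucket(w, h) for w, h in [(20, 20), (50, 50), (100, 100)]]
```

A box of exactly 32 × 32 pixels falls into the medium group under these definitions.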
We compare our FPFM model with recent state-of-the-art (SOTA) methods. To evaluate the model more comprehensively, we also report the time required for model tuning. Table 2 shows that FPFM outperforms recent SOTA methods on most indicators. For AP, compared with the SOTA methods, FPFM achieves improvements of about 2.57% and 3.39% under 10-shot and 30-shot, respectively. When $k = 10$ and $k = 30$, $AP_{50}$ reaches 69.52% and 78.86%, which is 7.33% and 5.45% higher than the SOTA methods, respectively. In addition, owing to the BI-FU multi-scale feature fusion module, the small-object precision $AP_S$, which is difficult to optimize, improves by 3.36% under 10-shot and 4.89% under 30-shot, which shows that this module plays an important role in detecting small PCB defects.
We also compare the tuning time of FPFM with that of other few-shot object detection models. When $k = 10$, the time required by the proposed model is not the shortest, but it is very close to the SOTA time. When $k = 30$, the tuning time reaches the level of the SOTA methods. We attribute this to two main reasons. First, the feature enhancement module improves the adaptation speed of the model, so that it achieves better detection results in fewer iterations. Second, adding the feature enhancement module and the BI-FU module does not increase the computational complexity of the model much. The feature enhancement module is built on CBAM, which is lightweight and versatile and adds little computation to the model by itself. We further improve CBAM by replacing the inner fully connected layers with a 1D convolution and the standard convolution with a depth-wise separable convolution, which further reduces the number of parameters. The BI-FU module first extracts multi-scale feature maps of the PCB images, which increases the computational complexity. However, the subsequent feature fusion retains the multi-scale information of the feature maps while reducing the computation added by the module. These improvements raise the detection precision for PCB defects while preserving the detection efficiency of the model.
Further comparative experiments are then conducted to verify the few-shot detection performance of FPFM under lower shot settings ($k = 1, 2, 3, 5$). As shown in Table 3, the $AP_{50}$ of FPFM is 3.95%-8.42% better than the state-of-the-art methods under these settings. On most other indicators, FPFM also outperforms the other methods, which shows the general effectiveness of the model under various few-shot settings in PCB defect detection.

2) ADAPTATION SPEED
To further verify the detection performance of the model, we compare the adaptation speed and precision of different methods under the 5-shot setting, as shown in Fig. 7. We use the number of iterations required for convergence to express the adaptation speed of a model. If the current novel class AP does not exceed the best recorded AP for 2000 consecutive iterations, we consider the model to have converged and take the iteration of the best recorded AP as the model's adaptation speed. Fig. 7 visualizes the training process and the number of iterations at convergence under the 5-shot setting. The curves show the novel class AP throughout fine-tuning, and our proposed method achieves better detection precision. Moreover, the feature enhancement module highlights the key features of the feature maps, which helps the model grasp the key feature information of the images faster and suppresses the interference of useless information. Our model therefore exhibits a faster adaptation speed. It takes only 3100 iterations to reach its peak, which is 400 fewer iterations than the FSRW model, which has the second-fastest convergence speed. More importantly, the FPFM model needs only 2000 iterations to achieve 95% of its peak performance. This means that 33% of the training time can be saved at the cost of only a 5% reduction in performance. This shows that our model adapts faster and reflects a strong generalization ability. These properties are very helpful for few-shot PCB defect detection.
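The convergence rule used to measure adaptation speed can be sketched as a simple patience check. The AP values below are synthetic and are not results from the experiments; the function name, the evaluation interval of 500 iterations, and the data layout are hypothetical.

```python
def adaptation_speed(ap_history, patience=2000):
    """Return the iteration of the best AP once it has stood for `patience` iterations.

    ap_history: list of (iteration, AP) pairs in increasing iteration order.
    Returns None if the run ends before the convergence rule is met.
    """
    best_ap = float("-inf")
    best_iter = None
    for iteration, ap in ap_history:
        if ap > best_ap:
            best_ap, best_iter = ap, iteration
        elif iteration - best_iter >= patience:
            return best_iter
    return None


# Synthetic curve, evaluated every 500 iterations (illustrative values only).
history = [(500, 10.0), (1000, 20.0), (1500, 25.0), (2000, 24.0),
           (2500, 24.5), (3000, 24.9), (3500, 25.0), (4000, 24.0)]

speed = adaptation_speed(history)         # best AP at 1500 stands until 3500
unfinished = adaptation_speed(history[:4])
```

Note that a later AP that merely ties the best value does not reset the counter, matching the wording that the AP must exceed the best recorded value.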

3) COMPARISON OF DIFFERENT TYPES OF DEFECTS
In Fig. 8, we compare the FPFM algorithm under the 10-shot setting with four other few-shot object detection algorithms that are also based on meta-learning: Meta R-CNN [17], FSDet [38], MetaDet [39], and FSRW [16]. Among the six types of defects, the detection precision of missing hole and spurious copper defects is relatively high for all models, while the precision for mouse bite, open circuit, short, and spur is lower. The main reason is that missing hole and spurious copper defects have relatively obvious features and a relatively large size, so they are easy for a model to detect. Open circuit, short, and spur defects are generally small, so they are easily ignored by ordinary models. Mouse bite defects vary greatly in appearance, and their sizes differ from one position to another, so it is difficult for general models to judge the defect type accurately.
As Fig. 8 shows, owing to the BI-FU multi-scale fusion module and the feature enhancement module, our model greatly improves the detection precision of small object defects such as open circuit, short, and spur, and it also significantly improves the detection precision of mouse bite defects of different scales. For missing hole and spurious copper, the improvement is small, but the model is still better than the other detection models, indicating that FPFM has the highest overall performance for PCB defect detection. Some visual detection results for k = 10 are shown in Fig. 9.

C. ABLATION EXPERIMENT
In this section, we conduct comprehensive ablation experiments to analyze the effect of various modules in the model.

1) EFFECTS OF DIFFERENT BASE CLASSES
When training the model, we use the FSOD dataset as the base classes for model pre-training. As a control experiment, we select PASCAL VOC [44], [45] and MS-COCO [46] as alternative base classes and pre-train the model in the same way.

a: FSOD
This dataset is a dedicated few-shot learning dataset with high category diversity, including 83 parent semantics, which are further divided into 1000 leaf categories with a total of 66,502 images. In addition, the objects in the dataset differ widely in size and aspect ratio, which is very conducive to few-shot learning. We use the 1000 subcategories as the base classes to pre-train the FPFM model.

b: PASCAL VOC
This dataset is a small-scale object detection dataset containing 20 object classes with a total of 16,551 images. We use VOC 07-12 as the base classes to pre-train the model for comparison.

c: MS-COCO
This dataset is a large-scale object detection dataset containing 80 categories with a total of 123,287 images. We use the images of all 80 categories as the other group of comparison base classes and pre-train the model.

For a fair comparison, the novel classes use the same images from the PCB dataset in every case, and the model is fine-tuned under the same shot settings before the experimental results are compared.
It can be seen from Table 4 that when the FSOD dataset is used as the base classes for pre-training, the model achieves the highest detection precision under all shot settings. As the number of shots increases, the precision gap becomes more and more obvious. When PASCAL VOC is used for pre-training, the detection precision is the worst. Compared with PASCAL VOC, MS-COCO is larger in scale and contains more images, which helps the model learn low-level semantic information in the images, so it achieves higher detection precision. However, the key to few-shot learning is to improve the generalization ability of the model. Therefore, the FSOD dataset, with its large number of object categories and high diversity, is more conducive to meta-learning and helps the model achieve better detection results.

2) BACKBONE NETWORK
In this article, the feature extractor uses DarkNet-53 as the backbone network to extract query features, instead of the original DarkNet-19. The experimental results of the two networks under 10-shot are shown in Table 5.
The detection precision of DarkNet-53 is 69.52%, which is better than the 66.58% of DarkNet-19. DarkNet-53 draws on the idea of the feature pyramid network and introduces residual mechanisms, so it has a stronger ability to extract the features of small defects in PCB and achieves higher detection precision. Therefore, the DarkNet-53 network is adopted as the feature extractor.

3) EFFECTS OF DIFFERENT MODULES
To verify the effect of the different modules on model performance, we conduct ablation experiments under different shot settings (k = 1, 2, 3, 5, 10, 30), as shown in Table 6. In the table, DN-53 indicates that the feature extractor adopts the DarkNet-53 network, EL refers to the addition of the EL-CBAM feature enhancement module, and BI-FU represents the addition of the multi-scale fusion module. When only the feature extractor network is replaced, the improvement in detection precision is small, because key feature information is not enhanced and multi-scale feature information for small object defects is missing. When the feature enhancement module or the multi-scale fusion module is added separately, the performance of the model improves further, which demonstrates the effectiveness of each module. When the two modules are added at the same time, the detection precision reaches its highest level, which shows that the combination of the two modules is very useful for improving model performance.

4) COMPARISON OF FEATURE FUSION STRATEGIES
The multi-scale feature fusion module is composed of the multi-scale feature extraction part BI and the adaptive feature fusion part FU, which together realize multi-scale feature extraction and fusion for an image. In Table 7, we record the effects of three different feature fusion methods under the 5-shot and 10-shot settings. (1) The N feature maps extracted by the feature extraction part BI are input directly into the reweighting module and converted into reweighting vectors. (2) The N feature maps are concatenated along the channel dimension, and a convolution operation then compresses the result by channel to obtain the fused feature map. (3) The method adopted in this paper is used, that is, feature compression vectors are used to fuse the feature maps.
From Table 7, we can see that Method 1 achieves the highest detection precision. However, Method 1 inputs all N feature maps directly into the feature reweighting module, which increases the computational complexity of the model and reduces the detection efficiency. Method 2 adopts direct convolution compression. It is easy to implement, but too much feature information is lost, which leads to poor detection precision. To balance model precision and calculation speed, Method 3 is selected as our final feature fusion method.
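As a rough illustration of why Method 2 is simple but lossy, the following sketch concatenates feature maps along the channel axis and compresses them with a 1x1 convolution, written in plain Python on nested lists. It illustrates Method 2 only, not the FU module adopted in this paper. The helper names `concat_channels` and `conv1x1`, the fixed weights, and the tiny 2x2 inputs are assumptions for the example. In a real network the weights would be learned, and maps of different scales would first have to be resized to a common spatial size.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def concat_channels(maps: List[FeatureMap]) -> FeatureMap:
    """Stack N feature maps of equal spatial size along the channel axis."""
    height, width = len(maps[0][0]), len(maps[0][0][0])
    for fmap in maps:
        if len(fmap[0]) != height or len(fmap[0][0]) != width:
            raise ValueError("all maps must share the same spatial size")
    return [channel for fmap in maps for channel in fmap]


def conv1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution without bias.

    Each output channel is a per-pixel weighted sum of the input channels;
    `weights` is indexed as [out_channel][in_channel].
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[c][y][x] for c, w in enumerate(out_w))
          for x in range(width)]
         for y in range(height)]
        for out_w in weights
    ]


# Two hypothetical single-channel 2x2 maps standing in for two scales.
map_a = [[[1.0, 2.0], [3.0, 4.0]]]
map_b = [[[10.0, 20.0], [30.0, 40.0]]]

stacked = concat_channels([map_a, map_b])   # 2 channels
fused = conv1x1(stacked, [[0.5, 0.5]])      # compressed back to 1 channel
```

Because the two input channels are collapsed into one by a fixed linear combination at every pixel, the per-scale information can no longer be separated afterwards, which is consistent with the information loss attributed to Method 2 above.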

5) COMPARISON OF FEATURE ENHANCEMENT MODULE
To fully mine the key information of the samples and suppress the interference of useless information, a feature enhancement module based on EL-CBAM is introduced in this article.
For comparison, we construct the feature enhancement module with SENet [43], ECANet [35], CBAM [22], and EL-CBAM in turn. As can be seen from Table 8, adding a feature enhancement module greatly improves the detection precision of the model. CBAM, which combines spatial and channel attention, enhances features more effectively than the single-dimension attention of SENet and ECANet. Compared with CBAM, the EL-CBAM module increases the mAP by 0.38% and 0.40% at k = 5 and k = 10, respectively, which reflects the effectiveness of EL-CBAM. It can further highlight the key feature information of the feature map and help the network detect PCB defects.
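For reference, the sequential channel and spatial attention of the standard CBAM [22] can be written as follows. This is the original CBAM formulation, not the EL-CBAM variant, whose modifications are defined in the method section. The symbols $F$, $M_c$, $M_s$ and $f^{7\times 7}$ follow the usual CBAM notation and are assumptions with respect to the notation of this article.

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \qquad F' = M_c(F) \otimes F$$

$$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), \qquad F'' = M_s(F') \otimes F'$$

Here $\sigma$ is the sigmoid function, $\otimes$ denotes element-wise multiplication with broadcasting, and $f^{7\times 7}$ is a convolution with a $7\times 7$ kernel. SENet and ECANet compute only a channel attention term, which is why they are described above as single-dimension attention.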

V. CONCLUSION
In this article, we propose FPFM, a PCB defect detection method based on few-shot learning. Building on FSRW, the proposed method replaces the network in the feature extractor with DarkNet-53. Moreover, we introduce a feature enhancement module based on the improved CBAM to highlight key regional features, which effectively suppresses useless interference information and improves the feature extraction ability for query samples. Meanwhile, a few-shot learning method combined with multi-scale feature fusion is used for PCB defect detection for the first time. The fusion module extracts multi-scale features and fuses them into a high-quality feature map, which improves the detection precision for small-scale defects. Finally, we use the FSOD dataset as the base classes and the PCB dataset as the novel classes to conduct a large number of experiments. The results show that the proposed model outperforms recent state-of-the-art methods under different shot settings. At the same time, the model balances detection efficiency and precision, which favors its application in industrial production. However, some work remains to be done. At present, the defect types studied in this article are relatively fixed, and more PCB defect types still need to be analyzed experimentally. In addition, this article does not consider the multi-label problem or the correlation between labels. In future work, we plan to extend the model to more PCB defect types, and even to other industrial products. We will also study the multi-label problem, that is, predicting not only the types and locations of PCB defects but also other attributes such as defect severity.