Automatic Micro-crack Detection of Polycrystalline Solar Cells in Industrial Scene

Solar energy can be a clean and renewable alternative to traditional fuels, which enables its wide application in our life and the industry. However, some defects inevitably occur in the solar cells during production, transportation, and installation, which will reduce the power generation efficiency. In this paper, we propose a ResNet-based micro-crack detection method to detect the micro-cracks on polycrystalline solar cells. Specifically, a novel feature fusion model is introduced to aggregate the low-level features and deep semantically strong features by self-attention mechanism to obtain accurate geometry information. This method boosts the detection accuracy to 99.11%, which significantly surpasses other counterparts, e.g., some state-of-the-art deep neural networks, by a large margin. Since it is difficult for other methods to precisely detect other defect types apart from micro-cracks, we further propose a transfer learning method based on MK-MMD to guide the training process of defect detector with another pre-trained micro-crack detector. With the help of transfer learning, the accuracy of solar cell defect detection increases by 11.6%.


I. INTRODUCTION
In recent years, the practical application of new energy sources such as solar energy has attracted much attention. Photovoltaic power generation utilizes solar energy to generate electricity and has been widely used in all aspects of life [1]. The operation of photovoltaic power generation is inseparable from solar cell panels. However, solar cells may develop various defects due to external forces or aging during transportation, installation, and utilization. These defects weaken the power generation of the cells and may even make the cells fail to generate electricity, reducing the overall efficiency of the photovoltaic power generation system. Therefore, it is necessary to inspect the solar panels promptly to catch defects in the solar cells.
Electroluminescence (EL) has been widely used in the detection of defects in photovoltaic system components, especially solar cells, due to the high-resolution images it can produce. The acquisition process of EL images is shown in Fig. 1. During this process, the solar cell emits infrared light when activated by the applied voltage. Cooled Si-CCD cameras then capture this infrared light and output sensed images. The defective parts of the cells cannot be activated normally, so they appear black in the sensed images. Therefore, it is possible to determine whether the cells contain defects by inspecting the EL images. As shown in Fig. 2(a)(b), solar cells can mainly be divided into two categories, polycrystalline and monocrystalline. Because of the presence of impurities in polycrystalline solar cells, dark spots appear in the obtained EL images and interfere with defect detection. Therefore, defect detection for polycrystalline cells is more difficult than for monocrystalline cells.
As shown in Fig. 2, solar cells exhibit many types of defects, including material defects and micro-cracks [2], [3]. Material defects such as finger interruption generally do not affect the life span and efficiency of the cell. Micro-cracks, however, become more and more serious over time, thereby reducing the power generation of solar cells.
Before the era of machine learning, defect detection was mainly completed by manual identification, so the detection efficiency was inevitably limited. Recently, many researchers have begun to adopt machine learning methods such as SVM and AdaBoost for the detection of solar cell defects [3]. With the continuous development of deep learning and convolutional neural networks [3], more and more studies use deep learning methods for defect detection of solar cells. However, most of these methods use conventional CNNs to extract features from images for two-class classification. That is, only the feature outputs of the network's last layer are used for classification. Since defect features such as micro-cracks are contained in the low-level features of the image, many useful features from the middle layers of the deep networks are ignored.
To solve this problem, we propose a ResNet-based method for detecting the micro-cracks of polycrystalline cells. In our method, we combine a feature fusion module with a ResNet50 backbone and use the fused features to perform classification. We aim to make full use of both the low-level and high-level information in the network for classification. This method leads to significant improvements on the polycrystalline micro-crack dataset that we collected from industrial production lines. However, in industrial scenarios, the defects of polycrystalline cells are diverse, and the performance of the proposed detection method degrades on datasets containing other defect types. Considering this, we analyze the relationship between different defect datasets of polycrystalline solar cells. The source domain is polycrystalline micro-crack data, and the target domain is various polycrystalline defects. The resulting transfer method adapts the output features of the models trained on the two domains, so that the micro-crack detection model guides the training process of the polycrystalline defect detection task. To summarize, the main contributions of this work are as follows:
• We propose a micro-crack detection method for polycrystalline solar cells, including image preprocessing, feature extraction, a feature fusion module, and an image classification network.
• Referring to mainstream transfer learning methods, we develop an algorithm for transferring from the micro-crack detection task of polycrystalline cells to the polycrystalline defect detection task.
Based on the collected polycrystalline EL images, including micro-cracked images, we created a polycrystalline micro-crack dataset. Experiments conducted on this dataset show that the micro-crack detection method based on deep learning and feature fusion obtains satisfactory detection results. After that, based on the micro-crack detection model obtained in the previous step, we conducted transfer learning experiments on a polycrystalline defect dataset obtained from the Internet. The results show that the proposed transfer learning method effectively improves the performance of the defect detection task.

II. RELATED WORK

A. IMAGE CLASSIFICATION
The purpose of the image classification task is to assign a corresponding label to an image from a specific label set, that is, to identify the category of the image.
Traditional image classification methods consist of feature extraction followed by classification. Through feature extraction, a large amount of local description information that is robust to noise such as illumination changes can be extracted. The main methods of feature extraction include Scale-Invariant Feature Transform (SIFT) [4], Speeded Up Robust Features (SURF) [5], Histogram of Oriented Gradient (HOG) [6], Local Binary Pattern (LBP) [7] and so on. Besides, some works combine multiple extraction methods in the feature extraction process to prevent the loss of useful information. After feature extraction, a fixed-dimension vector is obtained through feature encoding [8] and fed into the classifier for classification. The classifiers mainly used for image classification include K-Nearest Neighbor (KNN) [9], Support Vector Machine (SVM), Random Forest (RF) [10] and various ensemble methods.
With the development of deep learning, convolutional neural networks (CNNs) have gradually been applied to image classification. AlexNet, the pioneering deep network for large-scale image classification, started the era of CNNs. CNN models used for classification generally include convolutional layers, fully connected layers, a softmax multi-class classifier, and a multi-class cross-entropy loss function, and are trained iteratively to perform image classification.

B. DEFECT DETECTION
Defect detection methods are generally categorized into those based on Photoluminescence (PL) [11], [12] and those based on Electroluminescence (EL). Since some defects in solar cells are only visible under EL imaging of photovoltaic modules, most current methods [13], [14] use EL for defect detection of solar cells. Table 1 summarizes relevant literature and methods.
In [15], a Fourier image reconstruction method is proposed to detect defects in EL images of solar cells. However, its shape assumption makes it difficult to detect defects of other shapes. Therefore, [16] proposes to utilize anisotropic diffusion filtering and shape analysis to detect defects in solar cells. This method detects micro-cracks well, but its performance on other defects is weaker. Besides, [17] proposes a detection method for finger interruption, which mainly uses binary clustering of candidate region features. In addition, methods using CNNs for defect detection have gradually emerged in recent years. [3] proposes a CNN-based network for defect classification and compares it with prior traditional classification methods, demonstrating the superior performance of CNNs for defect detection.

[Table 1: related defect detection methods. Defect types listed: micro-crack, finger interruption, all defects. Methods listed: image segmentation [16], binary clustering [17], feature extraction, deep learning [3].]

C. TRANSFER LEARNING
Transfer learning (TL) refers to the transfer of knowledge learned in one domain (the source domain) to another domain (the target domain) so that the model achieves better results in the target domain. According to the specific domains and tasks involved, transfer learning can be divided into three types: Inductive, Transductive, and Unsupervised Transfer Learning [18]. Our proposed framework belongs to Inductive Transfer Learning, in which the target domain contains labels, and the ground truth labels can be used to train the target model during transfer learning.
Inductive transfer learning can be further categorized into Instance-based TL, Feature-representation-transfer, Parameter-transfer, and Relational-knowledge-transfer [18]. TrAdaBoost [19] belongs to Instance-based TL; it obtains the target model by adjusting the weights of the source domain labels and target domain labels, using instance reweighting and importance sampling. Feature-representation-transfer is a widely studied transfer learning method that decreases classification and regression errors by reducing the difference between the source domain and the target domain. Fine-tuning, commonly used in model training, belongs to parameter-based transfer learning, which improves the detection performance of the target model by sharing parameters between the source domain and the target domain. In addition, Relational-knowledge-transfer transfers the similarity relations between the source and target domains.
VOLUME 4, 2016

III. METHODS
Our proposed framework, illustrated in Fig. 3, consists of two main parts. In Section III-A, we present a micro-crack detection method for polycrystalline cells, which classifies the cells into two categories according to whether micro-cracks exist. In Section III-B, we propose a transfer-learning-based training method for the defect detection model of polycrystalline cells.

A. MICRO-CRACK DETECTION
As a common type of defect in polycrystalline cells, micro-cracks affect the power generation efficiency of the cell to a certain extent. Therefore, the detection of micro-cracks is critical in the practical application of polycrystalline cells. According to whether they contain cracks, we classify polycrystalline cells into normal and micro-cracked types, represented by 0 and 1, respectively. As a result, the micro-crack detection problem of polycrystalline cells is converted into a binary image classification problem. In this part, we propose a deep learning-based polycrystalline micro-crack detection method, which uses a deep convolutional neural network as the backbone and uses the fused network features to perform binary classification.

1) Data augmentation
In practical industrial situations, cells with micro-cracks are only a minority of all cells. In our industrial polycrystalline cell micro-crack dataset, the number of non-cracked cells is about 24 times larger than that of micro-cracked cells. Since micro-cracks are local features of the image, commonly used augmentation methods such as cropping and affine transformation are not applicable here. Therefore, we choose flipping for data augmentation. The whole flip process for each image contains 3 steps. After these flips, we get 4 images from the original one.

2) Image Preprocessing
a: Fourier filtering
As shown in Fig. 2(a), there are busbars in the polycrystalline cells which look similar to micro-cracks. Therefore, the busbars cause substantial noise interference in the detection of micro-cracks and should be removed as much as possible before image classification.
Here we choose the Fourier transform [20] for image filtering, to remove the busbars from the original images of polycrystalline cells as much as possible. For an M × N grayscale image f(x, y), the Fourier transform that converts it from the spatial domain to the frequency domain is given in Eq. (1), and the inverse Fourier transform is given in Eq. (2).
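For reference, one common convention for the 2-D discrete Fourier transform pair of an M × N image is sketched below. The placement of the 1/(MN) normalization factor is an assumption; Eq. (1) and Eq. (2) may place it differently.

```latex
\begin{align}
F(u, v) &= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,
           e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}, \\
f(x, y) &= \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\,
           e^{\,j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}.
\end{align}
```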
As shown in Fig. 4, the entire Fourier filtering process can be mainly divided into three steps. After trying various filters, we obtained the following filter: $\{\ldots,\ v \in (n - 120,\ n - 2) \cup (n + 2,\ n + 125)\}$ (4), where m and n represent half of the width and height of the image, respectively.
In our observations, this filter removes the busbars while preserving the remaining content of the polycrystalline images, in particular irregular lines such as micro-cracks. We also find that this Fourier filter reduces the influence of dark spots inside the polycrystalline cells on micro-crack detection.
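A minimal sketch of this kind of frequency-domain notch filtering is given below, assuming NumPy. It is not the exact filter of Eq. (4): the function name `remove_horizontal_lines` and the parameters `band` and `gap` are hypothetical, and the mask simply zeroes a narrow band along the vertical frequency axis (where horizontal line structures concentrate their energy) while keeping the lowest frequencies.

```python
import numpy as np

def remove_horizontal_lines(img, band=1, gap=2):
    """Suppress horizontal line structures with a frequency-domain notch.

    band: half-width (in frequency bins) of the notch around the vertical axis.
    gap:  rows closer than this to the spectrum centre are kept, so the mean
          brightness and very slow vertical variation survive.
    Both values are illustrative assumptions, not the paper's settings.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Forward transform, with the zero frequency shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    cy, cx = h // 2, w // 2
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    # Notch: near-zero horizontal frequency, non-trivial vertical frequency.
    notch = (np.abs(cols - cx) <= band) & (np.abs(rows - cy) >= gap)
    spectrum[notch] = 0
    # Back to the spatial domain; the mask is symmetric, so the result is real.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

On a synthetic image made only of horizontal stripes, the output is nearly constant, while a pattern that varies only along the horizontal direction passes through unchanged.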

b: Local Binary Pattern
LBP is an operator used to describe the local texture characteristics of an image. After Fourier filtering, the only components with obvious line shapes are essentially possible micro-cracks. Therefore, following related works [21]-[24], we extract the LBP feature of the image to highlight the texture. This makes possible line-like micro-cracks more obvious, which benefits micro-crack detection.
The neighborhood radius used to compute LBP is set to 1, and the number of sampled pixels is 8. The sampling points selected by the LBP operator are shown in Fig. 5. The corresponding images obtained after calculating the LBP feature are shown in Fig. 4(e) and Fig. 4(f).
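The LBP computation can be sketched as follows, assuming NumPy. This is the basic 3 × 3 variant, which takes the 8 immediate neighbours of each pixel instead of interpolating points on a circle of radius 1; the function name `lbp_3x3` and the bit ordering are illustrative choices, not taken from the paper.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D image."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes
```

The output is two pixels smaller than the input in each dimension because border pixels lack a full neighbourhood.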

[Table 2: columns Layer Name, Layer Structure, Output Feature Size]

3) Backbone For Classification
After preprocessing, the images are fed into a ResNet50 [25] network for feature extraction. The structure of the ResNet50 is shown in Fig. 3. Table 2 and Fig. 6 show more details of the network composition. Compared with the original network, the ResNet50 used in this work changes the output number of the fully connected (FC) layer to 2, so that for each image a two-element array $F = [F_0, F_1]$ is output. In addition, we add a softmax layer at the end of ResNet50. The softmax is computed as $S_i = e^{F_i} / (e^{F_0} + e^{F_1})$, where i can be 0 or 1, and $S_0$ and $S_1$ respectively represent the probability that the image belongs to category 0 and category 1.
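As a small illustration of the two-class softmax applied to the FC output, the sketch below uses only the Python standard library. The function name `softmax2` is hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step rather than something stated in the paper.

```python
import math

def softmax2(f0, f1):
    """Map the two FC outputs (F_0, F_1) to class probabilities (S_0, S_1)."""
    m = max(f0, f1)  # shift by the maximum so exp() cannot overflow
    e0, e1 = math.exp(f0 - m), math.exp(f1 - m)
    total = e0 + e1
    return e0 / total, e1 / total
```

The two outputs always sum to 1, and equal inputs give (0.5, 0.5).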

4) Feature Fusion Module
Intuitively, the stripes in a polycrystalline solar cell image contain rich discriminative information that is beneficial for defect detection. The features of the deepest layer in the network are semantically strong but lack precise low-level information. Low-level features, in contrast, are usually rich in geometric patterns and stripes. Inspired by this, we propose a feature fusion module that fuses as much low-level information as possible to enhance the defect detection ability of the network.
Unlike previous works that integrate multi-scale features by convolutions, we propose to perform multi-scale feature fusion with an attention mechanism. The detailed architecture of the feature fusion module is illustrated in Fig. 7. We first resize the features of the prior stage by a downsampling operation, which is a convolution layer with a kernel size of 3 and a stride of 2 in our implementation. Then we adopt the attention mechanism to fuse the downsampled low-level features with the features at the higher level. A self-attention layer models the relationships among all spatial positions in the features to obtain precise low-level information. To be consistent with the transformer architecture, we place a feed-forward network after the self-attention layer to enhance the feature semantics. Residual connections are added around the self-attention layer and the feed-forward network to ease optimization. We insert three feature fusion modules before Stage 3, Stage 4 and Stage 5 to construct a bottom-up pathway parallel to the backbone.
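The attention part of such a module can be sketched as below, assuming NumPy. Several details are assumptions made only for illustration: the low-level map is taken as already downsampled to the shape of the higher-level map, the two maps are merged by element-wise addition before attention, attention is single-head, the feed-forward network uses ReLU, and normalization layers are omitted. The names `fuse_features`, `wq`, `wk`, `wv`, `w1` and `w2` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(low, high, params):
    """Fuse two (C, H, W) feature maps with self-attention and an FFN.

    low:    low-level features, assumed already downsampled to (C, H, W).
    high:   higher-level features of the same shape.
    params: dict of weight matrices; wq, wk, wv are (C, C),
            w1 is (C, D) and w2 is (D, C) for some hidden size D.
    """
    c, h, w = high.shape
    # Assumption: merge the two maps by addition, then treat each spatial
    # position as one token of dimension C.
    tokens = (low + high).reshape(c, h * w).T            # (H*W, C)
    q = tokens @ params["wq"]
    k = tokens @ params["wk"]
    v = tokens @ params["wv"]
    # Relationships among all spatial positions.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)        # (H*W, H*W)
    tokens = tokens + attn @ v                           # residual connection
    hidden = np.maximum(tokens @ params["w1"], 0.0)      # feed-forward, ReLU
    tokens = tokens + hidden @ params["w2"]              # residual connection
    return tokens.T.reshape(c, h, w)
```

Because both sub-layers are residual, setting `wv` and `w2` to zero reduces the module to the plain sum of the two inputs, which is a convenient sanity check.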

5) Cross Entropy Loss Function
Since this is a relatively simple binary classification task, we choose the standard Cross-Entropy (CE) loss function to compute the loss value. The loss function is $\mathcal{L}_{CE} = -\left[ y_{true} \log S_1 + (1 - y_{true}) \log S_0 \right]$, where $y_{true}$ represents the ground truth label of the image, which is either 1 or 0.
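A minimal sketch of the binary cross-entropy for one image follows, using only the Python standard library. The function name `binary_cross_entropy` and the clipping constant `eps` are illustrative assumptions; the clipping only guards against taking the logarithm of zero.

```python
import math

def binary_cross_entropy(s1, y_true, eps=1e-12):
    """Cross-entropy for one image.

    s1:     predicted probability of class 1 (micro-cracked).
    y_true: ground truth label, 0 or 1.
    """
    s1 = min(max(s1, eps), 1.0 - eps)  # keep log() finite
    return -(y_true * math.log(s1) + (1 - y_true) * math.log(1.0 - s1))
```

A confident correct prediction gives a loss near zero, while a confident wrong prediction gives a large loss.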

B. DEFECT DETECTION BASED ON TRANSFER LEARNING
In industrial scenarios, the specific conditions and possible defects differ between batches of cells. In the previous section, we proposed a training method for a micro-crack detection model on the polycrystalline cell dataset collected from the industrial scene. In this batch of cells, the defects are mainly micro-cracks, so the training data is relatively regular and contains only two types: without and with micro-cracks. As a result, the model converges well without other auxiliary designs. However, when the dataset contains more types of defects or less data, a model obtained with only the method of the previous section detects defects poorly and cannot be applied to actual industrial scenarios. Therefore, in this section, we propose a defect detection model training method based on transfer learning. With the help of the micro-crack detection model obtained in the previous part, knowledge is transferred through transfer learning so that the detection performance improves.

1) Fine-tune
As the most basic transfer learning method, fine-tuning has been widely used in the training of deep learning models. The most commonly used pre-trained model is one trained on ImageNet [26]. However, there is a certain gap between the images in ImageNet and those in the defect dataset. Our intuition is therefore to use a model trained on a more similar dataset as the pre-trained model to obtain better fine-tuning results.
In the previous part, we obtained a detection model for polycrystalline cell micro-cracks based on the industrial dataset. Therefore, we use this model as the pre-trained model and only replace its old FC layer with a new one. Experiments show that using this model as the pre-trained model gives better results than using models trained on ImageNet.

2) Model Details
Referring to previous work on transfer learning, we use a transfer learning model to obtain a better defect detection model by learning from the output of the trained micro-crack detection model. Here, $D_S = \{(x_i^S, y_i^S)\}_{i=1}^{n_S}$ denotes the source domain, where $n_S$ is the total amount of source domain data taken from the micro-crack dataset. The target domain is obtained from the defect dataset and is denoted as $D_T = \{(x_j^T, y_j^T)\}_{j=1}^{n_T}$, where $n_T$ is the number of target domain data. P and Q represent the probability distributions of the source domain and target domain, respectively.

a: DAN and MK-MMD
In the representative transfer learning work [27], Deep Domain Confusion (DDC) is proposed, which uses the output of the 7th layer of AlexNet to calculate the Maximum Mean Discrepancy (MMD) distance and minimizes this value to reduce the distance between the source domain and the target domain. Deep Adaptation Network (DAN) [28] improves on DDC by adapting multiple layers of the deep convolutional network. By reducing the multiple kernel variant of MMD (MK-MMD) distance between the source domain and target domain across more layers, DAN obtains better transfer results than DDC.
MMD is widely used in transfer learning tasks to measure the distance between the source and target domains after mapping them to a reproducing kernel Hilbert space (RKHS). Let $\mathcal{F}$ represent a function set in the sample space; MMD can be calculated as $\mathrm{MMD}(\mathcal{F}, P, Q) = \sup_{f \in \mathcal{F}} \left( E_{x \sim P}[f(x)] - E_{y \sim Q}[f(y)] \right)$. When $\mathcal{F}$ is the unit ball of the RKHS, this gives $\mathrm{MMD}^2(P, Q) = E[k(x, x')] - 2E[k(x, y)] + E[k(y, y')]$ with $x, x' \sim P$ and $y, y' \sim Q$, where k is the kernel function. MK-MMD is the multiple kernel variant of MMD, so when calculating MK-MMD, we only need to change the kernel function in the above MMD formula to a convex combination of multiple kernel functions, which means $k = \sum_{u} \beta_u k_u$ with $\beta_u \geq 0$ and $\sum_{u} \beta_u = 1$. Let $D_T^l$ and $D_S^l$ respectively represent the target domain and source domain at the l-th picked layer, with corresponding distributions $p_l$ and $q_l$. Then the MK-MMD distance of layer l is $d_k^2(p_l, q_l)$, that is, the squared MMD between $p_l$ and $q_l$ computed with the combined kernel k. Therefore, the loss value over all layers is $\mathcal{L}_{mmd} = \sum_{l=1}^{L} d_k^2(p_l, q_l)$, where L represents the total number of picked layers, which is 5 in this work. The overall loss of the training process is $\mathcal{L} = \mathcal{L}_{CE} + \alpha \mathcal{L}_{mmd}$ (11), where $\mathcal{L}_{CE}$ is the cross-entropy loss for binary classification in the target domain, which is available because the target domain predictions can be optimized against the ground truth labels. $\alpha$ is a hyperparameter that weights $\mathcal{L}_{mmd}$ in the final loss and is normally set to 0.25.
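The MK-MMD term and the combined loss can be sketched as below, assuming NumPy. This is an illustration under stated assumptions rather than the paper's implementation: it uses the simple biased estimate of squared MMD, Gaussian kernels with hypothetical bandwidths, and equal kernel weights. The function names `mk_mmd2` and `total_loss` are illustrative.

```python
import numpy as np

def mk_mmd2(xs, xt, bandwidths=(0.5, 1.0, 2.0), betas=None):
    """Biased estimate of squared MK-MMD between two feature batches.

    xs, xt:     arrays of shape (n_s, d) and (n_t, d).
    bandwidths: Gaussian kernel widths (assumed values).
    betas:      convex combination weights; equal weights if omitted.
    """
    xs = np.asarray(xs, dtype=np.float64)
    xt = np.asarray(xt, dtype=np.float64)
    if betas is None:
        betas = np.full(len(bandwidths), 1.0 / len(bandwidths))

    def kernel(a, b):
        # Pairwise squared Euclidean distances, shape (len(a), len(b)).
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        # Convex combination of Gaussian kernels.
        return sum(beta * np.exp(-sq / (2.0 * bw ** 2))
                   for beta, bw in zip(betas, bandwidths))

    return kernel(xs, xs).mean() - 2.0 * kernel(xs, xt).mean() + kernel(xt, xt).mean()

def total_loss(ce_loss, source_feats, target_feats, alpha=0.25):
    """Cross-entropy plus alpha times the MK-MMD summed over picked layers."""
    mmd = sum(mk_mmd2(s, t) for s, t in zip(source_feats, target_feats))
    return ce_loss + alpha * mmd
```

Here `source_feats` and `target_feats` are lists with one feature batch per picked layer, so five entries each would correspond to L = 5.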

IV. EXPERIMENTS
To demonstrate the effectiveness of the micro-crack detection model and the transfer learning model, we conduct a series of experiments on the collected polycrystalline micro-crack dataset and the polycrystalline defect dataset obtained from the Internet. The basic configurations of the experiments and the corresponding results are given below.

A. DATASET
The dataset prepared for micro-crack detection contains polycrystalline images collected from industrial assembly lines. These images are segmented from the EL images of entire polycrystalline cell panels and are all grayscale. The size of each image is 240 × 240. The cell images may contain micro-crack defects. Therefore, according to whether the images contain micro-cracks, we divide them into two types: with micro-cracks (micro-cracked) and without micro-cracks (normal). The class distribution of this dataset is shown in Table 3. An overview of the dataset is shown in Fig. 9. The number of cell images without micro-cracks is much larger than that of cells with micro-cracks; as a result, we enrich the data of images with micro-cracks before training. In addition, we use a solar cell defect dataset from the Internet [29] and select its polycrystalline cell part to perform the transfer learning experiments.

B. EXPERIMENTAL SETTING
Except for the model based on transfer learning, the deep networks of other detection models are all pretrained on ImageNet. The main deep convolutional neural network used is ResNet50, and the number of output channels of the FC layer is changed to 2 to better suit the binary classification.
In all experiments in this work, we use mini-batch stochastic gradient descent (SGD) with a learning rate of 0.005 and momentum of 0.9 for training.
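For illustration, one update step of SGD with momentum can be written as below in plain Python, using the learning rate and momentum stated above. The formulation (velocity accumulates gradients, then weights move against the velocity) is one common convention and is an assumption about the exact variant used; the function name `sgd_momentum_step` is hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.005, momentum=0.9):
    """Apply one SGD-with-momentum update to a flat list of parameters."""
    # Velocity is a decayed running sum of past gradients.
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    # Parameters move against the velocity, scaled by the learning rate.
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

Starting from zero velocity, the first step equals plain SGD; later steps move further when successive gradients point the same way.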

C. EVALUATION METRICS
The detection models in this work are essentially classification models for polycrystalline cells, so we introduce the confusion matrix, shown in Table 4, as the basis for computing the model evaluation metrics. In Table 4, True Positive (TP) means that the predicted result and the actual class are both 1, where class 1 refers to cells with defects (micro-cracks in the case of the micro-crack dataset). True Negative (TN), False Positive (FP), and False Negative (FN) are defined analogously.
With the help of the confusion matrix, we can compute accuracy (ACC), precision, and recall as $ACC = \frac{TP + TN}{TP + TN + FP + FN}$, $Precision = \frac{TP}{TP + FP}$, and $Recall = \frac{TP}{TP + FN}$. Ideally, precision and recall are both as large as possible. However, the two values trade off against each other, so some compromise is needed. To find the balance point, we introduce the F1 score [30], which considers both precision and recall and is high only when both are high. The F1 score is calculated as $F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$. Since this work is concerned with binary classification, we further introduce the evaluation metric Area Under Curve (AUC), which is the area under the receiver operating characteristic (ROC) curve. In the ROC curve, the abscissa is the false positive rate, $FPR = \frac{FP}{FP + TN}$, and the ordinate is the true positive rate, $TPR = \frac{TP}{TP + FN}$. Considering the characteristics of the binary classification problem and the imbalance between the two classes, we finally select ACC, Precision, Recall, F1 score, and AUC as the evaluation metrics in the experiments of this work.
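The metrics above can be computed from confusion-matrix counts as in the following sketch, which uses only plain Python. The function names are illustrative, and the AUC is computed through its rank interpretation (the fraction of positive and negative pairs in which the positive sample receives the higher score, with ties counted as one half), which equals the area under the ROC curve.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch but slow for large validation sets.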

D. EXPERIMENTAL RESULTS OF MICRO-CRACK DETECTION
Experiments are carried out on the industrial micro-crack dataset to prove the importance of image preprocessing in the proposed method and the effectiveness of this method. All experiments in this part are trained for 10 epochs. This micro-crack dataset is divided into two parts, train and valid, after data augmentation, as shown in Table 5. The model is trained on train and tested on valid.

1) Influence of Image Preprocessing
In this section, our goal is to show that image preprocessing improves the final detection results. We conduct a series of experiments on the basic ResNet50 without feature fusion. Table 6 shows the gap between the detection results obtained with and without image preprocessing. It can be seen from Table 6 that adding image preprocessing improves the model under all evaluation metrics, increasing ACC, F1 score, and AUC by 27.72%, 0.7456, and 0.4435, respectively. Also, the two main components of image preprocessing, Fourier filtering and LBP extraction, each contribute to the improvement.
In addition, as shown in Table 7, we also try other filtering methods and edge feature extraction methods to find the best options for image preprocessing. From the evaluation results in Table 7 and the filtered images in Fig. 10, we find that Fourier filtering has the best effect, outperforming Mean Blur, Median Blur, and Gaussian Blur. Similarly, after comparing the results of different edge feature extraction methods shown in Table 7 and Fig. 10, we find that using the LBP feature for detection is the best choice, increasing ACC, F1 score, and AUC by 26.67%, 0.7182, and 0.4189. Compared with the other extracted features, LBP is clearly better at highlighting the line-like micro-cracks in polycrystalline cells.

2) Influence of Data Augmentation
It can be seen from Table 3 that the dataset is unbalanced, so we propose to use image flipping for data augmentation in Section III-A1. In the experiments above, the dataset used is the one obtained after data augmentation.
In this section, we conduct related experiments based on image preprocessing, which proves that data augmentation is necessary and can greatly improve the effectiveness of the proposed micro-crack detection model. The results of the experiments are shown in Table 8, which shows that data augmentation can increase ACC, F1 score, and AUC by 4.55%, 0.0959, and 0.0303 respectively.
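The flip-based augmentation evaluated here can be sketched as follows, assuming NumPy. The three flips are taken to be horizontal, vertical, and both combined; this choice is an assumption for illustration, as is the function name `flip_augment`.

```python
import numpy as np

def flip_augment(img):
    """Return the original image plus three flipped copies."""
    img = np.asarray(img)
    return [
        img,
        np.flip(img, axis=1),       # horizontal flip (left-right)
        np.flip(img, axis=0),       # vertical flip (up-down)
        np.flip(img, axis=(0, 1)),  # both flips, i.e. a 180 degree rotation
    ]
```

Flipping only rearranges pixels, so local structures such as micro-cracks are kept intact while the minority class grows fourfold.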

3) Influence of Feature Fusion
The above experiments are all based on the basic ResNet50. In this section, we add feature fusion to the original network to make full use of the shallow features of the network, and conduct related experiments to verify its effect. The results are shown in Table 9. The first column of the table represents the features used in the subsequent classification, and the numbers in it represent the output features of the corresponding stages. It can be seen from Table 9 that using the features obtained by fusing the outputs of the four stages gives better micro-crack detection. After adding the feature fusion module, ACC, F1 score, and AUC increase by 0.82%, 0.0204, and 0.0231 over detection with the basic ResNet50.

4) Comparisons with Other Classifiers
Keeping the same image preprocessing method, we try other commonly used machine learning methods to detect micro-cracks of polycrystalline cells. For each method, we try to find the configuration that gives the best detection results. The methods used and the corresponding parameters are as follows:
• KNN: We use one of the simplest classification algorithms, the k-Nearest Neighbors (KNN) algorithm, to classify polycrystalline images, keeping the number of neighbors as 5 and the leaf size as 30.
• GaussianNB: Gaussian Naive Bayes (GaussianNB), which belongs to the Naive Bayes algorithms, is also attempted.
• SVM: For the Support Vector Machine (SVM) algorithm, we set the penalty coefficient C to 3 and keep the other parameters at their defaults.
• DT: As a traditional classification method, Decision Tree (DT) is tried here. A classification and regression tree (CART) is built to classify the cells.
• RF: Combining Decision Tree with ensemble learning gives Random Forest (RF). Here, we set the number of decision trees in the forest to 100.
• GBDT: We set the number of estimators to 100, the learning rate to 0.08, and the max depth to 9 when using Gradient Boosting Decision Tree (GBDT).
• AdaBoost: As a boosting method, AdaBoost combines weak classifiers to get a strong classifier. The weak classifier we use is Decision Tree, and the number of iterations is set to 50.
• The network of [31] and GoogleNet [32] that we use for detection are the standard networks from torchvision, with the number of output classes changed to 2. The training optimizer is SGD with a learning rate of 0.005.
• ViT-small/ViT-base/MLP-Mixer-base: The patch size of the ViT [33] and MLP-Mixer [34] models is 16×16. The depth of these three models is 12. The optimization methods are consistent with the original articles.
Table 10 shows the classification results of the above methods.
It can be seen that the MLP, a simple feedforward neural network containing two hidden layers, gets the best result among the compared traditional learning methods. However, traditional classification methods such as MLP still fall short of our proposed ResNet-based micro-crack classifier, since the deep neural network ResNet has stronger feature extraction and recognition ability.

E. EXPERIMENTAL RESULTS OF TRANSFER LEARNING
Further experiments are conducted on the effectiveness of the transfer learning method. The source domain of these experiments is the industrial micro-crack dataset, and the target domain is the defect dataset. Similar to the previous part, we divide the data into a train set and a valid set after aligning these two datasets. The optimizer in the experiments of this part is SGD with a learning rate of 0.005. The results are recorded after the models are trained for 100 epochs. Table 11 shows the results of transfer learning, including the detection performance of models obtained by fine-tuning and by the MK-MMD-based transfer learning method. Compared with the ResNet-based classification model trained directly on the defect dataset, the evaluation metrics of the model obtained using fine-tuning show some increase. Besides, since both the micro-crack dataset and the defect dataset consist of polycrystalline cell images, the parameters of the model trained on the micro-crack dataset are more suitable for defect detection. As a result, fine-tuning from the micro-crack dataset is better than fine-tuning from ImageNet, with ACC, F1 score, and AUC increasing by 7.20%, 0.1191, and 0.0838.
In addition, we conduct single-layer and multi-layer MK-MMD experiments. The results in Table 11 show that, compared with optimizing the MK-MMD distance only at the FC layer output of the source domain and the target domain, optimizing it at the outputs of multiple layers better reduces the distance between the source and target domains and thus improves model training on the target domain. Fig. 11 shows the changes of the cross-entropy losses between predictions and true labels in the first 20 epochs of the experiments mentioned in Table 11. From the loss trends over these 20 epochs, we can tell that transfer learning plays a great role in promoting the convergence of the loss. To find the most suitable hyper-parameter α in (11), we conduct a series of experiments on α based on multi-level transfer learning. As shown in Table 12, ACC, F1 score, and AUC achieve their best results when α is set to 0.25.
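The single-layer and multi-layer settings can be illustrated with a small sketch of an MK-MMD penalty. Several points here are assumptions rather than the paper's exact formulation: the kernels are Gaussian with equal weights, the bandwidth values are arbitrary, the biased estimator of the squared MMD is used, and the total loss is taken to be the cross-entropy plus α times the sum of per-layer MK-MMD terms. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equal-weight average of Gaussian kernels (assumed kernel family)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)


def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared multi-kernel MMD between two batches
    of feature vectors (lists of equal-length lists of floats)."""

    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths) for x in a for y in b)
        return total / (len(a) * len(b))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


def transfer_loss(ce_loss, layer_pairs, alpha=0.25):
    """Assumed form of the objective: cross-entropy plus alpha times the sum
    of MK-MMD terms, one per (source_features, target_features) layer pair."""
    return ce_loss + alpha * sum(mk_mmd2(s, t) for s, t in layer_pairs)


# Toy features (hypothetical): the target batch is a shifted copy of the source.
src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tgt = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
single_layer = transfer_loss(1.0, [(src, tgt)])
multi_layer = transfer_loss(1.0, [(src, tgt), (src, tgt)])
print(single_layer, multi_layer)
```

In this sketch the single-layer variant passes one pair (the FC outputs), while the multi-layer variant passes one pair per aligned layer, so each additional layer contributes its own non-negative distance term to the objective.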
As shown in Fig. 12 and Fig. 13, we further carry out visualization experiments. The heat maps in the figures are calculated from the output features of the last convolutional layer in the network. The features that ResNet extracts from the images are activated in different regions, and the redder parts of a heat map have a higher activation degree. It can be seen in Fig. 12 that pictures with large activated areas are more likely to be classified as 1, whereas pictures with fewer activated parts are classified as 0. Besides, the heat maps of class 0 pictures tend to be similar. Fig. 13 clearly shows that transfer learning improves the feature activation of the network and thereby the detection performance. With transfer learning, the activated areas in the heat maps of class 0 images become smaller and, to a certain extent, more similar to each other, while the activated areas of images that should be classified as class 1 become larger and lie closer to the defects.
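One simple way to turn last-layer convolutional features into such a heat map is sketched below. The text does not specify how the maps are computed, so the channel-wise mean followed by min-max normalization is an assumption, as are the function names and the `activated_fraction` measure used to quantify the size of the activated area.

```python
def activation_heatmap(features):
    """Collapse a C x H x W feature volume (nested lists) into an H x W map
    scaled to [0, 1] by averaging over channels and min-max normalizing."""
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    mean_map = [
        [sum(features[c][i][j] for c in range(channels)) / channels
         for j in range(width)]
        for i in range(height)
    ]
    flat = [v for row in mean_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no spatial information.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in mean_map]


def activated_fraction(heatmap, threshold=0.5):
    """Share of positions whose normalized activation reaches the threshold."""
    flat = [v for row in heatmap for v in row]
    return sum(v >= threshold for v in flat) / len(flat)


# Toy feature volume (hypothetical): 2 channels, 2 x 2 spatial size.
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 2.0], [0.0, 4.0]],
]
heatmap = activation_heatmap(features)
print(heatmap, activated_fraction(heatmap))
```

Under this sketch, an image with a large activated area has a high `activated_fraction`, which corresponds to the qualitative observation that such images tend to be classified as class 1.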

V. CONCLUSION
This paper proposes a ResNet-based micro-crack detection method for polycrystalline solar cells. The method consists of image preprocessing and a backbone network. The image preprocessing includes data augmentation, Fourier filtering, and LBP extraction. After preprocessing, the busbars of the polycrystalline cells are removed and the micro-cracked regions are enhanced. The backbone network is ResNet50, which is used for feature extraction and final classification. On the industrial micro-crack dataset, this method achieves a detection accuracy of 98.29%.
In industrial scenarios, polycrystalline solar cells exhibit other defect forms in addition to hidden cracks, and training defect detection networks from scratch often fails to achieve good results. We propose a transfer learning method based on the micro-crack detection network, which uses fine-tuning and the obtained micro-crack detection model to guide defect detection. This method can significantly improve defect detection performance.
As for future work, we plan to focus on defect detection of polycrystalline cells based on one-shot learning and online learning, which aims to ease the difficulty of acquiring an initial dataset. Besides, we observe that the performance of current deep networks can be further improved by techniques such as knowledge distillation.