Defects Inspection in Polycrystalline Solar Cells Electroluminescence Images Using Deep Learning

Defect inspection of solar cells plays an important role in ensuring the efficiency and lifespan of photovoltaic modules. However, it remains an arduous task because of the diverse attributes of electroluminescence (EL) images, such as an indiscriminative complex background with extremely unbalanced defects and many types of defects. To deal with these problems, this paper proposes a new and accurate defect inspection method for photovoltaic EL images. The proposed algorithm leverages a multi attention network to efficiently extract the most important features and neglect the nonessential features during training. Firstly, we design a channel attention to exploit contextual representations and a spatial attention to effectively suppress background noise. Secondly, we incorporate both attention networks into a modified U-net architecture, named multi attention U-net (MAU-net), to extract effective multiscale features for defect inspection. Finally, we propose a hybrid loss that combines focal loss and dice loss to solve two problems: a) overcoming the class imbalance problem, and b) allowing the network to train with irregular image labels for some complex defects. The proposed multi attention U-net is evaluated on real photovoltaic EL image datasets using 5-fold cross validation. Experimental results demonstrate that the proposed network can segment and detect various complex defects correctly. The proposed method achieves a mean intersection over union (m-IOU) of 0.699 and an F-measure of 0.799, which outperforms the previous methods.


I. INTRODUCTION
In this era of technology, solar energy provides an elegant solution to rising energy demand by enabling generation at any scale [1]. Among solar cell technologies, polycrystalline solar cells dominate over monocrystalline solar cells due to cost. During the production process, solar cells may be damaged by thermal stress or improper operations. The damage may take the form of defects such as finger interruptions, cracks or cell breakages. Among these defects, cracks can cause a severe loss in the power efficiency of solar cells because they can electrically disconnect certain areas of a cell [2]. The more severe the crack, the greater the electrical power loss of the module [3]. Therefore, timely detection of these cracks is essential to improve the endurance and reliability of solar cells [4]. Thus, this paper presents a method based on deep learning to automatically segment and detect various crack and finger interruption defects in polycrystalline solar cells.
The associate editor coordinating the review of this manuscript and approving it for publication was Min Xia.
VOLUME 8, 2020
Electroluminescence (EL) imaging is an important non-destructive technology for defect inspection of solar cells, with the ability to capture high-resolution solar cell images [5]. EL imaging highlights internal defects such as cracks that are difficult for the human eye to recognize. These cracks have different sizes, shapes and orientations and appear darker than the background. Figure.1 shows two defective solar cell images with various types of cracks and a finger interruption defect. The defects can be categorized into 1) defects submerged into the background, 2) complex star-like cracks, 3) line crack defects, 4) crack defects separated by a bus bar and 5) finger interruption defects. Compared to monocrystalline EL images, polycrystalline EL images have a heterogeneous complex background in which randomly distributed crystal grains form random patterns. These patterns are unique to every image and may have the same pixel intensity as a defect, which makes the defect detection task difficult.
As shown in Figure.1, EL imaging makes the defects prominent. However, visual inspection of polycrystalline solar cell EL images is time consuming and requires an expert's involvement. Several image processing algorithms have been proposed to eliminate the expert's involvement, but they are neither robust nor accurate. Figure.2(a) shows a defective EL image. As shown in Figure.2(b), Figure.2(c) and Figure.2(d), the crack segmentation produced by traditional image processing algorithms (such as Otsu threshold, Gaussian threshold and global fixed threshold) is inaccurate and contains a large amount of noise. Thus, we propose an automatic defect inspection method for polycrystalline solar cells which is fast, robust and accurate.
Defect inspection methods are commonly divided into statistical, structural and filter-based approaches.
In statistical approaches, the image is separated into distinct regions based on their statistical behavior. Histogram analysis is one such statistical method: it analyzes image features based on the image histogram. Tsanakas and Botsaris [6] proposed a thermographic method for hot-spot detection in defective solar modules. They used a non-destructive thermographic approach to view the photovoltaic patterns using data gained through line profiles and image histograms. Their method does not filter hot-spots in the image background. Wakaf and Jalab [7] proposed a method to detect defects in images with arbitrary gray-level pixel distributions. They used histogram matching to separate the defective foreground objects from the image background. The gray level co-occurrence matrix (GLCM) is a powerful statistical technique that provides a second-order method for producing texture features. A GLCM records the frequencies at which pairs of pixel values occur in an image when the two pixels are separated by a certain vector.
Structural analysis methods describe the textures of an image and their spatial arrangements. Qian et al. [8] used self-learning features to detect surface micro-cracks in polycrystalline solar cells. In their method, the defect information is obtained from self-learning features, and these features are then combined with super-pixel segmentation to localize defect regions. The method gives significant results, but it was developed with a very small number of defective samples. Furthermore, the method is only evaluated on simple micro-crack defects. Tsai et al. [9] used a Haar-like feature extraction technique and proposed a novel fuzzy c-means clustering technique for crack and finger interruption defects in polycrystalline solar cell EL images.
Filter-based methods are implemented in the spatial domain, the frequency domain or the joint spatial/spatial-frequency domain. Filter-based methods can carefully select the points of interest and detect irregularities. One such method, based on Fourier image reconstruction, was introduced in [10]. The method eliminates unwanted objects by setting their frequency components to zero in the frequency domain and then transforming back to a spatial image. It takes into account only straight line-like defects. Chen et al. [11] generated an enhanced saliency map using a novel steerable evidence filter and then applied morphological operations to segment solar cell EL defects. Anwar and Abdullah [12] detect and segment micro-cracks in polycrystalline solar cell images using a method based on anisotropic diffusion. The method performs well but does not consider cracks submerged into the background.
In recent years, deep convolutional neural networks (DCNN) have made convincing progress in the computer vision domain. In comparison to traditional methods, DCNNs provide better solutions for arduous problems such as image segmentation [13] and scene recognition [14]. Researchers in the field of industrial defect inspection have also adopted DCNNs to solve the problems of segmentation and classification of industrial defects. Recently, Han et al. [15] added a region proposal network (RPN) into U-net [16] and used dilated convolutions to segment polycrystalline silicon wafer defects. The method gives reasonable results, but combining an RPN with dilated convolutions makes the detection slower. U-net gives significant results in segmentation tasks with small-scale datasets. Despite having symmetric skip connections to fuse encoder and decoder features, U-net like architectures still face the degradation problem, which can be solved by adding residual connections within the encoder and decoder [17]. Another problem with U-net is that it fuses low-level features with high-level features based on fixed weights. To solve this problem, this paper proposes a multi attention network that weighs the feature maps according to their importance and suppresses irrelevant information.
Inspired by the human visual system, the attention mechanism is a popular trend in deep learning which has proven to offer significant results for image captioning [18], machine translation [19] and classification [20]. Chen et al. [21] introduced spatial and channel attention and incorporated them into a DCNN for image captioning. Channel attention highlights the most important features while spatial attention suppresses noise. The architecture outperforms previous methods for image captioning. Xu et al. [22] confirm significant improvements in a speech recognition task by adding spatial and channel attention into the CNN. Hence, we introduce a multi attention network consisting of spatial and channel attention and incorporate it into a modified U-net to solve the defect segmentation problem. The schematic diagram of the proposed network is shown in Figure.3. The proposed method effectively improves the performance and speed with a smaller number of parameters. The main contributions of this paper are as follows:
1. A multi attention network is proposed, consisting of spatial attention and channel attention, which helps to learn and weigh multiscale feature map channels. The channel attention emphasizes defect regions while the spatial attention suppresses the background noise.
2. The proposed multi attention network is added into the modified U-net to accomplish defect inspection in photovoltaic electroluminescence images. The insertion of the multi attention network allows the network to utilize contextual and spatial information effectively.
3. A hybrid loss function is presented to train the multi attention U-net. The hybrid loss combines dice loss and focal loss, which solves the class imbalance problem and allows the network to learn poorly classified complex cracks. With the proposed loss function and multi attention network, we achieve a reasonable increase in performance.
4. The multi attention U-net is trained and evaluated on a real industrial dataset. The results are compared with former methods for defect inspection of solar cells and verify the significant improvement in defect inspection obtained with the multi attention U-net.

II. METHODOLOGY
A. MULTI ATTENTION NETWORK
The attention mechanism is inspired by the human visual system. Human visual attention allows us to emphasize a certain area while defocusing the surrounding region. Most convolutional neural networks for defect inspection give equal weight to all channels, which is ineffective for feature extraction. Generally, each channel of a CNN carries different semantic information from the image. Some features may be valuable for defect localization, while others may consist of noise and introduce redundant information, which in turn leads to bad segmentation. So, to extract effective features, a CNN must be capable of highlighting the defects while suppressing the background information. In this paper, we propose a multi attention network, which consists of channel attention and spatial attention, to emphasize the defective area and suppress unwanted details. As shown in Figure.4, the input features are first refined using channel attention and spatial attention to obtain the channel attention map C_h and the spatial attention map S_h. C_h and S_h are used to reweight the input features. These reweighted features are then added together to complete the multi attention operation. The multi attention network is incorporated into U-net, which boosts the performance, as verified by the experimental results that follow. The operation of the multi attention network is given in Equation (1),
where C_h and S_h are the channel attention and the spatial attention respectively.
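Equation (1) is described only in words in this text. One plausible reading, offered as a sketch and not as the authors' exact formula, is given below. Only C_h and S_h come from the text; the symbols F (input features), F_MA (output of the multi attention network) and ⊗ (element-wise reweighting with broadcasting) are assumptions introduced here for illustration.

```latex
% Assumed form: each attention map reweights the input features, and the two results are summed.
F_{MA} = \left( C_h \otimes F \right) + \left( S_h \otimes F \right)
```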

B. CHANNEL ATTENTION
In order to extract effective features, a CNN must be capable of highlighting the defect regions. For this purpose, we propose a channel attention mechanism which enables the network to emphasize the defect regions. The squeeze and excitation (SE) block in [23] is designed for the classification task, and it plays a significant role in improving the performance of a network when placed in each block of the network. For the segmentation task, we improve the original SE block by adding a global max pooling branch alongside the global average pooling branch. The significance of global max pooling and global average pooling for recognizing distinct objects is shown in [24] and [25]. After squeezing the inputs using global max pooling and global average pooling, we apply a fully connected layer followed by a LeakyReLU layer to add non-linearity and reduce the number of channels by a certain ratio. Then a second fully connected layer followed by a sigmoid activation layer is added to provide each channel with a smooth gating function. Finally, we add the outputs of both branches, and the sum is used to re-weight the input feature maps. As shown in Figure.4, the channel attention block consists of two branches. Global max pooling (G_max) and global average pooling (G_avg) are used to squeeze the input features in the first and second branch respectively. In the branch equations, W represents the parameters of the channel attention, σ represents the sigmoid function, f_C1, f_C2, f_C3 and f_C4 represent the fully connected layers and δ represents the LeakyReLU function. The final output is obtained by weighting the input features with CA and is given in Equation (5).
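The two-branch channel attention described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' Keras implementation: the function names, the bias-free fully connected layers, and the LeakyReLU slope of 0.01 are all hypothetical choices, and the weights w1 to w4 stand in for the layers f_C1 to f_C4.

```python
import math


def leaky_relu(x, alpha=0.01):
    # alpha is an assumed negative slope; the text does not state one.
    return x if x > 0 else alpha * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vector, weights):
    # Fully connected layer without bias: one weight row per output unit.
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def gate(squeezed, w_reduce, w_expand):
    # FC (reduce channels) -> LeakyReLU -> FC (restore channels) -> sigmoid.
    hidden = [leaky_relu(h) for h in dense(squeezed, w_reduce)]
    return [sigmoid(o) for o in dense(hidden, w_expand)]


def channel_attention(features, w1, w2, w3, w4):
    # features: list of C channels, each an H x W nested list of floats.
    flat = [[v for row in channel for v in row] for channel in features]
    g_max = [max(values) for values in flat]                # global max pooling
    g_avg = [sum(values) / len(values) for values in flat]  # global average pooling
    # The outputs of the two branches are added to form the channel weights.
    weights = [a + b for a, b in zip(gate(g_max, w1, w2), gate(g_avg, w3, w4))]
    # Each channel of the input is rescaled by its weight.
    reweighted = [[[v * w for v in row] for row in channel]
                  for channel, w in zip(features, weights)]
    return weights, reweighted
```

Because the two sigmoid outputs are summed, each channel weight in this sketch lies strictly between 0 and 2.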

C. SPATIAL ATTENTION
Electroluminescence images have a heterogeneous complex background, and the background region is much larger than the defect region. So, the feature maps may contain a lot of unnecessary information, which may result in bad segmentation. In order to focus more on defects than on background, we adopt a spatial attention block which suppresses background details while preserving defect details. According to [26], a large receptive field can increase the representational power of the network, leading to an increase in accuracy. Dilated convolutions [27] or atrous convolutions [28] are used to increase the receptive field size without increasing the kernel size, but they are slow. Inspired by [29], we use separable large kernel convolutions in the spatial attention block in order to increase the receptive field and minimize memory consumption with a smaller number of parameters. As shown on the right side of Figure.4, we separate the k×k convolutional kernel into 1×k and k×1 kernels. We tried different values of k, namely k = 5, k = 9, k = 11 and k = 15. Since k = 5 gave the best results, we set k = 5 in each experiment. Each convolution operation is followed by a batch-normalization layer and a non-linear LeakyReLU layer. Furthermore, we use a sigmoid gating function to map the feature maps to [0, 1]. This output feature map is applied to the input features to generate effective features for defect segmentation,
where W represents the parameters of the spatial attention, σ represents the sigmoid function, and conv 1×k and conv k×1 represent the convolution operations; we set k = 5 in the experiments.
The final output is obtained by weighting the input features with SA and is given in Equation (9).
To test the effectiveness of the multi attention network, we incorporated it into the modified U-net and report the results in the experiments. The results show that, by adding the multi attention network into U-net, the defects are detected robustly and the complex background information is suppressed.
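The separable large-kernel idea in the spatial attention block can be illustrated with a single-channel sketch. The code below assumes one 1×k convolution followed by one k×1 convolution with zero padding, omits batch normalization, and uses a LeakyReLU slope of 0.01; these choices and all function names are hypothetical, since the text does not give the exact layer arrangement. The two helper functions at the top show why the separation saves parameters: a k×k kernel needs k² weights per channel pair, while the 1×k and k×1 pair needs 2k.

```python
import math


def full_kernel_params(k):
    return k * k          # weights in one k x k kernel


def separable_kernel_params(k):
    return 2 * k          # weights in a 1 x k kernel plus a k x 1 kernel


def leaky_relu(x, alpha=0.01):
    # alpha is an assumed slope, not stated in the text.
    return x if x > 0 else alpha * x


def conv_row(img, kernel):
    # 1 x k cross-correlation along each row, zero padded to keep the size.
    pad = len(kernel) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i, kv in enumerate(kernel):
                xx = x + i - pad
                if 0 <= xx < w:
                    out[y][x] += kv * img[y][xx]
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def conv_col(img, kernel):
    # k x 1 cross-correlation, implemented as a row pass on the transpose.
    return transpose(conv_row(transpose(img), kernel))


def spatial_attention(img, row_kernel, col_kernel):
    a = [[leaky_relu(v) for v in row] for row in conv_row(img, row_kernel)]
    b = [[leaky_relu(v) for v in row] for row in conv_col(a, col_kernel)]
    # Sigmoid gate in [0, 1], then reweight the input pixel by pixel.
    gate = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in b]
    return [[p * g for p, g in zip(prow, grow)] for prow, grow in zip(img, gate)]
```

For k = 5 the separable pair uses 10 weights instead of 25, and for k = 15 it uses 30 instead of 225, while the combined receptive field is still k×k.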

D. MULTI-ATTENTION U-NET
Based on the multi attention network for effective feature extraction, we offer a fast and robust multi attention U-net architecture. The input of the network is a grayscale image of size 1 × 512 × 512 and the output is a black and white segmentation map of size 1 × 512 × 512. Each pixel represents either a defect or background. In this paper, defects are represented as white pixels and background as black pixels. The structure of MAU-net with the definition of each layer is shown in Table.1, in which
• Conv represents the convolution layer.
• Max-pooling represents the max pooling layer.
• Up-sampling represents the up-sampling layer.
• Sigmoid represents the sigmoid activation layer.
• Concat represents the concatenation layer.
The MAU-net architecture is illustrated in Figure.5. It is composed of a 9-level encoder and decoder path. Each level on the encoder path consists of two 3 × 3 convolutions followed by a 2 × 2 max-pooling operation for down-sampling the feature maps. As the model grows deeper, the number of feature maps is doubled while the size of the feature maps is halved. A multi attention network is incorporated at each encoder-decoder connection to weigh the feature maps and achieve the attention mechanism. The multi attention network allows the network to focus on the defect features while suppressing the background information. The working principle of the multi attention network is explained in the previous sections.
On each level in the decoder path, we first apply a 3 × 3 convolution to halve the number of features, and these features are then up-sampled using a 2 × 2 up-sampling layer. We use a concatenation operation to fuse these up-sampled features with the low-level features from the same level of the encoder path after they have passed through the multi attention network. Then a set of two 3 × 3 convolutions is applied to these feature maps. It should be noted that each convolution layer is followed by a LeakyReLU layer. In the final layer, a 1 × 1 convolution with sigmoid activation is used to output the segmentation map of the original image. It should be noted that, unlike U-net, we do not use any cropping. The MAU-net simplifies the U-net architecture and increases its representation power with a smaller number of parameters. In addition, the MAU-net is superior to U-net in training speed and test speed, with improved segmentation accuracy.
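The doubling and halving pattern described above can be traced with a short shape calculation. The sketch below assumes that the 9 levels split into 5 encoder levels (including the bottleneck) and 4 decoder levels, and that the first level has 32 feature maps, as stated later in the evaluation section; the split and the function name are assumptions, and Table.1 remains the authoritative layer definition.

```python
def mau_net_shapes(input_size=512, base_channels=32, encoder_levels=5):
    """Return (channels, spatial size) per level for encoder and decoder."""
    encoder = []
    size, channels = input_size, base_channels
    for level in range(encoder_levels):
        encoder.append((channels, size))
        if level < encoder_levels - 1:
            size //= 2        # 2 x 2 max pooling halves the feature map size
            channels *= 2     # the number of feature maps is doubled
    decoder = []
    for _ in range(encoder_levels - 1):
        channels //= 2        # 3 x 3 convolution halves the number of features
        size *= 2             # 2 x 2 up-sampling doubles the feature map size
        decoder.append((channels, size))
    return encoder, decoder


encoder, decoder = mau_net_shapes()
for channels, size in encoder + decoder:
    print(f"{channels:4d} feature maps at {size} x {size}")
```

Under these assumptions the last decoder level returns to 32 feature maps at 512 × 512, which the final 1 × 1 convolution maps to the single-channel segmentation output.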

E. HYBRID LOSS
During the training of the network, it is necessary to estimate the weights so as to increase the robustness and accuracy of the network. This is achieved by using a proper loss function which is minimized during training. Segmenting objects of various sizes and shapes is a challenging task in semantic segmentation. Electroluminescence images have an imbalanced pixel distribution: the images contain about 95% background, and cracks cover only about 5%. Because the contribution of these cracks to the loss is small compared to that of the background, the network may perform poorly. In order to overcome this problem, as suggested in [30], we train our model using a hybrid loss function, which combines the dice loss L_D [31] and the binary focal loss L_BFL [32]. L_D helps the network to mitigate the class imbalance problem, while L_BFL helps the network to learn poorly classified examples in an efficient way.
The dice loss L_D helps to overcome the class imbalance problem in the binary segmentation task and is given in Equation (10), where p_j ∈ [0, 1] is the j-th output of the final layer of the model after the sigmoid activation layer and g_j ∈ [0, 1] is the j-th ground truth label. The binary focal loss is given by Equation (11),
where (1 − p_i)^γ is the modulating factor and γ ≥ 0 is the focusing parameter that makes the loss focus on problematic classes. We experimented with different values of γ, and γ = 2 produced the best results. The final hybrid loss is calculated by Equation (12),
where β is a parameter introduced to regulate the balance between dice loss and focal loss. The value of β is set to 0.5.
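A minimal sketch of a dice plus binary focal loss is given below to make the roles of γ and β concrete. It uses the commonly cited forms of the two losses and assumes the weighting β·L_D + (1 − β)·L_BFL; the exact forms of Equations (10) to (12) are not reproduced in this text, so the combination rule, the smoothing constant eps and the function names are assumptions rather than the authors' definitions.

```python
import math


def dice_loss(p, g, eps=1e-7):
    # 1 - 2 * sum(p * g) / (sum(p) + sum(g)); eps avoids division by zero.
    intersection = sum(pj * gj for pj, gj in zip(p, g))
    return 1.0 - (2.0 * intersection + eps) / (sum(p) + sum(g) + eps)


def binary_focal_loss(p, g, gamma=2.0, eps=1e-7):
    # Cross entropy scaled by the modulating factor (1 - p)^gamma for
    # defect pixels and p^gamma for background pixels, averaged over pixels.
    total = 0.0
    for pi, gi in zip(p, g):
        pi = min(max(pi, eps), 1.0 - eps)   # clip to keep log() finite
        total -= gi * (1.0 - pi) ** gamma * math.log(pi)
        total -= (1.0 - gi) * pi ** gamma * math.log(1.0 - pi)
    return total / len(p)


def hybrid_loss(p, g, beta=0.5, gamma=2.0):
    # Assumed combination: beta weights the dice term, (1 - beta) the focal term.
    return beta * dice_loss(p, g) + (1.0 - beta) * binary_focal_loss(p, g, gamma)


ground_truth = [1.0, 0.0, 1.0, 0.0]
print(hybrid_loss([0.9, 0.1, 0.9, 0.1], ground_truth))  # confident and correct
print(hybrid_loss([0.1, 0.9, 0.1, 0.9], ground_truth))  # confident and wrong
```

With γ = 2, well-classified pixels contribute very little to the focal term, so the second call yields a much larger loss than the first.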

III. EXPERIMENT RESULTS AND ANALYSIS
A. EVALUATION METRIC
After training, it is necessary to assess the performance of the proposed architecture. Many metrics are used to determine the quality of segmentation results. Following [18], predicted results are compared with ground truth images using five metrics: mean intersection over union (m-IOU), accuracy, recall, precision and F-measure. The IOU, accuracy, recall, precision and F-measure are computed by Equations (13), (14), (15), (16) and (17):
$$\mathrm{IOU} = \frac{TP}{TP + FP + FN} \tag{13}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$
$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{17}$$
where TP is the number of correctly classified defect pixels, TN is the number of correctly classified background pixels, FP is the number of background pixels incorrectly classified as defect, and FN is the number of defect pixels incorrectly classified as background. IOU measures the spatial overlap between the prediction and the ground truth. Its value lies between 0 and 1, where 0 means no overlap and 1 means full overlap. Accuracy gives the fraction of pixels that are correctly classified by the model. Recall measures the fraction of defect pixels in the ground truth image that are also detected as defect by the segmentation model, while precision measures the fraction of pixels detected as defect by the segmentation model that are also defect pixels in the ground truth. F-measure is the harmonic mean of precision and recall.
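The five metrics can be computed directly from the pixel counts. The sketch below is an illustration with hypothetical names; returning 0.0 when a denominator is zero (for example, on a defect-free image with an empty prediction) is an assumption made here, as the text does not say how that case is handled.

```python
def safe_div(numerator, denominator):
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(tp, tn, fp, fn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


# Toy counts for a 100-pixel image: 18 defect pixels, 8 of them found.
print(segmentation_metrics(tp=8, tn=80, fp=2, fn=10))
```

In this toy example accuracy is 0.88 even though IOU is only 0.4, which illustrates why accuracy alone is a weak indicator when background pixels dominate.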

B. IMPLEMENTATION DETAILS
The multi attention U-net and the other models used in the comparison experiments are implemented in Python using the Keras 2.2.4 library. All the models are trained for 100 epochs on a GTX 1080 graphics processing unit (GPU) with 12GB of memory. We use the RMSprop optimizer with a learning rate of 0.0001 to update the weights of the model. All the parameters are initialized using Xavier uniform initialization. The main CPU parameters are given in Table.2.

C. SOLAR CELL EL IMAGES DATASET
As shown in Table.3, the solar cell EL image dataset consists of 828 images in total: 406 crack defect images, 359 finger interruption defect images and 63 defect-free images. We use only defective images for training, and the algorithm is tested on both defective and defect-free images. All the images are captured in a real industrial environment. Each image has different contrast, brightness and patterns. All the images are grayscale and are resized to 512 × 512 pixels for faster training. The images are annotated by a specialist using LabelME to provide images for training and testing the segmentation network. Each image has a binary label, which means that each pixel is either defect or background: 1 or 0. We mark defects as white, with a pixel value of 1, and the background as black, with a pixel value of 0. Data augmentation is commonly used to enlarge the dataset and overcome the overfitting problem, and many augmentation techniques exist for specific problems. In this paper, we use rotation = 40, flipping, height shift = 0.05, width shift = 0.05, shear range = 0.05, horizontal flip, vertical flip and adaptive histogram equalization (AHE). In adaptive histogram equalization, a separate histogram is computed for each section of the image and used to adjust the lightness values of that section. AHE refines the local contrast of images and improves the quality and definition of edges in the whole image. Data augmentation increases the generalization ability and the accuracy of the model.
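For a segmentation dataset, geometric augmentations such as flips are normally applied identically to the EL image and to its binary label so that the annotation stays aligned with the defect. The sketch below illustrates this for the horizontal and vertical flips listed above; it is not the authors' Keras pipeline, the function names are hypothetical, and pairing the image with its mask in this way is an assumption about how the augmentation is carried out.

```python
def flip_horizontal(img):
    # Mirror every row left to right.
    return [row[::-1] for row in img]


def flip_vertical(img):
    # Reverse the order of the rows (each row is copied).
    return [row[:] for row in img[::-1]]


def augment_pair(image, mask, horizontal=False, vertical=False):
    # Apply the same geometric transform to the image and to its label mask.
    if horizontal:
        image, mask = flip_horizontal(image), flip_horizontal(mask)
    if vertical:
        image, mask = flip_vertical(image), flip_vertical(mask)
    return image, mask


image = [[10, 20], [30, 40]]   # toy 2 x 2 grayscale image
mask = [[0, 1], [0, 0]]        # the defect pixel sits at the top right
aug_image, aug_mask = augment_pair(image, mask, horizontal=True, vertical=True)
print(aug_image, aug_mask)
```

After both flips the defect pixel moves to the bottom left in the image and in the mask alike, so the pair remains a valid training sample.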

D. SEGMENTATION RESULTS
Segmentation of cracks in photovoltaic EL images is an arduous task due to the heterogeneous background and the various crack shapes. A fast, robust and efficient method for defect detection in polycrystalline electroluminescence images is proposed in this paper. The proposed method leverages the multi attention network and the hybrid loss. After training and evaluating the network, we obtained promising segmentation results on the EL finger interruption and crack datasets. Thus, we offer an automatic defect inspection algorithm that can segment various defects with high accuracy. It should be noted that the multi attention U-net offers improved results with only 8.1M parameters.
In this paper, we compare the multi attention U-net with Tsai's method [9], the SEF method [11] and Han's U-net [15]. We evaluate the segmentation performance of the proposed method and the state-of-the-art methods using k-fold cross validation, with k = 5 for each method. K-fold cross validation is commonly used to determine the performance of deep learning models on small datasets. We randomly divide the finger interruption and crack defect datasets of 745 images into 5 folds, so each group contains 149 images. It should be noted that we do not use defect-free images during testing at this stage. On each fold, four groups are used for training and the fifth group is used for testing. For each fold we calculate the mean intersection over union, precision, recall, accuracy and F-measure of the proposed method and the state-of-the-art methods; the results are listed in Table.4. The last row for each method gives the average over the 5 folds.
From Table.4, it can be seen that Tsai's method has higher recall than the SEF method, while SEF outperforms Tsai's method in terms of the other metrics. Furthermore, in terms of recall the proposed method has results almost similar to those of Han's method. Overall, the results show that the proposed method gives the best segmentation results in each metric compared to the other methods. The comparison of the F-measure over the 5-fold cross validation is illustrated in Figure.6. Furthermore, the segmentation results of the proposed method and the state-of-the-art methods on crack defects and finger interruption defects are illustrated in Figure.7. The first and second columns show the input images and ground truth images, while the third to sixth columns show the corresponding segmentation results.
Tsai's method uses only defect-free images for training. Haar-like features are used to extract the information of background patterns and fuzzy c-means is applied for clustering. The detection results vary with the number of clusters c and the control constant t. In this paper we use c = 30 and t = 0.02. The segmentation results are shown in Figure.7(c). It can be seen that the method easily locates the crack and finger interruption defects, but the segmentation is inaccurate and a number of background objects similar to defects are also detected as defects. The method shows poor performance when the defects are submerged into background objects.
The SEF method uses a novel steerable evidence filter to create a contrast-enhanced saliency map of defects. Then a local threshold and a minimum spanning tree are applied for defect segmentation. We set all the parameters to the same values as in the original SEF paper. The segmentation results are shown in Figure.7(d). The SEF method segments both finger interruption and crack defects better than Tsai's method. However, some defects are not segmented accurately, and background objects near defects are also treated as defects by this method.
Han's method [15] uses a region proposal network to generate image patches which might contain defect information. These patches are then fed into a modified U-net for defect segmentation. The modification is that the first convolution layers of the 4th level and of the bottleneck are replaced by dilated convolutions. In this paper we use dilation rates of 2 and 4 in the 4th level and the bottleneck respectively. The results are illustrated in Figure.7(e). This method segments all finger interruption and crack defects and produces better segmentation than the former methods. However, some background objects similar to cracks are also detected as defects.
Context information is very important when separating defects from the background [33]. The proposed architecture considers the contextual information of defects and can robustly differentiate them from the heterogeneous background patterns. The insertion of the multi attention block between the encoder and decoder paths highlights the defect region, enabling the network to focus more on defects and improving the m-IOU, precision and recall of defect segmentation. The results of the proposed method are given in Figure.7(f). It is clear that all the defects are segmented accurately and the background crystal grain patterns are successfully neglected. Furthermore, the multi attention U-net is also robust to occlusions. For example, in regions where the crack area mixes with the background, the proposed method robustly detects and segments all the defects.
To test the robustness of the MAU-net against background information, we conducted experiments on defect-free EL images. We use 63 defect-free images for this comparison. The segmentation results of Tsai's method, the SEF method, Han's U-net and MAU-net on defect-free EL images are shown in Figure.8. Figure.8(a)-(b) shows the input images and ground truth images respectively, while Figure.8(c)-(f) shows the corresponding results of the proposed method and the state-of-the-art methods. It can be seen that the SEF method shows poor performance on defect-free images, while Tsai's method segments background objects which are similar to cracks and finger interruption defects. Han's U-net has better segmentation performance, but some crystal grain patterns are still detected as defects. In contrast, MAU-net does not segment any background object, which makes it superior to the other methods.
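The 5-fold protocol used in this section can be sketched with the standard library alone. The function name, the fixed random seed and the handling of leftover images are assumptions made for illustration; the text only states that 745 images are divided at random into 5 groups of 149, with four groups used for training and one for testing on each fold.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # random division into groups
    fold_size = n_items // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    folds[-1].extend(indices[k * fold_size:])  # leftovers, if any, join the last fold
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_splits(745), start=1):
    print(f"fold {fold_number}: {len(train)} training images, {len(test)} test images")
```

With 745 images each fold tests on 149 images and trains on the remaining 596, and every image is used for testing exactly once.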

E. EVALUATION OF MULTI-ATTENTION NETWORK AND HYBRID LOSS
We modified the original U-net so that the number of feature maps starts from 32, whereas in the original U-net the first level starts from 64 feature maps. From experiments, we found that starting the first level with 32 feature maps gives the same result as starting it with 64 feature maps. The effect of the multi attention network, the channel attention and the spatial attention on the modified U-net is shown in Table.5. It can be seen that introducing the channel attention network into the modified U-net increases the precision and m-IOU of the model, enabling the model to focus on defect information, while the spatial attention successfully suppresses the background noise and gives a better recall rate. Furthermore, both the channel attention network and the spatial attention network alone increase the performance, while the multi attention network gives a further increase in segmentation performance. The final m-IOU of the modified U-net with the multi attention network is 0.699, which is significantly better than that of the former methods.
Furthermore, we ran experiments to test the effectiveness of the hybrid loss on the multi attention U-net. Table.6 shows the effect of dice loss, focal loss and hybrid loss on the performance of the proposed method. The first, second and third rows show the results of MAU-net with dice loss, MAU-net with focal loss and MAU-net with hybrid loss respectively. It can be seen that the multi attention U-net trained with the hybrid loss gives better m-IOU, F-measure and accuracy.

F. DETECTION RESULTS
In order to further evaluate the detection performance of the multi attention U-net, we use the scheme that any white object larger than 3×3 pixels is considered a defect. Following this scheme, we calculated the detection results of Tsai's method, the SEF method, Han's U-net and MAU-net on 78 crack defect images, 67 finger interruption defect images and 63 defect-free images. All the results are listed in Table.7. It is worth noting that all the methods gave good defect detection results; however, the multi attention U-net detects all the defective images correctly. Furthermore, Tsai's method and the SEF method show poor detection performance on defect-free images, while Han's U-net detects 12 defect-free images as defective.
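The image-level detection rule can be sketched as a connected-component check on the binary segmentation map. The code below interprets "larger than 3×3 pixels" as a connected white region with an area of more than 9 pixels and uses 4-connectivity; both choices, as well as the function name, are assumptions, since the text does not specify how object size or connectivity is measured.

```python
from collections import deque


def has_defect(mask, min_area=9):
    """Return True if any 4-connected white region covers more than min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one white object, counting its pixels.
            area = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area > min_area:
                return True
    return False
```

Under this rule, isolated specks of up to 9 white pixels in the predicted map do not cause an image to be flagged as defective.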
We also compared Tsai's method, the SEF method, Han's U-net and MAU-net in terms of test time speed. The test time speed of MAU-net and the state-of-the-art methods on 512 × 512 images is shown in Table.8. It can be seen that Tsai's method is faster than the SEF method and Han's U-net, while the MAU-net is faster than all the other methods and processes one image in only 75 milliseconds. Thus, the MAU-net outperforms all state-of-the-art methods in terms of test time speed, segmentation and detection performance, which enables the network to be used in real-time industrial applications.

IV. CONCLUSION
In this paper, we present a fast, robust and efficient method for defect inspection in photovoltaic electroluminescence images. For evaluation, we used crack and finger interruption defect datasets. The proposed method has two key features: the multi attention network and the hybrid loss. The multi attention network helps the network to focus on defects while suppressing the complex heterogeneous background information, thus enabling the network to segment complex defects robustly and with higher accuracy. The hybrid loss helps to overcome the class imbalance problem and allows the network to learn poorly classified defects. The dataset for training and testing is collected from a real industrial environment. The proposed network is evaluated on defective and defect-free photovoltaic EL images. Overall results show that the proposed method is superior to the former methods and can contribute to the establishment of more efficient and robust methods for defect inspection in polycrystalline solar cells. Further results show that the proposed method is fast and can be used in real industrial applications.