Review: Application of Convolutional Neural Network in Defect Detection of 3C Products

Driven by the rapid development of semiconductors, integrated circuits, and the Internet, 3C products such as computers, tablets, mobile phones, and smart TVs have become an indispensable part of people's lives. As the 3C product market grows, the demand for high-quality display panels and related detection technologies is increasing. As the iconic network of deep learning, the convolutional neural network (CNN) has been extensively studied in image recognition and defect detection. Building on the development of CNN, this article summarizes CNN-based defect detection methods of different depths for 3C products. First, we review the origin of CNN and its structural components. We then introduce upgrades and improvements to its key components. Finally, we introduce and compare applications of CNNs with different depths in defect detection. By comparing and summarizing defect detection results, we analyze the opportunities and challenges of different CNN frameworks and present strategies for different application scenarios.

With the continuous growth of 3C product output and rising quality requirements, product quality control has become a challenging task. In the surface inspection of 3C products, display screens and circuit boards are particularly difficult to inspect. Traditional surface defect detection for 3C products is mainly manual: quality inspectors examine the product surface by eye [3]. However, manual inspection cannot meet modern industry's requirements for high-speed, high-precision detection because it is highly subjective, uncertain, and inefficient. With the development of the Industrial Internet of Things and artificial intelligence, the fourth industrial revolution, Industry 4.0, is advancing rapidly [4,5]. Building on the advanced digitization of factories, the combination of Internet technologies with future-oriented technologies in the field of smart objects has created a fundamental paradigm shift in production [6]. With the advent of Industry 4.0, production processes have been equipped with intelligent cyber-physical systems that can generate large amounts of streaming sensor data [7]. With the introduction of "Industry 4.0", intelligent manufacturing, and "Made in China 2025", industrial product defect detection has also received increasing attention [8]. Machine vision has become increasingly widely used on production lines because of its real-time capability, high efficiency, and accuracy.
For machine vision, however, detecting surface defects on reflective displays is challenging. When the camera is placed vertically above a transparent part, the system may mistake a reflected light spot for a defect. Illumination of transparent parts is another important issue. Strong light causes glare, whereas weak light can leave defects undetected. Even with a sufficient light source, the illuminance on the glass surface usually fluctuates [9]. To ensure production quality, inspection must also be added at many stages of PCB production [10]. Common surface defects of mobile phone screens, such as scratches, debris, and dirt, should be identified and removed in real time during production. Some defects are extremely small (about 0.05 mm), which complicates detection [11].

B. RELATED WORKS
With the development of machine vision technology, researchers have used image processing to detect screen defects in 3C products. Liu et al. [12] proposed an algorithm that combines the line intercept as the threshold with particle swarm optimization, which addressed the low detection accuracy for mobile phone screen defects. Gao et al. [13] improved the median filter algorithm to preprocess defective images. They also proposed an image difference algorithm based on fast image matching for detection. Zhang et al. [14] proposed an improved differential image detection method for mobile phone screen defects to raise detection efficiency and accuracy. However, pure image processing techniques cannot achieve satisfactory detection results [15].
Convolutional neural networks (CNNs), a research hotspot in image detection and pattern recognition, have received increasing attention from scholars [16]. Ma et al. [17] designed a CNN based on GoogLeNet that greatly reduced the number of parameters without affecting prediction accuracy. Experiments showed that the defect detection rate of the designed CNN could reach 99.5%. Wei et al. [18] proposed a multivariable CNN-based defect detection method for the production of cover glass, touch screens, and mobile phone displays under parallel light sources. Experiments verified that the method achieved higher accuracy, better stability, and faster speed. Although the CNN architecture, with its weight sharing and pooling operations, is relatively complex, CNNs are easy to train and the learned features are translation invariant [9].
The main aim of this research is to review surface defect detection methods based on CNNs of different depths for 3C products such as glass display screens, PCBs, and TFT-LCD screens, and then to summarize and discuss these methods and their problems. First, we outline the types of defects in 3C products and compare different detection methods in the context of Industry 4.0. Subsequently, we compare improved CNN structures for various scenarios and show the technical limitations and detection performance of different methods in practical applications. In this way, the research summarizes and compares noteworthy recent CNN-based work on overcoming the challenges of surface defect detection in 3C products.
This literature review mainly focuses on the following topics: 1) What are the defect types of parts in 3C products? 2) What are the main defect detection methods used on production lines? 3) What are the advantages of machine vision over manual inspection? 4) What are the advantages of applying CNNs to 3C product defect detection? 5) How have CNN frameworks developed in depth, and how are they applied to 3C product defect detection? 6) What is the future development trend of CNNs in the 3C industry?

C. METHODOLOGY
In this study, published literature was selected from databases such as Web of Science, Scopus, CNKI, Google Scholar, and Engineering Index, as well as from publisher databases such as Elsevier, IEEE Xplore, and Springer. This literature covers problems related to traditional machine vision defect detection and deep learning detection methods for 3C products. Academic databases such as Web of Science and IEEE Xplore contain extensive literature covering a wide range of research. China is a major producer of 3C products, and many Chinese researchers study 3C product inspection. CNKI indexes many excellent Chinese journal articles and dissertations, so we also chose a small number of classic Chinese journal papers. Glass screens, TFT-LCDs, and PCBs are the main parts of 3C products and directly affect product quality and use. The literature search therefore used three groups of keywords. The first group relates to traditional machine vision (such as "machine vision", "visual algorithm", and "image processing"). The second relates to deep learning (such as "deep learning", "CNN", and "GAN"). The third describes the categories and application areas of 3C products (e.g., "mobile phone screen", "TFT-LCD", "PCB", "Defect detection", and "Defect classification").
The current literature review covers journal articles, conference papers, dissertations, and a small number of classic Chinese journal papers. The publication years are 2010-2021, because the past decade has been a period of rapid development in deep learning. During the review, we summarized and analyzed the classic literature on related topics, because a detailed literature review helps scholars conduct research in this field. However, due to space limitations, we shortened the description of literature less related to the research content. This paper therefore reviews most of the representative literature, analyzes highly influential works in detail, and only summarizes less influential ones.
The references for this paper were selected based on the following criteria: 1) research published in peer-reviewed academic journals; 2) literature published after 2010; 3) research that uses machine vision and deep learning methods to study 3C product defect detection; 4) research on methods for improving 3C product defect detection; 5) research conducted through experiments, analysis, evaluation, and modification.

A. OBJECTS
The display is an important interactive component of smart mobile devices, and its display quality directly affects how the device operates. The quality of the glass panels and screens of 3C products is therefore very important. Printed circuit boards provide the electrical connections for all electronic components and offer the advantages of small size and high circuit reliability. This paper therefore focuses on three product parts as the main objects of research: glass display screens, TFT-LCDs, and PCBs. The surface defects of 3C products are shown in Figure 2. Figs. 2(a1)-(a6) show common defects of glass display screens. During glass screen production, defects such as cracks, floating objects, and point defects appear [11]. These defects affect both the appearance of the product and its normal use, so manufacturers attach great importance to inspecting the surface quality of smart devices. Figs. 2(b1)-(b6) show common PCB defects. Problems in the production environment or process often cause open circuits, unwanted connections [19], wrong holes, spurious copper [20], and other defects in PCB production. Figs. 2(c1)-(c6) show that TFT-LCDs often exhibit defects such as color differences, uneven rings, and uneven gravity during production [21].

B. METHODS COMPARISON
Traditional surface defect inspection methods for 3C products fall into two categories: manual inspection and machine inspection. Table Ⅰ compares three common defect detection methods. Unlike traditional machine vision algorithms, a CNN can train its feature extractor and classifier end-to-end directly from the input images, which overcomes the shortcomings of traditional methods [25]. For example, CNNs extract image features more accurately for training and are more robust than traditional machine vision algorithms. However, CNN-based defect detection for 3C products still faces several important challenges.
Traditional machine vision suffers from poor robustness of feature extraction, high time complexity, and redundant detection windows [26]. Detection methods based on deep learning can effectively alleviate these problems. Moreover, neural networks are data-driven, so detection performance can be improved by enlarging the dataset [27]. However, deep networks require many samples, and it is difficult to collect so many defective samples on actual industrial production lines.

III. CNN IMPROVEMENT RESEARCH
In 1998, LeCun [28] proposed a gradient-based CNN structure (the modern LeNet-5 structure) and applied it successfully to handwritten digit recognition. Compared with traditional multi-layer neural networks, CNN adds three basic concepts: local receptive fields, shared weights, and pooling layers. Compared with a fully connected network, the extensive use of shared weights reduces the number of free parameters without losing expressive power, which allows CNNs to be trained with simple gradient descent [29]. LeNet-5 therefore laid the foundation for applying CNNs to image recognition.
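The effect of weight sharing can be made concrete by counting parameters. The short Python sketch below compares a LeNet-5-style first layer (a 32×32 grayscale input mapped to six 28×28 feature maps with 5×5 kernels, sizes assumed for illustration) with a fully connected layer that produces the same number of outputs. The helper names are hypothetical.

```python
def conv_params(in_channels: int, out_channels: int, k: int) -> int:
    """Parameters of a k x k convolution: one shared kernel per (input, output) channel pair plus one bias per output map."""
    return out_channels * (in_channels * k * k + 1)


def dense_params(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer: one weight per input-output pair plus one bias per output unit."""
    return in_features * out_features + out_features


# Assumed LeNet-5-style first layer: 32x32x1 input -> 6 feature maps of 28x28 (5x5 kernels, no padding).
shared = conv_params(in_channels=1, out_channels=6, k=5)
unshared = dense_params(in_features=32 * 32, out_features=6 * 28 * 28)

print(shared)    # 156
print(unshared)  # 4821600
```

The roughly 30,000-fold gap comes from two properties. Each output connects only to a local 5×5 patch, and the same kernel is reused at every position.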
AlexNet, proposed by Krizhevsky et al. [30], won the ImageNet image classification competition in 2012. Since then, CNN has become the core algorithm for image classification and has opened a new chapter in deep learning. This section describes improvements to the key components of CNN, namely the convolutional layer, the pooling layer, and the activation function. It then reviews improvements to the overall network architecture.

A. CONVOLUTIONAL LAYER
Convolutional layers are among the most important parts of a CNN. Their main function is to extract features from the input image. A convolutional layer consists of multiple filters, and each filter computes a different feature map. As the first layer of a CNN, the convolutional layer is its core, and most of the computation is performed there. Convolutional networks are usually built by alternately stacking convolutional and pooling layers, followed by fully connected layers that complete the model. In a convolution, each linear filter is multiplied element-wise with the corresponding patch of the input feature map and the products are summed. A nonlinear activation is then applied to obtain the output feature map. With ReLU as the activation, the traditional convolution can be expressed by Equation (1):

$$f_{i,j,k} = \max\left(w_k^{\top} x_{i,j},\ 0\right) \qquad (1)$$

Here (i, j) is the pixel index in the feature map, x_{i,j} is the input patch centered at (i, j), w_k is the filter weight vector, and k is the channel index of the feature map.

Later, researchers studied the convolutional layer in more depth. Wei et al. [31] found that dilated convolution, which enlarges the receptive field of the kernel, can effectively integrate the surrounding context and offers a promising solution. Increasing the dilation rate of a 3×3 kernel from 1 to 3 enhances the discriminative ability of the convolution kernel. They used class activation mapping [32] to generate localization maps at different dilation rates and showed that dilated convolution improves the recognition of low-response target regions. Lin et al. proposed the network in network (NIN) model [33]. NIN replaces the traditional convolutional layer with a multilayer perceptron layer that consists of several fully connected layers with nonlinear activation functions, which makes it more versatile. For the multilayer perceptron layer, the feature maps are computed as follows:

$$f^{1}_{i,j,k_1} = \max\left((w^{1}_{k_1})^{\top} x_{i,j} + b_{k_1},\ 0\right)$$
$$\vdots$$
$$f^{n}_{i,j,k_n} = \max\left((w^{n}_{k_n})^{\top} f^{n-1}_{i,j} + b_{k_n},\ 0\right) \qquad (2)$$

Here n is the number of layers in the multilayer perceptron, the b terms are biases, and the activation function is ReLU.
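The following minimal pure-Python sketch shows the convolution-plus-ReLU computation described for Equation (1). It includes an optional dilation rate to show how dilated convolution enlarges what a 3×3 kernel sees. The code assumes a single channel, stride 1, and no padding. Like most CNN libraries, it actually computes cross-correlation. The function name and the test image are hypothetical.

```python
def conv2d_relu(image, kernel, dilation=1):
    """Single-channel 'valid' convolution (cross-correlation), stride 1, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    # A kernel with dilation d spans k + (k - 1) * (d - 1) input pixels along each axis.
    span_h = kh + (kh - 1) * (dilation - 1)
    span_w = kw + (kw - 1) * (dilation - 1)
    out_h = len(image) - span_h + 1
    out_w = len(image[0]) - span_w + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Kernel taps are spaced `dilation` pixels apart in the input.
                    s += kernel[u][v] * image[i + u * dilation][j + v * dilation]
            row.append(max(s, 0.0))  # ReLU
        out.append(row)
    return out


image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]  # 7x7 intensity ramp
kernel = [[0.0, 0.0, 0.0],
          [-1.0, 0.0, 1.0],
          [0.0, 0.0, 0.0]]  # horizontal gradient filter

dense = conv2d_relu(image, kernel, dilation=1)
dilated = conv2d_relu(image, kernel, dilation=3)
print(len(dense), len(dense[0]))   # 5 5
print(dilated)                     # [[6.0]]
print(dense[0][0])                 # 2.0
```

The dilated kernel keeps 9 weights but covers a 7×7 neighborhood. Its response therefore compares pixels 6 columns apart instead of 2.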
A common way to extract high-level features in a CNN is to make the convolution stack deeper, but the network then becomes larger. The Inception module of GoogLeNet borrows the multilayer perceptron layer of NIN. It reduces parameters and extracts high-dimensional features while maintaining model quality [34]. The idea of the Inception module is to cluster sparse matrices into denser sub-matrices to improve computational performance.
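The parameter savings from NIN-style 1×1 convolutions can be checked with simple arithmetic. The sketch below uses hypothetical sizes: 192 input channels, 32 output channels, a 5×5 kernel, and a 1×1 reduction to 16 channels. It ignores biases.

```python
def conv_weight_count(in_ch: int, out_ch: int, k: int) -> int:
    """Weights of a k x k convolution (biases ignored)."""
    return k * k * in_ch * out_ch


# Hypothetical Inception-style branch: 192 input channels -> 32 output channels with 5x5 kernels.
direct = conv_weight_count(192, 32, 5)
# Same branch with a 1x1 convolution that first reduces the 192 channels to 16.
with_bottleneck = conv_weight_count(192, 16, 1) + conv_weight_count(16, 32, 5)

print(direct, with_bottleneck)  # 153600 15872
```

In this example the cheap 1×1 reduction cuts the weight count almost tenfold, and the output shape stays the same.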
When a deep neural network is trained, the distribution of each layer's inputs changes during training, because the parameters of the preceding layers change. This phenomenon is called internal covariate shift (ICS) [35]. Inception V2 uses batch normalization (BN) to alleviate ICS and speed up the training of deep neural networks. With BN, higher learning rates can be used without the risk of divergence, and BN also has a slight regularizing effect. In addition, BN keeps the inputs of sigmoid activations away from their saturated regions, so the gradient is much less likely to vanish.
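The following sketch shows the training-mode BN transform for a single feature over a mini-batch. The function name and the defaults for gamma, beta, and eps are assumptions. At inference time, frameworks replace the batch mean and variance with running averages, which this sketch omits.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization of one feature (channel) over a mini-batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased variance over the batch
    # Normalize to zero mean and unit variance, then apply the learnable scale (gamma) and shift (beta).
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]
out = batch_norm(batch)
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```

Because gamma and beta are learned, the network can undo the normalization when that helps. The layer therefore does not reduce what the model can represent.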
Inception V3 [36] factorized the 7×7 convolutions in GoogLeNet into two stacked layers of 1×7 and 7×1 convolutions. Likewise, 3×3 convolutions were factorized into 1×3 and 3×1 convolutions. This not only speeds up computation but also increases the nonlinearity of the network and reduces the probability of overfitting. Inception V4 [37] added ResNet's residual module to the original Inception modules. There, the residual connections mainly accelerate training rather than improving accuracy through greater depth.
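The savings from asymmetric factorization follow from counting weights. The sketch below assumes a hypothetical layer with the same number of channels C before and after each convolution and ignores biases. The factorized pair still covers the full 7×7 (or 3×3) receptive field.

```python
def kernel_weight_count(kh: int, kw: int, in_ch: int, out_ch: int) -> int:
    """Weights of a kh x kw convolution (biases ignored)."""
    return kh * kw * in_ch * out_ch


C = 64  # hypothetical channel count, kept equal before and after each layer

full_7x7 = kernel_weight_count(7, 7, C, C)
factored_7x7 = kernel_weight_count(1, 7, C, C) + kernel_weight_count(7, 1, C, C)  # 1x7 then 7x1
full_3x3 = kernel_weight_count(3, 3, C, C)
factored_3x3 = kernel_weight_count(1, 3, C, C) + kernel_weight_count(3, 1, C, C)  # 1x3 then 3x1

print(factored_7x7 / full_7x7)  # 14/49, about 0.286
print(factored_3x3 / full_3x3)  # 6/9, about 0.667
```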

B. POOLING LAYER
Pooling layers are generally placed after convolutional layers. They simplify the information output by the convolutional layer and reduce the dimensionality of the feature maps [38]. There are two classic pooling operations: average pooling and max pooling. As shown in Fig. 3, max pooling outputs the maximum value in each data block and thus extracts the maximum response of the feature plane. Average pooling outputs the arithmetic mean of the elements in each block and thus extracts the local average response of the feature plane. Max pooling tends to retain texture features, while average pooling tends to retain overall data features. Beyond these common operations, researchers have proposed improved pooling layers to boost network performance. Krizhevsky et al. [30] proposed overlapping pooling in AlexNet. Compared with traditional non-overlapping pooling, overlapping pooling not only improves prediction accuracy but also reduces overfitting to some extent. He et al. [39] proposed spatial pyramid pooling (SPP), which converts the convolutional features of images of any scale into a fixed dimension. SPP therefore lets a CNN process images of any scale and avoids cropping or warping them. Girshick et al. [40] proposed ROI pooling in Fast R-CNN. It is widely used in CNN-based object detection and greatly improves processing speed. SPP applies several pooling operations of different sizes to the same input and concatenates the multi-scale results as the output. ROI pooling can be regarded as a single-scale SPP that performs only one pooling operation per input.
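The sketch below illustrates the pooling operations discussed above on small 2D lists. It implements max and average pooling, with windows that overlap whenever the stride is smaller than the window size. It also implements a simple SPP that always returns a fixed-length vector. The pyramid levels (1, 2, 4) and all function names are assumptions for illustration.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size windows; windows overlap when stride < size."""
    if mode == "max":
        reduce = max
    else:
        def reduce(window):
            return sum(window) / len(window)
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def spp(fmap, levels=(1, 2, 4)):
    """Spatial pyramid pooling: max-pool an n x n grid of bins for each level n and concatenate."""
    h, w = len(fmap), len(fmap[0])
    features = []
    for n in levels:
        for bi in range(n):
            r0, r1 = (bi * h) // n, ((bi + 1) * h + n - 1) // n  # floor start, ceil end
            for bj in range(n):
                c0, c1 = (bj * w) // n, ((bj + 1) * w + n - 1) // n
                features.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return features


fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 8, 1],
        [0, 4, 3, 9]]
print(pool2d(fmap, mode="max"))  # [[6, 4], [7, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 2.25], [3.25, 5.25]]

larger = [[(r * 5 + c) % 11 for c in range(9)] for r in range(6)]
print(len(spp(fmap)), len(spp(larger)))  # 21 21: same length for different input sizes
```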

C. ACTIVATION FUNCTION
The activation function is a vital part of a neural network because it improves the nonlinear expressive ability of the model. Activation functions are divided into linear and nonlinear ones. Different types, such as sigmoid, tanh, ReLU, leaky ReLU (lReLU), and parametric ReLU (pReLU), suit different situations. The sigmoid function is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

and its range is (0, 1). When the goal of the network is to predict a probability, sigmoid can be applied in the output layer. The tanh activation function is the hyperbolic tangent. Like sigmoid, tanh squashes a real value. Unlike sigmoid, its output range is (-1, 1) and is zero-centered. The tanh function is defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Both tanh and sigmoid suffer from vanishing gradients. The rectified linear unit (ReLU) solves this problem well. It generally outperforms these two functions and is the most widely used activation function today.
The ReLU function is defined as

$$f(x) = \max(0, x)$$

A function is nonlinear if its first derivative is not constant. The derivative of ReLU is

$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$$

The derivative is undefined at x = 0, where it is conventionally set to 0. The derivative is clearly not constant, so ReLU is nonlinear. ReLU sets the outputs of some neurons to zero, which makes the network sparse and reduces interdependence among parameters. This helps reduce overfitting.
However, the gradient of ReLU is 0 when a neuron is inactive, so units that are inactive at initialization may never activate. To address this, Maas et al. [41] proposed leaky ReLU in 2013. Leaky ReLU scales the negative part by a small slope instead of mapping it to zero, so inactive neurons still receive a small non-zero gradient:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

Here α is a small fixed constant (e.g., 0.01). In 2015, He et al. [42] proposed parametric ReLU (PReLU), which learns the slope adaptively:

$$f(x_i) = \begin{cases} x_i, & x_i > 0 \\ a_i x_i, & x_i \le 0 \end{cases}$$

Here a_i is a learnable parameter for channel i. In 2017, Google researchers proposed the swish activation function [43], inspired by the use of the sigmoid function for gating in LSTMs and highway networks. Like ReLU, swish is bounded below. Unlike ReLU, it is smooth and non-monotonic. It is defined as

$$f(x) = x \cdot \sigma(\beta x)$$

Here β is a constant or trainable parameter.
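For reference, the activation functions above can be written in a few lines of Python. This is a sketch only. The leaky ReLU default alpha = 0.01 and the swish default beta = 1.0 are assumptions, and in PReLU the slope `a` would be learned during training rather than passed in.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable form of 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Derivative of max(0, x); the undefined value at x = 0 is set to 0 by convention.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Fixed small slope for negative inputs.
    return x if x > 0 else alpha * x


def prelu(x: float, a: float) -> float:
    # Same shape as leaky ReLU, but `a` is a learnable parameter in a real network.
    return x if x > 0 else a * x


def swish(x: float, beta: float = 1.0) -> float:
    return x * sigmoid(beta * x)


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(math.tanh(x), 4), relu(x), round(swish(x), 4))
```

Swish is non-monotonic. For example, swish(-1.0) is about -0.269 while swish(-3.0) is about -0.142, although -1 > -3.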

D. IMPROVEMENT OF CNN ARCHITECTURE
Researchers have begun to study the depth of CNNs, and network architectures have become deeper and deeper. With more convolutional layers, a CNN can more easily detect complex objects or patterns [44,45]. Greater depth adds nonlinearity, so the network can better approximate the target function and achieve better results [46]. This section reviews improvements to CNN network architectures. In 1998, LeCun [28] proposed the LeNet-5 architecture and applied it to handwritten digit recognition, laying the foundation for CNNs in image recognition. LeNet-5 consists of two convolutional layers, two pooling layers, and two fully connected layers. The convolutional layers use 5×5 filters: 6 in the first layer and 16 in the second. After each convolutional layer, the feature maps are activated with the sigmoid function and then mean-pooled. CNNs have risen rapidly since 2012, and the development of CNN architectures is shown in Table Ⅱ. In 2012, Krizhevsky et al. [30] proposed AlexNet, which uses ReLU instead of sigmoid as the activation function. This choice successfully alleviated the vanishing gradient problem in deep networks. As shown in Fig. 4, AlexNet consists of 5 convolutional layers and 3 fully connected layers, and the first two fully connected layers each have 4096 neurons. The third, fourth, and fifth convolutional layers are connected directly to one another, with no pooling or normalization layers between them. The third convolutional layer has 384 kernels of size 3×3×256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3×3×192, and the fifth has 256 kernels of size 3×3×192.
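The shrinking feature-map sizes in LeNet-5 follow from the standard output-size formula, floor((n + 2p - k) / s) + 1. The sketch below traces the front end. It assumes the 32×32 input of the original LeNet-5 paper, 5×5 convolutions without padding, and 2×2 mean pooling with stride 2.

```python
def output_size(n: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * pad - k) // stride + 1


# Assumed LeNet-5 front end: 32x32 input, 5x5 convolutions, 2x2 mean pooling with stride 2.
layers = [
    ("C1: 5x5 conv, 6 maps", 5, 1),
    ("S2: 2x2 mean pooling", 2, 2),
    ("C3: 5x5 conv, 16 maps", 5, 1),
    ("S4: 2x2 mean pooling", 2, 2),
]

size = 32
trace = [size]
for name, k, stride in layers:
    size = output_size(size, k, stride)
    trace.append(size)
    print(f"{name:<24} -> {size}x{size}")

print(trace)  # [32, 28, 14, 10, 5]
```

After S4, the sixteen 5×5 maps (400 values) feed the fully connected part of the network.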
To avoid overfitting, AlexNet uses dropout to randomly ignore some neurons during training. VGGNet (Visual Geometry Group Net) [45], proposed in 2014, is another classic CNN. Apart from its additional convolutional layers, VGGNet is similar to AlexNet. VGG-16 consists of 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. Simonyan et al. [45] used very small convolution filters and increased the depth to 16-19 weight layers, which significantly improved on prior state-of-the-art configurations. This showed that, for classification tasks, deeper CNNs built from small convolution kernels achieve higher accuracy. The structure of VGGNet is very simple: the entire network uses 3×3 convolution kernels and 2×2 pooling. Despite its simpler structure, VGGNet outperforms AlexNet.
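A short calculation shows why small kernels still pay off in VGGNet. Stacked 3×3 layers reach the receptive field of one larger kernel with fewer weights and more nonlinearities. The sketch assumes a hypothetical channel width C for every layer, stride 1, and no biases.

```python
def stacked_receptive_field(num_layers: int, k: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (k - 1)


def stacked_weights(num_layers: int, channels: int, k: int = 3) -> int:
    """Weights of num_layers k x k convolutions, each with `channels` inputs and outputs (biases ignored)."""
    return num_layers * k * k * channels * channels


C = 256  # hypothetical channel width
for layers, big_k in ((2, 5), (3, 7)):
    rf = stacked_receptive_field(layers)
    small = stacked_weights(layers, C)
    big = stacked_weights(1, C, k=big_k)
    print(f"{layers} x 3x3: receptive field {rf}x{rf}, {small} weights vs one {big_k}x{big_k}: {big}")
```

Three stacked 3×3 layers need 27C² weights, compared with 49C² for a single 7×7 layer, and they insert two extra ReLUs.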
In 2015, the Google team applied the idea of small networks within a network to design GoogLeNet, which uses 1×1 (pointwise) convolutions to make its convolution operations more computationally efficient. GoogLeNet has 12 times fewer parameters than the 2012 winning network, yet it is more accurate. GoogLeNet adopts the NIN approach to improve network performance. This approach can be viewed as adding 1×1 convolutional layers, each followed by a ReLU layer. The main purpose of these 1×1 convolutions is dimensionality reduction. It removes computational bottlenecks that would otherwise limit the size of the network. In this way, both the depth and the width of the network can be increased without a significant performance penalty.
In 2015, ResNet won first place in the ImageNet classification task. Deep convolutional networks have opened up more possibilities in image classification [55]. Deep networks naturally integrate features and classifiers in an end-to-end multi-layer manner, and stacking more layers enriches the features. However, more stacked layers are not always better. Researchers have found that stacking too many layers causes exploding gradients [56], which hamper convergence from the very beginning of training. Deeper networks are harder to optimize with stochastic gradient descent, so the parameters cannot be updated effectively and training results worsen [57]. He et al. [49] proposed ResNet, a deep residual network composed of many residual structures. The shortcut connections in these structures perform identity mapping, so gradients can flow directly back to the earlier layers of a deep CNN. The deep residual network has thus improved both representation and learning ability.
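The core of a residual block is y = F(x) + x. Because dy/dx = dF/dx + I, the identity term gives the gradient a direct path back to earlier layers. If F learns to output zeros, the block reduces to an identity mapping. The sketch below is a simplification. It uses a hypothetical one-layer branch on plain vectors and omits the ReLU that the original ResNet applies after the addition.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """y = F(x) + x: the shortcut adds the unchanged input to the residual branch output."""
    return [fx + xi for fx, xi in zip(residual_fn(x), x)]


def linear_relu(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical residual branch F: one fully connected layer followed by ReLU."""
    def branch(x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return branch


x = [1.0, -2.0, 3.0]
zero_branch = linear_relu([[0.0, 0.0, 0.0] for _ in range(3)])
print(residual_block(x, zero_branch))  # [1.0, -2.0, 3.0]: identity when F outputs zeros

small_branch = linear_relu([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])
print(residual_block(x, small_branch))  # input plus a small learned correction
```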
DenseNet [52] is another deeper convolutional neural network. As shown in Figure 5, GoogLeNet is composed of Inception modules and ResNet of residual blocks, and in the same way DenseNet is composed of dense blocks. Each layer receives additional inputs from all preceding layers and passes its own feature maps to all subsequent layers. Through concatenation, each layer receives the "collective knowledge" of the preceding layers. DenseNet has the following advantages: (1) it has fewer parameters than ResNet, so it is computationally cheaper.
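The dense connectivity pattern can be sketched by tracking how many channels each layer receives. The numbers used below are illustrative: a 64-channel block input, a growth rate of 32 new maps per layer, and six layers. The second function runs a toy dense block on plain lists, with a hypothetical `toy_layer` that has a growth rate of 2.

```python
def dense_block_widths(k0: int, growth_rate: int, num_layers: int):
    """Input width of each layer in a dense block and the width of the concatenated block output."""
    # Layer number idx (1-based) sees the block input plus growth_rate maps from each earlier layer.
    inputs = [k0 + growth_rate * (idx - 1) for idx in range(1, num_layers + 1)]
    output = k0 + growth_rate * num_layers
    return inputs, output


def dense_block(x, layers):
    """Apply layers so that each one receives the concatenation of the block input and all earlier outputs."""
    features = [x]
    for layer in layers:
        concatenated = [c for f in features for c in f]
        features.append(layer(concatenated))
    return [c for f in features for c in f]


def toy_layer(channels):
    """Hypothetical layer with growth rate 2: emits two new 'channels' summarizing its input."""
    s = sum(channels)
    return [s, s]


inputs, output = dense_block_widths(k0=64, growth_rate=32, num_layers=6)
print(inputs, output)  # [64, 96, 128, 160, 192, 224] 256
print(dense_block([1.0, 2.0], [toy_layer, toy_layer]))  # [1.0, 2.0, 3.0, 3.0, 9.0, 9.0]
```

Each layer adds only a few new maps. This is how DenseNet keeps its parameter count low even though every layer sees all earlier features.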

E. SUMMARY
Compared with traditional learning techniques, CNNs are more scalable, because higher accuracy can be obtained by adding network layers and enlarging the training dataset. The characteristics of natural images are very complex, so deep network models with many parameters are required. At the same time, researchers have worked to reduce the number of network parameters, for example by removing the last fully connected layer and replacing it with global average pooling. Commonly used CNN models such as VGGNet, GoogLeNet, ResNet, and DenseNet were designed for the ImageNet dataset, and increasing the number of convolutional layers raises their performance.
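Global average pooling (GAP) collapses each feature map into a single number. The classifier head then no longer needs weights for every spatial position. In NIN, the last layer produces one map per class, so GAP needs no weights at all. Many later networks keep one small fully connected layer after GAP, and the parameter comparison below assumes that variant. It uses hypothetical sizes: 7×7×512 final feature maps and 1000 classes.

```python
def global_average_pool(feature_maps):
    """Collapse each H x W feature map to its mean, giving one value per channel."""
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]


maps = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0
    [[0.0, 0.0], [2.0, 2.0]],  # channel 1
]
print(global_average_pool(maps))  # [4.0, 1.0]

# Hypothetical classifier-head sizes: 7x7x512 final feature maps, 1000 classes (biases ignored).
fc_head_weights = 7 * 7 * 512 * 1000  # flatten, then one fully connected layer
gap_head_weights = 512 * 1000         # global average pooling, then one fully connected layer
print(fc_head_weights // gap_head_weights)  # 49
```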

IV. APPLICATION OF CNN
As a deep neural network model, CNN performs well in the ImageNet challenge [29]. The basic CNN architecture has been established since LeNet-5 appeared [57]. Fig. 6 shows the defect detection process in 3C production. The first two steps extract defect features. In the last step, this feature information is fed into the classic shallow CNN, whose architecture has a data input layer, two convolutional layers, two pooling layers, and a fully connected layer. Limited by the software and hardware of its time, LeNet-5 did not perform well in large-scale classification tasks. Traditional CNNs are also widely used in industrial product inspection. However, their sensitivity is relatively low, so they have difficulty detecting small defects in 3C products.

A. APPLICATION IN MOBILE PHONE GLASS PLATE
In the quality inspection of 3C products, defect inspection is one of the most important steps in touch screen manufacturing [58]. Şaban et al. [59] proposed BiasFeed CNN, a fast and effective method for detecting and segmenting glass surface defects, and compared it experimentally with a traditional CNN. The method copes with the transparency and reflectivity of glass surfaces. They replaced the single-value bias input of the traditional CNN with a bias template, which handles light fluctuations in the lighting system. Compared with the traditional CNN, both specificity and accuracy improved. Zhao et al. [60] proposed ScratchNet, an automatic scratch detection method that combines the LeNet-5 structure with the convolutional layers of VGG. They connected two main modules in series to optimize the CNN for small-target defect detection. When trained on the same data, LeNet-5 achieved an accuracy of 95.97% and ScratchNet 96.35%. Jin et al. [61] proposed a multi-channel autoencoder convolutional network (AECNN). It targets false detections in glass inspection caused by small differences in feature space. A network usually needs more convolution kernels to capture information well, which also lengthens training. They therefore introduced an unsupervised convolutional autoencoder into the CNN to reduce training time. To prevent overfitting in defect recognition, they also replaced the final classifier with a fuzzy support vector machine. The detection accuracy of this model increased from 92.6% to 97%.

As shown in Table Ⅲ, these methods improve detection accuracy for mobile phone glass and show good real-time performance in experiments. They still have limitations. The numbers of test samples are small, so reliability needs to be improved. One method is not end-to-end. The recognition rate for bubbles and tumors remains low.
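The difference between a single-value bias and a bias template can be illustrated with a toy example. The sketch below reflects one reading of the description above and is not the method of [59]. It adds either one scalar or a per-position template to a convolution output. The template is hand-made to cancel a known left-to-right brightness ramp, whereas in [59] it would be obtained by the method itself, which this text does not specify. All names and values are hypothetical.

```python
def add_bias(feature_map, bias):
    """Add a bias to a 2D conv output: a scalar (conventional CNN) or a same-shaped template."""
    if isinstance(bias, (int, float)):
        return [[v + bias for v in row] for row in feature_map]
    return [[v + b for v, b in zip(row, brow)] for row, brow in zip(feature_map, bias)]


# Hypothetical conv responses of a defect-free glass patch under uneven lighting:
# brightness rises from left to right.
response = [[10.0, 12.0, 14.0],
            [10.0, 12.0, 14.0]]

scalar = add_bias(response, -12.0)  # one value everywhere: the lighting ramp survives
template = add_bias(response, [[-10.0, -12.0, -14.0],
                               [-10.0, -12.0, -14.0]])  # position-specific offsets
print(scalar)    # [[-2.0, 0.0, 2.0], [-2.0, 0.0, 2.0]]
print(template)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

With a scalar bias, the lighting gradient remains in the output and could be mistaken for a defect. A template can, in principle, remove such position-dependent offsets.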

B. APPLICATION IN PCB
As quality requirements for 3C products rise, surface defect detection for printed circuit boards (PCBs) has become an important issue. Several PCB defects often occur together, so this detection is a multi-label classification problem. To address it, Zhang et al. [20] proposed a multi-task CNN model with three blocks, each of which includes a convolutional layer, an activation layer, and a max pooling layer. A fully connected layer then classifies six defect types. The model was trained on 1,200 samples grouped by defect type and reached an accuracy of 92.86% in experiments. The models proposed by Zhang et al. and Ma et al. [62] were compared across multiple categories, and the results are shown in Fig. 7.
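One common way to build a multi-label output layer uses an independent sigmoid per class, trained with binary cross-entropy. Several defects on the same board can then be flagged at once. The original text does not state whether Zhang et al. [20] used exactly this output layer. The sketch below assumes it, together with hypothetical logits for six defect classes and a 0.5 threshold.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Independent sigmoid per defect class, so several defects can be flagged on one board."""
    return [sigmoid(z) >= threshold for z in logits]


def binary_cross_entropy(logits, targets):
    """Mean per-class binary cross-entropy, a common loss for multi-label outputs."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(logits)


# Hypothetical network outputs for six defect classes on one PCB image.
logits = [2.0, -1.0, 0.5, -3.0, -0.5, 4.0]
targets = [1, 0, 1, 0, 0, 1]
print(predict_labels(logits))  # [True, False, True, False, False, True]
print(round(binary_cross_entropy(logits, targets), 4))
```

A softmax output would instead force the six scores to compete for a single label, which does not fit boards that carry several defects.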
Among these, Ma et al. [62] used a defect detection method based on machine vision, and the overall accuracy of Zhang's model is much higher than that of Ma's. Adibhatla et al. [63] trained a CNN on a large number of images to classify PCBs as defective or intact. The network has 60 million parameters and 500,000 neurons. It consists of 5 convolutional layers followed by max pooling layers and, finally, two fully connected layers. After training, the overall accuracy of PCB defect classification reached 85%. Wang et al. [64] proposed a precise CNN-based PCB defect recognition algorithm. The model performs a difference operation against a reference image to locate defect areas and normalizes the PCB defect images in batches. It uses ReLU as the activation function and max pooling for down-sampling. Finally, a softmax regression classifier is used to train and optimize the CNN. Experiments showed that recognition accuracy improved significantly, reaching 96.67% for 10 types of PCB defects.
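The reference-difference step used to locate candidate defect areas can be sketched as follows. The sketch assumes that the test and reference images are already aligned and have the same size, and it uses a hypothetical fixed threshold. Real systems would first register the images and then pass the cropped region to the classifier.

```python
def difference_mask(test_img, ref_img, threshold):
    """Mark pixels whose absolute difference from the defect-free reference exceeds the threshold."""
    return [[abs(t - r) > threshold for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(test_img, ref_img)]


def bounding_box(mask):
    """(top, left, bottom, right) of all flagged pixels, or None if nothing was flagged."""
    coords = [(i, j) for i, row in enumerate(mask) for j, flagged in enumerate(row) if flagged]
    if not coords:
        return None
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical aligned 8-bit images: a uniform reference and a test board with a dark 2-pixel spot.
reference = [[100] * 5 for _ in range(4)]
test = [row[:] for row in reference]
test[1][2] = 30
test[2][2] = 35

mask = difference_mask(test, reference, threshold=40)
print(bounding_box(mask))  # (1, 2, 2, 2)
```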

Figure 7. Comparison of defect detection results of PCB models
As shown in Table Ⅳ, the above methods achieve multi-task classification that addresses the diversity of PCB defects. Compared with machine vision methods, the CNN methods are more accurate, with accuracy rates of 92%-97%. However, their detection time is long, so real-time performance needs to be improved, the detection datasets need to be enlarged, and accuracy still needs further improvement.

C. APPLICATION IN TFT-LCD
Because traditional image processing algorithms are prone to missed detections and misjudgments when recognizing circuit defects, He et al. [65] investigated the use of CNN for electronic circuit defect recognition in industrial inspection. First, the input image was preprocessed by histogram equalization. Second, image features were extracted with an 8-layer CNN. Finally, a softmax classifier was used to recognize and classify the image features. Experiments verified that the algorithm has high accuracy, robustness and generalization ability and meets the needs of industrial testing. The authors compared the method with Faster R-CNN, R-CNN and the Deformable Part Model (DPM), as shown in Fig. 8; the accuracy of the Depth-2 model is higher than that of the other models.
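He et al. preprocess the input with histogram equalization before the CNN. A minimal pure-Python sketch of global histogram equalization for 8-bit grayscale images follows, using the standard cumulative-distribution mapping; the function name and the 256-level assumption are ours, and the authors' exact preprocessing may differ.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a grayscale image (2-D list of ints in [0, levels))."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF) of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest gray level present
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    # Map each level so the output CDF is roughly uniform over [0, levels - 1].
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]

if __name__ == "__main__":
    print(equalize_histogram([[50, 50], [100, 200]]))  # [[0, 0], [128, 255]]
```

The low-contrast input range 50-200 is stretched to the full 0-255 range, which makes faint circuit defects easier for the subsequent convolutional layers to pick up.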

D. SUMMARY
To sum up, compared with traditional machine vision methods, CNN-based defect detection models improve accuracy, reaching 92%-98%. They can perform multi-task classification with improved overall accuracy, but the detection results for some defect types still need improvement. At present, the experimental test objects are relatively simple and the training datasets are limited, which affects model accuracy to a certain extent; datasets closer to actual production-line products should be added. Considerable progress is still needed in detecting small defects, and the real-time performance of detection needs to be improved. Detection models based on shallow CNNs cannot fully meet the requirements of defect detection, and researchers continue to explore new techniques and methods to overcome these shortcomings.

Ⅴ. APPLICATION OF DEEP CONVOLUTIONAL NEURAL NETWORKS
Looking back at the development of deep convolutional neural networks in image recognition, it is clear that the expressive power and feature extraction capability of deep neural networks (DNN) grow with network depth, so DNNs outperform shallow neural networks [30]. Since AlexNet, proposed by Krizhevsky et al., won the ImageNet competition, researchers have focused on exploring deep convolutional networks, which have also been widely used in the detection of 3C products. As the number of convolutional layers increases, the network's ability to extract semantic features is significantly enhanced. However, adding layers also brings the vanishing gradient problem. Researchers subsequently found that residual networks and dense networks can mitigate vanishing gradients, and with the emergence of ResNet and DenseNet, more in-depth research in defect detection began.
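Why depth causes vanishing gradients, and why a residual shortcut helps, can be made concrete with a short scalar derivation. This is the standard argument from the ResNet literature rather than a result of this review; $E$ denotes the loss, $x_l$ the output of layer $l$, and $L$ the depth.

```latex
\begin{aligned}
\text{plain: } x_{l+1} = f_l(x_l) \;&\Rightarrow\;
  \frac{\partial E}{\partial x_0} = \frac{\partial E}{\partial x_L}\prod_{l=0}^{L-1} f_l'(x_l),
  \qquad \Bigl|\frac{\partial E}{\partial x_0}\Bigr| \le \gamma^{L}\Bigl|\frac{\partial E}{\partial x_L}\Bigr|
  \ \text{ if } |f_l'| \le \gamma < 1,\\
\text{residual: } x_{l+1} = x_l + F_l(x_l) \;&\Rightarrow\;
  x_L = x_l + \sum_{i=l}^{L-1} F_i(x_i),
  \qquad \frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}
  \Bigl(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F_i(x_i)\Bigr).
\end{aligned}
```

With $\gamma = 0.5$ and $L = 20$, the plain-network bound is $0.5^{20} \approx 9.5 \times 10^{-7}$. In the residual case, the constant 1 carries the gradient from layer $L$ back to layer $l$ directly, so it does not vanish merely because the residual terms are small.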

A. APPLICATIONS ON THE PHONE SCREEN
For detecting defects in the cover glass of smart devices, Park et al. [25] proposed 4-DarkNet, a multi-channel defect detection structure based on model stacking, an ensemble method in machine learning that combines several classification or regression models. They used DarkNet-19, with 19 convolutional layers and 5 pooling layers, as the defect classifier, and applied weighted averaging over the stacked models so that the independent classifiers in the structure perform at their best. Although the structure is relatively large, it is fast and precise and is suitable for defect detection on the production line. To handle multiple defect types on mobile phone screens, Li et al. [22] proposed a detection model based on a region of interest (ROI) algorithm, which adds a multi-layer perceptron (MLP) to ROI-based deep learning. Tested on 400 samples with different deep learning models such as VGG-16, ResNet and GoogLeNet, it reached an accuracy of more than 97%. Chen et al. [23] designed a CNN-based method for the difficult appearance inspection of smartphone protective screens during production. The method first divided each sample image into 256×256 pixel sub-images and then trained and fine-tuned a 22-layer GoogLeNet. To reduce the space and time consumption of the model, the hidden layer with many parameters was removed after repeated parameter adjustments, and the 1000-dimensional softmax output was replaced with a three-dimensional output for three classes. After five fine-tuning experiments, the model achieved good detection results. Ma et al. [17] proposed a mobile phone surface defect detection method in which an industrial line scan camera captures the original surface image of the phone. Through the proposed preprocessing steps, the image is automatically divided into sub-images of a specified size, and the trained CNN is combined with a sliding window technique. Experiments showed that the defect detection rate can reach 99.5%.
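Chen et al. split images into 256×256 sub-images, and Ma et al. combine a trained CNN with a sliding window. The sketch below shows window placement in pure Python, assuming that the last window is shifted back so the image border is still covered (one common convention; the cited papers may handle borders differently). `classify` is a hypothetical stand-in for the trained CNN.

```python
def window_starts(length, win, stride):
    """Start offsets of windows of size `win` along one axis; the last one is aligned to the border."""
    if win > length:
        raise ValueError("window larger than image")
    starts = list(range(0, length - win + 1, stride))
    if starts[-1] != length - win:
        starts.append(length - win)  # cover the remaining border pixels
    return starts

def sliding_windows(height, width, win, stride):
    """All (top, left) window positions over a height x width image."""
    return [(y, x)
            for y in window_starts(height, win, stride)
            for x in window_starts(width, win, stride)]

def crop(image, top, left, win):
    """Extract a win x win patch from a 2-D list."""
    return [row[left:left + win] for row in image[top:top + win]]

def scan(image, win, stride, classify):
    """Positions of windows that the (hypothetical) classifier flags as defective."""
    height, width = len(image), len(image[0])
    return [(y, x) for y, x in sliding_windows(height, width, win, stride)
            if classify(crop(image, y, x, win))]

if __name__ == "__main__":
    image = [[0] * 8 for _ in range(8)]
    image[5][6] = 255  # a single bright "defect" pixel

    def is_bright(patch):  # toy stand-in for a trained CNN classifier (hypothetical)
        return max(max(row) for row in patch) > 200

    print(window_starts(600, 256, 256))  # [0, 256, 344]
    print(scan(image, win=4, stride=4, classify=is_bright))  # [(4, 4)]
```

A stride smaller than the window gives overlapping patches, which reduces the chance that a defect straddling a tile boundary is missed, at the cost of more CNN evaluations per image.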
Among glass defects, dents are among the most difficult to detect because of their small depth changes and smooth edges. In machine-vision-based detection systems, dent images suffer from uneven gray levels and low contrast. To address these problems, Wang et al. [66] proposed a dent defect detection method based on a deep convolutional neural network. By improving the DenseNet-121 [67] model, they designed a compact model that meets real-time production requirements. As the number of network layers increases, vanishing gradients can cause shallow features to be discarded, which hampers the detection of small dent defects; the feature fusion strategy of DenseNet handles this problem well. Wang et al. chose DenseNet as the backbone, which overcomes the data dependence of the DCNN model and significantly improves recognition accuracy. Experimental results showed that the method further improves recognition accuracy on the dent detection task, reaching 85.42% on 70 test images. In future work, the method could be combined with a generative adversarial network (GAN) to further improve robustness. To detect and identify touch screen glass defects quickly and effectively, Zhang et al. [26] proposed a detection and recognition method based on Mask R-CNN. The method used Mask R-CNN as the base model, obtained sample data with a multi-picture stacking method, and then processed, labeled and augmented the samples. Because the dataset was relatively small, they tested three backbone networks (VGG16Net, ResNet50+FPN and ResNet101+FPN) for feature extraction. The comparison showed that the ResNet50+FPN backbone performed best, with high accuracy and good robustness compared with traditional methods.
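The DenseNet fusion strategy that Wang et al. build on concatenates each layer's output with all earlier feature maps instead of overwriting them, so shallow features remain available to deeper layers. With $k_0$ input channels and growth rate $k$, a block of $L$ layers outputs $k_0 + Lk$ channels. In this plain-Python sketch, channels are modeled abstractly as list entries and the toy layer is hypothetical; the channel counts in the usage line (64 input channels, growth rate 32, six layers) match the first dense block of DenseNet-121.

```python
def dense_block(inputs, layers):
    """Dense connectivity: each layer receives the concatenation of the block input and all
    previous layer outputs; its own output is appended, never overwriting earlier maps."""
    features = list(inputs)
    for layer in layers:
        features = features + layer(features)  # channel-wise concatenation
    return features

def dense_block_channels(k0, growth_rate, num_layers):
    """Channel count of the concatenated feature map after each layer: k0 + l * k."""
    return [k0 + l * growth_rate for l in range(num_layers + 1)]

if __name__ == "__main__":
    # Toy layer (hypothetical): emits 2 new "channels" summarizing everything it receives.
    def toy_layer(feats):
        return [sum(feats) / len(feats), max(feats)]

    out = dense_block([1.0, 2.0, 3.0, 4.0], [toy_layer] * 3)
    print(len(out))  # 10, i.e. 4 + 3 * 2
    print(dense_block_channels(64, 32, 6)[-1])  # 256
```

Because the block input survives unchanged at the front of the output list, a small dent that is only visible in early, high-resolution features can still reach the classifier.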
Table Ⅴ
Model | Defect types | Samples | Accuracy | Advantages | Disadvantages
… | Scratch, dent, chips | 4220 | 99% | Effective use of multiple channels to measure data | Accuracy has improved, but real-time performance needs to be improved
ROI+MLP (16 layers) [22] | Scratches | 4600 | 97% | Optimizes the detection method for small objects | Real-time performance needs to be improved
… | … | … | 98% | The model is more lightweight and the training speed is faster | There is a difference between the detection target and the product defect

In summary, as shown in Table Ⅴ, these DCNN-based detection models are more accurate than shallow CNN models, with accuracy ranging from 96% to 99%. They offer high real-time performance and an improved ability to handle datasets, which is also an important factor affecting accuracy.

B. APPLICATION IN PCB
Zhang et al. [68] proposed an improved defect detection method for bare PCBs that uses VGG-16 as the backbone for feature extraction and learns deep discriminative features, which reduces the dependence of deep learning methods on large datasets. The method first generated artificial defect samples to expand the dataset, then used a deep pre-trained CNN to learn defect features, and finally used a sliding window to further locate defects. This algorithm helps establish a robust model for multi-class recognition tasks. Cheong et al. [69] introduced a CNN-based automatic PCB component recognition system that also localizes defects of PCB components. A simple CNN-based component recognition classifier was developed, and pre-trained models were used for transfer learning. Pre-trained models such as VGG-16, Inception V3 and DenseNet-169 were compared to determine which is best for component recognition. With transfer learning on VGG-16, the best result was 99% accuracy, and the overall accuracy of the system reached 96.54%. Volkau [70] proposed a variant of transfer learning that combines unsupervised learning with a VGG16 model pre-trained on ImageNet weights. The goal is to extract significant semantic features from normal samples without supervision. To demonstrate defect detection, a set of PCBs with different defects was used: scratches, missing gaskets, extra holes, fraying, and damaged PCB edges. The trained model clusters the internal representations of normal PCB features in a high-dimensional feature space and locates defective blocks in the PCB image by their distance from the normal cluster. Preliminary results showed that more than 90% of defects can be detected.
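The last method flags PCB blocks by their distance from the cluster of normal-sample features. The sketch below assumes that features (for example VGG16 activations) are already extracted as plain vectors, that the normal cluster is summarized by its centroid, and that a block is defective when its Euclidean distance exceeds mean + 3σ of the normal samples' distances. These are our assumptions; the cited work's clustering and threshold may differ.

```python
import math
import statistics

def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit_normal_cluster(normal_features, k=3.0):
    """Center of the normal cluster and a distance threshold (mean + k * std of normal distances)."""
    center = centroid(normal_features)
    dists = [math.dist(f, center) for f in normal_features]
    return center, statistics.mean(dists) + k * statistics.pstdev(dists)

def flag_defective(block_features, center, threshold):
    """Indices of blocks whose features lie farther than `threshold` from the normal cluster."""
    return [i for i, f in enumerate(block_features) if math.dist(f, center) > threshold]

if __name__ == "__main__":
    normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]  # toy 2-D "features"
    center, threshold = fit_normal_cluster(normal)
    print(flag_defective([[1.0, 1.0], [5.0, 5.0]], center, threshold))  # [1]
```

Only normal samples are needed to fit the cluster, which is what lets this approach sidestep the shortage of labeled defect images.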
As shown in Fig. 9, Xia et al. [71] combined the structural similarity index (SSIM) with MobileNet-V3 to propose a new PCB defect detector, SSIM-NET. Compared with YOLO-V3, Faster R-CNN and the tiny defect detection network (TDD-NET), SSIM-NET is more accurate and faster. The method has two stages: first, SSIM is used to detect suspicious areas; second, the lightweight MobileNet-V3 backbone classifies these areas. In testing, the model reached an accuracy of 97.06% at 60 fps, achieving real-time, high-precision detection on the test set. Ding et al. [72] proposed TDD-NET, in which online hard example mining is used throughout training to improve the quality of region-of-interest proposals and thus make more effective use of the data. To reduce redundancy, non-maximum suppression is applied to the proposed regions according to their classification scores. TDD-NET also integrates a multi-scale feature fusion strategy to obtain strong structural features and enhance the detection of minor defects, achieving an average accuracy of 98.90%. To sum up, as shown in Table Ⅵ, these models not only improve accuracy but also reduce the need for large datasets, because their feature extraction ability has been improved.
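SSIM-NET's first stage compares a test region with a defect-free template using the structural similarity index, $\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$ with $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$ and dynamic range $L = 255$. The sketch computes it once per whole patch; standard SSIM instead averages the same expression over local (often Gaussian-weighted) windows, and the 0.9 suspicious-region threshold is a hypothetical choice.

```python
def ssim(x, y, dynamic_range=255.0):
    """Structural similarity of two equally sized patches given as flat lists of intensities."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def suspicious_regions(test_patches, template_patches, threshold=0.9):
    """Stage 1 of a two-stage detector: indices of patches too dissimilar from the template.
    A stage-2 classifier (MobileNet-V3 in SSIM-NET) would then label only these regions."""
    return [i for i, (t, r) in enumerate(zip(test_patches, template_patches))
            if ssim(t, r) < threshold]

if __name__ == "__main__":
    good = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    bad = good[:]
    bad[4] = 255  # a bright blemish in the middle of the patch
    print(suspicious_regions([good, bad], [good, good]))  # [1]
```

Because the cheap SSIM test discards most identical regions, the CNN runs on only a few candidates, which is consistent with the real-time speed reported for SSIM-NET.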

C. APPLICATION IN TFT-LCD
To detect defects in TFT-LCD circuits, He et al. [73] proposed an improved Faster R-CNN algorithm. They tested the detection performance of the model with different convolution kernel sizes and network depths, and a 16-layer network structure achieved good detection results, further improving the accuracy and practicality of neural networks for automatic detection. Kim et al. [74] used VGGNet to detect TFT-LCD defects and set the stride of the first convolutional layer of VGGNet to two. In addition, instead of a 2×2 max pooling layer at the last pooling stage, they used a global average pooling layer, which averages the elements of each channel. With these slight adjustments, the model needed fewer parameters and less training time. As shown in Table Ⅶ, these models have improved data processing capability and real-time performance and can be adapted to defect detection on production lines.
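Replacing the final pooling/flatten stage with global average pooling shrinks the first fully connected layer, because its input no longer scales with the spatial size of the feature map. The sketch assumes the standard VGG16 shapes (a 7×7×512 final feature map feeding a 4096-unit layer) rather than Kim et al.'s exact configuration, which the text does not give.

```python
def global_average_pool(feature_maps):
    """Average each channel (a 2-D list) to a single number: C x H x W -> C."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]

def fc_weight_count(in_features, out_features):
    """Number of weights (biases excluded) in a fully connected layer."""
    return in_features * out_features

if __name__ == "__main__":
    h, w, c, hidden = 7, 7, 512, 4096  # assumed standard VGG16 shapes
    flatten = fc_weight_count(h * w * c, hidden)  # flattened 7x7x512 input
    gap = fc_weight_count(c, hidden)              # one value per channel after GAP
    print(flatten, gap, flatten // gap)  # 102760448 2097152 49
```

Under these assumptions the first fully connected layer shrinks by a factor of $7 \times 7 = 49$, which is where most of the parameter and training-time savings come from.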

D. SUMMARY
To sum up, the deep convolutional network models above are widely used in industrial manufacturing and image classification and achieve good performance. At present, the CNN architectures most commonly used for defect detection are GoogLeNet and VGGNet. Studies have shown that the nonlinearity of a network increases with depth, allowing it to approximate the target function more closely and obtain better feature representations. However, as depth increases, the structure becomes complicated and cumbersome and real-time performance suffers, so simplifying the structure also deserves attention. Researchers have built models on the residual structure of residual networks and shown experimentally that residual networks are also feasible for defect detection in the 3C industry [49]. Compared with a plain network, a residual network introduces one or more skip connections that let information from the previous residual block flow into the next block unimpeded. This improves information flow and avoids the vanishing gradient and degradation problems caused by excessive network depth.
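The skip connection described above computes $y = F(x) + x$. A minimal sketch on plain Python lists: `residual_fn` and `layer_fn` are hypothetical placeholders for a block's stacked layers, and the ReLU that the original ResNet applies after the addition is omitted for clarity. The usage compares a branch that has learned nothing (all-zero output) in a plain and a residual block.

```python
def residual_block(x, residual_fn):
    """y = F(x) + x: the identity shortcut adds the input back to the branch output."""
    fx = residual_fn(x)
    return [a + b for a, b in zip(fx, x)]

def plain_block(x, layer_fn):
    """Plain stacking for comparison: y = F(x), with no shortcut."""
    return layer_fn(x)

if __name__ == "__main__":
    def zero(v):  # a branch whose weights are (near) zero
        return [0.0 for _ in v]

    x = [1.0, 2.0]
    print(plain_block(x, zero))     # [0.0, 0.0]: the plain layer discards the input
    print(residual_block(x, zero))  # [1.0, 2.0]: the shortcut passes the input through
```

A residual block therefore only has to learn a correction to the identity, so adding blocks need not degrade what earlier blocks have already extracted.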

Ⅵ. THE APPLICATION OF GAN IN 3C INDUSTRY
The idea of the generative adversarial network (GAN) comes from the two-player zero-sum game in game theory, which is equivalent to a minimax game between the two players [75]. Deep learning models are driven by big data, and their training performance depends on, and generally improves with, the sample size [76]. However, it is not easy to obtain a large number of defective samples from an industrial production line [77]. As shown in Fig. 10, researchers can instead generate random defect samples with GAN. Li et al. [78] introduced a CNN variant called the deep convolutional generative adversarial network (DCGAN) and, by training it on various image datasets, showed that it is a strong candidate for unsupervised learning. DCGAN is an extension of GAN that uses a convolutional network as the discriminator and a deconvolutional (transposed convolution) network as the generator. It can automatically extract and fuse defect features and expand the set of defect samples. Therefore, researchers began to use GAN to generate datasets and increase the number of samples.
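In the standard GAN formulation, the two-player minimax game is the value function below: the discriminator $D$ maximizes it and the generator $G$ minimizes it, where $p_{\text{data}}$ is the distribution of real (defect) images and $p_z$ is the noise prior fed to $G$. DCGAN keeps this objective and changes only the architectures of $G$ and $D$.

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{x\sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z\sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

At the optimum, $G$ produces samples that $D$ cannot tell apart from real defect images, which is what makes the generated samples usable as extra training data.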
Building on the method of Yuan [79], Lv et al. [80] designed a display glass defect detection model suitable for small-sample learning. Their DCGAN introduces a residual module to strengthen the feature extraction network. The system automatically extracts and merges defect features from the sample images and generates additional defect samples. An improved Fast R-CNN detection model is then trained on the expanded defect dataset. To evaluate the results, the over-detection rate (ODR), missed detection rate (MDR) and accuracy of the original model and of the model with DCGAN added were compared, as shown in Fig. 11. The improved detection model obtained better detection results, resolving the conflict between the small number of defective samples available in industry and the large number of samples that deep learning requires. The experimental results proved the effectiveness and feasibility of combining DCGAN and Fast R-CNN for display defect detection. Lu [57] proposed a TFT-LCD surface defect detection model for mobile phone displays based on small-sample learning. To address the shortage of negative samples on actual automated production lines, Lu used the collected small sample set and a DCGAN model to generate new negative samples. Using DCGAN to generate a large amount of new data makes up for the lack of training data and makes the distribution of the training data more reasonable. The generated samples are fed into a model trained by transfer learning, and a second round of intensive training is performed to obtain better image defect features.
In summary, unsupervised learning can extract patterns and structures from raw data without additional information. Introducing GAN into defect detection addresses the lack of real defect samples and has brought a major breakthrough in detection accuracy, which can reach 99.23%. However, with GAN introduced, the real-time performance of the models is not ideal and needs to be improved.
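The ODR, MDR and accuracy that Lv et al. compare can be computed from a confusion matrix. The text does not define them, so the definitions here are assumed common ones: MDR = FN / (TP + FN), the share of defective samples judged good; ODR = FP / (FP + TN), the share of good samples flagged as defective; accuracy = (TP + TN) / total. The counts in the usage line are illustrative, not data from [80].

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # defective samples correctly flagged
    fn: int  # defective samples missed
    fp: int  # good samples wrongly flagged (over-detection)
    tn: int  # good samples correctly passed

    def missed_detection_rate(self):
        return self.fn / (self.tp + self.fn)

    def over_detection_rate(self):
        return self.fp / (self.fp + self.tn)

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

if __name__ == "__main__":
    c = Confusion(tp=90, fn=10, fp=5, tn=95)  # illustrative counts only
    print(c.missed_detection_rate(), c.over_detection_rate(), c.accuracy())  # 0.1 0.05 0.925
```

Reporting MDR and ODR separately matters on a production line: a missed defect ships a faulty product, whereas an over-detection only costs a manual re-check.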

Ⅶ. DISCUSSION
With the rapid development of information technology, smart 3C products have become necessities of daily life. In the automated production of 3C products, the quality of every component must be ensured. At present, defect detection based on machine vision is the mainstream method. Starting from the development history of CNN, this article draws on several representative improved models to give an overview of CNN-based defect detection methods. Neural networks can address a wide range of detection and classification problems and are a commonly used image processing technology. The ability of CNNs to exploit spatial patterns makes them particularly valuable for data with very high spatial resolution. Visualization techniques will increasingly help both to explain these models and to learn from them, improving the efficiency of defect detection in industrial production. In these studies, VGG16 and GoogLeNet are the two most commonly used architectures. However, most of these methods have specific limitations and depend heavily on dataset size, image processing and texture. The lack of datasets remains a difficult problem for many researchers. Moreover, the large-scale neural networks used in deep learning require substantial computing resources, which inevitably leads to high computing costs. The application of the different types of CNN frameworks is summarized in Table Ⅷ:
(1) Many studies have shown that CNN is superior to simple machine learning methods [61,63,64]. Traditional shallow CNNs have the advantages of low time consumption, a light and simple network structure, and low hardware requirements. The classic LeNet-5 is often taken as the representative shallow CNN architecture. Although LeNet-5 is now rarely used in research, it laid the foundation for subsequent convolutional networks.
In short, for 3C defect detection, shallow CNNs are more robust than traditional machine vision. The convolutional layers extract image features accurately and improve detection accuracy, and their low space complexity lets shallow CNNs meet the real-time requirements of production lines. However, shallow CNNs have a limited capacity to learn from data during training, which directly limits the accuracy of the trained model. Future research should focus on improving the detection of small defects.
(2) With increasing depth and structures adapted to different tasks, the learning ability of CNNs has improved significantly [23,25,71]. Deep CNNs benefit from greater network depth, which improves their accuracy and precision. According to this review, the architectures most commonly used in 3C defect detection are VGGNet, GoogLeNet and ResNet. As depth increases, the structure becomes complicated and cumbersome, hardware requirements rise, and a large number of data samples is needed to improve precision and accuracy. A residual network, in contrast, behaves like a combination of multiple shallower networks: it does not fundamentally solve the vanishing gradient problem but avoids it, because shallow networks do not suffer from vanishing gradients during training and ResNet exploits this. In summary, deep CNNs detect defects better than shallow CNNs, but as depth increases, the number of parameters to process grows and the real-time performance required by production lines becomes harder to achieve.
(3) GAN has many advantages in defect detection; for example, it can generate realistic images or videos, and adding it reduces the amount of real data required. At the same time, GAN has limitations: it lengthens training, the results and training data must be continuously checked with different types of data to ensure they are used correctly, and the model is prone to mode collapse. Future research on DCGAN must therefore address mode collapse, non-convergence and training difficulty. DCGAN replaces the multilayer perceptrons in the generator and discriminator of the original GAN with convolutional neural networks. When sample data is limited, DCGAN can improve detection accuracy by extracting and fusing features, effectively overcoming the low accuracy of detection systems trained on insufficient samples. At present, GAN is a strong competitor among unsupervised learning techniques, and DCGAN is expected to become a mainstream technique in the field of detection.

Ⅷ. OUTLOOK
The proposal of Industry 4.0 indicates that information technology and intelligent manufacturing will be the core development direction. Research shows that, through continuous development and improvement, CNN has been applied successfully in the detection field, demonstrating superior feature extraction and classification. For example, the shortcomings of manual product inspection have been largely overcome, product qualification rates are high, and production quality is assured. Since its emergence, CNN has attracted wide attention from researchers; to a certain extent it has solved problems that were previously unsolvable or difficult, and it has greatly improved efficiency and accuracy in the detection field.
(1) Various improvement strategies proposed by researchers have improved CNN performance to a certain extent, but shortcomings remain. For example, the gradient explosion problem has not been solved, and complex multi-level network structures bring further problems such as difficulty obtaining training samples and long training times.
(2) Beyond increasing convolutional depth, the block-based architecture of CNNs encourages modular learning, making architectures simpler and easier to understand. The block as a structural unit will continue to exist and further improve CNN performance, so the identification and localization of 3C detection targets is expected to improve.
(3) Deep learning requires a large number of training samples. However, the samples generated by the DCGAN methods summarized in this article still need to be labeled manually, which is time-consuming and labor-intensive. In recent years supervised learning algorithms have developed rapidly, but unsupervised learning algorithms are what ultimately determine the degree of machine intelligence. Future research should focus on enabling machines to learn defect features automatically through unsupervised learning.
(4) Finally, existing defect detection methods generally operate on two-dimensional image samples. Future research could build on these methods to achieve detection on three-dimensional models, which would enrich the theory and application of CNN. Applied to 3C product testing, such a system could inspect product parts directly, further reducing inspection time.