Deep Regression Neural Network for Industrial Surface Defect Detection

Industrial product surface defect detection is critical to guaranteeing high product quality and production efficiency. In this work, we propose a regression and classification based framework for generic industrial defect detection. Specifically, the framework consists of four modules: a deep regression based detection model, pixel-level false positive reduction, connected component analysis and a deep network for defect type classification. To train the detection model, we propose a high performance deep network structure and an algorithm that generates label data capturing defect severity information from the annotations. We have tested the method on two public benchmark datasets, AigleRN and DAGM2007, and an in-house capacitor image dataset. The results show that our method achieves state-of-the-art performance in terms of detection accuracy and efficiency.


I. INTRODUCTION
Surface defect inspection, which examines the surface of products, is an important step in quality control to guarantee product quality and manufacturing efficiency. Defective items must be detected in time and removed from the production line; otherwise they will seriously affect the subsequent assembly line and lead to a decrease in overall quality [1]. Defect detection by human visual inspection has several problems, such as low sampling rate, low accuracy, low efficiency, high labor intensity, and high sensitivity to the inspector's experience, physical condition and emotional state. Automated defect detection based on machine vision is an ongoing trend as it can significantly overcome these disadvantages. Machine vision based defect detection systems capture product surface images and apply image analysis to detect the defects. Fig. 1 shows an example of a steel surface image with several types of defects. Surface defect detection determines whether an image contains defects or not. Ideally, defect detection should provide the pixel-level location of the defects as well as the defect types, which can be used to improve manufacturing quality and to decide the actions to take on a defective product in the subsequent pipeline.
Various image processing methods have been proposed to detect defects in images, such as thresholding based [3], segmentation based [4] and edge detection based methods [5], [6] using operators such as Sobel [7], Prewitt [8], and Canny [9]. However, such pure image processing methods only work for a few simple cases. In the past decades, feature based methods along with traditional machine learning have been widely used to detect defects, in which image processing techniques are used to extract features. Traditional machine learning methods include neural networks [10], support vector machines (SVM) [11], etc. For example, Jeon et al. apply a Gabor filter, carry out edge-pair detection, and feed histogram-based and gradient texture features to an SVM to detect scratches on slab surfaces [12], [13]. This category of methods relies on hand-crafted features derived from domain knowledge and the designer's subjective experience.
Deep neural networks (DNNs) have gained significant success and achieved state-of-the-art performance in image analysis and recognition [14]-[18], as they are able to construct complex representations and automatically learn a compositional relationship between inputs and outputs, mapping input images to output labels. In surface defect detection, deep learning based methods are the ongoing trend, and they roughly fall into three categories: 1) image classification based, 2) object detection based, and 3) pixel segmentation based methods. In order to provide high-precision defect detection for real applications, we adopt the pixel segmentation strategy as it can provide pixel-level defect locations. Many proposed deep learning based segmentation methods design a complicated network structure aiming to improve both detection power and defect classification [19], [20]. However, such a complicated network needs a lot of data to train and is difficult to debug and upgrade, which makes it unsuitable for real industrial applications. Therefore, we adopt a design in which detection and defect classification are separate modules.
Another important aspect is that defect severity information is very useful and often required in industrial defect detection. Most segmentation based methods label the input image pixels as defective or non-defective and use the output confidence score to indicate the defect severity. Ideally, the confidence score should be linearly correlated with the defect severity, i.e., significant defects should have high confidence. However, the confidence scores given by most deep neural networks are often not well calibrated [21]. We believe the annotated defects are good ground truth for defect severity prediction. Based on these considerations, in this work we propose a generic regression based four-stage framework for surface defect detection in which 1) the first stage is a regression based pixel segmentation module that assigns a score to each pixel to indicate the defect severity, 2) the second stage is a filtering module that reduces the number of falsely reported defective pixels, 3) the third stage is a connected component analysis module that forms defects from detected defective pixels, and 4) the fourth stage is defect type classification. We also optimize the segmentation network structure by reducing the number of layers and filters to balance detection accuracy and speed. Overall, our work makes the following major contributions.
• We formulate the defect detection problem as regression based pixel segmentation and defect classification, which can provide both pixel-level defect severity and defect types at defect level.
• We propose a four-stage defect detection framework which is more generic and flexible for end-user's needs.
• We evaluate the method on two benchmark datasets and one in-house real capacitor image dataset.
The rest of this paper is organized as follows. Section II reviews the related works on this topic. Section III introduces our four-stage framework for defect detection. Experimental results are presented in Section IV. Finally, Section V concludes this paper and discusses the future work.

II. RELATED WORK
Many deep convolutional network based methods have been proposed for defect detection. According to the way they handle the defect detection problem, deep learning based methods can be roughly divided into three categories: pure image classification methods, object detection based methods and pixel-level segmentation methods.
Pure classification methods divide the input image into overlapping blocks and classify the block images into different classes. If a block contains a certain number of defect pixels or more, the block is labeled as defective. The size of the block depends on the input size of the deep learning model, typically 256, 128 or 32 pixels. For example, in [22], the authors use MatConvNet [23] to classify 256 × 256 pavement images into defective and non-defective. In [24], Li and Zhao modify the structure of GoogleNet [25] to do the classification. Similar methods include [26]-[29]. Beyond binary classification, multi-class classification is also used on image blocks. For example, in [30]-[32], a single CNN model is designed to classify defect images of resistance welding spots, panel glasses, and wafer images respectively.
Defect detection can also be treated as an object detection problem, an important task in computer vision whose goal is to locate an object with a bounding box and decide the object type. Many deep CNN models have been proposed to improve accuracy and efficiency, such as Faster R-CNN [33], SSD [34], and YOLO [35]. In [36], Suh and Cha use Faster R-CNN to detect damage in civil infrastructure. Young-Jin et al. modify Faster R-CNN using ZF-net [37] to speed up the feature extraction in [38]. Similarly, in [39], the authors modify Faster R-CNN for railway subgrade defect detection. Yiting et al. adopt a MobileNet-SSD framework in [40] to detect defects like breaches, dents, burrs and abrasions on the sealing surfaces of containers.
Defect detection based on pixel-level segmentation is more generic than the pure classification based and object detection based methods above, as it can provide pixel-level locations for defects as well as the defect types. Auto-encoders [41] and FCNs (fully convolutional networks) [42] are widely used for this purpose. In [19], Tao et al. propose a two-stage algorithm in which the first stage uses two cascaded auto-encoders to segment the defects, and the second stage is a CNN classifier that classifies defect regions cropped from the image using the region information provided by the first stage. Similarly, Qin et al. propose DeepCrack, which uses an encoder-decoder architecture to segment pavement images into crack and background [43]. Zhiyang et al. combine a segmentation stage and a detection stage using two separate fully convolutional networks (FCN) [20]. The work of [1] also uses an FCN to segment the input image pixels.
As pointed out previously, these methods try to design a complicated structure to realize both defect detection and classification, and the defect severity problem is not well handled. In this work, we propose a four-stage regression based deep learning method for defect detection, which is easy to train and upgrade, and able to provide accurate defect severity information.

FIGURE 2. Diagram of the overall method. The first stage is a regression model that predicts the defect severity using the annotated defect pixels as training data. The second stage filters out some defect pixels to reduce the false positives. The third stage forms defects from the pixels by connected component analysis, and the last stage is defect type classification.

III. THE PROPOSED METHOD
A. OVERALL FRAMEWORK
1) DEFECT SEVERITY PREDICTION BY REGRESSION
In order to predict the defect severity at each pixel, the first step is a regression model using a deep convolutional neural network (CNN). The image block is divided into non-overlapping sub-blocks of size 4 × 4 or 2 × 2. Each sub-block is mapped to one output number in the range [0, 12]. The output size depends on the detection resolution: a resolution of 4 × 4 means that each 4 × 4 input block is mapped to one output number. For a 256 × 256 input patch, the output size is then 64 × 64 = 4096. Correspondingly, if the resolution is 2 × 2, the output size is 128 × 128 = 16384, and if the resolution is 1 × 1, the output size is 256 × 256, which makes the computation very intensive.
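The relationship between detection resolution and output size can be sketched with a small helper function (the function name is ours, for illustration only):

```python
def regression_output_size(patch_size: int, resolution: int) -> int:
    """Number of regression outputs for a square input patch.

    Each non-overlapping `resolution` x `resolution` cell maps to one
    severity score, so a 256x256 patch at 4x4 resolution yields
    (256 // 4) ** 2 = 4096 outputs.
    """
    if patch_size % resolution != 0:
        raise ValueError("patch size must be divisible by the resolution")
    side = patch_size // resolution
    return side * side
```

This makes the quadratic growth of the output (and hence the computation) explicit as the resolution is refined.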

2) FALSE POSITIVE REDUCTION BY PIXEL FILTERING
The output of the first module might contain falsely reported defect pixels (false positives). The purpose of this step is to filter them out. We observe that pixels with lower scores are more likely to be false positives. Therefore, we simply adopt a step function, which means only pixels with scores greater than a threshold are kept, i.e.,

Î_i = I_i if I_i > T, and Î_i = 0 otherwise, (1)

where I_i is the score of pixel i from the first module and T is a tunable parameter. Right after the detection module is trained, the value of T can be determined by maximizing the mean IOU between the prediction and the ground truth.
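The step-function filter amounts to a single elementwise operation over the score map; a minimal sketch (the function name is ours):

```python
def filter_pixels(scores, threshold):
    """Step-function filter (Eqn. 1): keep a pixel's severity score
    only if it exceeds the threshold T; otherwise set it to zero.

    `scores` is a 2-D score map given as a list of rows of floats.
    """
    return [[s if s > threshold else 0.0 for s in row] for row in scores]
```

In practice T is chosen per dataset by sweeping candidate values and keeping the one that maximizes the mean IOU against the ground truth, as described in Section G.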

3) FORMING DEFECTS BY CONNECTED COMPONENT ANALYSIS
In our method, we decide the defect type by applying a classification module to the whole defect. After filtering out the false positive pixels, connected component analysis is applied to cluster the remaining pixels into defects. The minimum distance between clusters is set to 4, which means that if the distance between two clusters is less than 4 pixels, they are joined into one.
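The clustering step can be sketched as a breadth-first grouping of defective pixel coordinates. The paper only states the 4-pixel joining rule; the use of the Chebyshev distance below is our simplifying assumption, and the function name is ours:

```python
from collections import deque

def group_defect_pixels(pixels, min_distance=4):
    """Cluster defective pixel coordinates into defects.

    Two pixels are linked when their Chebyshev distance is below
    `min_distance` (assumption; the paper only gives the 4-pixel rule).
    Returns a list of clusters, each a sorted list of (row, col) pairs.
    """
    remaining = set(pixels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            r, c = queue.popleft()
            linked = [p for p in remaining
                      if max(abs(p[0] - r), abs(p[1] - c)) < min_distance]
            for p in linked:
                remaining.remove(p)
                cluster.append(p)
                queue.append(p)
        clusters.append(sorted(cluster))
    return clusters
```

A production implementation would typically use an optimized connected-component routine instead of this quadratic sketch.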

4) DEFECT TYPE CLASSIFICATION
Defect classification is performed by a deep CNN model that predicts the type of each defect. Suppose the data contains N types of defects; the model outputs N + 1 classes, where the extra class is the normal (background) class. From this perspective, the classification module serves two purposes: deciding the defect type and reducing the false positives at the defect level. The loss function we use is cross entropy.
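For reference, the cross-entropy loss over the N + 1 classes has the standard form below, written here numerically stably; this is a generic sketch, not the paper's implementation:

```python
import math

def cross_entropy(logits, true_class):
    """Cross-entropy loss over N+1 classes (N defect types plus one
    background/normal class).

    Computes -log(softmax(logits)[true_class]) using the log-sum-exp
    trick for numerical stability.
    """
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]
```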

B. NETWORK STRUCTURE
1) DETECTION MODEL
The detection model is modified from Resnet18 [44] for regression. The overall network structure is shown in Fig. 3. The structure mainly consists of a convolution layer, average pooling layers and two similar residual units, followed by two convolutional layers and a flatten layer before the output. The flatten layer has 4096 elements if the detection resolution is set to 4 × 4. The output layer is a linear regression which approximates the label by minimizing the L2 loss. Table 1 shows the detailed model configuration. Fig. 4 shows the structure of a residual unit, which mainly consists of BN (batch normalization), convolution and ReLU layers. Tables 2 and 3 show the parameter configurations of the two residual units.

2) CLASSIFICATION MODEL
The defect type classification model is simply Resnet101 [44]. Since defects occur relatively rarely in practical defect detection, the computational burden of this module is much lower than that of the first module. Therefore, it can be replaced with an even more complicated structure, such as Resnet152, to ensure high classification accuracy.

C. END-TO-END MODEL TRAINING
1) TRAINING DATA GENERATION
In the method, the two CNN models need to be trained. The first one is a regression model, and the second one is a common multi-class classification model. Before training, the defects in the training images need to be annotated as accurately as possible at the pixel level. If the data contains multiple types of defects, the defect types are marked using different colors.

a: DEFECT SEVERITY DATA FOR REGRESSION MODEL
In order to provide defect severity information to end-users, our method explicitly generates defect severity data from the annotations, unlike most existing methods which rely on the confidence score of the CNN model. We cut the input image into overlapping patches. For each image patch with a significant area of defect, we analyze the image intensities of the defect pixels and the neighboring pixels around the defect. Higher contrast between the defect and its neighbors means a more severe defect. The following algorithm is used to generate the defect severity data.
1) Divide the defect area into non-overlapping cells. If the detection resolution is 4 × 4, the cell size is 4 × 4.
where λ is a parameter.
5) Normalize all the cell scores to the range [0, 12].
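Since the intermediate steps of the algorithm (including the role of the parameter λ) are not fully reproduced here, the following sketch abstracts them into a single raw contrast value per cell and shows only the final normalization step; the function name and normalization scheme are our assumptions:

```python
def cell_severity_scores(contrasts, max_score=12.0):
    """Normalize per-cell defect/background contrast values to the
    [0, 12] range used as regression labels (step 5 of the algorithm).

    `contrasts` holds one non-negative raw contrast value per defect
    cell; how those values are computed (steps 2-4, parameter lambda)
    is abstracted away here.
    """
    peak = max(contrasts)
    if peak <= 0:
        return [0.0 for _ in contrasts]
    return [max_score * c / peak for c in contrasts]
```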

2) TRAINING DATA FOR CLASSIFICATION MODEL
The classification model is trained after the detection model is finished. This model serves to distinguish among different defect types, as well as between defect and non-defect. For this purpose, the training data contains not only the marked defects, but also the false positives given by the detection model. As shown in Fig. 2, the image patch after connected component analysis contains 4 defects, 3 of which are false positives. All 4 defects are used to train the classification model.
It should be noted that the defects have various shapes and sizes. The most straightforward way is to resize the defect image patch to the input size of the classification model. However, this would greatly deform the defect and mislead the model. In our method, we use a more systematic way to generate the data. We use a sliding window of a certain size to cut the image into overlapping blocks, and the blocks with a significant area of defect are used to train the model. If the window is not large enough to hold the defect, we double the window size and then downsize the patches to the model input size. With this method, suppose defect d i is cut into a list of blocks (d i1 , d i2 , . . . , d in ); the corresponding class labels given by the model will be (c i1 , c i2 , . . . , c in ). The overall defect class c i is selected by a simple voting strategy that selects the most frequent label.
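The voting strategy reduces to a majority vote over the per-block predictions; a minimal sketch (the function name is ours):

```python
from collections import Counter

def vote_defect_class(block_labels):
    """Select the overall defect class c_i as the most frequent label
    among the per-block predictions (c_i1, ..., c_in)."""
    return Counter(block_labels).most_common(1)[0][0]
```

Note that `Counter.most_common` breaks ties by first insertion order, so a real system may want an explicit tie-breaking rule.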

3) DATA AUGMENTATION
Defect detection in real industrial applications must consider various factors such as illumination changes and the motion of the objects. Data augmentation plays an important role in increasing the detection stability and repeatability, which is often required by detection devices. We adopt a list of such techniques, including image flipping, rotation with random angles, random noise, and illumination and contrast changes, to increase the data diversity. This also helps reduce the amount of data labeling, which can be expensive and laborious. In particular, to increase the prediction stability against shift variations, we cut a number of blocks from the image around the defect so that the defect appears at different places in the image patch. This helps reduce the prediction difference when the same product is tested on the device multiple times.
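A minimal sketch of such an augmentation step is shown below. Only random flips and additive noise from the list above are implemented; rotation and illumination/contrast changes are omitted for brevity, and the noise magnitude is an assumed value, not taken from the paper:

```python
import random

def augment(image, seed=None):
    """Sketch of one augmentation pass: random horizontal/vertical
    flips plus additive uniform noise.

    `image` is a grayscale image given as a list of rows of floats.
    """
    rng = random.Random(seed)
    out = [row[:] for row in image]
    if rng.random() < 0.5:              # horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:              # vertical flip
        out = out[::-1]
    noise = 2.0                         # assumed noise magnitude
    return [[v + rng.uniform(-noise, noise) for v in row] for row in out]
```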

IV. EXPERIMENTAL RESULTS
A. IMPLEMENTATION DETAILS
To generate the data to train the detection model, we cut the input images into patches of 256 × 256 using a sliding window. The input size of the classification model is 64 × 64. To speed up the training process, the classification model is pre-trained on ImageNet [45] and fine-tuned using the training data.

B. DATASETS
To compare the proposed method with existing works, we use three datasets: two public datasets (AigleRN and DAGM 2007) and one in-house dataset (CapacitorDB) which contains capacitor images for defect detection. Each dataset is divided into two parts, one for training and the rest for testing. All defect pixels are annotated using different colors to represent the corresponding types.
The AigleRN dataset contains 38 pre-processed grayscale images of French road surfaces, with sizes of 991 × 462 and 331 × 462. In this test, we select 24 of them as the training set and 14 as the test set. Fig. 5 shows three example images and defects.
The DAGM2007 dataset consists of 10 subsets, each corresponding to a class of defects. Each class has 1000 defect-free images and 150 defective images with one defect on a textured background. The image size is 512 × 512. We follow the work of [46] and select the first six classes for the experiment, i.e., a total of 900 defective images, with 720 images for training and the remaining 180 for testing. Fig. 6 shows some example images.
CapacitorDB contains more than 3,200 training images and 639 test images with 673 defects. The dataset consists of 8 types of defects, such as scratch, bubble, damage, and broken covering resin. Fig. 7 shows two sample images.

C. PERFORMANCE METRICS FOR DEFECT DETECTION
Our proposed method provides pixel-level defect locations as well as defect type classification at the defect (object) level. Defect detection can be evaluated at the pixel level. Predicted pixels that are truly defective are true positives (TP); otherwise they are false positives (FP). False negatives (FN) are annotated defect pixels that are missed by the detector. Therefore, precision, recall and F1 are defined as follows:

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 × Precision × Recall / (Precision + Recall).

The IOU (intersection over union) value between a ground-truth defect and a predicted one can also be calculated by counting pixels. Defect detection can also be evaluated at the defect (object) level in terms of object detection accuracy. In this case, a defect is detected if the IOU between the ground truth and the prediction is greater than a certain threshold. A true positive is then a true defect that is detected with the correct type. In this setting, recall and precision can be calculated accordingly.
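These pixel-level metrics follow directly from counting TP, FP and FN over the binary masks; a minimal sketch (the function name is ours):

```python
def pixel_metrics(pred, truth):
    """Pixel-level precision, recall, F1 and IOU from binary masks,
    following the TP/FP/FN definitions above.

    `pred` and `truth` are same-sized masks given as lists of rows
    of 0/1 values.
    """
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou
```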
It should be noted that the AR column in Tables 5 and 7 is defined by Eqn. (3) [47], where n is the number of defects.

D. RESULTS ON AIGLERN DATASET
Table 4 shows the comparison of pixel-level segmentation accuracy on the AigleRN dataset in terms of precision, recall and F1 score. All the cracks in these images belong to the same class; therefore, the classification module is only used to distinguish normal from defective pixels. We can see that our method achieves the best performance, with a precision of 0.9313, a recall of 0.9440 and an F1 score of 0.9375. Fig. 8 shows four test examples.

E. RESULTS ON DAGM2007 DATASET
Table 5 shows the segmentation performance on the DAGM2007 dataset. In this test, we also report the precision, recall and F1 score, which are not provided in the literature. Our proposed method has the highest AR and Mean IOU scores. The AR score is 0.69, which is slightly better than that of Qiu et al. [46], and much higher than the performance of ViDi [52] and FCN [42]. Similarly, in terms of Mean IOU, ours is 0.8450, better than all other methods by a large margin.

The above results do not consider the correctness of the defect type prediction. Fig. 9 shows the average recall and precision with respect to different IOU thresholds on the DAGM2007 dataset. A detected defect is a true positive only if the IOU is greater than the threshold and its type prediction is correct. In the figure, the difference between recall/precision with and without class is whether the class prediction is included in the calculation. From the curves we can see that the black and cyan curves fully overlap, which means the class prediction is 100% correct. When the IOU threshold is set to 0.5, the recall and precision are around 0.98 and 0.99 respectively, which means the defect-level accuracy is very high. Fig. 10 shows several test examples from this dataset. Table 6 shows the defect classification performance in terms of a confusion matrix. We can see that the types of all the defects in the DAGM2007 dataset are correctly predicted.
This is not surprising, as the differences between the defect types in this dataset are quite significant. Fig. 11 shows a comparison of different types of defects.

1) PERFORMANCE OF DEFECT CLASSIFICATION
On the other hand, in our method, we use the classifier to reduce the false positives by distinguishing between normal image patches and defective ones. Table 7 therefore shows the segmentation performance when the classifier is turned off. From the table, we can see that when the classifier is off, AR, Mean IOU and precision drop considerably, which means the classification module effectively reduces the number of false positive pixels. On the other hand, the recall score does not change (0.908), which means the classifier does not mistakenly classify defect pixels as non-defective. Fig. 12 gives an illustration, where the classification module in the second row removes some false positive predictions.

F. RESULTS ON CAPACITOR DATASET
We also test the proposed method on the capacitor image dataset from real industrial defect detection. Fig. 13 shows the result. When the IOU threshold is set to 0.5, the recall and precision with classification are 0.88 and 0.80 respectively, and without classification 0.96 and 0.87 respectively. The defect classification significantly influences the recall and precision. The overall defect type classification accuracy is 0.92. Looking into the classification results, we find that the accuracy of class 8 is only 0.57: out of 28 cases, 16 are correctly predicted, 5 are predicted as class 4, 4 as class 7 and 3 as class 2. Examining the dataset, we find that some defects of different types look very similar. Fig. 14 shows three examples that are very similar to a defect of class 8, which confuse the classifier into making wrong predictions. Fig. 15 shows two test examples, each with three images: the original, the annotation and the prediction.

G. THRESHOLD PARAMETER DETERMINATION
In our method, the threshold parameter T in Eqn. 1 needs to be tuned to optimize the output of the regression model. For the AigleRN dataset, the threshold is set to 5.4 by searching for the value that maximizes the mean IOU. Fig. 16 shows the result. Similarly, Fig. 17 shows the search for the DAGM2007 dataset, where the corresponding threshold value is 4.1.

H. TIME PERFORMANCE OF DEEP LEARNING MODELS
For real industrial defect detection, the time performance is critical for production efficiency. In our method, the regression detection model is the most time-consuming part as it has to predict a score for each pixel in the input image, while the classification model only needs to work on the defect areas. Table 8 shows the time in milliseconds with respect to batch size under different model configurations. For example, when the input is a color image and the output resolution is 4 × 4, one image block (256 × 256) takes less than 1.5 ms, and when the output resolution is 2 × 2, an image block takes around 2.5 ms. The test configuration is: Windows 10, x64, CPU i7-8700K@3.70GHz, 24 GB memory, and a GTX 1080Ti 11GB GPU. With this, we can finish testing a smartphone cover glass within 2.5 seconds, where each glass has 7 images of size 8192 × 12800.

V. CONCLUSION
In this paper, we propose a generic framework for industrial defect detection based on CNNs, which consists of four major stages. The first is a regression based detection model which predicts the defect severity information, instead of using the model confidence score. The second is a pixel filtering module aiming to reduce the false positives at the pixel level. The third is connected component analysis to form defects from pixels. The final module is a classification model which serves to classify the defect types and to reduce the false positives again at the defect level. We propose an algorithm to generate defect severity information from annotation data to train the regression model. The test results on three datasets show that our method achieves not only high detection accuracy but also high speed. In future work, we plan to further improve the network structure to achieve even more accurate defect detection, faster test speed, and higher prediction stability against input corruptions or variations.