Automatic Detection and Classification of Defective Areas on Metal Parts by Using Adaptive Fusion of Faster R-CNN and Shape From Shading

Computer vision and deep learning approaches play an important role in industrial inspection systems. Computer vision technology is essential for fast, defect-free control of products on the production line, and its importance becomes clear when the limitations of classical methods are taken into consideration. Metallic defect detection is a challenging problem because metal surfaces are easily affected by environmental factors such as lighting and light reflection. Since traditional detection algorithms are inefficient for complex problems, we propose a novel method to detect and classify metal surface defects such as cracks, scratches, and inclusions. The type and location of defects are detected by the Faster Regional Convolutional Neural Network (Faster R-CNN), combined with the Shape From Shading (SFS) method, which can extract surface characteristics. The Northeastern University (NEU) surface defect database was used for defective samples. The proposed algorithm was also tested on an unlabeled dataset (KolektorSDD2/KSDD2) to show its labeling performance. The results on both labeled and unlabeled datasets demonstrate state-of-the-art performance in automatic defect detection, classification, and labeling. The proposed method achieves satisfactory results for the detection of defects on metal surfaces, with a mean average precision of 0.83. The average precision values for crazing, pitted surface, patches, scratches, inclusion, and rolled-in scale are 0.98, 0.81, 0.90, 0.79, 0.88, and 0.62, respectively.


I. INTRODUCTION
Nowadays, the need for automatic defect detection systems has been increasing in the production phase. Identifying the defects of products and determining their location is a necessary quality control process. It is also crucial to quickly identify the defect type and defective area in terms of quality control performance.
In many industrial applications, defects commonly occur on textile, metallic, and glass surfaces. Since metal surfaces are easily affected by environmental factors such as lighting and reflection of light, metallic defect detection is a complex problem. Metal surfaces may have various defects such as crazing, patches, inclusion, scratches, pitted surface, and rolled-in scale. In real-time metallic defect detection systems, speed and high accuracy positively affect the production phase.
The associate editor coordinating the review of this manuscript and approving it for publication was Mouloud Denai.
Today, human inspectors are still used to detect defects in the manufacturing process as a traditional method. Computer vision techniques are frequently used in production systems for quality control to increase production speed, reduce error rates, and prevent human errors caused by fatigue and distraction. Moreover, the labeling process carried out before training is still done manually. When classifying with Faster R-CNN, defect regions must be individually labeled before the training process. Automating this process is vital in industrial control systems, as labeling a dataset image by image is very time-consuming. This study aims to automate the labeling process and save time by combining SFS and deep learning methods. To achieve successful performance in object recognition and classification applications with Faster R-CNN, a large amount of labeled data is needed in the training process. For example, the NEU and KolektorSDD2 datasets used in this study contain 1800 and 3335 images, respectively. While manually labeling the images takes a long time, automating it with SFS yielded considerably better outcomes in much less time.

A. CONTRIBUTION
In this study, a novel method is proposed for detecting various defect types that may occur on the surfaces of metal components. Furthermore, the defective areas of images taken from the NEU and KSDD2 datasets are labeled automatically after being identified by Shape From Shading (SFS) methods, which can infer surface characteristics. Thus, the labeling of large datasets can be performed automatically, avoiding the time-consuming manual workload.
Moreover, the method is intended to avoid human errors caused by fatigue and absent-mindedness while labeling the positions of defects, and it provides more precise labeling of the defect locations. Images processed with SFS were classified by Faster R-CNN, which has become popular in recent years and has yielded successful results.
The article is organized as follows: Section 1 explains the motivation and contribution of our study, Section 2 mentions the related works, Section 3 describes the basic methodology, and Section 4 explains the experiments and summarizes the results. Finally, the conclusion and future works are discussed in Section 5.

II. RELATED WORKS
Various image processing methods have been used for surface defect detection. These include methods that describe texture properties, such as the autocorrelation method [2], the gray-level co-occurrence matrix [3], morphological methods [4], and histogram feature statistics [5]; Fourier, Gabor, and wavelet feature methods [6], [7], which capture the textural structure of the image; and model-based methods, such as the fractal body model [8], the backscattering model [9], and the random field model [10], which define texture patterns by modeling features with specific models. These traditional methods cannot detect texture defects involving complex textures or any new defect type. Various machine learning techniques, such as Bayesian network classifiers [11], Principal Component Analysis (PCA) [12], Support Vector Machines (SVM) [13], Random Forest [14], and Self-Organizing Maps (SOM) [15], were also used to detect and classify surface defects.
Traditional image processing methods require multiple thresholds targeting various imperfections, and these algorithms are often very sensitive to background colors, lighting conditions, and light reflection. Furthermore, these thresholds need to be adjusted again when a new problem occurs. For this reason, CNN-based algorithms are preferred, and more successful results are obtained thanks to the developments in artificial intelligence. In the last few years, CNNs have been widely used for defect detection, as they can directly learn robust features from labeled images of surface defects and achieve a high recognition rate. Lin et al. [16] proposed a multi-scale cascade CNN called MobileNet-v2dense to detect defects. Yi et al. [17] proposed a defect detection system for steel strip surfaces based on CNN. Saliency maps and image segmentation were used to detect the defective area. Kim et al. [18] used a Siamese neural network with a CNN structure, which can detect defects with a small number of images. Zhou et al. [19] conducted a study on the classification of surface defects on hot-rolled steel sheets using CNN. The system's robustness was tested by adding Gaussian noise to the images with different SNR values. Soukup and Huber-Mork [20] performed a CNN classification using photometric stereo images for steel rail surface defect detection. The gaps in the dark areas on the rail surface were made visible by different colored light sources. Amin and Akhter [21] used two deep learning methods, U-NET and Deep Residual U-NET, to detect defects on steel surfaces and divided the images into five different classes. Lv et al. [22] created a new dataset called GC10-DET containing ten defect types for metallic surface defect detection. They also proposed a defect detection system based on the Single Shot MultiBox Detector and compared it with classical machine learning methods and deep learning methods using the NEU-DET and GC10-DET datasets. Li et al. [23] conducted a study of real-time steel strip surface defect detection using the YOLO network. They improved the YOLO network and made it fully convolutional. The network they developed includes 27 convolution layers. Fu et al. [24] studied steel strip surface defect detection using transfer learning. They used a pretrained VGG16 to extract features and a CNN for classification. They also performed accuracy analysis by adding Gaussian noise at different SNR levels to the images. Lee et al. [25] proposed an approach to detect steel defects using a CNN with class activation maps (CAMs). They expanded the CNN defect detection model to support a real-time visual process instead of a simple classification task.
VOLUME 10, 2022
In this study, the defective areas of images taken from the NEU and KSDD2 datasets are labeled automatically using SFS and then classified by Faster R-CNN, which has been popular in recent years and has given very successful results.

III. METHODOLOGY
A. SHAPE FROM SHADING
3D reconstruction, which aims to generate depth information and corresponding surface shapes using the various clues given in the images, is an exciting subject in computer vision. One of the most important clues that provide accurate information about 3D shapes is the spatial pattern of light reflected from surfaces, known as shading [26], [27], [28].
Shape from shading (SFS) tries to figure out how the intensity variations observed on the surfaces of objects provide information about the local surface by using shading as a clue. SFS methods are classified differently in the literature based on the solution search methods [29], [30].
SFS methods are used in many applications, including creating surface topographies, biometric studies, the reconstruction of medical images, surface inspection, and defect detection [31]. When appropriate lighting conditions are provided, as is common in quality control systems, SFS methods can operate on a single image and produce successful results.
The scene's surface can be described as a function Z(x, y). The relationship between the observed image intensity I(x, y) and the surface slopes (p = ∂Z/∂x, q = ∂Z/∂y) is expressed by the image irradiance equation.
The relationship between the surface normal and the surface slopes can likewise be expressed in terms of the surface gradients.
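For reference, the two relationships mentioned above take the following form in the standard SFS literature. This is a sketch under stated assumptions: the reflectance map symbol R, the unit normal n, and the sign convention (normal pointing toward the viewer along +z) are not taken from this paper and may differ from the authors' notation.

```latex
% Image irradiance equation: observed intensity equals the reflectance map
% evaluated at the local surface slopes (R is an assumed symbol).
I(x, y) = R\bigl(p(x, y),\, q(x, y)\bigr),
\qquad p = \frac{\partial Z}{\partial x},
\quad q = \frac{\partial Z}{\partial y}

% Unit surface normal written in terms of the slopes (assumed sign convention).
\mathbf{n}(x, y) = \frac{(-p,\; -q,\; 1)^{\top}}{\sqrt{1 + p^{2} + q^{2}}}
```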
In this study, an algorithm that creates depth maps based on the defect type was developed using the Frankot & Chellappa [33] method, which is classified among the minimization approaches. The algorithm assumes that the surface Z(x, y) can be represented as a linear combination of basis functions. A finite set of basis functions represents a potentially nonintegrable estimate of surface slopes, and the orthogonal projection onto a vector subspace spanning the set of integrable slopes is used to enforce integrability [33]. The nonintegrable gradient field is projected onto a set of integrable slopes using Fourier basis functions [34].
The chosen basis function significantly impacts the solution, and the discrete Fourier basis is widely used due to its computational efficiency [35]. In our equations, we use the discrete Fourier basis. The final output Z is obtained by taking the inverse Fourier transform of the projected coefficients.
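The Fourier-domain projection described above can be sketched in a few lines. This is a minimal illustration of the general Frankot & Chellappa integration step, not the authors' implementation: the function name `frankot_chellappa`, the use of NumPy, and the periodic boundary assumption implied by the discrete Fourier transform are all assumptions. In this form, the projected coefficients are (-j ωx P - j ωy Q) / (ωx² + ωy²), where P and Q are the Fourier transforms of the slopes p and q.

```python
import numpy as np


def frankot_chellappa(p, q):
    """Integrate slopes p = dZ/dx and q = dZ/dy into a surface Z.

    The result is defined up to an additive constant; the mean height
    is set to zero. x runs along columns, y along rows.
    """
    rows, cols = p.shape
    # Angular frequencies for each column (x) and each row (y).
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)
    wx, wy = np.meshgrid(wx, wy)  # both have shape (rows, cols)

    P = np.fft.fft2(p)
    Q = np.fft.fft2(q)

    denom = wx ** 2 + wy ** 2
    denom[0, 0] = 1.0  # avoid division by zero at the DC term

    # Projection of the (possibly nonintegrable) slopes onto integrable ones.
    z_hat = (-1j * wx * P - 1j * wy * Q) / denom
    z_hat[0, 0] = 0.0  # the absolute height cannot be recovered

    return np.real(np.fft.ifft2(z_hat))
```

For a smooth periodic test surface with analytically known slopes, the function recovers the surface up to its mean. Real images are not periodic, so boundary artifacts should be expected unless the input is padded or mirrored.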

B. OBJECT DETECTION AND FASTER R-CNN
Object detection and object recognition, indispensable elements of digital image processing applications, have been studied for many years, and various techniques and methods have been developed. Viola-Jones was one of the first algorithms to detect objects in digital images effectively. Thanks to the developments in graphics processing units (GPU) and deep learning, methods that can detect and identify objects with greater accuracy have been developed in recent years. Object detection is essential, and the success of this stage also affects the subsequent steps. Object detection can generally be defined as locating the object of interest in an image or video frame and separating it from the background.
In object detection studies, first R-CNN and then Fast R-CNN structure emerged. The Faster R-CNN [40], the widely used version today, first appeared in 2015. In the R-CNN family, variations between versions are often related to computational efficiency, reduced test time, and performance improvement (mAP).
Object recognition networks typically consist of the following components: a) a region proposal algorithm for creating ''bounding boxes'' around the positions of possible objects in the image, where the features of these objects are usually obtained using a CNN; b) a classification layer to predict which class each object belongs to; and c) a regression layer to refine the coordinates of the bounding box.
CPU-based region proposal techniques are used in both R-CNN and Fast R-CNN. Faster R-CNN generates region proposals using another convolutional network, which reduces the region proposal time from 2 seconds per image to 10 milliseconds.
As can be seen from Figure 5, the Faster R-CNN architecture consists of an RPN (Region Proposal Network) as the region proposal algorithm and Fast R-CNN as the detector network.
In Faster R-CNN, a CNN is first applied to the image, and a feature map is created. In the region proposal stage, instead of using selective search, regions are selected by a separate region proposal network.
The Faster R-CNN model uses the Region Proposal Network to create the proposed regions. This network slides a window over the feature map created by the convolutional layers. It assumes an object in each region and assigns a score to each region. This process is done by looking at criteria such as adjacent pixels, color, and density. Then, a new feature map is created using the ''ReLU'' activation function. The remaining steps are almost identical to Fast R-CNN.
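The sliding-window step above can be illustrated by generating the candidate boxes (anchors) that a region proposal network scores at each feature-map position. This is a minimal sketch, not the configuration used in this study: the function name `generate_anchors` is hypothetical, and the stride of 16 pixels, the three scales (128, 256, 512), and the three aspect ratios (1:2, 1:1, 2:1) are the defaults from the original Faster R-CNN paper, assumed here for illustration.

```python
from itertools import product


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x1, y1, x2, y2) in image coordinates.

    One group of len(scales) * len(ratios) anchors is created for every
    position of the sliding window on a feat_h x feat_w feature map.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of the window position, mapped back to image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio is height / width; the box area stays at scale ** 2.
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

With these assumed defaults, each window position yields nine anchors. The network then predicts an objectness score and a box refinement for each one.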

C. PROPOSED METHODOLOGY
The developed method aims to combine the powerful capabilities of the SFS and Faster R-CNN methods. In addition, it aims to automate the training stages by automatically labeling the defect regions. Initially, depth maps obtained using the SFS method and image processing techniques are used to detect the defective areas on metal surface images. Using the Faster R-CNN method, a defect detection model is then trained on the newly labeled data to classify and detect defective areas automatically. The developed framework is presented in Fig. 1.
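The step from a depth map to candidate defect regions can be sketched as follows. The paper does not specify which image processing techniques are applied to the depth maps, so everything here is an assumption made for illustration: the function name `defect_boxes`, the rule of thresholding the deviation from the mean depth, the 4-connected flood fill, and the `min_area` filter.

```python
from collections import deque


def defect_boxes(depth, threshold, min_area=1):
    """Return (xmin, ymin, xmax, ymax) boxes around connected regions whose
    depth deviates from the mean depth by more than `threshold`.

    `depth` is a 2-D list of numbers (rows of equal length).
    """
    rows, cols = len(depth), len(depth[0])
    mean = sum(map(sum, depth)) / (rows * cols)
    mask = [[abs(v - mean) > threshold for v in row] for row in depth]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            xmin = xmax = c
            ymin = ymax = r
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((xmin, ymin, xmax, ymax))
    return boxes
```

Raising `threshold` or `min_area` makes the labeling less sensitive, which is one plausible way to expose the kind of adjustable sensitivity the method relies on.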
Training images are first presented as input to the SFS method. Then, the depth maps (''Evaluating depth maps'' in Fig. 1) are evaluated to locate the defective areas, and an XML label file is generated for each image. Moreover, a comparison strategy is executed after the XML files are generated for each image. The comparison step has been added to examine the defects detected outside and inside of the ground truth. The original ground truth bounding boxes offered by the datasets and the bounding boxes produced using SFS are compared, and the model accuracy is analyzed. The obtained new defect labels (''Auto and more precise labeling'' in Fig. 1) and input images are processed with Faster R-CNN and used in the training steps. A new model was created for detecting defects, and the results are discussed in the experimental results section.
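The per-image XML label files mentioned above could be produced along the following lines. The paper does not state the XML schema, so this sketch assumes a Pascal VOC-style layout, which is commonly used for Faster R-CNN training. The function name `boxes_to_voc_xml` and the example file name are hypothetical.

```python
import xml.etree.ElementTree as ET


def boxes_to_voc_xml(filename, width, height, boxes, depth=1):
    """Build a Pascal VOC-style annotation string (assumed schema).

    `boxes` is an iterable of (label, xmin, ymin, xmax, ymax) tuples.
    `depth` is the number of image channels (1 for grayscale).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bnd, tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")


# Example: one automatically detected scratch on a 200 x 200 grayscale image.
xml_text = boxes_to_voc_xml("scratches_1.jpg", 200, 200,
                            [("scratches", 10, 20, 50, 120)])
```

Writing `xml_text` to a file next to each image gives the detector the same kind of label file that manual annotation tools usually produce.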

IV. EXPERIMENTS AND RESULTS
A. DATASETS
The NEU dataset contains six different types of defects: rolled-in scale (Rs), inclusion (In), patches (P), pitted surface (Ps), crazing (Cr), and scratches (Sc). Collected defects are on the surface of the hot-rolled steel strip. The dataset has 1800 grayscale images and contains 300 samples in each surface defect class. As seen in Figure 2, the NEU dataset [1] has six different types of surface defects, with each image having a 200 by 200 pixels resolution.
This study detected defects on metal surfaces using SFS and Faster R-CNN. The SFS algorithm, which can infer surface characteristics, was used to determine the location of defects on metal surface images from the NEU dataset. As seen in Figure 4, more sensitive labeling can be achieved by using SFS. Faster R-CNN, which has become popular in recent years and has produced very successful results, was used to classify the images processed with SFS. Detecting defects on metal parts is a complex problem, as metal surfaces are easily influenced by environmental factors such as lighting and light reflection. In our study, the defective areas in the 1800 images were labeled using the SFS method instead of being marked manually. The resulting labeled dataset was given as input to the Faster R-CNN model. 80% of the NEU surface dataset was used for training, while 20% was used for testing. The mean average precision of our study is 0.83, using 1440 images for the training set and 360 images for the test set.
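The 80/20 division described above can be reproduced with a short helper. This is only a sketch: the paper does not say whether the split was random or stratified by class, so the shuffling, the fixed seed, and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle `items` reproducibly and split them into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


# 1800 NEU images give 1440 training and 360 test images.
train_ids, test_ids = split_dataset(range(1800))
```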
The SFS algorithm was also tested on the Kolektor Surface Defect Dataset (KolektorSDD2/KSDD2) to evaluate its labeling performance. KSDD2 consists of different types of unlabeled and unclassified defects (scratches, minor spots, surface imperfections, etc.) that occur on metal surfaces. It has 356 images with defects and 2979 images without defects. Each image size is 230 x 630 pixels. The dataset was divided into a training set with 246 defective and 2085 defect-free images and a test set with 110 defective and 894 defect-free images. In this dataset, only mask images are given as ground truth. The results are shown in Figure 3. As seen in the last column of Figure 3, the defective areas on the input images are effectively detected and labeled in detail. In addition, since the degree of sensitivity can be adjusted parametrically, SFS can obtain more sensitive results.

B. EXPERIMENTAL RESULTS
The Faster R-CNN model, an approach based on CNN architecture, was used in our study. The application was run on a computer with an Intel i7 4700HQ processor and an NVIDIA GeForce GTX 850M graphics card. Training on the NEU dataset lasted 12 hours on 1440 labeled defective images.
A loss graph was generated from the training process. The loss is greater than 1.2 in the early stages of training and decreases rapidly as training progresses; at step 112000, the overall loss decreases to 0.4, as shown in Figure 6(a). The loss graph for the bounding box classifier is shown in Figure 6(b); at step 112000, the classification loss approaches 0.1. Figure 7 shows some sample images from the test. Labels indicate the regions identified as defects and the type of defects (crazing, inclusion, patches, etc.). The numbers on the labels show the probability of the defect type in the labeled area as a percentage. This study detected defective areas on metal surfaces using Faster R-CNN with SFS.
The ratio of overlap between the predicted box and the ground truth box is represented by the intersection-over-union (IoU) ratio. When a predicted box's IoU reaches its maximum value, we assign it a positive label; when it falls below 0.5, we assign it a negative label; and the remaining regions are disregarded.
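The IoU ratio defined above can be computed as follows. This is a minimal sketch assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax) tuples; the function name `iou` and the box format are illustrative choices rather than details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 10 x 10 boxes that overlap over half their width share 50 of 150 covered pixels, an IoU of one third, which falls below the 0.5 threshold mentioned above.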
Precision (P) measures the accuracy of the model's predictions, and Recall (R) measures the model's ability to detect positives. Precision and recall values are calculated using the following formulas:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
where FP, TP, and FN represent false positive, true positive, and false negative, respectively. Mean Average Precision (mAP) is a metric for evaluating object detection models. It is the mean of the Average Precision (AP) values over all classes. Table 1 compares the results with existing studies on the NEU dataset and also shows the results of the Faster R-CNN and YOLOv5 algorithms used as hybrids with SFS. YOLO and Faster R-CNN have certain similarities: both use bounding box regression and network structures based on anchor boxes. What sets YOLO apart from Faster R-CNN is that it performs classification and bounding box regression simultaneously. However, YOLO does have a drawback in object detection. Since only two anchor boxes in a grid cell can predict one class of object, YOLO has trouble detecting objects that are small and close to one another. As seen in Table 1, the proposed method (SFS-Faster R-CNN) gives the best mAP result.
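The AP and mAP metrics described above can be sketched as follows. The paper does not state which AP variant was used, so the all-point interpolation of the precision-recall curve is an assumption, as are the function names. The per-class AP values in the example are the ones reported in the abstract.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve (all-point interpolation).

    `scores` are detection confidences, `is_true_positive` flags each
    detection as TP (True) or FP (False), and `num_ground_truth` (> 0)
    is the number of ground truth boxes for the class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Per-class AP values as reported in the abstract.
neu_ap = {"crazing": 0.98, "pitted surface": 0.81, "patches": 0.90,
          "scratches": 0.79, "inclusion": 0.88, "rolled-in scale": 0.62}
neu_map = mean_average_precision(neu_ap)
```

Averaging the six per-class values from the abstract gives 4.98 / 6 = 0.83, which matches the overall mAP reported for the proposed method.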
According to Table 1, crazing has the highest average precision, while pitted surface is second, with values that can reach 0.96 and 0.93, respectively. The overall mean average precision is 0.83, and the average precision for ''rolled-in scale'' is the lowest at 0.71. Figure 7 shows that defects such as inclusions, patches, pitted surfaces, rolled-in scale, and scratches can be accurately detected and located. In addition, applying the SFS-Faster R-CNN algorithm to the Kolektor dataset yielded 0.83 mAP, whereas SFS-YOLOv5 yielded 0.82 mAP.

V. CONCLUSION AND FUTURE WORKS
Detecting metallic defects is challenging as metal surfaces are easily affected by environmental factors such as lighting and light reflection. In this study, a novel defect detection method was developed and applied on metal surface images. This recognition system performs admirably in terms of detecting defect regions and recognizing defect categories.
The SFS method was used to identify the defect regions more clearly and strengthen the training phase, and Faster R-CNN was used to determine the type and location of the defect. For defective samples, the NEU surface database was used. 80% of the 1800-image dataset was used for training, while 20% was used for testing. The method was also tested on the KSDD2 dataset in order to show its labeling performance. As a result of the adaptive integration of SFS and the Faster R-CNN method, an mAP of 0.83 was obtained. The performance of the developed method was compared with other methods in the literature.
The SFS method generally processes a single image with a single illumination system. Future studies may focus on systems with multiple illuminations and input images, such as photometric stereo. Using this method, defect detection studies can be performed for different surface materials (textile, glass, etc.). In addition, the proposed trained model can be accessed via a web interface, enabling online defect detection for users. The Mask R-CNN model can also be applied to extend Faster R-CNN to label the category to which each pixel in the image belongs.

MUHAMMED KOTAN received the M.S. and Ph.D. degrees in computer and information engineering from Sakarya University, Türkiye, in 2014 and 2020, respectively. He is currently working with the Information Systems Engineering Department, Sakarya University. His current research interests include image processing, computer vision, and artificial intelligence.