Classification of Spot-welded Joints in Laser Thermography Data using Convolutional Neural Networks

Spot welding is a crucial process step in various industries. However, classifying the quality of spot welds is still a tedious process due to the complexity and sensitivity of the test material, which push conventional approaches to their limits. In this paper, we propose an approach for quality inspection of spot welds using images from laser thermography data. We propose data preparation approaches based on the underlying physics of spot-welded joints heated with pulsed laser thermography: we analyze the intensity over time and derive dedicated data filters to generate training datasets. Subsequently, we utilize convolutional neural networks to classify weld quality and compare the performance of different models against each other. We achieve competitive results in classifying the different welding quality classes compared to traditional approaches, reaching an accuracy of more than 95 percent. Finally, we explore the effect of different augmentation methods.


I. INTRODUCTION
Spot welding plays a major role in joining technologies, especially in the automotive industry. Traditional methods to assure the quality of spot-welded joints include random and periodic destructive tests like torsion testing or manual destructive testing, where the specimen has to be cut in half to be investigated. These methods are tedious and destroy the sample. Non-destructive testing (NDT) methods reduce the cost of quality assurance and allow the spot welding process itself to be optimized: since every joint can be checked, the number of spot-welded joints can be reduced. Popular NDT methods for quality inspection of welded material include ultrasonic testing, X-Ray tomography [1], acoustic emission testing and laser thermography.

X-Ray has been considered a reliable approach to assess the welding quality. Kar et al. [2] used X-Ray tomography to study the porosity of welded joints and assess the quality. Patil et al. [3] investigated weld defects using X-Ray radiography and found that the X-Ray method could reveal more defects than a visual inspection. While X-Ray approaches are a commonly used NDT method, the necessary radiation protection is a major limitation, so they cannot be easily applied for in-situ inspection. In addition, X-Ray computed tomography is expensive compared to other NDT methods such as ultrasound or thermography. Furthermore, the penetration depth of the radiation is limited, especially in multi-layered material, so that small defects may not be detected, as observed by Duchene et al. [4].

As an alternative, ultrasonic approaches are being increasingly considered. Yu et al. [5] proposed an approach which employs high-order ultrasonic waves to detect damage in welded joints and thus enhances the sensitivity for small weld flaws. Tabatabaeipour et al. [6] proposed an immersion ultrasonic testing method based on observing the backscattered energy in C-scan images. Papanikolaou et al. [7] used ultrasonic testing as an NDT method to inspect various parameters such as the chemical composition or mechanical properties of the specimen in order to determine its wear. The researchers report enhanced results using ultrasound testing compared to visual testing and liquid penetrant testing. Acoustic approaches, on the other hand, utilize ultrasonic waves at a much higher frequency and have been employed in a variety of works. Shrama et al. [8] applied acoustic emission to inspect welded joints for damage. They conducted a variety of tests and report an improved understanding of the damage mechanism for early maintenance. Kubit et al. [9] utilized acoustic microscopy to evaluate the joint quality. Despite the increased sensitivity, setup and operation are very complex.

Active thermography, in contrast, has recently emerged as a method that allows contactless, fast and reliable testing at lower operating costs than, e.g., computed tomography. The feasibility of spot weld inspection based on thermography was theoretically examined in [10]. In [11], the researchers showed that thermography is a robust alternative and can be calibrated using X-Ray methods. A non-destructive testing approach based on laser thermography was proposed by Jonietz et al. [12], where the researchers could detect important quality metrics like the welding diameter by applying active thermography in transmission and reflection. However, the quality of the spot-welded joints could not be assessed in detail.

Convolutional neural networks (CNNs) have achieved remarkable results in computer vision for tasks such as anomaly detection and classification, and have therefore gained immense popularity in NDT research in recent years. Cruz et al. [13] used CNNs to detect defects in ultrasound testing. The works [14], [15] and [16] use CNNs to detect welding defects within X-Ray images and show performance enhancements. For instance, Wang et al. [15] used a RetinaNet-based CNN architecture to detect and classify three different types of defects inside X-Ray images. Zhang et al. [17] presented a CNN-based weld defect detection on X-Ray images. The researchers achieve satisfying results in detecting features relevant for quality assessment. However, since the X-Ray approach is based on the transmission of radiation through the spot-welded joint, it is only possible with access from both sides. Janssens et al. [18] explored the usage of a deep neural network on infrared thermal images to monitor machine health by detecting fault conditions from moving machine components. The researchers report a significant performance boost when applying CNNs and show that relevant regions could be identified and visualized to detect potential failures. Nasiri et al. [19] used CNNs to detect six conditions in thermal images of cooling tubes. Similar to our work, Yang et al. [20] used a Faster-RCNN-based architecture to visualize defects inside metal plates subjected to heat. They analyse the heat distribution and propose an improved Faster-RCNN architecture to visualize and detect the cracks. Dung et al. [21] explored the use of CNNs on welded joints of gusset plates and conclude that it is feasible when using transfer learning and data augmentation.

In this paper, we utilize both methods as well, but define data preparation methods specifically for thermography data. To this end, we consider the underlying physics such as the heat distribution and the temporal component of thermal images, which provide more information about the specimen. In addition, our data acquisition approach is contactless and, very importantly, requires access to the weld from one side only, making it well suited for in-situ quality inspection. For improved feature visualization, we apply preprocessing steps presented in [12].
The main contributions of this work are the following:
• Proposal of a CNN-based welding quality assessment method to classify welding quality from thermal images that are not distinguishable by human visual inspection.
• Proposal of methods to generate a feasible training dataset from thermal images by analyzing the underlying physics and generating filters accordingly.
• Evaluation of different data augmentation methods and their effect on thermal datasets.
• Performance evaluation of three state-of-the-art neural network architectures.

The paper is structured as follows. Sec. II begins with the theoretical foundations utilized in our approach. The methodology, including the overall concept and the implementation of each module, is presented in Sec. III. Sec. IV presents the results and discussion. Finally, Sec. V gives a conclusion and outlook.

II. THEORETICAL FOUNDATIONS
The data analyzed in this contribution has been acquired using laser thermography. The theoretical background is presented in this section.
A. Description of the thermal radiation components and emissivity corrections
Fig. 1 illustrates our setup for a theoretical understanding of the IR radiant flux used in our experiments. The radiant flux $\Phi$ (SI unit: Watt) is a common quantity to describe the intensity level of the IR radiation. Fig. 1 shows the IR radiation components as detected by the IR camera in the measurement environment: direct radiation from the ambient environment ($\Phi_{\mathrm{ambient}}$), environmental radiation reflected from the surface of our investigated specimen ($\Phi_{\mathrm{reflect}}$), as well as radiation from the measurement path between specimen and our measurement device ($\Phi_{\mathrm{path}}$), which is caused by the atmospheric absorbers (e.g., air, humidity, CO2). All these disturbing quantities (summed up in the following as $\Phi_{\mathrm{env}}$) are detrimental for our measurement since we are interested in measuring the radiant flux of the specimen $\Phi_{\mathrm{specimen}}$. In addition, we do not know the exact emissivity $\varepsilon$ of our specimen, which indicates how much radiation it emits compared to an ideal heat radiator, i.e. a black body (BB). These conditions lead to the following total radiant flux $\Phi_{\mathrm{tot}}$ during our measurements for every pixel:

$$\Phi_{\mathrm{tot}} = \Phi_{\mathrm{specimen}} + \Phi_{\mathrm{env}} = \varepsilon\,\Phi_{\mathrm{BB}} + \Phi_{\mathrm{env}} \tag{1}$$

Fig. 1: Sketch for theoretical understanding of IR radiant flux in our experiments. The IR camera receives different IR radiation components (red arrows), whereas the direct component from the specimen is given as a blue arrow. The schematic of the cross section through a welding joint is given on top in gray color.

The emissivity $\varepsilon$ is a unit-less scalar with $\varepsilon \in [0, 1]$. According to the Stefan-Boltzmann law, the radiant flux $\Phi$ depends on the temperature ($\Phi \propto T^4$). During a thermographic measurement we can rewrite $\Phi_{\mathrm{BB}}$ as $\Phi_{\mathrm{BB}}^{T(t)}$, and before the measurement we can write $\Phi_{\mathrm{BB}}^{T_0}$, with $T_0$ standing for the room temperature and $T(t)$ considering the temporal heating given by laser illumination ($t > 0$).

Assuming constant environmental conditions and temperature-independent optical quantities of the specimen, $\Phi_{\mathrm{env}}^{T_0} \approx \Phi_{\mathrm{env}}^{T(t)}$ and $\varepsilon$ remain the same during the experiment. The environmental disturbances $\Phi_{\mathrm{env}}$ can therefore be removed if we consider the radiant flux difference $\Phi_{\mathrm{tot}}^{T(t)} - \Phi_{\mathrm{tot}}^{T_0}$. Further, the unknown emissivity $\varepsilon$ can be removed by considering a normalized radiant flux difference

$$\frac{\Phi_{\mathrm{tot}}^{T(t)} - \Phi_{\mathrm{tot}}^{T_0}}{\Phi_{\mathrm{tot}}^{T(t_{\mathrm{norm}})} - \Phi_{\mathrm{tot}}^{T_0}} = \frac{\Phi_{\mathrm{BB}}^{T(t)} - \Phi_{\mathrm{BB}}^{T_0}}{\Phi_{\mathrm{BB}}^{T(t_{\mathrm{norm}})} - \Phi_{\mathrm{BB}}^{T_0}},$$

where $t_{\mathrm{norm}}$ refers to another time after the sample has cooled down to a temperature $T(t_{\mathrm{norm}}) > T_0$. For more detailed explanations we refer to [12]. In this contribution, we utilize this method to generate a dataset that is free of the environmental disturbances and of the uncertainty due to the unknown emissivity.
Please note that in this approach we have to calculate with the temperature-dependent radiant flux (as measured with the IR camera) and not with the temperature itself (calculated inside the IR camera based on a previous calibration). Moreover, using the Stefan-Boltzmann law is an approximation, since the IR camera is sensitive in a restricted spectral range only.
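The normalization of Sec. II-A can be sketched numerically as follows. This is a minimal illustration, not the processing pipeline used for our dataset: the function name, the array layout (time, y, x) and the frame indices for $T_0$ and $t_{\mathrm{norm}}$ are assumptions made for the example.

```python
import numpy as np


def normalized_flux_difference(film, idx_t0, idx_norm, eps=1e-12):
    """Normalized radiant flux difference per pixel and frame.

    film     : array of shape (N_t, N_y, N_x) with radiant-flux digits
    idx_t0   : index of a frame recorded before the laser pulse (T_0)
    idx_norm : index of the normalization frame (t_norm), taken after
               cooling to a temperature still above T_0
    Pixels whose denominator is (numerically) zero are set to NaN.
    """
    film = np.asarray(film, dtype=np.float64)
    baseline = film[idx_t0]
    numerator = film - baseline            # removes the constant Phi_env
    denominator = film[idx_norm] - baseline
    safe = np.where(np.abs(denominator) > eps, denominator, np.nan)
    return numerator / safe                # removes the emissivity factor
```

Under the model $\Phi_{\mathrm{tot}} = \varepsilon\,\Phi_{\mathrm{BB}} + \Phi_{\mathrm{env}}$ with constant $\varepsilon$ and $\Phi_{\mathrm{env}}$, the result is the same for pixels with different emissivity and background, which is the property exploited above.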

B. Data description
In our experiment, we perform pulsed thermography using a rectangular-shaped homogeneous laser illumination over the whole area of interest. Therefore, we can calculate the 2D solution (referring to the two spatial dimensions x and y, see Fig. 1) of the homogeneous heat diffusion equation for a 2D heating source in reflection configuration ($z = 0$) by [22]:

$$T(x, y, z = 0, t) = \frac{Q}{A\,\rho\,c_p\,\sqrt{\pi \alpha t}} \left[ 1 + 2 \sum_{n=1}^{\infty} R^n \exp\left( -\frac{n^2 L^2}{\alpha t} \right) \right] \tag{2}$$

whereby $Q$ describes the absorbed radiation energy from the laser, $A$ the illuminated area, $t$ the time, $\rho$ the material density, $c_p$ the specific heat capacity of the material, $\alpha$ the diffusivity, $R$ the thermal reflectivity (material to air), $n$ the number of reflections of the so-called thermal wave and $L$ the thickness of the plate. The given temperature evolution refers to an ideal sheet infinitely extended in the plane and an infinitely short heating impulse. For the actual specimen and experimental setup, we can therefore only get a first impression of how the temperature evolves, and we concentrate on the transient signal contrast in the dataset caused by the geometry of the specimen.

Fig. 1 shows that the value for $L$ differs since the specimen consists of two steel sheets welded together by a spot-welded joint. This means that, according to eq. (2), the solution of the heat diffusion equation in the area of the spot-welded joint works with $L = L_2$, whereas the region outside the spot-welded joint works with $L = L_1$. Fig. 2 (b) also shows the main difference of the heat flow visually (red: high temperature, blue: low temperature). Since we are measuring in reflection configuration (IR camera and laser on the same side of the specimen), we observe a hot rim outside the spot-welded joint region, since the heat accumulates there. On the other hand, we observe a cold spot in the middle, since the heat diffuses through the spot-welded joint towards the other steel sheet. Therefore, a good connection should produce a clear contrast between the region inside and outside the spot-welded joint.
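As a rough numerical illustration of the thickness dependence discussed above, the following sketch evaluates the standard pulse-heating series for a plate of thickness $L$, i.e. $Q / (A \rho c_p \sqrt{\pi \alpha t}) \cdot [1 + 2 \sum_n R^n \exp(-n^2 L^2 / (\alpha t))]$, using the symbols defined in the text. All numerical values (absorbed energy, steel properties, $R = 1$, and $L_2 = 2$ mm for the joint) are placeholder assumptions and not measured parameters of our experiments; the series is truncated at a hypothetical `n_max`.

```python
import math


def surface_temperature_rise(t, L, Q=50.0, A=19e-3 * 19e-3, rho=7850.0,
                             c_p=470.0, alpha=1.2e-5, R=1.0, n_max=200):
    """Front-side temperature rise (K) of a plate after an ideal pulse.

    t     : time after the pulse in s (must be > 0)
    L     : plate thickness in m
    Q, A  : absorbed energy (J) and illuminated area (m^2), assumed values
    rho, c_p, alpha : rough textbook values for steel, assumed
    R     : thermal reflectivity at the rear face (1.0 = fully reflecting)
    n_max : truncation of the reflection series
    """
    if t <= 0:
        raise ValueError("t must be positive")
    prefactor = Q / (A * rho * c_p * math.sqrt(math.pi * alpha * t))
    series = sum(R ** n * math.exp(-(n * L) ** 2 / (alpha * t))
                 for n in range(1, n_max + 1))
    return prefactor * (1.0 + 2.0 * series)


# Single sheet (outside the joint) versus two joined sheets (inside the joint).
rise_outside = surface_temperature_rise(t=0.5, L=1e-3)   # L_1, assumed 1 mm
rise_inside = surface_temperature_rise(t=0.5, L=2e-3)    # L_2, assumed 2 mm
```

With these assumptions, `rise_inside` is smaller than `rise_outside`, which corresponds qualitatively to the cold spot inside the joint and the warmer surrounding sheet. Lateral heat flow, which produces the hot rim, is not captured by this one-dimensional series.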
In the following, we are working with intensities $I_{n,p} \in \mathbb{R}^{N_t}$, where $N_t$ designates the number of time stamps, which is equal to the number of measured thermal images (thermograms) in a thermal film sequence. These intensities refer to the radiant flux of the thermal radiation as measured by the InSb-based detector of the IR camera and converted to digits using an analog-to-digital converter. Thus, in this work, the intensity values in a thermal image are given pixelwise by digits representing the measured radiant flux in x ($n \in \{1, \dots, N_x\}$) and y ($p \in \{1, \dots, N_y\}$).

III. METHODOLOGY
After describing the underlying physics and theoretical foundations of our data, in this section, we will present the methods that we used for our proposed quality assessment use case.

A. Experimental setup and data acquisition
Our dataset was collected from specimens that were made using an electric welding system, see Fig. 2. Each specimen consists of two resistance spot-welded, hot-dip galvanized, micro-alloyed steel sheets of grade HX340LAD [23] (zinc layer of approximately 7.5 µm on each side), which are typically used in the automotive industry and each have a thickness of 1 mm. The resistance spot welding was performed with a welding current of 7.5 kA, a force of 3.5 kN and a welding time of 240 ms using an electrical spot welding machine. According to the procedure for the determination of the electrode life [24], more than 1600 spot welds were performed. After approximately 1000 welds, the end of the electrode life was reached and the electrode started to produce unreliable spot-welded joints. We tested 115 welds using thermography, starting from weld no. 1510. As reference, we applied destructive chisel testing according to Ref. [25]. The setup for data acquisition is illustrated in Fig. 2 (a). We used active laser thermography for all tested specimens and captured 250 frames over time, which results in a film for every test object visualizing the spatial heat distribution for each time step. The laser radiation was switched on for a duration of one second at 500 W, illuminating a square-shaped area of 19 × 19 mm². The thermal images were measured with an IR camera (InSb detector, sensitive between 3.7 and 5.3 µm, frame rate: 40 Hz, spatial resolution varying between 62.5 and 133 µm/pixel). The utilized fiber-coupled laser emits in the near-infrared range (940 nm) and therefore does not interfere with the detector range of the IR camera. The laser heats up the specimen with a spot-welded joint. As can be observed in Fig. 2, the challenge of our thermal dataset is the similarity of the raw infrared data for different quality classes, which is not distinguishable by human visual inspection.
For instance, it is hard to distinguish image 1612 from 1587, or 1533 from 1548, despite their different classes. The features specifying each class are not evident, which causes common feature extractors like CNNs to struggle. On that account, we explore ways to generate feasible datasets by utilizing the underlying physics of the laser thermography process described in the previous section.

B. Data filtering
One of the aspects of this work is to explore how to process the normalized intensity data described in the previous section (see section II-B) to provide reliable predictions using CNNs. Therefore, we study different filters and their effect on the performance of the CNN. We only extract certain images, defining a filtered set $S^i_{\mathrm{filt}} \subset S = \{1, \dots, N_t\}$ with the cardinality $|S^i_{\mathrm{filt}}| = N_{\mathrm{filt}} < N_t$. $\Phi$ can be replaced by $I_{n,p}$, referring to eq. (1), and further it can be described by a 1D array with $I_{n,p} \in \mathbb{R}^{N_t}$. With

$$I = \begin{pmatrix} I_{1,1} & \cdots & I_{1,N_y} \\ \vdots & \ddots & \vdots \\ I_{N_x,1} & \cdots & I_{N_x,N_y} \end{pmatrix}$$

we can describe the filtered data by:

$$F_i = \left( I^{\mathrm{norm}}_{n,p}[k] \right)_{k \in S^i_{\mathrm{filt}}} \in \mathbb{R}^{N_{\mathrm{filt}}}$$

whereby $F_i$ represents a subset of the whole measured dataset with the specified intensity values defined in Table I, so that $F_i$ denotes a filtered dataset. $I^{\mathrm{norm}}_{n,p}$ stands for the normalized intensity difference, defined analogously to the normalized radiant flux difference in section II-A. Thus, the filters are defined based on intensity values of the films. These filters can have positive effects, as we are investigating a dynamic temperature behaviour over time. Extracting only frames with significant changes in their amplitude, e.g. during heating or at the beginning of the cooling phase, allows for more evident features within the datasets. The intensity of a frame is calculated as the average value of all pixels in the image. Fig. 4 (upper right corner) illustrates the intensity and gradient values referring to the temperature-time diagram as well as the marked areas of the filters and the resulting datasets.

For the generation of our final results, we use a combination of different filtered sets, which yields $(F_i, F_j, \dots, F_n)$, whereby $i, j, n \in \{1, \dots, 12\}$ and $F_i, F_j, F_n$ designate different filtered datasets according to Table I. In total, we define 12 different filtered datasets for the whole film, each representing a different heating state, to investigate the effects of certain areas of the intensity curve on the performance of the CNN. The image counts of each dataset before and after augmentation are listed in Table I as (before || after). The applied augmentation methods are described in the next section.
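The intensity-based frame selection described in this section can be sketched as follows. This is only an illustration under assumptions: the intensity bounds of the actual filters are those of Table I and are not reproduced here, so `lower`, `upper` and the optional `phase` argument are hypothetical parameters, as are the function names.

```python
import numpy as np


def frame_statistics(film):
    """Mean intensity of every frame and its gradient along time.

    film : array of shape (N_t, N_y, N_x) with normalized intensities.
    """
    film = np.asarray(film, dtype=np.float64)
    mean_intensity = film.mean(axis=(1, 2))   # average over all pixels
    gradient = np.gradient(mean_intensity)    # finite differences in time
    return mean_intensity, gradient


def select_frames(film, lower, upper, phase=None):
    """Indices of frames with lower <= mean intensity < upper.

    phase : None, "heating" (rising intensity) or "cooling" (falling
            intensity); restricts the selection to one side of the peak.
    """
    mean_intensity, gradient = frame_statistics(film)
    mask = (mean_intensity >= lower) & (mean_intensity < upper)
    if phase == "heating":
        mask &= gradient > 0
    elif phase == "cooling":
        mask &= gradient < 0
    elif phase is not None:
        raise ValueError("phase must be None, 'heating' or 'cooling'")
    return np.flatnonzero(mask)
```

A filtered dataset is then obtained as `film[select_frames(film, lower, upper)]`, and several such index sets can be concatenated to form a combined dataset.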

C. Data augmentation
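It is well known in data science that data augmentation techniques such as scaling, rotation and flipping can yield a better data basis for the application of CNNs. We first filter the data to obtain a set $F \in \mathbb{R}^{N_{\mathrm{filt}}}$ and then augment $F$, yielding a new set $F_{\mathrm{aug}} \in \mathbb{R}^{N_{\mathrm{aug}}}$:

$$F_{\mathrm{aug}} = \left( M_1(F), \dots, M_k(F), C_1(F), \dots, C_l(F) \right)$$

where $M_1, \dots, M_k : \mathbb{R}^{N_{\mathrm{filt}}} \to \mathbb{R}^{N_{\mathrm{aug},k}}$ are coordinate transformations and $C_1, \dots, C_l : \mathbb{R}^{N_{\mathrm{filt}}} \to \mathbb{R}^{N_{\mathrm{aug},l}}$ represent color transformations which change the intensity values of a pixel within a film. More specifically, in this work we use $k = l = 3$ by employing the following data augmentation techniques: horizontal flipping, vertical flipping and rotation by a random angle as positional transformations, and random changes of brightness, contrast saturation and PCA color as color transformations (see the evaluation in Sec. IV).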
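A minimal sketch of such positional and color transformations is given below. It is not the augmentation code used for our experiments: the nearest-neighbour rotation, the value ranges for brightness and contrast, and the assumption that frames are scaled to [0, 1] are choices made for this example, and the PCA color transformation is omitted for brevity.

```python
import numpy as np


def rotate(frame, angle_deg):
    """Rotate a frame about its centre (nearest neighbour, same shape).

    Pixels that map outside the original frame are filled with zeros.
    """
    h, w = frame.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: source coordinates for every output pixel.
    ys = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    xs = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    yi = np.rint(ys).astype(int)
    xi = np.rint(xs).astype(int)
    valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.zeros_like(frame)
    out[valid] = frame[yi[valid], xi[valid]]
    return out


def adjust_brightness_contrast(frame, brightness, contrast):
    """Scale contrast about the frame mean, shift brightness, clip to [0, 1]."""
    mean = frame.mean()
    return np.clip((frame - mean) * contrast + mean + brightness, 0.0, 1.0)


def augment(frame, rng):
    """Return three positional variants and one color variant of a frame."""
    angle = rng.uniform(-90.0, 90.0)
    positional = [np.fliplr(frame), np.flipud(frame), rotate(frame, angle)]
    color = [adjust_brightness_contrast(frame,
                                        brightness=rng.uniform(-0.1, 0.1),
                                        contrast=rng.uniform(0.8, 1.2))]
    return positional + color
```

Calling `augment(frame, np.random.default_rng(0))` on every frame of a filtered set yields the enlarged set; all variants keep the shape of the input frame, so they can be fed to the same network input.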

D. Data Labeling
Three classes are considered for classification: good, bad and medium. Fig. 3 (a) describes the labeling benchmark on which the data labeling is based. This benchmark was created with destructive testing using the standardized chisel test [25], i.e. by destroying the specimen and having a human expert inspect the welding quality visually (see Fig. 3 (b)). As a result, each image in our dataset carries a label stating whether it has good (standard spot weld diameter), bad (stick weld, i.e. no or only minimal actual spot weld) or medium (undersized weld nugget leading to a weak mechanical joint) welding quality.

E. Neural Network Design
The data engineering steps previously discussed enable us to generate a feasible dataset with evident features for a convolutional neural network to robustly assess the welding quality. An important aspect of our data is that the frames starting approximately from frame 100 (after the cooling down phase) become similar to each other and are not distinguishable. Since the areas of interest only contain 7-12 frames, we do not expect a long short-term memory (LSTM) based approach, which considers the temporal dependency, to deliver the desired results. Recurrent neural networks were therefore not considered, because of the dominance of similar-looking frames, which comprise nearly 80 percent of the film. Furthermore, the dynamics of our dataset are too low, with only small changes visible between the frames. However, we also observe that, especially for the relevant areas like the maximum intensity area, the features become evident for each class. Based on these considerations, a Faster-RCNN based 2D convolutional network is employed, which analyzes one dedicated frame of the film to make the prediction. Our architecture is based on the original Faster-RCNN [26] with a modified input to match our thermal data and a ResNet-101 as backbone network. The architecture is visualized in Fig. 5. The input image has three channels and is of size 131 × 146 × 3. The backbone network generates feature maps, which are passed through the region proposal network. Subsequently, for each region proposal, a bounding box regressor and a softmax classifier are applied to detect and locate the defects.

IV. RESULTS AND DISCUSSION

A. Filters Evaluation
We trained the CNN with different datasets generated by applying the filters introduced in section III. Furthermore, the positional data augmentation techniques $M_1, M_2, M_3$ defined in Sec. III were applied: the images were horizontally and vertically flipped and rotated by a random angle between -90 and 90 degrees. Since the heat diffusion is point-symmetric, these positional changes do not affect the original information of the frame. Fig. 6 illustrates the accuracies for the different datasets, each representing an intensity area. We used the mean Average Precision (mAP) as evaluation metric, which indicates the classification probability of a correct result for a bounding box overlap of 50 percent with the ground-truth label (intersection over union = 0.5). The average of the accuracies for all three classes was calculated.

Fig. 6: Accuracies of different intensity areas. The red curve is the average intensity curve of all films on which the filters are defined. The bars represent the corresponding models' accuracies. Depending on the test dataset, the accuracies vary due to the more evident features of specific areas.

The highest accuracy is observed when using frames at large intensity values for training. However, using the same frames within smaller chunks of data results in a significantly decreased performance. On that account, the effect of a combined dataset is explored by combining multiple filters as well as by using the whole dataset for training. An evident accuracy boost can be observed when using the filtered dataset $F_{10}$ with images from frames of maximum intensity. However, it is noticeable that the smaller datasets from within the same area of intensity ($F_2$, $F_3$, $F_4$) result in significantly worse accuracies than the combined dataset ($F_{10}$). This observation is also evident when combining the datasets of frames 100 to 250, when the specimen is in its cool-down stage. The results, although poor with only 30-40 percent accuracy, gain a small boost to 42 percent accuracy when the datasets are combined. However, since the specimen at the end of a film has already cooled down completely, the visual differences between frames vanish. Therefore, these frames should not be considered when training the CNN, as the similarity of the training data affects the performance of the CNN in a negative way. The results are thus in line with our theoretical statements from section II.

Overall, we could improve accuracy by 6 percent when specifying filters which consider frames from the maximum intensity area of the film ($F_{10}$ compared to $F_{11}$). Interestingly, using the whole film does not decrease the accuracy significantly. As expected, areas at the end of the film lead to imprecise results with an accuracy of 40 percent. Fig. 7 showcases the detections of the two best performing models, resulting from the datasets $F_{10}$ and $F_{11}$. While most test films could be classified correctly, there are some cases in which the $F_{11}$ model gives a wrong prediction while the $F_{10}$ model classifies the film correctly.
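The overlap criterion behind the mAP metric used above can be sketched as follows. This is a generic illustration of the intersection over union (IoU) with a threshold of 0.5, not the evaluation code used for our results; the box format (x_min, y_min, x_max, y_max) and the helper names are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero if the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, threshold=0.5):
    """A detection counts as correct if the class matches and IoU >= threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold
```

For example, a predicted box (0, 0, 10, 10) and a ground-truth box (5, 0, 15, 10) overlap on half of each box, which gives an IoU of 50 / 150 = 1/3 and therefore does not count as a correct detection at a threshold of 0.5.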

B. Data Augmentation Evaluation
To evaluate the impact of data augmentation methods, we applied different positional as well as color augmentations as defined in Sec. III. The results are depicted in Fig. 8. For the color augmentation, the brightness, contrast saturation and PCA color were each changed by a random value. The accuracy could be improved when using the positional augmentations compared to the dataset without augmentation. This is more evident in the datasets $F_{10}$ and $F_{11}$. Since the heat diffusion is point-symmetric, positional changes like rotation or flipping do not falsify the information inside the images. As expected, color augmentations affect the accuracy in a negative way for the datasets $F_{10}$ and $F_{11}$. Notably, the effect was not as pronounced as assumed. It is most evident in the area of maximum intensity, where the color augmentation decreased the accuracy. The area at the beginning of the cooling phase experiences a performance increase even when using color augmentation. This indicates a potential benefit of color augmentation at later stages of the cooling phase, where the intensity values are similar. The observed decrease in accuracy at stages where the intensity value is high is due to the more evident spatial differences between frames, which a color augmentation would only disturb.

C. Comparison with other CNN architectures
The dataset generated when applying $F_{10}$ has resulted in robust performance. Based on this, we evaluated two additional network architectures, namely RetinaNet and Cascade-RCNN. Furthermore, we evaluated the classification accuracies for the three different classes 'good', 'medium' and 'bad'. RetinaNet contains an additional focal loss function [27], while Cascade-RCNN employs an additional network as cascade layer [28]. Table II lists relevant metrics of our training for all approaches. Fig. 9 to 11 illustrate the predictions. The accuracy and error rate metrics indicate how many of the predictions were correct and wrong, respectively, with over 90 percent precision. For the average precision metric, we averaged the values of all correct predictions for the different classes. Each bounding box gives a likelihood of the class being predicted, e.g. a value of 0.94 denotes that the probability of the class is 94 percent. Faster-RCNN and Cascade-RCNN achieve the highest average precision. Especially for the classes 'good' and 'bad', over 95 and 94 percent are achieved, respectively. The accuracy and error rates indicate a stable and reliable prediction for all models. Cascade-RCNN achieves the best results, with 97 percent accuracy for the class 'good' and 93 percent for the class 'bad'. As expected, the performance is worse for the prediction of the class 'medium'. Here, Faster-RCNN achieves an accuracy of 63 percent, while Cascade-RCNN achieves an accuracy of 65 percent. The lower accuracy is due to several reasons: films of the class 'medium' were represented the least, with a share of only 17 percent. Thus, the imbalance between the classes 'bad' and 'good' on the one hand and 'medium' on the other leads to a poor performance. Furthermore, the class 'medium' is generally hard to distinguish visually from the classes 'good' and 'bad'. Remarkably, RetinaNet performs worst in all metrics.
This could be attributed to the fact that the welding spots are generally hard to detect and classify because, even with preprocessing, features are blurry due to the unordered heat distribution throughout the whole image. This makes it hard for classifiers to spot relevant regions. The effect is amplified by the fact that one-stage detectors rely on a single step rather than incorporating an additional region proposal network. Furthermore, large objects are known to cause difficulties for one-stage detectors. In our case, the object comprises almost 80 percent of the whole image, which might be another reason why RetinaNet performed worse.

V. CONCLUSION
Classifying the quality of spot welds is a tedious process in industry due to the lack of reliable and robust non-destructive inspection methods. Common approaches analyze welds using hand-engineered features. Neural networks bear the potential to automate the process and learn relevant features to assess the quality. In this work, we have explored the effect of thermal dataset preparation to generate feasible training datasets for CNNs. To this end, we took the underlying physical foundations into account and analyzed the intensity values of spot-welded joints after pulsed laser thermography. Based on these observations, we proposed data filters and explored their effect on the performance of the CNN. Overall, we could achieve an accuracy of 95 percent in classifying the quality of welds, which motivates replacing destructive testing methods. Our approach utilizes data generated with laser thermography, which is a cheaper alternative and can be easily applied in-situ, contrary to X-Ray approaches. Additionally, it allows a non-contact inspection, in contrast to conventional ultrasonic approaches. We demonstrated an enhancement of 6 percent when applying our defined data filters, which are based on the maximum intensity area of the film. An important aspect is that smaller data chunks are not sufficient, even with data augmentation, to deliver robust results, and a dataset covering multiple frames should be preferred. We also demonstrated the effect of different augmentation methods on different areas along the intensity curve. Color augmentation is especially useful for the cooling stage, when the data is similar, while positional augmentation like rotation and flipping can boost accuracy at the earlier stages. Further steps include the modification and optimization of the neural network models with physics-based optimizers to detect more complex anomalies, especially in thermal images. Additionally, we aim to perform the detection in the frequency domain, which could potentially deliver enhanced results in terms of computational performance and accuracy.