Laser Welding Quality Monitoring via Graph Support Vector Machine With Data Adaptive Kernel

Laser welding is a rapidly developing technology that is of utmost importance in a number of industrial processes. The physics of the process has been investigated over the past 50 years and is mostly well understood. Nevertheless, online laser-quality monitoring remains an open issue until today due to its dynamic complexity. This paper is a supplement to existing approaches in the field of in situ and real-time laser-quality monitoring that presents a novel combination of state-of-the-art sensors and machine learning for data processing. The investigations were carried out using laser welding of titanium workpieces. The quality was estimated a posteriori by the visual inspection of cross-sections of the welded joints. Four quality categories were defined to cover the two main laser welding regimes: conduction and keyhole. The signals from the laser back reflection and optical and acoustic emissions were recorded during the laser welding process and were decomposed with the $M$ -band wavelets. The relative energies of narrow frequency bands were taken as descriptive features. The correlation of the extracted features with the laser welding quality was carried out using the Laplacian graph support vector machine classifier. Also, an adaptive kernel for the classifier was developed to improve the analysis of the distributions of the complex features and was constructed from Gaussian mixtures. The presented laser welding setup and the developed adaptive kernel algorithm were able to classify the quality for every $2~\mu \text{m}$ of the welded joint with an accuracy ranged between 85.9% and 99.9%. Finally, the results of the developed adaptive kernel were compared with state-of-the-art machine learning methods.


I. INTRODUCTION
Laser welding is a long-standing competitor to the traditional arc welding process due to a deeper welding penetration depth that improves the mechanical properties of the welded joints [1], [2]. This method has several additional advantages, such as lower power consumption, narrower heat affected zone (HAZ) and higher welding velocity [2]. Unfortunately, the laser welding is not exempt from defects, in particular pores and cracks. Both types of defects may take place during the two major regimes of laser welding; conduction and keyhole [2]- [4]. The latter regime is characterized by the presence of a narrow capillary of overheated vapour in the liquid phase. The benefits of keyhole welding are in a better The associate editor coordinating the review of this manuscript and approving it for publication was Chao Tong. material fusion between the workpieces [2]- [4], although, it is more prone to defects due to the high instability of the vapour channel at high penetration depth [5], [6]. Hence, an in situ and real-time method for quality monitoring is of utmost value [7].
At present, no reliable methods exist to detect pores and cracks in situ and in real-time. In practical applications, the quality control is performed post mortem. This is usually based on the analysis of the porosity, spatter, undercut and root sagging in cross-sections of the welded joints [2], [4], [7], [8]. Obviously, such method is destructive and time consuming and so it is not suitable for industrial mass production. Non-destructive alternatives also include post mortem analysis using active methods and the main ones are: X-ray tomography [9], ultrasonic testing and magnetic induced analysis (MIA) [10]. The use of X-ray tomography is strongly limited due to the large scale and expensive installations. In addition, the computational complexity of the data processing is not adapted yet to operate in realtime. Ultrasonic testing is an active method, which detects the response of a workpiece to the injected ultrasonic waves [10]. The drawback of this method is in the spatial resolution which is dependent on the wavelength, thus missing defects below a certain size. Additionally, it is sensitive to defects orientation. The adaption of this method to in situ monitoring is technically restricted due to the presence of intense white noises, produced by the process zone. MIA is based on the measurements of the local magnetic inductive reactance of the workpiece [10]. Currently, several technical problems exist, which do not allow adapting this approach to in situ monitoring. They are: low penetration depth, high sensitivity to the defects orientation, weak response from the melted phases, which are present in real life processes.
The challenge in the development of a reliable and cost effective in situ and real-time quality monitoring system is the intricate dependency between the process parameters of the laser welding process, the material and the actual quality. This has not yet been clarified [7], [11]. Nevertheless, a number of novel approaches for passive quality monitoring are seen in literature. Those aim to bypass the disadvantages of active methods, exploiting different physical parameters and are briefly summarized.
Optical emission (OE) from the process zone during laser welding is detectable in the visible and infrared (IR) spectral ranges. In both ranges, the intensity is mainly a characteristic of the heating dynamics [7], [12]. A correlation between the OE signals and the temperature has been under intensive research in the last decades [1]- [5], [8], [11], resulting in the development of a number of heat propagation models [3], [5], [13]. In real life, the applicability of such online modeling has practical limits. The reason is that direct heat measurements are accessible only from the surface of the workpiece [12], while the heat flow in depth is based on multiple model assumptions. The non-uniformities of the material and the instabilities in the local laser-matter interactions bring to the deviations between the real dynamics and estimates, thus, resulting in models inaccuracies. Additional spectral components in OE are present due to the relaxation of the excited atoms, emitted from the process zone and are observed as distinct lines in the OE spectra. These characteristics can act as an indicator to the weld penetration depth [5], [7], [14], although no correlations with the laser quality exist, in particular for porosity and sub-surface cracks.
Back reflection (BR) of an incident beam irradiation, such as laser, is distorted by three main physical phenomena [3], [7], [15]. First, back (Rayleigh) scattering takes place within a keyhole [3]. Next, Fresnel reflections take place at the interface keyhole/melt-air [3]. Finally, the reflection of the light from the melt pool is distorted due to the presence of surface waves with wavelengths in the micron range [16]. The exploitation of the BR together with the AE for quality monitoring was reported by Nakamura and Ito [17], where the dependence of both parameters from penetration depth was revealed. Some other approaches can be found in the literature and the reader can be referred to the following comprehensive reviews [3], [7], [15]. Despite some positive results in specific applications when using only BR, the existing methods lack of precision when operating in real life conditions. This is mainly due to the signal processing methods applied. Furthermore, BR has one additional natural limit since the signal detected comes only from the surface due to distortion of the melt pool and so cannot fully reveal the undersurface dynamics.
The OE and BR signals are recorded using photodiodes with selective spectral sensitivity. The advantages of those are in simplicity, low cost and wide commercial accessibility of many models [3], [7], [14], [15].
In recent years, passive acoustic emission (AE) methods have been involved for quality monitoring of several industrial processes [18]. In laser welding process, AE is an elastic stress waves generated by the alternating pressure of the melted material on the solid surrounding that happens due to the temperature fluctuations and mass transfer inside the melt pool [19]- [23]. Due to this nature, AE captures the volumetric behavior of the matter inside the medium as compared to surface behavior of the matter grabbed by the OE and BR. The potential of AE was shown while detecting single laser welding events in several studies. A successful attempt to link the AE signals with the temperature was made by Li [22] to characterize different laser-matter interaction processes, including laser welding. A correlation between the AE and OE signals from the laser plumes was demonstrated by Farson et al. [20]. Farson et al. [24] also showed the dependency of the AE intensity with the penetration depth. Zeng et al. [21] and Sun, Jr., et al. [25] extracted the AE signatures for the porosity as well as the keyhole. They showed the possibility to detect those AE patterns during the laser welding process. Additionally, the AE signal during the laser-induced plasma formation was used by Conesa et al. [23] for monitoring the laser melting process. Despite all the progresses made to improve the quality monitoring of laser welding, it is insufficient to be implemented in industrial applications. Therefore, to obtain an efficient and cost effective in situ and real-time quality monitoring for laser welding, further developments in detecting multiple laser welding events within a single framework is required [10].
To summarize, the mutual dependency of the different physical measurable parameters is reported in parallel studies [3], [7], [14], [15], [22], [25]. However, due to the extreme complexity of the laser-material interaction, no physical description exists that links all those parameters within a single model. Under such circumstances, machine learning (ML) allows developing correlation models rather than complex physical ones, unifying versatile physical parameters within a single framework. Despite a large success of ML in recent years, its applicability to laser welding quality control is observed in few publications.

VOLUME 7, 2019
Hader [26] classified the images of the process zone with a polynomial classifier and estimated the welds penetration depth. Luo and Shin [27] evaluated successfully the geometry of the keyhole by processing images of the surface of the process zone using radial basis neural network. Jäger et al. [28] involved a probabilistic approach to detect defects from a sequence of images of the process zone. In the work of Deyong et al. [29], support vector machine (SVM) and feedforward neural network (FNN) were applied to detect defects using OE. Wasmer et al. [30] and Shevchik et al. [31], [32] successfully applied convolutional neural network (CNN) and spectral convolutional neural network (SCNN), respectively, to differentiate AE features of dissimilar quality in additive manufacturing -a process with many similarities to laser welding. Le-Quang et al. [33] used high speed X-ray imaging to show the real challenge to monitor the keyhole dynamic, in particular the transition stable-unstable keyhole but even more important the creation and disappearance of pores during the process. Wasmer et al. [34] applied a specialized gradient boost to differentiate AE from stable/unstable keyholes and spatter. All aforementioned approaches demonstrated the feasibility of developing ML based universal tools for in situ and real-time quality monitoring. A special interest of these works is to detect single defects, such as pores and cracks [25]- [34].
The present work is a supplement to existing studies in in situ and real-time quality monitoring of laser processes. To achieve this goal, we extended further the ML approach and focused on three characteristics of the laser welding quality monitoring. Those are: the classification accuracy; the temporal resolution (e.g. minimum time span needed for accurate classification); and simultaneous exploitation of the OE, BR and AE sensors. The classification of the collected signals was carried out by Laplacian support vector machine (LapSVM) [35]; a state-of-the-art in classification performance [36]. The attractiveness of this technique is in the possibility to involve differential geometry that precisely adapts the algorithm to given data [35]. Besides, the algorithm supports a semi-supervised/ unsupervised learning that reduces the data preparation costs; an important aspect for industrial applications.
This article includes six sections. Section II contains the description of the setup for the laser welding process and data acquisition. Section III describes the ground truth estimation of the welding quality and defines the main quality categories. Section IV describes the features employed, while Section V is a description of the features processing algorithm. Section VI presents and discusses the main results of this work, followed by the conclusions.

A. EXPERIMENTAL SETUP
The laser welding experiments were carried out using a unique setup. This setup included: a laser source, an optical head, a welding chamber, a sample holder mounted on a moving stage, an AE sensor. The general schematic is presented in Fig. 1(a) whereas Fig. 1(b) is a picture of the setup. The laser source was a fiber laser StarFiber150 P 1 (Coherent Switzerland, Switzerland) with a wavelength of 1070 nm. The laser beam was guided to a customized optical head from Coherent Switzerland (Coherent Switzerland AG, Switzerland) through a single-mode optical fiber with a 12 µm diameter. The 170 mm focal lens provided the focus of the laser beam on the workpiece surface into a spot diameter of 30 µm (within 2w 0 ).
The laser operated in continuous mode with power varied in the range 25-250W. This range was established in previous works for the same materials and conditions [14], [16], providing repeatable quality of conduction and keyhole welding regimes.
In addition, to prevent the potential weld contamination with oxidation, the experiments were carried out in a hermetic chamber with controlled environment. In our setup, the chamber was filled with argon at atmospheric pressure. The laser was transported inside the chamber by means of optically transparent window in the chamber wall.
Finally, the samples were mounted on a moving stage M-663.5U (Physik Instrumente, Germany) providing the workpieces translation while the laser beam position was kept fixed. The stage translations were mono-directional along a distance of 5.5 mm at a velocity of 100 mm/s.

B. SENSORS AND DATA ACQUISITION
The optical head was additionally equipped with a beam splitter that directed the OE and BR signals to different sensors. Three different optical sensors with different waveband sensitivity were used and they were: silicon (Si), germanium (Ge) and InGaAs. The Si and InGaAs sensors provided detection of OE in different spectral ranges. The Si sensor had the sensitivity in the range 450 -850 nm, whereas it was 1250-1700 nm for the InGaAs sensor. In contrast, the Ge sensor had an original sensitivity in the range of 1000-1200 nm. The narrow band pass optical filter provided only a selective detection of the BR of the laser (1070 nm). The spectral ranges for the optical sensors are schematically represented in Fig. 1(c). The location of each sensor inside the setup is encoded with the same color as in Fig. 1(a).
Additionally to the optical sensors, an AE sensor was installed inside the welding chamber as shown in Fig. 1(a). The contact sensor PICO (Physical Acoustics, USA), with high sensitivity within the range 500 -1850 kHz, was clamped on the sample. The AE and optical sensors were connected to a data acquisition unit from Vallen (Vallen Gmbh, Germany) with an acquisition rate of 10 MHz. The acquisition of all sensors was triggered by the BR signal. The signal of each sensor consists of a pre-trigger acquired 4 ms followed by 50 ms of the laser process and another 40 ms after the process has ended.

C. MATERIAL
In this study, we performed laser welds on 2 mm thick and flat workpieces made of titanium-aluminum alloy (Ti6Al4V, grade 5) having a melting temperature at 1,650 • C. The choice of the material was stipulated by its wide practical usage in industry and the easiness in microstructure identification. The latter means significant textural changes of the HAZ that can be easily observed visually in the cross-sections.

D. CATEGORIZATION OF THE QUALITY GROUND TRUTH
The ground truth of the welds quality was characterized post mortem, by visual inspection of cross-sections with a light microscope of each welded workpiece. The categorization of the laser welding quality considered two factors that affect the mechanical properties of a welded joint, which are: the penetration depth and the presence of porosity. Those factors are provided by two major welding regimes: conduction and keyhole welding. Hence, we defined four categories: (1) no illumination; (2) conduction welding; (3) keyhole without porosity and (4) keyhole with porosity. The first category represents an absence of laser irradiation; the laser power is P = 0. The second category is characterized by a low intensity of the laser power leading to conduction (shallow) welding mode. The third and fourth categories are for keyhole (deep) welding mode. The distinction between conduction and keyhole regimes is important since, in real life applications, both may be present within a single continuous weld. As already mentioned, keyhole welding is known to be prone to defects, especially porosity. Thus, we decided to separate the categories keyhole with and without porosity. More details about the categorization are given in Section V.A.

III. FEATURES EXTRACTION USING M-BAND WAVELETS
In this work, the features extraction was carried out in the time-frequency domain applying M -band wavelets. M -band wavelets [37] are an extension of the traditional wavelet transforms [38] that provide a more detailed tiling of the time-frequency space and more rare artifacts due to the shiftinvariance [37]. Additional advantage of wavelets is in their close relation with the finite impulse response filters (FIRs) that allow de-noising the inputs and potentially allow adapting the processing to the given signals [37]. The efficiency of wavelets was demonstrated by Shevchik et al. [31], [32] for additive manufacturing -a close process to laser welding. Besides, the extraction of the data in the time-frequency domain using wavelets may help in the physical interpretation of the observed phenomena. The decomposition using M -band wavelets is expressed as [37]: where ϕ() and ψ() are the approximation and details coefficients, respectively, j is the current decomposition level, n is the current sampling point of the digitized signal S and h 0 , h M −2 ,h M −1 are the low pass, narrow pass and high pass filters, respectively. At each scale, the filtering from Eqs (1) and (2) results in the extraction of the low, narrow and high frequency contents of the signal. Each band is described by a set of decomposition coefficients. In this contribution, the normalized energy of the frequency bands was taken as descriptive features, which allow monitoring the energy redistribution within the frequency bands independently from VOLUME 7, 2019 the absolute levels of the signals. It is defined as: where In Eq. (4), d are the products from Eqs (1) and (2). The term E j in Eq. (5) is the total energy accumulated in all frequency bands at scale j. All details of this technique can be found in [37], [38].

IV. FEATURES PROCESSING A. LAPLACIAN GRAPH SUPPORT VECTOR MACHINE WITH DATA ADAPTIVE KERNEL
Laplacian graph support vector machine (LapSVM) was selected due to its outstanding efficiency in resolving complex features configurations [35], [36]. LapSVM is an extension of traditional SVM [39], hence, the algorithm searches for a decision cut that separates the features and a kernel trick can be employed to non-linear cases [39]. The novelty of LapSVM is in the introduction of a graph that captures the mutual geometry of the features. This geometry from labeled features can be further extended to unlabeled ones, thus extending this method to a semi-supervised or even unsupervised training [35], [36]. Such situations are of a high value for industrial usage. In this investigation, the high efficiency of LapSVM was additionally enhanced by involving an adaptive kernel that improves the efficiency and gives advantages for further tuning to adapt to the complex features configurations. An example of configurations that was addressed can be seen in Fig. 2(a), where synthetic data represents a mixture of two feature sets (categories), marked with red and blue. This is a co-dimensional manifold intersection that can be seen as a number of local densities of the features from one category, surrounded by the ones from another category. Those local densities with one type of features are hereafter referred to as clusters. The sizes, shapes and mutual positions of the separate clusters are random and are not known a priori. Obviously, this complicates the search for the computation of the optimal cutting hyperplane during the LapSVM training. The performance of the traditional LapSVM on the data in Fig. 2(a), realized by Melacci and Belkin [36] with a single Gaussian kernel, is illustrated in Fig. 2(b), where the decision cut of feature space is the surfaces highlighted in green. As seen from Fig. 2(b), the separation of the feature space includes some inaccuracies since the green areas incorporate features from both categories.
In order to increase the accuracy, we introduced an adaptive kernel, expecting more accurate performance for the features configuration in Fig. 2(a).
In a binary case, the features sets are given as , where x i ⊂ x are the features, l and u are the number of labeled/unlabeled features, and y i, l+u i=1 are the category labels, where y i = ±1 is assigned for labeled features whereas y = 0 for the unlabeled ones [35]. The decision cut is defined as a functional [35]: where b is a bias, and K(.,.) is a kernel that defines the high dimensional remapping rule. The adaptive kernel K(.,.) can be built as a linear combination of basis Gaussians G(.,.) [40]: where d j are the weights of the individual Gaussians and ν is the total number of Gaussians involved. In this work, the L 2 norm was used in training to ease the optimization process, while the weights of the basic Gaussians in Eq. (7) fulfilled the condition: The choice for Gaussians as basic functions in Eq. (7) was stipulated by the welldeveloped methodology for both, kernel trick and clustering using those [41], [42]. The present contribution took advantage from both by combining them within a single framework. The individual basic kernel Gaussians G j () are defined as: where µ is a mean vector of size [1xd] and d is the dimensionality of the feature space, σ is a positive definite symmetric matrix of size [dxd] that shapes the Gaussian in d-dimensions.
In this matrix, the diagonal elements of σ are the variances whereas the non-diagonals ones are the Gaussian rotation in a specific dimension. As can be observed from Eq. (7), the kernel K(.,.) is parameterized by a set of parameters d j , µ i , σ i ,ν, which are unknown. The objectives are to tune those to provide an optimum decision cut from Eq. (6). In this study, the kernel parameterization was carried out in two stages. First, a soft clustering of the data in the original feature space was performed with estimates of µ i , σ i , and ν (see Section IV.B). Second, an optimization of the Gaussians weights d j of the kernel in Eq. (7) is performed (see Section IV.C). The intermediate steps on the synthetic dataset are shown in Fig. 2(a)-(e).

B. FEATURES CLUSTERING
To start with, the clustering of the original feature space was performed and involved only the labeled features. The aim of the clustering was to estimate a number of separate clusters ν and the variances σ for the individual basic Gaussians (see Eqs (7) and (8)). The algorithm switches between the two processing stages, which are: i) graph cuts [41] and ii) Gaussian fitting using expectation minimization with posterior constraints [42]. In our case, the former defines the number of existing clusters in the given data, while the latter fits the Gaussians to the defined clusters with respect to their spatial configuration.
The normalized graph cut [41] was applied to the labeled features. The method is based on construction of a fully connected graph [35]. First, the adjustancy matrix W ⊂ R NxN , is computed, in which the elements w ij ⊂ [0,1] define the similarities between the two features x i and x j . The heat based kernel was applied in this work, defined as: where . is an Euclidian norm [35].
The weights w ij between the labeled features from the different categories were set to zero. In graph representation, the features within the same cluster have a higher connectivity with their neighbors as compared to the rest of the graph or even disjoint from it. This geometry is encoded in the weights W , where the higher values of w ij indicate a closer neighborhood within the same cluster s , s = 1, . . .,ν. The separate normalized clusters are defined as [41]: , where the minimum values of N cut correspond to the cluster borders of the volume vol { i } = i,j⊂ s w ij . Further details for this method can be found in the original work of Shi and Malik [41]. The number of found clusters parameterizes the number ν of basic Gaussians in Eq. (7).
The expectation minimization (EM) is applied further to fit a Gaussian from Eq. (8) to each cluster, defined with graph cuts. EM is a probabilistic approach, in which each Gaussian weights each feature of the given dataset. The tuning of Gaussians parameters µ i , σ i with EM aims to provide higher weights for the features from one category, as compared to the ones from another [42]. The distribution of the features weights (DW) among all Gaussians from Eq. (8) is defined as [42]: where p(x|µ i , σ i ) are the weights of the features in a particular Gaussian i. Using the L 2 norm optimization, the weights in Eq. (9) fulfilled the condition: d i = 1 (in accordance with the description of Eq. (7)). In this contribution, the features from each cluster, defined from clustering procedure are weighted with an associated Gaussian, which is optimized to provide their higher weights p j (x j |µ i , σ i ) as compared to the surrounded features from the other category [42]. Additionally, we used posterior constraints on Gaussian fitting to keep the certain margin between the weights p j (x|µ i , σ i ) of the features from the different categories [42]. The tuning of parameters µ i , σ i , s i , of each Gaussian under constraints is made by introducing hidden variables and observing those for estimating the optimum [42]. We decided to take the hidden variables based on results from Belkin and Niyogi [43], and the optimization from Archambeau and Verleysen [42]. According to Belkin and Niyogi [43], the link between the hidden and real variables can be made using a Laplacian graph where the optimum distribution is defined as [43]: The search of an optimum in Eq. (10) is an iteration process with two stages. First, the expectation is computed according VOLUME 7, 2019 to [42]: Further, the minimization of µ i ,σ i , s i , were updated according to [42]: The convergence of the solution was in the minimization of the objective log-likelihood [42]: Following Eq. (12), the convergence is reached when the sum of the logarithms is minimum, meaning the desired mutual position of the real variables. The solution of the EM from Eqs (9)-(12) was carried out using the Newton iteration method and more details for this procedure can be found in Archambeau and Verleysen [42].
The entire clustering algorithm involves the aforementioned normalized graph cuts, followed by the expectation minimization (EM). The pseudo scheme of this algorithm is shown in Table 1. First, k clusters are defined for only a single category by the graph cuts [41]. If the mutual features configuration is analogous to the one in Fig. 2(a), then, the features of one category are in close neighborhood with the features from another category.
The Gaussian fitting, in this case, is subjected to the EM constraints already described. The Gaussian fits the features from the current cluster by maximizing their weights p x | µ i , σ i as compared to all features from the other category. In some cases, a single Gaussian cannot provide the required weighting for the cluster due to its complex configuration. Under such circumstances, the problematic cluster is further partitioned and the EM is applied again to each defined partition. This is repeated until the weights fulfill the predefined constraints and each iteration increases the number of Gaussians ν.
The outcome of the clustering algorithm is a Gaussian mixture from Eq. (7), with specified parameters: number of Gaussians ν, mean µ i and variance σ i for each individual Gaussian Eq. (8). As already mentioned, µ i and σ i define both the centers and orientations of the Gaussians in the original feature space. The thresholds for the margin in EM were set a priori. The weights d i from Eq. (10) were set to 1/ν and were as well tuned during the LapSVM training. The results of the clustering algorithm performance are shown in Fig. 2(c). The contours define the positions of the basic Gaussians in the original feature space.
As seen from Eq. (7), the d i remained not optimized and their tuning were made during a second optimization stage that is described in the next section.

C. TRAINING OF THE LAPSVM WITH ADAPTIVE KERNEL
The objective of a regular LapSVM training is the computation of unknowns α i and b in the decision cut in Eq. (6) [35]. In the present contribution, the training is extended to specify as well the individual weights of the basic Gaussians d i , defined in Eq. (7). For this reason, the training procedure in adaptive kernel incorporates an additional regularization term that takes d i into consideration. The training of the LapSVM aims to minimize the term [40]: where D = {d j } is an [1xν] array of basic Gaussians weights d j (see Eq. (7)), G is an [νx1] array of the basic Gaussians, A = {α j } with j=1,..,l+u are the coefficients from Eq. (6), λ 1 , λ 2 , and λ 3 are the empirically assigned balances of the weight loss, l and u are the labeled and unlabeled samples, respectively, and L is a Laplacian. We used a non-normalized version of the Laplacian, defined as L = D − W .W is an adjutancy matrix described in Section IV.B, D is a diagonal matrix with elements d ii = i=1,..,l+u w ij . The minimization of Eq. (13) was achieved in primal, according to the work of Guo and Jin [40]. The decision function fulfills the con- . . ,l. A solution can be found by introducing the Lagrangian multipliers in Eq. (11) as described in [40]: The minimization of Eq. (13) using Eq. (14) results in the estimation of all unknowns, namely d j from Eq. (7) and α i and b from Eq. (6). This brings to a complete parameterization of the decision cut in Eq. (6).
Additionally, it must be noted that the structure of the given data can be evaluated by the number of defined clusters ν from Eq. (7). A larger number of clusters indicate a complex mutual configuration. A smaller number of clusters indicate a good separation in the original feature space. In specific situations, where the features can be linearly separated, our algorithm defines a single Gaussian and so to the regular LapSVM.
The original feature set is observable in Fig. 2(a) and the performance of the entire algorithm is in Fig. 2(c)-(e). Figure 2(c) presents the fitted Gaussians over the estimated clusters in the original feature space and Fig. 2(d) the resultant kernel. Figure 2(e) shows the results of the decision cut of the feature spaces, which delimits the original feature space with the surfaces highlighted in green and a white. By comparing Fig. 2(b) and (e), obviously, our approach has enhanced the quality in separating the two feature categories. This is confirmed since none of the red dots (features) lie in the white area and no blue dots (features) are inside the green areas. Finally, our algorithm and experiments were run in Matlab 2012a.

V. RESULTS AND DISCUSSIONS A. DATASET CATEGORIZATION AND LABELING
In this study, 23 laser welds with a total length of 115 mm were produced. The laser power during the different experiments varied between 25 and 250 W keeping the same focused spot of the laser beam at 30 µm in diameter. This laser power range was known from previous works to produce different quality of conduction and keyhole welding regimes [14], [16]. These different welding regimes are illustrated in Fig. 3 where the laser power was linearly increased from 25 to 250W during a 50 ms pulse as shown in Fig. 3(a). The sample was moved at a fixed velocity of 100 mm/s over a length of 5.5 mm, producing a line weld with different qualities as shown in the top view ( Fig. 3(b)) and cross-section (Fig. 3(c)) optical images. In the former, the textural changes of the surface due to the laser process are visible. One of the textural components of the weld area is the solidified waves of the melted material. Those accompany the welding process and are one of the factors for BR distortions (see description of the nature of BR distortions in the introduction).
It is also seen that an increase in laser power results in an increase of the weld width and penetration. The ratio between the penetration depth and weld width, also known as weld aspect ratio (depth/width) is one of the characteristic parameters that differentiate conduction and keyhole welding [1], [3]. An aspect ratio below 0.5 is typical for conduction welding whereas the keyhole channel is typically formed at a ratio greater than 0.5 [33]. Although this information could be considered for quality monitoring, it would be limited to only two categories of conduction and keyhole welding, which is by far not sufficient. The reason is that the aspect ratio does not take into consideration the events taking place within the material. This is evident when comparing the top view ( Fig. 3(b)) and the longitudinal crosssection (Fig. 3(c)). Therefore, the longitudinal cross-sections are taken as references/ground truth for the analyses and defining the different categories.
It is important to note that in Fig. 3(c), the penetration depth can be easily distinguished by the textural changes between the melted zone and the HAZ. In this figure, the pores are easily observed as black spots. Finally, as mentioned in Section II.A, four categories were defined, namely (from left to right) no illumination, conduction welding, keyhole without porosity and keyhole with porosity. Each category is represented by a color code in Fig. 3(c).
The category no illumination represents the absence of laser power. In the present study, it corresponds to the signals, acquired during the pre-trigger stage of the acquisition.
The category conduction welding was achieved with a laser power of 25 W and is mainly characterized by a very small depth and a smooth surface texture of the melt pool ( Fig. 3(b)). Since the depth of the melt pool was below 30 µm in this setup, it could not be clearly observed in the cross-section (Fig. 3(c)). It is known that the change between conduction welding and other laser welding regimes may even occur at the same laser power due to slight changes in local optical properties of the material. This may cause a local weakening of the weld joint.
The category keyhole without porosity was formed with laser power within the range 25 -100 W, which resulted in a penetration depth within the range from 30 to 370 µm (See Fig. 3(c)). This category was characterized by the occurrence of a keyhole with a rather stable geometry without fluctuations and so without pores [33].
The category keyhole with porosity was characterized, obviously, by the presence of pores. The pores creation was achieved at a laser power in the range 100 -250 W, which led to a penetration depth of up to 760 µm. Due to the increase in depth of the keyhole, it becomes unstable, which leads to its partial collapse during the process, trapping the overheated vapor in the melted surrounding. This results in porosity after resolidification of the melt pool [9], [32], [33].
The frontier between unstable and stable keyholes is very sensitive to any changes in the local optical properties of the material and/or laser parameters [5], [6], [14], [16]. The balance of the stable keyhole is kept due to the convection of the material between the different phases inside and around the keyhole, which can be easily broken by any slight conditional changes. Also observed from Figs. 3(b) and 3(c), the transition between both regimes is rather smooth and is not observable on the surface.
We would like to emphasis that the recognition of the four regimes using only a surface monitoring is an intricate task. For example, no visible changes on the surface are identifiable in Fig. 3(b) to distinguish keyhole without porosity and keyhole with porosity. Hence, to avoid errors in the ground truth by missing pores and to have sufficient statistical confidence in the results, five cross-sections were analyzed at various transversal locations of each weld to measure the pores locations also in the third dimension.
The signals from all sensors were put in time spatial correspondence with the cross-sections, similar to ones from Fig. 3(c). As the next step, those signals were labeled according to the categories defined. From these labeled data, two separate datasets were created; a training and a test sets with no shared signals. This approach simulated real life conditions where the trained system has to operate with new input data relying on its previous experience. The structure of the training and tests sets are given in Table 2 in terms of weld length in millimeters [mm]. The classification was performed with a number of separate binary classifiers, following ''one against all'' strategy. This allowed recovering the configuration of the features in the feature space, analyzing the number of clusters for each individual case. The data proportions between the training and tests sets were chosen according to the explanations given in the next section.

B. ESTIMATION OF THE OPTIMAL TIME SPAN FOR ANALYSIS
The analysis of the collected signals was made via a running window that bounded the separate patterns of the collected signals in the time domain in a similar way as in Shevchik et al. [31]. The estimation of the optimal time span was carried out experimentally via an exhaustive search. The reason is that the laser welding process is accompanied by multiple phenomena with different time scales, ranged from 10 µsec to 1 msec [5], [12]. Consequently, the received signals are a mixture of different frequencies produced by those phenomena. Under such circumstances, the optimal time span had to include the frequencies collection, which was distinct for each particular category. The choice of the time span is a compromise between the spatial resolution in detecting of a possible defect and the classification accuracy. A short time leads to a better spatial resolution in defect detection but is more affected by noise, resulting in a lower classification accuracy. In contrast, the longer time spans are detrimental for the defect location but less sensitive to the background noise. At the same time, longer time spans raise a risk of averaging the distinct features from different categories, thus decreasing the accuracy. Hence, the estimation of the optimal time span  was a subject for a sub-study. The investigation was made using the collected dataset (see Table 2, column Total length) that was divided into two sets -for training and tests. The classification was carried out for different time spans of the running window. The overall accuracy for each time span E was computed as: where i is the number of the single experiment and was equal to the number of the defined categories. The total number of tests in Eq. (15) can be evaluated using the weld lengths in Table 2 using the following simple proportion: where the momentary weld length is defined as the length of the weld that was passed during a given span time.
The experimental results of this sub-study are presented in Fig. 4, where the classification accuracy is plotted versus time span (and weld length). From this figure, it is obvious that each sensor has a different optimum. As expected, the classification accuracy decreased significantly when the time span of the running windows becomes too short. The reason is that the distinct frequency content for the defined categories is excluded and/or affected by the background noise. By increasing the time span, the classification accuracy rises until it reaches a maximum. Further increases of the time span results in a slight decrease in the classification accuracy which could be caused by the averaging of the distinct features from several categories. To account for the differences in the optima of the various sensors during the classification step, we decided to select the best time span for each individual type of sensor. In other words, the time spans were chosen from curve in Fig. 4 to provide maximum spatial resolution with acceptable accuracy and applied for all analyses, presented and discussed below. Consequently, the chosen time span for the Ge, Si and InGaAs sensors were 0.2 ms. For the AE sensor, the selected time span was 0.25 ms.

C. THE SIZES OF THE TRAINING AND TEST DATASETS
In industrial applications, the size of a training set results directly in costs (material, machining time and manpower). Therefore, any reduction of the training set is valuable and estimation of the smallest one that guarantees a sufficient classification accuracy is of a high interest. For this reason, the size of the minimum training set was also evaluated experimentally. It was made through a comprehensive search, gradually decreasing the training set and observing the classification accuracy during a series of tests. The experiments were performed using the optimal time spans selected in Section V.B. In Table 2, the training sets were intentionally minimized.

D. CLASSIFICATION RESULTS
The results of classification tests for each individual sensor, as well as all sensors combined are presented in the tables in Fig. 5. In these tables, the first column (numbers in bold) correspond to the classification results with LapSVM with adaptive kernel. The second column (numbers in italic) correspond to regular LapSVM (single Gaussian kernel), the third and fourth columns contains the results of the conventional and temporal CNNs for comparison. Also in these tables, the experimental classification accuracies (classification results) (in rows) versus ground truth (in columns) are given. The classification accuracies are defined as the number of true positives divided by the total number of tests according to Eq. (15). These values are given in the diagonal cells of the tables that are shaded in grey. Following Eq. (15), the classification errors are the ratio of the true negatives divided by the total number of the tests. The corresponding values are in the non-diagonal row cells. For example, for the Ge sensor (BR) (see Fig. 5(a)), the classification results obtained with the LapSVM with adaptive kernel (first column in bold) for the category keyhole without porosity was classified with an accuracy rate of 89.1% (bold red). The errors are due to an overlap with the categories keyhole with porosity with an error rate of 9.8% (bold blue), and conduction welding with an error rate of 1.1% (bold green). For this particular case, there was no overlap with the category conduction welding.
Additionally to multi-kernel version of LapSVM (bold), we give the results of regular LapSVM (italic), conventional convolutional neural network (CNN) [44] and temporal CNN [45]. Both CNN included five convolution operations and the required number of those was estimated in exhaustive experimental search. The further increase of number of those layers did not show any advantages and operated with the same error proportions. We observed that the multi-kernel version of LapSVM algorithm developed in this contribution did not offer tremendous advantages when the penetration depth was shallow (no illumination and conduction welding). But, our algorithm gave better results as the penetration depth increased and evidence of this is in Fig. 5 for the categories keyhole without porosity and keyhole with porosity.
It is important to note from the tables in Fig. 5 that the classification accuracy for the defined welding quality for the multi-kernel LapSVM ranged between 85.1 and 98.8%. This demonstrated the feasibility of our approach which combines state-of-the-art sensors and machine learning for VOLUME 7, 2019 data processing for laser processes. The computational time needed per a measurement was as fast as 70 ms. Taking into account the classification accuracy and the computational time, it proves that our approach has a high potential for in situ and real-time quality monitoring under various laser welding regimes. Additionally, the fact that the classification accuracy was dependent of the sensor type could be explained by the individual sensitivity of each sensor to the specific physical phenomena as is mentioned in the introduction. This aspect requires a proper choice of measurable parameters, while adding newer categories to the database in the future.
Inspection of the tables in Fig.5 indicates the classification accuracy decreases as the penetration depth increases. This observation is valid for all sensors and algorithms. But, for simplicity, the results are discussed mainly on the developed mutli-kernel LapSVM algorithm.
A decrease of the classification accuracy as the penetration depth increases is confirmed by the fact that the category no illumination had the highest accuracies which were equal or above 98.8%. This category was characterized by the absence of the melt pool. Since no intense process takes place during this stage, the corresponding signals were characterized by the absence of high frequency content. The classification errors for this category were only in overlap with the category conduction welding.
The category conduction welding followed next with a higher penetration depth and showed classification accuracies equal or above 89.7%. The main misclassification error was in overlap only with the categories keyhole without/without porosity. This error could occur due to smooth transition between those two categories and evidence of this can be seen in the longitudinal cross-section in Fig. 3(c).
The accuracy drops for the next two categories, keyhole without porosity and keyhole with porosity, which had higher penetration depths. The explanation is related to the complex dynamics of the keyhole, which is reflected in a drastic fluctuation of the recorded signals [33], [34]. This causes an extreme features configuration in the feature space, by analogy to the one in Fig. 2(a). The classification errors for these categories were mainly due to an overlap between each other. It is interesting to note that the lowest classification accuracy, in this study, was 85.9% and it was for the category keyhole with porosity with the InGaAs sensor via surface observations (see Fig. 5(c)). This indicates that the occurrence of pores inside the medium distorted the melt pool surface and that was perceptible in OE signals. Obviously, those distortions are getting weaker as the penetration depth augments.
By comparing the results of the different sensors, it is found, in general, that most sensors have similar classification accuracy. This finding was not expected. The reason is that the optical sensors detect only changes at the surfaces whereas the AE sensors capture the volumetric fluctuations of the material phases inside the melt pool. Consequently, it was thought that the signals acquired by the AE sensors for the categories keyhole without porosity and keyhole with porosity would include more distinct features providing a highest classification accuracy.
The combination of all sensors together did increase the overall accuracy (see Fig. 5(e)). However it is important to note that it may not always be the case. One possible explanation could be in the equal quality of the distinct features inside the collected signals from each individual sensor. In other words, features from a sensor may not be significant and when combined together with other features from the other sensors, the non-significant features impact negatively the classification accuracy. In any case, the authors believe that for implementing the proposed approach in industrial environment, a single sensor will not be considered as reliable enough. Hence, the unification of different sensors is/will be the only solution for in situ and real-time monitoring of complex and dynamic processes. But, the choice of sensors is of utmost importance. This was not studied in details in this work but it is planned as future work.
The analysis of the kernels has to be mentioned as it recovers the complexity of the features configurations for each particular category. It was observed that the number of clusters also increases with the weld penetration depth. Only one cluster was computed for the features from the category no illumination. The category conduction welding already included several clusters. Finally, the categories keyhole without porosity and keyhole with porosity were characterized with more than ten clusters, pointing out to a mixture of features inside the feature space. An analogous configuration in a 2D case was presented in Fig. 2(a). A physical explanation of this behavior may be in the generation of signals in a broader frequency range during the presence of a keyhole leading to more overlaps in these categories. The greater number of clusters in our algorithm supported this observation.
Comparing the different algorithms, CNN operated better than our developed multi-kernel LapSVM algorithm for cases with shallow welding penetration, while it showed lower accuracy in recognition of keyholes with defects that possess the deepest penetration depth. The may be explained by the fact that the amount of training data, as well as its structure were not representative for this framework.
The real life operation with the blind classification of the unknown weld is presented in Fig. 6. This weld was produced with a 50ms laser pulse, similar to the example presented in Fig. 3, but with a different power profile to test the performance of our approach. A film of the classification results of Fig. 6 can be seen at https://www.empa.ch/web/s204/Laser_welding_1. The color code for the classification results in Fig. 6 corresponds to the defined categories. The quality map obtained from the computation contains more errors in the categories keyhole with porosity and keyhole without porosity than the categories conduction welding and no illumination. Additional inspection of Fig. 6 indicates that the classification accuracy is higher for an intense generation of pores. In contrast, the classification errors rate is higher when single pores were produced. Those errors could be explained by the inherited inaccuracies in our labeling method. In our method, the spatial position of the pores in the images of the cross-sections was put in correspondence with the time domain of the signals. Consequently, the time at which the single pore was produced was not known precisely leading to potential mismatches in the training databases. Another source of errors is the absence of information about the pores dynamics. It has been observed that, within a single pulse, a pore can be first created followed by its collapse leaving no visual traces of this event [33], [34]. In such situations, we may have only been able to detect the pore creation but not its collapse leading to classifications inaccuracies.

VI. CONCLUSION AND FUTURE WORK
This work reports the uses of various sensors and a machine learning approach for signal process as a solution for an in situ and real-time quality monitoring of laser welding. This approach combines optical and acoustic sensors with stateof-the-art machine learning technics. A number of welds were made on a 2 mm thick plate of titanium-aluminum alloy (Ti 6 Al 4 V, grade 5). The optical and acoustic signals were recorded simultaneously during the welding experiments. The laser parameters were selected to produce welds of four different qualities: (1) no illumination; (2) conduction welding; (3) keyhole without porosity and (4) keyhole with porosity. The welding quality ground truth was estimated via visual inspection of surfaces images as well as longitudinal cross-sections images. Based on these images, all collected signals were labeled accordingly. The collected signals were decomposed using M -band wavelets and the relative energies of the extracted narrow frequency bands were taken as features. A regular LapSVM classifier and an own developed version with adaptive kernel were used to classify the features from the four quality categories. It was demonstrated that the classification is feasible with the classification accuracy equal or above 85.9%. This proves that our approach has a high potential to be industrially implemented as an in situ and real-time method for quality monitoring of laser welding. It was also found that the classification accuracy decreases as the penetration depth of the weld increases.
The differences in the classification accuracies from the sensor types were not significant. This came as a surprise since the optical sensors detect only changes at the surfaces whereas the AE sensors capture the volumetric fluctuations of the material phases inside the melt pool. However, by combining the different sensors, the classification accuracy could be significantly improved even for the CNN for the categories keyhole without porosity and keyhole with porosity and this demonstrated the for industrial application, several sensors will be required to increases the confidences in this method. It is important to note that combining sensors together does not always increase the classification accuracy and the choice of sensors is of utmost importance.
The analysis of the feature space for each category also revealed that the complexity in the mutual configuration of the features rises as the penetration depth of the weld increases.
As already mentioned, our innovative approach has high potential for being implemented in industrial application as an in situ and real-time monitoring quality method for laser welding. There are two strong arguments supporting our statement. First, the realization of the trained classifier is technically cheap. Second, the computational speed is fast enough (70 ms) to be built in an online monitoring system (the computation is only a matrix multiplication operation). Despite the high classification accuracy already achieved, four additional investigations are planned for future improve-ments of the system performance. First, an expansion of the quality categories is needed to investigate the applicability of the algorithm to other welding events, such as plasma plume, spattering, cracking, pores collapse, etc. Second, a control of the penetration depth using AE is also important for having a closed feedback loops in quality control. Additional improvement of sensors can be done in terms of sensitivity. As a final stage, the adaptive kernel LapSVM is needed to be tested with other materials to determine: (a) if the classification accuracy is still high and (b) if the training dataset from one material is transferable to other materials. All this is planned as future investigation.