DLAM: Deep Learning Based Real-Time Porosity Prediction for Additive Manufacturing Using Thermal Images of the Melt Pool

This paper presents an investigation of the rapid variations in the temperature of the metal melt pool in Additive Manufacturing (AM) processes. The melt pool is created by scanning a high-power laser beam across a metal powder bed. Rapid heating and cooling processes are involved in the layer-by-layer fabrication of the metal part. Recent advances in Machine Learning and Deep Learning algorithms provide efficient ways to analyze large sets of data in search of correlations that would otherwise be extremely time-consuming to find. Using Machine Learning and Deep Learning algorithms to understand temperature variations in the AM fabrication process makes it possible to predict the formation of porosity before it occurs. The objective of this research is to advance AM technology using enhanced Deep Learning techniques that provide in-situ analysis of the melt pool temperature, which can lead to reliable manufacturing of Three-Dimensional (3D) metal parts/components. Specifically, Deep Learning based porosity prediction for Additive Manufacturing (DLAM) methods are proposed. In DLAMs, several state-of-the-art Deep Learning algorithms, such as Convolutional Neural Networks (CNNs) using transfer learning and Residual-Recurrent Convolutional Neural Networks (Res-RCNNs), are proposed for effectively performing end-to-end porosity prediction in real-time using thermal images of the melt pool. Experimental results in this research show that the Res-RCNN has an overall accuracy of 99.49% and an inference time of 8.67 ms, outperforming the other baseline models. The Res-RCNN's recursive architecture allows the network to view each input image multiple times and at varying feature levels, which enables a slight boost in porosity prediction accuracy over the commonly used transfer learning CNN models.


I. INTRODUCTION
Recently, Three-Dimensional (3D) Additive Manufacturing (AM) has been widely used in an increasing number of industries as the technology continues to mature. The aerospace industry was one of the first to largely embrace these 3D printing technologies, which allowed for novel designs of lighter weight and stronger components [1], [2]. The biomedical industry has adopted this technology to build artificial tissues, customized implants, and prosthetics [1], [3], [4]. A fully 3D printed automobile was unveiled in 2014 by Local Motors, while Ford and BMW have leveraged the technology for prototyping and building engine components [1]. In the field of thermodynamics, the use of AM has allowed for higher efficiency cooling devices that were previously not achievable with traditional manufacturing methods [5], and AM has helped facilitate the design of fusion energy technologies [6]. In addition, even the electronics industry has benefited from this technology, as it can embed customized and active electronic materials into structures, which will allow for novel sensing and reactive features [1].
What makes AM markedly different from traditional manufacturing techniques is its fundamental process: material is precisely joined together at a small scale in order to achieve a larger-scale shape. The process relies on adhering new material to the workpiece in order to achieve the final geometries required [1]. Depending on the additive material, a variety of adhesion mechanisms are available. The most commonly used mechanism is briefly melting and rapidly cooling the material in a fixed location [1], [4]. During operation, a thin cross-section (layer) of the part is precisely filled with material, then the part is moved up or down by a small increment. Additional layers are consecutively added on top of one another until the final geometry is achieved. Since this process allows material to be precisely joined, waste material can be vastly reduced and novel geometries can be incorporated into designs [2]. Additionally, this process has been extended to a variety of materials, such as plastics, composites, ceramics, metals and even regolith [1]-[4], [7]-[10].
The Selective Laser Melting (SLM) process belongs to a class of AM technologies called Powder Bed Fusion (PBF), which relies on locally melting material powder [1], [11]. Other notable technologies in this class are: Electron Beam Melting (EBM), Selective Laser Sintering (SLS), Direct Metal Laser Sintering (DMLS) and Directed Energy Deposition (DED) [1], [2], [11], [12]. The SLM process is distinct in that the process utilizes a high-powered laser beam to fully melt the powder material, as opposed to the lower temperature sintering processes [11]. Due to its high temperature output, the SLM printer is ideal for pure metal alloys, whereas the sintering process is better suited for multi-alloy metals [11]. This also means that if the AM process is properly controlled, additional heat treatments are not required once the part is complete [11]. Compared to other PBF processes, a high proportion of excess powder from the SLM process can be reused in future prints, reducing wasted materials [12].
In addition to focusing on SLM printing, this research focuses solely on Ti-6Al-4V powder, which is one of the most popular α + β titanium alloys [12]. The Ti-6Al-4V alloy was first developed in the 1950s in order to combine the tough and ductile characteristics of β grains with the stronger and more brittle α grains. The combination of α + β grains has remained widely popular in the aerospace industry due to its excellent corrosion resistance, lower density, higher machinability and high strength [12]. Along with all these traits and its superior biocompatibility, Ti-6Al-4V has found uses in the biomedical industry as a building platform for implants [12]. Generally, titanium alloys are highly susceptible to oxidative processes and strain hardening during alloying and manufacturing processes [12]. Given these weaknesses, AM is an ideal alternative to traditional manufacturing, but further improvements still need to be made, particularly with the repeatability of the process.
Numerous studies have been conducted to better understand both the mechanical properties of as-built AM parts and the dynamics of the melt pool in order to detect irregularities and optimize part quality. Studies such as [4], [13]-[19] focused on understanding the fundamental phenomena behind the crystal growth and the emergent mechanical properties in as-built Ti-6Al-4V parts. Others have focused on developing theoretical models of melt pool dynamics [20]-[22] and porosity defects such as keyholes [22]-[30], spattering [28]-[31] and lack of fusion [23], [32], [33]. Due to the commonality of porosity defects in as-built AM parts, the current research has focused on improving process monitoring and defect detection of porosities. Porosities are relatively small bubbles that form during a welding process, and their occurrence can generally be related to how well the welding process is being controlled and monitored. In general, porosities form when gas bubbles are not able to escape from within the melt pool before the melt pool solidifies [23]. A commonly occurring defect in AM is the keyhole defect, which occurs when the laser's heat penetrates into lower layers and causes too much material to remelt [24]. In this case, the bubbles in the melt pool require a longer time to escape than what was originally accounted for, so these bubbles can get trapped during the solidification process [25]. Keyholes tend to form when the laser is being turned on or when the laser briefly halts its movement [27]. Conversely, lack of fusion defects occur when the laser heat cannot penetrate enough material beneath the melt pool to properly weld materials together [28], [29]. In AM, this generally happens during the initial layers of printing, when powder is first welded to the build plate, which tends to be a dissimilar material from the powder.
Common methods for in-situ melt pool monitoring for detecting porosities rely on high-speed X-Ray imaging of the melt pool or on analyzing thermal radiation from the melt pool. Researchers have shown that high resolution X-Ray imaging can reliably detect bubbles and keyhole formations during the welding process, while others have been able to monitor crystal growth in real-time [22], [23], [26]-[30]. In addition, researchers were able to prescribe laser control techniques to mitigate keyhole formations [27] and a method to predict keyhole sizes based on laser intensity [26]. However, access to X-Ray imaging technology is still cost prohibitive for widespread commercial use, and other research has focused on utilizing infrared, near infrared, photo diodes and optical cameras to analyze thermal information from the melt pool [33]-[47]. Researchers have shown that melt pool widths and lengths can be estimated [43]-[45] and abnormalities detected [37] directly by extracting and analyzing simple metrics and statistical measures. A real-time implementation of this approach has been previously proposed using an FPGA-based detection system which analyzed thermal and optical melt pool images to detect abnormalities [39]. An additional approach found that statistical deviations in photo diode intensity could be correlated to artificially generated porosities at known locations; this method also detected uncontrolled porosities in a secondary specimen [46]. When researchers coupled destructive testing with data from a single photo diode, they found an inverse correlation between the number of anomalous photo diode readings and the plastic elongation of their test specimen, but the ultimate strength remained unaffected [33]. Others have performed X-Ray Computed Tomography (CT) as a non-destructive test to locate porosities within as-built specimens.
The combination of infrared, optical or photo diode readings with known porosity locations from X-Ray CT scans provides finer-grained data for predictive models. In particular, research groups have shown that fitting ellipsoids to melt pools helped to detect anomalous defects [41]. Another research group was able to show that graph Fourier transform coefficients could be derived from two different photo diode sensors, and when these coefficients were used as input features for machine learning models, each model showed improved accuracy in predicting porosity volume by layer compared to conventional metrics and statistical features [38]. Others have analyzed the aggregate thermal data of melt pools within a given layer and used Isolated Decision Trees [36] and Random Forest (RF) [47] models to detect anomalous melt pools and thereby predict porosities within that layer. Finally, a research group was able to utilize a pre-trained Convolutional Neural Network (CNN) to predict whether a porosity existed at a location, along with its cross-sectional volume, for a single-layer, single-track part, when optical images of the melt pool were input into the model in a frame-by-frame manner [40].
In [34], the locations of temperatures near the melting temperature of Ti-6Al-4V (1636 °C) were used to reconstruct a single ellipsoid. With this ellipsoid shape of constant temperature, an additional cubic spline interpolation was used to generate a continuous function representation of the melt pool boundary. Once all of the images had a continuous function representing the melt pool boundaries, Functional Principal Component Analysis (FPCA) was used to compare and extract features across all the images. It was found that the first nine Principal Components (PCs) accounted for 99.5% of the variation between images. Accordingly, these nine PCs for each image were used as input vectors for Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Decision Tree, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. Of these algorithms, KNN achieved the highest accuracy, with the QDA algorithm close behind it. However, the main drawback to this approach was that the FPCA algorithm required access to all of the images in the data set in order to find the optimal PCs, which were then used in the subsequent classification algorithms.
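As a rough sketch of this pipeline, the fragment below substitutes ordinary PCA for FPCA and synthetic feature vectors for the interpolated melt pool boundary functions; both substitutions are illustrative assumptions, not the actual data or code of [34].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened melt pool boundary descriptors:
# "porosity" samples (label 1) carry a small systematic offset.
X = rng.standard_normal((300, 50))
y = rng.integers(0, 2, 300)
X[y == 1] += 0.8

# Reduce to nine principal components, then classify with KNN,
# mirroring the FPCA -> KNN structure of the prior pipeline.
Z = PCA(n_components=9).fit_transform(X)
knn = KNeighborsClassifier(n_neighbors=5).fit(Z, y)
acc = knn.score(Z, y)    # training accuracy of the sketch
```

Note that the drawback identified above survives even in this toy version: the projection must be fitted on the full data set before any classification can happen.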
In a follow-up study [42], the researchers utilized a layer-wise approach instead of running classification on individual images. Initially, the images were cropped to a size of 130 × 130 pixels, then a bi-harmonic surface interpolation was used to fit a spherical surface to the 3D Heat Affected Zone (HAZ) images. The surface interpolation allowed a compression from the 130 × 130 pixel size to a 27 × 32 grid. With the parameters from the surface, Multilinear Principal Component Analysis (MPCA) was used to extract tensor features across images within the same layer. The tensors represented the thermal histories of the melt pool as it progressed through time for that particular layer. From there, the researchers used a volumetric convex hull to compare all the tensors in a given layer. The researchers also calculated the statistical variance of the tensors within the layer as a benchmark against the convex hull method. It was found that the convex hull method was able to achieve higher sensitivity, precision and F-Score than the statistical variance method. In addition, this method can be utilized during the printing process to detect anomalies within a recently created layer and stop prints early if needed.
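The convex hull idea can be illustrated with a toy example: an out-of-family melt pool feature (reduced here to a 3-D point, an illustrative simplification of the tensor features) inflates the volume of the layer's convex hull and thereby flags the layer as anomalous.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)

# Nominal melt pool features for one layer, reduced to 3-D points here.
normal_layer = rng.uniform(0.0, 1.0, size=(50, 3))

# One anomalous melt pool far outside the nominal cluster.
anomalous_layer = np.vstack([normal_layer, [[5.0, 5.0, 5.0]]])

v_normal = ConvexHull(normal_layer).volume
v_anomalous = ConvexHull(anomalous_layer).volume  # inflated by the outlier
```

A large jump in hull volume relative to a nominal baseline is the kind of signal the layer-wise comparison exploits.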
While the layer-wise methods proposed in previous research can be used to detect porosity during the AM process, catching an error frame-by-frame would allow for a higher quality feedback loop [34], [36], [38], [42], [46], [47]. Others have relied upon simple metrics to detect melt pool anomalies in real-time, but these methods may not reliably scale due to their over-simplification of the complex phenomena occurring during the AM process [37], [39], [43]-[45]. Therefore, the aim of this research is to present Deep Learning models that can accurately predict porosity from thermal images in real-time. In this paper, we propose novel end-to-end Deep Learning algorithms that automatically extract thermal features to predict porosity for Additive Manufacturing (DLAM) using thermal images of the melt pool. The proposed methods, DLAMs, are built upon well-known deep CNNs, including ResNet [48] and Recurrent Convolutional Neural Networks (RCNNs) [49]. We also compare the results of training CNNs from scratch to the use of transfer learning techniques. The experimental results show that the proposed DLAMs can be efficiently trained end-to-end to perform porosity prediction without relying on post-processing procedures.

II. DEEP LEARNING
Recently, Deep Learning has proven to be extremely successful for solving many difficult tasks within pattern recognition, data mining, and data science, where the differences between these three fields are largely due to the application of their respective results. For example, tasks within pattern recognition tend to be more general, while within data science they are typically industry specific. Both the advancement of Graphical Processing Units (GPUs) and the development of heuristic techniques for training deep networks [48], [50]-[52] have enabled deep models to become the gold standard in both academia and industry. In particular, CNN architectures have provided state-of-the-art performance on core tasks within computer vision, computational neuroscience and medical image analysis [53], [54]. In addition, Deep Neural Network (DNN) architectures have found application in Natural Language Processing (NLP) for tasks such as learning word representations [55], [56], machine translation [57]-[59], language understanding [60], speech recognition [61], and advanced control systems [62].
There are a few common themes within Deep Learning: neural networks with increasing numbers of hidden layers, optimization using the Stochastic Gradient Descent (SGD) algorithm or second-order methods [63], the computation of gradients via the backpropagation algorithm, and the use of modular components [64]. The use of modules in Deep Learning has streamlined the ability to design task-specific modules that contribute to the powerful representation learning capabilities of DNNs. This is most apparent in the computer vision community, where CNN modules aim to increase the receptive field of filters in higher layers [65], enable the flow of gradient during training [48], and decrease the number of overall parameters [51].
In this paper, DLAMs, Deep Learning algorithms for porosity prediction in melt pool images, are proposed to offer in-situ and real-time analysis of AM processes. DLAMs automatically extract features from raw thermal image slices of the 3D melt pool and utilize variants of CNNs as the classifiers to predict the existence of pores inside the manufactured part. In general, Deep Learning methods have the significant advantages of being a more generalized solution with a shorter inference (prediction) time compared to traditional Machine Learning methods. The short inference time allows DLAMs to make predictions of porosity in real-time during the manufacturing process. In previous research using the same data set as this paper, the state-of-the-art Machine Learning model for porosity prediction was a KNN [34]. Accordingly, the inference time of DLAMs has been compared with that of the top-performing KNN proposed in [34]. KNN has a time complexity of O(knp) for inference, where k is the number of neighbors considered for voting, n is the number of training samples, and p is the number of features. As the data available to the model grows, which is what we can expect in the era of big data, KNN will not scale well, incurring a growing computational overhead for inference. Similarly, based on the research in [66], the inference time complexity of SVM can be denoted as O(n³), where n is the number of training samples. And in a study of RF by Louppe, G. [67], the inference time complexity of an RF with m trees and n training samples is O(m log n) in the best case and O(mn) in the very worst case. Thus, both SVM and RF also suffer long inference times on large data sets.
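To make the O(knp) cost concrete, a brute-force KNN can be sketched in a few lines: every single query must compute a distance against all n training samples (p features each) before voting. The data below is synthetic and purely illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    # Distance from every query to ALL n training samples: the O(n*p)
    # term per query that makes KNN inference scale poorly with data size.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argpartition(d2, k, axis=1)[:, :k]  # k smallest distances
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)
n, p = 1000, 9  # e.g. nine principal components per sample, as in [34]
X_train = np.vstack([rng.standard_normal((n, p)) - 2.0,   # class 0 cluster
                     rng.standard_normal((n, p)) + 2.0])  # class 1 cluster
y_train = np.array([0] * n + [1] * n)
X_query = np.array([[-2.0] * p, [2.0] * p])
pred = knn_predict(X_train, y_train, X_query)  # one label per query
```

Doubling n doubles the work done for every future prediction, whereas a trained neural network's inference cost is fixed by its architecture, independent of n.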
In addition, Machine Learning models have extra computational overhead in processing the test samples before making inferences. In the case of the KNN model in [34], the extra overhead includes the feature extraction and dimensionality reduction performed by their in-house boundary extraction algorithm(s), cubic spline interpolation, and FPCA. While the time complexity of the boundary extraction algorithm(s) is unknown, and the cost of cubic spline interpolation is negligible (nanoseconds) [68], FPCA requires a long execution time and therefore becomes the bottleneck of the traditional pipeline. On a test platform with an Intel Core i9-9880H CPU running the FPCA class in the scikit-fda library, it takes on average 64.3 milliseconds to finish the FPCA process for a test sample, which is significantly longer than the inference times of the DLAMs presented in Section IV of this paper, which range from 5.5 to 13.4 milliseconds.
Deep Learning models usually require large amounts of data for training; however, our thermal data set is limited. Therefore, we adopt pre-trained CNN models and fine-tune them on a different data domain: thermal images of the 3D metal melt pool in Additive Manufacturing. Furthermore, we develop an intuitive Deep Learning architecture specific to HAZ images, which we call the Residual-Recurrent Convolutional Neural Network (Res-RCNN). The RCNN was originally proposed to integrate context information from color images for object detection tasks [69]. With this context modulation ability, Res-RCNN is capable of automatically extracting thermal features from high-resolution thermal images.

A. CONVOLUTIONAL NEURAL NETWORKS
CNNs are a class of neural networks with a network architecture that is inspired by the structure of the visual cortex in the human brain [70]. They have achieved state-of-the-art performance on a wide range of tasks, including image classification [71], [72], object detection [73], [74], image captioning [75], and visual question answering [76]. This class of models can learn hierarchical features by using compositions of non-linear maps. From a technical standpoint, the network topology of CNNs enforces a structural prior on the model via three mechanisms:
1) Sparse connections: as opposed to the fully-connected layers of Multi-Layer Perceptrons (MLPs);
2) Parameter sharing: to enforce translational equivariance;
3) Pooling: to enforce translational invariance.
Sparse weights account for much of the success of CNNs in modern times; they enable CNNs to efficiently learn from data in high-dimensional spaces. Most important for the application to thermal images, sparse connections lead to CNNs processing local, rather than global, information. Although convolution is a local operation, CNNs can capture structures at multiple scales by stacking layers, effectively widening the receptive field of each neuron. Temperature values become highly uncorrelated at long distances within the melt pool, so measuring the correlation (convolution) within local patches is an intuitive way to model the temperature contour of the melt pool.
The neural network parameters are shared on a per-image basis. These parameters comprise the convolutional kernels which are applied to each input image. The convolution operation commutes with translation, and is thus equivariant to translation:

(τ_x f) ∗ ψ = τ_x (f ∗ ψ),

where τ_x is the translation operator such that (τ_x f)(y) = f(y − x); and ∗ is the convolution operation. This implies that if the parameters of a CNN filter of size k × l are kept constant, the translation of an object from position (i, j) to (i + ε, j + δ) in the image will result in a corresponding shift in the feature detected in the output activation of that layer from position (i′, j′) to (i′ + ε_k, j′ + δ_l), where the position of the new translated feature depends on the filter size k × l. In the context of porosity prediction, this means a CNN will extract the same thermal features indicating porosity, regardless of the spatial location of such features. Pooling (or subsampling) layers are used in CNNs for a twofold task: they serve to reduce the dimensionality of the parameter space and to make neural activations in a CNN invariant to translations. Requiring that the model have translational invariance enforces a strong structural prior that may or may not be beneficial, depending on the task at hand. In image classification, this is essential: an object should be classified as its ground truth class regardless of where it occurs in the image. Likewise, in the context of porosity prediction, any image containing features indicating an anomalous sample should be classified as containing porosity, regardless of the position of those features in the image.
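Translation equivariance can be checked numerically. The sketch below uses a sliding-window correlation (the operation CNN layers compute) with circular padding, under which a periodic shift of the input commutes exactly with the filtering; the image and kernel are random stand-ins for a thermal frame and a learned filter.

```python
import numpy as np

def conv2d_circular(img, kern):
    """Sliding-window correlation with wrap-around (circular) padding,
    so the output has the same size as the input."""
    kh, kw = kern.shape
    padded = np.pad(img, ((0, kh - 1), (0, kw - 1)), mode="wrap")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kern)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((12, 12))  # stand-in thermal image
kern = rng.standard_normal((3, 3))   # stand-in learned filter

# Shifting the input and then filtering equals filtering and then shifting.
lhs = conv2d_circular(np.roll(img, shift=(2, 1), axis=(0, 1)), kern)
rhs = np.roll(conv2d_circular(img, kern), shift=(2, 1), axis=(0, 1))
```

With zero padding instead of circular padding the identity holds only away from the image borders, which is why the equivariance of practical CNN layers is approximate near edges.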

B. TRANSFER LEARNING
In AM, it can be costly to construct large-scale, annotated data sets. One popular method for handling insufficient training data, transfer learning, aims to utilize models trained on data from one domain and apply those models to data from another domain. In statistical learning theory, it is assumed that the training and testing data are independent and identically distributed (i.i.d.) [77] according to the same distribution P(x, y). In transfer learning, this key i.i.d. assumption of traditional Machine Learning algorithms is relaxed for the test data. The assumption now is that there exist overlapping features in both the source and target domains, and that these features, learned by the model from the source domain, can be used in modelling data from the target domain. In this way, a model can be trained using a large benchmark data set, such as ImageNet [78], and then the model parameters can be adjusted to the smaller data set of interest.
Formally, a domain D is defined as a pair {X, P(X)}, where X is some feature space and P(X) is a marginal probability distribution over n examples, X = {x_1, . . . , x_n} ∈ X. In the domain of AM, X is the space of HAZ images. Given a domain D, a task T is defined to be a triple {X, Y, f}, where X is a feature space; Y is some label space; and f is a mapping f : X → Y to be learned from the data. In the current application, Y = {0, 1}, where 0 denotes a normal sample and 1 denotes a sample containing porosity. Transfer learning is defined as the process of learning the target mapping f_T for some target domain and task {D_T, T_T} by incorporating information from a source domain and task {D_S, T_S}, such that D_S ≠ D_T and/or T_S ≠ T_T. In this paper, we use deep CNNs pre-trained on ImageNet in order to predict porosity in HAZ images, so that both P(X_S) ≠ P(X_T) and Y_S ≠ Y_T. The primary assumption is based on the fact that CNNs extract both high-level and low-level features from data, and that the low-level features learned from the ImageNet data set can be used to predict porosity from the low-level features of thermal images. A similar method was also implemented in [40], where a pre-trained AlexNet CNN was successfully trained to predict porosity locations based on optical melt pool images, but only for a single-layer part.
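The idea of freezing source-domain features and learning only the target mapping f_T can be reduced to a minimal numpy sketch. Here a fixed random projection stands in for the frozen pre-trained convolutional layers, and two synthetic clusters stand in for normal and porous samples; only the new classification head is trained. All of this is an illustrative assumption, not the paper's actual fine-tuning setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" lower layers: a frozen feature extractor (never updated).
W_pre = rng.standard_normal((64, 16)) / 8.0

def features(X):
    return np.maximum(X @ W_pre, 0.0)  # frozen ReLU features

# Synthetic target domain: normal (label 0) vs. porous (label 1) samples.
X = np.vstack([rng.standard_normal((100, 64)) + 1.0,
               rng.standard_normal((100, 64)) - 1.0])
y = np.array([0] * 100 + [1] * 100)

# Train ONLY the new head f_T: logistic regression by gradient descent.
F = features(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid predictions
    g = p - y                               # logistic-loss gradient signal
    w -= 0.1 * F.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = (((1.0 / (1.0 + np.exp(-(F @ w + b)))) > 0.5) == y).mean()
```

In the full method the frozen layers are real ImageNet-trained convolutions rather than a random projection, and in fine-tuning they may be partially unfrozen; the division of labor between reused features and a newly learned head is the same.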

C. RESIDUAL-RECURRENT CONVOLUTIONAL NEURAL NETWORK
In this research, a Residual-Recurrent Convolutional Neural Network (Res-RCNN) for porosity prediction is proposed and implemented as the classifier in a DLAM method. The architecture of the RCNN was inspired by anatomical evidence showing the existence of recurrent connections in the neocortex. Although the exact functions of these recurrent synapses remain unclear, it is believed that they play an essential role in context modulation in object recognition tasks [69]. Consider that in the real world, objects never exist in isolation. They often coexist with other objects and are associated with each other. Accordingly, the contextual information of the objects in a particular environment should be utilized in vision-related tasks [79]. The prediction of porosity from melt pool images is no exception: the formation of pores is strongly associated with their context. Therefore, it is desirable to adopt a Deep Learning model that can capture the global and local contextual associations in the limited amount of melt pool images available. In this regard, the RCNN is a suitable model, with its larger receptive field at each pixel unit compared to the vanilla CNN. The hallmark of an RCNN is the use of the Recurrent Convolutional Layer (RCL). The RCL is unique compared to the purely feed-forward convolution layers used in most CNN models: it introduces recursive connections within the same convolution layer, and the states of the neurons at that layer evolve over discrete time steps. With a traditional feed-forward CNN, high-level features such as context are captured exclusively at the model's higher layers, whereas low-level features such as texture, lines and borders are captured only at the lower layers [69]. Thus, the correlation between the learned high and low-level features may be missing if the model is very deep.
In other words, the context information from the higher layers of a deep feed-forward CNN model may fail to modulate the activities of the lower layer neurons that are responsible for detecting smaller objects such as pores in melt pools. However, RCL unlocks the ability of context modulation with its recurrent connections at the same convolution layer at different time steps and boosts the performance of the model. The effective receptive field of the pixels in an RCL expands as the recurrent connections iterate the previous feature maps through the time steps.
The structure of an RCL module used in the proposed Res-RCNN is presented in Fig. 1. Both the forward and recurrent connections of an RCL have local connectivity and shared weights for the convolution at different locations of the input. In addition, when the layer is unfolded through time, the recurrent connections within an RCL at the different time steps also have shared weights. Note that at the last time step, the identity of the input state at t = 0 is added to the output state; this specific connection does not exist in the original RCL architecture. Finally, for a patch of the input image i(t) with the k-th feature map in an RCL, the state of the forward connection z_k(t), the recurrent state r_k(t), and the output state of the layer h_k(t) at time step t can be denoted as:

z_k(t) = (W_k^f)^T i(t) + b_k^f,
r_k(t) = (W_k^r)^T h_k(t − 1) + b_k^r,
h_k(t) = g(z_k(t) + r_k(t)),

with the recurrent state taken to be zero at t = 0, where W_k^f and W_k^r are the vectorized forward and recurrent weights, respectively; b_k^f and b_k^r denote the biases for the forward and recurrent connections; and g denotes a non-linear activation function. In this case, the Rectified Linear Unit (ReLU) is used.
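Following the equations above, a single RCL unfolded for T time steps can be sketched in numpy. Dense weight matrices stand in for the weight-shared convolutions, and the final identity addition follows the modified RCL described here; shapes and the step count are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rcl_forward(i0, W_f, W_r, b_f, b_r, T=3):
    """One Recurrent Convolutional Layer unfolded for T time steps."""
    z = W_f @ i0 + b_f  # forward state: computed once, weights shared over time
    h = relu(z)         # t = 0: recurrent state is zero
    for _ in range(T):
        r = W_r @ h + b_r   # recurrent state from the previous output
        h = relu(z + r)     # neuron states evolve over discrete time steps
    return h + i0           # identity of the t = 0 input added at the end

rng = np.random.default_rng(0)
d = 8
i0 = rng.standard_normal(d)
out = rcl_forward(i0,
                  0.1 * rng.standard_normal((d, d)),  # W_f (stand-in)
                  0.1 * rng.standard_normal((d, d)),  # W_r (stand-in)
                  np.zeros(d), np.zeros(d))
```

In the convolutional case, each extra step feeds the previous feature map back through the recurrent kernel, which is what expands the effective receptive field described in the previous subsection.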
As a result of the RCL's distinct architecture with shared parameters between the recurrent steps, an RCNN has drastically fewer trainable parameters than a vanilla CNN of similar depth. In other words, an RCNN has a significantly lower training overhead and is less likely to overfit than its CNN counterparts. Considering the difficulty of collecting samples and the high cost of building a large data set inherent in the metal additive manufacturing process, the low training cost of the RCNN is very attractive.
In [69], the authors proposed an RCNN built with four RCLs for object recognition tasks. With a configuration of three recurrent time steps, 96 feature maps, and a 3 × 3 convolution kernel at each RCL, the authors concluded that the RCNN outperforms state-of-the-art CNNs on many data sets, such as CIFAR-100 [80], MNIST [81], and SVHN [82], despite having fewer parameters. The Res-RCNN model proposed in this paper is inspired by the work done in [69]. Res-RCNN has eight RCLs along with modifications to the original RCNN architecture that facilitate the porosity prediction tasks targeted by DLAMs. The modifications include the adoption of batch normalization, the added shortcut connections inspired by the residual network in [48], and the hierarchically increased feature maps across the layers achieved by 1 × 1 convolutions. The overall architecture of the proposed Res-RCNN model is presented in Fig. 2.
With batch normalization, the proposed Res-RCNN converges more stably on our training data during the early training process. In [69], the original RCNN adopted local response normalization; our experiments showed that it is not as effective as batch normalization, and it also introduces extra parameters to be selected. In addition, according to the studies in [49], [50], [83], and [84], many visual recognition tasks have greatly benefited from very deep neural network models with up to 30 layers. In its unfolded form, the proposed Res-RCNN is a reasonably deep neural network model: with the eight RCLs in the architecture, the model contains 32 convolution layers, plus the very first convolution layer, when it is unfolded through the time steps. However, in [48], the authors observed a degradation problem when a model's depth was increased. The degradation of training accuracy was not due to overfitting; rather, adding more layers to an already reasonably deep model led to higher training errors. Therefore, the idea of introducing shortcut connections (identity connections) into very deep neural networks was proposed in [48] to address the degradation problem. In this paper, the idea of adding shortcut connections between the layers is adopted: identity connections are inserted between adjacent RCLs in the proposed RCNN model to avoid the degradation problem and to improve the model's classification accuracy.
Moreover, it is observed that with batch normalization and the added identity connections, the trainability of the proposed model is improved, whereas a counterpart without these two modifications tends to fail to converge at the beginning of the training process. Finally, stride-two convolutions, instead of ordinary pooling, are adopted to reduce the input size after every two RCLs. Along with the decreased input size, the number of feature maps is increased to capture the more complex features that the deeper layers tend to learn. To cope with the changes in input size and dimensions, the shortcut connections (identity connections) are replaced by 1 × 1 convolutions, also known as projection connections, which match the size and dimensions of the identity to the input of an RCL. Overall, the output channels of the eight RCLs in the proposed Res-RCNN are empirically set to 32 at the first RCL and are doubled every two RCLs afterward, while the input size is halved along with each increase in output channels. As a result, with a one-channel 200 × 200 pixel thermal image as the input, the output at the 8th RCL is a 256 × 13 × 13 tensor.
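The recurrent update inside an RCL, combined with the residual shortcut, can be sketched in NumPy for a single channel. This is an illustrative sketch, not the authors' implementation: the per-map standardization stands in for batch normalization, and the weights `w_ff`/`w_rec` are hypothetical.

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3 x 3 cross-correlation for a single-channel image."""
    H, W = x.shape
    xp = np.pad(x, 1)  # zero-pad by one pixel on each side
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def rcl_forward(x, w_ff, w_rec, steps=3):
    """One RCL unfolded over `steps` recurrent time steps:
    h_t = relu(norm(conv(x, w_ff) + conv(h_{t-1}, w_rec)) + x),
    where the `+ x` term is the identity shortcut described in the text."""
    h = np.maximum(conv3x3(x, w_ff), 0)  # t = 0: feed-forward pass only
    for _ in range(steps):
        z = conv3x3(x, w_ff) + conv3x3(h, w_rec)
        z = (z - z.mean()) / (z.std() + 1e-5)  # stand-in for batch norm
        h = np.maximum(z + x, 0)               # identity shortcut + ReLU
    return h
```

Because the recurrent weights `w_rec` are shared across time steps, unfolding the layer deepens the network without adding parameters, which is the source of the parameter savings reported later for Res-RCNN.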

III. DATA PREPARATION
The data utilized in this paper was obtained by Mississippi State University (MSU) and funded by the Army Research Laboratory [42]. The following subsections provide detailed information about the experimental setup used to collect the data and how this information is represented in the data set.

A. EXPERIMENTAL SETUP
In the original experiment, the researchers utilized an Optomec LENS 750 machine to build a thin block with the following operating parameters: a scan speed of 12.70 mm/s, a nozzle diameter of 1.016 mm, and a hatch spacing and layer thickness of 0.508 mm each. The test specimen was built from Ti-6Al-4V, with spatial dimensions of 27.56 mm in height, 47.81 mm in length, and 1.78 mm in thickness. With the combined hatch spacing and nozzle diameter, the LENS 750 machine could complete each layer in a single pass, which reduces the complexity of the data set and allows for a deeper analysis of thin-walled features.
In order to collect thermal data, two thermal imaging instruments were set up to collect in-situ thermal data of the melt pool and of the part as a whole. The first instrument was a single Stratonics dual-wavelength, co-axial pyrometer installed such that it could collect top-down thermal readings of the melt pool. Additional information about the thermal readings and operating parameters of the pyrometer is discussed in the HAZ Images subsection. The second instrument was an infrared camera (Sierra-Olympic Technologies, Inc. Viento 320), which was used to monitor the global thermal history of the part while in operation. However, data from this infrared camera was not included in the data set. Finally, non-destructive testing with a combination of X-ray scanning and 3D CT reconstruction was used on the completed part to locate and measure the sizes of porosities within the part.

B. DATA SET
The data set is separated into four categories: ''Y-Location'' and ''Layer'' coordinates corresponding to G-Code locations, HAZ images collected by the pyrometer and ''Size'', which is the measured pore size by the X-ray CT scan.

1) LAYER AND Y-LOCATION
The layer dimension indicates the current height at which a HAZ image was captured during operation. Since the test specimen is only as wide as the laser beam, only a single dimension is required to locate the laser beam within a given layer. While the layer spacing was 0.508 mm, the ''Y'' spacing for the HAZ images was 1.98 mm due to the scanning speed and the pyrometer's sampling rate. Neither the Layer nor the Y-Location was directly used in training the Deep Learning models in DLAMs. Given that the Layer and Y-Location features offer the spatial relations of the melt pools, it is intriguing to implement a Recurrent Neural Network (RNN) that accepts sequences of images as input. One way to form such sequences is by extracting HAZ images with the same Y-Location and different heights (indicated by Layer). However, due to the inconsistent sampling range available at each layer, as shown in Fig. 6, it is not valid to form image sequences at Y-Locations larger than 30 mm. As a result, a large part of the data set would inevitably be discarded from the training of the model, which is not desired since the data set contains limited samples in the first place.

2) HAZ IMAGES
The thermal data contained within this data set was obtained from a Stratonics Inc. dual-wavelength, co-axial pyrometer. The pyrometer had an image size of 752 × 480 pixels, a pixel spacing of 6.45 µm, and an exposure time of 2.03 ms. The nominal collection rate for the pyrometer was 6.4 Hz and the operating temperature range was 1000 °C - 2500 °C. The HAZ images were cropped to 200 × 200 pixels, both to decrease the dimensionality of the input space and to remove irrelevant background information and thermal noise. Since we are concerned with porosity only within the melt pool, and the background temperatures are highly uncorrelated with those of the melt pool, the background can safely be discarded. In addition, temperature values become highly uncorrelated outside of local patches, which is the key intuition that enables the use of CNNs on HAZ images in the first place.
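The paper does not state exactly how the 200 × 200 crop window was positioned within the 752 × 480 frame; a plausible sketch, assuming the crop is centered on the hottest pixel and clamped at the image borders, could look like this:

```python
import numpy as np

def crop_haz(frame, size=200):
    """Crop a pyrometer frame (rows x cols, e.g. 480 x 752) to a size x size
    window centered on the hottest pixel, clamping the window at the borders
    so the output shape is always (size, size)."""
    h, w = frame.shape
    cy, cx = np.unravel_index(np.argmax(frame), frame.shape)
    half = size // 2
    top = min(max(cy - half, 0), h - size)
    left = min(max(cx - half, 0), w - size)
    return frame[top:top + size, left:left + size]
```

Centering on the peak temperature keeps the melt pool inside the crop while discarding the uncorrelated background discussed above; the actual preprocessing used in [42] may differ.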
To understand the mechanisms within and around the melt pool, temperatures can be sampled along the scanning direction and plotted by pixel location, as shown in Fig. 4. During the SLM process, the laser beam travels along the scan direction, and the powder material must be heated by the laser beam until the powder melts. Once the powder is sufficiently melted, the material underneath and around the melt pool must remain hot enough for the newly melted material to congeal to; otherwise, defects can be generated at these points. Since the spatial thermal conductivity of the powder is lower than that of solid Ti-6Al-4V, these regions can be separated visually. Due to its lower spatial thermal conductivity, the powder region has a higher temperature gradient, so the temperature appears to rise and fall more rapidly [44]. Conversely, the solid region has a higher spatial thermal conductivity and therefore distributes heat further than the powder, so the temperature rises and falls more gradually. Additionally, abrupt changes in heating and cooling rates can be indicative of a phase transition, so the elbow seen around pixel 125 is likely where the solidus transition occurs. In addition, the HAZ extends multiple melt pool lengths within each image, which allows the cooling process of previous melt pools to be seen within a single image; in Fig. 4, for example, the estimated location of the previous melt pool is shown in blue. Therefore, anomalies that occur between HAZ samples will also cause deviations from the normal temperature profile during the solidification process, as shown in Fig. 4.

3) SIZE
The ''Size'' category represents the largest measured diameter of a pore detected at a given Y-Location and Layer coordinate. The pore diameter values lie between 0.05 mm and 1.00 mm and were extracted from the 3D reconstruction of the final part via X-ray CT scans. The X-ray CT scan had a resolution of 1 µm and was capable of detecting pores down to a diameter of 0.05 mm, which is therefore the minimum diameter within the data set. The location of each porosity from the X-ray CT scan was compared with the Y-Location and Layer coordinates of the printer, and the porosity's diameter value was then assigned as the ground truth label to the closest HAZ image. The spacing between any two adjacent HAZ images was analyzed to determine the existence of any blind spots not covered in between the images. Calculated from the sampling rate of the pyrometer and the scan speed of the laser, the distance between HAZ images was 1.98 mm; accordingly, the distance between two melt pool samples was also 1.98 mm. Finally, the area covered by a single HAZ image was analyzed to determine the gap between the images. Given that the manufactured part was a single-track wall, its thickness of 1.78 mm was also the average diameter of the melt pools. In the cropped 200 × 200 pixel HAZ images used as the input of DLAMs, an average melt pool was covered by fewer than 40 pixels across its diameter. Therefore, each pixel covers an area larger than 44.5 × 44.5 µm², and the input images cover an area larger than 8.9 × 8.9 mm². Although some readings in this area were lower than the melting temperature of the Ti-6Al-4V powder, they were still significantly higher than the ambient temperature. Therefore, the thermally intensive area covered by the HAZ images is large enough that no gap is formed between any two adjacent HAZ images.
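The coverage argument above can be checked with a few lines of arithmetic using the values reported in the text:

```python
# Reproduce the coverage arithmetic from the text (all values from the paper).
wall_thickness_mm = 1.78   # single-track wall thickness ~= melt pool diameter
melt_pool_pixels  = 40     # melt pool diameter covers < 40 pixels in the crop
crop_pixels       = 200    # cropped HAZ image is 200 x 200 pixels
scan_speed_mm_s   = 12.70  # laser scan speed
sample_rate_hz    = 6.4    # pyrometer collection rate

um_per_pixel  = wall_thickness_mm / melt_pool_pixels * 1000  # > 44.5 um/pixel
crop_width_mm = crop_pixels * um_per_pixel / 1000            # > 8.9 mm per side
spacing_mm    = scan_speed_mm_s / sample_rate_hz             # ~1.98 mm between images
```

Since each image covers more than 8.9 mm along the scan direction while consecutive images are only about 1.98 mm apart, adjacent HAZ images overlap heavily and no blind spots are formed.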
Since there is no gap between the HAZ images, all the porosities in the part have been assigned properly. Accordingly, DLAMs can give reliable porosity prediction with no blind spots under this configuration. Due to the low number of porosities above 0.05 mm in diameter, each HAZ image contains a single, unique pore diameter value. However, the low number of porosities in this data set also causes a major imbalance between normal melt pools and those with defects. The imbalance can be seen in Fig. 5, which also shows how pore sizes are stratified into three distinct regions: no porosities, porosities between 0.05 mm and 0.50 mm, and porosities greater than 0.50 mm in diameter.
Similarly, Fig. 6 shows the spatial distribution of these three classes of pore sizes as an overlay of pores onto the front view of the finalized part. A closer inspection of the two sub-classes of defects, shown in Fig. 6, reveals that different physical phenomena cause the two types of defects. An example of a HAZ image from the first build layer with an associated pore size between 0.05 mm and 0.5 mm can be seen in Fig. 7. Compared with Fig. 4, the melt pool in Fig. 7 is wider and has a lower maximum temperature, meaning it has a lower energy density. Conversely, Fig. 8 shows a HAZ image from an upper layer with a porosity between 0.5 mm and 1.0 mm, where the smaller melt pool indicates a higher energy density. However, both Fig. 7 and Fig. 8 have similar symmetrical shapes, and therefore the typical melting and solidification process seen in Fig. 4 did not occur.

4) MISSING DATA
As seen in Fig. 6, the higher the ''Y'' coordinate gets, the more likely a HAZ image is to be discarded from the data set. These missing images are due to corrupted or missing readings from the pyrometer [42]. Unfortunately, this prevents a full reconstruction of the part's thermal history. Additionally, each image had a relatively small number of pixels with missing data that appeared to be somewhat randomly distributed. Only a small number of images had error pixels near or within the melt pools. However, these error pixels appeared as 0 °C, which is well below the pyrometer's minimum temperature of 1000 °C.

5) DATA SPLIT
The data set consisted of 1,557 labeled thermal images, 72 of which contained porosity. The HAZ image data set was then split into training, validation, and testing sets. 75% of the data was used for training/validation, of which 10% went to validation. For the transfer learning experiments, the training set was further split into an initial training set and a smaller fine-tuning set. The RCNN and Res-RCNN training did not utilize a fine-tuning set and was instead performed on the full training set.
The number of samples per class in each split, along with the total number of samples per split, and the percentage of the minority class (porosity) is shown in Table 1. The splits were performed such that the ratio of porosity samples to normal samples in the training, fine-tuning, and test sets were as close to the ratio of porosity samples in the original data set (prior to splitting) as possible. Synthetic Minority Over-sampling Technique (SMOTE) [85] was used in order to combat class imbalance of porosity samples in both the training and fine-tuning data sets. The total number of training samples after SMOTE is included in Table 1.
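The splitting and oversampling steps can be illustrated with a minimal sketch, assuming per-class random splits and a simplified SMOTE-style interpolation. The paper uses the actual SMOTE algorithm [85]; the helpers below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_split(labels, test_frac=0.25):
    """Index split that keeps each class's ratio in the test set close to
    its ratio in the full data set, as described for the DLAM splits."""
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def smote_like(x_min, n_new):
    """SMOTE-style oversampling sketch: each synthetic sample is interpolated
    between a random minority sample and its nearest minority neighbor."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(x_min))
        d = np.linalg.norm(x_min - x_min[i], axis=1)
        j = np.argsort(d)[1]  # nearest neighbor, excluding the sample itself
        out.append(x_min[i] + rng.random() * (x_min[j] - x_min[i]))
    return np.stack(out)
```

In the actual pipeline the minority class (porosity) would be flattened image features rather than the toy vectors used here, and SMOTE is applied only to the training and fine-tuning sets, never to the test set.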

IV. EXPERIMENTAL RESULTS OF DLAMs
This section presents the performance of DLAMs. All experiments were run on Google Colab with an NVIDIA Tesla V100 GPU with 16 GB of video memory. The evaluation metrics include accuracy, sensitivity, precision, F-score, and false alarm rate (FAR), which are calculated from the confusion matrix generated on the testing set. The optimal model maximizes all metrics except the FAR, which should be minimized.

$\mathrm{Sensitivity}_{class} = \dfrac{TP_{class}}{TP_{class} + FN_{class}}$ (5)

$\mathrm{FAR}_{class} = \dfrac{FP_{class}}{TN_{class} + FP_{class}}$ (6)

$\mathrm{Precision}_{class} = \dfrac{TP_{class}}{TP_{class} + FP_{class}}$ (7)

$\mathrm{F1}_{class} = \dfrac{2 \cdot \mathrm{Precision}_{class} \cdot \mathrm{Sensitivity}_{class}}{\mathrm{Precision}_{class} + \mathrm{Sensitivity}_{class}}$ (8)

The performance of GoogLeNet, ResNet18, and VGG-16 with batch normalization is compared. All architectures have their final layers (the classifier) replaced by three fully-connected layers of dimensions 1024 (512 for GoogLeNet), 64, and 1. PReLU [86] is used after the first fully-connected layer and is applied across all 1024/512 features. Dropout with a probability of 0.2 is used before the last two fully-connected layers. The number of trainable parameters during transfer learning and fine-tuning for each architecture can be found in Table 2. ResNet and VGG were trained using SGD with a learning rate of 1e−6, momentum of 0.999, and an L2 regularization parameter of 1e−5. GoogLeNet was trained using SGD with a higher learning rate of 5e−6, momentum of 0.999, and a larger L2 regularization parameter of 1e−4. Fine-tuning was performed using Adam with an L2 regularization parameter of 1e−2 for all networks, with a learning rate of 1e−7 for ResNet and GoogLeNet and 1e−9 for VGG. The learning rates were decreased to 65% of their value every 50 epochs over the 500 epochs of transfer learning, and to 45% of their value every 10 epochs over the 50 epochs of fine-tuning. The results are summarized in Table 3.
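Eqs. (5)-(8) can be computed directly from the entries of a binary confusion matrix. The counts used in the test below are hypothetical, not the paper's reported results.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class metrics from a binary confusion matrix, following
    Eqs. (5)-(8): sensitivity (recall), false alarm rate, precision, F1."""
    sensitivity = tp / (tp + fn)   # Eq. (5)
    far         = fp / (tn + fp)   # Eq. (6)
    precision   = tp / (tp + fp)   # Eq. (7)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (8)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return dict(sensitivity=sensitivity, far=far,
                precision=precision, f1=f1, accuracy=accuracy)
```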
For the Deep Learning models trained from scratch (CNN, RCNN, Res-RCNN), no fine-tuning has been performed. SMOTE is applied only on the training samples, for data augmentation and to alleviate the class imbalance in the data set. Adam [52] is used as the optimizer for error back-propagation, and each model is trained for 50 epochs. In the final experiments, the training parameters are set empirically based on the validation loss and accuracy observed during the earlier training runs. The learning rate is set to 0.001 and is decreased to one tenth of its value at the 35th and 45th epochs. The weight decay is set to 0.01. Finally, each model has been trained on the same training data set for up to 20 extra epochs, with the learning rate set to 1e−5 and the weight decay set to 0, to search for its best validation benchmarks.
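The step schedule described above can be written as a small helper; this is a sketch of the stated policy, not the authors' training code.

```python
def lr_at_epoch(epoch, base_lr=1e-3, drops=(35, 45), factor=0.1):
    """Step learning-rate schedule used for the from-scratch models:
    base LR 0.001, cut to 1/10 of its value at epochs 35 and 45
    (50 training epochs in total)."""
    lr = base_lr
    for d in drops:
        if epoch >= d:
            lr *= factor
    return lr
```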
As the final experiment results, the performance metrics of all the Deep Learning models tested for DLAMs are presented in Tables 3 and 4. Compared to the CNN trained from scratch, the pre-trained GoogLeNet and VGG-16 with transfer learning both gain respectable improvements across all the metrics. It is also observed that the proposed Res-RCNN model not only has the best accuracy, but also yields a 20% higher Sensitivity, a 10.5% higher F1, and an on-par FAR compared to the CNN. Meanwhile, because the recurrent connections within an RCL share weights, the proposed Res-RCNN has 43% fewer trainable parameters than the CNN counterpart while having almost three times the depth. Compared to the RCNN in [69], the proposed Res-RCNN achieves a 5.8% higher Sensitivity, an 8.6% higher F1, and only half the false alarms. Finally, the confusion matrices generated by the Deep Learning models are given in Tables 5 and 6. When analyzing the misclassified images reported in Tables 5 and 6, the most common errors involved images within the second layer. This seems to be due to a transition in temperature profiles that occurs from the first layer to the third layer. In the first layer, the temperature profiles look similar to Fig. 7, whereas in the third layer, the profiles morph into the one seen in Fig. 4. This transition is caused by thermal interaction between the specimen and the build platform, and it is consistent throughout all of the images within these first layers. An example image within layer two is shown in Fig. 9; it has symmetrical qualities, like Fig. 7, yet also a slight elbow in the solid region, like Fig. 4. Anomalies in this layer are more challenging to discern because the ratio of porosities to normal images is closer to balanced than in the first or third layer. In addition, the small data set may have caused the DLAMs to over-generalize on profiles like Figs. 4, 7, and 8, so they may not have been attuned to all the minute differences needed for correct classification. Despite these common misclassifications, the short inference times shown in Table 2 and the high benchmark performances in Tables 3 and 4 imply that the development of a real-time monitoring system using DLAMs is feasible. Considering that the pyrometer has a collection time of 156 ms (6.4 Hz), while the longest inference time for the proposed DLAMs is 13 ms, an inference operation can easily be performed between pyrometer samples. The proposed system requires a dedicated processor to read streaming data from the pyrometer and immediately run its porosity prediction pipeline on a given frame. When a porosity is predicted, a simple true or false command could be sent to the SLM machine to initiate an emergency halt of the building process. Whereas [42] proposed a layer-wise anomaly detection system, a system utilizing DLAMs would be able to initiate a halt command between sample images, thereby saving additional time and preserving more resources in the event of an operational error.
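The timing argument can be made concrete as a sketch of the proposed monitoring loop; `predict` and `halt` are hypothetical stand-ins for the trained porosity classifier and the SLM halt command, not an actual machine interface.

```python
import time

PYROMETER_PERIOD_MS = 1000.0 / 6.4  # ~156 ms between HAZ frames (6.4 Hz)
WORST_INFERENCE_MS  = 13.0          # slowest inference time reported for DLAMs

def monitor(frames, predict, halt):
    """Sketch of the proposed in-situ loop: classify each incoming frame and
    issue a halt command as soon as porosity is predicted. Returns True if a
    halt was triggered before the frame stream ended."""
    for frame in frames:
        t0 = time.perf_counter()
        porosity = predict(frame)
        elapsed_ms = (time.perf_counter() - t0) * 1000.0
        # Inference must finish well inside one pyrometer sampling period
        # for the loop to keep up with the data stream.
        assert elapsed_ms < PYROMETER_PERIOD_MS
        if porosity:
            halt()
            return True
    return False
```

Because the worst-case inference (13 ms) is roughly an order of magnitude shorter than the 156 ms sampling period, the classifier leaves ample headroom for frame readout and the halt command itself.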

V. CONCLUSION AND FUTURE DIRECTION
In this paper, DLAMs are proposed to shed some light on the formation of porosity in metal AM processes. By extracting features from thermal images of the melt pools, DLAMs predict the existence of pores in the monitored area during the manufacturing process. Equipped with Deep Learning models, DLAMs have a significant advantage in inference time compared to traditional Machine Learning algorithms. The quick inference time enables DLAMs to produce real-time predictions, which can be used as feedback and potentially improve the yield of the AM processes. There are two major approaches in the development of DLAMs presented in this paper: the transfer learning models and Res-RCNN. Through carefully designed experiments and thorough evaluations, the proposed approaches have been shown to outperform their baseline counterparts. In conclusion, with the high classification performance and quick inference time accomplished by DLAMs, new areas of research have been opened to further advance the AM processes.
In future work, DLAMs can be further developed to determine whether they are capable of distinguishing between variations of pore sizes. In addition, future iterations of DLAMs could incorporate manufacturing operating parameters, such as scan speed and material type, to further improve the AM processes. With the ability to predict porosity types and account for differences in operating parameters, DLAMs would be ideal for a generalized feedback system. Furthermore, 3D Deep Learning models could yield important information about the physics behind 3D melt pool dynamics and porosity formation. Finally, future optimization of the input data or pipeline will also yield faster inference times, making DLAMs more accessible for off-the-shelf hardware solutions.

He also served as the Chair, a Professor, and the Director of Graduate Programs with the College of Science and Engineering, Southern Arkansas University. He participated in multimillion-dollar grants from the National Science Foundation, NASA, Department of Defense, and other federal and state agencies. He has published over 80 articles in scientific journals and conference proceedings. He has also chaired technical and engineering conferences at the national and state levels.

LINKAN BIAN received the B.S. degree in applied mathematics from Beijing University and the Ph.D. degree in industrial and systems engineering from Georgia Institute of Technology. He is currently a Thomas B. and Terri L. Nusz Endowed Associate Professor with the Industrial and Systems Engineering Department, Mississippi State University. His research interests include analytics of big data generated from complex engineering systems. The methodology of his research includes machine learning, surrogate modeling, statistics optimization, and uncertainty quantification.
His research has been applied to areas including additive manufacturing, predictive maintenance, cybersecurity, and other engineering systems. He has received federal funding from NSF, NIH, DOD, DOE, and industrial companies. He has published one book and over 50 peer-reviewed papers that appear in prestigious journals. His work has been widely recognized in the industrial and system engineering professional communities. He received the Outstanding Young Investigator Award from the Institute of Industrial and Systems Engineering (IISE), as well as multiple best paper awards. He is an Associate Editor of the flagship journal of IISE Transactions and the President-Elect for the IISE Quality Control and Reliability Engineering Division.
MOHAMMAD MOZUMDAR received the Ph.D. degree in electronics and communication engineering from Politecnico di Torino, Italy. He is currently an Associate Professor with the Electrical Engineering Department, California State University, Long Beach, and a former postdoc of the University of California at Berkeley, Berkeley. His novel ideas of model-based design for sensor networks made a profound impact on engineering and industrial communities and have been published in a book chapter, renowned journals, reputed conference proceedings, and major scientific magazines, and have also been translated into several different languages. His research interests include methodologies and tools for embedded systems, especially in the domain of sensor networks, energy-efficient building information and control system design, cloud computing, cyber-physical systems, and methodologies for the design of distributed embedded systems subject to stringent real-time, safety, and reliability constraints.