Improving Build Quality in Laser Powder Bed Fusion Using High Dynamic Range Imaging and Model-Based Reinforcement Learning

In laser-based additive manufacturing (AM) of metal parts from powder bed, information about actual part quality obtained during build is essential for cost-efficient production and high product quality. Reliable and effective monitoring strategies for laser powder bed fusion (LPBF) therefore remain in high demand and are the subject of current research. To address this demand, a novel analysis approach using high dynamic range (HDR) optical imaging in combination with convolutional neural networks (CNN) is proposed for spatially resolved and layer-wise prediction of the surface roughness of LPBF parts. In a further step, the predicted surface roughness maps are used as a feedback signal for a reinforcement learning technique that employs a dynamics model to subsequently identify optimal process parameters under varying and uncertain conditions. The proposed approach ultimately combines the estimation of the local surface roughness based on image texture and model-based reinforcement learning to an in-situ optimization framework for LPBF processes. In addition, the relationship between the layer surface roughness of the part and the overall part density is discussed on the basis of experimental data, which also indicate the applicability of the proposed method in industrial environments. This preliminary study is a first step towards highly adaptive and intelligent machines in the field of automated laser powder bed fusion with the primary goals of reducing production costs and improving the environmental fingerprint as well as print quality.


I. INTRODUCTION

A. PROCESS PARAMETERS AND SURFACE ROUGHNESS
As a sub-branch of AM, the laser powder bed fusion (LPBF) technology is frequently used in the machine tool and automotive industries [1], in aerospace engineering [2], as well as for medical devices [3]. LPBF is considered one of the key technologies that enables the fabrication of increasingly complex parts and systems with high demands on mechanical properties (e.g., yield strength, ductility, or heat resistance) [4].
However, the lack of process reproducibility and the resulting quality differences between workpieces hinder the transition of the technology to mass production. Hence, a reliable and cost-effective approach for in-situ quality monitoring and process optimization is in high demand [5]-[7].

(The associate editor coordinating the review of this manuscript and approving it for publication was Wai-Keung Fung.)
A significant quality parameter in LPBF is the increased roughness of the as-built surfaces, which potentially leads to reduced fatigue life of the final part due to the concentration of residual stresses on the surfaces [8]. Additionally, high surface roughness generally leads to poor surface quality and therefore requires long and expensive post-finishing operations. The final part surface is often specified to lie within a roughness range defined by the current application, which can require a surface roughness of 0.8 µm or better to prevent mechanical failure of the part due to cracks initiating on its surface [9].
In comparison to the overall part surface roughness, which is particularly difficult to measure during the build process, the local top surface roughness can be estimated for each layer after its processing. Furthermore, the layer-wise surface roughness is a key feature for evaluating the results of laser-material interaction during a build process and provides a useful indication of the part quality [10].
The roughness of LPBF surfaces mainly results from the layer-wise build process using overlapping laser tracks, the applied process parameters, and incompletely melted material [8].
The following parameters have been investigated in the literature for different materials and show a significant correlation with the surface roughness of LPBF parts (direction and strength of correlation as reported):
1) Laser power (--) [8], [10]
2) Scan velocity (+) [10]
3) Build orientation (++) [10]-[14]
4) Layer thickness (++) [11], [13]
5) Hatch distance (+) [15]
6) Scan strategy (+) [14], [16]-[18]

From a physical point of view, as the laser power increases, the size of the melt pool also increases [19]. During the layer-wise build process, larger melt pools increase the intersection area between adjacent tracks and therefore lead to a smoother surface. However, if the laser power exceeds a certain limit, the increased energy intensity may result in the formation of a strongly fluctuating keyhole that can introduce additional defects such as subsurface pores or spatter [8]. In addition to increasing surface roughness, the appearance of defects such as pores and lack of fusion can critically reduce the final density of the part [20]-[22].
Moreover, the applied scanning strategy has a significant effect on the microstructure of the component, the mechanical properties, and on the inter-layer surface roughness [14], [16], [17]. For example, DePond et al. [18] reported that the lack of rotation of scan vectors between layers leads to an increase in surface roughness. In addition, Snyder and Thole [16] suggest that roughness is strongly correlated with melt pool volume and shape. The authors have also reported that scan strategies leading to layer-wise similar directional scan vectors correlate with large-scale roughness features. Therefore, it is suggested that knowledge of the influence of scan strategies should be incorporated to effectively control surface roughness.
Furthermore, surface roughness, microhardness, and part density can be improved by using laser remelting strategies [17], [23], [24]. Although remelting increases processing time because the same area is scanned twice, Yu et al. [24] show that roughness can be effectively reduced by about 50 % for AlSi10Mg parts. Remelting strategies do not necessarily have the same process parameters as the main process and therefore bring into play additional variables with respect to process optimization. However, to limit the scope of this study, scan strategy variations and remelting methods are not considered as optimization variables.
Recent studies have also revealed that the thermal load induced by the layer-wise processing significantly impacts the quality of the final part [25]. The literature suggests that the occurring thermal gradients can generate residual stresses in the component and cause deformations, which increases the susceptibility to crack formation or layer delamination [26], [27]. Thus, process parameter optimization with respect to low surface roughness and defect-free parts is limited by physical process boundaries: increased volumetric energy density (VED) leads to gas pores and high thermal gradients, while at low VED balling may occur, which eventually results in high surface roughness and lack-of-fusion defects [22], [26].
Therefore, a trade-off between increased defect probabilities due to physical process limits and further aspects of the part quality (e.g., surface roughness, part density) must be found by the proposed optimization framework.

B. SURFACE ROUGHNESS MEASUREMENT
In order to find optimal process parameters with respect to high part quality, a measurement technology is required that captures key aspects of the part's properties, which in this work are the surface roughness and the percentage of defective surface areas. Conventional roughness measurement methods such as contact-type surface roughness profile measurements and advanced optical methods (e.g., chromatic confocal microscopy, white light interferometry) provide quantitative information in the vertical direction (i.e., height). In contrast to this, 2D imaging techniques only provide information about the texture and in some cases about spectral reflection properties (e.g., color) of the surface.
However, if the 2D imaging system is combined with further image processing that subsequently maps image features to roughness parameters (e.g., Ra, Sa, Rz), these instruments can be used for quantitative roughness measurement as well [28]- [31].
As opposed to the other measurement technologies, a camera-based roughness measurement system that uses surface texture information enables fast, layer-wise measurements at low system cost.
For AM parts the most often used areal roughness parameter according to [32] is Sa, which represents the average of absolute height deviations within a defined area. The areal Sa parameter corresponds to the roughness profile Ra already used in many industrial applications.
In this work, the roughness is approximated as [12]:

Sa = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left| f_{ij} \right|

where N × M is the number of measured height deviations and f_{ij} states the deviation from the average surface height.
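The Sa definition above can be evaluated directly on a measured height map. A minimal numpy sketch (the 4 × 4 height map is purely illustrative):

```python
import numpy as np

def sa_roughness(height_map: np.ndarray) -> float:
    """Arithmetical mean height Sa: the mean absolute deviation
    from the average surface height over an N x M area."""
    f = height_map - height_map.mean()   # deviations f_ij from the mean plane
    return float(np.abs(f).mean())       # (1/NM) * sum |f_ij|

# Illustrative 4 x 4 height map in micrometres
heights = np.array([[1.0, 2.0, 1.0, 2.0],
                    [2.0, 1.0, 2.0, 1.0],
                    [1.0, 2.0, 1.0, 2.0],
                    [2.0, 1.0, 2.0, 1.0]])
print(sa_roughness(heights))  # 0.5
```

Here the mean height is 1.5 µm and every point deviates by 0.5 µm, so Sa = 0.5 µm.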
In recent works, different image processing methods were used to establish the relation between image texture and surface roughness [33]. Based on a defined reference surface, the texture of a surface can be interpreted as the structured or random deviation from the reference that defines the topography of the surface. The texture of technical surfaces is the result of nano- and micro-roughness, macro roughness (i.e., waviness), defects, and lay [34].

The models used to predict the surface roughness can be divided into statistical models (regression, classification) based on machine learning (ML) and analytical models [11] that rely on physical relationships. Analytical approaches play an important role when estimating surface roughness based on process parameters. However, in case the model input is represented by an image of the surface, advanced statistical techniques and ML are favorable methods to obtain high prediction accuracies.
For example, Kamguem et al. [35] demonstrate that different roughness characteristics (i.e., Ra, Rq, Rv, Rt and Rz) show a strong correlation to certain surface image features. Therefore, an ML model was successfully developed that requires no information about the process parameters used. Instead, several image characteristics based on the surface texture of turned surfaces have been identified as important: aggregated image gradients, average grey level, and average texture frequency. The correlation between the image features and the associated surface roughness was found to be high (93-98%).
In the field of AM, Akhil et al. [29] investigated the surface characteristics of Ti-Al-4V LPBF parts based on the image texture. Statistical image features based on first- and second-order statistics were used to describe relevant image properties. The most relevant features were selected by applying neighborhood component analysis (NCA) and subsequently used as input for several ML models to predict the surface roughness. The authors reported that the Gaussian process regression (GPR) model is able to provide an accurate roughness estimation with R² greater than 0.9.
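The regression step of such a feature-based pipeline can be sketched with scikit-learn's Gaussian process regressor. The feature matrix and the feature-to-roughness relation below are synthetic stand-ins, not data from [29]:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic stand-in data: 100 image patches x 4 selected texture features
X = rng.uniform(0.0, 1.0, size=(100, 4))
# Hypothetical smooth relation between features and Sa (illustrative only)
y = 2.0 + 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 100)

# GPR with an RBF kernel plus a noise term, as commonly used for such tasks
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)
print("R^2 on training data:", gpr.score(X, y))
```

With low observation noise, the fitted model reaches a coefficient of determination well above 0.9 on this toy data, mirroring the accuracy level reported by the authors.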
Recently, methods that incorporate the process of feature extraction from images into the ML pipeline have been developed to predict surface roughness characteristics in industrial [36] and infrastructure engineering [37]. These models are based on recent developments in Deep Learning that led to a new class of neural networks known as convolutional neural networks [38], which make it possible to analyze high-dimensional sensor data and further process-related data more efficiently.
To quantitatively estimate the concrete surface roughness from high-resolution images in infrastructure engineering, an approach based on convolutional neural networks was recently developed and implemented [37].
For this purpose, a CNN architecture called ResNet50, originally proposed by Microsoft in 2015, was utilized [38]. ResNet50 is a 50-layer CNN that can be used to automatically extract and classify complex visual features from image data. The authors used a model pre-trained on the ImageNet dataset. Finally, the classification section of the network was redesigned and trained to categorize three different concrete roughness classes. Using the pre-trained ResNet50 model, the surface roughness could be predicted with more than 93% accuracy for new images.
A similar approach has recently been used for image-based LPBF material identification and defect detection by Narayanan et al. [39]. The authors compared the performance of established CNN architectures such as AlexNet and ResNet using transfer learning with that of an approach that uses principal component analysis and SVM, as well as with their proposed CNN. It was shown that established architectures can be adapted to the new domain, albeit with higher computational costs compared to simpler architectures.
Although the recent developments show promising results, further investigations for surfaces manufactured via LPBF are required, since the detection of local roughness deviations on highly reflective metal surfaces remains challenging. In addition, to our knowledge, none of the approaches found in the relevant literature provides image texture-based roughness prediction with spatial resolution (i.e., a roughness map), which is a requirement for identifying local roughness trends and deviations.

C. EXISTING WORK ON PROCESS OPTIMIZATION AND LIMITATIONS
Process control for LPBF typically requires an in-situ measurement signal as feedback to apply a control action that ultimately adjusts the controlled variable as desired. An essential part of the control system is the control algorithm. The proportional-integral-derivative (PID) controller is among the most frequently used techniques for industrial control systems according to the literature [40]. PID controllers have already been used and investigated in various applications in the field of material processing to control parameters such as cladding height, bead geometry, or temperature [41]-[43]. More sophisticated techniques for process control, such as model predictive control (MPC) [43]-[45], Linear-Quadratic Regulators (LQR) [46], and Iterative Learning Control (ILC) [47], have recently been suggested in the field of industrial manufacturing. However, the proposed control strategies have often been implemented only in the form of single-input single-output (SISO) systems, which do not reflect the full potential of process optimization.
Cao et al. [48] proposed a multi-input multi-output (MIMO) control concept for laser metal deposition processes. In this approach, the current layer height and the average melt pool temperature serve as input variables. The laser power and the processing speed are used as manipulated variables. As part of the model predictive control, a nonlinear process model was utilized to adjust the deposition height and melt pool temperature with respect to a target value. The performance of the system was validated in a single-layer case study. It was shown that further improvements in terms of robustness as well as generalizability are required for deployment in real industrial environments.
In summary, many of the proposed control techniques require extensive experimentation to determine the parameters of the controller, which are then often only valid as long as the model assumptions made do not change. In this context, solutions are needed that enable robust and cost-effective process monitoring and optimization, while complying with high industrial standards. Current solutions are only suitable in a limited number of applications due to the frequent need for expert intervention and time-consuming fine-tuning and should therefore be further improved in terms of usability, autonomy, and flexibility.
With recent advances in Deep Learning and ML in general, data-driven techniques are increasingly being used for various tasks in the area of 3D printing. In various sub-domains such as part design, quality control, process optimization, cloud platform services, as well as cyber-attack and weapon detection, ML has already shown its potential in a variety of applications [49]-[51]. In the specific field of process optimization and in-situ quality control, artificial neural networks can be considered the most widely used ML technique [50]. However, Goh et al. [52] showed that several other methods such as genetic algorithms, k-means, random forests, reinforcement learning, support vector regression, and ensemble methods have also been used to address problems related to process optimization and quality control. In addition to these techniques, a self-organizing feature map in the sense of unsupervised learning has recently been used to discover process-structure-properties (PSP) relationships in large and high-dimensional AM process datasets [53]. Overall, ML algorithms seem to be generally superior to conventional optimization methods such as polynomial regression, Taguchi, or analysis of variance (ANOVA) due to their ability to establish nonlinear relationships between input and output variables [52].
Given this background and recent advances in ML, the development and implementation of self-learning, adaptive, and intelligently behaving controllers appears to be possible. Although not commonly used in the field of AM so far, reinforcement learning (RL) seems to be an attractive method as it utilizes interactions and rewards to reinforce the use of process parameters which lead to high long-term rewards for a specific process. RL can provide solutions to complex optimization tasks while remaining flexible by constantly interacting and exploring the given process [54].
In the domain of industrial machining, a meta-reinforcement learning approach for learning turning machining parameters with respect to energy-efficient process control has been developed by Xiao et al. [55].
The authors used an approach based on Meta-Reinforcement Learning (MRL) to determine optimal machining parameters in material processing. The approach makes it possible to identify similarities among different optimization models, which enables fast adaptation in case of changing task conditions. The continuous parametric optimization task was addressed using the Actor-Critic (AC) framework. To improve the generalization abilities of the optimizer, meta-policy training was applied. It was shown that the proposed approach outperformed most of the algorithms used for comparison in terms of optimization performance.
In the domain of laser welding, the authors of [56] used RL to control the laser power based on optical and acoustic measurement signals. Two different RL approaches (i.e., Q-learning and policy gradient algorithms) were investigated. Laboratory experiments have shown that both methods can successfully learn a control strategy for achieving the required weld penetration depth in a reasonable amount of time.
More recently, an RL approach was used as a feedback control scheme in robotic wire arc additive manufacturing [57]. The authors used RL to approximate the nonlinear effect of process parameters on multilayer multi-bead prints and to then correct occurring geometric deviations based on thermal photodiode signals and 2D profile measurements. Experimental results showed that the proposed learning framework can be used to reduce thermally induced deformations of the parts.
Although these methods have proven useful in metal-based AM, more research is required to ensure process stability and short training times for fast adaptation with respect to multiple goals.
Thus, a new approach for layer-wise optimization of LPBF processes with respect to a defined target (reward) function is proposed. Based on HDR images at high spatial resolution, a CNN model uses image patches derived from the original images to estimate a roughness map for each surface image. Subsequently, the performance of the proposed CNN model is compared to state-of-the-art CNNs and to a classical ML approach using statistical features as well as texture descriptors in combination with support vector machines (SVM) for roughness estimation. Furthermore, a model-based reinforcement learning (MBRL) approach with high sample efficiency is used to find an optimal control policy faster than model-free RL methods. The approach makes it possible to find and update an optimal policy, which eventually enables the selection of an optimal action (i.e., a set of process parameters) for the next processing layer, depending on the current process state. Additionally, the MBRL method is compared against a model-free Q-learning RL algorithm as baseline. Finally, the evaluation of the overall framework as shown in FIGURE 1, based on CNNs for spatially resolved surface roughness estimation and MBRL, is presented in section IV. The individual modules and the applied methodology are explained in more detail in section II. In section III, a description of the experimental setup and the applied data preprocessing can be found.

II. METHODOLOGY

A. IMAGE-BASED SURFACE ROUGHNESS ESTIMATION
Initially, the roughness estimation module described in FIGURE 1 is engaged in the acquisition of HDR image data of the surface of the current layer. The HDR technique makes it possible to increase the range of luminosity that can be represented within a single image. This is particularly important for scenes that contain high brightness gradients that cannot be adequately represented by the camera's standard dynamic range, which is 71.23 dB for the camera used in this work. Due to the high reflectivity of the investigated metallic surfaces and its strong dependency on the direction of light incidence, some surface areas only lead to a low intensity in the acquired image. Other areas, however, appear highly reflective and show saturation of the affected pixel values. For the metallic as-built surfaces investigated in this work, examples in the form of low dynamic range (LDR) images acquired at different exposure times are shown in FIGURE 2. The HDR images of the specimen's top surfaces are generated by capturing the same scene using different exposure times. Thus, four photos are taken for each HDR image at exposure times ranging from 120.2 ms to 156.2 ms with a step size of 12 ms. It is worth mentioning that all surface pictures are taken and processed as grayscale images. Subsequently, the algorithm of Debevec et al. is used to merge the multiple exposure images into a single HDR image [58], [59]. As a result, the HDR image is represented as pixels of type float32 with values rescaled between zero and one. These scaled HDR images are then used as input for the CNN to estimate the local surface roughness.
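A simplified Debevec-style weighted merge can be sketched in a few lines of numpy. Unlike the original algorithm, which also recovers the camera response curve from the exposures, this sketch assumes a linear response; the input images are synthetic stand-ins for the four grayscale LDR exposures:

```python
import numpy as np

def merge_hdr(ldr_images, exposure_times):
    """Simplified Debevec-style HDR merge for grayscale images,
    assuming a linear camera response."""
    acc = np.zeros(ldr_images[0].shape, dtype=np.float64)
    weight_sum = np.zeros_like(acc)
    for img, t in zip(ldr_images, exposure_times):
        z = img.astype(np.float64)
        w = np.minimum(z, 255.0 - z) + 1e-6   # hat weighting: trust mid-range pixels
        acc += w * (z / t)                    # per-exposure radiance estimate
        weight_sum += w
    hdr = acc / weight_sum
    # Rescale to [0, 1] and store as float32, as described in the text
    hdr = (hdr - hdr.min()) / (hdr.max() - hdr.min() + 1e-12)
    return hdr.astype(np.float32)

# Exposure times used in this work: 120.2 ms to 156.2 ms in 12 ms steps
times = [0.1202, 0.1322, 0.1442, 0.1562]
ldr = [np.clip(np.arange(64 * 64).reshape(64, 64) // 16 * (i + 1), 0, 255).astype(np.uint8)
       for i in range(4)]
hdr = merge_hdr(ldr, times)
print(hdr.dtype, hdr.min(), hdr.max())
```

The hat weighting down-weights under- and over-exposed pixels, which is why the saturated highlights of one exposure are filled in by the darker exposures.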
However, to show the HDR images on conventional displays so that most details are preserved, a linear tone-mapper with bilateral filtering is used [59]. FIGURE 2 also shows that the overall brightness level appears to depend on the average surface roughness, since the relative camera position and the relative position of the illumination to the samples do not vary for all images studied in this work. After image capturing, the images are divided into small image patches and subsequently used to train a convolutional neural network that assesses the surface roughness based on individual image patches. Although the origins of CNNs date back to the 1980s, they have recently attracted considerable attention, as high-performance GPU implementations made it possible to train complex CNNs with large numbers of parameters that outperformed many other methods in the most important image recognition contests [38], [60]. CNNs can not only be used for image data, but they also bring certain advantages to these applications, such as translation invariance through weight sharing and local connectivity that takes the spatial structure of images into account [61].
The architecture used for this work is depicted in FIGURE 3 and consists of three types of layers, which are connected consecutively to create a deep neural network model: convolutional layers, fully connected layers, and pooling layers. In the convolutional layer, small filter kernels convolve over the input array to produce class-specific filter responses as layer output. The coefficients of each filter kernel defined in a given convolutional layer are determined during the training process by backpropagating the actual error between the network prediction and the given training data. The output of a convolutional layer can be denoted as follows [62]:

X_d^l = f\left( \sum_{i \in M_d} X_i^{l-1} * K_{id}^l + b_d^l \right)

where X_d^l is the dth output feature map (image) of the lth convolutional layer. On the right side, the ith output feature map X_i^{l−1} of the previous layer l−1 is convolved with the idth kernel K_{id}^l of the current layer. b_d^l denotes the offset (bias), M_d represents the set of input feature maps, and f represents the activation function.
The convolutional layer is frequently followed by a pooling layer to reduce the input dimensions for the following layers by downsampling feature maps from the previous layer. Typical types of pooling layers are max pooling and average pooling. The output x_d^l can be represented by the following equation:

x_d^l = f\left( \delta_d^l \cdot \mathrm{subsample}\left( x_d^{l-1} \right) + b_d^l \right)

where l is the number of the pooling layer, f can be an activation function, δ_d^l denotes the resample factor, subsample(.) represents the downsampling function (e.g., mean or max pooling), and b_d^l is the bias (offset). Pooling, especially max pooling, is a convolution-based operation that is applied to reduce overlapping in feature maps and can help to avoid overfitting and may lead to a more generalized model [63]. During network training, random dropout was used after the pooling layer and in the fully connected region of the network (layer 4) as an additional regularization technique that prevents the network from overfitting.
After concatenating the output of the third layer, the flattened feature vector is used as input for the output layer, using fully connected nodes with softmax activation to classify the image patches into five different surface classes.
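The layer stack described above can be sketched in Keras. The patch size, filter counts, and kernel sizes below are illustrative assumptions, not the exact published architecture; only the overall structure (three convolution/pooling stages, dropout, a fully connected stage, and a five-class softmax output) follows the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_roughness_cnn(patch_size=64, n_classes=5):
    """Comparatively shallow patch-classification CNN (illustrative sketch)."""
    return models.Sequential([
        layers.Input(shape=(patch_size, patch_size, 1)),          # grayscale HDR patch
        layers.Conv2D(16, 5, activation="relu", padding="same"),  # stage 1: convolution
        layers.MaxPooling2D(2),                                   # pooling: downsample
        layers.Conv2D(32, 3, activation="relu", padding="same"),  # stage 2
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),  # stage 3
        layers.MaxPooling2D(2),
        layers.Dropout(0.25),                 # dropout after pooling (regularization)
        layers.Flatten(),                     # concatenate/flatten feature maps
        layers.Dense(128, activation="relu"), # fully connected stage (layer 4)
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),  # five surface classes
    ])

model = build_roughness_cnn()
print(model.output_shape)  # (None, 5)
```

The softmax output yields per-patch probabilities over the five surface classes, which is what the roughness-map assembly step consumes.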
To generate spatially resolved roughness maps, overlapping image patches are extracted by a sliding window approach that resamples the original image using a step size of 16 pixels. The process of extracting image patches from the original layer surface image of a specimen is parallelized and optimized to run on GPU. The roughness class probabilities for the output layer are obtained by feeding 1024 patches per batch as input for the trained CNN-model. The calculation of the roughness map for a single LPBF part surface image (i.e., 3661 × 3617 pixels relating to 49,728 image patches) takes approximately 4.5 seconds (TABLE 1 & TABLE 2) on a NVIDIA GeForce 1080 Ti GPU.
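The sliding-window resampling can be sketched as follows; the 16-pixel step is taken from the text, while the 64-pixel patch size is an assumption for illustration:

```python
import numpy as np

def extract_patches(image, patch_size=64, step=16):
    """Extract overlapping patches with a sliding window (step of 16 px)."""
    h, w = image.shape
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, step):
        for x in range(0, w - patch_size + 1, step):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))  # keep positions to place predictions in the map
    return np.stack(patches), coords

img = np.random.default_rng(1).random((256, 256))
patches, coords = extract_patches(img)
print(patches.shape)  # (169, 64, 64): 13 x 13 window positions
```

Each predicted class is written back at the patch's coordinates, which is how the per-patch outputs become a spatially resolved roughness map.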
To evaluate the performance of the proposed CNN architecture for roughness prediction, a benchmarking scheme as proposed by Narayanan et al. [64] is adopted to evaluate F1-scores as well as training and inference times for different models. For this purpose, state-of-the-art CNN architectures, namely ResNet50 [38] and VGG16 [65], were used for comparison with respect to prediction performance and execution times. First, the pre-trained models have been used in the sense of transfer learning by replacing the original classification heads with a new fully connected head consisting of a hidden layer (i.e., 128 nodes) and the output layer (i.e., 5 nodes). In the case of transfer learning, all other layers were set to be untrainable. In addition, the performance of fully re-trained VGG16 and ResNet50 models with adapted classification head has been investigated.
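The transfer-learning setup described here (frozen backbone, new 128-node hidden layer, 5-node output) can be sketched in Keras. `weights=None` keeps the sketch self-contained and offline; in practice `weights="imagenet"` would load the pre-trained model, and the input shape is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Backbone without its original classification head
base = ResNet50(include_top=False, weights=None,
                input_shape=(64, 64, 3), pooling="avg")
base.trainable = False  # all backbone layers untrainable for transfer learning

# New fully connected head: 128-node hidden layer + 5-node output
model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
print(model.output_shape)  # (None, 5)
```

Only the two dense layers are trained; for the fully re-trained variant one would instead set `base.trainable = True`.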
Based on image characteristics extracted from the individual patches, a support vector machine algorithm [66] with linear kernel was additionally used for roughness classification, representing a classical ML approach. In this regard, four first-order statistical features, namely ''mean'', ''variance'', ''skewness'', and ''kurtosis'', were extracted to describe a given image patch. Additionally, image features based on second-order statistical methods were used in this work. Related to this, Haralick et al. proposed the Gray-Level Co-Occurrence Matrix (GLCM) as an effective method for texture characterization [67]. The GLCM describes the histogram of co-occurring gray levels for a given area within a given image. Several features can be extracted from the GLCM for characterizing the texture of the given image patch. In addition to the aforementioned statistical features, the following GLCM characteristics are utilized for surface classification in this study: ''dissimilarity'', ''Angular Second Moment'' (ASM), ''contrast'', ''homogeneity'', ''energy'', and ''correlation'' [29], [67]. It is worth mentioning that GPUs were used for training and inference of the CNN models, while the classical approach was executed on CPUs.
For each CNN training process, the Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimizer was used to minimize the loss function (i.e., categorical cross-entropy). The number of epochs was determined by tracking the training and validation errors. When it was observed that the validation error stagnated or increased, which is a sign of overfitting, the training was stopped. This resulted in 200 epochs for training the proposed CNN and 65 epochs for the reference CNN architectures. The architectures were implemented using TensorFlow 2.3 [68] and Python 3.6. Compared to other deep CNN architectures such as VGG16 or ResNet, a comparatively shallow network architecture is assumed to be more suitable for the given problem as it prevents overfitting and leads to higher computational efficiency and lower memory requirements.
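The training setup described here (Nadam, categorical cross-entropy, stopping on validation-error stagnation) can be sketched with Keras' `EarlyStopping` callback. Model, data, and the patience value are tiny illustrative stand-ins:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

# Tiny stand-in classifier and random data (illustrative only)
model = models.Sequential([layers.Input(shape=(10,)),
                           layers.Dense(16, activation="relu"),
                           layers.Dense(5, activation="softmax")])
model.compile(optimizer="nadam",                  # Nadam optimizer, as in the text
              loss="categorical_crossentropy",    # loss function used in this work
              metrics=["accuracy"])

rng = np.random.default_rng(3)
X = rng.random((128, 10)).astype("float32")
y = tf.keras.utils.to_categorical(rng.integers(0, 5, 128), 5)

# Stop when the validation loss stagnates or increases (a sign of overfitting)
stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                               restore_best_weights=True)
history = model.fit(X, y, validation_split=0.2, epochs=20,
                    callbacks=[stop], verbose=0)
print("epochs run:", len(history.history["loss"]))
```

In the paper this criterion yielded 200 epochs for the proposed CNN and 65 for the reference architectures.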

B. MODEL-BASED REINFORCEMENT LEARNING FOR PROCESS OPTIMIZATION
Learning to maximize the cumulative future rewards of an agent through active interaction with a given environment essentially describes the framework of RL [54]. As a branch of machine learning, RL allows an agent to choose an optimal action a_t from a pool of candidate actions A in a given state s_t by selecting those actions that maximize the rewards the agent is expected to receive. To express the problem within the RL framework, it is formalized as a Markov Decision Process (MDP) with the following characteristics:
1) An action a_t ∈ A the agent can choose to interact with its environment.
2) The agent's current state s_t ∈ S and the state s_{t+1} ∈ S after taking an action a_t.
3) The state-transition probabilities p(s_{t+1} | s_t, a_t) that action a_t in state s_t will result in state s_{t+1} at the agent's next step.
4) The expected immediate reward r(s_{t+1}, a_t, s_t) received after the state transition (i.e., s_t to s_{t+1}) based on action a_t.

The goal of the MDP is to approximate a decision policy in the form of a function π(s) that defines the optimal action to be selected by the agent based on a given state.
In this work, the agent denotes a software implementation of an RL method that chooses LPBF process parameters based on a given state and a learnt policy. For that, the agent's current state s_t is defined as a tuple s_t = (P_t, v_t, Sa_{mean,t}, δ_t) comprised of the applied laser power P_t and scan velocity v_t as well as the mean surface roughness Sa_{mean,t} and the percentage of defective areas δ_t for a given part surface at time t. The quality metrics Sa_{mean,t} and δ_t are estimated by the CNN model explained in section II-A. In each step, the agent can choose from a defined set of actions:

A = {(up, up), (up, down), (up, none), (down, up), (down, down), (down, none), (none, up), (none, down)}

where each element in the list represents an action tuple that consists of a possible action value for the laser power P_{t+1} and scan velocity v_{t+1} to be applied in the next layer. Each process parameter can take the value up, down, or none, which represents increasing, decreasing, or applying no change to the given process parameter; excluding the no-change pair (none, none) yields the eight actions referred to below.
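The state and action definitions above can be enumerated in a few lines; the exclusion of the (none, none) pair is an assumption consistent with the eight-element action space mentioned later, and the state values are illustrative:

```python
from itertools import product

# Every (laser power, scan velocity) change combination, excluding the
# do-nothing pair (none, none) -- assumed to be the omitted ninth combination.
CHANGES = ("up", "down", "none")
ACTIONS = [(p, v) for p, v in product(CHANGES, CHANGES)
           if (p, v) != ("none", "none")]
print(len(ACTIONS))  # 8

# A state tuple as defined in the text: (P_t, v_t, Sa_mean_t, delta_t)
state = (200.0, 900.0, 8.5, 0.02)  # illustrative values: W, mm/s, um, fraction
```

Keeping the action space this small is what makes the random-sampling planner described next computationally feasible.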
Different RL algorithms can be used to estimate the optimal policy for choosing an action in a given state. The policy can be learnt with or without having a model that approximates the environment, referring to model-based and model-free RL respectively. Both approaches face their own challenges, but each offers unique advantages. Generally, model-based approaches show higher data efficiency and faster learning compared to model-free methods, which on the contrary can be used for a variety of applications and avoid the incorporation of model errors into the learnt policy [69], [70].
Since model-free learners require a larger number of training examples, their attractiveness for real-world applications such as laser materials processing is reduced. Without a corresponding simulation in which the agent can interact with the modeled environment to derive an effective control strategy, model-free learning appears to be disadvantageous.
In MBRL, the system dynamics are modeled as a function f_θ(a_t, s_t) to estimate the next state s_{t+1}. The function is parametrized by θ, and learning it can be formulated as a regression task within the scope of supervised learning that maps a given state s_t and action a_t at time t to the subsequent state s_{t+1} at time t + 1. Based on the dynamics function f_θ(a_t, s_t), the optimal action for a given state can be inferred using sampling-based planning. A simple method based on random sampling is used to generate C candidate random action sequences. The sequences are sampled from a fixed distribution and subsequently evaluated based on f_θ(a_t, s_t), the current state s_t and the expected immediate reward r(s_{t+1}, a_t, s_t), formulated as [54]:

(a_t, ..., a_{t+T−1}) = argmax Σ_{t'=t}^{t+T−1} r(s_{t'+1}, a_{t'}, s_{t'}), with s_{t'+1} = f_θ(a_{t'}, s_{t'}) (5)

Random sampling of action sequences avoids solving Eq. 5 numerically, which may lead to instabilities due to non-linear reward and dynamics functions [71].
Since the action space defined for the problem in this work is comparatively small (8 actions), randomly sampling sequences of consecutive actions (a_t, ..., a_{t+T−1}) is a feasible way to solve the optimal control problem. Based on equation (4), the action sequence that promises the highest expected cumulative reward with respect to the predicted future states, T_horizon time steps ahead, is then selected. Finally, by utilizing the framework of Model Predictive Control (MPC) [72], the first action a_t of the optimized action sequence is chosen for execution. As the agent transitions to a new state s_{t+1}, the MPC-based policy recalculates the optimal action sequence and selects its first action for the next time step. The steps for model-based RL are denoted as Algorithm 1, based on the work of Nagabandi et al. [69].
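This random-shooting MPC loop can be sketched minimally as follows. The interfaces for the dynamics model f_theta and the reward function are placeholders, not the paper's implementation:

```python
import random

def plan_first_action(f_theta, reward_fn, s_t, actions, C=100, T=2, seed=0):
    """Random-shooting MPC sketch: sample C random action sequences of
    length T, roll each out through the learnt dynamics model f_theta,
    sum the predicted rewards, and return the first action of the best
    sequence (which is then executed; replanning follows in s_{t+1})."""
    rng = random.Random(seed)
    best_action, best_return = None, float("-inf")
    for _ in range(C):
        seq = [rng.choice(actions) for _ in range(T)]
        s, ret = s_t, 0.0
        for a in seq:
            s_next = f_theta(a, s)          # predicted next state
            ret += reward_fn(s_next, a, s)  # expected immediate reward
            s = s_next
        if ret > best_return:
            best_return, best_action = ret, seq[0]
    return best_action
```

On a toy one-dimensional dynamics (s_{t+1} = s_t + a_t with reward −|s_{t+1}|), the planner selects the action that drives the state toward zero.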
As a reference method, a model-free RL approach based on Q-learning is used for comparison. The algorithm is based on the Bellman equation as a simple value iteration update, combining the old and new value using a weighted average [54]:

Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α (r_t + γ max_a Q(s_{t+1}, a)) (6)

where α represents the weight formally known as the learning rate, r_t denotes the expected immediate reward, and γ max_a Q(s_{t+1}, a) describes the maximum reward that can be obtained from state s_{t+1} based on all available actions a, discounted by γ. The algorithm was implemented in Python 3.6 and the further hyperparameters were set to γ = 0.9 and α = 0.3. For both algorithms, epsilon-greedy (ε = 0.1) action selection was used during training.

Algorithm 1: Model (Random Forest) Based Reinforcement Learning
1: Gather dataset D_ini of experimental data (e.g., random policy)
2: Initialize empty dataset D_real and Random Forest with n estimators
3: for layer = 1 to layer_max do
4:   Train f_θ(a_t, s_t) based on the random forest algorithm [66] and datasets D_ini and D_real
5:   for t in T_horizon do
6:     Obtain the current state defined as s_t = (P_t, v_t, Sa_{mean,t}, δ_t)
7:     Estimate the optimal action sequence (a_t, ..., a_{t+T−1}) using f_θ(a_t, s_t) and Eq. 5
8:     Apply the first action a_t from the estimated optimal action sequence
9:     Add data point ((a_{t−1}, s_{t−1}), s_t) to dataset D_real
10:   end for
11: end for
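A minimal tabular sketch of this value-iteration update and the epsilon-greedy selection follows. The dictionary-based Q-table is an assumption; the paper does not describe its implementation:

```python
import random
from collections import defaultdict  # Q-table defaulting to 0.0

def q_update(Q, s, a, r, s_next, actions, alpha=0.3, gamma=0.9):
    """One tabular Q-learning step (Eq. 6): the old value and the
    bootstrapped target are combined with a weighted average."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]

def epsilon_greedy(Q, s, actions, eps=0.1, rng=random):
    """Epsilon-greedy action selection used during training."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])
```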

C. REWARD AND ENVIRONMENT SPECIFICATION
The reward function has a major impact on the agent's learning success and is defined in a way that encourages the agent to take actions that result in a high cumulative reward and to avoid actions that lead to a low or negative reward.
The agent used in this work receives a negative numeric reward for the current state s_t = (P_t, v_t, Sa_{mean,t}, δ_t) if the percentage of surface defects δ_t detected by the CNN in the current surface image is higher than 10 %. At this point, a human expert would have to decide whether the build process should be stopped and modified or aborted. From the agent's perspective, the optimization episode terminates with a negative reward of -10,000 for the most recent action applied.
A positive reward of 2,000 is given for a predicted roughness value Sa_{mean,t} below 4 µm in combination with defective surface areas smaller than 10 %. At this point, the optimization episode is terminated, as the current state is considered optimal. In addition, a continuous reward function is formulated as:

r_Sa(a_t, s_t) = w_Sa · (1 / Sa_{mean,t}) (7)

with Sa_{mean,t} = f_θ(a_t, s_{t−1}), where Sa_{mean,t} represents the continuous mean surface roughness of the part and w_Sa (w_Sa = 150 in this work) denotes an experimentally determined weight factor to adjust the overall reward behavior.
To generate continuous roughness values Sa_{mean,t} based on the predicted roughness classes, the following empirical formula is defined:

Sa_{mean,t} = Σ_i p_i · c_i (9)

where p_i denotes the relative number of pixels belonging to class i within a given image and c_i states an experimentally derived scaling factor for each class. The reward function above provides rewards that grow exponentially with decreasing surface roughness values, which are normalized between 0 and 1. This additionally encourages the agent to find optimal actions that allow transitions to a state with particularly low surface roughness.
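Assuming the empirical mapping is a class-fraction-weighted sum and the continuous reward is reciprocal in the roughness (both are assumptions based on the description above; the c_i values below are illustrative, not the experimentally derived ones), the two terms can be sketched as:

```python
def sa_mean_from_classes(p, c):
    """Continuous roughness from predicted class fractions:
    Sa_mean = sum_i p_i * c_i, where p_i is the relative number of
    pixels in class i and c_i a per-class scaling factor."""
    assert abs(sum(p) - 1.0) < 1e-6, "class fractions must sum to 1"
    return sum(pi * ci for pi, ci in zip(p, c))

def reward_sa(sa_mean, w_sa=150.0):
    """Continuous reward term: lower roughness yields a sharply
    higher reward (w_sa = 150 in this work)."""
    return w_sa / sa_mean
```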
In our setting the agent interacts with a simulated environment based on data from the experiments. Based on an action the agent takes in a certain state, the simulation provides the agent a new state considering the next possible parameter change available within the experimental dataset.
In this work, the upper limit for the laser power is 450 W and the lower limit is 150 W, with a step size of 75 W. The lower limit for the scan velocity is 667 mm/s, the upper limit is 2000 mm/s; possible values are 667, 800, 1000, 1333, and 2000 mm/s. In case a transition is not possible due to an invalid action selection (i.e., a parameter is out of range), the state does not change, and the agent eventually has to learn an alternative action to reach the maximum reward. Although the described setup allows building parts using 25 combinations of different laser powers and scan velocities, the experimental data in this preliminary study contain only nine process parameter combinations. However, for each combination a part was built with two different heights and in at least four different positions on the build platform. If more than one data point is available for a new parameter state requested as a result of an action taken (e.g., parts with P_l = 300 W and v_s = 1000 mm/s were built at four different positions on the build platform for two different layer heights), the simulation randomly chooses a part that fulfills the requested state-action transition. Consequently, based on the part's surface image evaluation via the CNN model, it provides the full state s_t = (P_t, v_t, Sa_{mean,t}, δ_t) to the agent. Consistent with the high variances in the estimated quality metrics reported in FIGURE 10, the agent must cope with uncertainty in state-action transitions due to the influence of the part position and layer height within the build chamber and other unknown effects on surface roughness.
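The parameter grids and the invalid-transition rule can be captured in a short sketch (a hypothetical helper, not the paper's simulation code):

```python
# Discrete process-parameter grids used in the experiments.
POWERS = [150, 225, 300, 375, 450]         # laser power [W], step 75 W
VELOCITIES = [667, 800, 1000, 1333, 2000]  # scan velocity [mm/s]

def step_param(grid, value, move):
    """Apply one action component ("up", "down", "none") on a grid.
    Invalid moves (out of range) leave the parameter unchanged."""
    i = grid.index(value)
    if move == "up" and i + 1 < len(grid):
        return grid[i + 1]
    if move == "down" and i > 0:
        return grid[i - 1]
    return value  # "none" or invalid transition: state does not change
```

An "up" action at the upper power limit of 450 W, for example, leaves the state unchanged, so the agent has to learn an alternative action.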
It is worth mentioning that the current environment implementation assumes that a parameter change is independent of the surface texture in the previous layer. This might be true for small changes of process parameters, but it is not necessarily true in the case of strong surface distortions and defects, which may require several layers to heal or may not heal at all.

D. RANDOM FOREST FOR DYNAMICS FUNCTION APPROXIMATION
While neural networks [69] and Gaussian Process regression [57] were recently used for data-driven modeling of process dynamics, in this work an ensemble technique is proposed to approximate the dynamics function f_θ. Random Forest regression is chosen to establish a dynamics model that predicts the next state s_{t+1} based on the current state s_t = (P_t, v_t, Sa_{mean,t}, δ_t) and a given action a_t. The values for the next process parameters P_{t+1} and v_{t+1} can be calculated directly from the given action. For the remaining state members, two different RF regression models were trained separately, one to predict the surface roughness Sa_{mean,t+1} and the other to estimate the percentage of surface defects δ_{t+1}. Starting with FIGURE 4, a brief introduction to random forests is given. A random forest (RF), introduced by Breiman, can be considered a combination of a specific number k of decision trees, typically a few hundred to several thousand, where each tree represents a single non-parametric regression model. The final regression output is obtained by aggregating the predictions of the participating decision trees (averaging for regression, majority voting for classification). RF uses bootstrap aggregating (bagging) and a random feature subspace for each decision tree to increase the level of randomness [73]. This procedure can be interpreted in such a way that each tree in the RF focuses on a slightly different aspect of the data set, leading to different expert models, each contributing a vote to the final prediction.
Based on this strategy, RF turns out to be a very competitive alternative to many other regressors, including linear regression, support vector regression and neural networks [74]. The advantage of combining randomized decision trees and bagging into a random forest is a potentially more robust model, which achieves a better generalization error [75]. In summary, the main advantages of RFs for dynamics functions are sound resistance against overfitting, insensitivity to prior feature selection and lower outlier sensitivity during training [76].
The number of trees k (i.e., 100 in this work) of which the forest consists is an important parameter for the final regression performance. Additionally, mean squared error is used as a metric that measures the quality of a split during the generation of a decision tree. An efficient RF implementation based on scikit-learn was used in this work [66].
The dynamics function can be learnt using data from previous experiments in the form of tuples D_ini = {((a_t,ini, s_t,ini), s_{t+1,ini})}_{t=1}^{N} that contain information about the state, action and resulting-state relationship in the form of an initial data set D_ini. Additionally, as the agent gathers new experience while interacting with the real environment, the new data D_real = {((a_t,real, s_t,real), s_{t+1,real})}_{t=1}^{N} can be used to retrain the dynamics function based on the aggregated data set D_ini ∪ D_real to increase the model accuracy and continuously adapt it towards real-world (changing) process dynamics.
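A sketch of this two-regressor dynamics model with scikit-learn follows. The feature layout and the synthetic training data are assumptions for illustration; in the paper the training tuples come from D_ini and D_real:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the (state, action) -> next-state data:
# columns are [P, v, Sa_mean, delta, a_P, a_v] (normalized, illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 6))
y_sa = X[:, 2] * 0.8 + X[:, 4] * 0.1  # toy next-layer roughness
y_delta = X[:, 3] * 0.9               # toy next-layer defect fraction

# One RF regressor per predicted quality metric, k = 100 trees as in
# the paper; mean squared error is the default split criterion.
f_sa = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_sa)
f_delta = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_delta)

s_next_sa = f_sa.predict(X[:1])  # predicted Sa_mean for the next layer
```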

III. EXPERIMENTAL SETUP AND DATA PREPROCESSING
According to the experimental investigations mentioned in section I, the surface roughness significantly depends on the VED, a measure for the energy input related to a specified material volume. The energy density can be calculated by the following formula [20]:

VED = P_l / (v_s · y_s · D_s)

where VED stands for the volumetric energy density [J/mm³], P_l for the laser power [W], y_s (90 µm = const.) for the distance between two neighboring laser scanning paths [mm], v_s for the scan velocity [mm/s] and D_s (40 µm = const.) for the layer thickness [mm]. The variation of the energy density in this work is realized by adjusting the scan velocity and laser power. As a reference, laser power and laser scanning speed are used as 100 % at 300 W and 1000 mm/s respectively, resulting in a VED of 75 J/mm³. For the production of different cubic samples, the laser power and laser scanning speed vary, so that the relative VED changes from 50 % to 150 % in steps of 25 %. Based on this, 44 cubic specimens (10 × 10 × 10.96 mm³) of identical height (274 layers) were built on the first platform using nine different sets of process parameters in at least four different and randomly selected positions, as shown in FIGURE 5. The parts on the second platform were built using the same process parameters and positions on the platform; however, the build job was stopped in layer 121, resulting in additional 44 cubic samples of size 10 × 10 × 4.82 mm³. Overall, 88 cubic samples with the same surface area size were built, representing 18 unique parameter combinations. As shown in FIGURE 10, the experimentally determined LPBF process window for this setup is narrower (i.e., 75 +/− 10 J/mm³) than the range covered by the combinations (i.e., 35 to 112.5 J/mm³). However, the larger range was chosen so that the optimization algorithm can also explore critical process situations that can lead to surface deviations and low part density.
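The energy density formula can be expressed directly (with units as above, W, mm/s and mm give J/mm³):

```python
def ved(P_l, v_s, y_s, D_s):
    """Volumetric energy density [J/mm^3]:
    VED = P_l / (v_s * y_s * D_s), with P_l the laser power [W],
    v_s the scan velocity [mm/s], y_s the hatch distance [mm] and
    D_s the layer thickness [mm]."""
    return P_l / (v_s * y_s * D_s)
```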
The material chosen for all samples built for this study is Inconel 718 and a meander-based scan strategy with layer-wise scan vector rotation of 90 degree was applied.
After processing the parts, HDR images were generated as explained in section II-A. For this preliminary study, the decision was made to take the images after removing the parts from the build chamber in order to investigate the influence of high-resolution images that are difficult to obtain from a setup within the build chamber. The imaging system comprises an industrial camera (LUCID Vision TRI200S, 20 MP, mono) with a pixel size of 2.74 µm and a 105 mm optical lens. After calibration of the lens, the field of view is approximately 12.34 mm × 8.26 mm with a resolution of approximately 2.26 µm/pixel. The depth of field is determined to be +/−1.1 mm.
In addition to the HDR image patches, a measurement of the local top surface roughness of the LPBF samples is required as input information for training the CNN. Instead of using the common line-related Ra or Rz values, the surface roughness is characterized in this work by area-related mean arithmetic height Sa (Eq. 1).
For roughness measurement, a white light interferometer (WIM), namely a Zygo NewView 7300, was used. The component surface was magnified ten times, which results in a resolution of 1.14 µm and a measurement area of 0.73 × 0.54 mm². The roughness measurement was carried out at 16 reference points according to the definition in FIGURE 5. For the class ''surface distortion'', an expert carefully analyzed the LPBF parts together with the related surface image data. In case of visible surface distortions in the form of height deviations greater than 1 mm, the corresponding surface areas were annotated accordingly. To create a database for the CNN model to be trained, each surface image from the total of 88 LPBF cube surfaces is first divided into sixteen sub-regions of size 1.7 × 1.7 mm². The center point of each sub-region is the same as for the WIM measurement, as depicted in FIGURE 5.
Subsequently, small image patches (96 × 96 pixels) are extracted from the sub-regions using a sliding window. To prevent the model from learning only features that are highly directional, data augmentation such as mirroring and 90-degree rotation is used. The CNN for roughness classification is trained on 130,048 annotated HDR image patches extracted from data of 67 specimen layer surface images. The evaluation of the CNN model is based on 4-fold cross validation, and the final evaluation of the overall framework is performed only on data from the remaining 21 parts. In a last step, the density of the manufactured parts is evaluated. For this purpose, the parts are separated from the building platform by means of wire erosion. For evaluation of the relative density, the samples are cut in half for the preparation of polished cross sections. For each cross section, five microscopic images with a magnification factor of 50 are acquired and analyzed with respect to porosity by applying threshold analysis, which yields a mean value of the samples' relative density.
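The patch extraction and augmentation steps can be sketched as follows (the sliding-window stride is an assumption, as the paper does not state it):

```python
import numpy as np

def extract_patches(img, size=96, stride=48):
    """Sliding-window extraction of size x size patches from a
    2-D grayscale image (stride chosen here for illustration)."""
    h, w = img.shape
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def augment(patch):
    """Mirroring and 90-degree rotations to remove directional bias."""
    return [patch, np.fliplr(patch), np.flipud(patch),
            np.rot90(patch, 1), np.rot90(patch, 2), np.rot90(patch, 3)]
```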

A. ROUGHNESS PREDICTION
The following section first shows the model's performance assessed during the training procedure. Subsequently, the roughness prediction model is validated based on unseen LPBF surfaces. In FIGURE 7, the accuracy of the proposed CNN model during training over 200 epochs is depicted. The graphs on the left side show the 4-fold cross-validated training accuracy for different input patch sizes. As expected, the accuracy increases with increasing input patch size and reaches a high overall classification accuracy (>90 %) for input patches of size 96 × 96. Although the patch size could be increased further, memory consumption reaches the hardware limit (without further optimization), while the performance benefit is marginal. When using the trained CNN model in real-world scenarios, large input patches would also require larger surface areas with constant roughness for correct classification, which does not meet real-world requirements. Therefore, the input size is set to 96 × 96 pixels in this work. The same graph shows the training performance based on HDR image patches (purple line) and LDR patches (brown line) as input. The HDR version finally outperforms the LDR version by more than 12 %.
The graphs on the right part of FIGURE 7 show the dependency of the model accuracy on the image resolution. For each graph, the original input patches of size 96 × 96 pixels and 2.26 µm/pixel are downsampled by the given percentage factor and subsequently upsampled to meet the required CNN target input shape of 96 × 96 pixels. The combination of down- and upsampling approximates the influence of the optical and digital image resolution on model accuracy. When downsampling, the bilinear sampling filter kernel is scaled to properly anti-alias the input image signal [68]. The results show that image downsampling to 20 % (approx. 11.32 µm/pixel) of the original resolution is possible while the classification performance remains moderate (>78 %). This supports the thesis that low-frequency image information (i.e., the surface waviness due to scan vector overlapping) is important for surface roughness classification. According to FIGURE 7, a spatial image resolution of at least 5.66 µm/pixel, representing 40 % of the original resolution, is recommended for high roughness classification accuracies (>80 %).
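The down/upsampling procedure of this resolution study can be approximated with a simple helper. Block averaging stands in here for the paper's scaled bilinear anti-aliasing filter, and nearest-neighbour repetition for the upsampling, so this is a rough sketch rather than the exact procedure:

```python
import numpy as np

def degrade(patch, factor):
    """Downsample a square patch by an integer factor (block averaging
    as a simple anti-aliasing stand-in) and upsample back to the
    original shape by pixel repetition. Assumes the factor divides
    the patch size (e.g., 96 with factor 2, 3, 4, ...)."""
    n = patch.shape[0]
    small = patch.reshape(n // factor, factor, n // factor, factor).mean(axis=(1, 3))
    return np.kron(small, np.ones((factor, factor)))
```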
For different input image types (i.e., HDR and LDR), the comparison of the proposed CNN-based surface classification with the reference SVM approach is described in TABLE 1. HDR imaging shows its benefits for both roughness estimation methods. Moreover, the proposed CNN architecture clearly outperforms the classical ML approach that uses statistical and texture feature extraction in combination with support vector machines for classification.
When the proposed CNN is compared with established CNN models, as depicted in TABLE 2, it is shown that the highest F1 score of 0.94 can be obtained with a fully re-trained VGG16 with an adapted classification head. The performance of the VGG16 and ResNet50 models using weight transfer is also shown in TABLE 2. It is assumed that the comparatively lower prediction performance is due to large differences in the shape of the images on which the models were originally trained (i.e., 224 × 224 ×3 pixels) and the image patches used in this work (i.e., 96 × 96 pixels). Additionally, the domain gap between the original dataset (i.e., ImageNet) and the dataset used in this work could play an important role. Furthermore, due to their higher complexity, the established models require more time for training and inference. In particular, the high inference time of the VGG16 of 25.1 s per surface image is too high for real-time inter-layer process optimization. Since powder recoating is required after each layer is processed, the time to move the slider (i.e., recoating time) can be used to derive quality metrics and calculate new process parameters.
For the LPBF machine used in this study, the recoating time is determined to be 5.3 s. This comparison shows that the proposed CNN allows the estimation of the surface roughness within 5.3 seconds and is therefore suitable with respect to the time constraints of this application. FIGURE 8 shows the cumulative confusion matrix for the proposed CNN, based on four different cross-validation test folds, for surface classes 3, 4 and 5. All presented methods have difficulties distinguishing class 1 and class 2 from each other, which also appear visually similar according to the sample patches in FIGURE 6.
Moreover, since the more complex reference CNN models have higher F1-Scores, this may indicate that the proposed CNN slightly underfits the data. However, due to further timing constraints (e.g., recoating time between layers), possible underfitting caused by reduced model complexity is tolerated to achieve low inference times. Nevertheless, an overall accuracy of 91 % and a comparatively fast inference time of 4.5 s appear promising for the proposed CNN-based roughness prediction model, which is therefore used for further analysis.
After training the neural network, the obtained predictive model is used to classify roughness patches extracted from unseen part surface images into the defined categories to generate a surface roughness map. Examples of the surface classification results are depicted in FIGURE 9. Sample number one was built using 375 W laser power and a scan velocity of 2000 mm/s. The positions where the roughness was determined via WIM are shown as circles in the original surface image. Due to high directional dependencies, the surface is divided into four equal-sized segments and the mean roughness measurements are shown for each segment. To the right of the original image, the roughness map derived from the CNN predictions is shown. The CNN model classifies most (>97 %) of the surface into category four, which represents roughness values within a range of 12 to 20 µm. Although this is potentially a large range for roughness classification, the result is consistent with the roughness values measured by WIM. Regarding surface texture, the original image and the predicted roughness map show high texture homogeneity for the given surface, which also matches the WIM measurement results for each segment. To the left of the original image of sample one, a low-pass filtered false color version of the original image is shown. As high image frequencies (i.e., sharp edges and corners due to pixel values that change fast over space) are suppressed, the surface waviness due to the meander-based scan strategy (i.e., horizontal track overlapping) becomes visible.
In case the same filtering procedure is applied to the second sample, the scan vector related waviness appears more clearly, especially at the horizontal edges of the cube's surface. Compared to the first part, sample two was built at increased VED (93.75 J/mm 3 ), which leads to a completely different image texture with less overall surface texture homogeneity. The predicted roughness values for segment 2 and segment 3 (center region) are categorized into class 1, which matches the measured roughness values for those sections. For segment 1 and 4, the border region, most pixels are categorized into class 2, which represents roughness values of 3-4.99 µm (Sa).
The WIM measurement for these sections confirms the prediction and shows increased roughness values for the border regions. In the right bottom corner of the surface image, a surface distortion leads to the partial categorization of pixels in section 4 into class 5. The surface distortion can be described as a strong deviation in height (i.e., +/−1-5 mm) from the normal surface and leads to a blurring effect due to a limited depth of focus that is also visible in the original image. In the low-pass filtered image of part five, the defective area shows less waviness and higher pixel values, which is highlighted by a higher proportion of low image frequencies in this area. It is assumed that a positive correlation exists between the frequency of occurrence of this type of surface distortion and increasing volumetric energy density.
When evaluating sample three, which was built with increased VED (i.e., 112.5 J/mm³), strong deviations in surface height can be observed by visual inspection that are also visible in the original HDR surface image. The blurred surface areas lead to a categorization by the CNN model into class 5 for most of the border regions of the image.
Due to the high surface deviations, roughness measurements could only be performed in the center region. In this case, the roughness values were obtained at four different positions. The results range between 9.82 and 12.02 µm (Sa), which is only partially consistent with the prediction by the CNN. Instead, the roughness prediction for these positions results in an increased number of pixels belonging to classes 2, 3, and 4. Compared to the other parts, the low-pass filtered image reveals a high degree of inhomogeneity, which corresponds to the occurring surface deviations.
In addition to the evaluation of single samples, the overall results for all parts are shown in FIGURE 10. The measured and predicted relative amount of defective surface areas of samples within different groups of volumetric energy densities is depicted in FIGURE 10 -a). Compared to the ground truth, which is based on the defect annotations of a human expert, the prediction results for each VED group are in good alignment. Both graphs show increased probabilities for surface defects at high VEDs. At the same time, no clear correlation can be established between the predicted amount of defective surface areas within a given surface image and the measured overall part densities.
In FIGURE 10 -b), the relationship between the surface roughness, the part density and the VED is shown.
For easier comparison, continuous roughness values are obtained from the predicted roughness classes by using Eq. 9.
It is shown that the measured roughness values correspond to the predicted values, which, however, reveal higher standard deviations. The measured part density shows a negative correlation with the surface roughness, especially for parts with lower VED. This is also supported by FIGURE 10 -c), which plots the surface roughness over the part density. It is shown that low surface roughness correlates with high part density. According to this figure, if the part density is to be above 99.75 %, the surface roughness has to be kept below 7.5 µm. It must be mentioned that both quality indicators (i.e., roughness and part density) are strongly influenced by the VED. Overall, it can be seen from the experiments that high surface roughness values indicate decreased part density, which needs to be avoided due to decreasing mechanical properties. The VED can be increased up to a certain level, whereby the surface roughness decreases and the part density remains at a high level. However, if the VED exceeds a threshold value, in this case 93.75 J/mm³, the probability of surface defects due to deformations caused by overheating effects increases rapidly. Furthermore, the maximum VED is particularly dependent on the material used and the component geometry.

B. PROCESS OPTIMIZATION
Using MBRL, the predicted surface roughness and the predicted amount of defective surface areas are leveraged to optimize part quality. For that, the MBRL agent is trained based on Algorithm 1. The dataset D_ini for initializing the dynamics function is based on ten different surfaces from parts built in previous experiments. In the next step, 50 % of the experimental data was used to train the agent to choose optimal actions (i.e., the process parameters for the next layer). FIGURE 11 shows the results of the training process, where each step represents an interaction of the agent with the environment. Training was stopped after 1000 steps since the MBRL algorithm showed significant convergence, as the moving average over 100 steps, which was chosen due to the high variance in the reward signal, stopped increasing. The reward received by the MBRL agent starts at a higher value and converges to a higher total reward faster than the reference implementation (Q-learning). Since the dynamics function is initialized with data from previous experiments, the MBRL agent is able to choose actions better than by random choice at the beginning of the training process. After each step, the dynamics function is updated with new data generated through interaction with the environment, which further improves the agent's performance. In contrast, the Q-learning algorithm starts by randomly choosing actions at the beginning and updates its policy after each step.
It should be mentioned that the Q-learning agent could also be initialized with data from previous experiments. However, the effort to build a suitable environment for the agent to interact with, including the necessary rewards, makes the Q-learning approach less efficient than MBRL. Creating an initial dataset for training the dynamics function is comparatively inexpensive and comes with the additional benefit of not being biased towards the reward function used. Finally, both agents can improve the surface roughness and reduce the number of defective areas; however, the MBRL approach reaches the defined goal significantly faster, as shown in FIGURE 11.
After the training procedure, the MBRL approach is evaluated on unknown top surfaces of LPBF cubic samples. The environment starts with a randomly selected surface image from a list of 21 LPBF surfaces that either shows a measured mean surface roughness value above 4.0 µm or is labeled as ''surface distortion''. After the mean surface roughness Sa_{mean,t} and the percentage of defective surface areas δ_t are predicted by the CNN, the agent must choose an optimal action or action sequence to achieve the optimization goal. An example of an optimization sequence is given in IV-C. The trained MBRL agent starts with a state representing low laser power (150 W), medium scan velocity (1000 mm/s) and a measured mean surface roughness of 13.04 µm. After assessing the current state s_t = (P_t, v_t, Sa_{mean,t}, δ_t), the agent chooses the action with the highest expected reward, estimated by using the dynamics function (Eq. 5) in combination with the reward function (Eq. 7).
The MBRL approach is able to reduce the average surface roughness to 2.42 µm during two optimization steps by increasing the laser power to 300 W and decreasing the scan velocity to 667 mm/s. The roughness prediction suggests slightly higher roughness values in both cases. The predicted amount of defective surface areas increases from 0.06 % to 1.5 %, which is probably due to the uncertainty of the CNN model regarding the prediction at the surface edge. In both optimization steps, the optimal action is the one that maximally increases the VED (i.e., (P_up, v_down)), which agrees with the relationships depicted in FIGURE 10.
An overall evaluation of the 21 LPBF surfaces optimized by the approach presented in this work is given in TABLE 3. The mean surface roughness Sa of components was 10.40 +/− 5.44 µm.
Five out of 21 components are labeled as ''surface distortion''. Each surface was set as starting point for an individual optimization sequence based on the combination of roughness evaluation through CNN-model and parameter optimization using the trained MBRL agent. It is shown that the mean surface roughness Sa mean,t can be reduced to an average value of 3.38 µm. The percentage of defective surface areas can be lowered to an average of 3.2 %. Moreover, it takes 1.97 optimization steps on average to reach the optimum state in this setup.

C. DISCUSSION
Since the raw HDR image of each surface must be interpreted in terms of the quality metrics to be optimized, the performance of the CNN-based surface classification is of particular importance. Although the CNN architecture can be used for regression tasks and would therefore allow predicting the roughness values directly, the roughness values were first binned into specific roughness classes. There are two reasons for this. First, during development and testing it was shown that the regression models turned out to be unreliable compared to the classification models. This might be due to uncertainty in the roughness measurements and the lack of precise alignment between the individual patches and the measurement positions. The second reason is the integration of the class ''surface distortion'', which does not represent specific roughness values as would be required for building a regression model. Therefore, in this work, the roughness estimation problem was formulated as a classification task, which allows a rather coarse categorization of the investigated surfaces. FIGURE 10 shows that the overall measurement trends are reproduced by the image-based roughness estimation. However, the increased standard deviations of the predicted variables show significant uncertainty in classification performance, which could be further improved by incorporating more training data from different layers. Nevertheless, the predicted roughness maps can be used by the MBRL algorithm to improve the previously mentioned surface quality metrics.
The dynamics function, which estimates the future state based on the current state and a given action, plays a key role. Although a Random Forest (RF) regressor is ultimately used in this work, different function approximators such as multi-layer perceptrons, Gaussian Process regression, and Support Vector Regression were evaluated during development. Based on the results, it is assumed that the RF is especially suited to model the process dynamics due to its robustness, which stems from the ensemble of individual decision trees, and its ability to handle nonlinear parameters efficiently. The output of the RF regressor is used together with the reward function to select the most promising action sequences. An important parameter for estimating the expected rewards is the prediction horizon T_horizon. Experiments have shown that a planning horizon longer than two steps leads to decreased optimization performance. It is assumed that the uncertainty introduced by the RF-based state prediction, combined with the uncertainty of the roughness estimation via the CNN, leads to unreliable estimates of the expected future rewards, which grow substantially with the length of the prediction horizon (i.e., propagation of uncertainty).
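The planning loop described above can be sketched as follows: an RF dynamics model predicts the next state from (state, action), all discrete action sequences of length T_horizon are rolled out, and the sequence with the highest accumulated reward determines the next action. The toy transition data, state definition, and reward function below are illustrative assumptions, not the study's actual implementation:

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy transition data: state = (Sa roughness, defect fraction), action = index 0..6.
# In the study, such data would come from the experimental database.
rng = np.random.default_rng(0)
X = rng.uniform([2.0, 0.0, 0], [12.0, 0.3, 6], size=(200, 3))  # (Sa, defect, action)
y = np.column_stack([
    X[:, 0] * 0.8 + (X[:, 2] - 3) * 0.3,   # next Sa: actions shift roughness
    np.clip(X[:, 1] * 0.9, 0.0, 1.0),      # next defect fraction decays
])
dynamics = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def reward(state):
    """Illustrative reward: low roughness, heavy penalty above 10 % defects."""
    sa, defect = state
    return -sa - (100.0 if defect > 0.10 else 0.0)

def plan(state, horizon=2, n_actions=7):
    """Score every action sequence of length `horizon` under the RF
    dynamics model and return the first action of the best sequence."""
    best_action, best_return = None, -np.inf
    for seq in itertools.product(range(n_actions), repeat=horizon):
        s, ret = np.asarray(state, dtype=float), 0.0
        for a in seq:
            s = dynamics.predict([[s[0], s[1], a]])[0]  # one-step prediction
            ret += reward(s)
        if ret > best_return:
            best_return, best_action = ret, seq[0]
    return best_action

print("first action of best sequence:", plan([9.0, 0.05]))
```

With seven actions and a horizon of two, only 49 rollouts are needed, which is why an exhaustive search remains tractable; the model-error accumulation over longer rollouts is the propagation-of-uncertainty effect noted above.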
In this work, the agent's action space is restricted to seven actions, and the state space is indirectly restricted as it depends on the experimental database. These restrictions might help to achieve faster convergence during training, since fewer combinations from the state-action space need to be visited and learned. In real-world scenarios, such restrictions could also be useful at the beginning of an LPBF build process, when training data is scarce and a rough estimate of the optimal process parameters is already advantageous. As the amount of data increases, the state and action spaces should be dynamically expanded to refine the outcome of the optimizer.
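One simple way to realize such a dynamically refined action space is to start from a coarse, symmetric grid of parameter adjustments and to increase its resolution as process data accumulates. The sketch below uses VED adjustments as the action semantics purely for illustration; the study does not specify its action encoding:

```python
def make_action_space(n_actions=7, max_delta=30.0):
    """Discrete actions as symmetric parameter adjustments (here: ΔVED in J/mm^3).

    A coarse grid with few actions is used early in the build; as more
    process data becomes available, n_actions can be increased (and
    max_delta reduced) to refine the optimizer's resolution.
    """
    step = 2 * max_delta / (n_actions - 1)
    return [round(-max_delta + i * step, 3) for i in range(n_actions)]

print(make_action_space())            # coarse grid: 7 actions, ±30 J/mm^3
print(make_action_space(13, 15.0))    # refined grid: 13 actions, ±15 J/mm^3
```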
By optimizing the surface roughness, the proposed framework indirectly optimizes the component density, as high surface roughness correlates with low component density and low VED according to FIGURE 10. This relationship is physically plausible, since a low VED leads to insufficient fusion between the current and the previous component layer. The resulting inter-layer lack of fusion can reduce the component density through gas or material inclusions. In addition, increased fluctuations within the laser-powder interaction zone, which occur more frequently at low VED, can simultaneously lead to high surface roughness.
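For reference, the volumetric energy density commonly used in the LPBF literature combines laser power, scan speed, hatch spacing, and layer thickness; the symbol names below follow this standard convention and are not taken from the paper:

```python
def volumetric_energy_density(power_w, scan_speed_mm_s, hatch_mm, layer_mm):
    """VED = P / (v * h * t) in J/mm^3 -- the common LPBF definition:
    laser power P [W] over scan speed v [mm/s], hatch spacing h [mm],
    and layer thickness t [mm]."""
    return power_w / (scan_speed_mm_s * hatch_mm * layer_mm)

# Example with plausible parameters: 200 W, 800 mm/s, 0.10 mm hatch,
# 0.03 mm layer thickness -> about 83.3 J/mm^3.
print(round(volumetric_energy_density(200, 800, 0.10, 0.03), 1))
```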
Regarding the generalizability of the approach, the authors assume that the proposed technique can be applied to many other 3D printing devices and materials where layer-wise surface observation by optical imaging is possible. In some applications, the execution times could be more critical and therefore further optimization of the inference time should be considered. It is also assumed that the CNN model for roughness estimation and the RL agent, which is responsible for optimizing the process parameters, need to be updated based on new process data.

V. CONCLUSION
In the presented work, a new approach based on HDR imaging combined with CNNs and model-based RL for inter-layer quality optimization of LPBF processes is proposed. The targeted quality metrics in this work are the current layer's surface roughness as well as the percentage of distortion on the layer's surface. Although this is a preliminary study, the intermediate results indicate that the framework has the potential to be successfully applied in industrial LPBF processes. The following experimental results encourage continuing and improving the demonstrated concept:
1) Surface roughness classification based on optical imaging and deep neural networks outperforms a classical ML approach using statistical texture features under the same image resolution and dynamic range conditions by more than 20 % in F1-score.
2) HDR imaging increases the classification accuracy by more than 12 % compared to its LDR counterpart. Experiments indicate that an image resolution of at least 5.66 µm/pixel is required for roughness classification accuracies greater than 80 %.
3) Based on measured surface roughness data, the negative correlation between surface roughness and volumetric energy density could be reproduced using the image-based roughness predictions.
4) Moreover, the experimental evaluation supports the assumption that low surface roughness correlates with high component density.
5) For 21 unknown LPBF surfaces, the proposed MBRL approach finds optimal process parameters, resulting in high rewards and a low average surface roughness of 3.38 µm (over 21 parts), obtained faster than with the Q-Learning reference implementation. At the same time, the MBRL effectively avoids actions that would result in a high percentage (>10 %) of predicted defective surfaces and thus be penalized by negative rewards.
While the outcome of this work appears promising, future work should address a real-time implementation of the proposed framework that enables quality assessment and process optimization during the build of more complex components. To improve in-situ quality assessment, additional sensors, such as high-speed coaxial pyrometers or thermographic cameras, can be integrated to provide the RL algorithm with additional information for deriving optimal control decisions at each layer. In addition, further improvements to the framework can be expected if viable physical simulations are provided as training environments. With the help of transfer learning, the knowledge from physical simulations can be used to create high-performance optimization strategies that require less experimental data.
LUKAS MASSELING received the M.Sc. degree in mechanical engineering from Ruhr-Universität Bochum, Germany, in 2015. He is currently pursuing the Ph.D. degree in mechanical engineering with the Fraunhofer Institute for Laser Technology ILT, Aachen, Germany. From 2015 to 2021, he was a Research Assistant with the Fraunhofer Institute for Laser Technology ILT. Since 2019, he has been the Co-Founder and the CTO of Aixway 3D, Aachen. His research interests include the development of LPBF, in particular the µ-LPBF process for the production of particularly small and high-resolution components.
EMIL DUONG received the master's degree in physics from RWTH Aachen University, Germany, in 2016. He is currently pursuing the Ph.D. degree in mechanical engineering with the RWTH Aachen Chair for Laser Technology LLT. Since 2016, he has been a Research Assistant with the Fraunhofer Institute for Laser Technology ILT, Aachen. His research interests include the development of process monitoring systems for the laser powder bed fusion process to detect defects using a sensor fusion approach and the development of inline topography sensors based on the principle of conoscopic holography.
PETER ABELS studied mechanical engineering at the University of Applied Sciences, Aachen. In 1986, he started as a Researcher with the Fraunhofer Institute for Laser Technology ILT, where he is currently the Leader of the Process Control and System Technology Group. He initially worked in the Laser Development Department and later moved to the Department of Laser Material Processing, in particular the section for cutting with CO2 lasers. He co-founded the initiative for process monitoring and control at the Fraunhofer ILT. He has more than 20 years of experience in this field and is the Leader of the Interdisciplinary Research Group. His group is active in industrial contract research as well as in nationally and internationally publicly funded projects.
ARNOLD GILLNER studied physics at the University of Darmstadt. He received the Ph.D. degree in mechanical engineering from RWTH Aachen, in 1994.
Since 1985, he has been working as a scientist at the Fraunhofer Institute for Laser Technology. Starting in 1992, he built up the Department for Micro Technology at ILT, where he has been the Head of the Department of Ablation and Joining since 2010. Together with more than 55 scientists, he is developing industrial laser processes for macro and micro joining, packaging, micro and nano structuring, polymer applications, and life science applications. He is the Head of the Board of the Aachen Competence Center for Medical Technology and the Head of the Advisory Board of LifeTec Aachen Jülich. He has published more than 150 articles and book chapters in scientific journals and other scientific contributions and holds more than 20 patents on laser processes.
VOLUME 9, 2021