Object Shape Error Response using Bayesian 3D Convolutional Neural Networks for Assembly Systems with Compliant Parts

Abstract: The paper proposes a novel Object Shape Error Response (OSER) approach to estimate the dimensional and geometric variation of assembled products and then relate these to process parameters, which can be interpreted as root causes (RC) of the object shape defects. The OSER approach leverages Bayesian 3D Convolutional Neural Networks integrated with Computer-Aided Engineering (CAE) simulations for RC isolation. Compared with existing methods, the proposed approach (i) addresses a novel problem of applying deep learning to object shape error identification instead of object detection; (ii) overcomes fundamental performance limitations of current linear approaches for Root Cause Analysis (RCA) of assembly systems, which cannot be used on point cloud data; and (iii) provides capabilities for unsolved challenges such as ill-conditioning, fault multiplicity, RC prediction with uncertainty quantification, and learning at the design phase when no measurement data is available. Comprehensive benchmarking against existing machine learning models demonstrates superior performance with R² = 0.98 and MAE = 0.05 mm, thus improving RCA capabilities by 29%.

I. INTRODUCTION
Object shape error modelling and diagnosis are important enablers of Industry 4.0 and provide a transformative framework integrating facilitators such as big data, in-line 3D scanners, robotics and AI algorithms towards achieving near-zero-defect manufacturing. In this paper, the proposed 3D object shape error response (OSER) approach translates into estimating and discriminating between shape error patterns and linking them to manufacturing process parameters. First estimating and then reducing or eliminating these error patterns ensures dimensional product quality (as defined by GD&T), which is a major challenge for industries such as automotive, aerospace and shipbuilding. Two-thirds of the quality issues in the automotive and aerospace sectors are caused by dimensional variations [1]. The key goal is to develop an RCA model that can identify the relationship between shape errors and manufacturing process parameters.
Past methods used to diagnose manufacturing dimensional quality faults are based on (i) statistical estimation and (ii) pattern matching. These approaches have been shown to have limited applicability to complex, high-dimensional and nonlinear systems [2], as they use linear models between process parameters and measurements of product dimensional quality, both for systems with rigid parts [3] and for systems with compliant parts [4]. Ceglarek et al. [5] used CAD-based variation patterns and a fault matching technique that combined principal component analysis and pattern similarity for fault diagnosis. This work was later extended to include the effect of measurement noise and then generalized to multistage assembly processes using a state-space model, the stream-of-variation model [6]. Jin et al. [7] used a Bayesian network approach to estimate fixture faults using all measured points. Bastani et al. [8] used a spatially correlated Bayesian learning algorithm for underdetermined systems by exploiting the spatial correlation of dimensional variation from various error sources. In summary, the aforementioned approaches are linear and are designed to work with a relatively small number of measurement points on each manufactured part. This significantly limits their application to 3D object shape error modelling and diagnosis in manufacturing.
Manuscript received July 7, 2020; revised September 16, 2020; accepted November 24, 2020. This study was supported by the UK EPSRC project EP/K019368/1: "Self-Resilient Reconfigurable Assembly Systems with In-process Quality Improvement", the UKRI open access block grant and the WMG-IIT scholarship. (Corresponding author: Sumit Sinha)
The 3D shape error modelling and diagnosis used in manufacturing must satisfy a number of requirements with respect to:
(i) High data dimensionality of a batch of 3D objects [9], which are defined by CAD (ideal parts) and point clouds (non-ideal parts) with millions of points for each part or subassembly.
(ii) Non-linearity due to compliant parts being constrained by assembly fixtures and part-to-part interactions [10].
(iii) Collinearities, as many manufacturing systems are ill-conditioned [11], with the error patterns of key process parameters being near-parallel and thus yielding widely discrepant results.
(iv) High fault multiplicity [12]: current near-zero-defect strategies require taking 6-sigma defects into consideration, which redefines defects from binary {0,1}, i.e., fault/no-fault, to continuous [0,1], i.e., the fault is measured as a level of variation with a dynamically changing acceptance threshold; this significantly increases fault multiplicity.
(v) Uncertainty quantification in the RCA output: as the identified RC frequently leads to costly corrective actions [13], it is crucial to enhance the RCA model with an uncertainty estimate for each prediction.
(vi) Dual data generation capability, using both metrology gauges and a multi-physics simulator, for RCA model training. The RCA model needs to be trained on a very large number of fault scenarios, which cannot be generated on real systems, and the training must be completed before the real assembly system is ready for production. There is therefore a strong need to generate training data with a high-fidelity multi-physics simulator. Once implemented in a real system, the RCA model then uses point-cloud data of real free-form surfaces obtained via a robotic 3D scanner.
This paper addresses the above requirements as follows:
(1) Requirements (i)-(iv) by developing a 3D deep learning approach. As markets become more competitive in terms of product quality, production volume and costs, manufacturers aim to leverage developments in the field of artificial intelligence. Deep neural networks have revolutionized data-intensive tasks that involve generating insights from high-dimensional input data [21]. 2D/3D convolutional neural networks (CNNs) are known to perform well when spatial data such as depth images, point clouds, meshes and medical scans have to be analysed for tasks such as control systems, object detection, video analysis and cancer detection. Manufacturing is one of the major domains that has benefited from this development [16]. This paper proposes a 3D CNN architecture that extracts spatial features from point clouds and hence models non-linear relationships between these features and process parameters. This approach performs well for non-linear and ill-conditioned systems with high fault multiplicity.
(2) Requirement (v) by leveraging a Bayesian 3D CNN based approach. Recent developments in artificial intelligence caution against making real-life decisions based on point estimates. Compared to traditional CNNs with deterministic weights, Bayesian CNNs place probability distributions over model weights and model outputs; this enables quantification of predictive uncertainty, helps prevent overfitting and requires comparatively less training data [17]. Such models have been successfully applied in healthcare [18] and load forecasting [19]. They also allow the uncertainty to be separated into aleatoric and epistemic components: the former quantifies uncertainty due to uncontrollable factors such as system noise, while the latter quantifies uncertainty due to the model structure and insufficient training data [17]. The proposed approach estimates each model parameter as a distribution (epistemic uncertainty) while also modelling the output estimates as parameters of a multivariate distribution (aleatoric uncertainty). Such estimates of the different types of uncertainty in model predictions are crucial, as they indicate when the model is 'randomly guessing' rather than making a confident prediction. Within manufacturing environments in particular, these uncertainty estimates attach a degree of confidence to each prediction and hence support the decision-maker in selecting corrective action(s), which can be quite costly, in a cost-effective way.
(3) Requirement (vi) by making the Bayesian 3D CNN developed in (2) compatible with point cloud data obtained from either a multi-physics simulator or 3D scanners, and by leveraging the epistemic uncertainty estimates to perform intelligent closed-loop training that enables model convergence with fewer training samples. In turn, this reduces the total data generation and model training time. Since the multi-physics simulator is computationally expensive and generating each high-fidelity sample for assembly applications can be time-intensive, it is crucial to reduce the overall simulation time. The reduction in simulation time obtained by leveraging the epistemic uncertainty estimates of Bayesian 3D CNNs is significantly larger than the additional training time required by Bayesian deep learning approaches. The approach developed in this paper uses a high-fidelity multi-physics simulator of the assembly process, the Variation Response Method (VRM) [20]. The VRM can first model and simulate assembly processes with compliant parts constrained by assembly fixtures and part-to-part interactions, and then generate high-fidelity point-cloud data of 3D assembled products/subassemblies with the error patterns obtained under different sets of process parameters. The accuracy of the VRM model has been verified and validated for various assembly processes [20]. Additionally, the approach can use measurement data from 3D optical scanners, which enable real-time extraction of high-dimensional point cloud data from manufacturing systems within short cycle times. This point cloud data can be post-processed using alignment techniques to extract the deviation of each point, thus enabling dual data generation and integration. In summary, the paper develops a novel 3D Object Shape Error Response (OSER) approach to enable RCA within manufacturing systems using point cloud data.
The proposed methodology integrates deep learning to address Requirements (i)-(iv); Bayesian training enabled by Bayes-by-Backprop [21] and Flipout [22] to address Requirement (v); and a multi-physics simulator to address Requirement (vi).
The key contributions of the paper are as follows: (1) Proposed 3D OSER methodology based on a novel Bayesian 3D CNN architecture: it builds on current work in 3D object detection [14] and extends it to manufacturing systems, where the key goal is not to detect the object but to estimate the various shape error patterns present on the final object/product and relate these patterns to variations of the manufacturing process parameters within the system. To the best of our knowledge, this is the first paper to propose an uncertainty-enabled 3D CNN based deep learning model for RCA of assembly systems.
(2) Propose a closed-loop framework for training and deployment of the Bayesian 3D CNN model that leverages a Computer-Aided Engineering (CAE) simulator, the VRM [20], to emulate the multi-stage assembly system. Sampling for the VRM leverages the epistemic uncertainty estimates of the Bayesian 3D CNN, thereby reducing the overall simulation and training time. Given that data within manufacturing systems is costly, scarce and can be highly skewed, the VRM functions as a physics-based Digital Twin for generating augmented data that is close to the real system and can therefore be used to train the proposed model.
The trained model can then be leveraged for applications such as RCA of assembly systems using point cloud scans obtained from 3D scanners.
(3) Verify & validate the methodology on an industrial automotive door assembly process made of compliant parts.
(4) Benchmark the 3D OSER methodology against three categories of methods that can be used to estimate the dimensional and geometric variation of assembled products, namely (i) current linear state-of-the-art RCA models; (ii) machine learning models in a multi-output regression setting; and (iii) deep learning models such as various types of CNNs and fully connected networks, to highlight its performance and its ability to fulfil the six requirements above.
The rest of the paper is organised as follows: Section II formulates the object shape error estimation problem and presents the proposed Bayesian 3D CNN architecture, the steps involved in architecture optimization and the overall steps required to train and deploy the model; Section III presents the industrial case study; finally, conclusions and future work are summarized in Section IV.

A. Object Shape Error Estimation in Manufacturing
Multi-stage assembly systems can be mathematically expressed as a state-space model where different states correspond to different stages of the manufacturing system [3]. The input is an object (a set of parts to be assembled) entering the assembly process. Within the process, object shape errors can be introduced at any of the stages due to one or multiple variations in the process parameters, and they are further propagated through the subsequent stages (Fig. 1). Any object $O_i$ at its design nominal shape is characterized by a set of nominal points $P_i = \{p_k\}$, $k = 1, \dots, n_i$, where $p_k$ is a vector of the x, y and z coordinates of the kth point and $n_i$ is the total number of points on object $O_i$. The object $O_i$ here represents a single subassembly assembled in a single station; it can be understood as a collective reference to all parts used in this assembly station. In practice, the points $p_k$ correspond to mesh nodes of the Computer-Aided Design model of the object when considering CAE simulations, and to actual points within the point cloud when considering the 3D scan of the object.
$D_i = \{d_k\}$ denotes the deviation of each point after the nominal object has gone through the different stages of the process, where $d_k$ is a vector of the deviations of point $p_k$ along the x, y and z axes on object $O_i$. This paper assumes that the assembly process has a single station comprising multiple stages $j = 1, \dots, 4$ that act on the objects/parts: positioning (P), clamping (C), fastening (F) and release (R). Stage $j = 0$ represents the incoming part, including deviations from previous processes such as part fabrication. As the object goes through the stages, its set of points is represented as $P_i^j$ and its deviations as $D_i^j$. As the main goal of this paper is object shape error estimation, the paper extends the problem formulation of object detection, which only considers the set of points $\{P_i\}$ [15], by including the deviation of each point $\{D_i\}$ as additional features. This adds the discriminative ability required for object shape error estimation. Thus, the object shape error for object $O_i$ after stage $j$ can be represented as:

$$S_i^j = \{(P_i^j, D_i^j)\} \quad (1)$$

The set of all process parameters across all stages is denoted by $X = \{x_1, \dots, x_h\}$, where $h$ is the total number of process parameters. The deviation of the points of object $O_i$ at stage $j$ can be expressed as the sum of all deviations accumulated from stage 0 up to stage $j$:

$$D_i^j = \sum_{l=0}^{j} \Delta D_i^l \quad (2)$$

where $\Delta D_i^l$ is the deviation introduced at stage $l$, and $\Delta D_i^0 = D_i^0$ represents the shape error of the incoming object caused by upstream manufacturing processes. After each stage, the actual points of the object with error can be written as:

$$\tilde{P}_i^j = P_i^j + D_i^j \quad (3)$$

At the end of the final assembly stage $j = 4$, the object shape error data for the assembly is collected and decomposed into the nominal points and their deviations using alignment techniques [9]:

$$S = \{(P, D)\} \quad (4)$$

where $P$ and $D$ are now a collective reference to the set of all incoming objects that have been assembled. The measurement system error $\varepsilon$ is considered negligible ($\varepsilon \approx 0$).
The object with errors is represented as a point cloud of non-ideal parts, $S = \{(P, D)\}$, where the deviations $D$ can be considered as features at each point of $P$. The aim of Bayesian 3D CNN model training is to learn the assembly process transfer function $f(\cdot)$ (equivalent to the state transition matrix in [6]). The function $f(\cdot)$ is parametrized by the weights and biases $\theta$ of a CNN that can accurately estimate the process parameters given the point cloud data of non-ideal parts collected from the system:

$$\hat{X} = f(S; \theta) \quad (5)$$

Fig. 1. Object Shape Error Propagation in Assembly Systems
The high accuracy of the 3D CNN in estimating all assembly process parameters provides the underlying capability of the OSER approach for high root cause (RC) isolability. Within assembly systems, RCs are essentially estimated as a subset of the estimated process parameters:

$$RC \subseteq \hat{X} \quad (6)$$

The exact definition of an RC may differ depending on the requirements and the production phase of the assembly system, but the key requirement for conducting RCA under any definition is to accurately estimate all process parameters $X$. Hence, the proposed OSER approach aims to do so by estimating $f(\cdot)$ as specified in (5).

B. 3D Object Shape Error Voxelization
In the presented OSER approach, the simulation output, represented as mesh or point cloud data $S = \{(P, D)\}$ (4), is mapped to discrete voxel coordinates $(u, v, w)$ as follows: for all points $p_k = (x_k, y_k, z_k)$ that fall within the voxel $(u, v, w)$, the maximum value of their deviations $d_k = (d_{x,k}, d_{y,k}, d_{z,k})$ characterizes the features of the corresponding voxel and is represented as $V_{u,v,w,c}$, where $c$ indexes the x, y and z deviation components. The voxelization techniques used in object detection [15] are applied to construct the initial voxel structure of the object, and for each unique object the voxel features are characterized by the shape error $D$. The key difference is that in object detection, voxels are either binary or contain RGB values for each point, instead of the real-valued shape errors used in the OSER approach. Although binary voxels, traditionally used in object detection, retain the spatial structure, the voxelization granularity required to discriminate between minor differences in shape error would make the problem computationally infeasible and hence limit performance. In the proposed approach, the nominal object is voxelized and each voxel is characterized by real values of the shape error $D$. This is critical for representing the geometric variations with the granularity required for effective RCA, and it retains both the spatial structure of the object and the components of the object shape error. Given that the alignment ensures a fixed orientation, no data augmentation is needed to achieve rotation invariance.
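A minimal NumPy sketch of this voxelization is shown below. It assumes a uniform cubic voxel size, reads "maximum value" as the component-wise maximum of the signed deviations inside each voxel, and sets empty voxels to zero; the function name and arguments are hypothetical and are not the authors' implementation.

```python
import numpy as np

def voxelize_shape_error(points, deviations, grid_min, voxel_size, grid_dims):
    """Map per-point deviations onto a voxel grid (illustrative sketch).

    points:     (N, 3) nominal coordinates p_k = (x_k, y_k, z_k)
    deviations: (N, 3) deviations d_k along x, y and z
    grid_min:   (3,) lower corner of the voxel grid
    voxel_size: edge length of one cubic voxel (assumed uniform)
    grid_dims:  (U, V, W) number of voxels per axis
    Returns a (U, V, W, 3) array V[u, v, w, c] holding, for every voxel, the
    maximum of each deviation component over the points inside it (0 if empty).
    """
    points = np.asarray(points, dtype=float)
    deviations = np.asarray(deviations, dtype=float)
    dims = np.asarray(grid_dims)
    # Integer voxel index (u, v, w) of each point.
    idx = np.floor((points - np.asarray(grid_min, dtype=float)) / voxel_size).astype(int)
    # Keep only the points that fall inside the grid.
    inside = np.all((idx >= 0) & (idx < dims), axis=1)
    idx, dev = idx[inside], deviations[inside]

    grid = np.full((*grid_dims, 3), -np.inf)
    # Unbuffered per-voxel maximum, so several points in one voxel are handled correctly.
    np.maximum.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), dev)
    grid[np.isneginf(grid)] = 0.0  # empty voxels carry no shape error
    return grid
```

Taking the maximum keeps one value per component and voxel; other aggregations (e.g. the mean) would fit the same structure if preferred.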

C. Uncertainty Estimation
Given the uncertainties of the system and the availability of only a limited dataset, a deterministic estimate of the function $f(\cdot)$ in (5) is not feasible. Hence, by leveraging Bayesian inference, a prior distribution $p(f)$ can be placed over the space of possible functions, representing a prior belief about the possible functions $f(\cdot)$. A likelihood $p(X \mid f, S)$ is defined to model the function from which the observations are generated; hence, given a dataset $(S, X)$, the posterior distribution over functions $p(f \mid S, X)$ can be inferred. The function is characterized by model parameters $\theta$ (weights and biases for neural networks), and the posterior over the function can be inferred by estimating the posterior over the parameters $\theta$. In Bayesian neural networks, this is achieved through Bayes-by-Backprop [21] and Flipout [22]. Given a dataset, the posterior can be written as:

$$p(\theta \mid S, X) = \frac{p(X \mid S, \theta)\, p(\theta)}{p(X \mid S)} \quad (7)$$

For complex models such as deep neural networks, the true posterior over all model parameters $p(\theta \mid S, X)$ cannot be inferred analytically; hence, an approximating variational distribution $q_\phi(\theta)$ parametrized by $\phi$, such as a normal distribution, is used to approximate the posterior. This approach is known as variational inference [23]. The approximating distribution should be as close as possible to the true posterior, which is achieved by minimizing the Kullback-Leibler (KL) divergence with respect to $\phi$:

$$\phi^* = \arg\min_{\phi} \mathrm{KL}\left(q_\phi(\theta) \,\|\, p(\theta \mid S, X)\right) \quad (8)$$

Using the estimated variational distribution $q_{\phi^*}(\theta)$, the process parameter distribution quantifying the uncertainties for a new data point $S^*$ can be obtained using:

$$p(X^* \mid S^*, S, X) \approx \int p(X^* \mid S^*, \theta)\, q_{\phi^*}(\theta)\, d\theta \quad (9)$$
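Because the true posterior in (8) is unknown, the KL objective is in practice minimized by maximizing the evidence lower bound, whose penalty term is $\mathrm{KL}(q_\phi(\theta) \,\|\, p(\theta))$ against the prior. For a mean-field normal $q_\phi$ and the standard normal prior used later in this paper, that term has a closed form per weight. The sketch below evaluates it with NumPy; the function name is hypothetical.

```python
import numpy as np

def kl_normal(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over all weights.

    With mu_p = 0 and sigma_p = 1 this is the penalty of a mean-field normal
    variational posterior q_phi(theta) against a standard normal prior.
    """
    mu_q = np.asarray(mu_q, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)
    kl = (np.log(sigma_p / sigma_q)
          + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
          - 0.5)
    return float(np.sum(kl))
```

For example, a weight whose posterior equals the prior contributes zero, while a weight with mean 1 and standard deviation 1 contributes 0.5.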

D. Bayesian 3D CNN Model Architecture
Building on work on voxel-based approaches for 3D object detection such as VoxNet [15], this research proposes a Bayesian 3D CNN architecture for object shape error estimation. The 3D convolutions aggregate features from the input, which are then used by the fully connected layers and mapped to the process parameters. The model consists of three 3D convolutional Flipout layers and a 3D max-pooling layer, followed by three fully connected Flipout layers; the final layer estimates the parameters of the predictive distribution for all process parameters. The convolution can be represented as:

$$v_{ab}^{uvw} = \mathrm{ReLU}\Big(\beta_{ab} + \sum_{m}\sum_{p=0}^{P_a-1}\sum_{q=0}^{Q_a-1}\sum_{r=0}^{R_a-1} w_{abm}^{pqr}\, v_{(a-1)m}^{(s_u u + p)(s_v v + q)(s_w w + r)}\Big) \quad (10)$$

where $v_{ab}^{uvw}$ represents the output value at position $(u, v, w)$ in the $a$th layer and $b$th feature map, and ReLU is the Rectified Linear Unit activation function [24]. $\beta_{ab}$ represents the bias; $m$ indexes the filters (feature maps) of the previous layer; $(P_a, Q_a, R_a)$ and $(s_u, s_v, s_w)$ represent the kernel dimensions and stride lengths in the three directions, respectively; and $w_{abm}^{pqr}$ represents the weights of the connections. The convolution operation in (10) is applied consecutively in the three convolutional layers. In the 3D max-pooling operation, the resolution of the feature map is reduced by taking the maximum value over a local neighbourhood of the layer outputs. Within the Bayesian framework, each parameter of the Bayesian 3D CNN model follows a distribution. As it is not feasible to assign informative priors to neural network parameters, non-informative prior distributions are placed over the model parameters. Each parameter is approximated using variational inference, assuming that the posterior follows a normal distribution. The overall model has 1,997,286 trainable parameters. The output nodes have linear activations. Fig. 2 shows the proposed Bayesian 3D CNN model architecture with annotated hyper-parameters.
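To make the indexing in (10) concrete, the NumPy sketch below evaluates one convolutional layer directly from the formula, with 'valid' padding and a single fixed set of weights. It is illustrative only: in the Bayesian model the weights are sampled from $q_\phi(\theta)$ (with Flipout perturbations) on each forward pass, and the paper's implementation uses TensorFlow Probability layers rather than explicit loops. The function name and argument layout are assumptions.

```python
import numpy as np

def conv3d_relu(v_prev, weights, bias, strides=(1, 1, 1)):
    """Direct (loop-based) evaluation of Eq. (10) for one layer.

    v_prev:  (U, V, W, M) input volume with M feature maps from layer a-1
    weights: (P, Q, R, M, B) kernel w_{abm}^{pqr} for B output feature maps
    bias:    (B,) bias beta_{ab} per output feature map
    strides: (s_u, s_v, s_w)
    Returns a (U', V', W', B) volume ('valid' padding, no zero padding).
    """
    U, V, W, M = v_prev.shape
    P, Q, R, M_w, B = weights.shape
    assert M == M_w, "kernel depth must match the number of input feature maps"
    su, sv, sw = strides
    out_shape = ((U - P) // su + 1, (V - Q) // sv + 1, (W - R) // sw + 1, B)
    out = np.empty(out_shape)
    for u in range(out_shape[0]):
        for v in range(out_shape[1]):
            for w in range(out_shape[2]):
                # Receptive field starting at (s_u*u, s_v*v, s_w*w).
                patch = v_prev[su * u:su * u + P, sv * v:sv * v + Q, sw * w:sw * w + R, :]
                # Sum over p, q, r and m for every output feature map b.
                out[u, v, w, :] = np.tensordot(patch, weights, axes=([0, 1, 2, 3], [0, 1, 2, 3])) + bias
    return np.maximum(out, 0.0)  # ReLU
```

With an all-ones 4×4×4×2 input, an all-ones 2×2×2×2×3 kernel and stride 2, every pre-activation equals 16 plus the bias of its feature map.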

E. Architecture Selection and Optimization
Hyper-parameter optimization for the Bayesian 3D CNN is performed to maximize performance and to eliminate architectures that are more likely to overfit. As this is computationally intensive, the optimization was carried out in the following steps to keep it computationally feasible:
Step 1 - Set Baseline: VoxNet [14], a 3D CNN architecture for object detection consisting of two 3D convolutional layers, one max-pooling layer and two fully connected layers, is set as the baseline. A dataset of 1500 samples is generated to conduct k-fold cross-validation (k = 6). The hyper-parameters are split into two categories: category one consists of the number of convolutional layers, chosen from {2, 3, 4}, and the number of dense layers, chosen from {1, 2, 3}; category two consists of the number of filters in each 3D convolutional layer, the filter size of each 3D convolutional layer and the number of hidden units in each dense layer.
Step 2 - Grid Search for Category One Hyper-parameters: In this step, a grid search over the category one hyper-parameters is conducted and each configuration is evaluated using k-fold cross-validation (Fig. 3). For computational feasibility, the category two hyper-parameters are kept constant and equal to the VoxNet architecture values. Three convolutional layers and three dense layers were obtained as the optimal category one hyper-parameters, with the minimum average cross-validation Mean Absolute Error (MAE) of 0.08 mm.
Step 3 - Hyperband for Category Two Hyper-parameters: The optimal values of the category one hyper-parameters are fixed, and Hyperband [25] is then used to obtain the optimal values of the category two hyper-parameters, given its ability to speed up random search through adaptive resource allocation and early stopping.
Step 4 - Deterministic to Bayesian Model: The final step replaces the deterministic layers with Bayesian Flipout layers, and the model is then trained using Bayes-by-Backprop. Various learning rates and prior distributions for the model weights were tested. The standard normal distribution provided the best balance between weight initialization and weight exploration, as inferred from an uncertainty vs. error calibration study. The training hyper-parameters that provided the best uncertainty calibration, while ensuring that model performance was greater than or equal to that of the deterministic counterpart, were selected as the final Bayesian 3D CNN training hyper-parameters. The key changes from the object detection baseline architecture that enable fulfilment of the six requirements are summarized in Table 1.
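Steps 1 and 2 amount to an exhaustive search over the nine category one configurations, each scored by the average k-fold validation MAE. The sketch below shows only that bookkeeping, assuming a hypothetical callback `cv_mae` that builds, trains and scores a model with the category two values fixed to the baseline; it does not cover the Hyperband stage.

```python
import itertools
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k (near-)equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def grid_search(cv_mae, n_samples=1500, k=6,
                conv_layers=(2, 3, 4), dense_layers=(1, 2, 3)):
    """Evaluate every (n_conv, n_dense) pair by average k-fold validation MAE.

    cv_mae(n_conv, n_dense, train_idx, val_idx) is a hypothetical callback that
    trains a model on train_idx and returns its validation MAE on val_idx.
    """
    folds = kfold_indices(n_samples, k)
    results = {}
    for n_conv, n_dense in itertools.product(conv_layers, dense_layers):
        maes = []
        for i, val_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            maes.append(cv_mae(n_conv, n_dense, train_idx, val_idx))
        results[(n_conv, n_dense)] = float(np.mean(maes))
    best = min(results, key=results.get)  # configuration with the lowest average MAE
    return best, results
```

With 1500 samples and k = 6, each fold holds 250 validation samples and each model is trained on the remaining 1250.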

F. Model Training and Deployment
The model is trained in a closed-loop framework using data generated by the VRM. The key motivation for a closed-loop rather than an open-loop framework is to minimize the bottleneck computation, i.e., multi-physics simulation using the VRM model. Although this increases the number of training iterations, the overall time for VRM simulation and training is significantly reduced, as fewer samples need to be generated. The key steps of the proposed framework are summarized below (Fig. 4):
Step 1 - Sampling: Process parameters are sampled from their allowable ranges. Latin Hypercube Sampling [26] is used to generate the initial process parameter samples, as it spreads the samples evenly across the h-dimensional process parameter space by stratifying the range of each parameter. Subsequent sets of samples are generated using Monte Carlo sampling based on the model uncertainty $\sigma(\hat{X})$.
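The initial Latin Hypercube design can be sketched in a few lines of NumPy: each parameter range is cut into as many equal strata as there are samples, every stratum is used exactly once per parameter, and strata are paired across parameters by independent random permutations. The function name is hypothetical, and the ±1 mm bounds in the usage note are placeholders rather than the paper's allowable ranges.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, seed=None):
    """Latin Hypercube Sampling over an h-dimensional box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    h = lower.size
    # Row i starts in stratum i: a random position inside [i/n, (i+1)/n).
    u = (rng.random((n_samples, h)) + np.arange(n_samples)[:, None]) / n_samples
    # Shuffle the strata independently for every parameter (column).
    for j in range(h):
        u[:, j] = u[rng.permutation(n_samples), j]
    return lower + u * (upper - lower)
```

For instance, `latin_hypercube(200, [-1.0] * 6, [1.0] * 6, seed=0)` would draw 200 initial settings of six process parameters within hypothetical ±1 mm bounds.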
Step 2 - VRM Simulation: The samples are used as input to the VRM to simulate the assembly process and generate the output mesh, from which the point cloud and the deviation of each point are extracted after the desired stage of the assembly system, $S = \{(P, D)\}$ as in (4).
The KL divergence term quantifies the divergence between the standard normal prior and the learnt posterior and hence prevents overfitting by penalizing weights that diverge from the prior. Group normalization [27] with four groups is used after each convolutional layer. This also prevents overfitting, accommodates the small mini-batch size imposed by GPU memory constraints and helps stabilize the training process. The network weights are initialized using a normal initializer [28]. The Adam method for stochastic optimization [29] was used to optimize the loss function during training. The initial learning rate is fine-tuned to 0.0005, and monotonic KL annealing is used to ensure that the model first learns the relation between object shape error and process parameters before the KL penalty for uncertainty quantification is applied. Learning rate fine-tuning, monotonic KL annealing and ReLU activations help prevent vanishing gradients. The predictive distribution is modelled as a multivariate normal with h components (one per process parameter), $X \sim \mathcal{N}(\mu, \Sigma)$; hence the mean of the multivariate distribution corresponds to the set of process parameters $X$. The distribution is assumed to have a diagonal covariance matrix $\Sigma$, whose diagonal scale parameters are fixed since the noise has been assumed negligible.
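The loss equation itself is not reproduced in this section. As a hedged illustration of the terms described above, the sketch computes a per-sample negative evidence lower bound: a Gaussian negative log-likelihood with a fixed diagonal scale (0.001, the value used in the case study) plus a KL term weighted by a monotonic annealing factor. The linear warm-up schedule and the division of the KL term by the training-set size are common conventions assumed here, not details confirmed by the paper; the function names are hypothetical.

```python
import numpy as np

def kl_weight(step, warmup_steps):
    """Monotonic KL annealing (assumed linear): weight rises from 0 to 1, then stays at 1."""
    return min(1.0, step / float(warmup_steps))

def neg_elbo(x_true, x_mean, kl, n_train, step, warmup_steps, scale=0.001):
    """Per-sample negative ELBO for a diagonal Gaussian with fixed scale.

    x_true, x_mean: (batch, h) actual and predicted process parameters
    kl:             KL(q_phi(theta) || p(theta)) for the current weights
    n_train:        training-set size, used to spread the KL over samples
    scale:          fixed diagonal scale of the predictive distribution
    """
    x_true = np.asarray(x_true, dtype=float)
    x_mean = np.asarray(x_mean, dtype=float)
    # Gaussian negative log-likelihood, summed over the h process parameters.
    nll = np.sum(0.5 * ((x_true - x_mean) / scale) ** 2
                 + np.log(scale) + 0.5 * np.log(2.0 * np.pi), axis=1)
    return float(np.mean(nll) + kl_weight(step, warmup_steps) * kl / n_train)
```

At step 0 the loss is the likelihood term alone, so the network first fits the shape-error-to-parameter mapping; the KL penalty is phased in over the warm-up period.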
After each training iteration, the model is evaluated on the validation set. For evaluation, Monte Carlo (MC) samples are drawn from the model, and the sample means $\hat{X}$ and standard deviations $\sigma(\hat{X})$ are estimated for each process parameter.
$\sigma(\hat{X})$ represents the epistemic uncertainty, while the fixed scale parameters of the predictive distribution represent the known aleatoric uncertainty [17]. Given the assumption of negligible measurement noise, the aleatoric uncertainty is considered negligible, and the overall uncertainty in the prediction can therefore be taken to equal the epistemic uncertainty $\sigma(\hat{X})$. This uncertainty is used for sampling in the next iteration.
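The MC summary is a mean and a standard deviation over the sampled forward passes. How $\sigma(\hat{X})$ then drives the next round of Monte Carlo sampling is not specified in detail, so the second function below is one hypothetical rule: validation points are picked with probability proportional to their mean uncertainty, and their true process parameters are perturbed with normal noise of that scale and clipped to the allowable range. Both function names are illustrative.

```python
import numpy as np

def mc_summary(mc_samples):
    """Summarize T Monte Carlo forward passes.

    mc_samples: (T, N, h) predictions for N inputs and h process parameters,
    one slice per sampled set of weights. Returns the sample mean (estimate
    of X) and the sample standard deviation (epistemic uncertainty).
    """
    mc_samples = np.asarray(mc_samples, dtype=float)
    return mc_samples.mean(axis=0), mc_samples.std(axis=0, ddof=1)

def uncertainty_guided_samples(x_val, sigma, n_new, lower, upper, seed=None):
    """Hypothetical MC sampling rule driven by sigma(X_hat).

    x_val: (N, h) true process parameters of the validation samples
    sigma: (N, h) predictive standard deviations for those samples
    """
    rng = np.random.default_rng(seed)
    weights = sigma.mean(axis=1)
    idx = rng.choice(len(x_val), size=n_new, p=weights / weights.sum())
    new = x_val[idx] + rng.normal(0.0, sigma[idx])  # perturb around uncertain points
    return np.clip(new, lower, upper)
```

The rule concentrates new simulations where the model is least certain, which is the intent described in the text; other acquisition rules would slot into the same loop.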
The Mean Absolute Error (MAE) between the model estimates $\hat{X}$ and the actual values $X$ across all h process parameters,

$$\mathrm{MAE} = \frac{1}{N h} \sum_{n=1}^{N} \sum_{l=1}^{h} \left| \hat{x}_{n,l} - x_{n,l} \right| \quad (12)$$

where N is the number of evaluated samples, is used as the metric for model performance evaluation, given its ease of interpretation and given that the model outputs are continuous and real-valued. Training is stopped when the MAE falls below the required threshold. The threshold value is determined by the quality requirements of the specific product, as set by the design tolerances, and by the accuracy of the measurement system; the model is trained to within the measurement system accuracy. For example, automotive body assembly process tolerances are within [-1 mm, 1 mm], and the 3D optical scanner used has a repeatability of 0.05 mm and an accuracy within 0.15 mm.
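A direct transcription of the MAE metric and the closed-loop stopping rule is sketched below, with the 0.05 mm threshold from the case study as the default. The function names are hypothetical.

```python
import numpy as np

def mae(x_true, x_hat):
    """Mean absolute error over all samples and all h process parameters."""
    return float(np.mean(np.abs(np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float))))

def should_stop(x_true, x_hat, threshold=0.05):
    """Closed-loop stopping rule: stop once validation MAE (mm) is below the threshold."""
    return mae(x_true, x_hat) < threshold
```

For two samples with true values [[0.1, -0.2], [0.0, 0.3]] and estimates [[0.12, -0.25], [0.04, 0.3]], the absolute errors are 0.02, 0.05, 0.04 and 0, so the MAE is 0.0275 mm and training would stop.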
Step 4 - Model Deployment: After training, the model can be deployed within an actual system. The data collected from the 3D scanner are aligned to obtain the point cloud and deviations $S = \{(P, D)\}$ and then voxelized before being passed to the trained model for RCA inference. Inference estimates the process parameters $\hat{X}$ for a given $S$ (5) using MC sampling from the trained model. From these samples, the process parameters (distribution mean) and their uncertainty (distribution standard deviation) $\sigma(\hat{X})$ can be estimated. The sample mean is taken as the model estimate $\hat{X}$, while $\sigma(\hat{X})$ quantifies the uncertainty. The RCs can then be inferred as a subset of $\hat{X}$ (6). The work has been implemented using Python 3.7, TensorFlow-GPU 2.0 [30] and TensorFlow-Probability 0.8. A Python library, Bayesian Deep Learning for Manufacturing [31], has been developed to validate and replicate the results of the methodology. For this paper, both the data generation and the evaluation of the OSER methodology have been done using the VRM. Two Nvidia Tesla V100 32 GB GPUs are used for model training and deployment.
Fig. 4. Model Training and Deployment Framework
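Because the exact RC definition is left application-dependent, the sketch below shows one hypothetical way to turn the MC mean and standard deviation into the subset in (6): a parameter is flagged only if its estimated deviation exceeds a tolerance even after subtracting k standard deviations. The tolerance, the margin k and the parameter names are illustrative assumptions.

```python
import numpy as np

def infer_root_causes(x_mean, x_std, names, tol=0.05, k=2.0):
    """Hypothetical RC rule: flag parameter l when |mean_l| - k * std_l > tol.

    x_mean, x_std: (h,) MC sample mean and standard deviation per parameter
    names:         (h,) parameter labels, e.g. 'x1', ..., 'x6'
    """
    x_mean = np.asarray(x_mean, dtype=float)
    x_std = np.asarray(x_std, dtype=float)
    flagged = np.abs(x_mean) - k * x_std > tol
    return [n for n, f in zip(names, flagged) if f]
```

With means [0.3, 0.04, -0.2] mm and standard deviations [0.01, 0.01, 0.1] mm, only the first parameter is flagged: the third has a large estimated deviation but is too uncertain to justify a costly corrective action under this rule.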

A. Assembly Setup
For verification and validation of the proposed OSER approach, an automotive assembly of two components, namely the door inner and the hinge reinforcement, is selected. The assembly setup and parameters are shown in Fig. 5. The assembly process is controlled by six (h = 6) parametrized process parameters $x_1, x_2, \dots, x_6$ (depicted using yellow symbols in Fig. 5). Assembly parameters such as the pin-hole, pin-slot and NC blocks for the door inner are considered constant (depicted using green symbols in Fig. 5).
After starting with 200 initial samples for model training, 200 samples are adaptively added during each iteration of the closed-loop training based on the uncertainty estimates, and the model is trained on the combined dataset, including all previous samples, to avoid catastrophic forgetting (200 samples per iteration provided an optimal trade-off between VRM simulation time and model training time). These samples and the corresponding VRM outputs are used to train the Bayesian 3D CNN model. The diagonal scale parameters of the covariance matrix are fixed at 0.001 for all process parameters. A validation set of 300 samples is generated within the validation range, and after each iteration the trained model is evaluated on it. During evaluation, 1000 MC samples are drawn from the trained model for each of the 300 validation samples. The sample means are taken as the process parameter estimates, while the sample standard deviations quantify the uncertainty of each process parameter for that sample. RCs can be inferred from the process parameter estimates. The closed-loop training is stopped when the average MAE across all process parameters on the validation set falls below the threshold, which is set to 0.05 mm for automotive assembly applications because variations smaller than 0.05 mm are not detectable by the 3D scanner. The model is then ready for deployment with measurement data collected from 3D optical scanners, followed by alignment and voxelization.
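The closed-loop schedule implies a simple sample budget. The sketch below assumes (our reading, not stated explicitly) that the first iteration trains on the 200 initial samples alone and each later iteration adds 200 uncertainty-guided samples; under that assumption ten iterations accumulate exactly the 2000 samples reported in the results. The function name is hypothetical.

```python
def training_set_sizes(initial=200, per_iteration=200, iterations=10):
    """Cumulative training-set size at each closed-loop iteration (assumed schedule)."""
    sizes = [initial]
    for _ in range(iterations - 1):
        sizes.append(sizes[-1] + per_iteration)  # new VRM samples are added, old ones kept
    return sizes
```

This yields [200, 400, ..., 2000]. Separately, each validation pass involves 300 × 1000 = 300,000 stochastic forward passes of the Bayesian 3D CNN.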
For each measurement, MC samples drawn from the trained Bayesian 3D CNN model are used to estimate the process parameter means and standard deviations (uncertainty). Measurement data are collected using a WLS400A scanner mounted on an ABB robot.
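Per-measurement inference can be sketched as repeated stochastic forward passes summarised per process parameter. This is a minimal sketch under assumptions: `predict_once` is a hypothetical callable standing in for one forward pass of the Bayesian 3D CNN (with freshly sampled weights) on a voxelized measurement, and the default of 1000 draws mirrors the MC sample count used during validation.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def mc_estimate(
    predict_once: Callable[[], Sequence[float]],  # hypothetical stochastic forward pass
    n_samples: int = 1000,
) -> Tuple[List[float], List[float]]:
    """Summarise n_samples MC draws into per-parameter means and sample standard deviations."""
    draws = [list(predict_once()) for _ in range(n_samples)]
    per_param = list(zip(*draws))  # transpose: one tuple of draws per process parameter
    means = [statistics.mean(d) for d in per_param]  # point estimates of the parameters
    stds = [statistics.stdev(d) for d in per_param]  # uncertainty (sample standard deviation)
    return means, stds


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-in for the network: two parameters centred on 0.30 mm and -0.10 mm.
    means, stds = mc_estimate(lambda: [rng.gauss(0.30, 0.02), rng.gauss(-0.10, 0.05)])
    print(means, stds)
```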
In summary, the industrial assembly process selected for the case study exhibits (i) high dimensionality of the point cloud (10841 points); (ii) non-linearity induced by fixturing (N-2-1, where N = 6), two compliant parts and part-to-part interactions (door inner to hinge reinforcement); (iii) collinearity induced by fixturing, as three locators are within 5 degrees of collinearity (-3 to 2 degrees of deviation from the y axis); and (iv) high fault multiplicity: because 6-sigma defects are considered at the level of variation within 3D scanner accuracy (<0.05 mm), anywhere from zero to all six process parameters can manifest errors simultaneously (up to 100% fault multiplicity). The door assembly requirements are: (1) Product: design tolerances of the door assembly are <-1.0, 1.0> [mm]; (2) Process: fixturing calibration and commissioning are achieved within <-0.1, 0.1> [mm]; and (3) Shape error detection: a 3D optical scanner is used for measurement.
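Two of these characteristics can be quantified with small helpers. This is a sketch under assumptions: a process parameter is treated as "at fault" when its absolute deviation reaches a hypothetical detection threshold of 0.05 mm, and locator collinearity is measured as the spread of signed in-plane angles of the locator directions relative to the y axis; the directions used in the example are illustrative, not measured locator data.

```python
import math
from typing import Iterable, Sequence, Tuple


def fault_multiplicity(deviations_mm: Sequence[float], threshold_mm: float = 0.05) -> Tuple[int, float]:
    """Number (and percentage) of process parameters whose |deviation| reaches the threshold.

    threshold_mm is a hypothetical detection level, not a value prescribed by the method.
    """
    faulty = sum(1 for d in deviations_mm if abs(d) >= threshold_mm)
    return faulty, 100.0 * faulty / len(deviations_mm)


def angle_from_y_deg(vx: float, vy: float) -> float:
    """Signed angle in degrees between the planar direction (vx, vy) and the +y axis."""
    return math.degrees(math.atan2(vx, vy))


def angular_spread_deg(directions: Iterable[Tuple[float, float]]) -> float:
    """Max minus min angle from the y axis; a small spread means near-collinear locators."""
    angles = [angle_from_y_deg(vx, vy) for vx, vy in directions]
    return max(angles) - min(angles)


def direction_at(deg: float) -> Tuple[float, float]:
    """Unit direction rotated deg degrees from +y (illustrative helper)."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


# Three illustrative locator directions at -3, 0 and 2 degrees from the y axis.
print(angular_spread_deg([direction_at(-3), direction_at(0), direction_at(2)]))
print(fault_multiplicity([0.1, -0.2, 0.06, 0.3, -0.07, 0.08]))
```

With all six parameters above the threshold, the helper reports 100% fault multiplicity, and directions between -3 and 2 degrees give a 5-degree spread, matching the ranges quoted in the text.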
The Key Performance Indicators (KPIs) used to assess the results are: (i) Mean Absolute Error (MAE) < 0.05 mm; and (ii) R² > 0.95, i.e., the model must explain more than 95% of the variance in the process parameters under assembly system Requirements (ii)-(iv).
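The two KPIs can be computed per process parameter and then averaged across parameters, as in the following minimal sketch. The helper names are illustrative; averaging across process parameters mirrors how the results are reported.

```python
from statistics import mean
from typing import Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_kpis(true_by_param, pred_by_param) -> Tuple[float, float]:
    """Average MAE and R^2 across process parameters (one list of values per parameter)."""
    pairs = list(zip(true_by_param, pred_by_param))
    return mean(mae(t, p) for t, p in pairs), mean(r2(t, p) for t, p in pairs)


def meets_kpis(avg_mae: float, avg_r2: float, mae_max: float = 0.05, r2_min: float = 0.95) -> bool:
    """KPI check: MAE < 0.05 mm and R^2 > 0.95."""
    return avg_mae < mae_max and avg_r2 > r2_min
```

For example, `meets_kpis(0.41, 0.76)` is `False`, since neither threshold is met.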

B. Results
The KPIs of model performance are summarized for all process parameters in Fig. 7. Model convergence is shown in Fig. 8. After 10 iterations of closed-loop training, with a total of 2000 samples generated adaptively, the model converges to an average MAE across all process parameters of 0.05 mm (meeting the required threshold) and an average R² of 0.98. For validation purposes, both the Bayesian 3D CNN and a deterministic version of the model, i.e., a 3D CNN with the same architecture as in Fig. 2, were trained.

C. Benchmarking and Discussion
The benchmarking analysis is conducted against the six requirements listed in Section I. The case study and results, together with analyses of collinearity, multiplicity and uncertainty, are used to demonstrate the capability of the proposed approach to fulfil these requirements.
The benchmarking analysis of the proposed 3D OSER approach is discussed on two levels:
1. OSER vs. currently used approaches at the production phase, when point cloud data is available: the benchmarking is conducted for two scenarios, (a) RCA and (b) RCA with uncertainty quantification.
RCA: As discussed in Section I, the state-of-the-art models used for assembly process RCA, such as [32] and [8], are linear and can be classified as regularized linear regression approaches (Table II). Hence, their upper-limit performance can be estimated by applying regularized linear regression to all point deviations d within the point cloud. These models also use a limited number of points sampled from the point cloud of a single part (fewer than 100 out of more than 10,000), which further limits their performance for assembly processes. The OSER methodology is validated against the six requirements presented in Section I as follows. Requirement (i) is fulfilled by the proposed voxelization approach, which ensures that, irrespective of the dimensionality of the point cloud, it is transformed into a sparse tensor of dimensions (64, 64, 64, 3) that preserves both the object's spatial structure and the point deviation features. This also enables the application of OSER-based models that require a regular data structure as input. Secondly, the state-of-the-art regularized linear regression approaches achieve MAE = 0.41 mm and R² = 0.76 (see Table II), which is unsatisfactory compared with the required MAE < 0.05 mm and R² > 0.95, because a regularized linear regression model can explain only the linear variance in the system. By comparison, the proposed OSER model achieves MAE = 0.05 mm and R² = 0.98, hence fulfilling Requirements (ii), (iii) and (iv). Fig. 9 compares the performance of regularized linear regression (i.e., the upper limit for state-of-the-art approaches) with the proposed OSER approach under different levels of fault multiplicity and collinearity.
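The voxelization step behind Requirement (i), which also maps simulation mesh nodes and scanned points to a common representation, can be sketched as binning each point's deviation vector into a 64 × 64 × 64 grid with three channels. This is a minimal sketch under stated assumptions, and the authors' exact mapping may differ. The sketch assumes that the three channels are the x, y and z deviations, that each occupied voxel stores the mean deviation of the points falling into it, and that the grid spans an axis-aligned bounding box of the nominal geometry. The sparse tensor is represented as a dictionary of occupied voxels.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Index3 = Tuple[int, int, int]


def voxelize(
    points: Sequence[Vec3],      # nominal point / mesh-node coordinates
    deviations: Sequence[Vec3],  # assumed channels: (dx, dy, dz) deviation per point
    bounds_min: Vec3,            # assumed axis-aligned bounding box of the part
    bounds_max: Vec3,
    resolution: int = 64,
) -> Dict[Index3, Vec3]:
    """Sparse (resolution, resolution, resolution, 3) grid: voxel index -> mean (dx, dy, dz).

    Voxels containing no points are omitted and are implicitly zero.
    """
    sums: Dict[Index3, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts: Dict[Index3, int] = defaultdict(int)
    for p, d in zip(points, deviations):
        idx = []
        for axis in range(3):
            span = bounds_max[axis] - bounds_min[axis]
            i = int((p[axis] - bounds_min[axis]) / span * resolution)
            idx.append(min(max(i, 0), resolution - 1))  # clamp boundary / out-of-box points
        key = (idx[0], idx[1], idx[2])
        for c in range(3):
            sums[key][c] += d[c]
        counts[key] += 1
    # Average the accumulated deviations of all points that share a voxel.
    return {k: (s[0] / counts[k], s[1] / counts[k], s[2] / counts[k]) for k, s in sums.items()}
```

Because only occupied voxels are stored, the representation stays sparse regardless of how many points the scan or mesh contains. A dense array can be built from it when a 3D CNN requires one.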
As shown in Fig. 9, in scenarios 1, 2 and 3 (fault multiplicity of up to 50%), both approaches have similar performance. However, in scenarios 4, 5 and 6, as the fault multiplicity increases to 4, 5 and 6 parameters simultaneously at fault (i.e., up to 100% of the parameters), and with a design-induced collinear relation between the process parameters and the input, the performance of the linear model decreases, while the OSER approach maintains performance above the required threshold (R² > 0.95). The benchmarking also comprehensively assesses OSER against existing deep learning and machine learning techniques [33] that are not currently used for RCA of assembly processes (see Table II). These techniques were implemented and applied to the aforementioned case study. CNN-based deep learning methods were selected because they retain spatial information during learning, which is essential for object shape error estimation. Each model is compared in its ability to fulfil the six requirements. Table II shows that, although the proposed model has a higher model training time, its overall training time is significantly lower owing to its ability to leverage epistemic uncertainty to generate informative samples, leading to faster convergence with only ~2000 samples. All other models are trained using random sampling until convergence. Fig. 10 summarizes the convergence of all benchmarking models. The hyper-parameters of the machine learning models were optimized using grid search. For statistical quantification of accuracy and goodness-of-fit, 20 runs of training and testing are conducted using 4000 randomly sampled data points for training and 300 for validation within the validation range. The mean and standard deviation (SD) of each model's performance, averaged across the six process parameters, are reported. The proposed OSER model performs significantly better in terms of accuracy and goodness-of-fit.
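The limitation of the regularized linear baseline discussed above can be illustrated with a toy, one-feature ridge regression fitted in closed form. This is an illustrative sketch only, not the benchmark setup: the actual baseline regresses process parameters on all point deviations, whereas here a single synthetic feature shows that a linear model captures a linear response but explains essentially none of the variance of a symmetric non-linear one.

```python
from statistics import mean
from typing import Sequence, Tuple


def ridge_fit_1d(x: Sequence[float], y: Sequence[float], lam: float = 1e-3) -> Tuple[float, float]:
    """Closed-form ridge fit y ~ w*x + b with an L2 penalty on w (intercept unpenalised)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    w = sxy / (sxx + lam)
    return w, y_bar - w * x_bar


def r2_of_fit(x: Sequence[float], y: Sequence[float], w: float, b: float) -> float:
    """R^2 of the fitted line w*x + b on the data (x, y)."""
    y_bar = mean(y)
    ss_res = sum((yi - (w * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


x = [(i - 50) / 50 for i in range(101)]  # symmetric synthetic feature on [-1, 1]
y_linear = [2.0 * xi for xi in x]        # linear response
y_nonlinear = [xi ** 2 for xi in x]      # symmetric non-linear response
print(r2_of_fit(x, y_linear, *ridge_fit_1d(x, y_linear)))        # close to 1
print(r2_of_fit(x, y_nonlinear, *ridge_fit_1d(x, y_nonlinear)))  # close to 0
```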
Results from an ANOVA followed by a post-hoc Tukey HSD test at a 0.05 significance level (95% confidence level), considering two sources of variation (model type and process parameter), showed the differences in model performance to be statistically significant. This comes at the expense of increased model complexity.
RCA with uncertainty quantification: As discussed in Section I, the outcome of RCA frequently leads to costly corrective actions in the manufacturing environment; therefore, especially for 6-sigma faults, it is crucial that RCA be decision-driven, with uncertainty quantification informing the choice of corrective actions. The OSER methodology provides the standard deviation of the predicted process parameter distributions, which quantifies this uncertainty, hence fulfilling Requirement (v). Although the OSER with 3D CNN and OSER with Bayesian 3D CNN models perform similarly, the latter can quantify and segregate aleatoric and epistemic uncertainty while estimating the process parameters. To demonstrate the model's capability to quantify uncertainty on unseen samples, evaluation is performed on 500 samples within the training range [-1 mm, 1 mm] and 500 samples outside the training range, within [-2 mm, 2 mm]. The standard deviation across all observations is averaged and compared for each process parameter. Results are shown in Fig. 11. Additionally, the epistemic uncertainty estimates enable closed-loop training, which reduces the overall training time.
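One common way to segregate the two uncertainty types is the law of total variance over MC draws; it is assumed here rather than stated in this work. Suppose each stochastic forward pass t returns a predicted mean mu_t and an aleatoric variance sigma_t^2 for a process parameter. The aleatoric part is then the average of sigma_t^2, and the epistemic part is the variance of mu_t across draws. The inputs in the example are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def decompose_uncertainty(
    mc_means: Sequence[float],      # mu_t from each MC draw (hypothetical inputs)
    mc_variances: Sequence[float],  # sigma_t^2 from each MC draw (hypothetical inputs)
) -> Tuple[float, float, float]:
    """Return (aleatoric, epistemic, total) variance for one process parameter."""
    aleatoric = mean(mc_variances)   # E_t[sigma_t^2]: noise inherent in the data
    epistemic = pvariance(mc_means)  # Var_t[mu_t]: disagreement between sampled models
    return aleatoric, epistemic, aleatoric + epistemic


# Draws that agree (as expected inside the training range) vs. draws that
# disagree (as expected for inputs outside the training range).
print(decompose_uncertainty([0.20, 0.21, 0.19], [0.01, 0.01, 0.01]))
print(decompose_uncertainty([0.00, 0.40, -0.30], [0.01, 0.01, 0.01]))
```

Only the epistemic term grows when the sampled models disagree. This is the behaviour that makes it usable both for flagging out-of-range inputs and for choosing informative samples during closed-loop training.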
Fig. 11. Process parameter distribution standard deviations.
2. OSER vs. approaches at the design phase, when NO point cloud data is available: In manufacturing environments, obtaining a comprehensive dataset that covers all fault scenarios is not feasible; hence, augmenting the dataset with high-fidelity multi-physics simulation enables deep learning approaches to be trained and deployed during the design phase of a new product or production system introduction. Since the proposed OSER approach transforms both the simulation mesh node output and the scanned point cloud output into the same voxelized shape error representation, which is compatible with the 3D CNN, it enables this integration, hence fulfilling Requirement (vi). This provides the capability to model and simulate the assembly process and to conduct system diagnosability and resilience analysis. Currently, no approaches provide this capability for object shape error RCA at the design phase.

IV. CONCLUSIONS
This paper presented an Object Shape Error Response (OSER) approach that is relevant to manufacturing industries where dimensional and geometric variations can be quantified as object shape errors. It is also relevant to areas such as robotics, computer-aided detection, stamping, machining and additive manufacturing, where RCA of dimensional variations translates to estimating object shape error patterns and relating them to process parameters. Transfer learning, a focus of future work, can be leveraged for application in these domains with exponentially fewer training samples [16]. The proposed approach leverages a Bayesian 3D CNN model, trained within a closed-loop framework using a multi-physics simulation (VRM) model, to estimate shape errors and relate them to process parameters while quantifying uncertainty. The model can then be deployed on real data collected from 3D surface scanners, thereby enabling more effective and efficient decision-making for the control and correction of manufacturing systems. The approach is benchmarked against state-of-the-art assembly RCA models and other machine learning models, demonstrating statistically significantly better model performance while fulfilling the manufacturing system design requirements. Leveraging such automated RCA models ensures early estimation and elimination of process variations before they become defects, which can improve the quality and productivity of the system by reducing scrap and machine downtime. It also eliminates the need for trial-and-error approaches to root cause analysis, which are often ineffective and inefficient. Future work aims to scale up the approach to multi-station assembly systems.
Various encoder-decoder-based CNN architectures, such as U-Net [36] and PointNet [14], will be explored to comprehensively perform RCA on multi-station systems; these would enable process parameter estimation for a heterogeneous set of process parameters (i.e., continuous and categorical) as well as object shape error estimation in between stages/stations. Approaches for uncertainty-guided continual learning will also be explored, enabling transfer learning to different manufacturing systems while retaining knowledge of previous assembly systems. Future work also aims to develop a lifelong continual learning approach leveraging Bayesian 3D CNNs, which is crucial for continuously changing manufacturing environments.

Dr. Pasquale Franciosa is an Associate Professor at the University of Warwick and head of the laser welding applications laboratory at WMG, University of Warwick. His research interests are in smart manufacturing, process monitoring, closed-loop control, applications of machine learning/artificial intelligence and multi-disciplinary optimization, with specific attention to assembly systems and laser processes. He has been PI and co-I on several funded projects with a total income to the University of Warwick of circa £2.8M since 2015. He has published 80+ papers and received four best-paper awards. He is a member of the editorial board of the ASTM Smart and Sustainable Manufacturing Systems Journal.

Prof. Darek Ceglarek is EPSRC Star Recruit Research Chair at U-Warwick and a CIRP Fellow. Previously, he was Professor in IS&E at U-Wisconsin-Madison. He received his Ph.D. in ME from U-Michigan-Ann Arbor (1994). He focuses on smart manufacturing and data mining/AI for root cause analysis across design, manufacturing and service. He has been PI/co-PI on research grants of over £30M: NSF/NIST/EPSRC/InnovateUK/APC/EU-FP7/Curie and industry. He has published over 180 papers and is listed by Stanford University among the top 2% of the world's leading scientists. He has received several Best Paper Awards, the 2018 JLR 'Innovista' Award for the most innovative 'piloted technology', the EPSRC Star Award and the NSF CAREER Award. He has served as an Associate Editor for ASTM SSMS, IEEE TASE and ASME JMSE.