A Deep Regression Framework Toward Laboratory Accuracy in the Shop Floor of Microelectronics

Deep learning (DL) has certainly improved industrial inspection, while significant progress has also been achieved in metrology with impressive results reached through their combination. However, it is not easy to deploy metrology sensors in a factory, as they are expensive, and require special acquisition conditions. In this article, we propose a methodology to replace a high-end sensor with a low-cost one introducing a data-driven soft sensor (SS) model. Concretely, a residual architecture (R$^{2}$esNet) is proposed for quality inspection, along with an error-correction scheme to lessen noise impact. Our method is validated in printed circuit board (PCB) manufacturing, through the identification of defects related to glue dispensing before the attachment of silicon dies. Finally, a detection system is developed to localize PCB regions of interest, thus offering flexibility during data acquisition. Our methodology is evaluated under operational conditions achieving promising results, whereas PCB inspection takes a fraction of the time needed by other methods.

INDUSTRY 4.0 aims to automate manufacturing processes using smart technologies and has greatly benefited from recent advancements in deep learning (DL). The success of DL has been demonstrated by its numerous applications in a variety of different industrial sectors ranging from telecommunications [1] to electronics [2]. However, the large amounts of accurately annotated data needed to train DL models necessitate the deployment of multiple high-accuracy sensors for industrial process monitoring. The use of high-end laboratory sensors, which are accurate, stable, and robust, is not always a feasible solution due to the high deployment and maintenance cost they incur. Hence, it is necessary to rely on less accurate, yet low-cost sensors. This article is a direct extension of a previous work by Dimitriou et al. [3], in which a printed circuit board (PCB) defect detection system is developed. Prior to the attachment of integrated circuits, it is necessary to dispense conductive glue on the PCB's substrate surface, during which defects related to the dispensed glue volume may occur. Identification of such defects is achieved through the use of a modular scanning system and a 3-D convolutional neural network (3DCNN), which regresses the volume of each glue deposit of a PCB. Interestingly, glue volume is accurately estimated even after die attachment when only part of the glue is visible and partial information about its shape is available.
Despite the promising results obtained, this method relies on the use of a laser profilometer, which is expensive, slow, needs to be calibrated, and requires controlled illumination conditions during measurements. These limitations are the main motivation for our work, where we propose to replace the laser profilometer with a single RGB camera and a DL architecture that provides laboratory-level accuracy on the shop floor (in situ). The replacement of a profilometer with an industrial camera has numerous advantages: it significantly reduces the cost of automated inspection, making it a suitable choice for in situ deployment, and both data acquisition and processing are faster, thus notably reducing inspection time. To automate the inspection process using a low-cost sensor, ground truth is acquired in the laboratory by analytically calculating the volume of each sample from its corresponding 3-D point cloud representation and is subsequently used to train a deep network that estimates glue volume from 2-D image data. Additionally, we develop a segmentation and detection system that localizes glue regions within the image irrespective of the PCB's orientation. Finally, we propose a method to deal with label noise, which is inherently present due to the finite resolution of the 3-D scans, using only a few reliable measurements. The main novelties of our work are summarized as follows.
1) We propose a new methodology to replace expensive sensors with in situ ones.
2) We provide increased flexibility during data acquisition, as accurate part placement is not necessary.
3) We extrapolate a 3-D variable (volume) from 2-D data (RGB image).
4) A modified residual network for regression (R$^{2}$esNet) is introduced, and the impact of different choices of depth is examined.
5) We demonstrate the potential of this approach in a semiconductor manufacturing use case.
6) A new benchmark dataset is generated.
The rest of this article is organized as follows. Section II reviews the related work in soft sensors (SS) and metrology methods for defect detection for industrial applications. Section III outlines the examined defect detection use case as well as the proposed methodology. A detailed description of the available data is given and the various deep architectures used are introduced. In Section IV, our methodology is evaluated and the experimental results are presented. Finally, Section V concludes this article.

II. RELATED WORK
In this section, we give an overview of the most recent work related to our method. Since we develop an SS model to indirectly estimate glue volume from RGB image data and use the estimated value to identify defects, we elaborate on the recent advancements in the fields of defect detection via DL, soft sensor models, and AI.

A. Defect Detection via DL
The successful application of DL in the industry can be partially attributed to the introduction of several deep convolutional architectures that have revolutionized image classification and have since been used as backbone networks for various tasks. In [4], the authors introduce VGG, a deep convolutional architecture that makes use of multiple layers, while keeping a small, fixed, 3 × 3 size for the convolutional kernels, which has since become standard practice. Proposing the use of residual connections, He et al. [5] developed the residual network (ResNet), achieving state-of-the-art results in multiple image classification datasets. Extending this idea, DenseNet is proposed in [6], by concatenating the feature maps of each layer with the input of every subsequent layer, thus facilitating the gradient flow to the initial layers during backpropagation. To address the increasing demand for mobile and real-time embedded vision applications, Howard et al. [7] and Sandler et al. [8] proposed MobileNet, which is based on depthwise separable convolutions to lower the computational cost both during training and inference without significantly affecting performance.
In recent years, DL approaches have been successfully adopted for various industrial applications, mainly focusing on optimizing the production process through the identification of defective parts. In [3], upon which our work is based, the authors consider a PCB defect detection use case and develop a 3DCNN regression architecture called RNet to estimate the volume of a conductive glue deposit before die attachment. In [9], a data-driven fault diagnosis system is developed using a 2DCNN in combination with a parameter-free conversion method to transform 1-D signals into 2-D images, and its effectiveness is demonstrated on three distinct fault diagnosis use cases. Moreover, in [10] a deep convolutional transfer learning network is developed to address the discrepancy between source and target domain during training and testing and is evaluated on three motor bearing fault diagnosis datasets. Lee et al. [11] propose the use of a fault diagnosis classification CNN, which enables the association of the output of the first convolutional layer with the structural meaning of the raw data, making it possible to extract information regarding the cause of defects, and test their method on a chemical vapor dataset. To cope with noisy data, a wavelet-inspired soft thresholding approach is adopted in [12] in which the optimal threshold values are learned, and it is applied for the identification of bearing and gear faults in rotating machines. A joint detection and classification scheme is proposed in [13] for steel plate defect inspection through the fusion of multilevel features. Another defect detection application in the hard metal industry is developed in [14], where data gathered from multiple sensors are used for quality assessment. 
Chen and Li [15] developed another feature fusion method exploiting sparse autoencoders, whereas in [16], the use of a segmentation-based deep architecture that can generalize from only a few training samples is suggested to identify surface crack defects. Finally, both [17] and [18] take advantage of deep learning models to identify welding defects.

B. Soft Sensors and AI
Another line of DL applications in the industry involves the development of SS for the indirect monitoring of hidden variables during production. A data-driven approach to estimate hazardous gas concentrations using principal component analysis to decorrelate the input signals along with a deep belief network is proposed in [19]. Furthermore, a soft sensor application is developed in [20], where support vector machines and other predictive models are used to regress key variables in a refinery isomerization process. In a related application [21], a stacked autoencoder soft sensing model is introduced, where a product concentration prediction use case on an industrial debutanizer column process is examined. In [22], a nonlinear finite impulse response model is used to estimate the deflection of a polymeric mechanical actuator, whereas in [23] a semisupervised approach is introduced to exploit unlabeled data. Another semisupervised method is suggested in [24] to estimate the CO$_{2}$ concentration in an ammonia synthesis process. A spatiotemporal attention-based long short-term memory network is proposed in [26] and is evaluated on an industrial hydrocracking process use case.

III. PROPOSED METHODOLOGY
In this section, the proposed methodology is outlined. First, an overview of our method is given and the studied use case is described in detail. Next, the in situ sensing module under consideration is introduced along with the high-end sensor that is replaced. Finally, our R$^{2}$esNet architecture and the segmentation model for glue detection are described.

A. Overview
In order to transfer the accuracy and robustness of a laboratory sensor to the shop floor, we adopt the following approach. For each data sample, we obtain two distinct representations $x^{l}$ and $x^{s}$, corresponding to the measurements acquired using a high-end laboratory sensor and a low-cost in situ one, respectively. Subsequently, measurements $x^{l}$ are used to create annotated samples by analytically estimating the quality variable under consideration through $y = f(x^{l})$, where $f(\cdot)$ is considered to be known. Exploiting the existing correspondence between the samples, we create a dataset
$$\mathcal{D} = \{(x^{s}_{i}, y_{i})\}_{i=1}^{N} \tag{1}$$
which is used to train a deep network $h(x^{s}, \theta^{*})$, where $\theta^{*}$ are the optimal network parameters estimated by minimizing the empirical cost as
$$\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(h(x^{s}_{i}, \theta), y_{i}) \tag{2}$$
and $L(h(x^{s}_{i}, \theta), y_{i})$ is any valid cost function. A schematic overview of this approach is given in Fig. 1. The deep network predicts the value $h(x^{s}, \theta) = \hat{y}$, and its parameters are tuned to minimize the prediction error $e = y - \hat{y}$.
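The training scheme described above can be illustrated with a deliberately tiny sketch. It is not the implementation used in this work: the scalar model `h`, the mapping `f_lab`, the learning rate, and the synthetic data are all assumptions, chosen so that the example runs with the Python standard library only. Labels are produced from the laboratory representation through a known function, and a model that only sees the in situ representation is fitted by minimizing the empirical mean squared cost.

```python
import random

def f_lab(x_lab):
    """Known analytic mapping from a laboratory measurement to the
    quality variable (hypothetical choice for this sketch)."""
    return 2.0 * x_lab

def h(x_situ, theta):
    """Toy stand-in for the deep network: a single scalar weight."""
    return theta * x_situ

def fit(dataset, lr=1.0, epochs=200):
    """Minimize the empirical MSE cost over the paired dataset
    with plain gradient descent."""
    theta = 0.0
    n = len(dataset)
    for _ in range(epochs):
        grad = sum(2.0 * (h(x, theta) - y) * x for x, y in dataset) / n
        theta -= lr * grad
    return theta

random.seed(0)
# Paired samples: x_lab from the high-end sensor, x_situ from the low-cost one.
# Here the in situ reading is simply a rescaled copy of the laboratory reading.
x_lab = [random.uniform(0.0, 1.0) for _ in range(50)]
x_situ = [0.5 * x for x in x_lab]

# Dataset of (in situ input, laboratory-derived label) pairs.
dataset = [(xs, f_lab(xl)) for xs, xl in zip(x_situ, x_lab)]

theta_star = fit(dataset)
errors = [y - h(x, theta_star) for x, y in dataset]  # e = y - y_hat
max_abs_error = max(abs(e) for e in errors)
```

In this toy setting the label equals four times the in situ reading, so the fitted weight should approach 4 and the residual errors should vanish. With real images and a deep network, only the model and the optimizer change; the pairing of in situ inputs with laboratory-derived labels stays the same.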

B. Studied Use Case
The growing wave of small electronics products, such as wearables and Internet of Things devices, has led to increased demand for small-scale printed circuit boards. A critical stage in the microelectronics manufacturing industry is the dispensing of conductive glue by a glue dispensing machine on a liquid crystal polymer (LCP) substrate surface before the attachment of integrated circuits.
The volume of the dispensed glue deposit is a crucial variable that needs to be monitored during production as it directly affects the quality of the produced circuit. Specifically, excessive glue may lead to internal short-circuits, whereas insufficient glue leads to weak die bonding. The current practice for the detection of such faulty conditions is the manual inspection of the PCB, which is both a time-consuming and highly inaccurate process.
The development of a soft sensor model for the monitoring of the dispensed glue volume and the inspection of a part benefits production and is considered crucial toward the automation of quality control. The use of a laboratory sensor, such as a laser profilometer [3], even though it produces highly accurate and reliable measurements, is expensive and may slow down production. Our proposed methodology aims to transfer the accuracy of a high-end sensor to the shop floor, while simultaneously offering increased flexibility during the acquisition of measurements and greatly reducing inspection time and cost. An example of a PCB consisting of 18 identical circuit modules is shown in Fig. 2. Each module contains several glue deposits whose volume needs to be monitored during inspection. There are five different types of glue containers, as can be seen in Fig. 3, which we label A, B, C, D, and E.

C. Sensing Module
For PCB inspection, the sensing modules under consideration include a modular laser scanning system (high-end sensor) and an industrial RGB camera (low-end sensor). The high-end setting comprises an Optimet Conopoint-10 sensor, a Newport XPS-RL2 motion controller, two linear stages, and the support breadboards. The same zig-zag scanning strategy as in [3] has been employed to reduce inspection time. Each glue deposit is scanned twice, once with a resolution of 50 μm and once with 20 μm. Indicative point cloud representations of the acquired scans can be seen in Fig. 4, from which it is evident that the 3-D geometric structure of the glues is successfully captured. The low-cost setup consists of a Baumer VCXG-201 C.R industrial camera and a Fujinon machine vision CF25ZA-1S lens. The lens' focal length is 25 mm, and, thus, when the camera is placed at the right height, the PCB occupies a large part of the image compared to the background, fully exploiting the setup's potential. All inference computations are performed using a Jetson AGX Xavier, and, thus, the whole system can easily be deployed on the shop floor following the edge computing paradigm. The proposed setup is shown in Fig. 5.

D. Generating a Dataset
Obtaining an annotated dataset requires expert knowledge and is usually a challenging and time-consuming process, especially in industrial applications, where the sample parts, particularly defective ones, are scarce. Automating the annotation process is highly beneficial as it largely reduces the time and workload needed. To address this common issue, we propose two different glue volume estimation processes, one using the 50-μm scans to automatically generate a noisy dataset for training, and another using the 20-μm scans to manually generate a reliable and accurate set of measurements for validation purposes.
In both cases, for each regional scan, RANSAC [25] is applied to estimate the substrate LCP surface plane. Subsequently, for the low-resolution scans, the points belonging to the glue deposits are identified as the outliers of the plane's model, whereas for the high-resolution scans they are manually cropped. Each 3-D point of the glue point cloud is then projected onto the estimated substrate plane to form a closed surface. Following this process, the resulting 3-D representation consists of two point sets, where $G = \{p_{i}\}_{i=1}^{M}$ is the glue cloud and $S = \{g_{i}\}_{i=1}^{M}$ the corresponding projected substrate surface cloud. Glue volume is approximated using the formula
$$V \approx \sum_{i=1}^{M} \lVert p_{i} - g_{i} \rVert_{2} \, \Delta s_{i} \tag{3}$$
where $\Delta s_{i}$ is the rectangular surface area element and $\lVert \cdot \rVert_{2}$ the standard Euclidean 2-norm of a vector. $\Delta s_{i}$ depends on the resolution $r$ of the point cloud through the relation $\Delta s_{i} = r^{2}$. Equation (3) is geometrically interpreted in Fig. 6. Essentially, the above formula is a discrete approximation of its continuous counterpart
$$V = \iint_{D} g(x, y) \, dx \, dy \tag{4}$$
where $D$ is the domain of integration, and $g(x, y)$ the height of the glue deposit at any point. The coordinate system chosen is such that the substrate surface plane corresponds to $z = 0$. Due to the existence of sensor noise during scanning, the imperfect plane fitting and glue-substrate segmentation of the point cloud, and the discretization error introduced in (3), the estimated glue volume is unavoidably corrupted by noise. Let $v_{i}$ be the true volume value for sample $i$. By construction, we obtain a set of measurements $\{y^{d}_{i}\}_{i=1}^{N}$, estimated from the 20-μm resolution point clouds, and a corresponding set $\{y^{s}_{i}\}_{i=1}^{N}$, estimated from the 50-μm glue clouds. Both sets of measurements are corrupted by noise such that
$$y^{s}_{i} = v_{i} + n^{s}_{i}, \qquad y^{d}_{i} = v_{i} + n^{d}_{i} \tag{5}$$
where $n^{s}_{i} \sim WN(\mu^{s}_{i}, \sigma_{s})$ and $n^{d}_{i} \sim WN(\mu^{d}_{i}, \sigma_{d})$ are modeled as white noise random processes, and it is assumed that $\sigma_{d} \ll \sigma_{s}$ and that $\mu^{d}_{i} \approx 0$.
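The discrete volume approximation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used to annotate the dataset: the plane is assumed to be already known (in this work it is estimated with RANSAC) and is passed as a normal vector and an offset, the names `project_onto_plane` and `glue_volume` are hypothetical, and the synthetic deposit is a flat box expressed in millimeters.

```python
import math

def project_onto_plane(p, normal, d):
    """Orthogonally project the 3-D point p onto the plane
    normal . x + d = 0 (the normal need not be unit length)."""
    norm = math.sqrt(sum(c * c for c in normal))
    n = [c / norm for c in normal]
    signed_dist = sum(pc * nc for pc, nc in zip(p, n)) + d / norm
    return tuple(pc - signed_dist * nc for pc, nc in zip(p, n))

def glue_volume(glue_points, normal, d, resolution):
    """Sum, over all glue points, the distance to the projected
    substrate point times the area element r^2."""
    area_element = resolution ** 2
    total = 0.0
    for p in glue_points:
        g = project_onto_plane(p, normal, d)
        total += math.dist(p, g) * area_element
    return total

# Synthetic deposit: a 10 x 10 grid of points sampled every 0.05 mm
# (a 50-um scan), all lying 0.2 mm above the substrate plane z = 0.
r = 0.05
height = 0.2
points = [(i * r, j * r, height) for i in range(10) for j in range(10)]

volume = glue_volume(points, normal=(0.0, 0.0, 1.0), d=0.0, resolution=r)
```

For this box the sum reduces to 100 points times 0.2 mm times 0.0025 mm², that is, 0.05 mm³. On real scans the result also inherits the plane-fitting and segmentation errors discussed above.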
After acquiring the PCB images, all glue regions are manually cropped, thus obtaining representations $x^{s}_{i}$. These are paired with the sparse-resolution measurements $y^{s}_{i}$ to form a noisy training set, whereas the more reliable, dense-resolution measurements $y^{d}_{i}$ are used for evaluation purposes.

E. Deep Architectures
The developed system consists of a two-step process. First, the acquired image is fed into an instance segmentation network that predicts the pixel coordinates of each glue deposit and classifies its type. Subsequently, each detected glue deposit is fed into a regression network that estimates its volume.
Segmentation and Detection: During the last few years, computer vision has advanced immensely, especially regarding object detection and instance segmentation tasks. Extensive research in the field has led to the development of two-stage detection methods, like the seminal Faster R-CNN [27] method, which is capable of fast and accurate object detection through the utilization of a region proposal network. A variant of Faster R-CNN, called Mask R-CNN, is introduced in [28] to perform semantic segmentation on the detected objects.
Since it is necessary to identify all glue deposits within an image, we use Mask R-CNN to first segment the PCB into its constituent modules and Faster R-CNN to detect and classify the glue deposits within each module. For each PCB, the coordinates of its modules are manually annotated. Five of the PCBs are kept for training and one for testing. Due to the small-scale dataset available, we perform heavy data augmentation by applying random rotations, crops, random scaling, and random horizontal and vertical flips during training. Besides increasing the size of the dataset, augmentation also contributes to the development of the desired invariances. Subsequently, for every detected module, we fit the minimum area rectangle to its binary mask and warp the image using the rectangle's four corners. The glue detection Faster R-CNN is trained to localize all glue deposits within a module and classify their type. To this end, the pixel coordinates of all glue deposits are manually annotated to create a dataset. The same kind of augmentation is used as before to develop translational, rotational, and scaling invariance. Both networks use a ResNet50 backbone and are pretrained on the ImageNet dataset [29]. Finally, by performing another affine coordinate transform using the detected bounding boxes, the glue deposit images are isolated. A schematic representation of the instance segmentation system is shown in Fig. 7.
Volume Regression: To automate the glue volume estimation process, we develop R$^{2}$esNet, a deep regression network based on the well-known ResNet architecture [5]. ResNets exploit the use of shortcut connections between intermediate layers, and by doing so can define very deep architectures that avoid overfitting and increase the convergence rate during training. As the authors advocate, it is easier to learn a residual mapping rather than the original, unreferenced one. To perform regression, we modify the existing ResNet architecture by replacing the last classification layer with a single-output fully connected layer and omit the use of the softmax activation function. We experiment with three choices of depth by stacking 4, 8, and 16 residual blocks, which correspond to 10, 18, and 34 layers, respectively. Each network is trained for 90 epochs using Adam with an initial learning rate of $10^{-4}$, which is divided by 10 every 30 epochs. Weights are initialized as in [30] and trained from scratch. All training samples are normalized by subtracting the per-channel mean and dividing by the standard deviation over the whole dataset, whereas the labels are scaled to the interval 0-1. Data augmentation is applied in the form of random rotations, translations, horizontal and vertical flips, as well as brightness, contrast, saturation, and hue jitter. Moreover, due to the small dataset available, we use a mini-batch size of one and, therefore, replace all batch normalization layers with layer normalization. The cost function used is the mean squared error (MSE) loss, which is typical for regression. The schematic overview of the proposed architecture is shown in Fig. 8.
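The training configuration above can be summarized in a short sketch. Several details are assumptions made for illustration only: the layer count assumes two convolutions per residual block plus one initial convolution and the final fully connected layer (which reproduces the 10, 18, and 34 figures quoted above), and the label scaling is assumed to be min-max scaling. The function names are hypothetical.

```python
def num_layers(num_blocks):
    """Layer count assuming two convolutions per residual block,
    plus the initial convolution and the output layer."""
    return 2 * num_blocks + 2

def learning_rate(epoch, base_lr=1e-4, step=30, factor=0.1):
    """Step schedule: the rate is divided by 10 every `step` epochs."""
    return base_lr * factor ** (epoch // step)

def scale_labels(volumes):
    """Min-max scale the volume labels to [0, 1] (assumed scheme).
    Returns the scaled labels and the (min, max) pair for inversion."""
    lo, hi = min(volumes), max(volumes)
    return [(v - lo) / (hi - lo) for v in volumes], (lo, hi)

def unscale(value, bounds):
    """Map a network output in [0, 1] back to physical volume units."""
    lo, hi = bounds
    return value * (hi - lo) + lo

depths = [num_layers(b) for b in (4, 8, 16)]
schedule = [learning_rate(e) for e in (0, 29, 30, 60, 89)]
scaled, bounds = scale_labels([0.02, 0.05, 0.08])
```

Keeping the `(min, max)` pair from the training labels makes it possible to convert predictions back to physical units at inference time with `unscale`.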
Compared to other deep architectures, R$^{2}$esNet achieves faster inference due to its reduced size, while the use of residual connections lowers the risk of overfitting. Since, as in most industrial applications, annotated data are scarce and computational resources are limited, we conclude that R$^{2}$esNet is well suited for edge deployment. In general, the few hundred samples available would be inadequate to train a deep network and would likely result in the model overfitting the training data. However, due to the controlled image acquisition environment and the relatively small variability of the glue samples, even a small dataset is enough to successfully capture the problem's statistics.
Error Correction: As described in Section III-D, the available measurements for training are corrupted by white noise, and, thus, it is necessary to employ some regularization strategy to cope with this issue and make the trained model robust. The two sets of measurements for type A glues are illustrated in Fig. 9. It is evident that besides the random fluctuations present in the sparse resolution measurements, there also exists a deterministic bias directly correlated with the amount of the dispensed glue. The existence of this bias is attributed to the overestimation of the substrate plane's height, as well as the inability of RANSAC to correctly classify the points near the boundaries of the glue. Inspired by this observation, we hypothesize that the existing error is related to several other characteristics, such as glue shape, and can thus be partially predicted from the 2-D input images. Formally, we assume that there exists a deterministic error term $e^{s}_{i}$ affecting each sparse resolution measurement. In order to refine the network's predictions, we employ a parallel error-correction network that functions as a regularizer and whose aim is to regress the deterministic error from the input image $x_{i}$. The structure of the proposed model is illustrated in Fig. 10.
To minimize the computational load added through the error-correction network, we choose to employ an R$^{2}$esNet architecture of depth 10 in every case. A small number of reliable dense resolution measurements acquired from the 20 μm scans is used for training, during which the parameters of the volume prediction network remain frozen. The same optimization and augmentation strategy is used as before. The small number of available training samples further justifies the choice of a shallower architecture.
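One possible reading of the error-correction scheme is sketched below. The way the two outputs are combined is not spelled out in the text, so the sketch assumes an additive correction and assumes that the error network is trained on the residual between the reliable dense measurement and the frozen volume prediction. Scalar features stand in for images, simple functions stand in for both networks, and all names are hypothetical.

```python
def volume_net(x):
    """Frozen volume predictor (toy stand-in) that systematically
    underestimates the true volume."""
    return 0.8 * x

def error_targets(reliable_samples, frozen_net):
    """Regression targets for the error network: reliable measurement
    minus the frozen network's prediction (assumed definition)."""
    return [(x, y_dense - frozen_net(x)) for x, y_dense in reliable_samples]

def fit_error_net(targets):
    """Toy error network: a least-squares line through the origin."""
    numerator = sum(x * e for x, e in targets)
    denominator = sum(x * x for x, _ in targets)
    slope = numerator / denominator

    def error_net(x):
        return slope * x

    return error_net

def corrected_prediction(x, frozen_net, error_net):
    """Additive combination of both outputs (an assumption)."""
    return frozen_net(x) + error_net(x)

# A handful of reliable (input, dense measurement) pairs.
reliable = [(x, 1.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

error_net = fit_error_net(error_targets(reliable, volume_net))
raw = volume_net(2.5)
corrected = corrected_prediction(2.5, volume_net, error_net)
```

In this toy case the frozen predictor returns 2.0 for a true value of 2.5, and the correction term recovers the missing 0.5. Only the error network is fitted, which mirrors the frozen volume network described above.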

IV. EXPERIMENTAL EVALUATION
In this section, our proposed methodology is evaluated. First, our instance segmentation system is visually evaluated, and its robustness to translations and rotations of the input image is demonstrated. Subsequently, we evaluate the developed regression model R$^{2}$esNet. The three different choices of depth are compared through the use of tenfold cross-validation, and testing set predictions are shown. Finally, inspection times for the six defective and nondefective parts are calculated and presented.

A. Module Segmentation and Glue Detection
To evaluate the robustness of our segmentation model, we artificially apply random rotations and translations to the acquired input images to simulate the effect of part misplacement. As can be seen in Fig. 11, the developed system successfully segments the PCB into its modules in all three cases. Even in the more challenging cases, where the image is either rotated or translated, all modules are detected despite the partial occlusion near the boundaries of the PCB, which indicates that the system has learned to generalize. Qualitative results of our glue detection system are shown in Fig. 12. Notice that despite the large difference in both illumination and the amount of the dispensed glue deposits, all regions are accurately localized and their type is correctly predicted.

B. Glue Volume Estimation
The available data for the development of the proposed system consists of six PCBs, each containing 18 circuit modules, as shown in Fig. 2. There are five different types of glue containers annotated A, B, C, D, and E, each corresponding to a different shape and amount of glue that needs to be placed on the LCP substrate surface, as is illustrated in Fig. 3. Within each circuit module, there are four placeholders for each glue type, thus resulting in a total number of 20 placeholders per module. On the top row of one of the PCBs, dies have been attached, which reduces the total number of available samples per glue type. To obtain reliable testing results, we adopt the following strategy to split the dataset into training, validation, and testing sets: We keep the glues located in the third row of every module for testing, whereas the remaining samples are randomly divided into training and validation samples. This results in a 75%-25% training/validation-testing split, in which the testing set contains glues from all PCBs, and whose volume varies from very little to excessive, thus successfully capturing the whole dataset's statistics. For the same reasons, we use the ground truth volumes estimated using the 20-μm resolution scans from the first row of each PCB to obtain a clean dataset, which is used to train the error-correction network. That way, the number of the resulting reliable training samples is 99. The average validation error over all ten folds is shown in Fig. 13. We observe that the deepest model trained, consisting of 34 layers, performs the worst, whereas R$^{2}$esNet10 and R$^{2}$esNet18 perform comparably, with R$^{2}$esNet10 converging faster in most cases.
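The splitting strategy above can be sketched as follows. The indexing is an assumption: each of the four placeholders of a glue type within a module is given a row index from 0 to 3, and index 2 plays the role of the held-out third row. The sketch ignores the samples lost to die attachment, so its counts are illustrative and do not reproduce the actual dataset size.

```python
import random
from itertools import product

def build_samples(n_pcbs=6, n_modules=18, glue_types="ABCDE", n_rows=4):
    """Enumerate one record per glue placeholder (idealized dataset)."""
    return [
        {"pcb": p, "module": m, "type": t, "row": r}
        for p, m, t, r in product(
            range(n_pcbs), range(n_modules), glue_types, range(n_rows)
        )
    ]

def split_by_row(samples, test_row=2):
    """Hold out one row of every module for testing."""
    test = [s for s in samples if s["row"] == test_row]
    trainval = [s for s in samples if s["row"] != test_row]
    return trainval, test

def k_folds(items, k=10, seed=0):
    """Yield (train, validation) pairs for k-fold cross-validation."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation

samples = build_samples()
trainval, test = split_by_row(samples)
folds = list(k_folds(trainval))
```

Holding out a fixed row of every module, rather than a random subset, guarantees that every PCB and every glue type is represented in the testing set.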
In order to quantitatively evaluate and compare the developed models, we use the normalized root mean square error (NRMSE) metric. To evaluate the effectiveness of R$^{2}$esNet compared to other popular backbone architectures, we perform the same experiments using VGG, DenseNet, and MobileNetV2. For training, all hyperparameters are kept the same as in R$^{2}$esNet, whereas batch normalization layers are replaced with layer normalization. As in R$^{2}$esNet, we decrease the number of layers used by stacking eight densely connected blocks for DenseNet, which results in a 20-layer network. Due to its efficient and lightweight design, MobileNetV2 [...]. Cross-validation and testing set results are reported in Tables I and II, respectively. Contrary to the usual case, where deeper architectures yield improved results, we observe the opposite, even though ResNets are known to tackle the degradation problem associated with increased depth. During cross-validation, in most cases R$^{2}$esNet10 produces the best results, whereas R$^{2}$esNet34 performs worse by a significant margin. In the testing set, irrespective of the choice of depth, all networks perform approximately the same, whereas we observe a significant improvement when using a parallel error-correction network. Compared to other backbone architectures, R$^{2}$esNet performs better by a small margin in most cases, whereas the classical convolutional VGG network performs significantly worse.
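Several normalizations of the root mean square error are in common use, and the exact one adopted for the NRMSE results is not recoverable from the text here. The sketch below assumes normalization by the range of the ground truth values, which is one common convention; dividing by the mean or the standard deviation would be equally plausible alternatives. The function name and the sample values are hypothetical.

```python
import math

def nrmse(y_true, y_pred):
    """Root mean square error divided by the range of the ground
    truth values (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("ground truth range is zero")
    return math.sqrt(mse) / value_range

y_true = [0.0, 1.0, 2.0, 3.0, 4.0]
y_pred = [0.0, 1.0, 2.0, 3.0, 2.0]
score = nrmse(y_true, y_pred)
```

Normalizing by the range makes scores comparable across glue types whose nominal volumes differ, which is presumably why a normalized metric is preferred over the plain RMSE.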
Inspection times for all six PCBs are shown in Table III. We observe that for the shallowest models, R$^{2}$esNet10 and ec-R$^{2}$esNet10, it takes less than a minute for the inspection of a part, which is a drastic reduction compared to the 20-30 min needed in [3] only for the scanning module to operate, as shown in Table IV. It is also noted that we have not optimized our implementation for the Jetson, so there is further room for improvement in terms of execution time.
Testing set predictions for all the developed models are shown in Fig. 14. We observe that the predictions follow the increasing ground truth trend, and, thus, offer insight into the amount of the dispensed glue. Without error correction to regularize the outputs, the predicted volume is usually underestimated, which is attributable to the bias existent in the training samples. On the other hand, after regularization, predictions accurately follow the mean. Furthermore, we notice that the performance of the models is directly related to the variance of label noise. Specifically, the MSE is lowest for types A and E, which are the least affected by noise, whereas it is maximal for types B and D.

V. CONCLUSION
In this article, we propose a methodology to replace expensive laboratory sensors with in situ ones, and demonstrate its potential by applying it for the development of a PCB inspection system that only relies on the use of an industrial camera and a Jetson unit. The developed system makes use of our R$^{2}$esNet architecture to perform glue volume regression along with an error-correction network to cope with label noise. Moreover, a segmentation and detection system that significantly simplifies the data acquisition process was developed. An important contribution of this work is the ability of R$^{2}$esNet to regress a 3-D geometric quantity from 2-D data.
The effectiveness of our system was demonstrated by the various experiments performed. Despite the challenging nature of the problem addressed, we obtained satisfactory results. Specifically, predictions were accurate enough to facilitate the quality inspection process and hence reduce inspection time, which is a crucial factor toward the optimization of the production process. Another interesting finding is the performance degradation observed for deeper architectures, even though ResNets are specifically built to address this issue.
To further benchmark and evaluate the limitations of the proposed methodology, a potential extension is its deployment in other industrial use cases, where we can explore how well it can generalize in terms of domain adaptation and inferring values outside the nominal ones used during training. Another interesting extension that can further improve the inspection process is the deployment of the proposed system on augmented reality gear.

ACKNOWLEDGMENT
The opinions expressed in this article are those of the authors and do not necessarily reflect the views of the European Commission.