Extraction of Device Structural Parameters Through DC/AC Performance Using an MLP Neural Network Algorithm

We propose a neural network (NN) approach that uses two multi-layer perceptron (MLP) NNs, an encoder and a decoder, to estimate the structural parameters (S<sub>para</sub>) of a 14-nm node fully depleted silicon-on-insulator (FDSOI) field-effect transistor (FET). The proposed NN algorithm achieves loss function convergence during NN training even when different outputs are defined for the same input. The decoder takes the on/off current ratio, delay, and power as inputs, which represent DC/AC performance for high performance (HP), low operating power (LOP), and low standby power (LSTP) applications. Using a pre-trained encoder, trained to R coefficients of the regression plot above 0.99 and an average percent error of approximately 1 %, the decoder was modeled to estimate S<sub>para</sub>. Our decoder successfully estimated all S<sub>para</sub> within the range that satisfies the technology node. The tendency of S<sub>para</sub> that satisfies the desired figures of merit (FOMs) in device design can be confirmed by comparing the S<sub>para</sub> estimated for the upper 5 % and 10 % cases. Furthermore, the decoder can provide device design guidance from various perspectives by presenting numerous alternatives of distinct S<sub>para</sub> sets, even when the FOM value is the same (duplicate input values). If undesirable FOMs are extracted, the causal S<sub>para</sub> can be determined from the S<sub>para</sub> estimated for the lower 5 % of FOMs, and immediate process feedback can be provided to the related unit process. As an example, we performed a detailed physical analysis of the delay in the LOP application. NN estimation results were analyzed using gate length (<italic>L<sub>g</sub></italic>), SOI thickness (<italic>T<sub>soi</sub></italic>), and drain-side spacer length (<italic>L<sub>spd</sub></italic>), which mainly affect gate capacitance (<italic>C<sub>g</sub></italic>) and effective current (<italic>I<sub>eff</sub></italic>).
In addition, source-side spacer length (<italic>L<sub>sps</sub></italic>) and source/drain junction gradient (<italic>L<sub>sdj</sub></italic>) showed behaviors different from those generally selected by human experts, as well as cases where maximal values within the set range were not estimated. The estimation of S<sub>para</sub> using the NN was effective and powerful, reducing process cost and feedback time.


I. INTRODUCTION
Numerous design and production processes are required to manufacture semiconductor chips suited for various applications; these processes are large-scale, expensive, and time-consuming, taking more than a month. Various test inspections are carried out throughout the process to improve product quality. Before packaging, a wafer test is typically performed, followed by a chip test. Electrical die sorting (EDS) is a test performed on wafers before packaging. It detects whether each die meets the necessary quality level by measuring the electrical parameters and determining whether the device operates appropriately. In other words, by screening out the defective dies, the rate of defects that arise during the early stages of the product can be effectively limited. Thus, the product will not proceed to packaging unless it meets the performance requirements of the target application. In addition, identifying and resolving defects through various test inspections helps increase yield, reduce costs, and manufacture high-quality semiconductors. Despite numerous test methods, issues that must be resolved to improve yield remain. First, when the EDS test is executed for a manufactured wafer, it is impossible to run a full test on all transistors due to time, space, and cost constraints; thus, only some transistors are selected for the test pattern. Second, while numerous figures of merit (FOMs) can be used to evaluate device performance, the test primarily measures and assesses the FOMs that reflect the qualities relevant to the target application. Third, it is not easy to extract the structural parameter (Spara) of the fabricated device from its electrical characteristics at the device level, which is the lowest level in semiconductors. Spara can be investigated by a transmission electron microscope (TEM) or a scanning electron microscope (SEM). However, this is expensive and has the drawback of being a destructive test that necessitates wafer cutting.
In addition, the existing method is difficult to apply to a large number of wafers or chips due to time and cost limitations. Spara is primarily concerned with the design and affects device performance. It is also directly involved in the Spara-related unit process. Thus, it is difficult to rapidly determine which Spara is problematic and which unit process is related to it when an incorrect measurement result is obtained. Because different components in various production processes operate together due to the sequential process, understanding Spara in the semiconductor manufacturing process is critical for determining the origin of the defect and resolving the issue on the device side. Owing to the development of higher computing capacity, such as GPU parallel computing and the creation of distributed processing environments, machine learning (ML) technology [1] has recently been employed as a novel approach in different domains. ML can predict future occurrences by learning complicated correlations between inputs and outputs. It has the advantage of making accurate predictions in a short time. In the field of semiconductor devices, for example, the correlation between Spara and electrical properties can be forward-estimated and backward-optimized using ML approaches and quickly and reliably applied to design and analysis. Furthermore, ML can be applied to various new devices, such as vertical nanowire FETs, to aid electrical characterization and provide insights regarding device and process design [2][3][4]. This study intends to provide insights for problem solving at the device level for incorrect results throughout the semiconductor test process by estimating Spara using the ML technique. In addition, a guideline for device design that can satisfy the FOM of the desired application is offered through the estimated Spara. We used a 14-nm node fully depleted silicon on insulator (FDSOI) field-effect transistor (FET) to achieve these goals. 
Its excellent performance and ultra-low leakage make it popular in networking, consumer devices, microcontroller units (MCUs), and internet of things (IoT) products. In particular, the introduction of the buried oxide (BOX) facilitates body bias control, enabling wider threshold voltage modulation and lower static power consumption, which can be applied to various applications [5][6][7][8]. We are interested in semiconductor devices classified into high performance (HP), low operating power (LOP), and low standby power (LSTP) applications. Therefore, the on/off current ratio, delay, and power, which indicate DC/AC performance for each application, are used as inputs. We then propose a neural network (NN) algorithm to quickly find the Spara corresponding to the application.

II. RELATED WORK
For a 32-nm node high-k metal gate transistor, Choi et al. [2] established a new framework for semiconductor device design and analysis using the ML approach. They used NN to achieve precise electrical modeling between Spara and FOMs. Using the gradient descent (GD) method and modeled NN, device optimization was performed to automatically find the optimal Spara set that satisfies the specified FOM. The results of NN optimization were similar to those obtained by human experts. However, it has been demonstrated that a significant amount of time is saved. They can also analyze the tendency of changes in Spara without performing numerous simulations by the sensitivity of each Spara on the FOM with the modeled NN. Yun et al. [3] used NN to estimate the relationship between the Spara of 14-nm node FDSOI FETs and the on/off current ratio for three semiconductor applications. The FOMs were then improved through device optimization, which determined the best device structure for each of the three applications. Furthermore, the analysis of sensitivity of FOM to significant Spara performed with NN was shown to be quite comparable to that performed using actual device physics. Choi et al. [2] assumed Spara was a completely independent input feature within a specified range at the time. In contrast, Yun et al. [3] considered a design guideline that demands a fixed range or correlation of some Sparas from a given technology node in actual device design. Thus, Spara partially depends on existing input features. By altering the range of these Spara in real-time throughout the optimization, they could find the best option for the technology node. Choi et al. and Yun et al. performed device design, optimization, and analysis for semiconductor devices using multi-layer perceptron (MLP) NN [9][10]. Furthermore, NNs comprise architectures in which the input dimensions are larger than the output dimensions, and the input features have a nearly or completely independent relationship. 
We use the ML technique for failure analysis in the semiconductor process, not for semiconductor device optimization and analysis. The proposed technique can be directly used in the semiconductor test process by considering the FOMs retrieved in the actual wafer test process as input and calculating the output Spara. The input dimensions of our NN are smaller than the output dimensions, and there is a positive or negative correlation between input features. That is because only the applied voltage varies depending on the application, and each FOM uses the same formula to calculate it. Furthermore, the pair of input and output data does not have a one-to-one correlation because data have the same FOM value even when the Spara sets are different. This is because structures of different devices have the same FOM value. Thus, the present MLP NN fails to learn as the training loss does not converge. These challenges are solved using two MLP NNs. Finally, instead of device development and optimization, our goal is to provide a mechanism to immediately identify device-side concerns during the fabrication.

III. DATA CONFIGURATION AND NEURAL NETWORK METHODOLOGY
A. Data Configuration of the FDSOI FET Device
We used the Sentaurus TCAD simulator [11], a semiconductor device simulation tool, to simulate 40,000 massive data for NN training. Using 17 parameters related to geometry and doping in the device design and manufacturing process, we collected data through random variation. The minimum and maximum ranges were assigned to each Spara during parameter randomization, reflecting the technology node of the device. We set the range based on the design rule of the node because we used a 14-nm device [6,7]. First, we obtained the current-voltage (I-V) and capacitance-voltage (C-V) curves from a TCAD simulation, tested at various gate voltages, Vg, depending on the application. Then, for three applications-HP, LOP, and LSTP-FOMs of on/off current ratio (Iratio), delay, and power were calculated using the current, voltage, and capacitance (Eq. 1-3). Typically, while designing a good-performance semiconductor device, an operation point (Q-point), which is a target for the optimum DC performance, is initially established. Subsequently, the small-signal (AC) characteristic is optimized. Although this technique is sequential, we estimate Spara using the NN to assess both performances simultaneously.
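Eq. 1-3 are not reproduced in this version of the text, so the following is only a minimal sketch of how such FOMs might be computed from extracted currents and capacitance. It assumes the commonly used forms Ieff = (IH + IL)/2 and delay = Cg·Vdd/Ieff; these forms, the function name, and the numerical values are illustrative assumptions rather than the authors' definitions. Power is omitted because its formula cannot be recovered from the text.

```python
def figures_of_merit(i_on, i_off, i_h, i_l, c_g, v_dd):
    """Return (off/on current ratio, effective current, delay).

    Assumed forms (not confirmed by the paper):
      I_eff = (I_H + I_L) / 2
      delay = C_g * V_dd / I_eff
    """
    inv_ratio = i_off / i_on      # reciprocal of Iratio (off/on ratio)
    i_eff = 0.5 * (i_h + i_l)     # assumed effective-current definition
    delay = c_g * v_dd / i_eff    # assumed CV/I delay metric
    return inv_ratio, i_eff, delay


# Made-up values in SI units (A, F, V); not taken from the paper.
inv_ratio, i_eff, delay = figures_of_merit(
    i_on=1.0e-3, i_off=1.0e-9, i_h=6.0e-4, i_l=4.0e-4, c_g=1.0e-15, v_dd=0.8
)
```

The same function would be evaluated once per application, with only the applied voltages (and hence the extracted currents) changing.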
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.
The currents ION and IOFF flow when the transistor is turned on and off, respectively. The operating voltage is Vdd, and the gate capacitance is Cg. IH is the current extracted when Vg and Vd are Vdd and 0.5Vdd, respectively, and IL is the current extracted when Vg and Vd are 0.5Vdd and Vdd, respectively. We used the density-gradient model and the mobility model describing doping dependence and high-field saturation as the device physics models in the TCAD simulation. Furthermore, the doping-dependent Shockley-Read-Hall model and the Auger generation-recombination model were combined with band-to-band tunneling of the Hurkx model for the recombination model. In addition, the implant technique used a Gaussian-function doping profile, and the gate-electrode/dielectric interface material was HfO2/SiO2, which has a fixed charge concentration of 10^12 cm^-3. We adopted the Spara that can represent the actual dimensions at the corresponding technology node and can be manipulated in TCAD simulation. Thus, we adopted Spara estimated from a TEM image of 14-nm node FDSOI FET hardware [5]. The definition of Spara and the range of Spara values for data creation are shown in Table 1. The Spara range was set in consideration of the design rule so that it does not deviate from the technology node. In Fig. 1, a correlation matrix depicts the Pearson correlation coefficient for each input feature and output. There are positive or negative correlations between input features (Fig. 1a). First, there is a positive correlation between different applications of the same FOM (solid box), because each FOM is calculated using the same formula and only the applied voltage varies depending on the application. Second, there is a negative correlation between Iratio and power (dashed box).
These two FOMs are originally positively correlated because both depend on Ion. However, we take the reciprocal of Iratio so that it has a small value, which facilitates NN training. Therefore, Iratio and power appear to have a negative correlation. The outputs, in contrast, exhibit essentially no correlation (Fig. 1b). However, we can confirm a weak negative correlation for some Sparas because the epitaxial length (Ls/d) of each region determines the maximum source- and drain-side bottom contact (Lconb(s/d)). The overall gate pitch Ltot is set to 70 nm, and Ls/d is given by Eq. 4. In other words, Lconb(s/d) is somewhat dependent on Lg, Rsd, and Lsp(s/d). In addition, the maximum of Lconb(s/d) is (Ls/d - Rsd) (Eq. 5). Thus, a weak negative correlation emerges, which is unavoidable due to the 14-nm node design rule of the device.
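The sign flip caused by taking the reciprocal can be illustrated with a small Pearson correlation computation. This is a sketch with made-up numbers, in which both quantities simply grow together; the variable names and values are hypothetical and are not the simulated data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy values only: both quantities increase with the on-current.
i_ratio = [1.0e5, 2.0e5, 4.0e5, 8.0e5]
power = [1.0, 1.5, 2.6, 4.9]

r_direct = pearson(i_ratio, power)                    # positive
r_recip = pearson([1.0 / v for v in i_ratio], power)  # negative after reciprocal
```

Because the reciprocal is monotonically decreasing for positive values, it reverses the ordering of Iratio and therefore the sign of its correlation with power.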
An NN, particularly an MLP NN, learns the relationship between input and output from the given data by treating the input as an independent variable and the output as a dependent variable. However, we use the correlated dependent variables as inputs and the independent variables as outputs. Because the FOMs measured in the semiconductor test process depend on Spara, we use them as inputs. Furthermore, some input and output data pairs have identical input values, implying that the data to be trained form a one-to-many relationship in which a single input can have multiple solutions. Therefore, we enable NN training using the proposed NN algorithm and correctly estimate Spara under these circumstances, as illustrated in the next section.
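The difficulty with one-to-many data can be seen in a toy example. The sketch below assumes a hypothetical forward model in which the FOM is the product of two parameters; it is not the device physics. For a repeated input, a direct regression trained with an MSE loss is pulled toward the mean of the conflicting targets, and that mean need not reproduce the FOM.

```python
def toy_fom(a, b):
    """Hypothetical forward model: the FOM depends only on the product a*b."""
    return a * b


# Two different structures that yield the same FOM value
# (a duplicate input for the inverse problem FOM -> structure).
structures = [(1.0, 4.0), (4.0, 1.0)]
foms = [toy_fom(a, b) for a, b in structures]

# For a repeated input, the MSE-minimizing prediction is the mean target.
mean_structure = tuple(
    sum(s[i] for s in structures) / len(structures) for i in range(2)
)

# The averaged structure does not reproduce the original FOM.
fom_of_mean = toy_fom(*mean_structure)
```

Here both structures give a FOM of 4.0, while the averaged structure (2.5, 2.5) gives 6.25, which is why measuring the loss in FOM space rather than in parameter space is attractive.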

FIGURE 2. (a) A conceptual diagram of the decoder designed to estimate Spara with a pre-trained encoder and limiter. Schematic diagram of (b) encoder and (c) decoder MLP NNs.
We propose an approach that allows NN training when inputs and outputs are in one-to-many correspondence using the MLP (Fig. 2a). A vanilla MLP can successfully handle non-linear relationships by adding a hidden layer with weight and bias terms between the input and output layers. It learns the correlation between input and output, assuming an independent relationship between input features and a unique output for each input. Thus, it can overcome one of the major weaknesses of the single-layer perceptron (SLP) [12], which only works with linear relationships. Furthermore, by altering the number of perceptrons for the appropriate input and output, a vanilla MLP can quickly solve various problems and be flexibly applied to new fields [13][14][15]. However, our proposed NN algorithm deals with the case where correlations exist between input features and the output for a given input is not unique. In this case, training a vanilla MLP directly fails because of the characteristics of our data. Therefore, we propose an algorithm consisting of two vanilla MLPs: a pre-trained encoder with a larger input dimension (Fig. 2b) and a decoder with a larger output dimension (Fig. 2c). The pre-trained encoder supports the training of the decoder, and the decoder is the core NN that estimates Spara by receiving the FOMs of each application. In general, during NN training, the NN calculates the loss value through the set loss function, and the NN is updated in the direction that minimizes this value. Also, in supervised learning, where the output ground truth (gt) exists, the main factors of the loss function are the output value produced by the NN and the gt of the output. Therefore, the loss is calculated from these two factors according to the definition of the loss function.
However, the loss function of our decoder is newly defined through the pre-trained encoder. In other words, the pre-trained encoder is an MLP that has learned the correlation between Spara (input) and FOMs (output) in advance, and it contributes to calculating the loss function of the decoder (input: FOMs, output: Spara).
Therefore, the loss of the decoder is calculated in the following order (Eq. 6):
1) The output (Spara') estimated by the decoder from the input FOM is fed to the pre-trained encoder.
2) The pre-trained encoder estimates the FOM (FOMencoded) from Spara'.
3) The loss function of the decoder is calculated using FOMencoded and the ground-truth FOM (FOMgt) that was given as the decoder input, and then the NN is updated.
Thus, the proposed NN algorithm can find solutions even when multiple solutions exist for the same input, and it can be applied to modeling any arbitrary nonlinear function. In addition, when the decoder estimates Spara, it uses a limiter, g, which ensures that each Spara does not deviate from a pre-determined range. The limiter maps the existing range of each Spara to values between -1 and 1 and corresponds to the hyperbolic tangent (tanh) transfer function of the output layer. The g^-1 function, which performs the inverse operation of the limiter, is used to restore the original range of each Spara.
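The loss computation and the limiter described above can be sketched as follows. This is a minimal illustration under stated assumptions: the limiter is assumed to be a linear rescaling of each range onto [-1, 1], the encoder and decoder are toy stand-in callables rather than trained MLPs, and all names and the second parameter range are hypothetical.

```python
import math


def limiter(x, lo, hi):
    """g: map a value in [lo, hi] onto [-1, 1] (assumed to be linear)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def limiter_inv(y, lo, hi):
    """g^-1: map a value in [-1, 1] back onto [lo, hi]."""
    return lo + (y + 1.0) * (hi - lo) / 2.0


def decoder_loss(decoder, encoder, fom_gt, ranges):
    """MSE between ground-truth FOMs and FOMs re-encoded from the decoder output."""
    scaled = decoder(fom_gt)  # tanh outputs, each within (-1, 1)
    s_para = [limiter_inv(y, lo, hi) for y, (lo, hi) in zip(scaled, ranges)]
    fom_enc = encoder(s_para)  # the pre-trained encoder is kept fixed
    return sum((e - t) ** 2 for e, t in zip(fom_enc, fom_gt)) / len(fom_gt)


# Toy stand-ins: one FOM, two parameters. The 20-26 nm range follows the
# Lg range quoted in the text; the second range is made up.
ranges = [(20.0, 26.0), (4.0, 8.0)]
toy_encoder = lambda p: [p[0] / p[1]]
toy_decoder = lambda f: [math.tanh(0.1 * f[0]), math.tanh(-0.1 * f[0])]
loss = decoder_loss(toy_decoder, toy_encoder, [4.0], ranges)
```

Because the loss compares FOMs rather than parameter sets, any Spara' that reproduces FOMgt gives a small loss, so duplicate inputs no longer impose conflicting targets.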
A flow chart for training the decoder including the pre-trained encoder is shown in Fig. 3. We found the optimal dataset size needed for training empirically, which is obtained by acquiring additional data if the encoder is not properly trained. Then, re-training is performed by tuning the network hyperparameters of both the decoder and the pre-trained encoder if training fails during the modeling phase. Finally, the modeled NN is evaluated through the R coefficient of the regression plot and the percent error calculated from the estimated value. We used the following network hyper-parameters. First, common to both the decoder and the pre-trained encoder, the dataset is partitioned into training, validation, and test sets at ratios of 0.80, 0.10, and 0.10, respectively. The transfer function of the hidden layer is implemented using the tanh function. The mean squared error (MSE) between the output and target values is used as a function to reduce the training loss. In addition, the log-scale was applied to Spara related to doping and FOMs to prevent NN training failure due to largescale differences between input and output values. Second, the pre-trained encoder comprises 9-MLP NNs for each FOM, and the identical network hyper-parameters are applied across them. For training, we adopted the Levenberg-Marquardt (LM) optimizer [16][17][18] to solve the non-linear least-squares problem. There were fifty hidden layers, and the transfer function of the output layer was linear. Third, the decoder used a resilient back-propagation (Rprop) train optimizer [19], one of the fastest weight update mechanisms available. In most cases, the hidden layer uses a sigmoid or tanh transfer function, whose slope approaches zero when the input value becomes very large or small. While utilizing the steepest descent method to train a multi-layer NN, the magnitude of the gradient may be quite small when updating the gradient. 
Thus, the amount of change is minimal even when the weight and bias are far from optimal, and during network training, the gradient may be updated in an undesirable direction. The Rprop optimizer eliminates this negative influence of the magnitude of the partial derivative. Therefore, the magnitude of the derivative does not affect the weight update; the direction of the update that minimizes the loss function is determined solely by the sign of the derivative. The weight change is halved when the sign of the gradient changes from one iteration to the next. When the sign does not change, however, it increases by 1.2 times. If the slope is zero, the same update value is maintained. Consequently, the weight change diminishes whenever the weight oscillates and increases as the weight shifts in the same direction for multiple iterations. The decoder has 100 hidden layers, and the transfer function of the output layer, unlike that of the pre-trained encoder, uses the tanh function to reflect the result of the limiter. We also set the minimum performance gradient to 10^-5 to avoid over-fitting in training. The number of validation checks, i.e., the number of consecutive iterations in which the validation performance degrades, was set to five. Thus, during training, we specified these two stop conditions to pursue network generalization. In the next section, we show the training results of the pre-trained encoder and the decoder, as well as the Spara found using the modeled decoder.
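The sign-based update rule described above can be sketched for a single weight. This is a simplified illustration on a one-dimensional quadratic, not the training code used in this work; the function name, the initial step, and the step bounds are assumptions, while the factors 0.5 and 1.2 follow the description in the text.

```python
def rprop_minimize(grad_fn, w0, step0=0.1, inc=1.2, dec=0.5,
                   step_min=1e-6, step_max=50.0, iters=200):
    """Minimize a 1-D function using only the sign of its gradient."""
    w, step, prev_g = w0, step0, 0.0
    for _ in range(iters):
        g = grad_fn(w)
        if g * prev_g > 0:        # same sign: grow the step by 1.2x
            step = min(step * inc, step_max)
        elif g * prev_g < 0:      # sign change: halve the step
            step = max(step * dec, step_min)
        # zero product: keep the current step unchanged
        if g > 0:
            w -= step
        elif g < 0:
            w += step
        prev_g = g
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_opt = rprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Only the sign of the gradient enters the update, so a saturated tanh unit with a very small gradient magnitude still moves with the current step size.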

A. Neural Network Model Performance Evaluation
When the decoder is trained, the pre-trained encoder is used to update the network by minimizing the loss function. Because the performance of the pre-trained encoder influences the performance of the decoder, the network performance and reliability of the pre-trained encoder must be guaranteed for successful decoder modeling. The training results of the pre-trained encoder are shown in Tables 2 and 3. Table 2 shows the regression coefficient, R, of the regression plot for the training, validation, and test datasets, together with the MSE training loss, for the MLP NNs modeled for all FOMs with Spara as input. R represents the relationship between the output and target values; in the ideal case, R is 1 and the output and target values are the same. For all FOMs, the R we obtained is close to 1. Furthermore, the final MSE after training is sufficiently small and within an acceptable range. For the training and test datasets, Table 3 shows the average and minimum percent error values of the FOMs estimated by the pre-trained encoder. In both datasets, the average error is typically small, i.e., less than 1 %. Thus, the two tables confirm that the pre-trained encoder has been reliably trained. A larger on/off current ratio, as described in the text, indicates better performance. However, for ease of training, we used the off/on current ratio obtained by taking the reciprocal of Iratio. Thus, the current ratio here refers to the off/on current ratio. Next, the network performance of the decoder is demonstrated through the training procedure and results obtained with the pre-trained encoder (Fig. 4). The loss reduction observed during training is shown in Fig. 4(a). According to the training optimizer, the training (blue line) and validation (green line) gradually [...] to training. In addition, as the epoch increases, the network performance gradient declines progressively, and the gradient saturates when the loss reaches saturation (Fig. 4(b)). The best validation performance was obtained at the 3904th epoch. The validation checks then reached the training stop condition, and training was halted at the 3909th epoch (Fig. 4c).
[...] [2], in which the best solution was found along the full hypersurface of the trained NN without the constraint of the range of permitted solutions. We do not intend to find a case outside the technology node of the target device. Thus, within a given technology node, we can successfully estimate Spara that satisfies the FOMs of the required application. Fig. 5 shows the percent error of the FOMs extracted using the estimated Spara to verify that the decoder results are reliable. For all FOMs, the average error is 0.1 %, and the highest error does not exceed 1 %. The trained NN deals with 17 Sparas for each of the 9 cases (3 FOMs for each of the 3 applications). Because space does not permit showing the Spara analysis for all cases, we selected a specific FOM, delay, for a specific application, LOP, as an example. The selected Sparas are parameters that directly affect the delay, and device analysis was performed through them. Note that our NN results are not optimizations of semiconductor devices. Instead, our goal is to find the device Spara of a larger dimension given an input FOM of a smaller dimension for each application. Thus, the NN does not estimate a single Spara set for one FOM value but can quickly find several Spara sets that satisfy the desired FOM. For the LOP application, the boxplots of the selected Spara corresponding to the delay that meets the upper 5 %, upper 10 %, and lower 5 % requirements are shown in Fig.
6.
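The grouping into upper and lower cases can be sketched as a simple ranking step. The example below assumes that "upper" refers to the smallest delay values and "lower" to the largest, and it uses made-up samples; the function name, the dictionary keys, and the 10 % fraction chosen for the toy data are illustrative assumptions.

```python
import statistics


def select_fraction(samples, fom_key, fraction, best=True):
    """Return the best (smallest FOM) or worst (largest FOM) fraction of samples."""
    ranked = sorted(samples, key=lambda s: s[fom_key])
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k] if best else ranked[-k:]


# Made-up samples: delay in ps and gate length in nm (not simulated data).
samples = [{"delay": 1.0 + 0.1 * i, "Lg": 26.0 - 0.3 * i} for i in range(20)]

upper = select_fraction(samples, "delay", 0.10, best=True)
lower = select_fraction(samples, "delay", 0.10, best=False)

# Median of a parameter within each group, as one would tabulate per Spara.
median_lg_upper = statistics.median(s["Lg"] for s in upper)
median_lg_lower = statistics.median(s["Lg"] for s in lower)
```

Comparing the per-group medians or boxplots of each Spara then shows which parameters move in opposite directions between the upper and lower cases.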

FIGURE 7. Changed values (solid lines) and rates (bars) in gate capacitance (Cg) and effective current (Ieff) when gate length (Lg) increases within the range of a given technology node.
upper 5 % and upper 10 %, each Spara implies the target criteria for the design it should have, considering the correlation with other Spara. Tendencies of the individual Spara values can be observed as the boundary becomes larger. Through parameter splitting, this work can reduce the load of analyzing the influence of the related Spara. Furthermore, if the vertical range for each Spara is narrow, it must be carefully controlled during device design. This implies that the design margin is rather large in the opposite case. Second, when comparing the upper 5 % and lower 5 %, we can confirm that Spara has opposite tendencies or overlaps with the upper and lower cases. Therefore, when actual measurements result in unwanted FOM values such as lower 5 %, these FOM values can be entered into the trained decoder. Then, the NN can determine which Spara is abnormal by estimating the Spara set corresponding to the FOM value. This allows for immediate feedback on the unit process associated with the abnormal Spara. It can also help with device design considerations that should be avoided. Table. 5 shows the median values of some Sparas for semiconductor analysis on the delay of LOP application. Delay is affected by gate capacitance (Cg) and effective current (Ieff), and a small value is required to enable high-speed operation (Eq. 2). Therefore, five Sparas to be analyzed in terms of semiconductor technology were selected as examples. First, we selected the Lg, Tsoi, and Lspd, mainly affecting Cg and Ieff. Second, with Lsps and Lsdj, we show the impressive results found by NN, usually not picked up by human experts. In devices small enough to show the short channel effect (SCE),  an increase in gate length (Lg) decreases the area where the source/drain-substrate depletion region penetrates the channel. Thus, gate controllability increases, SCE minimizes, and Ieff increases. Also, the Cg increases due to the wider gate area at fixed oxide thickness. In Fig. 
7, in devices where SCE occurs, an increase in Lg affects the increase in Ieff more than an increase in Cg. In general, it is attempted to decrease Lg to reduce device size, but it shows that Lg needs to be increased to decrease the delay within the range set from 20 to 26 nm. Therefore, the NN estimates a large Lg value to satisfy a smaller delay value (upper 5 %) and a small Lg value for the lower 5 % case. Since the purpose of the NN is not an optimization process to find an optimal value but a process to find a Spara that satisfies the input FOM value, the maximum/minimum Lg values are not estimated for the minimum/maximum delay values.
Reducing FDSOI FET device size requires thinner SOI thicknesses (Tsoi) to maintain the strong electrostatic property. Thinner Tsoi can reduce SCE by eliminating the leakage path, thus, increasing Ieff. Also, it causes an increase in Cg. At this time, the change in Ieff is more sensitive than the change in Cg, which is more pronounced when Lg is small (Fig. 8). Therefore, a thin Tsoi is required to obtain a good delay characteristic (small value), which agrees with the NN results that estimate a low Tsoi value in the upper case and a high Tsoi value in the lower case. Fig. 9 shows the delay characteristics according to Lg and Tsoi. When Tsoi is the minimum value, it has a fairly small delay value regardless of Lg. However, when Lg is relatively large, the change in delay according to Tsoi is less sensitive due to the improved SCE. Therefore, the NN estimates large Lg and small Tsoi to find Spara satisfying good delay characteristics. Drain-side spacer length (Lspd) was estimated to have a large value for the upper 5 % case and a small value for the lower 5 % case. Fig. 10 shows the delay and Ieff as a function of Lspd. When Lspd increases, doping at the drain moves away from the gate edge, so the parasitic fringing capacitance decreases, and thus Cg decreases. In addition, as Lspd increases, the series resistance increases due to the extension of the gate underlap, resulting in a linear decrease in Ion. However, as effective channel length increases, SCE such as DIBL and SS is minimized, and Ioff decreases exponentially, resulting in an Ieff increase. Therefore, delay reduction can be achieved due to a decrease in Cg and an increase in Ieff. Thus, it can be seen that the trained NN adopts a large Lspd to satisfy a small delay value and a small Lspd value to satisfy a large delay value.
Note that our NN estimation result is not an optimization process to find the best device, and the semiconductor physics is not reflected in the NN training process. Therefore, Lsps shows that the NN estimation results are different from the human expert selection, and Lsdj shows that the numerically found results by the NN do not necessarily have maximum or minimum values. For a device symmetry, Lsps and Lspd, spacer lengths in the source and drain regions, are designed to have the same value. In addition, Lsps and Lspd have the same effect on the device. Especially, Lsps/d determines the overlap length in the source and drain regions and greatly affects SCE. Therefore, the NN will estimate large Lsps and Lspd for good delay characteristics and low for poor. However, since the NN finds a solution that satisfies the proposed NN without learning the physical mechanism of the semiconductor, different values of Lsps and Lspd were estimated, as shown in Table. 5. In particular, the difference between Lsps and Lspd was more than 2 nm in the lower case, and Lsps was estimated to be relatively high. When Lsps and Lspd have the same low value, the extracted delay is out of the range of the ground truth due to severe Ieff degradation. Therefore, the NN cannot estimate the delay within the set range, and Lsps is estimated to be relatively high to satisfy the lower 5 % case. Since a high bias is applied to the drain, the delay has a larger change rate in Lspd than Lsps when the spacer length increases (Fig. 11). Thus, NN satisfies the ground truth by slightly increasing the Lsps, having relatively little change rate. As a result, in the lower 5 % case, the difference between Lsps and Lspd occurs, which Lsps has a slightly higher value. This is a different result from the human expert designing symmetrically with the same values of Lsps and Lspd. As such, our NN can present a variety of design perspectives by providing options for Spara that are not normally selected.
Lsdj is the junction gradient, defined as the distance over which the S/D doping falls to 1/10 of its peak. As Lsdj increases, the effective doping concentration in the channel increases and the S/D resistance decreases. Therefore, a suitably large Lsdj achieves a small delay value, and Lsdj was accordingly estimated to be higher in the upper 5 % case than in the lower 5 % case. However, values above 9 nm are not estimated (Table 5). This is because, in TCAD simulations that vary Lsdj with the other Sparas fixed at their median values, the extracted delay exceeds the delay of the ground truth when Lsdj is 9 nm or more. The delay obtained with Lsdj in the non-estimated interval does not satisfy the ground truth, so the NN did not estimate Lsdj in this interval. In other words, an estimation result obtained through the numerical correlation between input and output is not always the maximum or minimum of the set range. In addition, although the NN does not learn the physical mechanism, it can be confirmed that the semiconductor mechanism is reflected through the ground truth, which contains no values in regions made unreachable by the degradation caused by a specific Spara. In the actual process, Lsdj is controlled by the annealing temperature and time, but Lsdj in an FDSOI FET is small, and its value does not change by as much as the set range. That is, the amount of change in Lsdj is significantly smaller than that of the other Sparas. We gave an example of semiconductor analysis using 5 Sparas, but a similarly detailed analysis can be applied to the other Sparas as well.
Our method can estimate Spara sets for different scenarios that share the same FOM value (duplicate input values). The estimated Spara therefore offer numerous design alternatives that satisfy the desired conditions. The design and production of semiconductor devices is a conservative process. Thus, if unfavorable results are produced, semiconductor engineers frequently try to solve the problem by altering the cause. In contrast, the proposed method rapidly provides a diverse range of Spara that satisfy the desired FOM, which makes a design perspective over a wider range possible.
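One way to see why duplicate FOM inputs can correspond to several valid Spara sets is to evaluate the decoder loss in FOM space through a frozen, pre-trained encoder. The following is a minimal sketch under strong assumptions, not the implementation used in this work: fixed linear maps stand in for the two MLPs, the sizes (3 Spara, 2 FOMs), the weights in `W_ENC`, and the synthetic FOM data are hypothetical, and plain gradient descent replaces the actual optimizer.

```python
import numpy as np

# Toy sizes for illustration only; the paper estimates 17 Spara from a smaller FOM set.
N_SPARA, N_FOM = 3, 2

# Frozen "encoder" (Spara -> FOM). A fixed linear map stands in for the
# pre-trained encoder MLP; these weights are hypothetical, not from the paper.
W_ENC = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, -0.5]])

def encoder(spara):
    return spara @ W_ENC

def decoder(fom, w_dec):
    # Linear stand-in for the decoder MLP (FOM -> Spara).
    return fom @ w_dec

def fom_loss_and_grad(fom, w_dec):
    """Mean squared FOM error of encoder(decoder(fom)) and its gradient w.r.t. w_dec."""
    residual = encoder(decoder(fom, w_dec)) - fom
    loss = np.mean(np.sum(residual ** 2, axis=1))
    grad = (2.0 / len(fom)) * fom.T @ residual @ W_ENC.T
    return loss, grad

def train_decoder(fom, steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w_dec = rng.normal(scale=0.1, size=(N_FOM, N_SPARA))
    losses = []
    for _ in range(steps):
        loss, grad = fom_loss_and_grad(fom, w_dec)
        losses.append(loss)
        w_dec -= lr * grad  # only the decoder is updated; the encoder stays frozen
    return w_dec, losses

# Standardized synthetic FOMs (placeholder data, not TCAD results).
fom_train = np.random.default_rng(1).normal(size=(256, N_FOM))
w_dec, losses = train_decoder(fom_train)

# Two different Spara sets can map to the same FOMs: a shift along a direction
# that this toy encoder maps to zero leaves the FOMs unchanged.
spara_a = decoder(fom_train[:1], w_dec)
spara_b = spara_a + 0.3 * np.array([-1.0, 1.0, 2.0])
print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
print("same FOMs:", np.allclose(encoder(spara_a), encoder(spara_b)))
```

In this toy setting a loss defined directly on Spara would penalize `spara_b` even though it reproduces the requested FOMs, whereas the FOM-space loss accepts both sets as valid alternatives.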

V. CONCLUSION
We proposed an NN algorithm using two MLP NNs to estimate the Spara affecting the device design and unit process of 14-nm node FDSOI FETs. The NN input is a set of FOMs with a smaller dimension than the Spara output. In addition, the input features are correlated and contain duplicate values; the proposed algorithm overcomes the resulting nonconvergence problem and enables NN training. For all FOMs, the pre-trained encoder, which is used to calculate the loss function for convergence, is trained with an R value close to 1. Furthermore, in both the training and test datasets, the percent errors from the actual values average 1 % or less. The encoder was used to train the decoder, and the training loss of the decoder fell in line with the validation loss without over- or under-fitting. Thus, 17 Spara values were successfully estimated within the range specified by the 14-nm technology node. When the estimated Spara are input into the pre-trained encoder, the percent errors of the decoder average 0.1 %. The parameter trend can be confirmed through the Spara that satisfy the FOM values belonging to the upper 5 % and 10 %. In addition, the Spara estimated from duplicate inputs provide different sets of optional Sparas. If an abnormal FOM value is derived, as in the case of the lower 5 %, the corresponding Spara can be derived, and feedback on the unit process corresponding to the abnormal Spara is available. Therefore, the cause of failure on the device side can be immediately identified using the proposed NN algorithm in the semiconductor test process. In addition, we performed a detailed physical analysis using the delay in the LOP application as an example. The NN estimation results were analyzed using Lg, Tsoi, and Lspd, which mainly affect Cg and Ieff. Lsps showed behavior different from that generally selected by human experts, and Lsdj showed a case where the maximal value was not estimated within the set range.
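The comparison of Spara between the upper and lower 5 % of a FOM, as summarized above, can be sketched as follows. This is an illustration only: the helper `upper_lower_means`, the two normalized columns (Tsoi, Lspd), and the linear delay model are hypothetical placeholders chosen to mimic the reported trend directions, and the "upper" case is assumed to mean the smallest delays.

```python
import numpy as np

def upper_lower_means(delay, spara, frac=0.05):
    """Mean Spara of the smallest-delay and largest-delay fractions of the samples."""
    best_cut = np.quantile(delay, frac)
    worst_cut = np.quantile(delay, 1.0 - frac)
    upper = spara[delay <= best_cut]    # "upper" case: good (small) delay
    lower = spara[delay >= worst_cut]   # "lower" case: poor (large) delay
    return upper.mean(axis=0), lower.mean(axis=0)

# Synthetic placeholder data, normalized to [0, 1]; columns are (Tsoi, Lspd).
rng = np.random.default_rng(0)
spara = rng.uniform(size=(2000, 2))
# Made-up linear delay model that only mimics the reported trend directions:
# delay grows with Tsoi and shrinks with Lspd. It is not a TCAD result.
delay = 1.0 + 0.5 * spara[:, 0] - 0.3 * spara[:, 1]

upper_mean, lower_mean = upper_lower_means(delay, spara)
print("upper 5 % mean (Tsoi, Lspd):", upper_mean)
print("lower 5 % mean (Tsoi, Lspd):", lower_mean)
```

With real data, `delay` would hold the FOM values and `spara` the corresponding decoder estimates, and the difference between the two mean vectors would indicate which Spara trend with the FOM.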
Our methodology can improve the inspection speed and yield during the test process by aggregating the estimated Spara values and spotting trends. Because it is a non-destructive inspection, it is more stable and economical than the existing destructive methods (TEM or SEM imaging) for extracting the Spara of a manufactured wafer or chip. Furthermore, because the proposed method learns only the relationship between input and output, the artificial neural network does not learn the physical phenomena of the device or the side effects that may occur when scaling. Therefore, it can be applied regardless of the type of device or technology node. Moreover, it is applicable to arbitrary non-linear function modeling. Finally, the proposed NN algorithm can be applied to various tasks in the semiconductor manufacturing process, including the estimation and analysis of any system with more outputs than inputs.