Machine Learning (ML)-Based Model to Characterize the Line Edge Roughness (LER)-Induced Random Variation in FinFET

A machine learning (ML)-based artificial neural network (ANN) model is proposed to estimate the line edge roughness (LER)-induced performance variation in fin-shaped field effect transistors (FinFETs). For given LER features, namely the rms amplitude ($\Delta$), the correlation length along the x-direction ($\Lambda_{\mathrm{X}}$), and the correlation length along the y-direction ($\Lambda_{\mathrm{Y}}$), the ANN model estimates device performance metrics such as the on-state drive current, off-state leakage current, threshold voltage, and subthreshold swing in a computationally efficient manner.


I. INTRODUCTION
Over the last few decades, complementary metal oxide semiconductor (CMOS) technology has evolved successfully through the adoption of new techniques such as stress engineering at the 90 nm technology node and beyond [1], high-k/metal-gate at the 45 nm technology node and beyond [2], and 3-D advanced device structures at the 22 nm technology node and beyond [3]. In every new CMOS technology platform, the physical dimensions of the metal oxide semiconductor field effect transistor (MOSFET) have been scaled down not only to increase the device density in integrated circuits (ICs) but also to improve IC functionality per cost. However, process-induced random variations (i.e., random fluctuations in transistors' electrical characteristics, such as threshold voltage, on-state drive current, and off-state leakage current, introduced while the transistors are fabricated in the FAB) have degraded the manufacturability of CMOS devices and could thereby significantly hinder the evolution of CMOS technology [4]. The root causes of process-induced random variation are classified as (i) line edge roughness (LER), (ii) random dopant fluctuation (RDF), and (iii) work function variation (WFV) [5]. LER not only degrades device performance directly but also indirectly affects the other random variation sources (i.e., RDF and WFV), because it induces structural variations in the device [6].

The associate editor coordinating the review of this manuscript and approving it for publication was Sneh Saurabh.
With the most radical shift in device structure in 2011, i.e., from the planar bulk MOSFET to the 3-D MOSFET (i.e., FinFET), process-induced technical issues have become much more severe [7]. As device architectures become even more complicated (e.g., the multi-bridge channel field effect transistor (MBCFET), stacked nanowire FET, and stacked nano-slab FET for the 3 nm CMOS technology node [8] and beyond), understanding the impact of LER on device performance is urgently needed to develop variation-robust silicon devices at the 3 nm technology node and beyond [9]. A few studies have sought to understand, quantify, and analyze the impact of LER on device characteristics [10]-[12]. A TCAD (technology computer-aided design)-based method has been adopted to build a model that predicts the impact of LER finely and accurately [13]. However, the TCAD-based approach is fundamentally time-consuming and computationally inefficient when predicting thousands of LER-induced transfer characteristics of MOSFETs in an integrated circuit. Thus, a few studies [14], [15] have tried to compactly model the impact of LER on device performance. Nevertheless, because of the many technical barriers to developing a new compact model, a compact model for the impact of LER [14], [15] may not be developed in a timely manner, even though the LER on the fin sidewall of a FinFET needs to be modeled two-dimensionally to characterize the sidewall surface [7], [13]. Therefore, this work proposes a simple but novel machine learning (ML)-based approach with reasonable accuracy as an alternative solution for predicting process-induced variation.

II. DEVICE DESIGN AND DATA GENERATION
A. LINE EDGE ROUGHNESS PARAMETERS
Generally, two or three parameters (e.g., $\Delta$, $\Lambda$, and $\alpha$) are used to describe the LER profile in planar MOSFETs, and three or four parameters (e.g., $\Delta$, $\Lambda_{\mathrm{X}}$, $\Lambda_{\mathrm{Y}}$, and $\alpha$) are used in 3-D MOSFETs. The impact of each parameter on the LER profile is compared in Fig. 2. The parameters used in Fig. 2 are defined as follows [16]:
(i) Amplitude ($\Delta$): the root-mean-square (rms) value of the roughness amplitude. The smaller $\Delta$ is, the smoother the surface.
(ii) Correlation length ($\Lambda$): how strongly a point on the edge is correlated with its neighboring points. The larger $\Lambda$ is, the smoother the surface.
(iii) Roughness exponent ($\alpha$): a measure of the high-frequency component of the roughness. The larger $\alpha$ is, the smoother the surface.
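The definitions of $\Delta$ and $\Lambda$ above can be made concrete with a short estimator. The sketch below is a minimal illustration, not the procedure of [16]: it takes $\Delta$ as the rms deviation of a sampled 1-D edge from its mean position and, following a common convention that is assumed here, takes $\Lambda$ as the first lag at which the normalized autocorrelation drops below 1/e. The function name `edge_roughness_stats` is hypothetical.

```python
import math


def edge_roughness_stats(edge, dx):
    """Return (delta, lam) for an edge sampled every dx nm.

    delta: rms deviation of the edge from its mean position.
    lam:   first lag at which the normalized autocorrelation drops below 1/e
           (a coarse, lag-quantized estimate of the correlation length).
    """
    n = len(edge)
    mean_edge = sum(edge) / n
    h = [e - mean_edge for e in edge]       # roughness about the mean edge
    var = sum(v * v for v in h) / n
    if var == 0.0:
        return 0.0, math.inf                # perfectly smooth edge
    delta = math.sqrt(var)                  # rms amplitude (Delta)
    for k in range(1, n):
        # Biased autocovariance at lag k, normalized by the variance.
        r_k = sum(h[i] * h[i + k] for i in range(n - k)) / (n * var)
        if r_k < 1.0 / math.e:
            return delta, k * dx            # correlation length (Lambda)
    return delta, math.inf                  # record too short to decorrelate


print(edge_roughness_stats([1, 1, 1, 1, -1, -1, -1, -1], dx=1.0))  # (1.0, 2.0)
```

A slowly varying edge keeps its autocorrelation above 1/e for more lags and therefore yields a larger $\Lambda$, which matches the statement that a larger $\Lambda$ means a smoother surface.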

B. DEVICE DESIGN WITH LINE EDGE ROUGHNESS
Fig. 1 shows a 3-D bird's-eye view of a FinFET with 3-D LER on its fin sidewall. The design parameters of the nominal FinFET device are summarized in Table 1. To reproduce the surface roughness on the fin sidewall of the FinFET, the quasi-atomistic model [13] was used. A rough surface is generated in the following steps:
Step I: Define the key parameters, such as $\Delta$, $\Lambda_{\mathrm{X}}$, $\Lambda_{\mathrm{Y}}$, $\alpha$, and the x-y coupling parameter.
Step II: Obtain the 2-D power spectrum by taking the fast Fourier transform (FFT) of the autocovariance function (ACVF) in (1).
Step III: Obtain the amplitude spectrum by taking the square root of the result of Step II.
Step IV: Obtain the 2-D impulse response by taking the inverse fast Fourier transform (IFFT) of the result of Step III.
Step V: Generate white Gaussian noise (wgn) and take the 2-D convolution of the result of Step IV with the wgn.
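Step VI: Once the steps above are completed, import the generated surface coordinates into TCAD with Sentaurus Structure Editor.

The autocovariance function of the rough surface, ACVF(x, y), is given by (1). In (1), $\Lambda_{\mathrm{X}}$ and $\Lambda_{\mathrm{Y}}$ are the correlation lengths along the x- and y-directions of the surface, respectively, and the x-y coupling parameter determines the relation between the x- and y-directions.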
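Steps I to V can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the quasi-atomistic implementation of [13]. Because the exact form of (1) is not reproduced here, the sketch assumes a separable ACVF, $\mathrm{ACVF}(x, y) = \Delta^{2}\exp[-(|x|/\Lambda_{\mathrm{X}})^{2\alpha} - (|y|/\Lambda_{\mathrm{Y}})^{2\alpha}]$, which corresponds to zero x-y coupling (the case simulated in this work). A naive row-column DFT stands in for the FFT/IFFT so the example needs no external library, and the function names (`dft2`, `impulse_response`, `rough_surface`) are hypothetical. Step VI (export to Sentaurus Structure Editor) is not shown.

```python
import cmath
import math
import random


def dft2(grid, inverse=False):
    """Naive row-column 2-D DFT (a stand-in for FFT/IFFT on small grids)."""
    sign = 1.0 if inverse else -1.0
    nx, ny = len(grid), len(grid[0])

    def dft1(seq):
        n = len(seq)
        return [sum(seq[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
                for m in range(n)]

    rows = [dft1(row) for row in grid]                                 # along y
    cols = [dft1([rows[i][j] for i in range(nx)]) for j in range(ny)]  # along x
    scale = 1.0 / (nx * ny) if inverse else 1.0
    return [[cols[j][i] * scale for j in range(ny)] for i in range(nx)]


def impulse_response(nx, ny, dx, dy, delta, lam_x, lam_y, alpha=1.0):
    """Steps I-IV: ACVF -> power spectrum -> amplitude spectrum -> impulse response."""
    def lag(i, n, d):
        return d * min(i, n - i)    # periodic lag, so the ACVF peak sits at index 0

    # ASSUMED separable ACVF with zero x-y coupling (stand-in for Eq. (1)).
    acvf = [[delta ** 2 * math.exp(-(lag(i, nx, dx) / lam_x) ** (2 * alpha)
                                   - (lag(j, ny, dy) / lam_y) ** (2 * alpha))
             for j in range(ny)] for i in range(nx)]
    psd = dft2(acvf)                                                   # Step II
    amp = [[math.sqrt(max(v.real, 0.0)) for v in row] for row in psd]  # Step III
    return [[v.real for v in row] for row in dft2(amp, inverse=True)]  # Step IV


def rough_surface(nx, ny, dx, dy, delta, lam_x, lam_y, alpha=1.0, seed=0):
    """Step V: circularly convolve the impulse response with white Gaussian noise."""
    h = impulse_response(nx, ny, dx, dy, delta, lam_x, lam_y, alpha)
    rng = random.Random(seed)
    wgn = [[rng.gauss(0.0, 1.0) for _ in range(ny)] for _ in range(nx)]
    return [[sum(h[p][q] * wgn[(i - p) % nx][(j - q) % ny]
                 for p in range(nx) for q in range(ny))
             for j in range(ny)] for i in range(nx)]


# Surface heights (nm) on a 24 x 24 grid with 1 nm spacing (illustrative values only).
surface = rough_surface(24, 24, dx=1.0, dy=1.0, delta=0.5, lam_x=3.0, lam_y=4.0)
```

Because the squared impulse response sums to $\Delta^{2}$ and the noise has unit variance, each generated height has an expected rms of about $\Delta$. A practical implementation would use an FFT library and much larger grids matched to the fin sidewall dimensions.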

C. DATA GENERATION
To build and verify the artificial neural network (ANN) model, 100 different data sets were generated (note that each data set consists of 50 different FinFETs with identical LER parameters).
[Figure: $\Lambda_{\mathrm{X}} = 20$ nm, $\Lambda_{\mathrm{Y}} = 50$ nm, $\alpha = 1$, x-y coupling parameter $= 0$]
The values of the three LER parameters ($\Delta$, $\Lambda_{\mathrm{X}}$, $\Lambda_{\mathrm{Y}}$) were randomly chosen from the following ranges: $\Delta$ from 0.2 nm to 0.8 nm, $\Lambda_{\mathrm{X}}$ from 10 nm to 100 nm, and $\Lambda_{\mathrm{Y}}$ from 20 nm to 200 nm. Within its range, each LER parameter follows a uniform distribution. Note that $\alpha$ is set to 1 and the x-y coupling parameter is set to 0. Accounting for the impact of $\alpha$ on the LER profile would require a very small sampling distance, which would cause a tremendous computational load in the TCAD simulations; in practice, $\alpha$ is also neglected in many other studies on LER [11], [14], [15], [20]. The x-y coupling parameter is set to 0 for simplicity, which means that the roughness along the x-direction is independent of that along the y-direction. Then, the $I_{\mathrm{d}}$-vs.-$V_{\mathrm{g}}$ characteristics of all FinFETs in the 100 data sets were simulated using TCAD, and the performance metrics ($I_{\mathrm{off}}$, $V_{\mathrm{t}}$, $I_{\mathrm{on}}$, and SS) were extracted [see Table 2].
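The sampling scheme described above can be sketched as follows. The parameter ranges, the uniform distribution, and the 100 x 50 structure come from the text; the dictionary keys (including `coupling` for the x-y coupling parameter), the per-device seeds, and the function name `sample_ler_sets` are hypothetical conveniences for driving a surface generator.

```python
import random

# Ranges from the text (nm); alpha = 1 and the x-y coupling parameter = 0 are fixed.
RANGES = {"delta": (0.2, 0.8), "lam_x": (10.0, 100.0), "lam_y": (20.0, 200.0)}


def sample_ler_sets(n_sets=100, devices_per_set=50, seed=0):
    """Draw one uniform (delta, lam_x, lam_y) triple per data set.

    All devices in a set share the same LER parameters; only their random
    rough surfaces (here, hypothetical noise seeds) differ.
    """
    rng = random.Random(seed)
    sets = []
    for s in range(n_sets):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
        params.update(alpha=1.0, coupling=0.0)
        params["device_seeds"] = [s * devices_per_set + d for d in range(devices_per_set)]
        sets.append(params)
    return sets


sets = sample_ler_sets()
print(sum(len(s["device_seeds"]) for s in sets))  # 5000
```

Each of the 5,000 entries would correspond to one TCAD $I_{\mathrm{d}}$-$V_{\mathrm{g}}$ simulation, which is why the cost of the TCAD step dominates data generation.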
These data sets were divided into three groups: training, validation, and test data sets. The training data sets are used to update the ANN model parameters, i.e., the weight matrices and bias vectors. The validation data sets are used to monitor whether the ANN model is well trained or overfitted during training. After training is finished, the test data sets are used to verify whether the ANN model is well trained [see Fig. 3].
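A split of this kind can be sketched as below. The text does not state the split ratios, so the 70/15/15 fractions are hypothetical, and the sketch assumes that whole data sets (not individual devices) are assigned to a group, so that all 50 FinFETs sharing one LER parameter triple stay together. The function name `split_data_sets` is likewise hypothetical.

```python
import random


def split_data_sets(n_sets=100, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle data-set indices and split them into training/validation/test groups."""
    idx = list(range(n_sets))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_sets)
    n_val = round(fractions[1] * n_sets)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_data_sets()
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

Splitting by data set keeps the test groups unseen LER parameter combinations, which is what the test step is meant to check.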

III. ARTIFICIAL NEURAL NETWORK MODELING
A. FULLY CONNECTED LAYERS
The ANN model has one input layer, one output layer, and three hidden layers with three activation functions ($\phi$) [see Fig. 4]. The hyperbolic tangent (tanh) is used as the activation function; it is mathematically defined as

$$\phi(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (2)$$

The weight matrices ($W_1, W_2, \ldots, W_4$) and bias vectors ($b_1, b_2, \ldots, b_4$) of the ANN model determine its outputs. During training, these matrices and vectors are updated to fit the training data sets for a specified number of iterations.
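The fully connected structure can be sketched as a plain-Python forward pass: three tanh hidden layers ($W_1$ to $W_3$, $b_1$ to $b_3$) followed by a linear output layer ($W_4$, $b_4$). The hidden width of 16, the min-max input scaling, the initialization, and the output size of 14 are assumptions, not values from this paper; 14 is what a 4-variate Gaussian head needs if it is parameterized by 4 means plus the 10 lower-triangular entries of a 4 x 4 covariance factor, as in TensorFlow Probability's `MultivariateNormalTriL` layer. All function names are hypothetical.

```python
import math
import random

# Input ranges (nm) from Section II-C, used here for a hypothetical min-max scaling.
RANGES = [(0.2, 0.8), (10.0, 100.0), (20.0, 200.0)]   # delta, lam_x, lam_y


def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def dense(x, w, b):
    """Fully connected layer: W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def forward(ler_params, layers):
    """Three tanh hidden layers, then a linear output layer."""
    x = [(v - lo) / (hi - lo) for v, (lo, hi) in zip(ler_params, RANGES)]
    for w, b in layers[:-1]:
        x = [math.tanh(v) for v in dense(x, w, b)]   # phi = tanh, Eq. (2)
    w, b = layers[-1]
    return dense(x, w, b)


rng = random.Random(0)
sizes = [3, 16, 16, 16, 14]   # hypothetical widths; 14 = 4 means + 10 covariance entries
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
out = forward([0.5, 50.0, 100.0], layers)
print(len(out))  # 14
```

Scaling the inputs to a common range keeps the tanh units away from saturation, since $\Lambda_{\mathrm{X}}$ and $\Lambda_{\mathrm{Y}}$ are two orders of magnitude larger than $\Delta$.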

B. GRAFTING PROBABILITY DISTRIBUTION
In this study, we assume that the performance metrics follow a multivariate Gaussian distribution, so that the model for estimating the LER-induced performance variation of the device rests on a well-established statistical description. The LER-induced variations of $V_{\mathrm{t}}$, $I_{\mathrm{on}}$, SS, and $\log_{10} I_{\mathrm{off}}$ are known to approximately follow Gaussian distributions in various devices [11], [21], [22].
To train the ANN model with a probabilistic layer, we used the maximum likelihood estimation (MLE) method. Given an input $X$ and the corresponding observations $Y$, MLE estimates the parameter $\theta$ that maximizes $P(Y \mid X; \theta)$, which can be written mathematically as

$$\hat{\theta} = \arg\max_{\theta} P(Y \mid X; \theta) \quad (3)$$

In our model, $X$, $Y$, and $\theta$ are defined as follows:
$X$: $\Delta$, $\Lambda_{\mathrm{X}}$, and $\Lambda_{\mathrm{Y}}$ (LER parameters)
$Y$: $\{y_1, y_2, \ldots, y_{50}\}$, where $y_i$ is the observed $I_{\mathrm{off}}$, $V_{\mathrm{t}}$, $I_{\mathrm{on}}$, and SS
$\theta$: mean vector and covariance matrix
To train the probability-grafted ANN, we used the negative log-likelihood (negloglik) as the loss function. The negloglik measures how poorly the predicted distribution explains the observed samples; minimizing it is equivalent to maximizing the likelihood in (3).
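The loss can be written out explicitly. The sketch below evaluates the negative log-likelihood of observed metric vectors $y_i$ under a multivariate Gaussian with mean vector $\mu$ and covariance $\Sigma$, i.e., $\sum_i \tfrac{1}{2}[d \ln 2\pi + \ln\det\Sigma + (y_i - \mu)^{\mathsf T}\Sigma^{-1}(y_i - \mu)]$, using a Cholesky factor for the determinant and the quadratic form. In the authors' model, $\mu$ and $\Sigma$ come from the ANN output and the loss is minimized with TensorFlow Probability; here they are passed in directly, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_nll(samples, mu, cov):
    """Negative log-likelihood of samples under N(mu, cov), summed over samples."""
    d = len(mu)
    L = cholesky(cov)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    total = 0.0
    for y in samples:
        r = [yi - mi for yi, mi in zip(y, mu)]
        z = [0.0] * d                      # solve L z = r by forward substitution
        for i in range(d):
            z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        # z . z equals the Mahalanobis term r^T cov^{-1} r.
        total += 0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))
    return total


print(round(gaussian_nll([[0.0]], [0.0], [[1.0]]), 4))  # 0.9189
```

For a fixed set of samples, this loss is smallest at the sample mean and the (1/n-normalized) sample covariance, which is the MLE solution of (3) for a Gaussian.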
Using the Adam optimizer [23], the model was trained for 200,000 epochs (776 s) with a learning rate of $10^{-5}$. The model was trained without overfitting [see Fig. 5]. Fig. 6 shows how $I_{\mathrm{on}}$ varies with the LER parameters. Table 4 and Fig. 7 compare the TCAD data (i.e., the test data set) with the data predicted by the ANN model. The predicted data were drawn randomly from the probability density function defined by the predicted mean vector and covariance matrix. Hence, the predicted samples differ slightly from the TCAD samples and can never be identical to them. Thus, the accuracy of the predicted data was evaluated using confidence intervals calculated from the standard errors of the mean and standard deviation [24]. Here, the population mean and standard deviation predicted by the ANN model are regarded as the true population mean and standard deviation.
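Drawing the predicted data from the Gaussian defined by a mean vector and covariance matrix can be sketched with the standard Cholesky method, $y = \mu + Lz$ with $z \sim \mathcal{N}(0, I)$ and $LL^{\mathsf T} = \Sigma$. This is a generic technique shown for illustration, not the authors' code; the function names and the numbers in the usage line are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_mvn(mu, cov, n, seed=0):
    """Draw n vectors y = mu + L z, with z ~ N(0, I) and L L^T = cov."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mu)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return samples


# Hypothetical 2-metric example (e.g., V_t in V and SS in mV/dec); not values from the paper.
predicted = sample_mvn([0.30, 70.0], [[1e-4, 0.0], [0.0, 4.0]], n=50)
```

Drawing 50 samples matches the size of one TCAD data set, and because the draws are random, the comparison with TCAD has to be statistical rather than sample by sample.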

IV. RESULTS AND EVALUATION
The standard errors used for these confidence intervals are

$$\text{Standard error of mean} \approx \frac{\sigma}{\sqrt{n}} \quad (4)$$

$$\text{Standard error of standard deviation} \approx \frac{\sigma}{\sqrt{2(n-1)}} \quad (5)$$

where $\sigma$ is the standard deviation and $n$ is the number of samples in one data set.
Table 3 compares the simulation times of TCAD and the ANN model. The advantage of the ANN model becomes conspicuous when the number of data samples reaches 10,000 or more. Note that the ANN model was built using the TensorFlow 2.0 and TensorFlow Probability Python libraries [25], [26].
VOLUME 8, 2020
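Equations (4) and (5) can be applied as in the sketch below. It assumes one reading of the evaluation described above: for a single metric, a data set passes when the TCAD sample mean and sample standard deviation lie within two-sided 99% intervals ($z \approx 2.576$) centered on the ANN-predicted mean and standard deviation, which are treated as the true population values. This pass/fail rule and the function name are hypothetical, not the authors' code.

```python
import math
from statistics import NormalDist, mean, stdev


def within_confidence_interval(samples, mu_pred, sigma_pred, level=0.99):
    """Return (mean_ok, std_ok) for one metric of one data set."""
    n = len(samples)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # about 2.576 for 99 %
    se_mean = sigma_pred / math.sqrt(n)               # Eq. (4)
    se_std = sigma_pred / math.sqrt(2.0 * (n - 1))    # Eq. (5)
    mean_ok = abs(mean(samples) - mu_pred) <= z * se_mean
    std_ok = abs(stdev(samples) - sigma_pred) <= z * se_std
    return mean_ok, std_ok


print(within_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0], mu_pred=3.0, sigma_pred=1.6))
# (True, True)
```

With $n = 50$ samples per data set, the interval for the mean is $\pm 2.576\,\sigma/\sqrt{50} \approx \pm 0.36\sigma$, so the check is fairly strict.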

V. CONCLUSION
Line edge roughness (LER) is one of the key sources of undesirable variation in transistor performance. These fluctuations affect circuit operation and can thereby cause unexpected errors. Therefore, it is important to understand the factors causing random variation accurately and within a reasonable time. In a FinFET, the structural deformation caused by LER appears not as a line but as a plane. Thus, compact modeling may not be the right option for a problem of such increased complexity. To avoid these difficulties, we used an ANN model as an alternative for predicting process-induced random fluctuations. With accurate predictions (which fall within the 99% confidence interval), our method is expected to help analyze the effects of LER in the fabrication process and to evaluate the yield of integrated circuits (ICs).