Device-Simulation-Based Machine Learning Technique for the Characteristic of Line Tunnel Field-Effect Transistors

With the rapid growth of the semiconductor manufacturing industry, it has become evident that device simulation is a slow process. Moreover, as semiconductor devices are scaled down, obtaining the necessary device simulation data becomes significantly more expensive because it requires complex analysis of various factors. To develop a competent technique for analyzing the performance of line tunnel field-effect transistors (TFETs), 3-D stochastic device simulation is integrated with a machine learning (ML) algorithm, the random forest regressor (RFR). Although ML algorithms have produced a tremendous amount of research in the field of computer vision, their adoption in the semiconductor industry still has considerable room for progress. The ML-based RFR model is used to predict the effect of the variability sources of the line TFET under different biasing conditions. The results are promising and reduce the computational cost of device simulation by 99%. The error in predicting the effect of the sources of variation is less than 1% compared with the device simulation of the line TFET. The application of the RFR to the line TFET demonstrates the power and flexibility of this approach, because its evaluation under different bias conditions shows outstanding results.


I. INTRODUCTION
The electrical characteristics of tunnel field-effect transistors (TFETs) can outperform those of complementary metal-oxide-semiconductor devices [1]. However, for commercial device manufacturing, TFETs still need to improve some specific electrical characteristics, i.e., the on-state current (I ON ) and the steep or average subthreshold swing (SS avg ), while maintaining a controlled off-state current (I OFF ). Nevertheless, TFETs with improved I ON and SS avg can open up various applications across different device structures and material options [2]. The tunneling effect can be enhanced through crucial factors such as material [3], oxide (high-κ) [4], gate engineering (gate-all-around) [5], geometrical options (nanowires or nanosheets with confined width) [6], etc. [7]. In addition, recent progress reveals that the use of ferroelectric material in TFET devices improves I ON through internal voltage amplification [8]. Another way to improve the tunneling probability is to produce a stronger electric field by exploring new options, such as vertical or line tunneling mechanisms [9]. Other techniques, such as the use of 2D materials and multi-channel concepts, can also enhance the performance of TFETs [10], [11]. Owing to these options, we presented a promising line TFET structure in our recent demonstration [12]. However, only a few prior studies examine TFET performance by implementing emerging machine learning (ML) technology [13], [14]. For the first time, we analyze the line TFET characteristics by varying several device parameters, namely the source overlap length (L ov ), epitaxy (n) thickness (t n ), oxide thickness (t ox ) and work function (WK). Thus, it is vital to analyze the effect of these parameters of the line TFET by implementing emerging ML techniques.

The associate editor coordinating the review of this manuscript and approving it for publication was Yiqi Liu.
Recently, ML has become visible in many research areas such as physics [15], mathematics [16], and chemistry [17]. ML has already paved its way in computer vision [18] and image processing [19], but it still has very few established applications in the semiconductor industry. Therefore, in the field of semiconductor manufacturing, there is broad scope to explore its combination with device simulation. In [20], three different deep learning (DL) algorithms, i.e., an artificial neural network, a convolutional neural network and a long short-term memory network, were implemented and compared using device simulation data of gate-all-around silicon nanowire MOSFETs, and the electrical characteristics were predicted from the work function fluctuation of random nanosized metal grains on the MOSFET channel. Similarly, in [21], an ML algorithm was applied to predict the I D -V G curves obtained through device simulation of multichannel gate-all-around silicon nanosheet MOSFETs. Moreover, in [22], a DL model was implemented to predict five output features obtained from I-V/C-V curves. In [23], a DL model was investigated to predict the worst random discrete dopant configuration obtained through device simulation. In [24], a DL approach was utilized to estimate the work function fluctuation of gate-all-around silicon nanosheet MOSFETs with a ferroelectric HZO layer. In our prior work [25], the ML-based random forest regressor (RFR) model was implemented on simulation data of the line TFET while considering only two device parameters, i.e., L ov and W, under a fixed biasing condition, i.e., V D = 0.5 V; the error rate between the predicted and simulated values was 5%, which is considered large in the field of ML.
The RFR model has many advantages that make it an outstanding ML model. It emphasizes feature selection, which gives more importance to the valuable features and can prune noisy or unimportant ones. Another main advantage of the RFR model is that it can handle non-linear data, which linear ML models cannot do. In this work, ML algorithms aim to overcome the computational cost and non-holistic optimization of the complex structure of the line TFET and its electrical characteristics. Instead of deriving conventional complex equations, ML algorithms optimize the possible solution for the device parameters without any specific knowledge of the device physics. In order to overcome the primary issues [26], the three main contributions of this work are: (1) overcoming the complexity and ambiguity of device simulation at the sub-3-nm technology node (quantum confinement is crucial for device dimensions below an effective width of ∼5-7 nm [31]; our specifications, which fall under the sub-3-nm technology node, are L g = 15 nm, effective width > 14 nm, and so on); (2) complex modeling and an optimal solution without compromising the ML model's prediction accuracy; and (3) a holistic and optimized solution for flexible design of line TFET applications. In addition, our explored ML model is also able to predict the region of operation of the line TFET device by considering the I D -V G values. Instead of considering only a few device parameters, the RFR model performs predictive inference using four crucial device parameters of the line TFET, i.e., L ov , t n , t ox and WK.
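The feature-weighting property of random forests mentioned above can be illustrated with a minimal sketch. It assumes Scikit-learn's `RandomForestRegressor` and a purely synthetic, non-linear target; the feature names (`useful_a`, `useful_b`, `noise`) and the target expression are hypothetical and are not related to the TFET dataset of this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 2000

# Hypothetical inputs: two informative features and one pure-noise feature.
useful_a = rng.uniform(-1.0, 1.0, n_samples)
useful_b = rng.uniform(-1.0, 1.0, n_samples)
noise = rng.uniform(-1.0, 1.0, n_samples)
X = np.column_stack([useful_a, useful_b, noise])

# Synthetic non-linear target (illustrative only, not a device model).
y = np.exp(2.0 * useful_a) + 3.0 * useful_b ** 2

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = forest.feature_importances_
for name, score in zip(["useful_a", "useful_b", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

In this toy setting the noise feature receives the smallest importance, which is the behavior the paragraph refers to when it says unimportant features can be pruned.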
This paper is structured as follows. In Section II, the device simulation and data collection procedures are explained. Section III describes the modeling of the ML model based on the device simulation. Section IV presents the results and discussion, and Section V draws the conclusions and suggests future work.

II. DEVICE SIMULATION AND DATA GENERATION
To provide the best accuracy of device simulation, the calibration [31] is performed with experimental data, as shown in Fig. 1. Although the calibration is performed with respect to a Si device, SiGe shows minor variability, especially in terms of tunneling mass and energy bandgap (which are crucial for the tunneling probability) [11]. Nevertheless, the figure is updated by calibrating with a SiGe device [31]. In this paper, our proposed scaled line TFET (SLTFET) with nanosheet geometry is utilized, as shown in Fig. 2(a). 3-D device simulations [11], [12], [28], [29] are performed using the dynamic nonlocal band-to-band tunneling (BTBT) model together with the trap-assisted tunneling (TAT) model for effective estimation of I OFF . The tunneling transport, especially in TFETs, is determined by the evaluation of the BTBT and TAT models. Here, the TAT model estimates the influence of trap-assisted tunneling. More discussion on these calibrations can be found in our recent articles [11], [25]. Fig. 2 shows the design of the SLTFET, which uses an n-epitaxial layer over the channel and source to improve the vertical gate field via L ov . The working principle of the explored TFET depends on both vertical (source-epitaxy) and lateral (source-channel) tunneling mechanisms. During the off-state, i.e., drain voltage (V D ) = 0.5 V and gate voltage (V G ) = 0 V, the tunneling length (λ) is too long to allow appreciable BTBT across both junctions. As the applied potential increases (on-state; V D = V G = 0.5 V), the BTBT rate increases exponentially with the generated vertical and lateral gate fields, as shown in Fig. 2(a'). Here, the magnitude of the vertical field is stronger than that of the lateral field as long as L ov exists. It should be noted that the key parameters of a TFET depend on the energy bandgap, effective mass, tunneling length, and so on, which are related to material engineering. However, the geometrical options with respect to structure selection influence L ov , t n , oxide thickness (t ox ), work function, etc. [30]. Hence, we investigate the performance of the TFET through geometrical options rather than material considerations. The significance of each device parameter is described below.

VOLUME 10, 2022

A. SIGNIFICANCE OF OVERLAPPING LENGTH (L OV )
The significance of L ov is to modulate the vertical gate field as well as the vertical tunneling via the p ++ -n junction (refer to Fig. 1(a)). L ov increases the tunneling area (A tun ), i.e., A tun = L ov × 2(W + t n ), where t n and W are the thickness and width of the channel. Thus, the tunneling area A tun is proportional to L ov and scales with device dimensions such as t n and W. Here, t n is varied and W is varied accordingly, as listed in Table 1.
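As a small worked example of the tunneling-area expression above, the following sketch evaluates A tun = L ov × 2(W + t n ) for two overlap lengths. The numerical dimensions are hypothetical placeholders chosen only for illustration and are not taken from Table 1.

```python
def tunneling_area(l_ov_nm: float, width_nm: float, t_n_nm: float) -> float:
    """Return A_tun = L_ov * 2 * (W + t_n) in nm^2."""
    return l_ov_nm * 2.0 * (width_nm + t_n_nm)


# Hypothetical dimensions (nm); NOT the values of Table 1.
area_short = tunneling_area(l_ov_nm=5.0, width_nm=14.0, t_n_nm=2.0)
area_long = tunneling_area(l_ov_nm=10.0, width_nm=14.0, t_n_nm=2.0)

print(f"A_tun (L_ov = 5 nm):  {area_short} nm^2")
print(f"A_tun (L_ov = 10 nm): {area_long} nm^2")
```

Doubling L ov at fixed W and t n doubles A tun , which is the proportionality stated in the text.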

B. SIGNIFICANCE OF EPITAXIAL THICKNESS (T N )
The value of t n is also significant because the band alignment at the p ++ -n junction determines the tunneling barrier length (λ). An appropriate band alignment is responsible for a greater tunneling rate [29]. Other material factors that influence the tunneling rate are t ox and WK.

C. SIGNIFICANCE OF OXIDE THICKNESS (T OX )
The oxide thickness t ox strongly influences λ, as λ = √(ε ns t ox t ns /ε ox ), where ε ns and ε ox are the permittivities of the nanosheet and the gate oxide, respectively, and t ns is the nanosheet thickness.

D. SIGNIFICANCE OF WORK FUNCTION (WK)
The WK affects the subthreshold operation of the device, resulting in a low threshold voltage (V t ) and deviation in SS values. Here, titanium nitride (TiN) is used to set the WK for an n-type device structure. A work function range of 4.2-4.4 eV is considered to maintain a high I ON /I OFF ratio, because TFETs suffer from low on-current at high WK. Therefore, it is meaningful to consider a low work function range, which implies steeper band bending (φ) and a proportionally higher tunneling rate or on-current. In practice, it has been shown that varying WK from low to high with a meaningful offset can be achieved through plasma ion implantation [33].
Hence, we specifically input these parameters into the ML RFR model to understand the device structure of the line TFET. Furthermore, to improve the tunneling rate, the explored SLTFET uses a heterostructure with Si 0.6 Ge 0.4 as the source and Si for the rest. The data obtained through device simulation are fed into the RFR model with V G , L ov , t n , t ox and WK as the input features. These input features span different ranges, and each device parameter generates 100 I D -V G curves, as shown in Fig. 2(b). Therefore, our explored RFR model is investigated using 400 I D -V G curves. The output feature is I D . After specifying the input and output features for the ML model, it is necessary to split the data into training and testing sets. The split is user-specific as well as ML-model-dependent. Furthermore, the dataset is normalized to bring all the data into a standardized range and improve the accuracy of the ML model. Fig. 2(c) illustrates the input and output of the ML model as well as the possible ML application for the line TFET simulated data.

III. MODELING OF MACHINE LEARNING ALGORITHMS
For the past 40 years, researchers have struggled to formulate simple equations for the complex structure of semiconductor devices. Therefore, to build a general model based on multiple hyperparameters, the RFR algorithm is implemented for the SLTFET with the given input and output vectors. The ML-based RFR model is tuned with the help of various hyperparameters, as listed in Table 2. Because the RFR model consists of an ensemble of parallel decision trees, it can be made flexible by varying the hyperparameters. In this work, the RFR model is based on 50 decision trees, as shown in Fig. 3(a). Before the training set is fed into the RFR model, the dataset is preprocessed by shuffling, normalization, and splitting into an appropriate ratio for training and testing the model. In general, the testing data remain unknown to the ML model. Firstly, the I D -V G curves are shuffled so that the ML model can be trained on curves from all ranges of the device parameters. Secondly, the data are normalized to reduce the effect of outliers. The linear normalization is performed by subtracting the mean value from the data and then dividing by the standard deviation. After normalization, all the training data are in the range of -1 to 1. Thirdly, the data are split into 80% for the training set and 20% for the testing set. The partition of the data into the training and testing sets is presented in Fig. 3(b). Notably, both while training the RFR model and while evaluating the trained RFR model, the mean squared error (MSE) is calculated as the loss function. Moreover, in this work, the R 2 -score is also used to evaluate the trained ML model. A higher R 2 -score shows that the model explains most of the variance of the output, whereas a value closer to 0 shows that the ML model is not valid and suffers from problems related to the train/test data split, noise in the data, poorly tuned hyperparameters, and so on. Our approach is to use the RFR model to predict the I D curves from the given device parameters and the electrical characteristics. Device simulations are performed to collect the I D -V G curves with the variation of all the device parameters to exhibit the relationship among WK, t ox , t n and L ov .
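The preprocessing and training steps described above (shuffling, mean/standard-deviation normalization, an 80/20 split, a 50-tree RFR, and MSE and R 2 evaluation) can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the data are synthetic, the parameter ranges and the toy expression for the target are hypothetical, and treating the target as a log-scaled I D is an assumption made only to keep the example well conditioned.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the simulated dataset (NOT real TFET data).
# Columns: V_G (V), L_ov (nm), t_n (nm), t_ox (nm), WK (eV); ranges are hypothetical.
n_rows = 4000
X = np.column_stack([
    rng.uniform(0.0, 0.5, n_rows),   # V_G
    rng.uniform(5.0, 15.0, n_rows),  # L_ov
    rng.uniform(1.0, 3.0, n_rows),   # t_n
    rng.uniform(1.0, 3.0, n_rows),   # t_ox
    rng.uniform(4.2, 4.4, n_rows),   # WK
])
# Toy surrogate for log10(I_D); the coefficients carry no physical meaning.
y = -12.0 + 10.0 * X[:, 0] + 0.05 * X[:, 1] - 0.3 * X[:, 3] - 2.0 * (X[:, 4] - 4.2)

# Shuffle the samples and split them 80% / 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

# Normalize: subtract the mean and divide by the standard deviation,
# using statistics of the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_n = (X_train - mean) / std
X_test_n = (X_test - mean) / std

# Random forest regressor with 50 decision trees.
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train_n, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test_n)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"test MSE = {mse:.4f}, test R2 = {r2:.4f}")
```

One design choice in this sketch differs from the order listed in the text: the normalization statistics are computed after the split, from the training set only, so that the test data stay unseen by every fitted quantity.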
This proposed approach is more efficient and accurate because many input features are investigated with respect to the hidden parameters of the ML model, and the output reflects the collective as well as the individual effect of all device parameters, i.e., WK, V D , t ox , t n and L ov . Notably, Table 1 lists the ranges of the SLTFET device parameters used in the modeling of the RFR algorithm.
While optimizing the RFR model, there is no guarantee of obtaining a controllable range of real semiconductor device parameters, because the training of the model is not tied to any device physics and may try to find the optimal solution for all the input features. To obtain a well-regulated range of the predicted output, it is necessary to scale the input features using the Python library Scikit-learn [18]. All the experiments are run in a Python console on a computer with an Intel i7-10700K CPU (3.97 GHz) and 32.0 GB of RAM.

IV. RESULTS AND DISCUSSION
A 50-tree RFR model with five inputs and an adjustable depth for each decision tree has been implemented via the Python library Scikit-learn. While implementing the RFR model, the number of trees and the depth of each tree were determined by the number of I D -V G curves needed to learn the relationship between the device parameters and the target value. In addition, to achieve robust results, the number of training samples and the hyperparameters must be chosen appropriately to avoid both overfitting and underfitting of the RFR model.
Generally, a dataset is split into three subsets, i.e., a training set, a validation set and a testing set. The validation and test sets play an important role, but a validation set is sometimes omitted when the dataset is small and the error rates for the training and testing data are in good agreement. Before being fed into the ML model, the dataset is split into training and testing sets, which is a complicated task, especially when the dataset is small. The train/test split affects various attributes of the ML model; for example, the accuracy and the quality of fit depend on an appropriate choice of split. An appropriate train/test split allows the hyperparameters to be tuned so that the most accurate predictive ML model is produced. Therefore, to determine the best splitting ratio for the 400 I D -V G curves, the R 2 -score of the training of the explored ML model is calculated for varying dataset splits, as listed in Table 3. It can be seen that the most appropriate splits for our small dataset are cases (d) and (e). Therefore, 320 I D -V G curves are randomly selected as the training dataset, and the remaining curves are used to evaluate the trained RFR model. The RFR model is trained using case (d). The tuned hyperparameters used for case (d) are listed in Table 2. Moreover, Fig. 4 shows that 50 trees give the minimum root mean squared error (RMSE) for this dataset. Thus, our explored RFR model is constructed using 50 trees. During the training of the RFR model, we stop splitting the nodes in the trees based on how the MSE value generalizes. For example, if the MSE value remains the same as for the previous nodes, the output is taken from that node. We repeat this process for several decision trees, and the output is obtained by averaging all the acceptable outcomes.
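The two selection steps described above, choosing a train/test ratio and choosing the number of trees, can be sketched as simple sweeps. The example below is a minimal illustration on synthetic data: the candidate test fractions and tree counts are hypothetical (they are not the cases of Table 3 or the points of Fig. 4), the helper `evaluate` is introduced here for illustration, and the scores are computed on the held-out test set, which is an assumption of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic five-feature regression problem (NOT the TFET dataset).
X = rng.uniform(-1.0, 1.0, size=(2000, 5))
y = 3.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.5 * X[:, 2] * X[:, 3]


def evaluate(test_fraction: float, n_trees: int) -> tuple[float, float]:
    """Train one forest and return (RMSE, R2) on the held-out test set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_fraction, random_state=0
    )
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    return rmse, float(r2_score(y_te, pred))


# Sweep 1: vary the test fraction at a fixed forest size of 50 trees.
split_results = {frac: evaluate(frac, 50) for frac in (0.1, 0.2, 0.3, 0.4, 0.5)}

# Sweep 2: vary the number of trees at a fixed 80/20 split.
tree_results = {n: evaluate(0.2, n) for n in (5, 10, 25, 50, 100)}

for frac, (rmse, r2) in split_results.items():
    print(f"test fraction {frac:.1f}: RMSE = {rmse:.4f}, R2 = {r2:.4f}")
for n, (rmse, r2) in tree_results.items():
    print(f"{n:3d} trees: RMSE = {rmse:.4f}, R2 = {r2:.4f}")
```

On a real dataset, the configuration with the lowest RMSE (or highest R 2 ) would be the candidate to keep, in the spirit of the selection reported for case (d) and 50 trees.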
Since the RFR model is stochastic, its performance can vary with the initial random parameter values. To avoid overfitting and to obtain the best accuracy, the RFR model is trained and evaluated with different initial parameter values. The training of the ML model is physics-free, i.e., it works without any knowledge of the device physics. The predicted I D -V G curve is evaluated using the RMSE value and the R 2 -score. The RMSE measures the difference between the true (simulated) value and the predicted value (the output of the RFR model). The lower the RMSE, the more accurate our explored model. Similarly, the R 2 -score is a statistical measure that reflects the goodness of fit of the ML model. Its value typically ranges between 0 and 1 (it can be negative for a model that fits worse than predicting the mean), and a value closer to 1 indicates better performance of the ML model. Notably, the training and the testing of the RFR model using different device parameters are illustrated in Fig. 5, which compares the simulated data with the values predicted by our explored RFR model (marker (o)). It can be noted that the I D -V G curve fitting is accurate for all the device parameters: the R 2 -score is high and the RMSE value is correspondingly small. Therefore, it can be concluded from Fig. 5 that our explored RFR model has learned the complex behavior of the SLTFET for the given range of device parameters. Moreover, our well-trained RFR model can predict the I D -V G curves for unknown device parameters, provided they lie in the same perturbation range as the simulated data. In Fig. 6, the relationship between the predicted and device-simulated I D values is approximately linear, which shows that the predicted I D values are close to the simulated (test) values. Therefore, it can be seen that our explored RFR model performs well in terms of accuracy.

Figure caption: An illustration of the combined effect of varying ranges of L ov and t n , and of the effect of these two device parameters on the prediction of the I D -V G curves. The testing of the combined effect of the two device parameters also performs well in terms of RMSE value and R 2 -score for the fixed biasing condition, i.e., V D = 0.5 V.
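For reference, the two evaluation metrics used in this section can be written in their standard textbook form. The symbols below are introduced here for illustration and are not notation from the original text: N is the number of test points, I_{D,i}^{sim} and I_{D,i}^{pred} are the simulated and predicted drain currents, and \bar{I}_{D}^{sim} is the mean of the simulated values.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left(I_{D,i}^{\mathrm{sim}} - I_{D,i}^{\mathrm{pred}}\right)^{2}},
\qquad
R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(I_{D,i}^{\mathrm{sim}} - I_{D,i}^{\mathrm{pred}}\right)^{2}}
                 {\sum_{i=1}^{N}\left(I_{D,i}^{\mathrm{sim}} - \bar{I}_{D}^{\mathrm{sim}}\right)^{2}}
```

With these definitions, a perfect prediction gives RMSE = 0 and R 2 = 1, while a model that always predicts the mean of the simulated values gives R 2 = 0.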
After examining the effect of each source of variation independently using 400 fluctuated devices, a second dataset is investigated, i.e., 200 fluctuated devices in which L ov and t n are varied simultaneously (WK and t ox remain constant), to study the relationship between them. Fig. 7 presents the I D -V G curves obtained through device simulation as well as the predictions of the ML-RFR model. It can be observed that the R 2 -score is approximately 99% and the RMSE value is close to 1%. Thus, it can be concluded that our explored ML-RFR model also performs well when two device parameters vary together.
Thirdly, to demonstrate the compatibility and physics-free modeling of our explored RFR model, the model is trained and evaluated under different biasing conditions, namely V D = 0.5, 0.05, and 0.005 V. It can be seen from Fig. 8 that the prediction of the I D -V G curve is accurate in terms of RMSE value and R 2 -score. In short, it can be concluded that ML modeling is physics-free and does not require an exact equation to predict the target values.
Furthermore, to establish the relationship among all the explored device parameters, the model is trained by splitting 2500 curves into 80% for training and 20% for testing the RFR model. Before being fed into the RFR model, the dataset goes through the preprocessing steps discussed earlier. The same hyperparameters are used in training the RFR model, except for the number of trees: in this training, 100 trees are used to reach a converged solution. It can be observed from Fig. 9 that both the training and the testing of the RFR model perform well. The RMSE value and the R 2 -score show that the performance of the RFR model when all the device parameters vary is approximately similar to that in Fig. 5 and Fig. 7. Therefore, we can conclude that the explored device parameters are uncorrelated with each other. Notably, for the sake of visualization, only four of the 2500 I D -V G curves are shown in Fig. 9 for the training and the testing of the ML model.

Figure caption (fragment): (b) shows the testing of the RFR model; for the sake of visualization, four cases are shown. Details of the device parameters of the testing curves are listed in Table 4.
FIGURE 10. An illustration of the evaluation of the well-trained ML-RFR model by testing with random device parameters. The error rate for L ov = 0.15 nm, t ox = 3.04 nm, t n = 2.04 nm, WK = 4.44 eV is less than 1%. The small difference between the ideal line and the scattered predicted points shows that it is possible to evaluate the model using any value within the specific range of device parameters.

Lastly, to demonstrate the holistic nature and flexibility of our explored RFR model, the model is tested with randomly generated device parameters. After the model is trained on the specific range of device parameters (listed in Table 1), it is evaluated with device parameters unknown to it. Thus, we have tested our trained ML model with random values, as shown in Fig. 10: L ov = 0.15 nm, t ox = 3.04 nm, t n = 2.04 nm, WK = 4.44 eV. Notably, these device parameter values are not included in our simulated dataset, although they lie in the range of our explored parameters. The comparison is made between the simulated values and the predicted values for the random cases. The regression line shows the ideal scenario, and the closeness of the predicted values to the ideal line demonstrates the performance of the ML model. Fig. 10 shows that our explored ML model performs well when evaluated with randomly selected parameters. The error rate between the predicted and the tested values is not greater than 1%, which is considered a remarkable achievement in the field of ML and semiconductor device simulation. Moreover, after accurate prediction by the ML-RFR model, a crucial parameter, i.e., the minimum subthreshold slope (SS MIN ), is extracted from the simulated data as well as from the predicted I D -V G curves. The SS MIN values extracted from the simulated line TFET device and from the predictions are listed in Table 5.
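One common way to extract a minimum subthreshold slope from an I D -V G curve is to evaluate SS = dV G /d(log10 I D ) between adjacent bias points and take the smallest value. The sketch below illustrates this with NumPy; the helper name `min_subthreshold_swing`, the finite-difference scheme and the idealized exponential test curve are assumptions of this example, since the text does not state how SS MIN was extracted.

```python
import numpy as np


def min_subthreshold_swing(v_g, i_d) -> float:
    """Return the minimum point subthreshold swing in mV/decade.

    SS = dV_G / d(log10 I_D), evaluated by finite differences between
    adjacent bias points where the current is increasing.
    """
    v_g = np.asarray(v_g, dtype=float)
    log_i = np.log10(np.asarray(i_d, dtype=float))
    d_v = np.diff(v_g)
    d_log = np.diff(log_i)
    rising = d_log > 0.0  # ignore flat or decreasing segments
    ss_v_per_dec = d_v[rising] / d_log[rising]
    return float(ss_v_per_dec.min() * 1e3)


# Hypothetical curve: an ideal exponential with a 40 mV/decade slope.
v_g = np.linspace(0.0, 0.4, 41)
i_d = 1e-14 * 10.0 ** (v_g / 0.040)

ss_min = min_subthreshold_swing(v_g, i_d)
print(f"SS_min = {ss_min:.2f} mV/dec")
```

The same function can be applied to a simulated curve and to the corresponding predicted curve, so that the two extracted values can be compared side by side as in Table 5.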
Our explored RFR model takes 150 seconds to train and 20 seconds to evaluate, whereas device simulation takes 5 hours to generate 100 I D -V G curves with one device parameter at a time. Hence, it can be concluded that ML modeling can accelerate complex device simulation with an error rate of less than 1% and a reduction of computational cost of around 99%. Moreover, innovative ML techniques can model the complex device structure and find the optimal solution easily. Our trained model can also accelerate the fabrication process, because device simulation of TFETs is complex and time-consuming owing to the quantum tunneling models. Therefore, predicting the I D -V G curve with our well-trained ML model is a reliable way to accelerate the fabrication process and minimize its computational cost, since the electrical characteristics for any specific parameter set can be obtained within seconds.
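The roughly 99% figure can be checked with a short calculation from the timings quoted above. The comparison baseline is an assumption of this sketch: it sets the ML training plus evaluation time against the time to simulate one batch of 100 curves, and, as a second case, against four such batches (400 curves); the time needed to generate the training data itself is not counted.

```python
# Timings quoted in the text.
train_seconds = 150.0
evaluate_seconds = 20.0
simulate_seconds_per_100_curves = 5 * 3600.0  # 5 hours

ml_total_seconds = train_seconds + evaluate_seconds

# Assumed baseline 1: one batch of 100 simulated curves.
reduction_100 = 1.0 - ml_total_seconds / simulate_seconds_per_100_curves

# Assumed baseline 2: four batches (400 curves, one batch per device parameter).
reduction_400 = 1.0 - ml_total_seconds / (4 * simulate_seconds_per_100_curves)

print(f"reduction vs 100 curves: {100 * reduction_100:.2f}%")
print(f"reduction vs 400 curves: {100 * reduction_400:.2f}%")
```

Under either baseline the reduction exceeds 99%, consistent with the figure stated in the text.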

V. CONCLUSION
In this work, an ML algorithm has been used to optimize the solution for the complex device simulation of the line TFET by training the RFR model with fluctuated devices covering a specific range of device parameters. Five crucial parameters, i.e., L ov , t ox , t n , WK and V D , were explored to predict the I D variation. From the predictive results, it is concluded that the ML algorithm is an efficient and flexible approach to predicting the behavior of the line TFET for the sub-3-nm technology node. In addition, it performs well when the device parameters are evaluated under other bias conditions. Therefore, it has been shown that our explored ML model is physics-free and highly compatible with other device conditions as well. Furthermore, it is accepted that the accuracy of the predictive model is far better than that of human experts' optimization algorithms using device simulation. Our explored RFR model converges faster than other traditional algorithms. The R 2 -score of the well-trained and evaluated model is above 99%; similarly, the error rate for training and testing on the line TFET simulated data is less than 1%, which is considered computationally efficient. Moreover, it has been concluded that all the explored device parameters are independent of each other, and a generalized algorithm has been modeled for the line TFET device with specific device parameters. This work will be extended in the near future by adding more material parameters as well as process, voltage and temperature variations.