Wafer Edge Yield Prediction Using a Combined Long Short-Term Memory and Feed-Forward Neural Network Model for Semiconductor Manufacturing

In semiconductor manufacturing, maintaining a high yield and predicting yield accurately are important for improving productivity, customer satisfaction, and profitability. Despite this importance, predicting wafer yield with high accuracy remains challenging. In this paper, we propose a method for wafer edge yield prediction using a combined long short-term memory (LSTM) and feed-forward neural network (FFNN) model. Unlike previous research, we focus on the edge yield because of the higher yield loss at the wafer edge. The combined LSTM-FFNN model uses a dataset divided into two types according to data characteristics: time-series data are fed into the LSTM, and non-time-series data are fed into the FFNN. The time-series data comprise data related to the equipment and chambers; when they are prepared, data from different chambers do not overlap, so that each chamber is treated as an independent entity. The proposed model outperforms the other models in terms of all evaluation metrics. The coefficient of determination of the proposed combined LSTM-FFNN model is 34.14%, which is almost 13% higher than that of the other compared models on average.


I. INTRODUCTION
In recent years, as advanced technologies such as smartphones, deep learning, the Internet of Things, and artificial intelligence have emerged, the demand for semiconductors has increased exponentially. Meanwhile, semiconductor manufacturing, which involves several process steps, is becoming increasingly complex and difficult to manage. The semiconductor manufacturing process involves monitoring numerous parameters from the early stages of production up to the packaging of an end product [1]. Metrology, the most important parameter among these, is the key to achieving high product quality in semiconductor manufacturing. In general, metrology is measured physically using a helium ion microscope or scanning electron microscope (SEM) [2], [3]. Ideally, each wafer would be measured after each process so that the quality of each wafer could be estimated. However, this is impractical because every measuring process added between each pair of contiguous processes can significantly increase the total time of production [4]. Combined focused ion beam and SEM could be a possibility for on-line metrology [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Rahul A. Trivedi.
Therefore, virtual metrology, one of many fabrication parameters, has been developed to augment physical metrology. Recently, virtual metrology has been employed for obtaining additional information from the analysis of scrapped wafers or electrical test values. Virtual metrology, which significantly enhances fabrication productivity and quality assurance [6], is a parameter that correlates various sensors on the process equipment with metrology. Because valuable data can be obtained without wafers having to proceed through metrology steps, it is possible to reduce the cost of metrology tools and the overall process time [7]. Despite the advantages of virtual metrology, checking wafer quality should not be solely dependent on it because of the need for consistency, which necessitates periodic updates. Both metrology and virtual metrology entail time-series data. Certain machine parts that constitute the process equipment are worn away gradually through repetitive processes, and these machine parts and materials such as photoresist, etching gas, and other chemicals should be periodically replaced to maintain high quality. Engineers monitor the fabrication parameters in real time for each equipment chamber independently. To achieve the target value, Run-to-Run (R2R) [8] control is employed, and engineers refer to previous parameter values and intentionally tune process settings, such as temperature, pressure, and gas flow, for the next run, thereby creating time-series data.
The semiconductor wafer yield is defined as the ratio of the number of good chips to the total number of chips. Yield is a widely used performance metric in semiconductor manufacturing, and maintaining a high yield via reliable and accurate quality control is a key objective.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Accurate yield prediction is highly important for improving productivity, customer satisfaction, and profitability [9]. Yield predictions enable semiconductor manufacturers to implement supply chain management and guarantee high-quality products. Yield prediction is becoming more important as a future task [10]. Despite its importance and merits, achieving systematic wafer yield prediction with high quality and accuracy is significantly challenging. Under these conditions, many engineers in semiconductor manufacturing have continually attempted to predict the yield in practice. Existing yield prediction models still perform poorly, and improving them is one of the most important goals for yield enhancement departments. Moreover, existing yield prediction models focus on the total average wafer yield, whereas the wafer yield differs across regions of the wafer and consists of the inner and edge yields. In general, the edge yield is substantially lower than the inner yield. Although the edge accounts for only a small proportion of the wafer, it contributes significantly to the decrease in the total yield. Thus, the edge yield is an important aspect of wafer yield that should be prioritized. A typical example of the inner and edge areas of a wafer is illustrated in Fig. 1.
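As a small worked illustration of the yield definition, and of how a narrow edge region with a low yield can pull down the total yield, the following sketch computes the inner, edge, and total yields of a single wafer. The chip counts and the function name `wafer_yield` are hypothetical and are not taken from the dataset used in this study.

```python
def wafer_yield(good_chips: int, total_chips: int) -> float:
    """Yield = number of good chips / total number of chips."""
    if total_chips <= 0:
        raise ValueError("total_chips must be positive")
    return good_chips / total_chips


# Hypothetical chip counts for one wafer (illustrative only).
inner_good, inner_total = 540, 600
edge_good, edge_total = 60, 100

inner_yield = wafer_yield(inner_good, inner_total)
edge_yield = wafer_yield(edge_good, edge_total)
total_yield = wafer_yield(inner_good + edge_good, inner_total + edge_total)

print(f"inner={inner_yield:.3f} edge={edge_yield:.3f} total={total_yield:.3f}")
```

In this made-up case the edge holds only one seventh of the chips, yet it accounts for 40 of the 100 failing chips, which is why the total yield falls noticeably below the inner yield.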
The process variation across a wafer may be greater at the edge compared to that at the center, resulting in a higher yield loss at the wafer edge [11]. The thinning process could induce damage at the wafer edge, which would directly impact the physical yield [12]. According to Yavas, several factors can lead to significant edge yield loss. These factors include non-uniformities in the wafer thickness and etch profiles due to plasma inhomogeneity toward the wafer edge, wafer bow due to film stress, residues at the bevel and backside, chuck damage by reactive gases or particles, and plasma or handling-induced mechanical damage at the bevel [13]. Almost every process is focused on the inner area of a wafer, which may explain the aforementioned results. Because process variation at the edge contributes to edge yield loss, the edge yield should receive more attention. This motivated us to propose a wafer edge yield prediction model using edge parameters. In this study, we used several fabrication parameters, that is, metrology, virtual metrology, and equipment information obtained during critical steps, as the features. We additionally introduced a wafer edge yield prediction model that combines long short-term memory (LSTM) with a feed-forward neural network (FFNN) model. The main contributions of this work are summarized as follows:
-We divide the dataset and input the smaller datasets into the corresponding models to understand the characteristics of the dataset. Because the data in semiconductor manufacturing evidently exhibit the time-series property by R2R control or engineers' tuning to achieve the target value, this characteristic should be included in the wafer yield prediction model. Sequential data are used in the case of LSTM, and the remaining data are fed into the FFNN.
-We use a time-series dataset consisting of data related to the equipment and chambers. When several chambers are connected to a single piece of equipment, they should be regarded as independent entities because they are controlled by different recipes. Data of different chambers do not overlap with each other when this sequential dataset is prepared for the LSTM model.
-We propose a method for wafer edge yield prediction, which is the key to enhancing edge yield, as opposed to focusing on predicting the total yield of the wafer.
The remainder of this paper is organized as follows. In Section II, we describe previous studies in detail. In Sections III and IV, we present the proposed method and compare our experimental results with those of other well-known methods. Finally, we present a brief conclusion and discuss future work in Section V.

II. RELATED WORK
Several prior studies have focused on wafer yield prediction, and the results of sequential research have been used in semiconductor manufacturing.

A. PRELIMINARY YIELD PREDICTION EXPERIMENTS
Shin et al. proposed a hybrid machine-learning strategy using fabrication parameters. Their strategy involved a neural network and memory-based learning for lot-based yield prediction in semiconductor manufacturing [1]. Li et al. proposed a comprehensive data mining method for predicting and classifying product yields in semiconductor manufacturing processes using a genetic programming approach [14]. Chien et al. suggested a framework for yield prediction and the classification of abnormal process stages in semiconductor manufacturing using the Kruskal-Wallis test and a decision tree [15]. However, these lot-based prediction models have limitations because only sampling and measuring two to three wafers in one lot that includes 24 wafers may entail an excessive assumption. This ultimately led to the proposal of wafer-based yield prediction models.
Nam et al. proposed a prediction model to predict wafer yield based on virtual metrology process parameters in semiconductor manufacturing [16]. Chien et al. proposed a novel data-driven approach to analyze big data generated during semiconductor manufacturing. The method is intended for low-yield diagnosis to detect the root causes of processes for yield enhancement. The data reflect the production process steps, tools, recipes, and vendors [17]. Jang et al. proposed a novel yield prediction model based on deep neural network algorithms by using the spatial relationships among the positions of dies on a wafer and die-level yield variations extracted from a wafer test without process parameters [18]. Although they used wafers for yield prediction, their approach is problematic in that metrology parameters are not included as features. Metrology parameters are the most powerful features for predicting wafer yield with high reliability because these parameters are the only values that result from actual measurements. However, the use of metrology for yield prediction is difficult because values will inevitably be missing from metrology owing to productivity and time limitations.
An et al. suggested an efficient way to distinguish high yield and low yield using a stepwise support vector machine. Measurements of the unit voltage, current, and other electrical characteristics were used for yield prediction after fab-out [19]. However, real-time prediction at the fabrication level is not possible because the input dataset is obtained after fab-out.

B. SEQUENTIAL RESEARCH IN SEMICONDUCTOR MANUFACTURING
Yang et al. proposed a novel approach that incorporates the interactions among spec-out events using spec-out event network analyses with time-series process sensor data such as temperature, pressure, and voltage data [20]. Lee et al. proposed a convolutional neural network (CNN) model, in which a receptive field tailored to multivariate sensor signals slides along the time axis, to extract fault features. In semiconductor manufacturing processes, all recipe parameters should reach their individual set points in a timely manner and maintain the set points without severe fluctuations for specified process durations [21]. Chen et al. proposed a method for anomaly detection in semiconductor manufacturing through time-series forecasting using three models: autoregressive integrated moving average, multi-layer perceptron, and LSTM [22]. Kim et al. proposed fault detection and diagnosis using self-attentive CNNs for variable-length sensor data in semiconductor manufacturing [23].

III. PROPOSED METHOD
In this section, we present the details of the proposed model for wafer edge yield prediction. First, various input features for the proposed model are described. Second, we provide a detailed account of the combined LSTM-FFNN model and its ability to effectively use both time-series data and non-time-series data.

A. DESCRIPTION OF INPUT FEATURES
The input features are summarized in Table 1. Four types of input features are used: metrology, virtual metrology, equipment output data, and equipment information. Because semiconductor manufacturing involves numerous parameters, the selection of an appropriate dataset is necessary. Among the several selection methods, domain knowledge from practical experience and statistical analysis are adopted to select the features. When engineers conduct yield analyses to define a root cause, they identify not only the critical process steps but also the fabrication parameters in those steps that match the root cause. Three critical steps, namely A, B, and C, were selected from among hundreds of process steps based on domain knowledge, and three additional steps, namely D, E, and F, were selected via statistical analysis (Kruskal-Wallis) [24]. The Kruskal-Wallis test is a nonparametric analysis of variance that compares several independent samples. The results of the test are summarized in Table 2. We represent the edge yields as Y (numerical values) and all process steps as X (categorical values). Among all the process steps, three steps, namely D, E, and F, were found to have the lowest p-values. Steps B and A, which were selected based on domain knowledge, are in the fourth and fifth places on the list, respectively.
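The step-ranking idea can be sketched as follows, assuming that the categorical value X of a process step is the equipment that processed the wafer at that step (the text does not spell this out). The data are synthetic, the step and equipment names are hypothetical, and `scipy.stats.kruskal` stands in for whatever statistical tooling was actually used.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Synthetic stand-in data: one edge yield (Y) per wafer and, for each process
# step, the equipment that processed the wafer (categorical X). Hypothetical names.
n_wafers = 600
equipment = {
    "step_D": rng.choice(["EQ1", "EQ2", "EQ3"], size=n_wafers),
    "step_X": rng.choice(["EQ1", "EQ2", "EQ3"], size=n_wafers),
}
edge_yield = rng.normal(0.80, 0.05, size=n_wafers)
# Make step_D matter: wafers processed on EQ3 at step_D lose yield.
edge_yield = edge_yield - 0.10 * (equipment["step_D"] == "EQ3")


def kruskal_p_value(categories, values):
    """p-value of the Kruskal-Wallis H-test across the groups in `categories`."""
    groups = [values[categories == c] for c in np.unique(categories)]
    return kruskal(*groups).pvalue


p_values = {step: kruskal_p_value(cats, edge_yield) for step, cats in equipment.items()}
ranked = sorted(p_values, key=p_values.get)  # lowest p-value first
for step in ranked:
    print(step, p_values[step])
```

Steps whose groups differ most in edge yield obtain the lowest p-values and appear first in `ranked`, which mirrors how steps D, E, and F were picked from the top of the list.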
Step A is the most critical step; for this reason, metrology and virtual metrology data are obtained after step A has been completed. During step A, the edge pattern is found to be slightly different from the center pattern because of the wafer topography. This difference is quantified by metrology and virtual metrology, and it directly affects the edge yield. Predicting wafer edge topography and then translating it into yield and yield degradation classification might be a more accurate method to determine edge yield; however, checking wafer topography is costly in terms of money and time. Thus, in this study, we predict edge yield directly. Because the proposed prediction model is for wafer edge yield, we selected only the six metrology and twenty-three virtual metrology parameters corresponding to the wafer edge area from among tens of metrology and virtual metrology parameters. All of the virtual metrology parameters used provide additional values that make the input features more informative in combination with metrology. The measuring rate for metrology is less than 100%. The equipment output data are plasma-on time values obtained from the process equipment; each chamber is set to plasma-on after product maintenance and to plasma-off after a predetermined time has passed or a serious event occurs. It was verified that the start and end times affect the wafer yield owing to fluctuations in the processing rate. We extract these data from steps A, B, and C. There are also hundreds of other equipment output data. Some of them were tried as input features, but they did not perform well because several wafers exhibited the same value for each feature. Finally, equipment name information was obtained for all six steps and converted to a one-hot encoded dataset.
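The one-hot conversion of equipment names can be illustrated with a minimal sketch. The helper `one_hot_encode` and the equipment names are hypothetical; the actual names are confidential and the encoding tool used in the study is not stated.

```python
def one_hot_encode(names):
    """Return (vocabulary, rows); each row is the one-hot vector of one name."""
    vocabulary = sorted(set(names))
    index = {name: i for i, name in enumerate(vocabulary)}
    rows = []
    for name in names:
        row = [0] * len(vocabulary)
        row[index[name]] = 1
        rows.append(row)
    return vocabulary, rows


# Hypothetical equipment names recorded for four wafers at one process step.
step_a_equipment = ["EQ_B", "EQ_A", "EQ_C", "EQ_A"]
vocabulary, encoded = one_hot_encode(step_a_equipment)

for name, row in zip(step_a_equipment, encoded):
    print(name, row)
```

Repeating this for each of the six steps and concatenating the resulting vectors would give one non-time-series input row per wafer.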

B. COMBINED LSTM-FFNN MODEL
In this subsection, we discuss the proposed prediction model that combines LSTM with the FFNN. The architecture of the proposed model is illustrated in Fig. 2.
In general, the wafer yield prediction model uses only the non-time-series data of each process step. However, certain data clearly have time-series characteristics. Each data value is connected to another because engineers in semiconductor manufacturing refer to previous parameter values and engage in a series of fine-tuned value adjustments for the next set of data, thereby creating time-series data. Therefore, time-series data should be taken into account when using fabrication parameters as input features. The combined LSTM-FFNN model proposed in this paper effectively uses both time-series data and non-time-series data to improve the yield prediction performance.
First, we extract the features from the time-series data obtained via metrology, virtual metrology, and equipment output data corresponding to step A using the LSTM architecture. We employ a multi-layer LSTM, that is, a network built from a stack of several LSTM layers, where the hidden representation of the previous layer is used as the input for the next layer. A stacked LSTM can solve more complex problems and extract hidden hierarchical information. Assuming that l denotes a layer, the hidden state $h_t^{(l)}$ of time-step t in layer l can be calculated as follows:
$$h_t^{(l)} = \mathrm{LSTM}^{(l)}\left(h_t^{(l-1)}, h_{t-1}^{(l)}\right), \quad h_t^{(0)} = x_t,$$
where $x_t$ denotes the input metrology data at step t.
In this study, we use a two-layer stacked-LSTM model, and the final encoded metrology feature can be expressed as $h_t^{(2)}$. Second, we extract the features from the non-time-series data, that is, the equipment output data of steps B and C and the one-hot encoded equipment name information. To this end, we used multilayer neural networks. The output of the l-th layer can be calculated as follows:
$$s^{(l)} = \sigma\left(W^{(l)} X^{(l)} + b^{(l)}\right),$$
where $X^{(l)}$, $W^{(l)}$, and $b^{(l)}$ are the input, weight, and bias of the l-th layer, respectively, and $\sigma$ is the activation function. In our work, we used the rectified linear unit (ReLU) as the activation function [25]. We designed fully connected neural networks with four hidden layers. Each hidden layer consisted of 128 nodes. The final encoded feature of the non-time-series data can be expressed as $s^{(4)}$.
Finally, we concatenate the features $h_t^{(2)}$ and $s^{(4)}$, which represent time-series data and non-time-series data, respectively, and the final wafer yield $y_i$ is obtained using a fully connected neural network with one hidden layer.
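The data flow of the combined model can be sketched as a NumPy forward pass with random, untrained weights. This is only an illustration of the architecture described above, not the implementation used in the experiments: the feature counts, the LSTM hidden size of 64, the width of the final hidden layer, and all function names are assumptions, whereas the two LSTM layers, the four 128-node ReLU layers, and the sequence length of three follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def init_dense(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)


def init_lstm(n_in, n_hidden):
    # One weight matrix covering the input, forget, candidate, and output gates.
    return rng.normal(0, 0.1, (n_in + n_hidden, 4 * n_hidden)), np.zeros(4 * n_hidden)


def lstm_layer(x_seq, params):
    """Run one LSTM layer over x_seq of shape (batch, time, features)."""
    W, b = params
    n_hidden = b.size // 4
    batch, steps, _ = x_seq.shape
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    outputs = []
    for t in range(steps):
        z = np.concatenate([x_seq[:, t, :], h], axis=1) @ W + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, time, n_hidden)


def forward(x_seq, x_static, model):
    # Time-series branch: two stacked LSTM layers; keep the last hidden state h_t^(2).
    seq = x_seq
    for params in model["lstm"]:
        seq = lstm_layer(seq, params)
    h2 = seq[:, -1, :]
    # Non-time-series branch: four ReLU hidden layers, giving s^(4).
    s = x_static
    for W, b in model["ffnn"]:
        s = relu(s @ W + b)
    # Concatenate both encodings, then one hidden layer and a scalar yield output.
    z = np.concatenate([h2, s], axis=1)
    (W1, b1), (W2, b2) = model["head"]
    return (relu(z @ W1 + b1) @ W2 + b2).ravel()


# Assumed sizes: feature counts and the LSTM hidden size are placeholders.
n_seq_features, n_static_features, lstm_hidden = 30, 40, 64
model = {
    "lstm": [init_lstm(n_seq_features, lstm_hidden), init_lstm(lstm_hidden, lstm_hidden)],
    "ffnn": [init_dense(n_static_features, 128)] + [init_dense(128, 128) for _ in range(3)],
    "head": [init_dense(lstm_hidden + 128, 128), init_dense(128, 1)],
}
x_seq = rng.normal(size=(8, 3, n_seq_features))       # 8 wafers, sequence length 3
x_static = rng.normal(size=(8, n_static_features))    # 8 wafers, static features
y_hat = forward(x_seq, x_static, model)
print(y_hat.shape)
```

Keeping the two branches separate until the concatenation means that only the step A features pass through the recurrent layers, while the remaining features bypass them entirely.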
The network is trained by conducting back-propagation using the Adam optimizer with a learning rate of 0.001. We use the mean squared error (MSE) as the loss function, which can be calculated as follows:
$$\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i; \theta)\right)^2,$$
where $\{x_i\}_{i=1}^{n}$ are the training inputs, $\{y_i\}_{i=1}^{n}$ are the wafer yield labels, $\theta$ are the weights of our architecture, and $f$ is the prediction function in our proposed combined LSTM-FFNN model.
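To make the training rule concrete, the sketch below applies the standard Adam update with a learning rate of 0.001 to an MSE loss. A linear model on synthetic data stands in for the prediction function f, so that the gradient can be written by hand; the moment coefficients (0.9 and 0.999) and epsilon are the commonly used defaults and are assumptions here, since the text states only the learning rate.

```python
import numpy as np


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and both moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


# Toy stand-in for f(x; theta): a linear model on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
true_theta = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
y = X @ true_theta + rng.normal(0, 0.01, size=256)

theta = np.zeros(5)
state = {"t": 0, "m": np.zeros(5), "v": np.zeros(5)}
initial_loss = mse(y, X @ theta)
for _ in range(2000):
    grad = 2.0 * X.T @ (X @ theta - y) / len(y)  # gradient of the MSE w.r.t. theta
    theta = adam_step(theta, grad, state)
final_loss = mse(y, X @ theta)
print(initial_loss, final_loss)
```

In the combined LSTM-FFNN model the gradient would come from back-propagation through both branches rather than from a closed-form expression, but the parameter update has the same form.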

IV. EXPERIMENT

A. EVALUATION DATASET
The dataset we used consisted of data regarding an advanced 3D vertical-NAND flash memory device from a semiconductor manufacturing company in South Korea. Both the product name and process step information are kept confidential for security reasons. Data relating to a total of 89,093 wafers were collected. In addition, we used a time-series dataset with data from equipment and chambers, and nearly 70 equipment chambers corresponding to the process step A were used. The dataset with time-series characteristics is prepared using a three-sequence length for each equipment chamber. A sequence length of 3 is used because engineers usually tune the recipe for the equipment chamber by referring to 3 points of change when monitoring fabrication parameters. An overview of the dataset is illustrated in Fig. 3. The training set was composed of 71,108 data samples, and the test set consisted of 17,778 data samples.
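The per-chamber sequence preparation can be sketched as a sliding window that is applied to each chamber separately, so that no window mixes wafers from different chambers. The record layout, the chamber identifiers, and the choice of labeling each window with the edge yield of its last wafer are assumptions made for illustration.

```python
from collections import defaultdict


def build_chamber_sequences(records, seq_len=3):
    """Build length-`seq_len` windows per chamber; windows never span two chambers.

    records: iterable of (chamber_id, timestamp, features, edge_yield).
    Returns a list of (window_of_features, edge_yield_of_last_wafer).
    """
    by_chamber = defaultdict(list)
    for chamber_id, timestamp, features, edge_yield in records:
        by_chamber[chamber_id].append((timestamp, features, edge_yield))

    samples = []
    for chamber_id in sorted(by_chamber):
        # Order the wafers of this chamber by processing time.
        history = sorted(by_chamber[chamber_id], key=lambda r: r[0])
        for end in range(seq_len, len(history) + 1):
            window = history[end - seq_len:end]
            samples.append(([f for _, f, _ in window], window[-1][2]))
    return samples


# Hypothetical records: (chamber, time, features, edge yield), interleaved in time.
records = [
    ("CH1", 1, [0.1], 0.80), ("CH2", 2, [0.9], 0.70), ("CH1", 3, [0.2], 0.82),
    ("CH2", 4, [0.8], 0.72), ("CH1", 5, [0.3], 0.81), ("CH2", 6, [0.7], 0.71),
    ("CH1", 7, [0.4], 0.79),
]
samples = build_chamber_sequences(records)
for window, label in samples:
    print(window, label)
```

A chamber with fewer than three processed wafers contributes no sample under this scheme, which is one possible reason why the number of usable samples can be smaller than the number of collected wafers.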

B. EVALUATION METRIC
To objectively evaluate the performance of the model, three evaluation metrics were adopted to compare the quality of different models: coefficient of determination ($R^2$), MSE, and mean absolute error (MAE). These metrics are mainly used to evaluate the performance of the regression model.
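$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}, \quad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \quad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$
where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual value of y, the predicted value of y, and the mean value of y, respectively. N denotes the number of observations.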
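The three metrics can be computed directly from their standard definitions, as in the following minimal sketch. The function name and the four example values are hypothetical and unrelated to the results reported in this paper.

```python
def regression_metrics(y_true, y_pred):
    """Return the coefficient of determination, MSE, and MAE."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical actual and predicted edge yields for four wafers.
y_true = [0.70, 0.80, 0.90, 1.00]
y_pred = [0.75, 0.80, 0.85, 0.95]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

A higher coefficient of determination and lower MSE and MAE indicate a better fit; a model that always predicts the mean yield would score a coefficient of determination of zero.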

C. COMPARING METHODS
In our experiments, we used four regression algorithms to build the yield prediction models: neural networks, support vector regression, decision tree, and partial least square (PLS) regression.
• Neural networks [26] are widely used computing systems inspired by biological neural networks for time-series prediction, nonlinear multivariate prediction, and anomaly detection in the field of manufacturing. We used feed-forward neural networks with four hidden layers and the Adam optimizer; the MSE loss function is the same as that employed in the combined LSTM-FFNN model. The only difference between this neural network and our model is the formation of the inputs. Neural networks receive their inputs without taking the sequence of the dataset into account, which enables us to verify the effectiveness of a time-series dataset in semiconductor manufacturing.
• Support vector regression [27] is a regression algorithm that extends the support vector machine [28], which is used for solving classification problems, by adding an ε-insensitive loss function for solving regression problems. The support vector algorithm is advantageous for complex models, and it is sufficiently simple for analyzing a space-related nonlinear problem mathematically. This is because it corresponds to a linear method in a high-dimensional feature space that is nonlinearly related to the input space [29].
• Decision trees [30] are one of the most widely used practical methods in statistics and machine learning in terms of both classification and regression. The target function of a decision tree has discrete output values, assigns each example to a class, and efficiently classifies new data. When the target variable takes continuous values, it is known as decision tree regression. Decision tree regression is a tree-based structure used to predict the numeric outcomes of the dependent variable, and these trees are constructed beginning with the root of the tree and proceeding down to its leaves by minimizing the predefined fitness function. The process continues until the termination criterion is satisfied.
• PLS regression [31] is a statistical regression method that generalizes and combines features from principal component analysis and multiple linear regression. The goal is to predict a set of dependent variables from a set of independent variables or predictors. This prediction is achieved by extracting from the predictors a set of orthogonal factors known as latent variables, which have the best predictive power [32].
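Among the compared methods, the loss used by support vector regression differs most from the MSE used by the neural models, so a short sketch of the ε-insensitive loss may help. The function name, the tube width of 0.1, and the example values are assumptions for illustration; the settings of the compared models are not given in the text.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean of max(0, |y - y_hat| - epsilon): errors inside the tube cost nothing."""
    losses = [max(0.0, abs(y - p) - epsilon) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical yields: absolute errors of 0.05, 0.15, and 0.20 with a tube of 0.1.
loss = epsilon_insensitive_loss([0.80, 0.80, 0.80], [0.85, 0.95, 0.60], epsilon=0.1)
print(loss)
```

Only the part of each error that exceeds ε is penalized, and it is penalized linearly, whereas the MSE penalizes every error quadratically.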

D. EXPERIMENTAL RESULTS
The values of the three evaluation metrics of the five regression models are listed in Table 3 and illustrated in Fig. 4

V. CONCLUSION
In this paper, we proposed a method for edge yield prediction using a combined LSTM-FFNN model. Unlike previous research, we focused on the edge yield owing to the higher yield loss at wafer edges. Six critical process steps with four types of fabrication parameters (metrology, virtual metrology, equipment output data, and equipment name information) were used as features. The metrology, virtual metrology, and equipment output data of step A were fed as time-series data to the LSTM model, and the other equipment output data and one-hot encoded equipment name information were used as inputs to the FFNN. Four regression algorithms (a neural network, support vector regression, decision tree, and PLS regression) were compared with the proposed model. The experimental results showed that, among these four baselines, the neural network outperformed the other regression models in terms of all three evaluation metrics (MSE, MAE, and coefficient of determination), and that the combined LSTM-FFNN model in turn outperformed all of the compared models with regard to all evaluation metrics. Moreover, the sequential nature of the fabrication parameters proved to be important. The following problems remain to be addressed in future research. First, additional process steps and parameters should be considered for a high-quality prediction model; this is vital because semiconductor manufacturing involves hundreds of process steps. However, metrology values that are missing owing to productivity and time limitations present a problem. Furthermore, the number of missing values may increase with the number of additional process steps or parameters. To obtain a more accurate prediction model, the application of series metrology, a system that consistently measures the same wafers across several metrology steps, should be considered. Second, in this paper, we only proposed a model for edge yield prediction. Thus, a methodological extension to total yield prediction should be researched.