Anomaly Detection in Time Series Data and its Application to Semiconductor Manufacturing

Anomaly detection is essential for monitoring and improving product quality in manufacturing processes. In semiconductor manufacturing, where large amounts of time series data from equipment sensors accumulate rapidly, identifying anomalous signals within this data presents a significant challenge. The data is multivariate and of varying lengths, and the ratio of normal to abnormal signals is often highly imbalanced, so traditional data-driven methods may not be appropriate for its analysis. This paper proposes a novel unsupervised anomaly detection model for multivariate time series data. The model utilizes a unique recurrent neural network architecture and a specialized objective function to detect anomalies. Furthermore, a relevance analysis method is introduced to facilitate the interpretation and diagnosis of the detected anomalous signals. Our experimental results indicate that the proposed deep anomaly detection model, which summarizes sensor data of different lengths into a low-dimensional latent space where anomalous signals can be easily visualized and distinguished, can be applied in real-world semiconductor manufacturing factories and used by on-site engineers for both analysis and operation.


I. INTRODUCTION
A semiconductor manufacturing system comprises numerous intricate and precise processes, and the detection and classification of faults in each of these processes is critical for ensuring quality control. With the advancements of Industry 4.0, cutting-edge technologies and equipment sensors are being implemented to monitor the various processes, resulting in the accumulation of large quantities of data, including information on temperature, pressure, gas flow, and equipment status. Early identification of anomalies and fault diagnosis from such massive data is imperative for efficient semiconductor manufacturing while preserving high yields.
The associate editor coordinating the review of this manuscript and approving it for publication was Sajid Ali.
Anomaly detection has been considered an important issue in various domains, including cybersecurity [8], medical diagnosis [23], and video surveillance [25]. The importance of anomaly detection is particularly pronounced in the context of a manufacturing system [28], where it serves to facilitate the monitoring of faults during production processes. Data in this industry is typically in the form of multivariate time series, collected continuously by multiple sensors over an extended period. With the significant growth in the size and complexity of this data, there is a corresponding increase in the demand for efficient data-driven methodologies capable of effectively addressing the challenges of anomaly detection.
One approach to data-driven anomaly detection is supervised learning, where a model learns to differentiate between normal and anomalous data based on labeled examples [10], [16]. In the big data regime, numerous studies have demonstrated the enhanced performance of supervised anomaly detection through the training of a binary or multi-class classifier with properly labeled data. However, this approach frequently falls short in industrial applications due to the unique characteristics of manufacturing data. For instance, the manual labeling of anomalous data is a challenging task that requires specialized knowledge, and even when it can be performed, the resulting data may exhibit label imbalance, with a limited number of anomalous samples relative to normal ones, because faulty events in most manufacturing systems are infrequent and occur under carefully controlled procedures. From a data-scientific perspective, this imbalance can make it difficult to extract meaningful features.
An alternative solution to this challenge is unsupervised learning. In recent years, numerous studies have demonstrated the efficacy of deep autoencoders [11] in extracting the inherent features of data. Autoencoders are neural networks that approximate the identity mapping through low-dimensional representations in intermediate layers, enabling the summarization of common information and the identification of discriminatory features that determine data normality. However, constructing an anomaly detection model from industrial data is complex, as the model must capture both the temporal dependencies within each time series variable and the interactions between different variables. This study proposes a recurrent autoencoder model to learn the features of time series sensor data, together with a novel loss function motivated by one-class classification for anomaly detection.
One characteristic of deep learning models is their ''black box'' nature: they can predict highly nonlinear and complex relationships between input and output variables with high accuracy, but it is difficult to determine the underlying patterns behind these predictions. In real-world anomaly detection, the interpretation and diagnosis of anomalies are crucial challenges. In recent years, significant efforts have been made to address these issues for both academic and practical purposes. For instance, the layer-wise relevance propagation (LRP) technique proposed by [3] operates by propagating the prediction f(x) backwards through the network using specified propagation rules. The backward pass reaches the input x, producing a quantitative relevance score for the output f(x). This relevance score indicates the contribution of each input variable to the prediction, making the model prediction interpretable. Additionally, [1] extended this approach to recurrent neural networks. In this study, we employ these methods to interpret and diagnose the anomalies detected by our model.
In this work, we present a novel approach for anomaly detection in sequential sensor data collected from a real semiconductor manufacturing process. To handle multidimensional time series data, we use a recurrent neural network autoencoder. After the network is pre-trained, it is further trained using a deep Support Vector Data Description (SVDD) objective function, which maps normal data points inside a circle and anomalous data points outside of it (Figure 1). Furthermore, through relevance analysis, we present a method for diagnosing the sensor or time instance at which an anomaly occurred. In summary, this paper makes the following contributions compared with previous studies.
• An unsupervised learning method is proposed for detecting and diagnosing anomalies in sensor data generated during semiconductor manufacturing processes.
• An efficient recurrent architecture is employed, enabling anomaly detection in multivariate data of varying lengths.
• Through visualization in a low-dimensional latent space, the behavior of anomalous signals can be observed across various chambers. This capability enables real-time detection of anomalies, thereby enhancing the practical applicability of the deep learning model in the field.
• The proposed relevance analysis increases the interpretability of the model and can be used to determine the root cause of detected anomalies.

II. RELATED WORKS
A. DATA-DRIVEN FAULT DETECTION AND DIAGNOSIS
In the context of automated fault detection, one can consider either model-based or data-driven methods [21]. Model-based methods utilize models that are developed through a comprehensive understanding of the physical system and the dynamics involved in the process. However, implementing such methods in a factory environment can be challenging, as it requires time-intensive tasks such as the appropriate setting of model parameters.
As an alternative, data-driven methods are a viable option for fault detection when sensor data is available. These methods seek to discover or extract useful features from the data, thereby enabling fault detection and diagnosis. This approach encompasses statistical analysis, data mining, and machine learning techniques such as principal component analysis [5], [9], and various classification and regression models [4], [6], [7], [13], [20], [24], [27] used for industrial purposes. In recent years, deep learning-based methodologies have emerged as an effective feature learning approach, and numerous studies are underway to apply them to the fault detection task in the semiconductor manufacturing process [2], [14], [15], [19].
in data [11]. In particular, anomaly detection tasks aim to identify anomalous data by learning the common features of normal data. This approach has been successfully applied to computer vision, taking advantage of the advancements in convolutional neural network architectures [17], [18].
Studies have also integrated classical anomaly detection algorithms into the deep learning framework, such as the work of [22], who proposed a deep one-class classification model by incorporating the objective function of the SVDD model [26]. However, our proposed anomaly detection model differs significantly from these previous studies in that industrial data is often multi-dimensional time series data, as opposed to single-instance image data.

III. PROPOSED APPROACH
In this section, we present our methodology for detecting anomalies in multi-dimensional time-series data. Our approach considers a sequence of input vectors x = (x_1, ..., x_T) with x_t ∈ R^d, where the length T may differ between samples; we assume that the anomaly label is not available during the training phase.
In subsection III-A, we present our proposed unsupervised learning model for anomaly detection of time-series sensor data. Subsequently, in subsection III-B, we provide a method for interpreting the abnormal signals using relevance analysis.

A. LEARNING FEATURE REPRESENTATIONS FOR ANOMALY DETECTION
In order to construct an effective anomaly detection model, it is necessary to capture a feature representation of the long-term dependencies and the interactions between the various variables. The encoded features are then utilized for detecting anomalies. To this end, we propose a method that combines a Long Short-Term Memory (LSTM) autoencoder [12] with the Deep Support Vector Data Description (SVDD) objective function [22].
The proposed methodology uses a recurrent autoencoder that is trained to reconstruct the sequential data x. The encoder and decoder networks, denoted by φ_enc and φ_dec, respectively, are designed with a Long Short-Term Memory (LSTM) architecture. The LSTM structure is selected to effectively capture the relationship between the input vectors and their preceding and subsequent values in the time series. The encoder network compresses the input vectors into a latent space, which is then decompressed back to the original space by the decoder network through the function x̂ = (φ_dec ∘ φ_enc)(x). The model is pre-trained with the reconstruction error

min (1/|D|) Σ_{x∈D} ||x − (φ_dec ∘ φ_enc)(x)||²,

where D is the set of the given training data. In our experiments, we utilized an LSTM autoencoder architecture to effectively summarize the representative information contained within the input data, regardless of varying lengths. The specifications of the architecture used can be found in Table 1.
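To make the pre-training objective concrete, the following is a minimal NumPy sketch of a recurrent encoder-decoder computing the reconstruction error over sequences of varying lengths. It uses a plain (Elman-style) recurrent cell as a simplified stand-in for the paper's LSTM, with untrained random weights; all dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: d = 9 input sensors, h = 16 hidden units.
d, h = 9, 16
W_in = rng.normal(scale=0.1, size=(h, d))   # input-to-hidden weights
W_rec = rng.normal(scale=0.1, size=(h, h))  # hidden-to-hidden weights
W_out = rng.normal(scale=0.1, size=(d, h))  # hidden-to-output weights

def encode(x):
    """Run a simple recurrent encoder over a (T, d) sequence and
    return the final hidden state as a fixed-length summary."""
    s = np.zeros(h)
    for x_t in x:
        s = np.tanh(W_in @ x_t + W_rec @ s)
    return s  # shape (h,) regardless of the sequence length T

def decode(s, T):
    """Unroll a recurrent decoder for T steps from the latent summary s."""
    outputs, state = [], s
    for _ in range(T):
        state = np.tanh(W_rec @ state)
        outputs.append(W_out @ state)
    return np.stack(outputs)  # shape (T, d)

def reconstruction_loss(batch):
    """Mean squared reconstruction error over sequences of varying length,
    mirroring the pre-training objective min (1/|D|) sum_x ||x - x_hat||^2."""
    total = 0.0
    for x in batch:
        x_hat = decode(encode(x), len(x))
        total += np.mean((x - x_hat) ** 2)
    return total / len(batch)

# Sequences of different lengths, as with per-wafer sensor traces.
batch = [rng.normal(size=(T, d)) for T in (20, 35, 50)]
loss = reconstruction_loss(batch)
```

The key property illustrated is that sequences of any length are mapped to a latent vector of the same fixed dimension, which is what allows the varying-length wafer data to be compared in a common space.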
Now, we implement the one-class classification approach using the encoded vectors derived from the pre-trained model. To accomplish this, we train an auxiliary autoencoder model consisting of the encoder ψ_enc and decoder ψ_dec. The networks of this autoencoder are constructed with dense layers, as detailed in Table 2. This specific architecture is essential to our model, as it facilitates an intuitive visualization of anomalies in a low-dimensional space. Upon completion of the pre-training stage, the model is further trained by minimizing the loss function

min (1/|D|) Σ_{x∈D} ||(ψ_enc ∘ φ_enc)(x) − c||² + λ R_enc,

where c is the prescribed center and R_enc is the network weight regularization term. This loss function is influenced by the deep SVDD objective function presented in [22]. It penalizes the deviation of the latent feature of the data from a circle, resulting in the mapping of the majority of normal data inside a circle in the latent space and the mapping of outliers away from the circle. In our experiments, the L2 weight decay was chosen as the regularizer.
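The deep SVDD-style objective can be sketched as follows. This is a simplified NumPy illustration, not the authors' implementation: the center c, the λ coefficient, and all sample values are chosen purely for illustration, and the latent points are generated directly rather than produced by a trained encoder.

```python
import numpy as np

def svdd_loss(z, c, weights, lam=1e-3):
    """Deep SVDD-style objective: mean squared distance of the latent
    points z (shape (n, k)) from the prescribed center c, plus an L2
    weight-decay regularizer over the network weight matrices."""
    dist = np.mean(np.sum((z - c) ** 2, axis=1))
    reg = lam * sum(np.sum(w ** 2) for w in weights)
    return dist + reg

def anomaly_score(z, c):
    """At test time, the distance from the center serves as the anomaly score."""
    return np.sum((z - c) ** 2, axis=1)

rng = np.random.default_rng(1)
c = np.zeros(2)                                  # illustrative center in a 2-D latent space
z_normal = rng.normal(scale=0.1, size=(100, 2))  # stand-in: normal data near the center
z_anom = np.array([[3.0, -2.5]])                 # stand-in: one outlier far from c
weights = [rng.normal(size=(8, 2))]              # stand-in for the dense-layer weights
loss = svdd_loss(z_normal, c, weights)
```

Minimizing this loss pulls the encoded normal data toward c, so at inference time a point's distance from the center directly ranks how anomalous it is, which is what the circular contours in the 2-D scatter plots visualize.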

B. INTERPRETATION OF ABNORMAL SIGNALS VIA RELEVANCE ANALYSIS
Now, we perform an analysis to determine the relevance of the input variables to the encoded feature. This analysis aims to represent the encoded latent feature φ_enc(x) as a summation of relevance scores R_t^(i) of the i-th component of the input variable x at time instance t:

φ_enc(x) = Σ_t Σ_i R_t^(i).

A commonly used technique to calculate the relevance scores is the Layer-Wise Relevance Propagation (LRP) method, as described in [1] and [3]. LRP backpropagates the relevance score for the activation of each layer from the output layer to the input layer in a top-down manner. For instance, the relevance of lower-layer neurons can be propagated from the higher-layer relevance by considering the forward pass z_j = Σ_k z_k w_kj + b_j, with weights w_kj and biases b_j, using the propagation rule

R_k = Σ_j [ (z_k w_kj + (ε·sign(z_j) + δ·b_j)/N) / (z_j + ε·sign(z_j)) ] R_j,

where ε is a positive value related to the stability of the propagation, N is the number of lower-layer neurons connected to neuron j, and δ determines the relevance ratio propagated through the bias term. In this form, LRP can be used for backpropagation through convolutional layers, dense layers, and simple recurrent layers.
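The ε-propagation rule for a single dense layer can be sketched in NumPy as below. This is a minimal illustration with toy dimensions, assuming full redistribution of the bias term (δ = 1), in which case the total relevance is conserved exactly across the layer.

```python
import numpy as np

def lrp_dense(z_lower, W, b, R_upper, eps=0.01, delta=1.0):
    """Epsilon-LRP through one dense layer.
    z_lower: (N,) lower-layer activations, W: (N, M) weights, b: (M,) biases,
    R_upper: (M,) relevances of the upper-layer neurons.
    Returns the (N,) relevances of the lower-layer neurons."""
    z_upper = z_lower @ W + b                 # forward pass: z_j = sum_k z_k w_kj + b_j
    denom = z_upper + eps * np.sign(z_upper)  # eps stabilizes near-zero activations
    N = len(z_lower)
    # Each connection's share of the upper relevance, plus an equal share of
    # the stabilizer and bias (delta scales how much bias relevance flows down).
    numer = z_lower[:, None] * W + (eps * np.sign(z_upper) + delta * b) / N
    return (numer / denom * R_upper).sum(axis=1)

# Toy layer: 4 lower-layer neurons feeding 2 upper-layer neurons.
rng = np.random.default_rng(0)
a = rng.normal(size=(4,))
W = rng.normal(size=(4, 2))
b = rng.normal(size=(2,))
R_upper = np.array([1.0, 0.5])
R_lower = lrp_dense(a, W, b, R_upper)
```

Applying this rule layer by layer, from the latent feature back to the input sequence, yields the per-sensor, per-time-step scores R_t^(i) used to color the heatmaps in the relevance analysis.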

IV. EXPERIMENTS
A. SEMICONDUCTOR MANUFACTURING DATA
In our study, actual semiconductor manufacturing data collected by Samsung Electronics were utilized. The data consisted of multivariate time series measured by nine sensors; however, for confidentiality reasons, only encrypted names such as '84500X', '9QFPT5', and 'HX9CDQ' were provided, and no further details about each sensor were available. The sensor data were collected from five chambers and 15 different processes during manufacturing. The time series data contained a limited number of anomaly signals, the proportion of which is summarized in Table 3 for each chamber. Figure 2 provides a visualization of time series data sampled from Chamber 1. Note that since the processing time varies for each wafer, the resulting data are time series of different lengths, which presents challenges for the application of existing anomaly detection models.

B. EXPERIMENTAL RESULTS
The results of the proposed methodology, applied to semiconductor manufacturing data, are presented in this section.

a: QUALITATIVE RESULTS
In order to visually represent the anomalies in each time series, a scatter plot was generated, as depicted in Figure 3. The scatter plot is overlaid with circular contour lines, with each point representing a time series datum in the low-dimensional latent space, defined as (ψ_enc ∘ φ_enc)(x). The variables z_1 and z_2 indicate the coordinates of the encoded vector in the latent space.
A comparison of the scatter magnitude of each process reveals that the tendency of anomalies showed the greatest difference between processes -1 and 11. For the remaining processes, no significant difference was observed.
The scatter plot of the results for each chamber in Figure 3 reveals that Chambers 2 and 3 appear to be relatively unstable.
This observation is based on the fact that many time series samples are distributed farther away from the circle, although the average performance of these chambers is similar to that of the other chambers. In addition, the anomalies occurring in Chambers 2 and 3 are located in the lower-right corner of the circle, suggesting that they were likely caused by a similar malfunction.
In contrast, the anomaly generated in Chamber 4 was located on the right side of the circle, indicating that the cause along the z_1-axis was shared with Chambers 2 and 3, but the cause along the z_2-axis was not. It is important to note that the unstable appearance of Chambers 2 and 3 could simply be due to the larger number of time series data available for these chambers.

b: QUANTITATIVE RESULTS
The proposed recurrent model enables the calculation of the negative likelihood of each time series signal, which can serve as a measure of the probability of the signal being an anomaly. As shown in Figure 4, a large number of signals with high negative likelihood can be observed in Chambers 2 and 3. This finding is consistent with the visual observations made from the scatter plots in Figure 3, and the same correspondence holds for the other chambers. This result supports the validity of the proposed method and suggests that the two-dimensional projection of the algorithm's results represents the information without substantial loss. Furthermore, it supports the validity of the circular contour lines as an indicator of anomalies and indicates that the proposed recurrent model is capable of accurately capturing the patterns in the time series data and identifying potential anomalies.
The time series samples closest to the center of the circle and those farthest away are presented in Figures 5 and 6, respectively. The background colors in these figures indicate the relevance score, which can take both positive and negative values: a dark yellow background represents a high positive score, whereas a dark blue background represents a high negative score. Comparing the relevance scores in the figures, it can be observed that normal signals tend to have similar relevance scores, whereas abnormal signals exhibit varying patterns for each data point. This relevance analysis provides insight into the factors that contribute to the anomaly score and helps to identify the root cause of the faults. By identifying the times and sensors that contribute most to the anomaly score, the methodology enables a more effective and efficient approach to fault detection and diagnosis. This is crucial in semiconductor manufacturing, as it can lead to improved product quality and increased production efficiency.

V. CONCLUSION
This paper proposes a novel deep learning-based methodology for detecting and interpreting anomalies in the semiconductor manufacturing process. Our approach leverages the strengths of LSTM autoencoders and a specialized anomaly detection objective function to encode time series signals into fixed-length vectors and visualize anomalies in two-dimensional plots.
The visualization of anomalies on a 2D plot enables a comparison of the tendencies of anomalies occurring in different chambers, providing valuable information for field engineers in determining the root cause of the anomalies and taking appropriate action. However, it is important to note that a statistical anomaly does not necessarily correspond to a physical one, and further refinement of the model's determined range of anomalies may be necessary through in-depth discussion. In addition, our proposed relevance analysis identifies the specific times and sensors that contribute most to the anomaly score, providing valuable insights into the manufacturing process and the origins of faulty signals. This information can be leveraged to improve the efficiency and quality of the manufacturing process.
In conclusion, the proposed deep learning-based methodology offers a promising solution for detecting and interpreting anomalies in the semiconductor manufacturing process. Further research is required to develop more explainable and field-applicable deep learning models for failure detection and classification.

FIGURE 1. Illustration of the proposed anomaly detection methodology for semiconductor manufacturing data. The multivariate sequential data with different lengths is summarized into a low-dimensional latent space, wherein anomaly data are mapped away from the center.

FIGURE 2. Illustration of data samples from Chamber 1.

FIGURE 3. Scatter plots of the anomaly data. The first row corresponds to the results for process -1, and the second row to process 11.

FIGURE 4. Plots of the negative likelihood of each time series signal computed using the proposed recurrent autoencoder.

FIGURE 5. Visualization of data samples close to the center of the circle in the latent space.

FIGURE 6. Visualization of data samples farther from the center of the circle in the latent space.

TABLE 1. The model architecture of the autoencoder for the sequential data summarization.

TABLE 2. The model architecture of the autoencoder for the one-class classification.