Machine Learning-Based Anomaly Detection for Multivariate Time Series With Correlation Dependency

Recent advances in data collection facilitate the acquisition of large quantities of multivariate time series (MTS) data from various real-world systems. Anomaly detection in high-dimensional MTS data is essential to improving the productivity and safety of such systems; however, capturing the complex intercorrelations between different pairs of time series related to anomalous patterns is challenging. In this study, two different anomaly detection problems—mean shift and structural change—were defined based on the correlation dependency of MTS. Existing algorithms were experimentally analyzed and compared based on their correlation dependency encoding methods using synthetic datasets, with the results revealing that the explicit encoding of correlation dependency improves the predictive performance of anomaly detection in MTS data.

The associate editor coordinating the review of this manuscript and approving it for publication was Li He.

I. INTRODUCTION
Constructing a machine learning (ML) model for anomaly detection in MTS data is challenging because it requires capturing two dependencies in the data. First, time series data exhibit "temporal dependency," in which observations are interdependent according to temporal order because each data point is affected by the data observed at previous time steps. Traditional anomaly detection methods such as the histogram-based outlier score (HBOS) [17], isolation forest (IF) [18], local outlier factor (LOF) [19], and autoencoders (AEs) [20] assume that observations at different time steps are independent and identically distributed. Therefore, these methods cannot effectively encode temporal dependency. Recently, long short-term memory (LSTM) [21], [22], gated recurrent unit (GRU) [23], [24], and attention-based models [25], [26] have been used to address this problem. Second, the variables in MTS data are intercorrelated under what is referred to as "correlation dependency." Traditional range-based anomaly detection approaches independently monitor each time series to determine whether the observed values lie between the lower and upper bounds and then aggregate the results for a decision. However, complex systems such as manufacturing plants, power grids, and communication networks comprise multiple intercorrelated components, making it necessary to exploit cross-variable dependency to model the system's normal conditions [27].
Two major problems with respect to correlation dependency have been defined for anomaly detection in MTS data. The first problem type involves the "mean shift" that occurs in single or multiple variables because of an unexpected change in the data generation process. A significant mean shift in a single variable can be monitored in a univariate manner. However, if the variables are intercorrelated, variation in one variable transfers to the others along the correlation structure. Therefore, anomalous patterns can be detected earlier if the model captures correlation dependency. The second problem is "structural change," which occurs in the relationship between variables. In MTS data, events within individual variables and variations in relationships between variables affect the distribution of the multivariate observations. Explicit monitoring of the variation in a correlation structure is essential for anomaly detection in a complex system that can exhibit accumulated relationship changes.
Despite its significance, correlation dependency has received less attention than temporal dependency. We have identified two primary reasons for this. First, before recent advances in the Industrial Internet of Things (IIoT), there had been a lack of data containing information on the interconnections between sensors, actuators, and other system instruments. Although machine learning methods have been developed to overcome the problem of data scarcity, large quantities of multivariate data with complex relationships are now being collected from various industrial sites. Second, machine learning methodologies are generally worse at reflecting correlation dependency than they are at reflecting temporal dependency. Recently, we found that graph neural networks (GNNs) [28], [29], [30] can address correlation dependency in MTS through the explicit mapping of structures using graphs. The multi-scale convolutional recurrent encoder-decoder (MSCRED) proposed by Zhang et al. [30] and the Graph Deviation Network (GDN) proposed by Deng and Hooi [28] are representative explicit methods that utilize graph structures to represent correlation dependencies in MTS data.
In this paper, we provide an approach to understanding MTS anomaly detection problems using correlation dependency. We first define two problem types related to correlation dependency and then comprehensively analyze the characteristics and effectiveness of the explicit incorporation of correlation dependency into an anomaly detection model. The contributions of this study are summarized as follows:
• We define the two problem types in MTS anomaly detection related to correlation dependency, i.e., mean shift and structural change, using rigorous definitions and examples.
• We propose a categorization of machine learning-based MTS anomaly detection models according to the correlation dependency encoding method.
• We comprehensively compare the performances and characteristics of the prevalent algorithms for mean shift and structural change at various correlation levels.
The remainder of this paper is organized as follows: Section 2 briefly introduces related works. Section 3 proposes a categorization of MTS anomaly detection algorithms based on correlation dependency by discussing ten relevant algorithms. Section 4 describes our experimental study and results. Section 5 presents our conclusions.

II. RELATED WORKS
Here, we summarize the works related to our research. First, we review the previous work on MTS anomaly detection with correlation dependency. We then introduce the use of the linear Gaussian model as a graphical model for representing multivariate data correlation structures. Finally, we define the two MTS anomaly detection problem types related to correlation dependency, i.e., mean shift and structural change.

A. ANOMALY DETECTION IN MTS WITH CORRELATION DEPENDENCY
The past few decades have seen a growing corpus of research on machine learning methods for anomaly detection in MTS data. As noted in the introduction, correlation dependency has received relatively little attention owing to a lack of data and methodologies. Recently, the big data collected by the IIoT and the rise of advanced machine learning techniques have allowed more attention to be focused on the problem. In particular, graph-based deep learning methods have proven to be capable of effectively encoding the correlation dependencies of MTS data in models. Yu et al. [31] used Graph Convolution Networks (GCNs) to learn spatial-temporal correlations in time series forecasting problems in the traffic domain. Zhao et al. [29] proposed a prediction-based GNN combining a feature-oriented Graph Attention (GAT) layer and a time-oriented GAT layer to capture both spatial and temporal dependencies in MTS. Zhang et al. [30] proposed the Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) for anomaly detection in MTS data. MSCRED encodes correlation dependency using a convolutional encoder and decoder with a signature matrix and incorporates temporal patterns using attention-based Convolutional Long-Short Term Memory (ConvLSTM) networks. It constructs the signature matrices to characterize multiple levels of system status across time steps, which are used to indicate the severity of different abnormal incidents. Deng and Hooi proposed the Graph Deviation Network (GDN) [28], a novel attention-based GNN approach that learns graphs of the correlation dependencies between variables and identifies and explains deviations from these relationships. Both MSCRED and GDN have demonstrated excellent performance in real-world applications of MTS anomaly detection using graph-based explicit correlation dependency encoding.

B. LINEAR GAUSSIAN MODEL
Multivariate time series in real-world systems often exhibit complex correlation structures among multiple temporal sequences [32]. Various metrics can be used to analyze the correlation between any two time series by measuring the degree to which one series evolves relative to another. The Pearson correlation coefficient is a typical measure of the linear dependency between two time series $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$ and is defined as follows:
$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the mean values of $X$ and $Y$, respectively. However, there are limits to the effectiveness of representing the overall correlation structure of MTS data using this type of correlation measure alone. The correlation structure of MTS can also be modeled using a graphical model that provides a graph-based representation of the relations between the variables. A Bayesian network (BN) is a probabilistic graphical model that represents a set of conditional probability distributions of different variables using a directed acyclic graph. In particular, such graphical models represent the correlation structure of multiple variables and describe their direct influence on each other in specific directions [33]. A linear Gaussian model (LGM) is a special type of BN that can model relationships between multiple continuous variables using conditional linear Gaussian distributions. One significant advantage of LGMs is that they can model the causal relationships between variables once the graph structure is determined, enabling the modeling of real-world processes that generate MTS data [34], [35]. Figure 1 illustrates an LGM for MTS data. Each node in the graph denotes a random variable; each directed edge connecting a parent and child node represents the causal effect between the two variables. In Figure 1, the time series for $Y$ is generated by the conditional linear Gaussian distribution given $X_1$ and $X_2$ as follows:
$$Y \mid X_1, X_2 \sim \mathcal{N}(\beta_1 X_1 + \beta_2 X_2, \sigma^2)$$
where $\sigma^2$ denotes the variance of the random noise and each linear coefficient $\beta$ represents the effect of the parent node on the child node.
The variables in the root nodes without any parents, such as $X_1$ and $X_2$, are independently generated by the corresponding univariate Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$. Thus, a directed graph generated using an LGM can represent the correlation structure of multiple variables via conditional distributions.
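As an illustration of the LGM in Figure 1, the following minimal sketch samples two root nodes and one child node and then computes the Pearson correlation coefficient from its definition. The function names, the coefficient values, the absence of an intercept term, and the random seed are assumptions made for this example and are not taken from the original study.

```python
import numpy as np

def sample_lgm(n_steps, beta1, beta2, sigma=1.0, seed=0):
    """Sample X1, X2 ~ N(0, 1) and Y = beta1*X1 + beta2*X2 + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, n_steps)       # root node
    x2 = rng.normal(0.0, 1.0, n_steps)       # root node
    noise = rng.normal(0.0, sigma, n_steps)  # random noise of the child node
    y = beta1 * x1 + beta2 * x2 + noise      # child node
    return x1, x2, y

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Illustrative coefficients: X1 influences Y more strongly than X2 does.
x1, x2, y = sample_lgm(n_steps=5000, beta1=2.0, beta2=0.5)
r_x1_y = pearson(x1, y)
r_x2_y = pearson(x2, y)
r_x1_x2 = pearson(x1, x2)
```

With these illustrative coefficients, the parent with the larger coefficient is more strongly correlated with the child, whereas the two independently generated root nodes are nearly uncorrelated.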

C. TYPES OF ANOMALIES IN MTS WITH CORRELATION
In this paper, we define two types of MTS anomaly detection problems, mean shift and structural change, in terms of correlation dependency.
LGMs are used to demonstrate the defined problem types and to generate synthetic datasets.

1) MEAN SHIFT ANOMALY
Time series data in a steady state are assumed to be generated from a normal distribution with a constant mean and variance. Abnormalities owing to mechanical failure, sensor malfunction, human error, or intrusion attacks can significantly shift the mean value of the distribution. We define this problem as a "mean shift" as follows:
Definition 1: Mean shift anomaly. A mean shift anomaly is an anomalous pattern in the mean value of single or multiple variables caused by an unexpected change in the data generation process.
Because a mean shift that occurs in one or multiple variables propagates to the correlated variables, understanding the correlation structure can help detect such anomalous patterns more quickly and accurately. Figure 2 demonstrates the effect of one variable's mean shift on other variables according to the correlation structure. Assume that a significant mean shift has occurred in $X_2$. This change propagates directly to $X_4$, which is a child node of $X_2$, and affects $X_6$ through the relationship between $X_4$ and $X_6$.

2) STRUCTURAL CHANGE ANOMALY
A multivariate joint probability distribution is determined by the distribution of each variable and the conditional probability between different variables. Whereas the mean shift relates to the variation of individual variables, an anomaly pattern occurring in a multivariate system can also reflect changes in the conditional distributions. We define this problem as "structural change" as follows:
Definition 2: Structural change anomaly. A structural change anomaly is an anomalous pattern caused by a significant change in the relationship between variables rather than in the variables themselves.
For instance, a drilling machine can produce or enlarge holes in a solid material using drills. Input parameters such as cutting speed, feed, and drill diameter determine the process output. However, as the drill bit gradually wears out, output values such as the diameters or depths of holes vary despite identical input parameter settings. If the magnitude of a structural change increases or its effect accumulates over time, anomalies can arise in the multivariate system. Figure 3 illustrates these structural changes using an LGM. As the coefficients $\beta_1$, $\beta_3$, and $\beta_6$ change to $\beta'_1$, $\beta'_3$, and $\beta'_6$, respectively, the distributions of $X_4$ and $X_6$ vary as a result of the structural changes.

III. METHODS
Here, the existing anomaly detection methods are categorized to compare their characteristics and performance for MTS data with correlation dependency. We first divide the methods into univariate and multivariate methods according to whether the method processes each variable individually or takes multiple variables as joint input. Because a univariate method independently processes individual time series, the correlation dependency cannot be reflected in the model. By contrast, a multivariate method simultaneously inputs multiple time series. Multivariate methods are further categorized into implicit and explicit methods in terms of how they encode correlation dependency. An implicit method takes multiple variables as input without any explicit encoding of the relationships between them, i.e., it assumes that the input variables are independent. By contrast, an explicit method models the correlation structure directly using a representation such as a correlation matrix or graphical model. The characteristics of the methods we assessed are summarized in Table 1.

A. UNIVARIATE METHODS
1) HISTOGRAM-BASED OUTLIER SCORE (HBOS)
HBOS [17] measures the multivariate abnormality of a data point in terms of individual variables by summing the univariate anomaly scores. It constructs a histogram for each variable in which the inverse height indicates the outlier score of a variable's data point. Because histograms are easy to construct, HBOS is a computationally efficient unsupervised anomaly detection method.
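The histogram-based scoring described above can be sketched as follows. This is a simplified illustration rather than the reference HBOS implementation: it uses equal-width density histograms, sums the log of the inverse bin height over the variables, and assigns a small constant height `eps` to values outside the training range. The function names, the number of bins, and `eps` are assumptions made for this example.

```python
import numpy as np

def hbos_fit(train, n_bins=10):
    """Build one density histogram per column of `train` (rows are time steps)."""
    models = []
    for j in range(train.shape[1]):
        density, edges = np.histogram(train[:, j], bins=n_bins, density=True)
        models.append((density, edges))
    return models

def hbos_score(models, data, eps=1e-12):
    """Sum over variables of log(1 / bin height); larger means more anomalous."""
    scores = np.zeros(data.shape[0])
    for j, (density, edges) in enumerate(models):
        # Bin index of each value; -1 or len(density) means outside the range.
        idx = np.searchsorted(edges, data[:, j], side="right") - 1
        idx[data[:, j] == edges[-1]] = len(density) - 1  # right edge is inclusive
        inside = (idx >= 0) & (idx < len(density))
        height = np.full(data.shape[0], eps)
        height[inside] = np.maximum(density[idx[inside]], eps)
        scores += np.log(1.0 / height)
    return scores

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 3))
models = hbos_fit(train)
typical = hbos_score(models, np.zeros((1, 3)))[0]      # point near the mode
extreme = hbos_score(models, np.full((1, 3), 8.0))[0]  # point far outside the range
```

Because each variable contributes its own term to the sum, the score ignores how the variables move together, which is why HBOS is treated here as a univariate method.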

2) INTERQUARTILE RANGE (IQR)
IQR [36] is a range-based anomaly detection method in which a range is defined using upper and lower bounds and three quartiles to detect out-of-range anomalies in a single time series.
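A minimal sketch of range-based detection with the IQR is given below. The multiplier `k = 1.5` (the common Tukey fence) and the function names are assumptions made for this example; the source does not state which multiplier was used.

```python
import numpy as np

def iqr_bounds(train, k=1.5):
    """Lower and upper bounds computed from the first and third quartiles."""
    q1, q3 = np.percentile(train, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_flags(series, bounds):
    """True where a value lies outside the [lower, upper] range."""
    lower, upper = bounds
    return (series < lower) | (series > upper)

train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bounds = iqr_bounds(train)  # q1 = 2, q3 = 4, IQR = 2, so the range is [-1, 7]
flags = iqr_flags(np.array([3.0, 7.5, -1.5, 0.0]), bounds)
```

In a multivariate setting, such a detector would be applied to each time series separately and the per-variable flags aggregated afterwards.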

B. MULTIVARIATE METHODS
1) IMPLICIT METHODS
a: ISOLATION FOREST (IF)
IF [18] randomly constructs multiple classification trees that isolate each training observation from the others. In recursive partitioning using trees, the path length from the root to the leaf node measures the observation's abnormality because a forest of random trees collectively produces shorter path lengths for anomalies.
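The path-length idea can be illustrated with the simplified sketch below, which follows a single query point through random axis-aligned splits instead of building and storing full trees. It omits the path-length adjustment for unresolved leaves and the score normalization of the original algorithm; the function names, tree count, subsample size, and depth limit are assumptions made for this example.

```python
import numpy as np

def path_length(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` is separated from `data` by random splits."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    j = rng.integers(data.shape[1])           # random split variable
    lo, hi = data[:, j].min(), data[:, j].max()
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)               # random split value
    # Keep only the observations on the same side of the split as the point.
    if point[j] < split:
        same_side = data[:, j] < split
    else:
        same_side = data[:, j] >= split
    return path_length(point, data[same_side], rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=100, sample_size=128, seed=0):
    """Average path length of `point` over random subsamples of `data`."""
    rng = np.random.default_rng(seed)
    size = min(sample_size, len(data))
    lengths = []
    for _ in range(n_trees):
        sample = data[rng.choice(len(data), size=size, replace=False)]
        lengths.append(path_length(point, sample, rng))
    return float(np.mean(lengths))

rng = np.random.default_rng(1)
normal_data = rng.normal(size=(500, 2))
typical_len = mean_path_length(np.array([0.0, 0.0]), normal_data)
outlier_len = mean_path_length(np.array([8.0, 8.0]), normal_data)
```

A point far from the bulk of the data is isolated after fewer splits on average, so its mean path length is shorter.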

b: LOCAL OUTLIER FACTOR (LOF)
LOF [19] is a distance-based anomaly detection method that uses local densities in multivariate data to detect anomalies. The k-nearest neighbors for each data point are evaluated to calculate the local densities of all data points, or local reachability densities (LRDs). The anomaly score for each data point is then estimated by comparing its LRD values with those of its successive k-nearest neighbors.
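The LRD comparison described above can be sketched with a brute-force implementation. It computes all pairwise Euclidean distances, so it is intended only for small datasets, and it does not handle ties in the k-th neighbor distance. The function name and the choice `k = 5` are assumptions made for this example.

```python
import numpy as np

def lof_scores(data, k=5):
    """Local outlier factor of every row; values well above 1 suggest outliers."""
    n = data.shape[0]
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest neighbors
    rows = np.arange(n)[:, None]
    knn_dist = dist[rows, knn]               # distances to those neighbors
    k_dist = knn_dist[:, -1]                 # distance to the k-th neighbor
    # Reachability distance from p to neighbor o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[knn], knn_dist)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    # LOF: mean LRD of the neighbors divided by the point's own LRD.
    return lrd[knn].mean(axis=1) / lrd

rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 2))
data = np.vstack([cluster, [[10.0, 10.0]]])  # one isolated point at index 50
scores = lof_scores(data, k=5)
```

The isolated point has a much lower local density than its nearest neighbors, which yields the largest score in this example.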

c: AUTOENCODER (AE)
AE [20] is an artificial neural network that learns a compressed representation of input data through reconstruction. Because an AE model trained on a normal dataset is assumed to fail to reconstruct unseen anomalies, the anomaly score is estimated using the reconstruction error; this approach is known as reconstruction-based anomaly detection.
enGAN [39] models an ensemble of generative adversarial networks (GANs) in which multiple generators and discriminators are randomly paired and trained via adversarial training. Each generator obtains feedback from multiple discriminators, each of which is fed training samples from multiple generators. The anomaly score is computed as the average of the anomaly scores from all the generator-discriminator pairs. The authors found that enGAN can better model the distribution of normal data, which allows it to outperform a single GAN in detecting anomalies.
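The reconstruction-based scoring described for AE can be illustrated with a linear stand-in. The sketch below is not a neural autoencoder: it uses principal component projection, computed with a singular value decomposition, as a linear encoder and decoder, which is enough to show how the reconstruction error serves as an anomaly score. The function names, the toy data, and the single-component bottleneck are assumptions made for this example.

```python
import numpy as np

def fit_linear_autoencoder(train, n_components):
    """Learn a linear encoder/decoder from the top principal directions."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(model, data):
    """Squared error between each row and its encode-decode reconstruction."""
    mean, components = model
    code = (data - mean) @ components.T  # encode into the bottleneck
    recon = code @ components + mean     # decode back to the input space
    return np.sum((data - recon) ** 2, axis=1)

# Toy normal data: the second variable is roughly twice the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
train = np.column_stack([x1, 2.0 * x1 + 0.1 * rng.normal(size=2000)])
model = fit_linear_autoencoder(train, n_components=1)

err_normal = reconstruction_error(model, np.array([[1.0, 2.0]]))[0]
err_anomalous = reconstruction_error(model, np.array([[1.0, -2.0]]))[0]
```

The second test point lies within the normal range of each variable individually but violates their usual relationship, so only its reconstruction error is large.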

2) EXPLICIT METHODS
a: MULTI-SCALE CONVOLUTIONAL RECURRENT ENCODER-DECODER (MSCRED)
MSCRED [30] constructs multi-scale signature matrices to characterize multiple levels of system status at different time steps. A convolutional encoder incorporates the correlations between variables using the provided signature matrices. Convolutional LSTM (ConvLSTM) is employed to capture temporal dependency. Based on the feature maps, a convolutional decoder reconstructs the input signature matrices, and the residual measures the anomaly score.
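The construction of multi-scale signature matrices can be sketched as follows: for each window length, the pairwise inner products of the most recent observations of every pair of variables are computed and rescaled by the window length. The window lengths, the exact windowing convention, and the function names are assumptions made for this example; only the matrices are built here, not the convolutional encoder-decoder that consumes them.

```python
import numpy as np

def signature_matrix(series, t, window):
    """Pairwise inner products over the `window` steps ending at index t."""
    segment = series[t - window + 1 : t + 1]  # shape (window, n_vars)
    return segment.T @ segment / window       # shape (n_vars, n_vars)

def multi_scale_signatures(series, t, windows=(10, 30, 60)):
    """Stack one signature matrix per window length (one scale per channel)."""
    return np.stack([signature_matrix(series, t, w) for w in windows])

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 4))            # 100 time steps, 4 variables
sigs = multi_scale_signatures(series, t=99)   # shape (3, 4, 4)
```

Each matrix is symmetric, and a change in how two variables move together alters the corresponding off-diagonal entries, which is what the reconstruction residual is meant to expose.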

b: GRAPH DEVIATION NETWORK (GDN)
GDN [28] represents cross-variable correlation dependency using a graphical model. A GDN model learns the graph structure of variables and predicts a variable's behavior using an attention function over its neighbors in the graph. Furthermore, graph deviation scoring identifies anomalies that deviate from the learned relationships in the graph.

IV. EXPERIMENTS
This section describes our experimental assessments and discusses the results.

A. DATASETS
The synthetic datasets for mean shift and structural change were generated using an LGM, as shown in Figure 4. The LGM comprised 10 variables, namely, $X_1$ to $X_{10}$. The root node variables $X_1$, $X_2$, $X_3$, and $X_5$ were independently generated from $\mathcal{N}(0, 1)$. Given the parent variable values, the child node variables were generated from the corresponding conditional linear Gaussian distribution with the linear regression coefficients $\beta$ and unit variance.

1) MEAN SHIFT
A mean shift, which significantly shifts the mean value of $X$ by $\delta$ from the starting time point $s$ to the end time point $e$, is formulated as
$$X_t \sim \begin{cases} \mathcal{N}(\mu + \delta, \sigma^2), & s \le t \le e \\ \mathcal{N}(\mu, \sigma^2), & \text{otherwise} \end{cases}$$
where $t = 1, 2, \dots, T$. In our experiments, we generated 10,000 observations (i.e., $T = 10{,}000$). We assumed that a mean shift occurred at $s = 6{,}000$, and the mean value of $X_1$ changed from the initial value $\mu = 1$ to $\mu + \delta$ until $e = 8{,}000$. Thereafter, the mean value returned to the initial value, $\mu = 1$. The results were compared at various mean shift levels, with $\delta = 1$, 5, and 10. The mean shift in $X_1$ propagated to the other variables along the edges of the LGM, with each edge representing the correlation between the parent and child nodes via the linear regression coefficient $\beta$. The effect of the mean shift on the occurrence of anomalies in the entire multivariate system depended on the magnitude of the correlation between variables. For comparison, three different correlation levels (low, moderate, and high) were set as follows:
• Low correlation: $\beta = 0.1$
• Moderate correlation: $\beta = 2$
• High correlation: $\beta = 5$
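The mean shift setup described above can be sketched for a single parent and child pair. The start and end points, the initial mean, and the moderate correlation level follow the text; the shift magnitude of 5, the function name `shifted_mean`, the two-variable graph, and the random seed are assumptions made for this example and do not reproduce the full 10-variable LGM of Figure 4.

```python
import numpy as np

def shifted_mean(n_steps, start, end, mu, shift):
    """Mean at t = 1..n_steps: mu + shift for start <= t <= end, otherwise mu."""
    t = np.arange(1, n_steps + 1)
    return np.where((t >= start) & (t <= end), mu + shift, mu)

rng = np.random.default_rng(0)
mean_x1 = shifted_mean(10_000, start=6_000, end=8_000, mu=1.0, shift=5.0)
x1 = rng.normal(mean_x1, 1.0)  # parent series with the injected mean shift

beta = 2.0                     # moderate correlation level
child = beta * x1 + rng.normal(0.0, 1.0, x1.size)  # child with unit-variance noise

in_shift = mean_x1 > 1.0       # time steps inside the shifted interval
child_jump = child[in_shift].mean() - child[~in_shift].mean()
```

The mean of the child moves by roughly `beta * shift`, so a larger coefficient makes the propagated shift easier to observe in downstream variables.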

2) STRUCTURAL CHANGE
A structural change occurs in the causal relationship of one or more pairs of variables. An LGM represents the structural change between two variables by applying a different coefficient value $\beta'$ in place of the initial coefficient $\beta$ after an event.
A structural change increases the value of $\beta$ by $\epsilon$ per unit time after the event time $s$ as follows:
$$\beta'_t = \beta + \epsilon\,(t - s), \quad s < t \le e$$
where $t = 1, 2, \dots, T$ indexes the time steps. In the initial state until $s = 6{,}000$, data were generated by the LGM in Figure 4 with $\beta_1$, $\beta_3$, and $\beta_6$ set to 2, 3, and 4, respectively, and the other coefficients set to 1. It was assumed that structural changes occurred in $\beta_1$, $\beta_3$, and $\beta_6$ from $s = 6{,}000$ to $e = 8{,}000$ as the three coefficient values increased by $\epsilon = 0.01$ at each time step.
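A gradually increasing coefficient of this kind can be sketched for one parent and child pair as follows. The initial coefficient, the change interval, and the per-step increase follow the text; holding the coefficient at its final value after the end point is an assumption, as are the function name `drifting_coefficient`, the two-variable graph, and the random seed.

```python
import numpy as np

def drifting_coefficient(n_steps, beta0, start, end, rate):
    """Coefficient at t = 1..n_steps that grows by `rate` per step after `start`."""
    t = np.arange(1, n_steps + 1)
    # Steps elapsed since `start`, held constant after `end` (an assumption).
    elapsed = np.clip(t - start, 0, end - start)
    return beta0 + rate * elapsed

rng = np.random.default_rng(0)
beta_t = drifting_coefficient(10_000, beta0=2.0, start=6_000, end=8_000, rate=0.01)
parent = rng.normal(0.0, 1.0, beta_t.size)
child = beta_t * parent + rng.normal(0.0, 1.0, beta_t.size)

# Least-squares slope of child on parent before the change and near its end.
slope_before = np.polyfit(parent[:6_000], child[:6_000], 1)[0]
slope_late = np.polyfit(parent[7_900:8_000], child[7_900:8_000], 1)[0]
```

Because the parent has zero mean in this sketch, the mean of the child stays near zero throughout; the change shows up in the relationship between the two variables rather than in the level of either one.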

B. EVALUATION
Here, we evaluate the anomaly detection methods using synthetic datasets generated for mean shift and structural change.
The datasets were standardized to zero mean and unit variance and then divided into training and test data in a 60:40 ratio (i.e., $t$ = 1 to 6,000 for training and $t$ = 6,001 to 10,000 for testing). Out of the training data, 20% ($t$ = 4,801 to 6,000) were used as validation data to determine the anomaly score threshold. We note that all of the anomaly detection methods used in the experiments require normal data alone for training because it was infeasible to obtain a sufficient quantity of labeled abnormal data from the industrial sites. The standard evaluation metrics (accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC)) were adopted for the evaluation of anomaly detection performance. All of the methods were implemented using Python 3.7. HBOS, IQR, IF, and LOF were trained using Scikit-learn [40], and AE, LSTM-AE, DAGMM, enGAN, MSCRED, and GDN were implemented using the TensorFlow [41] and Keras [42] frameworks. All experiments were conducted using an Intel® Core™ i5-10400 CPU @ 2.90 GHz, 24 GB RAM, and Windows 10 (64-bit).
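The data split and the threshold-based evaluation can be sketched as follows. The split boundaries follow the text; fitting on the first 4,800 steps only, choosing the threshold as a high quantile of the validation scores, and the function names are assumptions made for this example, since the source does not state how the threshold was derived from the validation data.

```python
import numpy as np

def split_slices(n_steps=10_000, train_frac=0.6, val_frac=0.2):
    """Index slices (0-based) for fitting, validation, and test segments."""
    n_train = round(n_steps * train_frac)
    n_val = round(n_train * val_frac)
    fit = slice(0, n_train - n_val)
    val = slice(n_train - n_val, n_train)
    test = slice(n_train, n_steps)
    return fit, val, test

def threshold_from_validation(val_scores, quantile=0.99):
    """Assumed rule: flag scores above a high quantile of validation scores."""
    return float(np.quantile(val_scores, quantile))

def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary anomaly labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / y_true.size, "precision": precision,
            "recall": recall, "f1": f1}

fit, val, test = split_slices()
metrics = detection_metrics([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
```

In use, anomaly scores on the `test` slice would be compared with the threshold obtained from the `val` slice to produce the predicted labels passed to `detection_metrics`.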

1) MEAN SHIFT DETECTION
Tables 2-4 present the anomaly detection performances of the respective methods at various levels of correlation and mean shift. The best performance for each case is highlighted in bold. Regardless of the level of correlation and mean shift, the explicit methods performed best and the univariate methods performed worst because the latter assume that all variables are independent and therefore cannot effectively encode correlation dependency. However, LSTM-AE performed well relative to the other implicit methods at all correlation levels and performed particularly well on highly correlated data. enGAN was also competitive compared to the other implicit models. As the correlation level increased, the explicit methods performed better than the implicit methods, as shown in Figure 5. MSCRED encodes the correlation dependency for normal data in signature matrices, allowing it to better identify anomalous variables as these signature matrices change. GDN explicitly learns correlation dependency using a graph, allowing it to accurately detect anomalies in MTS data with correlation. The results indicate that explicit multivariate approaches outperform other methods in the mean shift detection of MTS data with correlation dependency.

2) STRUCTURAL CHANGE DETECTION
enGAN performed best. Although it lacks the explicit encoding of multivariate correlation dependency, LSTM-AE performed well by reflecting the temporality of data. DAGMM uses GMM to achieve better parameter learning of deep autoencoder models, allowing it to optimize network parameters and provide a robust threshold between normal and structural change anomalies. AE performed poorly in structural change detection relative to mean shift detection.

V. CONCLUSION
Understanding the correlation dependency between variables is essential for anomaly detection in MTS data. In this study, mean shift and structural change, which are MTS anomaly detection problems related to correlation dependency, were defined using rigorous definitions and examples. The existing anomaly detection methods were categorized based on the techniques used to incorporate correlation dependency, compared, and analyzed on a synthetic dataset generated using the LGM. The explicit models, GDN and MSCRED, outperformed the implicit and univariate methods in the detection of mean shifts and structural changes. From the results, the following insights were obtained. First, the explicit incorporation of correlation dependency was found to improve mean-shift detection performance, a result that became more evident as the correlation between variables increased. Second, explicit models were able to effectively detect structural changes. Additionally, the encoding of temporal dependency improved the structural change detection performance, as indicated by the competitive results of LSTM-AE. Our experimental results also revealed the limitations of the implicit methods in capturing correlation dependency. We hope the reported results will motivate future research work to address this problem, which is highly relevant to real-world applications.