Segmentation of Multivariate Industrial Time Series Data Based on Dynamic Latent Variable Predictability

Time series segmentation is an important vehicle of data mining and is extensively applied in machine learning and anomaly detection. In real-world tasks, dynamics are ubiquitous in time series but have received little attention. This paper proposes an algorithm that partitions a multivariate time series into subsequences at the points where its dynamics change. In process industries, a change of dynamics often relates to an operation regime change, a working condition shift or a fault. Segmenting time series according to changes of dynamics is therefore useful both for data preprocessing and for gaining deeper insight into the process in various industrial process monitoring tasks. The proposed algorithm recursively merges neighboring subsequences through a heuristic bottom-up procedure. The cost of merging is defined as the mutual predictability of the subsequence models, which are constructed using the dynamic-inner principal component analysis (DiPCA) algorithm. The goal then becomes finding the segmentation scheme that minimizes the averaged dynamic prediction error of the subsequence models. The proposed algorithm is evaluated on both simulated data and time series data collected from an industrial processing plant. The results show that it outperforms static principal component analysis based methods.


I. INTRODUCTION
There has been a growing recognition of the value of time series data because many applications, such as anomaly detection [1], [2], working condition perception [3], [4] and soft measurement [5]-[7], are being enabled by the rapid development of data-driven algorithms. Time series segmentation partitions a given time series into subsequences that are internally homogeneous [8]. After segmentation, each subsequence is expected to be interpreted by a relatively simple model, reducing the representational complexity of the data. The problem of time series segmentation can be considered an unsupervised clustering problem with the constraint that the data in each cluster are successive and have an immutable order [9]. Most basic segmentation approaches are rooted in the idea of piecewise linear approximation (PLA) [10]. The problem of finding the best segmentation can then be formulated as finding the segmentation scheme that minimizes the overall or averaged approximation error. The implementation of a PLA model is straightforward, e.g. by using linear regression and interpolation for univariate time series data [10]. (The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.)
In many real world tasks, time series segmentation serves as a data pre-treatment tool for representation reduction [11]- [15], change point detection [4], [15]- [18], feature extraction [17], [19] and more. Especially, accurate segmentation of the time series is often the key for the success of unsupervised or semi-supervised time series classification tasks [20].
In the process industry, a processing plant constantly generates a huge volume of time series data from sensors, assay reports, operators, control systems, surveillance systems and more. Time series segmentation can be applied to time-resolved measurements to identify major changes in the ongoing process. Changes in the dynamics are usually associated with major events contributing to the behavior change of the system, such as an operation regime change, a working condition shift or a fault [21]. Detecting those changes is an important task for process control, optimization and safety assurance. Industrial time series data commonly include many variables, e.g. flow rates, temperatures, pressures and compositions, and they can be highly correlated. For these situations, multivariate time series segmentation can offer significant advantages over univariate methods. Various approaches have been developed for the multivariate industrial time series segmentation problem. A natural extension of the PLA approach is to use principal component analysis (PCA) to linearly represent the subsequence data. The Hotelling T^2 statistic and the reconstruction error are then employed as measures of the approximation error in finding the optimal segmentation [22], [23]. A further approach extends the PCA based approach to a probabilistic framework [9], [24].
However, industrial time series data typically show both cross-correlation and autocorrelation, and the structure of both is likely to change due to uncontrollable disturbances or switching of the operation regime. The PCA based approaches are often applicable where there is little autocorrelation in the data [25], but they are not suitable for detecting changes in the temporal correlation. For the problem of segmenting dynamic time series data, Dobos and Abonyi [3] proposed an algorithm based on dynamic PCA (DPCA) [26] to support the detection of operation regime changes. DPCA uses an augmented data matrix that includes time-lagged process variables to extract the time-dependent relationships from data. Nonetheless, the DPCA model still relies on PCA, and the structure extracted from the augmented data matrix is determined by variance rather than by the underlying temporal relationships. It is therefore difficult for a DPCA model to differentiate the types of relations among the measured variables, because autocorrelation and cross-correlation are mixed in the covariance matrix.
In view of this limitation of DPCA, Li et al. [27] proposed a dynamic latent variable (DLV) algorithm in which a vector autoregressive (VAR) model is constructed for the latent variables extracted by auto-regressive PCA to represent the dynamic relationships. Recently, Dong and Qin [28] proposed a dynamic-inner PCA (DiPCA) algorithm that explicitly extracts the latent variables capturing the most dynamic variations in the data. This makes it possible to extract an embedded model of the dynamic latent variables that are best predicted from their past values.
Inspired by the idea of DiPCA, this paper proposes a new multivariate time series segmentation algorithm. The goal is to partition time series data into piecewise stationary subsequences using dynamic predictability as the criterion. Assuming that the time series data are autocorrelated and corrupted by white noise, the key idea of this work is to represent the time series data using a dynamic latent variable (DLV) model as the inner model and a PCA mapping model as the outer model. A global objective function of dynamic prediction accuracy is constructed by choosing the latent variables that are most relevant to the dynamic autocorrelation. The best segmentation scheme is then obtained by maximizing the global objective, which corresponds to strong dynamic autocorrelation within each subsequence. The main idea is shown in Figure 1.
The remainder of this paper is structured as follows. Section 2 formally defines the segmentation problem and introduces the DiPCA algorithm and the proposed cost function. In Section 3, the parameters of the proposed algorithm are discussed using a synthetic dataset. Section 4 presents the test results on both simulated and real-world industrial datasets. Section 5 concludes this work.

II. THE MULTIVARIATE TIME SERIES SEGMENTATION ALGORITHM

A. DESCRIPTION OF PROBLEM
Consider a multivariate time series T = {x_k ∈ R^m | 1 ≤ k ≤ N}. The goal of segmentation is to find a set of points dividing T into K subsequences that are piecewise homogeneous. Let S_i(a_i, b_i) = {x_{a_i}, x_{a_i+1}, ..., x_{b_i}} be a non-overlapping subsequence of T, where a_i and b_i are partitioning points, 1 = a_1 ≤ a_i < b_i ≤ b_K = N, b_i + 1 = a_{i+1}, and K is the number of subsequences. A segmentation scheme is defined as

S = {S_1(a_1, b_1), S_2(a_2, b_2), ..., S_K(a_K, b_K)},

which represents a partition of T into K non-overlapping subsequences.
In order to find the best segmentation scheme, a measure of the quality of a segmentation scheme is needed first. Denote cost(S_i(a_i, b_i)) as the cost function for a given subsequence S_i(a_i, b_i). The formalization of the cost function depends on the characteristics of the time series data. Most often it is defined based on the distance between the original subsequence and the fitted model. For example, the PCA similarity factor was used in [29]-[31], and dynamic time warping (DTW) is used in [32]. If the probability distribution of the data is known, the data in a subsequence can be treated as samples and the likelihood is used as the cost function [18]. Based on the cost function, the global objective function of a given segmentation scheme is

G = (1/K) Σ_{i=1}^{K} cost(S_i(a_i, b_i)),   (1)

where K is the number of subsequences. The segmentation problem can then be considered as an optimization problem searching for partitioning points a_i and b_i to optimize G, subject to the constraint b_i + 1 = a_{i+1}.
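As a concrete sketch of this formulation, the snippet below scores a candidate segmentation by averaging a per-subsequence cost, as in Eq. (1). The function names and the toy variance-based cost are illustrative stand-ins, not the paper's DiPCA-based cost:

```python
import numpy as np

def global_objective(T, boundaries, cost):
    """Average the per-subsequence cost over a segmentation scheme.

    T          : (N, m) multivariate time series.
    boundaries : sorted split indices [b_1, ..., b_{K-1}];
                 subsequence i spans rows [a_i, b_i).
    cost       : callable mapping a subsequence (n_i, m) to a scalar cost.
    """
    edges = [0] + list(boundaries) + [len(T)]
    costs = [cost(T[a:b]) for a, b in zip(edges[:-1], edges[1:])]
    return float(np.mean(costs))

# Toy cost for illustration only: scatter around the subsequence mean.
def variance_cost(S):
    return float(np.mean((S - S.mean(axis=0)) ** 2))
```

With a piecewise-constant toy series, the objective is minimized when the boundary coincides with the true change point.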

B. DYNAMIC LATENT VARIABLE MODEL
Let T represent a multivariate time series composed of sensory data collected from a dynamic industrial process. For an ideal segmentation scheme, each subsequence can be assumed stationary or weakly stationary. Under this assumption, a subsequence in T can be described using a linear generative model based on a vector auto-regressive (VAR) model. Since noise inevitably exists in industrial process data, the model can be expressed using dynamic latent variables [27]:

t_k = A_0 + A_1 t_{k-1} + A_2 t_{k-2} + ... + A_s t_{k-s} + v_k,   (2)

where t_k ∈ R^l is the DLV at time k, A_0 is a constant vector, A_1, A_2, ..., A_s ∈ R^{l×l} are parameter matrices, v_k ∈ R^l is the new information and s is the number of time lags. Then x_k in T is generated by

x_k = C t_k + e_k,   (3)

where C ∈ R^{m×l} is the output coefficient matrix and e_k ∈ R^m is white noise.
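The generative model of Eqs. (2) and (3) can be simulated directly. The sketch below assumes illustrative dimensions and coefficient matrices; none of the numeric values come from the paper:

```python
import numpy as np

# Minimal simulation of the DLV generative model: a VAR(s) latent process
# t_k driven by i.i.d. noise v_k, mapped to observations x_k = C t_k + e_k.
rng = np.random.default_rng(0)
l, m, s, N = 2, 5, 2, 500            # latent dim, observed dim, lag order, length
A = [np.array([[0.5, 0.1], [0.0, 0.4]]),   # A_1 (illustrative, stable)
     np.array([[0.2, 0.0], [0.1, 0.3]])]   # A_2
A0 = np.zeros(l)                     # constant term
C = rng.standard_normal((m, l))      # output coefficient matrix

t = np.zeros((N, l))
for k in range(s, N):
    t[k] = A0 + sum(A[j] @ t[k - 1 - j] for j in range(s)) \
           + 0.1 * rng.standard_normal(l)        # v_k
x = t @ C.T + 0.01 * rng.standard_normal((N, m))  # x_k = C t_k + e_k
```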
The key idea of this work is to synthesize a cost function based on the prediction accuracy of the dynamic model trained on the subsequence data. Assuming that the new information v_k is independent and identically distributed (i.i.d.) for each k, a change of dynamic structure can be detected through the error of predicting t_k from its past values. This requires extracting the DLVs according to their autocorrelation. However, the DPCA algorithms cannot extract DLVs when static relationships dominate the variance. In this paper, a recently developed tool, DiPCA, introduced in [28], is employed. With DiPCA, the most predictable DLV, t^j, can be expressed in the weighted form

t^j = X w^j,   (4)

where the superscript denotes the jth principal DLV, and each element t^j_k is a projection of the primitive variables on a weight vector:

t^j_k = (w^j)^T x_k,   (5)

where w^j is the weight vector with ||w^j|| = 1, and x_k is the data sample at time k in T. Let s be the number of time lags. If s is large enough, the residual v^j_k can be considered white noise. Thus, the estimate of the latent variable t^j_k given by the dynamic inner model is

t̂^j_k = Σ_{i=1}^{s} β^j_i t^j_{k-i} = (β^j ⊗ w^j)^T [x^T_{k-1}, x^T_{k-2}, ..., x^T_{k-s}]^T,   (6)

where β^j = [β^j_1, ..., β^j_s]^T with ||β^j|| = 1, and ⊗ is the Kronecker product. It can be shown that minimizing the mean square error between t^j_k and t̂^j_k is equivalent to maximizing their covariance [28]. Hence the objective of extracting the DLV is to maximize

J = Σ_{k=s+1}^{N} t^j_k t̂^j_k,   (7)

where N is the number of samples in T. Assume the time lag s is large enough for fitting the DLV model. Normally, it is reasonable to believe that not all of the past data contribute equally to the model. Therefore, we add a sparsity regularization term to the cost function (7) to prevent over-fitting:
J = Σ_{k=s+1}^{N} t^j_k t̂^j_k − λ ||β^j||_1,   (8)

where λ is the penalty factor and ||·||_1 is the L1 norm. Recall X = [x_1, x_2, ..., x_N]^T and form the following sub-matrices from X:

X_ι = [x_ι, x_{ι+1}, ..., x_{N−s−1+ι}]^T,  ι = 1, 2, ..., s + 1.

Then the objective of extracting a DLV in subsequence S_i can be written as

max_{w, β}  J = w^T X^T_{s+1} (Σ_{i=1}^{s} β_i X_{s+1−i}) w − λ ||β||_1,  s.t. ||w|| = 1, ||β|| = 1.   (9)

The above optimization problem can be solved iteratively, and the penalty factor λ can be found by cross-validation. The resulting L1-regularized DiPCA algorithm is listed in Algorithm 1. It is a modification of the DiPCA algorithm of [28]; for brevity only the main steps are given.
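One way to realize such a regularized extraction is alternating maximization over w and β with a soft-threshold on β. The following is a simplified sketch of that idea, not the authors' exact Algorithm 1; in particular the scale-free thresholding rule and the fixed iteration count are assumptions:

```python
import numpy as np

def extract_dlv(X, s, lam=0.0, n_iter=50, seed=0):
    """Sketch of extracting one dynamic latent variable in the spirit of
    L1-regularized DiPCA: alternately update w (projection vector) and
    beta (inner VAR weights) to increase sum_k t_k * t_hat_k, with a
    soft-threshold on beta controlled by lam."""
    N, m = X.shape
    rng = np.random.default_rng(seed)
    # Lagged matrices X_1..X_{s+1}, each with N - s rows
    Xl = [X[i:N - s + i] for i in range(s + 1)]
    w = rng.standard_normal(m)
    w /= np.linalg.norm(w)
    beta = np.ones(s) / np.sqrt(s)
    for _ in range(n_iter):
        t = [Xi @ w for Xi in Xl]                 # scores t_1..t_{s+1}
        # beta_i proportional to <t_{s+1}, t_{s+1-i}>, soft-thresholded (L1)
        b = np.array([t[s] @ t[s - i] for i in range(1, s + 1)])
        b = np.sign(b) * np.maximum(np.abs(b) - lam * np.abs(b).max(), 0.0)
        if np.linalg.norm(b) > 0:
            beta = b / np.linalg.norm(b)
        # w proportional to the gradient of the bilinear objective
        t_hat = sum(beta[i - 1] * t[s - i] for i in range(1, s + 1))
        Xb = sum(beta[i - 1] * Xl[s - i] for i in range(1, s + 1))
        g = Xl[s].T @ t_hat + Xb.T @ t[s]
        w = g / np.linalg.norm(g)
    return w, beta
```

For multiple DLVs, the same step would be repeated on deflated data, as in the original DiPCA procedure.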

C. DYNAMIC PREDICTION ACCURACY
The static PCA based segmentation algorithms, whose objective functions are built on the Hotelling T^2 and Q statistics of PCA, work well in finding the best piecewise linear representation of time series data [22]. However, those methods are not suitable for detecting changes in the autocorrelation structure, because these indices are insensitive to such changes. In this section, the cost function and the global objective function are defined based on the reconstruction accuracy of t_k, and we show that this reconstruction accuracy can be interpreted as a prediction accuracy.
Collecting t^j, p^j and w^j into T, P and W respectively, we have the outer model

T = X W,  X = T P^T + E,

where E is the residual matrix.

Algorithm 1 L1-Regularized DiPCA (main steps)
 3: for j = 1 → l do
 4:   for λ in {0.005, 0.01, 0.02, ...} do
 5:     w^j ← a random unit vector
 6:     while w^j has not converged do
        ...
11:       Normalize β^j and w^j
12:     end while
13:   end for
14:   Choose λ and w^j with the maximal J
15:   ...
17: end for
18: end function
Forming T_ι from T just as X_ι is formed from X, ι = 1, 2, ..., s + 1, the VAR model of the DLVs in Eq. (2) can be written in matrix form as

T_{s+1} = [T_s, T_{s−1}, ..., T_1] Θ + V.

The parameter matrix Θ can be estimated using the least squares method:

Θ̂ = ([T_s, ..., T_1]^T [T_s, ..., T_1])^{−1} [T_s, ..., T_1]^T T_{s+1}.

Once Θ̂ is obtained, the estimate of T_{s+1} is

T̂_{s+1} = [T_s, T_{s−1}, ..., T_1] Θ̂.

The above equation shows that the reconstruction is based on the auto-regressive equation, which computes the current data from its past s steps. Therefore the reconstruction can be seen as a prediction, and the prediction accuracy of T_{s+1} can be measured by the cosine similarity

A = ⟨T_{s+1}, T̂_{s+1}⟩ / (||T_{s+1}||_F ||T̂_{s+1}||_F),   (12)

where ⟨·,·⟩ is the inner product and ||·||_F is the Frobenius norm. Given a model trained on data X and a new subsequence Y, the latent scores can be calculated as

T_Y = Y W (P^T W)^{−1},   (14)

where W and P are trained on X. Forming T_{Y,ι} from T_Y in the same way, the prediction accuracy of Y under the trained model is

A_Y = ⟨T_{Y,s+1}, T̂_{Y,s+1}⟩ / (||T_{Y,s+1}||_F ||T̂_{Y,s+1}||_F).

Based on this prediction accuracy, the global objective function can be constructed by summarizing the dynamic predictability measure A of Eq. (12). The goal of the segmentation algorithm is to find the partitioning points that make each subsequence best reconstructed by its trained VAR model.
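The least-squares fit of Θ and the cosine-similarity accuracy of Eq. (12) can be sketched in a few lines of NumPy; variable names here are ours, and the input is assumed to be the already-extracted DLV score matrix:

```python
import numpy as np

def var_fit_accuracy(Tscores, s):
    """Least-squares fit of the latent VAR model and the cosine-similarity
    prediction accuracy of Eq. (12). Tscores is the (N, l) DLV score matrix."""
    N = Tscores.shape[0]
    # Stack lagged score blocks [T_s, T_{s-1}, ..., T_1]
    Z = np.hstack([Tscores[s - i:N - i] for i in range(1, s + 1)])
    Y = Tscores[s:]                             # T_{s+1}
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    Y_hat = Z @ Theta                           # prediction of T_{s+1}
    # Cosine similarity between T_{s+1} and its prediction
    A = np.sum(Y * Y_hat) / (np.linalg.norm(Y) * np.linalg.norm(Y_hat))
    return Theta, float(A)
```

For a strongly autocorrelated score sequence, A approaches 1; for white noise it drops toward 0, which is what makes it usable as a segmentation cost.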
For each subsequence S_i, W_i, P_i and Θ_i are the parameters of the trained model, from which T̂^i_{s+1}, the prediction of T^i_{s+1}, can be calculated. The prediction accuracy is

A_i = ⟨T^i_{s+1}, T̂^i_{s+1}⟩ / (||T^i_{s+1}||_F ||T̂^i_{s+1}||_F).

Reviewing Eq. (1) in Section II-A, the global objective function is

max G = (1/K) Σ_{i=1}^{K} A_i.   (18)

The problem in Eq. (18) is a combinatorial problem whose decision variables are the partitioning points. The globally optimal solution can be found by dynamic programming [18]. However, this is often too computationally complex for practical use [18], [31].
In this section, a greedy algorithm is presented for fitting a piecewise stationary representation to the given industrial process data. The algorithm realizes the ''bottom-up'' approach [33] and scales well. It starts from an initial state in which the time series is partitioned into the maximum allowed number of subsequences. In each iteration, it merges the pair of adjacent subsequences whose merging maximizes the objective function, and the iteration repeats until the termination condition is satisfied. For this goal, a merging cost must be designed. Corresponding to the objective function, for a pair of adjacent subsequences S_i and S_{i+1}, the concatenation S_{i∪i+1} is used to train the parameters Θ_{i∪i+1}, and the prediction accuracy A_{i∪i+1} is obtained:

cost(S_i, S_{i+1}) = A_{i∪i+1}.   (19)
This is the cost of merging subsequences i and i+1. If A_{i∪i+1} is the largest among i = 1, 2, ..., K − 1, then S_i and S_{i+1} should be merged, where K is the current number of subsequences.
In case multiple pairs share the same largest merging cost, one pair is chosen arbitrarily, e.g. the pair with the smallest indices. Other policies may also be adopted, e.g. choosing the pair with the shortest merged length. The segmentation procedure is listed in Algorithm 2.

Algorithm 2 Bottom-Up Segmentation (main steps)
 4: Partition T into initial subsequences TS
 5: Determine the time lag s
 6: n ← number of initial subsequences
 7: while n > N_target do            // termination flag
 8:   for i = 1 to n − 1 do
 9:     Calculate the merging cost A(TS(i), TS(i+1))
10:   end for
11:   idx ← the index with the maximum merging cost
12:   Merge TS(idx) and TS(idx+1)
13:   n ← n − 1
14: end while
15: // Filter out invalid subsequences
16: for each subsequence whose length < len do
17:   Merge the subsequence with the neighbor having the larger merging cost
18: end for
19: return TS
20: end function

Remark 1: Algorithm 2 terminates when the minimal number of subsequences has been reached. This is a very common choice of terminating condition in practice, though not the only one. Other terminating conditions, such as the PETE [8] method, can be applied when computational speed is not a concern.
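The bottom-up loop of Algorithm 2 can be sketched as below. The `accuracy` callback stands in for the DiPCA-based merging cost, the post-filtering step is omitted, and the names are ours:

```python
import numpy as np

def bottom_up_segment(T, init_len, n_target, accuracy):
    """Skeleton of the bottom-up procedure: start from fixed-length initial
    subsequences and greedily merge the adjacent pair whose concatenation
    scores highest under `accuracy` (a stand-in for the DiPCA merging cost)."""
    # Initial partition into equal-length pieces
    edges = sorted(set(list(range(0, len(T), init_len)) + [len(T)]))
    while len(edges) - 1 > n_target:
        # Score every adjacent pair's merged span
        scores = [accuracy(T[edges[i]:edges[i + 2]])
                  for i in range(len(edges) - 2)]
        idx = int(np.argmax(scores))     # pair with maximum merging cost
        del edges[idx + 1]               # merge by dropping the shared boundary
    return edges
```

On a piecewise-constant toy series with a homogeneity score, the true boundary is the last one to survive.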
Remark 2: The merging operation in the proposed algorithm is subject to a data imbalance problem. The merging prefers long subsequences, so a short neighbor of a long subsequence may be deprived of the chance to merge at all, which results in an undesirably large number of small subsequences. To overcome this, when calculating the merging cost in Eq. (19), the longer of the two adjacent subsequences is always chosen for training the parameters W_long, P_long and Θ_long, which are then used to predict the shorter one. The prediction accuracy used as the merging cost is therefore

A_{i∪i+1} = A(S_short; W_long, P_long, Θ_long),   (20)

i.e. the accuracy of predicting the shorter subsequence with the model trained on the longer one. Remark 3: The algorithm is greedy, as it always combines the two subsequences with the maximum merging cost. It therefore finds only locally optimal segmentation positions, which are not necessarily globally optimal.
Remark 4: The optional post-processing filters out all subsequences shorter than a given minimal allowed length, which should be determined depending on the application.
Remark 5: Assuming a fixed number of iterations, denoted ι, for extracting each latent variable, the complexity of evaluating the DiPCA based cost function is O(lmιs). Introducing L1 regularization affects only the lower bound of the complexity since, as shown in the following section, L1 regularization can deactivate redundant latent variables. The complexity of the bottom-up segmentation algorithm is O(L̄n), where L̄ is the average segment length of the target subsequences [33]. The overall complexity is O(nlmιs/N_target).

III. SIMULATION AND MODEL PARAMETERS
The proposed segmentation algorithm includes several parameters that must be specified. For convenience of discussion, the algorithm is first tested using simulated data.

A. SIMULATION SETUP
The data are generated from the following simulation model:

t^(i)_k = A^(i)_1 t^(i)_{k−1} + A^(i)_2 t^(i)_{k−2} + A^(i)_3 t^(i)_{k−3} + v_k,

where the superscript i = 1, 2, 3, 4 indexes the four latent models, j = 1, 2, 3 indexes the coefficient matrices of the past j steps, and v = [v_1, ..., v_1000] is the i.i.d. driving noise. Concatenating t^(1), t^(2), t^(3) and t^(4) as t = [t^(1), t^(2), t^(3), t^(4)], a data set containing 4 known subsequences is generated by an output mapping of the form x_k = P t_k + e_k.

B. SEGMENTATION RESULTS

Figure 2 shows the segmentation results on the simulated data. The green lines represent the true segmentation points and the red lines are the locations found by the algorithm. Since the true number of subsequences is known, i.e. K = 4, the termination condition of the algorithm is set to K = 4. The time lag s and the number of DLVs l are both set to 3, and the L1 regularization coefficient λ is 0.03. The determination of these parameters is discussed later.
To illustrate the data imbalance problem mentioned in Remark 2, the segmentation algorithm is first run with the merging cost defined in Eq. (19). The result is shown in Figure 3, where the partitioning locations found by the algorithm are {1000, 2000, 2900}. Obviously, the last location is incorrect. After changing the merging cost function to Eq. (20), the segmentation results are as shown in Figure 2, where all the ground truth segmentation positions, {1000, 2000, 3000}, are correctly located.
The proposed method is compared with other relevant segmentation methods, including the PCA based method [23], the DPCA based method [3] and the fuzzy probabilistic PCA based method [9]. There are plenty of other developments in this field. One reason to choose these methods for comparison is that they and the proposed method all belong to the same category of generalized piecewise linear approximation approaches. Another reason is that the main contribution of this work lies in the way the approximation error is measured: the compared methods all use the same bottom-up backend algorithm, and the other settings can be kept the same. This helps to analyze the cause of performance differences and more clearly demonstrates the contribution of this work.
The segmentation results obtained on the synthetic dataset are shown in Figure 4, where the compared methods are abbreviated as DiPCAseg, PCAseg, DPCAseg and FuzzyPPCAseg respectively. The green lines represent the ground truth. Note that for the FuzzyPPCAseg algorithm [34], the final number of segments is determined by the threshold for merging similar clusters, and subsequences shorter than the predefined minimal allowed length are also merged. As can be seen, DiPCAseg outperforms all the other methods in this case. This is because the dataset is specifically designed to demonstrate the strength of the proposed algorithm, which is more sensitive in detecting changes of dynamics.

C. DISCUSSIONS ON MODEL PARAMETERS
The proposed segmentation algorithm includes the following parameters, which must be given:

• The number of DLVs, l: similar to PCA, the number of DLVs indicates the amount of information kept in the model. A large number of DLVs leads to relatively little information loss but weak dimension reduction, and vice versa.
• The order of time lag, s: Recall in Eq. (2), the time-lag s refers to how long the past data will be considered having correlation with the current data. A moderate selection of s can better capture the dynamic correlation.
• The regularization coefficient, λ. The L1 regularization can suppress the non-zero weights, reduce the complexity of the model and increase the robustness of the algorithm. A larger λ corresponds to a simpler model.
• The number of final subsequences, N_target: this parameter corresponds to the termination condition of the segmentation algorithm. It is often unknown in practical tasks.
• The length of the initial subsequence, len. This parameter defines the initial subsequence length, which is dependent on the nature of data and application requirements.
In general, the first 3 parameters are DiPCA related and can be determined by cross-validation [28]. The remaining parameters are application dependent and can be determined by generic approaches. The effects of these parameters on the algorithm are discussed below.

1) ON THE SELECTION OF l
In the PCA algorithm, the number of principal components determines the information loss after reconstruction from the latent variables. This principle still applies in DiPCA [28]. The difference is that the number of DLVs is no longer determined by covariance but by auto-covariance, i.e. the covariance between the actual samples and their predictions by the VAR model. Assuming the other parameters, s and λ, are fixed, l can be determined by a constructive approach that stops when 90% of the auto-covariance is captured by the first l dynamic latent variables. If the desired percentage cannot be reached, the construction procedure should also stop.
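A minimal sketch of this stopping rule, assuming the per-component objective values (the auto-covariance captured by each extracted DLV) are already available; the function name and the fallback of keeping all components are ours:

```python
import numpy as np

def choose_l(J_values, ratio=0.9):
    """Pick l as the smallest number of DLVs whose cumulative share of the
    captured auto-covariance reaches `ratio`; if the ratio is never reached,
    keep all components, mirroring the early-stop rule described above."""
    J = np.asarray(J_values, dtype=float)
    frac = np.cumsum(J) / J.sum()
    hit = np.nonzero(frac >= ratio)[0]
    return int(hit[0]) + 1 if hit.size else len(J)
```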

2) ON THE SELECTION OF s
The authors of [28] introduced a cross-validation procedure to simultaneously determine the optimal dynamic order s and the number of DLVs l, using a validation dataset to evaluate the prediction error of the model built on the training data. In the segmentation problem, however, the data to be partitioned may not be homogeneous, so splitting into training and validation data is not possible. Instead, a generic procedure is used. According to Eq. (9), the fitting accuracy of the DLV model is expected to increase with s on the same training data, i.e. the objective J increases with s. For real-world industrial data, it is reasonable to assume that the autocorrelation decays with the time interval, so further increasing s contributes less and less to J. Hence the optimal value of s is assigned to the point where the rate of increase of J starts to slow down, as this best balances the model's expressive power and its complexity. It is also worth pointing out that, in general, the dynamic order s differs across DLVs. For the task of segmentation, it is sufficient to consider only the most dominant DLVs.
To illustrate this point, a subsequence is randomly chosen from the simulated data, e.g. {x_551, ..., x_600}. Then, increasing s from 1 to 10, the value of the objective J in Eq. (9) is plotted against s for all available DLVs in Figure 5. It can be seen that the first DLV dominates the contribution to J, and an obvious elbow point can be observed at s = 3, where the growth rate slows down; the elbow point is not obvious for the other DLVs. Therefore, only the first DLV is considered and s = 3 is chosen as the model order, which is identical to the true order of the simulation model. In practice, several subsequences are randomly sampled from the available subsequences, the optimal s of each is found using the above method, and the largest one is chosen as the final value of s.
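The elbow rule described above can be sketched as a simple threshold on marginal gains; the 10% threshold and the function name are assumptions for illustration:

```python
import numpy as np

def choose_s(J_by_s, tol=0.1):
    """Pick the lag order s at the point where J stops growing appreciably:
    the first s whose next marginal gain drops below `tol` times the largest
    gain. J_by_s[i] is the fitted objective J for s = i + 1."""
    J = np.asarray(J_by_s, dtype=float)
    gains = np.diff(J)
    if gains.size == 0:
        return 1
    small = np.nonzero(gains < tol * gains.max())[0]
    # gains[i] is the gain from s = i+1 to s = i+2; stop before the first small gain
    return int(small[0]) + 1 if small.size else len(J)
```

With a J-curve that flattens after the third lag, the rule returns s = 3, matching the behavior described for the simulated data.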

3) THE EFFECTS OF L1 REGULARIZATION
First, to illustrate the effect of L1 regularization on the DiPCA algorithm, the parameters are set to s = 3 and l = 5, and λ takes the values 0, 0.005, 0.01 and 0.02. Table 1 shows the weights of the first dynamic latent vector, β^1. It can be observed that the number of non-zero terms decreases as λ increases. In fact, even when the number of DLVs, l, is unnecessarily large, i.e. l = 5, introducing L1 regularization successfully deactivates the redundant latent variables by zeroing their weights.
Secondly, the overall effect of L1 regularization on the segmentation algorithm is to reduce the variance of the segmentation locations found by the algorithm. The proposed segmentation algorithm starts from an initial state in which each subsequence has the minimum allowed length, and output variance arises when a possibly insufficient initial number of samples in each subsequence causes under-fitting. To see this, the segmentation experiment is repeated 100 times on the same simulated data with λ = 0 and λ = 0.02 respectively. Figure 6 shows the three partitioning locations found by each run of the algorithm. The red lines represent the results obtained with λ = 0 and the green lines the results obtained with λ = 0.02. In the first case, the standard deviations of the three locations found by the algorithm are 11.04, 10.73 and 0 respectively; in the second case, they are 6.53, 7.86 and 0. The L1 regularization indeed reduces the variance of the algorithm outputs. Note that when λ = 0.02 is relatively large, we set T_Y = Y W (P^T W + εI_l)^{−1} in Eq. (14) to prevent problems with singular matrices, where ε is a small constant, e.g. 0.1.
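The regularized score computation of Eq. (14) can be sketched as follows; the function name is ours, and ε plays the role of the small constant described above:

```python
import numpy as np

def latent_scores(Y, W, P, eps=0.0):
    """Latent scores for new data Y under a trained outer model (W, P):
    T_Y = Y W (P^T W + eps*I)^{-1}, as in Eq. (14). A small eps > 0
    (e.g. 0.1) guards against a singular P^T W when L1 regularization
    has zeroed out redundant components."""
    l = W.shape[1]
    return Y @ W @ np.linalg.inv(P.T @ W + eps * np.eye(l))
```

When P^T W is the identity (e.g. orthonormal loadings with W = P), the scores reduce to the plain projection Y W.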

4) ON THE SELECTION OF N target
In practical tasks, the true number of final segments is rarely known in advance. It is usually chosen considering the amount of available data, the goal of the time series analysis, the temporal granularity of the data, etc. There are also more sophisticated methods, such as PETE [8], for estimating the number of subsequences.
In this work, a generic approach is used to determine an appropriate number of subsequences: the global objective G is plotted against the number of subsequences K and the elbow point is located. Figure 7 plots the results using the simulated data as an example. An obvious elbow point is located at K = 8. Since the true number of subsequences and their locations are known, there are 4 incorrect subsequences under this selection of K. Figure 8 shows the segmentation results. The three true partitioning locations are all accurately located. There are also four falsely detected partitioning locations, labeled by the purple lines. However, those false locations all lie near the beginning of the time series, between x_1 and x_200, and the lengths of those subsequences equal the initial subsequence length. Therefore, those false locations can easily be filtered out by setting a minimal allowed subsequence length.

IV. EXPERIMENTS ON INDUSTRIAL PROCESS DATA
The results shown in Figure 4 are obtained on the synthetic dataset designed to demonstrate the strength of the proposed algorithm. To evaluate the performance more thoroughly, the proposed method is compared with PCAseg, DPCAseg and FuzzyPPCAseg on datasets with more diversified characteristics.

A. TENNESSEE EASTMAN PROCESS
The proposed segmentation method is tested using the Tennessee Eastman Process (TEP) simulated data [35]. The TEP has been widely used by the process monitoring community for comparing different approaches [36], [37]. The simulated TEP includes five unit operations: a reactor, a condenser, a compressor, a separator and a stripper. There are 12 manipulated variables (XMV1-XMV12), 22 process measurements (XMEAS1-XMEAS22) and 19 composition measurements (XMEAS23-XMEAS41). Each simulated time series represents a 48-hour run with a sampling interval of 3 minutes. The time series dataset is constructed from the 22 process measurements and the 12 manipulated variables.
The TEP simulation contains 21 preprogrammed faults, introduced after the 8th hour. The test datasets corresponding to faults IDV5 and IDV10 are used, and the proposed DiPCAseg algorithm is compared with PCAseg, DPCAseg and FuzzyPPCAseg on them. The initial length of each subsequence is set to 30, the final number of segments to 4, the number of DLVs to 13 and the order of time lag to 3. For FuzzyPPCAseg, the final number of segments is adaptively determined by the algorithm, and final subsequences of length 30 or less are merged. The results are shown in Figure 9 and Figure 10. Since the preprogrammed faults are introduced after the 8th hour, we can only confidently set one ground truth segmentation position, at samples 160-165, indicated by the green lines in the figures.

The fault IDV5 is a step change in the condenser cooling water inlet temperature. The segmentation results are shown in Figure 9. Since there is no direct measurement of the cooling water temperature, the process variables related to the separator, the downstream unit operation of the condenser, are chosen to visualize the segmentation results: product separator temperature (XMEAS11), product separator level (XMEAS12), product separator pressure (XMEAS13), product separator underflow (XMEAS14), separator cooling water outlet temperature (XMEAS22) and separator pot liquid flow (XMV7). The step change in the condenser cooling water temperature can be observed indirectly through its effect on the product separator temperature. Among the 4 compared methods, DiPCAseg and FuzzyPPCAseg accurately locate the first segmentation position, while DPCAseg shows a slight discrepancy and PCAseg, based on the T^2 statistic, fails to locate it. The remaining segmentation locations found by the different algorithms are diverse, and there is no ground truth for their evaluation.
One exception is around samples 660-670, where both DiPCAseg and DPCAseg suggest a segmentation. This may be explained as the settling point of the effects introduced to the system by the step change in the condenser cooling water inlet temperature.
For the fault IDV10, random variation is introduced to the feed temperature of component C. Since there is no direct measurement of the temperature of the C feed stream, some measurements and manipulated variables of the stripper are chosen to indicate the effects of the fault. Those include stripper pressure (XMEAS16), stripper temperature (XMEAS18), stripper steam flow (XMEAS19), stripper liquid product flow (XMV8) and stripper steam valve (XMV9). Figure 10 shows the results. It can be observed that both DiPCAseg and FuzzyPPCAseg accurately detect the change near 180, while DPCAseg and PCAseg fail to detect it.
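As a concrete illustration of the heuristic bottom-up procedure used in these experiments (initial subsequences of a fixed length, merged pairwise until the target number of segments remains), the loop below is a minimal sketch. The `merge_cost` callback is a hypothetical stand-in for the DiPCA mutual-predictability cost of Eq. (20); the default shown here is a simple variance-increase heuristic used only to make the sketch runnable, not the paper's actual cost.

```python
import numpy as np

def bottom_up_segment(ts, init_len=30, n_target=4, merge_cost=None):
    """Heuristic bottom-up segmentation sketch (not the paper's exact code).

    ts         : (T, m) multivariate time series as a NumPy array.
    merge_cost : cost of merging two adjacent subsequences; a stand-in
                 for the DiPCA mutual-predictability cost in the paper.
    """
    if merge_cost is None:
        def merge_cost(a, b):
            # Hypothetical placeholder: extra variance created by merging.
            merged = np.vstack([a, b])
            return merged.var(axis=0).sum() - 0.5 * (
                a.var(axis=0).sum() + b.var(axis=0).sum())

    # Start from fine-grained subsequences of equal initial length.
    segs = [ts[i:i + init_len] for i in range(0, len(ts), init_len)]
    while len(segs) > n_target:
        costs = [merge_cost(segs[i], segs[i + 1])
                 for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))                   # cheapest adjacent pair
        segs[i:i + 2] = [np.vstack(segs[i:i + 2])]  # merge it in place
    return segs
```

Segmentation positions are then recovered as the cumulative lengths of the returned subsequences.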

B. SHARI EVAPORATION PROCESS
We applied the proposed segmentation algorithm to an industrial process data set collected from the evaporation plant of Shanxi Huaxing Alumina Refinery Industry (SHARI). The evaporation plant at SHARI uses a 6-effect counter-current evaporation process which includes 6 falling-film tube evaporators, 4 self-evaporators and 6 condensation tanks. A brief diagram of the process is shown in Figure 11. The evaporators of the alumina evaporation process are heated by steam. The slurry stream is fed into evaporators I, II and III respectively. In order to make full use of the heat source, the secondary steam of evaporator I is fed into evaporator II, and the secondary steam of evaporator II is fed into evaporator III, i.e., the secondary steam of each effect enters the next-effect evaporator as its heating source. Similarly, to make full use of the secondary steam of the self-evaporators, the secondary steam from the 1st ∼ 4th self-evaporators is used as the heating source for evaporators III ∼ VI, and the secondary steam of the last evaporator, VI, enters the condenser. In addition, the condensate water of evaporator I is fed into the 1# condenser, and the secondary steam regenerated in the 1# condenser is re-used as a heat source for evaporator I. Similarly, the condensate water of evaporators II ∼ VI flows into the 2# ∼ 6# condensers and the secondary steam is fed back into evaporators II ∼ VI respectively.
From the description of the evaporation process, it can be seen that the process variables, such as the pressures, temperatures and levels, are highly correlated. The slurry levels of the five condensers are chosen as an example to demonstrate the performance of the proposed algorithm. The condenser level is a key process variable of the evaporation process, and it is less volatile than other variables such as pressure. The data set includes 4162 samples collected from the plant. For this case, the true segmentation and the number of subsequences are not known. To evaluate the performance of the proposed segmentation method, we invited experienced operators and process engineers to mark the segmentation points by hand. For the current data, the confirmed segmentation points are at {1220, 2535, 2815, 3020, 3120, 3638, 3880}.
For the SHARI data set, the segmentation algorithm parameters are: s = 3, l = 3, λ = 0.005, N_target = 10, and len = 30. The segmentation results are shown in Figure 12-(a), where the red and green lines represent the segmentation locations found by the proposed algorithm and by the experts, respectively. It can be seen that the algorithm found 17 partitioning locations in total, i.e. 18 subsequences. Note that some of these subsequences are very short. Such data correspond to very short transient periods of the process, or are simply caused by noise. Since the preset minimal allowed subsequence length is len = 30, subsequences shorter than len are filtered out as invalid and are forced to merge with one of their neighbour subsequences. For this purpose, the merging cost in Eq. (20) is used to choose the neighbour with the smaller merging cost to form a new subsequence. After merging these short subsequences, the final set of segmentation points is: {1230, 1650, 2730, 2820, 3000, 3120, 3390, 3600, 3870}, as shown in Figure 12-(b). To evaluate the results of the proposed algorithm, Precision (P) is calculated as P = TP/(TP + FP), where TP is the number of true positives and FP is the number of false positives. It is worth noting that the expert-confirmed segmentation positions should not be considered crisp. In the current context, a position found by the algorithm is counted as correct if it is within ±80 samples (about ±2% of the total length) of one of the confirmed locations. The proposed segmentation method is compared with the PCAseg, DPCAseg and FuzzyPPCAseg methods, and the results are shown in Figure 13. It can be seen that DiPCAseg and FuzzyPPCAseg show remarkably higher precision than the other methods. DiPCAseg locates six of the seven confirmed segmentation positions and FuzzyPPCAseg locates five of the seven.
There are cases where DiPCAseg correctly locates a segmentation position that FuzzyPPCAseg misses, e.g. at 3800, and vice versa, e.g. at 2535. The overall performance of each method is listed in Table 2.
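The tolerance-based precision used above can be computed as follows. This is a sketch under the stated matching rule (a found position counts as a true positive if it lies within ±80 samples of an as-yet-unmatched expert-confirmed position); the function and variable names are chosen for illustration and are not from the paper.

```python
def precision_with_tolerance(found, confirmed, tol=80):
    """P = TP / (TP + FP), where a found boundary is a true positive if
    it lies within +/- tol samples of an expert-confirmed position;
    each confirmed position may be matched at most once."""
    unmatched = list(confirmed)
    tp = 0
    for p in sorted(found):
        # Greedily match against the first still-unmatched confirmed point.
        hit = next((t for t in unmatched if abs(p - t) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    fp = len(found) - tp
    return tp / (tp + fp) if found else 0.0

# SHARI positions from the text.
found = [1230, 1650, 2730, 2820, 3000, 3120, 3390, 3600, 3870]
confirmed = [1220, 2535, 2815, 3020, 3120, 3638, 3880]
```

With these inputs, 6 of the 9 found positions fall within ±80 of a confirmed position, consistent with the reported six-of-seven hits, giving a precision of 6/9.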

V. CONCLUSION
Industrial process time series are characterized by their inherent dynamic nature. This paper investigates the problem of partitioning multivariate industrial time series data into piecewise stationary subsequences. The proposed algorithm seeks the segmentation plan which maximizes the dynamic predictability within each subsequence. The main contribution of this work is the design of an objective function that measures the dynamic predictability using the DiPCA modeling accuracy. Though the proposed method focuses on the dynamic predictability of industrial process data, it can flexibly be extended to include more segmentation criteria when the data of interest present other characteristics, e.g. cross correlation, periodicity and trends. One limitation of the proposed algorithm is that it cannot guarantee that the segmentation plan is globally optimal, due to the greedy approach it adopts. This might be alleviated by increasing the search depth under some complexity constraint, or by going further toward the other end of the spectrum, to dynamic programming based solutions. To tackle the resulting state explosion problem, a possible future direction of this study is to use an approximate dynamic programming approach. Finally, the proposed method is implemented in the bottom-up segmentation algorithm and has been thoroughly tested on simulated data and industrial process data, showing encouraging results.
SHAOWEN LU (Member, IEEE) received the Ph.D. degree in electronic engineering from Queen Mary University of London. He is currently a Professor with the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. His research interests include industrial process modeling and simulation. His most recent research has been in the area of industrial time series modeling and analysis, working with both data driven and model driven methods.
SHUYU HUANG received the B.S. degree in automation from Northeastern University at Qinhuangdao, China, in 2016. He is currently a Research Student with the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China. His current research interests include time series analysis, data mining, and machine learning.