Monitoring Influent Conditions of Wastewater Treatment Plants by Nonlinear Data-Based Techniques

To operate wastewater treatment plants (WWTPs) with optimized efficiency, influent conditions (ICs), the initial states of the inflow fed to WWTPs, were monitored to identify potential anomalies that could trigger adverse events or system crashes. To employ voluminous measurements for data-driven decisions, the nonlinear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, heteroskedastic, and case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, kernel principal components analysis based one-class support vector machines (KPCA-OCSVM) with various kernels, to learn an anomaly-free training set and then classify the testing set. A seven-year multivariate IC time series was introduced, with exploratory analysis performed to reveal temporal behaviors and statistical properties. KPCA with polynomial kernels output sufficiently representative features, based on which OCSVM with Gaussian kernels sensitively and specifically identified anomalies in ICs that had previously been missed by WWTP operators. The proposed kernel algorithms surpassed previous linear PCA-based k-nearest-neighbors models and improved outcomes with a limited increase in computational cost. Without requiring linear, Gaussian, stationary, independent, and homoskedastic qualities from the data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs.


I. INTRODUCTION
Environmental systems are complex. Wastewater treatment plants (WWTPs) are environmental systems where physical, chemical, and biological unit processes are convoluted [1]. Modern WWTPs have to function unceasingly while accepting incoming wastewater of volatile quantity and quality. Industrial WWTPs have to manage wastewater with frequent abrupt changes in case-specific composition and temperature, while municipal counterparts often face impacts from rainfall and snowmelt. Inflow cannot be abandoned or rejected, though WWTPs have only limited storage in which to buffer it. Nonlinear dynamic multivariate processes, rising operational costs, and stringent discharge regulations altogether demand optimal efficiency from practitioners [2].

(The associate editor coordinating the review of this manuscript and approving it for publication was Changsheng Li.)
Influent conditions (ICs) are the initial states of the inflow fed to WWTPs. ICs would affect system states, ongoing process mechanisms, and final product (i.e., treated effluent) quality. Monitoring, detection, isolation, and diagnosis of potential anomalies or faults in ICs could, at an early stage, avoid unexpected system crashes, maintain steady product quality, support efficient downstream processes, improve WWTP reliability, and reduce labor costs [3].
Conventionally, mechanistic model-based or analytical methods are developed for WWTP forecast monitoring. Utilizing first principles, such classical deterministic models could theoretically explain process mechanisms, but they were based on ideal hypotheses, required prior parameters for calibration, and would be challenged by ill-conditioned problems and costly high-dimensional computation in real applications. Expert systems, as established on knowledge bases, rely on subjective judgments from individuals and may lead to biased diagnoses. Environmental data science is an emerging interdisciplinary field that effectively addresses the intricacy inside environmental systems and could provide promising solutions for WWTP monitoring. Ever-updating instrumentation, control, and automation in WWTPs are producing quantities of multivariate time series data, which are often unexploited. This "data-rich, information-poor" dilemma is attributed to the lack of methodology to select the right algorithm for a given case, the lack of standard prototypical data processing procedures, and the lack of trained environmental data scientists or data science expertise among environmental scientists [4]. Moreover, the nonlinear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, heteroskedastic, and case-specific nature of multivariate environmental time series data all add difficulty for researchers constructing models that are both suitable and flexible.

(VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/)
Data-based methods can perform systematic and objective exploration, visualization, and interpretation of data [5], identify essential factors, features, or patterns, and endorse then optimize data-supported decision-making [6]. Validated data-driven monitoring methods could be transferred and shared conveniently among domain experts, by virtue of the versatile nature of data science models [7].
Machine learning is a remarkable multidisciplinary field whose methods can be implemented for fault detection. Artificial neural networks (ANNs) displayed good nonlinear projection quality, high fault tolerance, flexible self-adaptation, and parallel computing efficiency, which could deal with the complexity and high dimensionality of the knowledge implied in WWTP anomalies. ANN-based predictive control was introduced by [8] for WWTP monitoring and was tested on benchmark simulation data. Feedforward neural networks were applied by [9] to predict wastewater effluent ammonia-nitrogen contents. A hybrid ANN was developed by [3] to predict influent biochemical oxygen demand, which otherwise was expensive and difficult to measure with sensors. The requirements for suitable sample sizes and network architectures, the ensuing intensive computation, overfitting, locally optimal solutions, and inexplicability were hurdles to overcome.
Latent variable methods applied in statistical process control, including independent component analysis (ICA) [10] and principal component analysis (PCA) [11], could treat the multi-dimensionality and auto- or cross-correlations in WWTP records. By dimension reduction, input data were projected onto feature spaces of lower dimension, where statistics could be built to reveal characteristics of interest and identify abnormal conditions. In such cases, non-stationarity, non-normality, and non-linearity were issues to resolve. ICA may be further affected by the number and sequence of input signals or independent components.
To address the nonlinear nature of wastewater IC data, kernel methods should be emphasized: projection onto higher dimensions enables good re-encoding of data that lie along a nonlinear manifold. Kernel PCA (KPCA) was introduced by [12] to detect abrupt events in wastewater systems, including pollution and sensor faults. Kernel ICA was studied by [13] to detect exterior disturbances from rainfall, and showed competitive accuracy, efficiency, and reliability. Support vector machines (SVMs) were proposed by [14] to improve on-line measurement and control of the oxygen uptake rate. Kernel methods demanded comparably larger storage during computation, for which parallel computing could offer potential solutions. For IC data, improved classification efficiency could be anticipated by adopting kernel techniques.
This paper proposed an effective monitoring strategy merging the desirable characteristics of KPCA modeling with an unsupervised one-class SVM (OCSVM) scheme to distinguish normal from abnormal measurements. In this regard, KPCA was used to account for nonlinearities in the multivariate IC data. KPCA may discover relevant patterns in the data by transforming problems into higher dimensions via kernel functions, enabling nonlinear relationships to be revealed as approximately linear. OCSVM may quantify the dissimilarity between normal and abnormal nonlinear features for detection without making assumptions about the underlying data distribution. Tests on real IC data from a local coastal municipal WWTP showed the effectiveness of the proposed approach. Previous research employed PCA-based k-nearest-neighbor (PCA-KNN) algorithms to connect dimension reduction with a robust machine learning approach for fault detection, and introduced radial visualization for fault diagnosis; those performances were partly merged and compared in this paper. Exploratory data analysis with data visualization was involved in uncovering and interpreting historical influent behaviors.
The PCA and KPCA models were briefly presented in Section II. The proposed KPCA-OCSVM monitoring schemes were discussed in Section III. The performances of the recommended approaches were evaluated via real data, together with exploratory analysis and visualization, in Section IV. Conclusions were drawn in Section V.

II. LINEAR AND KERNEL PRINCIPAL COMPONENTS ANALYSIS MODELS
This section provided an overview of the linear PCA and the derived KPCA models.

A. LINEAR PRINCIPAL COMPONENTS ANALYSIS (PCA) MODEL
Principal components analysis, as a dimension reduction approach, was a popular modeling framework to learn crucial features from multivariate data. By projecting the process variables into a lower-dimensional subspace, PCA enabled revealing the inherent cross-correlation among process variables [15].
Let us consider $X = [x_1^T, \ldots, x_n^T]^T \in \mathbb{R}^{n \times m}$, a scaled data matrix of recorded inflow conditions with $n$ observations and $m$ variables. Based on the PCA model, the data matrix $X$ can be expressed as the sum of an approximated matrix, $\hat{X}$, and residual data, $E$:

$$X = T W^T = \hat{X} + E$$

where $T \in \mathbb{R}^{n \times m}$ represented the matrix of principal components (PCs) and $W \in \mathbb{R}^{m \times m}$ was the loading matrix; retaining only the first $k$ columns of $T$ and $W$ yielded $\hat{X}$, leaving $E$ as the residual.
In the presence of cross-correlation, the original multivariate data $X$ could be sufficiently preserved and approximated by the first $k$ PCs (where $k < m$). One necessary step here was to select the number of PCs; for this purpose, the cumulative percentage variance procedure was adopted for its simplicity and accuracy. The loading matrix was frequently calculated using singular value decomposition of the covariance matrix $S$ of the design matrix $X$:

$$S = \frac{1}{n-1} X^T X = W \Lambda W^T$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ was a diagonal matrix with the eigenvalues of $S$ filled in decreasing order. The eigenvalue $\lambda_i$ was equal to $\sigma_i^2$, the variance of the corresponding $i$-th PC $t_i$.
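The selection of $k$ by cumulative percentage variance can be sketched as follows. This is an illustrative Python implementation (the study's own analysis used R), with synthetic data standing in for the scaled IC matrix; the function name and threshold are placeholders.

```python
import numpy as np

def pca_select_k(X, cpv_threshold=0.90):
    """Linear PCA on an n x m data matrix: scale, eigendecompose the
    covariance matrix S, and keep the first k PCs whose cumulative
    percentage variance (CPV) reaches the threshold."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scale variables
    S = np.cov(Xc, rowvar=False)                        # m x m covariance matrix
    eigvals, W = np.linalg.eigh(S)                      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                   # re-sort descending
    eigvals, W = eigvals[order], W[:, order]
    cpv = np.cumsum(eigvals) / eigvals.sum()            # cumulative percentage variance
    k = int(np.searchsorted(cpv, cpv_threshold)) + 1    # first k reaching the threshold
    T = Xc @ W[:, :k]                                   # scores on the first k PCs
    return T, W[:, :k], k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)          # induce cross-correlation
T, Wk, k = pca_select_k(X)
print(k, T.shape)
```

Because variables 0 and 3 are nearly collinear, one eigenvalue is close to zero and the CPV criterion retains fewer than $m$ components.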

B. KERNEL PRINCIPAL COMPONENTS ANALYSIS (KPCA) MODEL
Kernel PCA, as an extension of linear PCA via the kernel trick, allowed learning and revealing nonlinear relationships among process variables. This nonlinear dimension reduction algorithm was applied to feed the subsequent one-class SVM scheme, aiming at improved fault detection performance over traditional linear PCA-based monitoring methods. Core principles were sketched in Figure 1. In KPCA, the input space was first transformed via a nonlinear mapping into a high-dimensional feature space, in which the data were more linear. Then, for dimension reduction, principal components were extracted by applying the kernel trick with inner products of nonlinear functions. Hence, procedures from linear PCA could be largely inherited, and nonlinear optimization with high computational cost was avoided. Moreover, fewer hyper-parameters were involved here compared with other nonlinear methods such as artificial neural networks, where architecture design could be empirical or even arbitrary.
Let us consider the original training dataset $x_1, x_2, \ldots, x_n \in \mathbb{R}^m$, where $n$ was the number of samples and $m$ the number of process variables. The feature space was constructed using a nonlinear mapping $\Phi(\cdot): \mathbb{R}^m \rightarrow F^h$, where $\Phi(\cdot)$ was a nonlinear mapping function and $h$, a huge positive integer, was the dimension of the feature space. Similar to PCA, the covariance matrix in the feature space $F$, $\Sigma_F$, could be calculated as:

$$\Sigma_F = \frac{1}{n} \sum_{i=1}^{n} \bar{\Phi}(x_i) \bar{\Phi}(x_i)^T$$

where $m_\Phi = \sum_{i=1}^{n} \Phi(x_i)/n$ was the sample mean in the feature space and $\bar{\Phi}(x_i) = \Phi(x_i) - m_\Phi$ denoted a mapped point after centering with the corresponding mean. To find the principal components, we solved the eigenvalue decomposition problem in the feature space such that:

$$\lambda v = \Sigma_F v \quad (4)$$

where $\lambda$ and $v$ denoted an eigenvalue and eigenvector of the covariance matrix $\Sigma_F$, respectively. By multiplying Eq. (4) from the left by $\bar{\Phi}(x_j)$, defining the kernel matrix $\bar{K} \in \mathbb{R}^{n \times n}$ such that:

$$\bar{K}_{ij} = \langle \bar{\Phi}(x_i), \bar{\Phi}(x_j) \rangle$$

and introducing $\alpha \in \mathbb{R}^n$ to span the kernel PCs by the feature-space training samples, satisfying:

$$v = \sum_{i=1}^{n} \alpha_i \bar{\Phi}(x_i)$$

the eigenvalue decomposition problem was reformulated as [16]:

$$n \lambda \alpha = \bar{K} \alpha$$

The eigenvectors identified in the feature space $F$, namely the nonlinear kernel PCs, would characterize nonlinear processes. Since the number of eigenvectors equaled the number of samples, it could be several times larger than the number of linear PCs offered by conventional PCA.
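The derivation above can be sketched numerically as follows. This is an illustrative Python implementation rather than the study's R code; the polynomial kernel and its parameters are hypothetical choices. The Gram matrix is centered in feature space, eigendecomposed, and each $\alpha$ is scaled so the corresponding feature-space eigenvector $v$ has unit norm.

```python
import numpy as np

def poly_kernel(X, Y, degree=3, coef0=1.0):
    # k(x, y) = (<x, y> + c0)^d -- a hypothetical kernel choice for illustration
    return (X @ Y.T + coef0) ** degree

def kpca_scores(X, kernel, n_components):
    """Kernel PCA sketch: center the n x n Gram matrix in feature space,
    solve the eigenproblem, and return scores on the leading kernel PCs."""
    n = X.shape[0]
    K = kernel(X, X)
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # feature-space centering
    eigvals, alphas = np.linalg.eigh(Kc)                 # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, alphas = eigvals[order], alphas[:, order]
    # scale alpha so each feature-space eigenvector v has unit norm
    A = alphas[:, :n_components] / np.sqrt(np.maximum(eigvals[:n_components], 1e-12))
    return Kc @ A                                        # training-point projections

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
T = kpca_scores(X, poly_kernel, 3)
print(T.shape)
```

The returned score columns carry decreasing amounts of variance, mirroring the ordering of linear PCs.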
Various kernel functions were available in the literature [17], including polynomial kernels:

$$k(x, y) = \left(\gamma \langle x, y \rangle + c_0\right)^d$$

cosine kernels:

$$k(x, y) = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$$

and radial basis functions (RBF):

$$k(x, y) = \exp\left(-\gamma \|x - y\|^2\right) \quad (12)$$

where $2\delta^2 = 1/\gamma$ defined the width $\delta$ of the Gaussian kernel.
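These three kernels can be written directly as a short Python sketch; parameter names such as `gamma`, `coef0`, and `degree` follow common convention rather than the paper's notation.

```python
import numpy as np

def polynomial_kernel(X, Y, gamma=1.0, coef0=1.0, degree=3):
    # k(x, y) = (gamma * <x, y> + c0)^d
    return (gamma * (X @ Y.T) + coef0) ** degree

def cosine_kernel(X, Y):
    # k(x, y) = <x, y> / (||x|| ||y||)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def rbf_kernel(X, Y, gamma=0.1):
    # k(x, y) = exp(-gamma * ||x - y||^2), with gamma = 1 / (2 * delta^2)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)
```

A larger `degree` or `gamma` makes the induced feature space more flexible, which mirrors the tuning trade-offs reported later in Table 2.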

III. THE PROPOSED KPCA-BASED ONE CLASS SUPPORT VECTOR MACHINE (OCSVM) MODELS
A. OCSVM MODELS

The one-class support vector machine was an unsupervised machine learning classifier and a special case of the multi-class SVM. OCSVM learned decision rules from anomaly-free data, which were then applied to classify new test data as either comparable to or different from the learned anomaly-free case [18]. OCSVM would project the input data via kernel functions into a high-dimensional space where a hyperplane could be built for classification. This transformation permitted the projected data to be relatively linear, which made them easier to categorize. Having inferred the properties or features of normal cases, the pre-defined hyperplane would help perform classification, where the decision function $f(x)$ would determine whether a new observation lay on the normal side of the hyperplane as an inlier (normal, $f(x) = +1$) or outside it as an outlier (abnormal, $f(x) = -1$).
We let $x_1, \ldots, x_k \in D$ be the training dataset. OCSVM mapped the input data into the high-dimensional feature space $F$ via kernels such as the radial basis function (RBF) (Eq. (12)). As illustrated in Figure 2(a), the decision rule $f(x)$ aimed to maximize the Euclidean distance between the origin and the separating hyperplane $H$, which separated the training data in the feature space $F$. Therefore the decision function $f(x)$ was expressed as:

$$f(x) = \mathrm{sign}\left(\langle w, \Psi(x) \rangle - \rho\right)$$

where $w$, $\rho$, and $\Psi$ represented, respectively, a weight vector, an offset, and a feature mapping $D \rightarrow F$. Here, $w$ and $\rho$ could be determined by solving the following quadratic optimization problem:

$$\min_{w, \xi, \rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu k}\sum_{i=1}^{k} \xi_i - \rho$$

$$\text{subject to} \quad \langle w, \Psi(x_i) \rangle \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, k$$

where $\xi_i$ were slack variables and $\nu \in [0, 1]$ represented the parameter that defined the solution by upper-bounding the fraction of outliers.
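For illustration, the one-class scheme can be sketched with scikit-learn's `OneClassSVM` (a Python stand-in for the study's R workflow; the data here are synthetic placeholders, not plant records).

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train on anomaly-free data only, then classify test points
# as +1 (normal) or -1 (abnormal).
rng = np.random.default_rng(2)
train = rng.normal(0, 1, size=(300, 2))              # anomaly-free training set
test = np.vstack([rng.normal(0, 1, size=(20, 2)),    # normal-like points
                  rng.normal(8, 0.5, size=(5, 2))])  # far-away anomalies

clf = OneClassSVM(kernel="rbf", nu=0.01, gamma=0.1).fit(train)
labels = clf.predict(test)                           # +1 inlier, -1 outlier
print(labels[-5:])                                   # distant points should be flagged -1
```

Here `nu` upper-bounds the fraction of training points treated as outliers, matching the $\nu$ in the optimization problem above.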
For illustration purposes, a simplified SVM classification was plotted in Figure 2(b), where an SVM was built on only the first two PCs for IC classification. The diverging color reflected the model prediction. Normal conditions, shown as triangles, were densely centered around the red region at the bottom left. Anomalies, shown as circles, were generally outside the red region and scattered among the blue areas. The white band around zero delineated the decision boundaries formed by this model. The capability of SVM to tackle nonlinearity by forming nonlinear hyperplanes via kernel functions could be witnessed.

B. KPCA-OCSVM FOR ENHANCED NONLINEAR INFLUENT CONDITIONS (ICS) MONITORING
The proposed KPCA-based OCSVM approach for detecting abnormal changes in multivariate IC time series was briefly described in this section and illustrated in Figure 3. KPCA possessed the simplicity and capability to appropriately extract relevant features from multivariate time series data, and was therefore considered an efficient approach for capturing important features in high-dimensional nonlinear processes. Features (loadings on PCs) extracted from multivariate IC time series by KPCA could be offered to train OCSVM in an unsupervised manner, which would further tackle nonlinear and non-Gaussian properties flexibly, since it assumed no distribution for the underlying data. By introducing the radial basis kernel function into the projection of OCSVM, we could obtain nonlinear hyperplanes as decision boundaries to differentiate between normal observations and anomalies. In the proposed approach, we expected the KPCA to sufficiently learn and accurately describe the anomaly-free-only multivariate time series, the training IC dataset, and produce scores enabling the training process of OCSVM. Then the OCSVM would sensitively and specifically capture the presence of anomalies in the testing IC dataset. Several kernel functions were examined with different numbers of PCs in the KPCA step, while OCSVM was trained using the popular, flexible RBF kernel with fixed ν = 0.01 and γ = 0.1 in all scenarios. By connecting the two models, we hypothesized that the originally non-Gaussian, linearly inseparable influent conditions data could be treated well.
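The two-stage scheme can be sketched end-to-end as an illustrative Python pipeline with the stated OCSVM settings (ν = 0.01, γ = 0.1); the synthetic data and the choice of 10 polynomial kernel PCs are placeholders, not the plant's records.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
train = rng.normal(size=(400, 8))                     # anomaly-free ICs stand-in
test = np.vstack([rng.normal(size=(30, 8)),           # normal-like test points
                  rng.normal(6, 1, size=(10, 8))])    # injected anomalies

model = make_pipeline(
    StandardScaler(),                                     # scale variables
    KernelPCA(n_components=10, kernel="poly", degree=3),  # polynomial KPCA features
    OneClassSVM(kernel="rbf", nu=0.01, gamma=0.1),        # RBF OCSVM classifier
)
model.fit(train)                 # unsupervised: trained on anomaly-free data only
pred = model.predict(test)       # +1 normal, -1 anomaly
print((pred[-10:] == -1).mean())
```

Fitting on anomaly-free data only is what makes the scheme one-class: no labeled anomalies are needed for training.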

IV. RESULTS AND DISCUSSION

A. EXPLORATORY ANALYSIS AND VISUALIZATION OF A HISTORICAL MULTIVARIATE IC DATASET
To verify the hypothesis, a dataset from a full-scale WWTP was engaged for training and testing. In this dataframe, operators had kept seven-year records of twenty-one variables, including various flow quantity and water quality values. Measurements were performed on samples taken from the headworks of the WWTP in order to maintain compliance with local regulations and standards (Figure 4). The monitored plant was receiving municipal wastewater, which was representative of cases in the literature. Though maintained with laborious effort and due examination by local technicians, more than a hundred abnormal influent conditions went undetected. All those anomalies had caused negative effects on the process for various reasons, as reported by the practitioners, which made the classification and soft-sensing research both feasible and necessary. The R package Amelia was imported to impute the few missing data (132 out of 63950, less than 1%) during the preprocessing step [19]. To explore the dataset, descriptive analysis and visualization were conducted, during which RStudio with R packages (kernlab, cluster, factoextra, ggplot2, ggTimeSeries, gplots) was employed [20]-[26].
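The study used the R package Amelia for multiple imputation; as a simpler, hedged stand-in, sparse gaps like the <1% missing here can be filled by time interpolation in pandas. The dates and pH-like values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# A short daily series with two missing entries, standing in for one IC variable.
idx = pd.date_range("2011-01-01", periods=10, freq="D")
s = pd.Series([7.1, 7.2, np.nan, 7.0, 6.9, np.nan, 7.3, 7.2, 7.1, 7.0], index=idx)

# Time-aware linear interpolation fills each gap from its neighbors.
filled = s.interpolate(method="time")
print(filled.isna().sum())  # 0
```

Multiple imputation (as Amelia performs) additionally propagates the uncertainty of each fill, which simple interpolation does not; for sparse gaps the practical difference is small.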
To facilitate visualization, the anomaly-free-only dataset grouped by the clustering large applications (CLARA) algorithm was further examined together with dimension reduction (PCA), as shown in Figure 5(a) with unit aspect ratio. Normal ICs were clustered into five typical classes, by which similar ICs were grouped within each category and hence could be represented by their corresponding medoid. The ICs of each medoid and the average values of the anomalies were shown in Table 1. Clustered data were further submitted to PCA, via which scores on the first two PCs were plotted. As shown on the axes of Figure 5(a), the first two PCs from linear PCA accounted for only around 40% of the variation, implying the high dimensionality and complexity of the dataset. The envelope or outline of all 2401 normal ICs as a whole was far from elliptical, demonstrating the non-Gaussian characteristic. Though not separated well in the PCA plot, the fifth cluster was located distinctly from the others, which could be attributed to its high conductivity, total dissolved solids (TDS), magnesium hardness, and chloride content but low fat-oil-grease (FOG) and nitrate, as listed in Table 1.
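CLARA scales PAM (k-medoids) by repeating it on random subsamples; the core medoid-update step can be sketched as follows. This is an illustrative Python implementation, not the cluster R package used in the study, and it uses the simpler Voronoi-iteration variant rather than PAM's full swap search.

```python
import numpy as np

def pam(X, k, n_iter=20, seed=0):
    """Minimal k-medoids: assign points to the nearest medoid, then move
    each medoid to the member minimizing within-cluster distance.
    CLARA repeats this on random subsamples for scalability."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)               # nearest-medoid assignment
        new_medoids = []
        for c in range(k):
            members = np.where(labels == c)[0]
            within = D[np.ix_(members, members)].sum(axis=0)    # total distance to peers
            new_medoids.append(members[np.argmin(within)])      # best representative
        new_medoids = np.array(new_medoids)
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break                                               # converged
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

rng = np.random.default_rng(6)
blobs = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in ((0, 0), (10, 0), (0, 10))])
medoids, labels = pam(blobs, 3)
print(len(np.unique(medoids)))
```

Unlike k-means centroids, each medoid is an actual observation, which is why Table 1 can list real ICs as cluster representatives.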
To investigate temporal behavior, the calendar plot in Figure 5(b) was employed, where entries from the ICs were sequentially vertically aligned, circulated by weeks and months, and colored according to their clusters. Interestingly, the clusters showed temporally agglomerative behavior. The first cluster was popular from 2011 to 2013, especially from June to October. Later on, the second cluster replaced the first from 2014 to 2016; both of the above represented typical cases in summer. The third cluster, present from 2012 to 2017 and primarily from November to April, could represent cases in winter. The fourth cluster was intensively concentrated in 2011 from February to June and surrounded most of the anomalies, with the least inflow compared to the others. At the beginning of 2012, there was a gathering of consecutive anomalies that were recorded as 'total alkalinity over the limit' by the operators. Such agglomerative phenomena (i.e., the lack of complete randomness in different variables on different dates) reflected the existence of non-Gaussian, non-stationary, auto-correlated (seasonally recurrent), cross-correlated, and heteroskedastic characteristics in environmental datasets.
To summarize ICs as univariate series, the wastewater quality index (WWQI) given by [27] was introduced, in which higher values represented better quality. A calendar plot of WWQI was displayed in Figure 6. The WWQI of the medoids and the averaged value of the anomalies were listed in Table 1. A significantly low WWQI region was observed in 2011, coinciding with the fourth cluster, whose medoid had a WWQI even lower than that of the anomalies. The fifth medoid showed the highest WWQI. ICs in summer (first and second clusters) possessed higher WWQI than ICs in winter (third cluster); probably, higher water usage in summer inflated the inflow and diluted its pollutant contents to some extent. The methods above supported efficient analysis of ICs and yielded concise information. Moreover, practitioners could gain experience by understanding such visualizations. They would be vigilant during the winter or when the second cluster appeared, since the conditional probability of anomaly was higher in those cases.
By further examining Table 1, it was exhibited that anomalies on average had smaller WWQI and lower temperature, as expected, since abnormal events were observed more in winter and were related to worse influent qualities. Moreover, they were reported to exhibit high pH, conductivity, TDS, calcium hardness, magnesium hardness, total alkalinity, nitrate, phosphate, chloride, and boron contents. Such phenomena coincided with reports from local operators that the majority of anomalous events occurring in this western-coastal-desert municipal WWTP were brackish/saline water intrusions, whether from groundwater surrounding the lift pump station, from upstream desalination plant emergency discharge, or from rare rainfall streamflow.
To show relationships among measured variables and temporal records, a doubly hierarchically clustered heatmap with a density plot, based on quarterly averages, was delineated in Figure 7. Z scores were values normalized by row averages and summarized as a probability density distribution via kernel density estimation, as shown in the density plot. Though quarterly averages were taken for smoothing, summary, and visualization, an asymmetric unimodal distribution with positive skewness and positive kurtosis was evidently observed. This point coincided with the previous PCA plot, implying the extreme non-Gaussian property of this environmental dataset. Temporal records were reordered in columns by clusters. Similar to the CLARA results, the unique year 2011 was isolated from the others. Measured variables were reordered in rows by clusters. Similar palette composition across rows signified positive correlations, and the reverse negative ones. The inflow from lift station one clearly contributed the majority of the total inflow. The inflow from lift station eight (connected to a desalination plant) moved negatively with chloride, chemical oxygen demand (COD), biochemical oxygen demand (BOD), and alkalinity, but its trend was similar to that of boron contents. This showed the utterly different compositions of industrial discharge and municipal wastewater in our case. The inflow from inside the WWTP ('Inflow DP') tracked hardness and total suspended solids (TSS) more closely, probably due to chemicals dosed for membrane cleaning and sludge processing events. Temperature moved negatively with almost all water quality variables but positively with the major municipal flow quantity variables, which confirmed previous statements on seasonal wastewater qualities and flow quantities. The overall agglomerative phenomenon demonstrated the nonlinear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, and heteroskedastic characteristics. Those empirical historical observations would challenge traditional regression techniques, which typically require linear, independent (no autocorrelation or multicollinearity), homoskedastic, and Gaussian data. Therefore, versatile preprocessing treatments of environmental data, such as kernel techniques, could be promising.
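The preprocessing behind the double hierarchical clustering in Figure 7 can be sketched as follows (illustrative Python with synthetic placeholders, not the plant's records or the gplots R code): quarterly averages are standardized to Z scores by row, then variables (rows) and quarters (columns) are reordered by hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(5)
quarterly = rng.normal(size=(21, 28))                     # 21 variables x 28 quarters
z = (quarterly - quarterly.mean(axis=1, keepdims=True)) \
    / quarterly.std(axis=1, keepdims=True)                # row-wise Z scores

row_order = leaves_list(linkage(z, method="average"))     # cluster variables (rows)
col_order = leaves_list(linkage(z.T, method="average"))   # cluster quarters (columns)
heat = z[np.ix_(row_order, col_order)]                    # reordered heatmap matrix
print(heat.shape)
```

Reordering both axes is what makes correlated variables appear as adjacent rows with similar palettes, as described for Figure 7.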

B. DETECTION PERFORMANCE OF THE PROPOSED KPCA-OCSVM APPROACH
To quantitatively assess the detection efficiency of the proposed procedures, the following metrics were employed: true-positive rate (TPR, or recall), false-positive rate (FPR), area under the receiver operating characteristic curve (AUC), accuracy, precision, and F1 score. A confusion matrix with equations for the frequently visited metrics in assessing classification algorithm performance was summarized in Figure 8. Detection results were delineated as time series in Figure 9. Derived performance metrics were computed and listed in Table 2. Previous research on PCA-based methods (Table 3) was revisited for comparison [11]. Since we previously defined normal/abnormal as +1/-1, they were consistently taken here as the positive/negative classes. For the series in Figure 9, true positives were shown as green circles, representing normal ICs that were classified normal, the trivial cases. False negatives were shown as blue down-triangles, representing normal ICs that were classified as abnormal, raising false alarms that would cause unnecessary operating costs. True negatives were shown as yellow diamonds, representing successfully captured abnormalities. False positives were shown as red squares, representing missed detections, which were the most dangerous and might lead to system failures. Therefore, from the model evaluation aspect, we expected no red squares and fewer blue triangles, or, as expressed numerically in Table 2, minimized FPR and higher TPR, AUC, accuracy, precision, and F1 score.
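The metrics in Figure 8 can be computed directly from the four confusion-matrix counts; note this paper's convention that normal (+1) is the positive class. The following is an illustrative Python sketch with invented labels.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics with normal = +1 as the positive class
    and abnormal = -1 as the negative class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))     # normal classified normal
    fn = np.sum((y_true == 1) & (y_pred == -1))    # false alarm on a normal IC
    tn = np.sum((y_true == -1) & (y_pred == -1))   # captured abnormality
    fp = np.sum((y_true == -1) & (y_pred == 1))    # missed anomaly (most dangerous)
    tpr = tp / (tp + fn)                           # recall
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)   # harmonic mean of precision and TPR
    return dict(TPR=tpr, FPR=fpr, accuracy=accuracy, precision=precision, F1=f1)

m = classification_metrics([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
print(m)
```

With this convention, minimizing FPR corresponds to eliminating the red squares (missed detections) of Figure 9.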
In Table 2, various kernels for KPCA were compared with increasing numbers of PCs. With more PCs, the anomaly-free training dataset would be learned and approximated more precisely. However, excessive PCs would over-express the original dataset and lead to an ill-structured classification problem for the later OCSVM processing. Thereby, with rising counts of PCs, generally both TPR and FPR would rise while precision would fall. AUC, as a weighted combination of TPR and FPR, together with the F1 score as the harmonic mean of TPR and precision, showed optimal maximum values during tuning. With polynomial KPCA, perfect detections were displayed with around 10 PCs, showing stronger abilities to learn the training dataset. Cosine KPCA showed comparable performance with 15 PCs. RBF KPCAs were similarly poor with fewer PCs but functioned better with more, which would involve intensive computation or potential overfitting and thus were not suitable for online process monitoring or knowledge sharing among different WWTP ICs.
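The sweep over PC counts behind Table 2 can be sketched as a simple tuning loop (a Python stand-in with synthetic labeled data; the plant's records and the full kernel grid are not reproduced here).

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
train = rng.normal(size=(300, 8))                       # anomaly-free training set
test = np.vstack([rng.normal(size=(40, 8)),             # normal test points
                  rng.normal(5, 1, size=(10, 8))])      # injected anomalies
y_true = np.r_[np.ones(40), -np.ones(10)]               # +1 normal, -1 anomaly

for n_pc in (5, 10, 15):
    kpca = KernelPCA(n_components=n_pc, kernel="poly").fit(train)
    ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma=0.1).fit(kpca.transform(train))
    score = ocsvm.decision_function(kpca.transform(test))  # higher = more normal
    print(n_pc, round(roc_auc_score(y_true, score), 3))
```

Scoring by AUC rather than raw accuracy makes the TPR/FPR trade-off visible as the PC count changes, as discussed above.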
By comparing Table 2 with Table 3, we could conclude that the nonlinear attributes of the dataset were generally better learned by the proposed kernel approaches than by directly applying traditional PCA-based methods. Though KNN was applied as a local lazy learning algorithm in PCA-KNN to tackle nonlinear behavior and outperformed the other counterparts, the outputs from PCA did not sufficiently reveal the nature of the data. Probably the data were nonlinear to an extent better explained by polynomial KPCA and summarized by RBF-based OCSVM than by linear PCA, but not "too" nonlinear, in which case applying RBF in KPCA would have led to undermined outcomes.
With proper training adopting local data, the proposed data-driven KPCA-OCSVM models could sufficiently learn and approximate nonlinear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, and heteroskedastic features from anomaly-free local WWTP IC data by KPCA, then effectively identify real anomalies from the test dataset with competitive performance by OCSVM, in contrast to previous linear PCA-KNN algorithms. Kernel techniques successfully improved model outcomes with a limited increase in computational cost or model complexity. Without assuming the underlying data structure, the investigated flexible environmental data science approach could be transferred, rebuilt, and tuned for ICs from different WWTPs.

V. CONCLUSION
To operate wastewater treatment plants with optimized efficiency, influent conditions, as the initial states of the inflow fed to WWTPs, were monitored to identify potential anomalies that would trigger adverse events or system crashes. To involve voluminous IC data in data-driven decisions, the nonlinear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, and heteroskedastic nature of environmental datasets must be considered. This research introduced kernel machine learning models, KPCA-OCSVM with various kernels, to learn an anomaly-free training set and then classify the testing set of ICs. Exploratory analysis with data visualization was performed to reveal the temporal behaviors and statistical properties of the multivariate IC time series. KPCA output sufficiently representative features, based on which OCSVM sensitively and specifically identified anomalies in ICs that were previously omitted by operators. The proposed kernel algorithms surpassed previous linear PCA-based KNN models and improved outcomes with a limited increase in computational cost. Without requiring linear, Gaussian, stationary, independent, and homoskedastic qualities from the data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs. Future research would further investigate data from the full-scale wastewater treatment process and apply kernel machine learning techniques for effective whole-process monitoring.

FIGURE 2. (a) One-class support vector machine (OCSVM). The hyperplane was formed amid two classes, with maximized distance from the origin; (b) Simplified SVM for illustration, predicting anomalies from only the first two PC scores, trained with radial basis function (RBF) kernels. Support vectors functionally affecting the decision boundaries were marked solid, while trivial ones were hollow.

FIGURE 4. (a) Overview of the wastewater treatment plant; (b) Fine screen inside the headworks, where influent was filtered.

FIGURE 5. (a) Scatterplot of the dataset via PCA and the clustering large applications (CLARA) algorithm; (b) Calendar plot of the temporal CLARA result.

FIGURE 6. Calendar plot of the temporal wastewater quality index.

FIGURE 8. Confusion matrix and associated performance metrics in this study.

TABLE 2. Anomaly detection performance of KPCA-OCSVM algorithms with various kernel functions and different numbers of selected PCs in the KPCA.

TABLE 3. Anomaly detection performance of PCA-based algorithms from prior research. PCA reconstructions formed statistics including univariate residuals, squared prediction error (SPE), T², and k-nearest-neighbor distances (Euclidean or Manhattan), for which parametric/nonparametric thresholds were set to detect anomalies.