Detecting Anomaly in Chemical Sensors via L1-Kernels based Principal Component Analysis

We propose a kernel-PCA-based method to detect anomalies in chemical sensors. We use the temporal signals produced by chemical sensors to form vectors for Principal Component Analysis (PCA). We estimate the kernel covariance matrix of the sensor data and compute the eigenvector corresponding to the largest eigenvalue of this matrix. Anomalies are detected by comparing the actual sensor data with the data reconstructed from the dominant eigenvector. In this paper, we introduce a new multiplication-free kernel, related to the ℓ1-norm, for the anomaly detection task. The ℓ1-kernel PCA is not only computationally efficient but also energy-efficient, because it does not require any actual multiplications to compute the kernel covariance matrix. Our experimental results show that our kernel-PCA method achieves a higher area under the curve (AUC) score (0.7483) than the baseline regular PCA method (0.7366).


I. INTRODUCTION
Chemical sensors are widely used to detect ammonia, methane, and other volatile organic compounds (VOCs) [1]-[3]. The lifetime and performance of chemical gas sensors can be affected by various factors, including temperature, humidity, interfering gases, and physical factors. Anomalous sensors can produce drifting waveforms, which is a critical problem for reliable gas identification and concentration estimation [4], [5]. In this work, we identify anomalous sensors and sensor measurements in an array of uncalibrated sensors by using robust ℓ1-Principal Component Analysis (PCA) without reference time-series data.
PCA has been used to detect anomalous sensors and sensor signals [6]-[8]. In this approach, the covariance matrix is constructed from a set of data vectors, and the anomalous items (outliers) are found by using the reconstruction difference [9], [10]. The principal components of the data covariance matrix are computed, and the original data vectors are reconstructed using only the first few principal components. In general, the reconstructed data vectors are similar to the original ones, and the items whose reconstructions differ from the corresponding original items are considered anomalous.
In this paper, we propose to use the ℓ1-kernel PCA, based on a multiplication-free kernel, to detect anomalies in chemical sensors. Although the conventional PCA, which is based on the ℓ2-norm, has successfully solved many problems, it is sensitive to outliers in the data because the ℓ2-norm over-amplifies their effect. Recently, it has been shown that ℓ1-norm-based methods produce better results than ℓ2-norm-based methods in several real-world signal, image, and video processing problems. In particular, the ℓ1-kernel PCA is usually more robust against outliers in the data than the ℓ2-PCA [11]. In the ordinary ℓ2-PCA, the principal vector is calculated as the dominant eigenvector of the data covariance matrix, which itself is calculated using standard outer products. In this paper, we use the eigendecomposition of the ℓ1-kernel covariance matrix, which is obtained using a new vector product that induces the ℓ1-norm without performing any multiplications. Because of its low computational complexity, the ℓ1-kernel PCA can be implemented on edge devices directly connected to the chemical sensors.
The recursive ℓ1-PCA [13] and the efficient ℓ1-PCA via bit flipping [14] return the same result, but the former takes exponential time while the latter takes polynomial time. Both ℓ1-PCA methods [13], [14] require some parameters to be properly adjusted and rely on recursive procedures. In contrast, the proposed ℓ1-kernel PCA approach does not require any hyperparameter tuning. Its implementation is as straightforward as that of the regular PCA: we construct a sample kernel covariance matrix using the proposed ℓ1-kernel and obtain its eigenvalues and eigenvectors to define the linear transformation, instead of solving an optimization problem.
The rest of the paper is organized as follows: In Section II, we formally introduce the ℓ 1 -kernel PCA and describe its application in an anomalous chemical sensor detection task. In Section III, we compare our method with the regular PCA, the recursive ℓ 1 -PCA [13] and the efficient ℓ 1 -PCA via bit flipping [14]. Finally, in Section IV, we draw our main conclusions.

II. ℓ1-KERNEL PCA
In our recent work [11], [15], we proposed a set of kernel-based PCA methods related to the ℓ1-norm. These Mercer-type kernels are obtained from multiplication-free (MF) dot products.
Let $\mathbf{w} = [w_1 \cdots w_L]^T \in \mathbb{R}^{L\times 1}$ and $\mathbf{x} = [x_1 \cdots x_L]^T \in \mathbb{R}^{L\times 1}$ be two $L$-dimensional column vectors. Similar to the regular dot product of vectors, we defined the multiplication-free (MF) vector product in [11]. In vector data correlation operations, we use the following vector product:

$$\mathbf{w}^T \odot \mathbf{x} := \sum_{i=1}^{L} \operatorname{sign}(w_i \times x_i)\,\min(|w_i|, |x_i|), \tag{1}$$

which turns out to be a Mercer-type kernel: $k(\mathbf{w}, \mathbf{x}) = \mathbf{w}^T \odot \mathbf{x}$. In Eq. (1), $\operatorname{sign}(w_i \times x_i)$ can be computed without performing any actual multiplication, and the min operation can be implemented by a subtraction followed by a check of the sign of the result. For this reason, we call Eq. (1) a multiplication-free (MF) dot product. The dot product defined in Eq. (1) induces the ℓ1-norm, since $\mathbf{x}^T \odot \mathbf{x} = \sum_{i=1}^{L} \min(|x_i|, |x_i|) = \|\mathbf{x}\|_1$, and it induces a Mercer-type kernel [11].
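As a rough functional model, the MF product of Eq. (1) fits in a few lines of NumPy. This is a sketch under our own assumptions: the name `mf_dot` is hypothetical, and the vectorized arithmetic only reproduces the value of the product; it does not model a multiplication-free hardware datapath.

```python
import numpy as np


def mf_dot(w, x) -> float:
    """MF vector product: sum_i sign(w_i * x_i) * min(|w_i|, |x_i|)."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    # sign(w_i * x_i) equals sign(w_i) * sign(x_i); for nonzero operands a
    # hardware version only needs the XOR of the two sign bits.
    s = np.sign(w) * np.sign(x)
    # min(|w_i|, |x_i|): in hardware, a subtraction followed by a sign check.
    m = np.minimum(np.abs(w), np.abs(x))
    # Applying s in {-1, 0, +1} to m is a sign flip (or zeroing), not a
    # general multiplication.
    return float(np.sum(s * m))


x = np.array([1.0, -2.0, 3.0])
w = np.array([2.0, -1.0, -0.5])
print(mf_dot(x, x))  # 6.0, the l1-norm of x
print(mf_dot(w, x))  # 1.0 + 1.0 - 0.5 = 1.5
```

The first call checks the property $\mathbf{x}^T \odot \mathbf{x} = \|\mathbf{x}\|_1$ stated above.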
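Suppose that we collect $K$ vectors $\mathbf{x}_1, \ldots, \mathbf{x}_K \in \mathbb{R}^{L\times 1}$ of sensor data and form a dataset $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_K]^T \in \mathbb{R}^{K\times L}$. The well-known ℓ2-PCA (regular PCA) method relies on the eigendecomposition of the sample covariance matrix $\mathbf{C} = \mathbf{X}\mathbf{X}^T$, whose $(i,j)$-th entry is $\mathbf{x}_i^T\mathbf{x}_j$. Similarly, we estimate the kernel covariance matrix as follows:

$$\mathbf{A} = \mathbf{X} \odot \mathbf{X}^T, \qquad A_{i,j} = \mathbf{x}_i^T \odot \mathbf{x}_j, \tag{2}$$

where the matrix $\mathbf{A}$ is constructed using the dot products of the form $\mathbf{x}_i^T \odot \mathbf{x}_j$. As a result, the construction of the kernel covariance matrix $\mathbf{A}$ is straightforward. We call the kernel PCA based on the vector product in Eq. (1) the ℓ1-kernel PCA.

arXiv:2201.02709v2 [eess.SP] 28 Sep 2022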
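The kernel covariance matrix evaluates Eq. (1) for every pair of data vectors. The sketch below assumes the $K$ vectors are stored as the rows of a NumPy array; `mf_gram` is a hypothetical helper name, and the broadcasting trades $O(K^2L)$ memory for brevity, which is harmless for the small $K$ used in this letter.

```python
import numpy as np


def mf_gram(X) -> np.ndarray:
    """l1-kernel covariance matrix: A[i, j] = x_i^T (MF) x_j,
    where x_i^T is the i-th row of X (shape K x L)."""
    X = np.asarray(X, dtype=float)
    S, M = np.sign(X), np.abs(X)
    # Pair every row i with every row j: (K, 1, L) against (1, K, L) -> (K, K, L).
    signs = S[:, None, :] * S[None, :, :]
    mins = np.minimum(M[:, None, :], M[None, :, :])
    return (signs * mins).sum(axis=2)


X = np.array([[1.0, -2.0, 3.0],
              [0.5, 1.0, -1.0]])
A = mf_gram(X)  # l1-kernel covariance matrix (K x K)
C = X @ X.T     # regular covariance matrix X X^T, for comparison
```

Both `A` and `C` are symmetric $K \times K$ matrices, so the same symmetric eigensolver (for example `numpy.linalg.eigh`) applies to either; the diagonal of `A` holds the ℓ1-norms of the rows.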

A. Anomaly Detection Using ℓ1-Kernel PCA
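We first assume that there are $K$ sensors and some of them are anomalous. The sensors are assumed to be close to each other, so they produce correlated output waveforms, as shown in Fig. 1. For each time segment, we form the measurement data matrix $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_K]^T \in \mathbb{R}^{K\times L}$, where $\mathbf{x}_k \in \mathbb{R}^{L\times 1}$ holds the $L$ samples of the $k$-th sensor in that segment. The measurement data is generated by normalizing the raw sensor measurement values to [−1, 1] and subtracting the mean. We construct the kernel covariance matrix $\mathbf{A} = \mathbf{X} \odot \mathbf{X}^T \in \mathbb{R}^{K\times K}$ and calculate its eigenvectors. Let $\mathbf{v}_1 \in \mathbb{R}^{K\times 1}$ be the dominant principal component vector. Finally, we reconstruct the data segment using the vector $\mathbf{v}_1$:

$$\hat{\mathbf{X}} = [\hat{\mathbf{x}}_1 \cdots \hat{\mathbf{x}}_K]^T = \mathbf{v}_1\mathbf{v}_1^T\mathbf{X}, \tag{3}$$

and compute the error vector $\mathbf{x}_k - \hat{\mathbf{x}}_k$ for each sensor. The measurement of the $k$-th sensor in a segment is considered anomalous if the Cumulative Squared Difference (CSD) between $\hat{\mathbf{x}}_k$ and $\mathbf{x}_k$, $\mathrm{CSD}_k = \|\mathbf{x}_k - \hat{\mathbf{x}}_k\|_2^2$, is larger than a threshold $T$. The threshold can be set as $T = \mu + \alpha\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the CSD values learned from a training data set. The parameter $\alpha$ is usually set to 3 under the assumption that the CSD values are Gaussian. If a sensor produces successive anomalous measurement vectors, it is considered to be anomalous.

In the second case, we assume that we only have a single sensor. For example, Sensor 3 produces impulsive spikes between 0 and 1200 s, as shown in Fig. 1a (in orange). Similar to PCA-based denoising methods [12], [16], we form data vectors from neighboring temporal data windows and form the measurement data matrix $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_K]^T \in \mathbb{R}^{K\times L}$, where $K$ is the number of data windows and each window contains $L$ samples. We form the kernel covariance matrix $\mathbf{A} = \mathbf{X} \odot \mathbf{X}^T \in \mathbb{R}^{K\times K}$ and compute its eigenvalues and eigenvectors $\mathbf{v}_k \in \mathbb{R}^{K\times 1}$. We reconstruct the data matrix as $\hat{\mathbf{X}} = \mathbf{V}\mathbf{V}^T\mathbf{X}$ using the first $M$ eigenvectors $\mathbf{V} = [\mathbf{v}_1 \cdots \mathbf{v}_M]$, where $\mathbf{v}_1$ is the eigenvector corresponding to the largest eigenvalue. After this step, we compare the actual data vectors $\mathbf{x}_k$ with the reconstructed ones $\hat{\mathbf{x}}_k$, $k = 1, 2, \ldots, K$. The vectors that differ significantly from their reconstructions are considered to be anomalous. We set $M = 1$ and observed that it is sufficient for anomaly detection.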
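The multi-sensor procedure can be sketched end to end as follows. Several details are our own assumptions: normalization is applied per sensor (min-max scaling to [−1, 1], then mean removal), $\sigma$ is the population standard deviation, and all function names are hypothetical. The `kernel="l2"` branch gives the regular-PCA baseline for comparison. The toy segment has two sensors sharing one waveform and a third that behaves differently.

```python
import numpy as np


def mf_gram(X):
    """l1-kernel covariance: A[i, j] = sum_n sign(X[i,n] X[j,n]) min(|X[i,n]|, |X[j,n]|)."""
    S, M = np.sign(X), np.abs(X)
    return (S[:, None, :] * S[None, :, :] * np.minimum(M[:, None, :], M[None, :, :])).sum(axis=2)


def normalize_rows(raw):
    """Assumption: each sensor (row) is min-max scaled to [-1, 1], then its mean is removed."""
    raw = np.asarray(raw, dtype=float)
    lo = raw.min(axis=1, keepdims=True)
    hi = raw.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant rows
    scaled = 2.0 * (raw - lo) / span - 1.0
    return scaled - scaled.mean(axis=1, keepdims=True)


def segment_csd(X, kernel="l1"):
    """CSD_k = ||x_k - x_hat_k||_2^2 per row, with the rank-1 reconstruction X_hat = v1 v1^T X."""
    A = mf_gram(X) if kernel == "l1" else X @ X.T
    _, eigvecs = np.linalg.eigh(A)  # eigenvalues come back in ascending order
    v1 = eigvecs[:, -1:]            # dominant eigenvector, shape (K, 1)
    X_hat = v1 @ (v1.T @ X)
    return ((X - X_hat) ** 2).sum(axis=1)


def csd_threshold(train_csd, alpha=3.0):
    """T = mu + alpha * sigma, learned from training CSD values."""
    train_csd = np.asarray(train_csd, dtype=float)
    return train_csd.mean() + alpha * train_csd.std()


# Toy segment: sensors 1 and 2 share one waveform; sensor 3 behaves differently.
t = np.linspace(0.0, 1.0, 125)
raw = np.vstack([1.0 + 0.5 * np.sin(2 * np.pi * t),
                 3.0 + 2.0 * np.sin(2 * np.pi * t),
                 2.0 + np.cos(2 * np.pi * t)])
X = normalize_rows(raw)
csd = segment_csd(X)  # the third entry dominates
```

In use, each sensor's CSD in a new segment would be compared against `csd_threshold(...)` computed from CSD values of a training set.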
Complexity Analysis: To calculate the covariance matrix $\mathbf{C}$, we perform $K^2L$ multiplications and $K^2(L-1)$ additions, because each of the $K^2$ dot products requires $L$ multiplications. On the other hand, to calculate the ℓ1-kernel covariance matrix $\mathbf{A}$, we perform $K^2L$ sign operations, $K^2L$ min operations, and $K^2(L-1)$ additions. According to Table I in [17], a multiplication consumes about 4 times more energy than an MF operation in a compute-in-memory (CIM) implementation at a 1 GHz operating frequency. In this letter, we use only three sensors and collect the data with an Arduino, so the energy savings are not significant here; they become significant in a large sensor network with dedicated hardware. Since $K$ = 3 or 5 is much smaller than the vector length $L$ = 125 or 224, the eigenvector computation is negligible compared to the covariance matrix construction in this task. As a result, the ℓ1-kernel PCA is about 4 times more energy efficient in a CIM implementation. It is also more energy efficient on many other processors, because multiplications consume more energy than additions and subtractions [18].
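The operation counts above can be tabulated directly. In this small sketch the helper name `covariance_op_counts` is hypothetical; like the text, it counts all $K^2$ entries and does not exploit the symmetry of $\mathbf{C}$ and $\mathbf{A}$, which would roughly halve every count. We assume the two settings pair as $K = 3$, $L = 125$ (three sensors) and $K = 5$, $L = 224$ (five windows).

```python
def covariance_op_counts(K: int, L: int) -> dict:
    """Operations needed to build a K x K (kernel) covariance matrix from K vectors of length L."""
    entries = K * K  # every (i, j) pair; symmetry is not exploited
    return {
        "l2_multiplications": entries * L,
        "l2_additions": entries * (L - 1),
        "l1_sign_ops": entries * L,
        "l1_min_ops": entries * L,
        "l1_additions": entries * (L - 1),
    }


for K, L in [(3, 125), (5, 224)]:
    ops = covariance_op_counts(K, L)
    print(K, L, ops["l2_multiplications"], ops["l1_sign_ops"], ops["l1_additions"])
# 3 125 1125 1125 1116
# 5 224 5600 5600 5575
```

For comparison, a dense eigendecomposition of a $K \times K$ matrix costs on the order of $K^3$ operations (27 for $K = 3$), which is why that step is negligible here.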
Contribution: In our recent work [11], we introduced the ℓ1-kernel PCA; this paper introduces a novel method for applying the ℓ1-kernel PCA to the anomaly detection problem. Our experiments show that, in the anomaly detection task, the ℓ1-kernel PCA produces better results than the regular PCA, the recursive ℓ1-PCA [13], and the efficient ℓ1-PCA via bit flipping [14] on our dataset obtained from chemical sensors.

III. EXPERIMENTAL RESULTS
We collected data using three MQ137 tin oxide (SnO2)-based ammonia sensors [19]. The sensors are connected to an Arduino Uno board, and the sampling rate is 2 samples per second. The sensors and a cylindrical ammonia source are placed in an airtight chamber. The sensors are pre-heated for 48 hours before data collection. When SnO2 is heated and exposed to air, it reacts with the oxygen in the air, forming a layer of negative ions on the surface and reducing the surface conductivity [19]. When ammonia vapor comes into contact with the surface, it combines with the oxide-ion layer on top and releases electrons for conduction. As a result, the conductivity of the surface increases. This change in surface resistance can be measured as a voltage. Our experimental setup is shown in Fig. 2.

Multiple Sensor Anomaly Detection: The three sensors are placed close to each other, and one of them (Sensor 2) is obstructed by a cylindrical cover with multiple holes. The cover causes this sensor to react more slowly than the other sensors to the build-up and release of ammonia. The obstruction level of the outlier sensor is adjusted in each trial to avoid over-fitting to one condition. Moreover, to create a more realistic environment with varying ammonia concentrations, the chamber lid is opened at random intervals and for random durations. Opening and closing the lid is repeated multiple times to create different rise and fall responses. Sensor waveforms from two experiments are shown in Fig. 1.

Table 1: CSDs of sample data 1 in Fig. 1a.

Table 2: CSDs of sample data 2 in Fig. 1b using regular PCA. Sensor 2 is obstructed. Values larger than the threshold 21.60 are in bold.

We apply the ℓ1-kernel PCA-based anomaly detection method described in Section II-A to the data obtained from the three sensors. We compare the proposed method with the regular ℓ2-PCA, the recursive ℓ1-PCA [13], and the efficient ℓ1-PCA via bit flipping [14]. The latter two compute

$$\mathbf{v}_1 = \arg\max_{\mathbf{v}:\,\|\mathbf{v}\|_2 = 1} \sum_{n=1}^{L} \left|\mathbf{v}^T\mathbf{X}_{:,n}\right| = \arg\max_{\mathbf{v}:\,\|\mathbf{v}\|_2 = 1} \|\mathbf{X}^T\mathbf{v}\|_1,$$

where $\mathbf{X}_{:,n} \in \mathbb{R}^{K\times 1}$ is the $n$-th column of $\mathbf{X}$. The tolerance parameter of the recursive ℓ1-PCA method is set to $1 \times 10^{-8}$, as suggested by the authors of [13]. We plot the Receiver Operating Characteristic (ROC) curves in Fig. 3 and compute the Area Under the Curve (AUC) score for each method. As Table 3 shows, the ℓ1-kernel PCA provides the highest AUC score, and the recursive ℓ1-PCA [13] provides the lowest AUC score in this case. This is probably due to the non-convex optimization method used to compute the principal vector, which requires a suitable tolerance parameter. On the other hand, the ℓ1-kernel PCA does not need any tolerance parameter, and its eigenvector computation has the same computational load as that of the regular PCA.
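To make the objective of the two ℓ1-PCA baselines concrete, the sketch below solves it exactly for tiny problems by exhaustive search. It is not the recursive method of [13] or the bit-flipping method of [14]; it relies on the known identity $\max_{\|\mathbf{v}\|_2=1}\sum_n|\mathbf{v}^T\mathbf{y}_n| = \max_{\mathbf{b}\in\{\pm1\}^N}\|\mathbf{Y}\mathbf{b}\|_2$, attained at $\mathbf{v} = \mathbf{Y}\mathbf{b}/\|\mathbf{Y}\mathbf{b}\|_2$. Its cost grows as $2^N$, so it only works for a handful of columns (not for $L = 125$); the function name and toy data are hypothetical.

```python
import itertools

import numpy as np


def l1_pca_rank1_exhaustive(Y):
    """Exact rank-1 l1-PCA: maximize sum_n |v^T y_n| over unit vectors v.

    Y has shape (K, N); its columns y_n are the data points. Uses
    max_v sum_n |v^T y_n| = max_{b in {+1,-1}^N} ||Y b||_2, with v = Y b / ||Y b||_2.
    Exponential in N: for illustration on tiny inputs only.
    """
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[1]
    best_val, best_u = -1.0, np.zeros(Y.shape[0])
    # Fix b_0 = +1, since b and -b give the same norm.
    for tail in itertools.product((1.0, -1.0), repeat=N - 1):
        b = np.array((1.0,) + tail)
        u = Y @ b
        val = float(np.linalg.norm(u))
        if val > best_val:
            best_val, best_u = val, u
    v = best_u / best_val if best_val > 0 else best_u
    return v, best_val


Y = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, 1.0, 3.0, -2.0]])
v1, objective = l1_pca_rank1_exhaustive(Y)
```

The returned `objective` equals $\sum_n |\mathbf{v}_1^T\mathbf{y}_n|$, and no other unit vector achieves a larger value.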
The CSD values of the sensors' responses in Fig. 1a obtained with the different PCA methods are listed in Table 1, and those of the responses in Fig. 1b are listed in Table 2. […] Tables 1 and 2 will not change. Sensor 2 does not always exhibit anomalous behavior. In general, its response increases due to ammonia exposure but does not decrease as fast as those of the other sensors when there is no gas, as shown in Fig. 1. The regular PCA detects the anomalous behavior of Sensor 2 in 9 out of 28 data segments. On the other hand, the ℓ1-PCA via bit flipping and our ℓ1-kernel PCA detect it in 10 out of 28 data segments. Moreover, the regular PCA and the ℓ1-PCA via bit flipping produce a false alarm in the second data segment (250–499 s) of Sensor 1, as shown in Table 2, while our ℓ1-kernel PCA avoids this false alarm. In conclusion, the ℓ1-kernel PCA produces better results than the regular PCA, the recursive ℓ1-PCA [13], and the ℓ1-PCA via bit flipping [14] on our data.
Anomaly Detection Using a Single Sensor: During the first three ammonia exposures, Sensor 3 responds to the gas, but it also produces spikes up to 1200 s, as shown in Fig. 1a. Multi-sensor PCA-based anomaly detection cannot detect this behavior because only Sensor 1 works properly before 1200 s. However, if we compare the current data window of Sensor 3 with its neighboring data windows, we can identify the anomalous behavior. We used K = 5 data segments to construct the 5 × 5 covariance and ℓ1-kernel covariance matrices. Each data segment contains L = 224 measurements. We used only the dominant eigenvector to estimate the data segments.
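The single-sensor variant differs only in how $\mathbf{X}$ is built: its rows are consecutive windows of one sensor's signal. A minimal sketch with $K = 5$ and $L = 224$ as above, assuming non-overlapping, back-to-back windows and no normalization step; the function name and the synthetic signal (a periodic waveform with three impulsive spikes injected into the first window) are hypothetical.

```python
import numpy as np


def mf_gram(X):
    """l1-kernel covariance matrix of the rows of X."""
    S, M = np.sign(X), np.abs(X)
    return (S[:, None, :] * S[None, :, :] * np.minimum(M[:, None, :], M[None, :, :])).sum(axis=2)


def window_csd(signal, K=5, L=224):
    """Split one sensor's signal into K windows of L samples and return each window's CSD
    with respect to its rank-1 (M = 1) l1-kernel PCA reconstruction."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < K * L:
        raise ValueError("signal must contain at least K * L samples")
    X = signal[:K * L].reshape(K, L)  # row k = k-th window (assumed non-overlapping)
    _, eigvecs = np.linalg.eigh(mf_gram(X))
    v1 = eigvecs[:, -1:]              # dominant eigenvector only (M = 1)
    X_hat = v1 @ (v1.T @ X)           # V V^T X with V = [v1]
    return ((X - X_hat) ** 2).sum(axis=1)


n = np.arange(224)
clean = np.tile(np.sin(2 * np.pi * n / 224), 5)  # same waveform in every window
spiky = clean.copy()
spiky[[10, 60, 150]] += 3.0                       # impulsive spikes in window 0
csd = window_csd(spiky)
```

The spiked window stands out because the dominant eigenvector essentially averages the windows, so isolated spikes are not reproduced in the reconstruction.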
The waveforms reconstructed by the regular PCA, the ℓ1-PCA via bit flipping, and the ℓ1-kernel PCA do not have spikes, which is how we identify the anomaly in the sensor readings, as shown in Fig. 4. Table 4 lists the CSD values of these methods; all of them correctly identify the anomalous segments.

Table 4: CSD values of Sensor 3 during ammonia exposure. The first three segments contain impulsive spikes.

IV. CONCLUSION
In this paper, we presented a framework for detecting anomalous sensors and sensor measurements in a chemical sensing system using the ℓ1-kernel PCA. We collected data from three commercial tin oxide (SnO2) sensors by exposing them to ammonia in an environment-controlled experiment. The proposed ℓ1-kernel PCA was more robust than the regular PCA in our experiments. This is because the ℓ1-kernel PCA is related to the ℓ1-norm, so it gives less weight to anomalous spikes in the sensor measurements when computing the correlation matrix. The computational energy cost of the ℓ1-kernel PCA is lower than that of the regular PCA on many processors (about 4 times lower in a compute-in-memory implementation). Because of its low energy cost, the ℓ1-kernel PCA can be implemented on low-cost edge devices directly connected to the chemical sensors.