Anti-Drift in Electronic Nose via Dimensionality Reduction: A Discriminative Subspace Projection Approach

Sensor drift is a well-known issue that has plagued the sensor community for many years. In this paper, we propose a discriminative subspace projection approach for sensor drift reduction in electronic noses. In contrast to domain regularized component analysis (DRCA), the proposed method makes use of the class labels of the source-domain data. The proposed method has two main properties. (1) It inherits the merits of DRCA by introducing a regularization parameter to tackle the sample-size imbalance problem. (2) It takes the source label information into consideration, minimizing the within-class variance of the projected source samples while maximizing the between-class variance; the label information is thus exploited to avoid overlap between samples with different labels in the subspace. An efficient solver based on generalized eigenvalue decomposition is employed for the resulting optimization problem. Experiments on two sensor drift datasets show the effectiveness of the proposed approach.


Introduction
Electronic noses (E-noses) have been widely used in a wealth of domains, for instance, indoor and outdoor air quality monitoring [1,2], medical diagnosis [3,4], detection of polluting gases from vehicles [5,6], and fruit quality control [7]. An electronic nose device commonly comprises a sensor array, a conditioning circuit, and a signal processing electronic system [8]. In the past decades, tremendous efforts have been made to develop gas sensors based on different sensing principles [9]. For instance, a single generic tin oxide gas sensor was reported for the first time to discriminate among complex odors [10]. Meanwhile, sensor conditioning circuits have also been improved for better gas sensing. As an example, a compact and low-cost electronic circuit was developed by Flammini et al. [11]. This device supports a wide range of resistive values, which is key to the realization of an electronic nose.
As a gas sensing device, the E-nose suffers from sensor drift, a well-known problem in the field of sensors and measurement [12,13] that has plagued the sensor community for many years. Sensor drift can generally be divided into two categories [14]: first-order drift, caused by aging and poisoning, and second-order drift, caused by uncontrollable alterations of the experimental operating conditions such as temperature and humidity variations. In practical applications, it is difficult to clearly distinguish these two types of sensor drift.
Drift compensation or drift correction can be implemented at different levels, such as hardware updating [15] and signal improvement. At the signal processing level, drift correction approaches are broadly divided into three categories, i.e., univariate methods, multivariate methods, and machine learning approaches [16]. Typical univariate methods include frequency analysis, baseline manipulation, and differential measurements [13]. Univariate methods correct the response of each sensor independently; they are simple but extremely sensitive to sample rate variations [17,18]. Unlike univariate methods, multivariate methods correct the responses of the entire sensor array [19,20]. Both univariate and multivariate methods compensate for sensor drift explicitly. However, explicit compensation may be impossible in many real-world applications. Therefore, machine learning methods are employed to correct sensor drift implicitly by learning from data distributions. In this paper, we focus on developing machine learning approaches for sensor drift compensation.
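As a concrete illustration of the univariate category, baseline manipulation can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the window length, and the synthetic sensor trace are all hypothetical, and a real device would estimate the baseline from a dedicated clean-air reference phase.

```python
import numpy as np

def baseline_correct(response, baseline_window=10):
    """Univariate baseline manipulation for a single sensor: subtract the
    mean response measured during the initial (clean-air) samples so that
    an additive baseline offset, including slow drift, is removed."""
    baseline = response[:baseline_window].mean()
    return response - baseline

# Synthetic single-sensor trace: a slowly drifting offset plus a gas pulse.
t = np.arange(100)
drifted = 0.5 + 0.001 * t      # drifting baseline
drifted[40:60] += 2.0          # response to the analyte
corrected = baseline_correct(drifted)
print(abs(corrected[:10].mean()) < 1e-6)  # True: baseline region is ~0
```

Because each sensor is corrected independently of the rest of the array, this kind of method is simple but, as noted above, sensitive to variations in the sampling conditions.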
The main contribution of this paper is a discriminative subspace projection method for drift reduction in electronic noses. The proposed method inherits the merits of the subspace projection method called domain regularized component analysis (DRCA) in [21]. Moreover, the proposed method takes the source data label information into consideration, minimizing the within-class variance of the projected source samples while maximizing the between-class variance. The label information is exploited to avoid overlap between samples with different labels in the subspace. Figure 1 illustrates the basic idea of the proposed discriminative domain regularized component analysis (D-DRCA).
The rest of the paper is organized as follows. Section 2 briefly reviews related work on drift correction in electronic noses and relevant machine learning approaches. Section 3 presents the proposed discriminative subspace projection. Section 4 compares the results of our method and the competing methods on two gas sensor datasets. Finally, Section 5 concludes the work and discusses future directions.

Related work
In this section, machine learning methods for anti-drift in electronic noses are first reviewed in Section 2.1. Section 2.2 then provides a brief review of subspace learning methods. Section 2.3 presents a closely related machine learning topic, domain adaptation, which is widely used in many applications beyond sensor drift correction.

Anti-drift in E-nose using machine learning methods
The topic of anti-drift in E-noses using machine learning methods has received increasing attention over the past decades. Vergara et al. [14] contributed an extensive gas sensor drift dataset including six gases with concentrations ranging from 10 to 1000 ppmv. The dataset was collected over a period of three years using an array of 16 metal-oxide gas sensors. In addition, classifier ensembles were employed as a drift compensation tool to solve the gas recognition problem. Zhang et al. [22] proposed two drift compensation algorithms based on extreme learning machines [23]. Their domain adaptation extreme learning machine achieved gas classification by leveraging a limited number of labelled samples from the target domain, with computational efficiency comparable to that of classic extreme learning machines. Ziyatdinov et al. [24] proposed a drift reduction method called common principal component analysis (CPCA), which addresses the sensor drift issue by searching for components that are common to all gases. Zhang et al. [21] presented an unsupervised subspace learning approach for drift reduction that finds a subspace by minimizing the mean distribution discrepancy; the distribution difference between the source and target domains is very small in this subspace, and the subspace search can be easily solved by eigenvalue decomposition.

Dimensionality reduction methods via subspace projection
The objective of subspace learning is to find a low-dimensional subspace by optimizing certain objective functions. Principal component analysis (PCA) is an unsupervised subspace learning method [25]. PCA is an orthogonal linear transformation that converts a number of possibly correlated variables into a smaller number of uncorrelated variables; the data are projected along the directions of maximal variance, so each principal component accounts for as much of the variability of the original data as possible. Locality preserving projection (LPP) is another unsupervised method, defined on the data manifold [26]. LPP optimally preserves the neighborhood structure of the data set and is able to discover the nonlinear structure of the data manifold. Compared with PCA and LPP, linear discriminant analysis (LDA) [27] is a supervised dimensionality reduction method, which maximizes the between-class variance while minimizing the within-class variance. Marginal Fisher Analysis (MFA) was proposed as a graph embedding framework for dimensionality reduction [28]. MFA is also a supervised subspace approach, which overcomes the limitations of LDA by designing two graphs that characterize within-class compactness and between-class separability.
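The variance-maximizing projection computed by PCA can be sketched via eigendecomposition of the sample covariance matrix. The following is a minimal, generic illustration (not tied to any particular E-nose pipeline); the toy data and function name are assumptions for demonstration.

```python
import numpy as np

def pca(X, k):
    """Project d-dimensional samples (rows of X) onto the k leading
    principal components, i.e. the directions of maximal variance."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending order
    W = eigvecs[:, ::-1][:, :k]           # k leading eigenvectors
    return Xc @ W

# Toy data: 100 samples in 5 dimensions, projected to 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

By construction, the first projected coordinate carries at least as much variance as the second, which is the defining property exploited when PCA is used to visualize drift in 2D (as in Figures 3 and 4).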

Domain adaptation
Classic machine learning approaches assume that the distribution of the training set is consistent with that of the test set. However, distribution discrepancy between training and test sets is a common issue in various real-world applications such as text classification [29], sentiment analysis [30], cross-system recommendation [31], and indoor WiFi localization [32]. Transfer learning is a machine learning technique proposed to alleviate this distribution discrepancy [33]. Domain adaptation [34,35] is a branch of transfer learning, which aims to improve algorithm performance in the target domain by utilizing information from both the source and target domains. Pan et al. [36] proposed a domain adaptation method called transfer component analysis (TCA) that minimizes the Maximum Mean Discrepancy (MMD) [37]; the transfer components span a subspace in which the data distributions of the source and target domains are close to each other. TCA was further extended to a semi-supervised setting [36]. Jiang et al. [38] presented an algorithm called integration of global and local metrics for domain adaptation learning (IGLDA). Unlike TCA, IGLDA takes the source data label information into consideration as well as minimizing the MMD. In domain adaptation problems, the classifier and the feature-invariant subspace are commonly learned independently; Long et al. proposed a unified framework, Adaptation Regularization based Transfer Learning (ARTL), to achieve both distribution adaptation and label propagation [39]. Table 1 lists the abbreviations used in this paper.
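With a linear kernel, the MMD between two sample sets reduces to the squared Euclidean distance between their empirical means, which is also the quantity the MDD term in Section 3 is built on. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def linear_mmd(Xs, Xt):
    """Squared MMD with a linear kernel: the squared Euclidean distance
    between the empirical means of the two sample sets (rows = samples)."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

rng = np.random.default_rng(1)
Xs = rng.normal(loc=0.0, size=(200, 4))      # "source" samples
Xt = rng.normal(loc=1.0, size=(200, 4))      # "target" samples, mean-shifted
print(linear_mmd(Xs, Xs))                    # identical sets give exactly 0.0
print(linear_mmd(Xs, Xt))                    # shift inflates the discrepancy
```

Methods such as TCA use richer (nonlinear) kernels, but the linear case already conveys the idea: a projection that shrinks this quantity pulls the two domains together.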

The proposed discriminative domain regularized component analysis (D-DRCA)
In this section, the notation employed throughout this paper is given in Section 3.1. Section 3.2 then provides a brief introduction to domain regularized component analysis. Section 3.3 presents the proposed discriminative domain regularized component analysis, an improved extension of DRCA.

Notations
Table 2 lists the notations used in this paper.

Domain regularized component analysis (DRCA)
Domain regularized component analysis [21] is an unsupervised method that does not use any data label information. The main idea of DRCA is to learn a projection matrix such that the projected source sample set has a similar probability distribution to the projected target sample set. The projection matrix is computed by optimizing two terms simultaneously: 1) minimizing the mean distribution discrepancy (MDD) between the projected source sample set and the projected target sample set, and 2) maximizing the variance (i.e., energy) of both the source and target data.

Mean distribution discrepancy (MDD):
The mean distribution discrepancy is defined as the squared distance between the means of the projected source and target samples:

$$\mathrm{MDD} = \left\| P^{\top}\bar{x}_S - P^{\top}\bar{x}_T \right\|_2^2 = \mathrm{tr}\!\left( P^{\top} (\bar{x}_S - \bar{x}_T)(\bar{x}_S - \bar{x}_T)^{\top} P \right),$$

where $\bar{x}_S$ and $\bar{x}_T$ denote the means of the source and target samples, respectively. Variance of projected samples: the variance of the projected source samples is maximized via

$$\max_P \ \mathrm{tr}\!\left(P^{\top} X_S X_S^{\top} P\right),$$

and, similarly, the variance of the projected target samples is maximized via

$$\max_P \ \mathrm{tr}\!\left(P^{\top} X_T X_T^{\top} P\right).$$

DRCA incorporates the mean distribution discrepancy term and the variance terms together, which can be formulated as

$$\max_P \ \frac{\mathrm{tr}\!\left(P^{\top} X_S X_S^{\top} P\right) + \lambda\, \mathrm{tr}\!\left(P^{\top} X_T X_T^{\top} P\right)}{\mathrm{tr}\!\left( P^{\top} (\bar{x}_S - \bar{x}_T)(\bar{x}_S - \bar{x}_T)^{\top} P \right)},$$

where $\lambda$ denotes the trade-off parameter that avoids biased learning between the source and target domains. The problem can be easily solved by eigendecomposition.
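The trace-ratio objective described above can be attacked with a generalized eigenvalue problem. The following is a hedged sketch of that computation, not the authors' reference implementation: the small ridge `eps` (added so the MDD matrix is invertible) and the exact normalization are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def drca(Xs, Xt, k, lam=1.0, eps=1e-3):
    """Sketch of DRCA: find a projection P that keeps the variance (energy)
    of both domains large while keeping the mean distribution discrepancy
    small. Columns of Xs (d x n_S) and Xt (d x n_T) are samples; `lam`
    balances the two domains, `eps` regularizes the rank-one MDD matrix."""
    d = Xs.shape[0]
    diff = (Xs.mean(axis=1) - Xt.mean(axis=1))[:, None]
    M = diff @ diff.T + eps * np.eye(d)   # MDD term (denominator), regularized
    S = Xs @ Xs.T + lam * Xt @ Xt.T       # variance term (numerator)
    vals, vecs = eigh(S, M)               # generalized eigenproblem S v = w M v
    return vecs[:, ::-1][:, :k]           # k eigenvectors with largest eigenvalues

rng = np.random.default_rng(2)
Xs = rng.normal(size=(5, 80))
Xt = rng.normal(size=(5, 60)) + 1.0       # drifted target domain
P = drca(Xs, Xt, 2)
print(P.shape)  # (5, 2)
```

New samples from either domain are then projected as `P.T @ x` before classification.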

Discriminative domain regularized component analysis (D-DRCA)
The proposed discriminative domain regularized component analysis inherits the merits of DRCA; i.e., D-DRCA also optimizes the MDD term and the variance term. The MDD term can be considered a global metric [38]. However, DRCA ignores the data label information in the source domain, and it is desirable to make projected samples with different labels more discriminative. Therefore, D-DRCA takes the label information into consideration by optimizing two additional terms: 1) minimizing the within-class distance of the projected source samples and 2) maximizing the between-class distance of the projected source samples.
Let $L$ be the number of classes and $n_l$ be the number of samples of class $l$ ($1 \le l \le L$) in the source domain, with $\bar{x}^l$ the mean of the samples of the $l$-th class. The within-class distance of the projected source samples is minimized via

$$\min_P \ \mathrm{tr}\!\left(P^{\top} S_w P\right), \qquad S_w = \sum_{l=1}^{L} \sum_{i=1}^{n_l} \left(x_i^l - \bar{x}^l\right)\left(x_i^l - \bar{x}^l\right)^{\top},$$

and the between-class distance of the projected source samples is maximized via

$$\max_P \ \mathrm{tr}\!\left(P^{\top} S_b P\right), \qquad S_b = \sum_{l=1}^{L} n_l \left(\bar{x}^l - \bar{x}_S\right)\left(\bar{x}^l - \bar{x}_S\right)^{\top}.$$

Incorporating the within-class and between-class distances, D-DRCA extends DRCA with the formulation

$$\max_P \ \frac{\mathrm{tr}\!\left(P^{\top} X_S X_S^{\top} P\right) + \lambda\, \mathrm{tr}\!\left(P^{\top} X_T X_T^{\top} P\right) + \beta\, \mathrm{tr}\!\left(P^{\top} S_b P\right)}{\mathrm{tr}\!\left(P^{\top} (\bar{x}_S - \bar{x}_T)(\bar{x}_S - \bar{x}_T)^{\top} P\right) + \alpha\, \mathrm{tr}\!\left(P^{\top} S_w P\right)}, \qquad (9)$$

where $\lambda$, $\alpha$, and $\beta$ are the trade-off parameters. To make the parameters easily tunable, each term in the numerator of equation (9) is normalized. The solution of equation (9) is not unique [21], so the problem is reformulated as follows to guarantee a unique solution:

$$\max_P \ \mathrm{tr}\!\left(P^{\top} \left(X_S X_S^{\top} + \lambda X_T X_T^{\top} + \beta S_b\right) P\right) \quad \text{s.t.} \quad P^{\top} \left((\bar{x}_S - \bar{x}_T)(\bar{x}_S - \bar{x}_T)^{\top} + \alpha S_w\right) P = I. \qquad (10)$$

The Lagrangian method is employed to solve the optimization problem in equation (10). Setting the derivative of the Lagrangian $\mathcal{L}(P, \Lambda)$ associated with equation (10) to zero yields a generalized eigenvalue problem, so $P$ can be obtained via eigenvalue decomposition. The optimal projection matrix is $P = [p_1, \ldots, p_k]$, where $p_i$ ($1 \le i \le k$) are the eigenvectors corresponding to the $k$ largest eigenvalues. The proposed D-DRCA algorithm is depicted in Figure 2.
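A sketch of how the scatter matrices and the resulting generalized eigenvalue problem might be assembled is given below. This is an illustration under stated assumptions, not the authors' code: `lam`, `alpha`, and `beta` mirror the trade-off parameters, the ridge `eps` is added for numerical invertibility, and per-term normalization is omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(Xs, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices of the
    labelled source samples; columns of Xs are samples, y holds the labels."""
    d = Xs.shape[0]
    mean_all = Xs.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = Xs[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - mean_all) @ (mc - mean_all).T
    return Sw, Sb

def d_drca(Xs, y, Xt, k, lam=1.0, alpha=1.0, beta=1.0, eps=1e-3):
    """Sketch of D-DRCA: maximize source/target variance plus between-class
    scatter against the MDD plus within-class scatter."""
    d = Xs.shape[0]
    diff = (Xs.mean(axis=1) - Xt.mean(axis=1))[:, None]
    Sw, Sb = scatter_matrices(Xs, y)
    A = Xs @ Xs.T + lam * Xt @ Xt.T + beta * Sb        # terms to maximize
    B = diff @ diff.T + alpha * Sw + eps * np.eye(d)   # terms to minimize
    vals, vecs = eigh(A, B)                            # A v = w B v
    return vecs[:, ::-1][:, :k]

rng = np.random.default_rng(3)
Xs = np.concatenate([rng.normal(0, 1, (5, 40)),
                     rng.normal(3, 1, (5, 40))], axis=1)  # two classes
y = np.repeat([0, 1], 40)
Xt = Xs + 0.5                                             # drifted copy
P = d_drca(Xs, y, Xt, 2)
print(P.shape)  # (5, 2)
```

Setting `alpha = beta = 0` recovers the DRCA-style objective, which is one way to see that D-DRCA strictly extends it with label information.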

Results and discussion
In this section, experiments are performed to demonstrate the effectiveness of the proposed approach on two sensor drift datasets, one from UCSD and the other from CQU. The proposed discriminative domain regularized component analysis is compared with DRCA and other competing methods.

Experiment on sensor drift dataset from UCSD
The sensor drift dataset from UCSD was collected by Vergara et al. [14]. A total of 13910 samples were collected using an electronic nose consisting of 16 gas sensors. The collection period was 36 months, from January 2008 to February 2011. Six types of gases at different concentrations were to be detected: Acetaldehyde, Acetone, Ammonia, Ethanol, Ethylene, and Toluene. All samples were split into ten batches based on acquisition time. The detailed acquisition time and sample count of each batch are summarized in Table 3. Feature extraction was performed for each sensor; with 8 features per sensor, each sample is represented by a 128-dimensional feature vector.
Following the experimental protocol in [16] and [14], batch 1 is adopted as the source domain samples with label information. The other batches are adopted as target domain samples whose labels need to be predicted. Figure 3 depicts the 2D projection of the samples in batches 1-10. The time-varying sensor drift can be clearly observed, i.e., the distribution difference between the source domain and the target domain is time-dependent.
There are 13 competing methods in total. PCA and LDA are baseline subspace approaches. Component correction based principal component analysis (CC-PCA) is a multivariate method [20]. Multi-class support vector machine with RBF kernel (SVM-rbf), the geodesic flow kernel (SVM-gfk), and the combination kernel (SVM-comgfk) are methods presented in [16]. ML-rbf and ML-comgfk are semi-supervised methods with manifold regularization [16]. Orthogonal signal correction (OSC) is another multivariate method similar to CC-PCA, which aims to find the undesired component by searching for a subspace orthogonal to the target variable [40]. Both generalized least squares weighting (GLSW) [41] and direct standardization (DS) [42] are calibration transfer methods. DRCA is a recent subspace method proposed by Zhang et al. [21].
The recognition accuracy on the sensor drift dataset from UCSD is shown in Table 4. The proposed D-DRCA algorithm outperforms the competing methods in terms of average classification accuracy, achieving the highest value of 73.80%. Moreover, the proposed D-DRCA performs best in four drift correction tasks, i.e., batch 1 → batch 3, batch 1 → batch 5, batch 1 → batch 6, and batch 1 → batch 10.

Experiment on sensor drift dataset from CQU
The sensor drift dataset from CQU was collected by Zhang et al. [21]. A total of 1604 samples were collected using multiple E-nose devices of the same model; the sensor drift in this dataset might therefore be caused by device differences. The dataset comprises three batches: the master batch, collected five years earlier than the slave 1 and slave 2 batches. Six types of gases were to be detected: Ammonia, Benzene, Carbon monoxide, Formaldehyde, Nitrogen dioxide, and Toluene. The sample count of each batch is summarized in Table 5. Feature extraction was performed for each sensor, resulting in a 6-dimensional feature vector for each sample.
Following the experimental protocol in [21], the master batch is adopted as the source domain samples with label information. The other batches, slave 1 and slave 2, are adopted as target domain samples whose labels need to be predicted. Figure 4 depicts the 2D projections of the samples in batches master, slave 1, and slave 2, respectively. The time-varying sensor drift can be clearly observed, i.e., the distribution difference between the source and target domains is time-dependent.
There are 6 competing methods in total: SVM, PCA, LDA, the calibration transfer methods GLSW and DS, and DRCA. The recognition accuracy on the sensor drift dataset from CQU is shown in Table 6. The proposed D-DRCA method outperforms the competing ones in terms of average classification accuracy, achieving the highest value of 65.23%. Moreover, the proposed D-DRCA performs best in both individual tasks, i.e., master → slave 1 and master → slave 2.

Parameter sensitivity analysis
There are four parameters to be tuned in the proposed D-DRCA method: the subspace dimension $k$, the regularization coefficient $\lambda$, the within-class coefficient $\alpha$, and the between-class coefficient $\beta$. The parameter $k$ is tuned from the set $\{2^i,\ i = 0, 1, \ldots, 7\}$ for the UCSD dataset and from the set $\{1, 2, \ldots, 7\}$ for the CQU dataset. For both datasets, $\lambda$, $\alpha$, and $\beta$ are all tuned from the set $\{10^e,\ e = -2, -1, 0, 1, 2\}$. Figure 5 shows the classification accuracy of the proposed D-DRCA method on the UCSD dataset when tuning $k$ and $\lambda$, and Figure 6 shows the accuracy when tuning $\alpha$ and $\beta$. Similarly, Figure 7 shows the classification accuracy on the CQU dataset when tuning $k$ and $\lambda$, and Figure 8 shows the accuracy when tuning $\alpha$ and $\beta$.
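The tuning grid described above can be enumerated directly. The sketch below mirrors the ranges of the UCSD case; exhaustive search over all combinations is an assumption about the tuning procedure, not a detail stated by the paper.

```python
import itertools

# Hypothetical grid mirroring the paper's UCSD tuning ranges.
dims   = [2 ** i for i in range(8)]          # subspace dimension k
coeffs = [10.0 ** e for e in range(-2, 3)]   # lambda, alpha, beta

# Every (k, lambda, alpha, beta) combination to be evaluated.
grid = list(itertools.product(dims, coeffs, coeffs, coeffs))
print(len(grid))  # 8 * 5 * 5 * 5 = 1000 configurations
```

Even this modest grid yields 1000 configurations per task, which is why per-term normalization in equation (9) matters: it keeps all three coefficients on comparable scales so that a coarse logarithmic grid suffices.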

Conclusion
In this paper, we propose a discriminative domain regularized component analysis (D-DRCA) approach for the sensor drift compensation problem. The proposed method is inspired by machine learning approaches including domain adaptation and linear discriminant analysis. D-DRCA retains the advantages of the previous method, DRCA; for example, it can be easily solved by eigenvalue decomposition. In addition, the proposed approach takes the label information in the source domain into account. The exploitation of label information avoids overlap between samples with different labels in the subspace. The effectiveness of the proposed approach has been verified on two public sensor drift datasets. Future work could introduce kernel methods to reduce high-order distribution differences between the source and target domains. Another potential direction is to employ graph embedding methods to generate domain-invariant features.

Figure 1 .
Figure 1. Illustration of discriminative domain regularized component analysis. (a) Samples in the source and target domains with different distributions. (b) The distribution difference between source and target is alleviated by DRCA, but samples with different labels overlap in the subspace. (c) The D-DRCA method reduces the distribution difference and avoids the overlap problem simultaneously.

Figure 4 .
Figure 4. Samples in the master, slave 1, and slave 2 batches projected onto a 2D subspace using principal component analysis. The time-varying sensor drift can be easily observed.

Figure 5 .
Figure 5. Classification accuracy of the proposed D-DRCA method on the UCSD dataset when tuning the subspace dimension $k$ and the regularization coefficient $\lambda$.

Figure 6 .
Figure 6. Classification accuracy of the proposed D-DRCA method on the UCSD dataset when tuning the within-class coefficient $\alpha$ and the between-class coefficient $\beta$.

Figure 7 .
Figure 7. Classification accuracy of the proposed D-DRCA method on the CQU dataset when tuning the subspace dimension $k$ and the regularization coefficient $\lambda$.

Figure 8 .
Figure 8. Classification accuracy of the proposed D-DRCA method on the CQU dataset when tuning the within-class coefficient $\alpha$ and the between-class coefficient $\beta$.
$X_S^l$: the sample set of the $l$-th class in the source domain
$Y_S^l$: the projected sample set of the $l$-th class in the source domain
$x_i^l$: the $i$-th sample of the $l$-th class in the source domain
$y_i^l$: the $i$-th projected sample of the $l$-th class in the source domain
$\bar{x}^l$: the mean of the samples of the $l$-th class in the source domain
$\bar{y}^l$: the mean of the projected samples of the $l$-th class in the source domain
$L$: the number of classes

The sample set of the source domain is denoted as $X_S = [x_1^S, \ldots, x_{n_S}^S] \in \mathbb{R}^{d \times n_S}$ and the sample set of the target domain as $X_T = [x_1^T, \ldots, x_{n_T}^T] \in \mathbb{R}^{d \times n_T}$, where $n_S$ and $n_T$ are the numbers of samples in the source and target domains, respectively, $d$ is the dimension of the original space, and $k$ is the dimension of the subspace. The projection matrix is represented as $P \in \mathbb{R}^{d \times k}$. The projected samples in the source domain are given by $Y_S = [y_1^S, \ldots, y_{n_S}^S] = P^{\top} X_S \in \mathbb{R}^{k \times n_S}$ (1), and the projected samples in the target domain are given by $Y_T = [y_1^T, \ldots, y_{n_T}^T] = P^{\top} X_T \in \mathbb{R}^{k \times n_T}$ (2).

Table 1 .
List of abbreviations.

Table 2 .
List of notations.
$X_S$: the sample set in the source domain
$X_T$: the sample set in the target domain
$\bar{x}_S$: the mean of the samples in the source domain
$\bar{x}_T$: the mean of the samples in the target domain
$x_i^S$: the $i$-th sample of the source sample set
$x_i^T$: the $i$-th sample of the target sample set
$P$: the projection matrix
$Y_S$: the projected sample set in the source domain
$Y_T$: the projected sample set in the target domain
$\bar{y}_S$: the mean of the projected samples in the source domain
$\bar{y}_T$: the mean of the projected samples in the target domain
$y_i^S$: the $i$-th sample of the projected source sample set
$y_i^T$: the $i$-th sample of the projected target sample set
$n_S$: the number of samples in the source domain
$n_T$: the number of samples in the target domain
$n_l$: the number of samples of class $l$ in the source domain

Table 3 .
Gas sensor drift dataset from UCSD, with the collection period and sample count for each type of gas.

Table 6 .
Recognition accuracy on the sensor drift dataset from CQU. Bold values represent the best results. The proposed D-DRCA method outperforms the competing methods in both tasks.