Detecting Gas Vapor Leaks Using Uncalibrated Sensors

Chemical and infra-red sensors generate distinct responses under similar conditions because of sensor drift, noise or resolution errors. In this paper, we develop novel machine learning methods for detecting and identifying VOC and Ammonia vapor from time-series data obtained by uncalibrated chemical and infrared sensors. We process time-series sensor signals using deep neural networks (DNN). Three neural network algorithms are utilized for this purpose. Additive neural networks (termed AddNet) are based on a multiplication-devoid operator and consequently exhibit energy efficiency compared to regular neural networks. The second algorithm uses generative adversarial neural networks so as to expose the classifying neural network to more realistic data points in order to help the classifier network to deliver improved generalization. Finally, we use conventional convolutional neural networks as a baseline method. Our findings indicate that using raw time-series data obtained from uncalibrated sensors and processing them using deep-learning-based methods yield better results than using hand-crafted feature parameters.


I. INTRODUCTION
Ammonia and volatile organic compounds (VOCs) are associated with numerous health problems. Although VOCs and ammonia are naturally occurring, they can nonetheless cause serious health issues in high concentration. For example, exposure to ammonia in high concentration causes harm to skin, lungs and eyes. Methane and other VOC leaks contribute to global warming. VOCs such as benzene and toluene are carcinogenic [1]-[4].
In this paper, we consider both an infrared (IR) and a chemical sensor system for early detection and, thus, prevention of dangerous gas leaks. Mobile infrared and chemical sensors can be part of an open air cyber-physical system (CPS) [5]. We use the time-series data obtained by the sensors in order to detect accidental and/or deliberate gas vapor leaks. The main contribution of this paper centers on the exploitation of the time-series data that sensors produce rather than conventional reliance on a single or a couple of sensor readings for leak detection.
This work is being supported in part by NSF grants 1739396 (UIC), 1739451 (ASU), 1739390 (UD) and 1739684 (UCSD). The authors, Badawi and Cetin, additionally thank Nvidia for an equipment grant. This work was presented in part at the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 2019 [5].
Some VOC gas vapors such as ethane and ammonia absorb infrared light in the Long Wave Infrared (LWIR) band, while others such as methane absorb in the Medium Wave Infrared (MWIR) band. Absorbance by ammonia of infrared light at different wavelengths is shown in Fig. 1. We can easily observe the existence of VOC gas vapor using infrared (IR) cameras in open air as shown in Fig. 2. In this figure, a dark smoke-like region denotes the image of VOC gas vapor. However, the distance between the sensor and the source, and infrared reflections from the background significantly affect the recorded level [6], [7].
Conventional optical devices, such as gas chromatographs and MWIR cameras, are generally expensive. A cheaper alternative would be the use of IR sensors and chemical gas sensors. Yet chemical gas sensors incur degradation in their sensitivity over time. Consequently, identically manufactured sensors are likely to yield significantly different responses upon exposure to gas analytes under identical conditions [8]-[12]. This problem is known in the literature as the sensor drift problem.
Causes of sensor drift can be summed up by two phenomena, namely, the physical changes in the structure of the sensor and the changes in the operating environment. The former case is known as first-order sensor drift. It is caused by sensor aging or by sensor "poisoning". Unfortunately, neither poisoning nor aging is reversible, as the physical structure of the sensor will have been permanently damaged or at the least affected. The latter case is known as second-order sensor drift and is caused by external uncontrollable environmental changes, such as temperature and humidity variations. In this regard, the sensor response will be different from that expected from the original settings. Consequently, any decision thresholds that are optimal prior to sensor drift are likely to exhibit suboptimal sensitivity and/or specificity once the aforementioned changes take place. Similarly, while it is not possible to detect the concentration of the gas using MWIR and LWIR sensors in open air, it is possible to record a time-varying signal and detect the existence of gas leakage using IR sensors, as shown in Fig. 3, using a machine learning algorithm such as a neural network. The sensor signal exhibits sudden jumps and fluctuations due to a gas vapor leak. Uncalibrated IR sensor intensity measurements suddenly drop from 95 to 70 and fluctuate because of wind, as shown in Fig. 3.
In this paper, we analyze the temporal sensor signals using convolutional neural networks, additive neural networks and the discriminator of a generative adversarial network (GAN) to detect and classify VOC gas leaks and other dangerous gas emissions. The proposed analysis is applicable to Chemically-sensitive Field Effect Transistors (ChemFETs), Electrochemical Impedance Spectroscopy (EIS) sensors and infrared sensors, as they all produce time-varying signals.
The rest of the paper is organized as follows. Section II describes the machine learning algorithms used in this paper. Section III presents experimental results. We use an infrared data set and two publicly available chemical sensor drift data sets obtained at the University of California at San Diego (UCSD) [11] and [17]. The paper finishes by offering a brief set of conclusions in Section IV.

II. DEEP LEARNING ALGORITHMS FOR IR AND CHEMICAL SENSOR DATA PROCESSING
In this paper, we consider three tasks. Task 1 is infrared sensor-based gas-leakage detection. In tasks 2 and 3, we identify different types of gas analytes. Our first network is an energy-efficient network, namely, an additive neural network, which is a neural network that performs no vector-multiplication except in its last layer. Our second neural network is the discriminator of a generative adversarial neural network, to which we refer shortly as DiscGAN.

A. Convolutional Neural Networks
Convolutional neural networks (or ConvNets) have been extensively used in computer vision [18], [19] and time-series data analysis [20]. In ConvNets, convolutions (or local correlations) between the inputs and the filter weights are used to extract local features at different scales in subsequent layers.

B. Additive Neural Networks (AddNet)
Despite their ability to learn and recognize images and signals, deep learning algorithms are computationally expensive. This is attributed to the large number of add-multiply operations needed to be implemented in order to realize convolutions. This poses a problem when it comes to using such methods on platforms where energy is limited. As a result, simpler and, thus, more efficient algorithms are generally required to implement computationally expensive deep learning algorithms in such cases.
Nevertheless, there have been attempts to leverage convolutional neural networks across energy-limited devices by means of methods that aim to either implement fewer dot-product operations, or to replace dot-product operations with computationally simpler operations. Binarizing the weights and/or the activations results in replacing real-number multiplication operations with binary logical operations when realizing convolution, as in the case of BinaryConnect [21], XNOR-Net [22] and Binarized Neural Networks [23].
An additive neural network (AddNet) falls under the second category, i.e., replacing real-valued multiplication operations in vector-vector and matrix-vector product operations by special addition operations. The new "product" operation comprises binary sign calculation, unsigned addition and regular addition.
In what follows, we define the scalar version of our binary operation and extend it straightforwardly to its vector operation. In this regard, let $x, y \in \mathbb{R}$. The multiplication-devoid (abbreviated md) operation, denoted by $\oplus$, is defined as follows:
$$x \oplus y := \mathrm{sgn}(x \cdot y)\,(|x| + |y|), \tag{1}$$
where sgn denotes the signum function. Alternatively, we can express the $\oplus$ operation as follows:
$$x \oplus y := \mathrm{sgn}(x)\,y + \mathrm{sgn}(y)\,x. \tag{2}$$
This is because $x = \mathrm{sgn}(x)|x|$. One key property of the md operation is that it preserves the sign of regular multiplication operations [24], [25]. We define the vector version of the md operation as follows. Let $\mathbf{x}$ and $\mathbf{w}$ be two vectors in $\mathbb{R}^N$. The md dot "product" is defined as:
$$\mathbf{w}^T \oplus \mathbf{x} := \sum_{i=1}^{N} \mathrm{sgn}(x_i \cdot w_i)\,(|x_i| + |w_i|). \tag{3}$$
It can be seen that the md operation expressed in Eq. 3 requires no real-valued multiplication whatsoever. As such, instead of using add-multiply operations as in an ordinary dot product, we use ordinary addition and addition with sign multiplication in the md vector operation. Furthermore, we can restrict the operands $x_i$ and $w_i$ to be 8-bit numbers in order to speed up the vector addition operations. Another property of the md operation is that it induces the $\ell_1$ norm. This is shown as follows:
$$\mathbf{x}^T \oplus \mathbf{x} = \sum_{i=1}^{N} \mathrm{sgn}(x_i \cdot x_i)\,(|x_i| + |x_i|) = 2\|\mathbf{x}\|_1. \tag{4}$$
In the context of neural networks, we use convolution and matrix-vector multiplication operations in convolutional and dense layers, respectively. In AddNet, we replace the aforementioned dot-product operations with the md vector product. The feed-forwarding pass in dense layers in a neural network can be expressed as follows:
$$o_i^l = \phi\left(\mathbf{w}_i^{l\,T} \mathbf{o}^{l-1} + b_i^l\right), \tag{5}$$
where the superscript denotes the layer index, the subscript the neuron index, $\mathbf{w}_i^l$ the weights connecting the output of the previous layer (the $(l-1)$st layer) to the $i$th neuron, $o_i^l$ the output of the $i$th neuron, and bold $\mathbf{o}^{l-1}$ the vector output of the previous layer. $\phi$ is the non-linearity function applied element-wise and, finally, $b_i^l$ denotes the bias term added to the pre-activated response $\mathbf{w}_i^{l\,T} \mathbf{o}^{l-1}$. Similarly, we can define AddNet layers by replacing the dot-product $\mathbf{w}_i^{l\,T} \mathbf{o}^{l-1}$ by our md operator as follows:
$$o_i^l = \phi\left(\mathbf{w}_i^{l\,T} \oplus \mathbf{o}^{l-1} + b_i^l\right). \tag{6}$$
Since the md operator is additive, it will result in a larger output than ordinary multiplication does when either of the operands is of small magnitude, e.g., $3 \oplus 0.1 = 3.1 > 3 \times 0.1 = 0.3$. In the context of neural networks, the layer outputs and the weights are usually small values. As a result, the responses of the md layers will be of larger variance than those of the regular layer. This poses a problem in deep layers, where the dimension of the dot-product is quite large. In other words, if the depth of a convolutional layer is 64 and the kernel size is 3 × 3, the convolution operations will carry out dot-products between two vectors, each of which lies in $\mathbb{R}^{3 \times 3 \times 64}$. In the case of the md layer, this will cause the output to exhibit inordinately high magnitudes. In order to overcome this, we introduce a scaling factor $\alpha$. As such, the feed-forwarding pass in Eq. 6 becomes
$$o_i^l = \phi\left(\alpha_i^l \left(\mathbf{w}_i^{l\,T} \oplus \mathbf{o}^{l-1}\right) + b_i^l\right). \tag{7}$$
The scaling factor $\alpha_i^l$ enables us to control the range of the output prior to applying the activation function $\phi$ and, thus, leads to a controlled range of responses in subsequent layers. Note that the scaling by $\alpha_i^l$ in Eq. 7 implies real-valued multiplication. Nevertheless, it requires only one real-valued multiplication per neuron. Therefore, carrying out scaling is not computationally expensive. Numerous options exist for selecting the scaling factor $\alpha_i^l$. One possibility may be the setting of $\alpha_i^l$ to $1/\|\mathbf{w}_i^l\|_1$, i.e., the reciprocal of the $\ell_1$ norm of the associated weights. Another option would be having $\alpha_i^l$ be trainable by backpropagation. The latter delivers more flexibility for the model.
Nevertheless, batch normalization is a common practice in neural networks and has been shown to be quite effective in accelerating the training of deep networks [26]. Therefore, one can simply apply batch normalization to the pre-activation responses in AddNet. Such normalization eliminates the need to carry out scaling by α l i as it will be subsumed by the scaling induced by batch normalization.
The proof of AddNet with linear and/or ReLU activation functions satisfying the universal approximation property over the space of Lebesgue integrable functions can be found in [27].
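To make the md operator and the scaled AddNet dense neuron more concrete, the following is a minimal plain-Python sketch rather than the implementation used in the experiments. The function names (`sgn`, `md`, `md_dot`, `addnet_neuron`) are hypothetical, ReLU is assumed as the activation, and the default scaling factor is assumed to be the reciprocal of the l1 norm of the weights, which is one of the options mentioned above.

```python
from typing import Optional, Sequence


def sgn(v: float) -> int:
    """Signum function: returns -1, 0 or +1."""
    return (v > 0) - (v < 0)


def md(x: float, y: float) -> float:
    """Scalar md operation: sgn(x)*y + sgn(y)*x.

    Multiplying by sgn(.) only flips or zeroes a value, so dedicated
    hardware would not need a real-valued multiplier for this step.
    """
    return sgn(x) * y + sgn(y) * x


def md_dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Vector md "product": sum of the element-wise md operations."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(md(wi, xi) for wi, xi in zip(w, x))


def addnet_neuron(w: Sequence[float], o_prev: Sequence[float],
                  b: float, alpha: Optional[float] = None) -> float:
    """One AddNet dense neuron: relu(alpha * md_dot(w, o_prev) + b).

    Assumption: if alpha is omitted, 1 / ||w||_1 is used (w must then
    contain at least one non-zero weight).
    """
    if alpha is None:
        alpha = 1.0 / sum(abs(wi) for wi in w)
    return max(0.0, alpha * md_dot(w, o_prev) + b)


if __name__ == "__main__":
    x = [1.5, -2.0, 0.5]
    # The md product of a vector with itself is twice its l1 norm.
    print(md_dot(x, x), 2 * sum(abs(v) for v in x))
    print(addnet_neuron([1.0, 1.0], [2.0, 3.0], b=0.0))
```

The sign of `md(x, y)` always matches the sign of `x * y`, which is the property that lets the operator stand in for multiplication inside a layer.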
As for training the md layers by backpropagation, it is worth mentioning that the derivative of the signum function used in the definition has to be computed. Formally, $\frac{d\,\mathrm{sgn}(w)}{dw} = 2\delta(w)$, where $\delta$ is the Dirac-delta function. In practice, this means that the derivative of the signum function is zero almost everywhere except when $w = 0$. The partial derivative of the md operator w.r.t. $w$ is:
$$\frac{\partial (x \oplus w)}{\partial w} = \mathrm{sgn}(x) + 2x\,\delta(w).$$
We approximate the derivative of the signum operator using the hyperbolic tangent as follows:
$$\frac{d\,\mathrm{sgn}(w)}{dw} \approx \frac{d \tanh(aw)}{dw} = a\,\mathrm{sech}^2(aw),$$
where $\mathrm{sech}(x) = \frac{2}{e^{x} + e^{-x}}$ is the hyperbolic secant function, and $a$ is a hyperparameter indicating how sharp the hyperbolic tangent is. The larger the hyperparameter $a$ is, the closer tanh is to the signum function. Figure 4 shows the approximate derivative of the signum function for $a = 10$. As we can see from Fig. 4, the derivative has high magnitude for $w$ values close to zero, whereas it is effectively zero for large values. This can be seen as allowing small weights to have finer updates than larger weights and thus allowing them to change their sign more often during training. We found empirically that this approximate derivative computation provides satisfactory convergence rates in Google's Tensorflow software.
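The tanh-based surrogate described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the TensorFlow code used in the experiments: the names `dsgn_approx` and `md_grad_w` are hypothetical, and the md operator is taken in the form sgn(x)w + sgn(w)x, so that its derivative w.r.t. w is sgn(x) plus x times the (approximated) derivative of sgn(w).

```python
import math


def sgn(v: float) -> int:
    """Signum function: returns -1, 0 or +1."""
    return (v > 0) - (v < 0)


def sech(x: float) -> float:
    """Hyperbolic secant, 2 / (e^x + e^-x)."""
    if abs(x) > 350.0:
        # cosh would overflow here; sech is numerically zero anyway.
        return 0.0
    return 1.0 / math.cosh(x)


def dsgn_approx(w: float, a: float = 10.0) -> float:
    """Surrogate for d sgn(w)/dw: derivative of tanh(a*w) = a * sech^2(a*w)."""
    return a * sech(a * w) ** 2


def md_grad_w(x: float, w: float, a: float = 10.0) -> float:
    """Approximate partial derivative of the md operation w.r.t. w."""
    return sgn(x) + x * dsgn_approx(w, a)


if __name__ == "__main__":
    for w in (0.0, 0.05, 0.2, 1.0):
        print(w, dsgn_approx(w), md_grad_w(2.0, w))
```

Weights near zero receive a large extra gradient term, while weights far from zero see essentially sgn(x) only, which matches the behavior described for Fig. 4.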

C. DiscGAN (Discriminator of GAN as Classifier)
Generative Adversarial Networks (GAN) have become the benchmark in image synthesis [28], [29]. A typical GAN has a generative network, which attempts to generate images (or data) resembling real images from noise input, and a discriminator network, which attempts to discriminate between the real images and those synthesized by the generator. The generator and the discriminator are optimized in an adversarial scheme, i.e., the generator tries to fool the discriminator by the synthetic data it produces, and, in turn, the discriminator tries to counteract the generator by discriminating between the real data samples and the fake ones.
In this paper, our aim is not to synthesize realistic data but rather to make use of the adversarial nature of GAN training in order to obtain a discriminator network capable of classifying the input with an unbalanced set of training data. As the gas leak recordings may be far fewer than the clean air recordings, we have the generator of the GAN compensate for the class with the smaller number of data instances by producing "artificial" gas leak data during training.
In this regard, we perform a two-phase training of the GAN. First, we carry out adversarial training of both the discriminator and the generator using the data of one of the classes. In the second phase, we use data from both classes and carry out supervised binary-classification training of the discriminator, which now acts as a classifier. In this setting, let $x_i$ represent the $i$th data instance of one of the classes. In this case, the $x_i$'s denote the gas leak recordings (or the anomalous class). Let $z$ be a random noise vector, e.g., Gaussian noise or uniform noise. Let $D$ be the discriminator and $G$ be the generator, with each having a set of parameters $\theta_D$ and $\theta_G$, respectively. In the adversarial-training phase, we seek to optimize the following loss function:
$$\min_{\theta_G} \max_{\theta_D} \sum_i \left[\log D(x_i) + \log\left(1 - D(G(z_i))\right)\right],$$
where $D(x_i)$ is the soft prediction result of the discriminator corresponding to data point $x_i$. From the discriminator perspective, the prediction output $D(x_i)$ should be close to 1 because $x_i$ is "real". The generator $G$ produces "fake" data signals from noise vector $z_i$, that is $G(z_i)$, and the prediction $D(G(z_i))$ should be close to zero because $G(z_i)$ is an artificial data instance. The generator, on the other hand, will try to produce $G(z_i)$ that will be assigned the prediction $D(G(z_i))$ close to 1. Once training the first stage is accomplished, we move on to the second stage of supervised training on the entire training data, in which the cost function we seek to minimize is the regular binary cross entropy function CE expressed as follows:
$$\mathrm{CE} = -\sum_i \left[t_i \log D(x_i) + (1 - t_i) \log\left(1 - D(x_i)\right)\right],$$
where $t_i \in \{0, 1\}$ denotes the true class of $x_i$.
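As a rough illustration of the two training phases, the sketch below evaluates the standard GAN objectives and the binary cross entropy on lists of discriminator outputs. It is a sketch under assumptions: the function names are hypothetical, the losses are averaged over a mini-batch, the minimax form of the generator loss is used, and the probabilities are clipped by a small `EPS` to keep the logarithms finite. The networks themselves and the optimizer are omitted.

```python
import math
from typing import Sequence

EPS = 1e-12  # assumed clipping constant to avoid log(0)


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Phase 1, discriminator side: minimize -[log D(x) + log(1 - D(G(z)))]."""
    terms = [math.log(_clip(r)) + math.log(1.0 - _clip(f))
             for r, f in zip(d_real, d_fake)]
    return -sum(terms) / len(terms)


def generator_loss(d_fake: Sequence[float]) -> float:
    """Phase 1, generator side: minimize log(1 - D(G(z)))."""
    terms = [math.log(1.0 - _clip(f)) for f in d_fake]
    return sum(terms) / len(terms)


def binary_cross_entropy(targets: Sequence[int], preds: Sequence[float]) -> float:
    """Phase 2: supervised loss on real data from both classes."""
    terms = [t * math.log(_clip(p)) + (1 - t) * math.log(1.0 - _clip(p))
             for t, p in zip(targets, preds)]
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # D(x) for real leak signals and D(G(z)) for generated ones.
    print(discriminator_loss([0.9, 0.8], [0.1, 0.3]))
    print(generator_loss([0.1, 0.3]))
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

In phase 1 the two losses would be minimized alternately with respect to the discriminator and generator parameters; in phase 2 only `binary_cross_entropy` is used and the generator is no longer needed.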
When there are multiple classes we can still use the discriminator of a GAN with a slight modification of the loss functions. In this regard, let us assume that there are $N$ classes. In this case, the one-hot encoded label for each input is an $N$-dimensional vector, with all entries equal to zero, except for the $k$th entry, where $k$ is the true class. During training, the discriminator (or classifier) will minimize the cross entropy of the softmax layer applied at the output layer ($N$ logits). The generator $G$ will attack the output of the $k$th node. Here we consider the output of the $k$th node to be the logit of a binary class, i.e., the adversarial loss criterion becomes:
$$\min_{\theta_G} \max_{\theta_D} \sum_i \left[\log D_k(x_i) + \log\left(1 - D_k(G(z_i))\right)\right],$$
where $D_k(G(z))$ is the discriminator sigmoidal response of the $k$th node, i.e., we apply the sigmoid function to the logits before taking the logarithm in determining the loss. Note that the loss here is different from the multi-class case, in which we consider multi-class logits, i.e., we use sigmoid normalization instead of the softmax normalization. In practice, since we do a mini-batch update, we take the average of the loss functions and minimize the loss functions based on the mini-batch gradients.
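The difference between the two normalizations can be illustrated with a small plain-Python sketch. The names `softmax_cross_entropy` and `adversarial_loss_kth_node` are hypothetical, and the adversarial term is written in the usual binary GAN form applied to the sigmoid of the kth logit, which is our reading of the description above rather than a confirmed implementation detail.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over N logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits: Sequence[float], k: int) -> float:
    """Classifier loss: cross entropy of the softmax output for true class k."""
    return -math.log(softmax(logits)[k])


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def adversarial_loss_kth_node(real_logits: Sequence[float],
                              fake_logits: Sequence[float], k: int) -> float:
    """Discriminator-side adversarial loss using only the kth logit.

    The kth logit is treated as a binary "real vs. generated" logit,
    so the sigmoid is applied instead of the softmax.
    """
    d_real = sigmoid(real_logits[k])
    d_fake = sigmoid(fake_logits[k])
    return -(math.log(d_real) + math.log(1.0 - d_fake))


if __name__ == "__main__":
    real = [0.2, 3.0, -1.0]   # logits for a real sample of class k = 1
    fake = [0.1, -2.0, 0.3]   # logits for a generated sample
    print(softmax_cross_entropy(real, 1))
    print(adversarial_loss_kth_node(real, fake, 1))
```

Over a mini-batch, both quantities would be averaged before the gradient step, as noted in the text.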

III. EXPERIMENTAL RESULTS

A. Infra-red VOC Dataset
Our first data set consists of infra-red imaging signals of VOC gas leaks in open air and clean air recordings. Specifically, we have two classes of discrete-time signals corresponding to VOC gas leaks and clean air, respectively. Each signal is a time series containing 50 samples corresponding to two seconds of recording with a sampling rate of 25 samples per second. The recorded value varies in open air because of background temperature variations and low resolution error, as can be observed in Fig. 3. Furthermore, the sensors may not be calibrated in practice, so their sensitivity may differ across time. We gathered about 30,000 VOC gas leak and 30,000 clean air data instances.
The images are obtained using an MWIR camera produced by FLIR systems and Infrared Cameras Inc. [15], [16]. VOC gas absorbs the infra-red light appearing as a white cloud in the black-hot mode infra-red image as shown in Fig. 2. In these videos, a gas leak erupts from the source with the gas spreading out as time progresses. We manually selected regions of interest and assigned normal event designations to temporal measurements where no gas is present throughout these series, while designating the rest as anomalous events.
We used min-max normalization in order to scale signal data points between 0 and 1. The normalized signal $\tilde{x}$ is obtained as follows:
$$\tilde{x}[n] = \frac{x[n] - \min(x)}{\max(x) - \min(x)}, \quad n = 0, 1, \ldots, 49,$$
where $\max(x)$ and $\min(x)$ represent the maximum and minimum values of a given infrared signal $x$, respectively. We used convolutional neural networks with the architecture specified in Table I. In order to obtain more temporal data points, and in order to ensure the network is translation-invariant to the gas eruption location, we chose to randomly crop the input data into temporal signals of size 32 each. We divided our data set into three disjoint sets. The training data consists of 8,000 recordings of each class. Another set of 8,000 recordings of each class is used as the validation data set. The rest of the data was reserved for testing. We trained our networks using the RMSProp optimizer algorithm [30].
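The preprocessing described above (min-max scaling of a 50-sample signal followed by a random temporal crop of length 32) can be sketched as follows. The function names are hypothetical, and mapping a constant signal to all zeros is an assumption made here only to avoid a division by zero; the text does not say how that case was handled.

```python
import random
from typing import List, Optional, Sequence


def min_max_normalize(x: Sequence[float]) -> List[float]:
    """Scale a signal to [0, 1] using its own minimum and maximum."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # Assumption: a constant signal is mapped to all zeros.
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]


def random_crop(x: Sequence[float], size: int = 32,
                rng: Optional[random.Random] = None) -> List[float]:
    """Return a contiguous window of `size` samples at a random offset."""
    if len(x) < size:
        raise ValueError("signal is shorter than the crop size")
    rng = rng if rng is not None else random.Random()
    start = rng.randrange(len(x) - size + 1)
    return list(x[start:start + size])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 50-sample signal: intensity near 95 dropping to about 70.
    signal = [95.0] * 20 + [70.0 + (n % 5) for n in range(30)]
    window = random_crop(min_max_normalize(signal), size=32, rng=rng)
    print(len(window), min(window), max(window))
```

Because each signal is scaled by its own extrema, the absolute intensity level of an uncalibrated sensor is removed and only the shape of the time series reaches the network.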
We tested the hypothesis of whether dropout helps achieve better results [31] by using a dropout rate of 50%. As for the GAN approach, we used a generator which is a multilayer perceptron (MLP) with one hidden layer of size 256. The regular convolutional neural network and AddNet exhibit comparable results. We obtained an accuracy of 99.8% for no-gas data and 99.7% for gas-leak data for a regular ConvNet. AddNet attained a recognition rate of 98.9% for no-gas data and 99.3% for gas-leak data.
In the second set of experiments, we assumed that we have an unbalanced data set. In practice, we may not have as many VOC or ammonia gas leak recordings as clean air recordings. We trained the models with only 50 gas leak recordings against 8,000 clean air recordings. The test data set contains 14,000 recording instances of VOC gas leaks and clean air recordings. Classification results are summarized in Table II. AddNet produces the best results, but the discriminator of the GAN is also quite close to AddNet. The confusion matrix of the results of the best model is given in Table III.
We also investigated pruning the weights in both AddNet and ConvNet during inference. In this regard, we discard the magnitudes of the smallest magnitude weights while retaining their sign information. We keep the bias coefficients and the coefficients of the last layer intact. Results of various pruning rates are shown in Table IV. Apparently, in AddNet, we can

B. Gas Sensor Array Recordings under Dynamic Gas Mixtures
We consider a gas type identification problem, in which we have three types of gases to identify, namely, CO, Ethylene and Methane. We used the data set obtained by Fonollosa et al. [17]. The data set consists of time-series measurements of a sensor array of 16 chemical metal-oxide sensors under exposure to two different kinds of gas mixtures, ethylene and methane in air, and ethylene and CO in air. Sensors were exposed to volatile organic compounds at different concentration levels under tightly-controlled operating conditions during the experiment. The data is obtained at a sampling frequency of 100 Hz. The 16 chemical sensors are of four different types, with each having four identical sensors. For more details, the reader may refer to the original paper [17]. Furthermore, switching between different mixtures of VOCs may occur too fast, making it challenging if not impossible for the sensors to reach steady state. This makes identifying the gas analytes difficult using a machine learning method.
The recorded sensor data is deposited to the UC Irvine Machine Learning Repository online in the form of two long time-series. We extracted portions of the time series such that the sensor array is exposed to one type of analyte at a given time. Each recording corresponds to 100 seconds of data. We observed that it is enough to sample the sensor response every 2 seconds. Example sensor response signals to CO, ethylene and methane gas vapor exposures are shown in Fig. 5. Each sub-figure contains four different sensor responses.
We gathered a total of 215 instances from the raw recordings, in which we have 49 CO, 116 ethylene and 50 methane time-series signals. Each instance has 50 time measurements for each sensor. Thus, a total of 50 × 16 measurements per instance are used. Since the number of instances in the data set is small, we employed cross validation with the holdout method. Since the sensors are of different types, and since even the sensors of the same type produce different temporal responses, we process the temporal sensor data using 1-D convolutional networks. The input to each neural network is a matrix of size 40 × 16, for 40 time instances and 16 sensors.
We used ReLU non-linearity between layers. Our loss function is the cross-entropy with the softmax operator. We used the RMSProp optimizer to carry out the parameter updates during training. We trained a regular ConvNet and an AddNet of the same architecture as in Table V.

C. Chemical Gas Sensor Array Drift Dataset
The third data set is the publicly available chemical VOC gas sensor drift data set compiled by Vergara et al. at UCSD [11]. The data set was obtained by exposing an array of 16 distinct chemical sensors to 6 types of gas mixtures (ammonia, acetone, ethylene, ethanol, toluene and acetaldehyde) at a variety of concentration levels. Each data record is a vector time series. Vectors contain 8 feature parameters extracted from the sensor time series signals during a gas release experiment, conducted over a period of three years at UCSD. The feature parameters include the steady state resistance value and the normalized resistance change. The remaining 6 parameter features are the maxima and minima of the exponential moving average ($\mathrm{ema}_\alpha$) transform governed by the following input-output relation:
$$y[k] = (1 - \alpha)\,y[k-1] + \alpha\,\left(r[k] - r[k-1]\right),$$
where $r[k]$ is the resistance value at time step $k$, and $y[k]$ is the transformed value after applying the ema filter. The maxima and minima features are reported for $\alpha$ values equal to 0.1, 0.01 and 0.001 over an entire experiment. These ema features have distinct time constants for different $\alpha$ values, as they contain temporal information. Unfortunately, the raw time-domain sensor signals are not available in this data set. Since there are 16 sensors, a total of 16 × 8 = 128 feature values are recorded per experiment. The data set is divided into 10 batches ordered chronologically. Full details about the experiment and the data set can be found in [11].
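To illustrate how the six ema features of one sensor could be derived from a raw resistance trace, the following sketch assumes the recursion y[k] = (1 - α) y[k-1] + α (r[k] - r[k-1]) with the initial condition y[0] = 0. Both the recursion as written here and the initial condition are assumptions based on our understanding of [11]; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ema_transform(r: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average of the first difference of r.

    Assumptions: y[0] = 0 and
    y[k] = (1 - alpha) * y[k-1] + alpha * (r[k] - r[k-1]).
    """
    if not r:
        raise ValueError("resistance trace must not be empty")
    y = [0.0]
    for k in range(1, len(r)):
        y.append((1.0 - alpha) * y[-1] + alpha * (r[k] - r[k - 1]))
    return y


def ema_features(r: Sequence[float],
                 alphas: Tuple[float, ...] = (0.1, 0.01, 0.001)) -> List[float]:
    """Maximum and minimum of the ema transform for each alpha (6 values)."""
    feats: List[float] = []
    for alpha in alphas:
        y = ema_transform(r, alpha)
        feats.extend([max(y), min(y)])
    return feats


if __name__ == "__main__":
    # Toy trace: resistance rises during gas exposure, then decays.
    trace = [10.0, 10.0, 12.0, 15.0, 16.0, 16.0, 14.0, 11.0, 10.0]
    print(ema_features(trace))
```

Smaller α values average the resistance change over a longer window, which is why the three α settings capture dynamics at different time scales.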
We carried out our classification tasks by training neural networks for N = 2 batches and testing on successive batches. This is identical to the sensor drift estimation approach given in [11]. Because feature values have huge variances, we opted We trained an MLP model with two hidden layers, each with 512 output units, and an output layer. Furthermore, we trained the network for 100 epochs using the RMSProp optimizer [30]. We applied a dropout rate of 20% and used a batch size of 128 in order to prevent complex co-adaptation. To augment the data, we added a zero-mean Gaussian noise with standard deviation of 0.1.
We also tried combining AddNet with the GAN approach, in which case, the discriminator is an AddNet and the generator is a regular MLP. The architecture of the network is the same as that of the GAN we use. Furthermore, we tried utilizing the other batches by passing them to the classifier and carrying out backpropagation according to their guessed labels. This is done in order for the network to utilize the correctly guessed labels so that it could be helpful in improving the classification accuracy for the mis-classified data point. A numerical comparison of the proposed methods to the SVM-classifier ensemble used in [11] is given in Table VIII. In general, the AddNet-MLP, the MLP and the multi-class GAN discriminator produce better sensor-drift compensated results than does the SVM based method.
As we can see from Table VIII, using DiscGAN (with a regular discriminator or AddNet), we were able to obtain better recognition rates for later batches (batches 7, 8 and 9). This could be attributed to the fact that the generator did expose the discriminator to novel unseen points in the data space during training. Therefore, the discriminator would have been able to learn additional meaningful features. As for AddNet, it can perform as well as the regular MLP, either in conventional binary classification or in the case of DiscGAN. It is also worth noting that the domain adaptation scheme we employed did not yield any significant improvements. We believe that improved classification results would have been attained, if the entire temporal sensor signal set were at our disposal as input to our algorithms.

IV. CONCLUSION
In this paper, we have introduced a variety of deep-learning based algorithms and applied them to VOC gas and ammonia vapor leak detection and gas type identification problems. The first algorithm is based on AddNet. In AddNet, we replace the computationally expensive dot-product operations in deep neural networks with a modified addition operation that retains the sign of multiplication. Its computational efficiency enables AddNet to be used in embedded and mobile systems, in which we envision a smart gas leakage monitoring and detection CPS being reliably used.
The second algorithm is called DiscGAN, which uses the discriminator of a generative adversarial neural network as a classifier in a bid to enhance the recognition capabilities of the system. The generator part helps in exposing the discriminator to realistic synthetic data points that can be helpful in classification tasks.
We considered three detection and classification tasks. The first task is to detect VOC gas leakage from temporal IR data. Our proposed algorithms achieved accuracy rates of 97-98%. The second task we considered is to identify gas types using temporal data of sensor arrays. We were able to attain recognition rates of 96.1-96.5%. Our third task was to identify gas types using non-temporal data, where the readings are obtained for the same sensor array over a period of 36 months. The sensor measurements suffer from degradation due to sensor drift.
Although our gas identification accuracy results for the early batches in the last data set were quite high, the degradation incurred in later batches resulted in a significant drop in identification accuracy. We believe that the non-temporal global features reported for the experiments are highly affected by sensor drift. As a result, the features are not sufficiently expressive of the sensor responses for different gas analyte types. Based on our high recognition rates for the two temporal data sets considered in this work, we conclude that feeding sensor measurements in their temporal representation into deep neural network algorithms achieves better performance. These algorithms learn discriminative features by themselves, with no need to hand-craft features that could be sensitive to error, as in the case of the sensor drift problem.

Fig. 3: Example infrared sensor time-series data near a gas leak. As one can see, it is not possible to find a threshold that can isolate leak-free intensity signals from those corresponding to a gas leak. Furthermore, the intensity of leak-free point signals is time-varying. This demonstrates the effect of noise, resolution and lighting factors that in turn lend further complexity to the task of distinguishing between the two classes of signals. The two sets of examples are distinguished by dotting.

Fig. 4: The derivative of tanh(aw) = a sech 2 (aw) as a function of parameter w, with a set to 10.

Fig. 5: Time-series data generated by four different sensors under exposure to different types of gases (50 time samples for each sensor).

TABLE I: Architecture of the convolutional neural network used in classifying the data set of Sec. III-A.

TABLE II: Accuracy results for infra-red VOC data. Classifiers are trained with only 50 VOC gas leak recordings vs. 8,000 clean air recordings.

TABLE III: Confusion matrix for the best achieving neural network (AddNet with no dropout) over the testing data. The true positive rate (sensitivity) is 97.3%. The true negative rate (specificity) is 99.1%.
4%) without severely degrading performance. On the other hand, the magnitude information is quite critical in the case of a regular ConvNet. These results clearly show the advantages of AddNet, which requires reduced memory space in a mobile device and consumes less energy as it performs far fewer arithmetic operations during inference.

TABLE IV: Effect of compressing weights of AddNet and ConvNet by discarding the smallest K% magnitude while keeping the sign information. ConvNet fails to produce reasonable results when the compression rate exceeds 16.1%. The compression rate is estimated by allocating 32 bits to intact weight values and 1 bit for every binarized weight factor.
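The sign-preserving pruning and the bit-count estimate described for Table IV can be illustrated with the sketch below. Several points are assumptions of ours and not confirmed by the text: pruned weights are stored as a sign with zero magnitude, the compression rate is computed as the fraction of bits saved relative to storing every weight in 32 bits, and the intact bias and last-layer coefficients are ignored. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sgn(v: float) -> int:
    """Signum function: returns -1, 0 or +1."""
    return (v > 0) - (v < 0)


def prune_keep_sign(weights: Sequence[float],
                    frac: float) -> List[Tuple[int, float]]:
    """Zero the magnitude of the smallest `frac` of weights, keeping signs.

    Returns (sign, magnitude) pairs in the original order.
    """
    n_prune = int(frac * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [(sgn(w), 0.0 if i in pruned else abs(w))
            for i, w in enumerate(weights)]


def md_signed(x: float, sign_w: int, mag_w: float) -> float:
    """md operation with the weight given as (sign, magnitude).

    Equals sgn(x)*w + sgn(w)*x for w = sign_w * mag_w, so a pruned
    weight (mag_w = 0) still contributes sign_w * x.
    """
    return sign_w * (sgn(x) * mag_w + x)


def compression_rate(n_total: int, n_binarized: int) -> float:
    """Assumed estimate: fraction of bits saved with 32-bit intact
    weights and 1-bit binarized weights."""
    bits = 32 * (n_total - n_binarized) + n_binarized
    return 1.0 - bits / (32.0 * n_total)


if __name__ == "__main__":
    w = [0.5, -0.1, 2.0, -0.05]
    print(prune_keep_sign(w, 0.5))
    print(md_signed(3.0, -1, 0.0))
    print(compression_rate(len(w), 2))
```

Under this reading, a pruned AddNet weight still passes the input through with the correct sign, which may help explain why AddNet tolerates the loss of magnitude information better than a regular ConvNet.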

TABLE V: Architecture of the convolutional neural network used in classifying the data set of Sec. III-B.
Our validation set consists of 35 examples, with the experiment repeated 4 times. Thus, we validated our results over 140 examples. Furthermore, since the number of instances is small compared to the input dimension 50 × 16, we opted to randomly crop data points during training into smaller time series of size 40 × 16. This allows the classifier to be invariant to the exact time where the exposure takes place. Furthermore, it increases the number of data points during training.
Our classification accuracy results over the testing data are shown in Table VI. The confusion matrix of the results over the validation data set obtained by AddNet is given in Table VII. It can be observed in Table VI that the recognition capabilities of both AddNet and ConvNet are on par with one another. It is worth emphasizing the computational frugality of the scheme, as use of the regular dot-product is confined solely to the last layer in AddNet.

TABLE VI: Recognition rates for the two neural networks over the test data set.

TABLE VII: Confusion matrix for AddNet over the validation data sets for repeated trials.

TABLE VIII: Comparative accuracy (in %) results of the various models when training on batches 1 and 2 and testing on batches 3-10. Bold-text numbers correspond to the best accuracy results obtained amongst the different algorithms for each batch.
Batch ID | SVM Classifier Ensemble [11] | MLP | AddNet-MLP | DiscGAN | AddNet-DiscGAN | Domain adaptation