A New Convolutional Neural Network With Random Forest Method for Hydrogen Sensor Fault Diagnosis



I. INTRODUCTION
Traditional energy sources, such as liquefied petroleum gas, natural gas and coal, are non-renewable resources; therefore, it is crucial that new energy sources are found to replace them. Against the background of the conventional energy crisis [1], hydrogen energy is a new energy source for sustainable development and has been recognized as a zero-carbon energy source. In the 21st century, progress has been made in many aspects of the field of hydrogen energy. Many advanced countries have formulated plans for the development of hydrogen energy [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Aijun Yang.
Hydrogen, a highly reactive molecule, is considered to be a hazardous substance. It is flammable and explosive [4], so it is particularly important to monitor hydrogen leakage for safety purposes. A hydrogen sensor, which is designed to monitor the concentration of hydrogen, is a necessary device for the safe use of hydrogen energy [5]. Once the concentration exceeds the normal range, its alarm sounds immediately. However, due to the influence of environmental factors, hydrogen sensors are prone to failure; once a sensor fails in application, it loses its hydrogen safety detection function, which may lead to combustion and explosion. Therefore, it is vital to detect and identify faults in hydrogen sensors. Semiconductor gas sensors based on SnO2 sensitive materials have been widely used in practical applications and are a mature technology [6]. However, in long-term use, the output value of a gas sensor is not only related to the concentration of the measured gas but is also affected by environmental factors, such as dust, humidity, temperature and air pressure, as well as by degradation in the chemical characteristics of the sensor materials (e.g., heating of the wire or oxidation). All of these factors lead to parameter drift of a gas sensor and impair its effectiveness. Therefore, fault diagnosis of gas sensors has become an important issue for many researchers [7]-[11].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Generally, fault diagnosis methods for sensors can be divided into four categories: knowledge-based, model-based, data-driven and hybrid/active [12]. As the data-driven method is suitable for the analysis of complex signal systems, it has been widely used in fault diagnosis, and an increasing number of engineers and researchers are implementing this method in their work [13].
The feature extraction process in these methods is unable to generate discriminative features from raw data and also consumes a lot of time and effort. Thus, the degree of automation is greatly reduced. The final result can also be affected by the extraction process [23]. If a system is particularly complex, choosing an appropriate feature function requires considerable expertise and a solid mathematical background. Thus, expert experience directly affects the final results.
Recently, deep learning (DL), as an advanced technology, has been able to overcome the above shortcomings. It can automatically learn abstract features of the original data and classify them effectively, avoiding the need for handcrafted features designed by engineers [24]. Therefore, many DL methods have gradually been introduced into fault diagnosis, such as sparse filtering [25], deep belief networks [26], [27] and sparse autoencoders [28] [32]. An ensemble convolutional neural network (CNN) model was proposed for bearing fault diagnosis by Yang Liu et al. [33]. A novel sensor data-driven fault diagnosis method based on CNN has also been proposed [34]. Compared to traditional ML methods, CNNs have achieved better results, but their application in gas sensor fault diagnosis is still at a developmental stage.
Classification is an essential part of the fault diagnosis process; therefore, the choice of classifier is very important. Ensemble learning, also called a multi-classifier system or committee-based learning, combines multiple weak classifiers to form a strong classifier, and the classification result is obtained according to the majority voting principle. This process improves performance on complex data and yields better classification accuracy and generalization performance. Fault diagnosis data are large in scale, multi-scale and autocorrelated; thus, a classifier based on ensemble learning can obtain better performance in fault diagnosis [35], [36].
Random forest (RF) is an ensemble learning method that classifies using a voting model. Compared to other ML methods, RF offers low complexity, fast computing speed, a high accuracy rate, insensitivity to parameters, no need for feature normalization, less overfitting, etc. [37], [38]. Importantly, RF is more robust with respect to noise [37]. Therefore, RF is more suitable when dealing with a large amount of data with reasonable features, especially in a noisy environment. The literature [38], [39] has demonstrated that sensor fault diagnosis based on RF is feasible.
It has been reported that the combination of the feature extraction ability of CNN with the good classification performance of RF has been adopted for image classification [40], solar photovoltaic array detection [41], internet intrusion detection [42], scene categorization [43], facial expression recognition [44], tree species classification [45] and ship identification in satellite images [46]. Recently, a novel bearing fault diagnosis method based on CNN and RF has been proposed, and experimental results indicate that it achieves high accuracy in bearing fault diagnosis under complex operational conditions and is superior to traditional methods and standard deep learning methods [47]. However, the combination of CNN and RF has rarely been used for gas sensor fault diagnosis until now.
In this article, a method for hydrogen sensor fault diagnosis using a CNN with RF (CNN-RF) is proposed to automatically capture features of the gas sensor signal and improve upon the performance of conventional methods. The main contributions of the paper are as follows.
1) A method for transforming raw fault data into gray matrix images is proposed to process the sensor fault signal directly, which does not require expert experience.
2) In order to reduce the overfitting phenomenon, the structure of the CNN is optimized by dropout and zero-padding.
3) The sensor signal features captured by the CNN are input into the RF classifier to diagnose the fault mode of a hydrogen sensor.
The proposed method is verified by a self-made experimental system. The experimental results show that the accuracy of the CNN-RF method is higher than the accuracy of the CNN alone and other methods.
The remainder of this paper is organized as follows: The second section introduces the theoretical fundamentals. In the third section, a novel model based on CNN-RF for hydrogen sensor fault diagnosis is introduced. The fourth section verifies the effectiveness of the proposed method through experiments. The fifth section provides a summary and discussion.

II. THEORETICAL FUNDAMENTALS
The convolutional neural network (CNN) was proposed in the late 1980s for processing data in the form of multiple arrays [48]. In a CNN, each neuron in a feature map is sparsely connected to a small group of neurons in the previous layer, which differs from the connections in a conventional artificial neural network (ANN). A CNN mainly comprises convolutional layers, pooling layers and fully connected (FC) layers. As the CNN was inspired by the concept of simple and complex cells in the visual cortex of the brain, it has been widely used in computer vision and image classification [48].

A. CONVOLUTIONAL LAYER
The purpose of convolutional layers is to extract different features of the input. Each feature map is composed of a rectangular arrangement of neurons. Neurons in the same feature map share weights, which are called the convolutional kernel. Convolutional kernels are usually initialized in the form of random matrices. A convolutional layer is shown in Figure 1. The direct benefit of using shared weights (i.e., convolutional kernels) is a reduction in the number of connections between layers of the network, which also reduces the risk of overfitting [49], [50].
In a convolutional layer, assume that there are K filters and M input maps. Generally, the output feature maps of the lth layer are calculated as follows [50]:
$$x_j^l = f\left(\sum_{i=1}^{M} x_i^{l-1} * k_{ij}^l + b_j^l\right), \quad j = 1, \ldots, K$$
where f and * are the activation function and the convolution operation, respectively; $b_j^l$ denotes the bias corresponding to the jth filter, $x_i^{l-1}$ denotes the ith input map, $x_j^l$ denotes the jth output map, and $k_{ij}^l$ denotes the kernel of the jth filter, which is connected to the ith input map.
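As a minimal sketch of the convolutional layer computation described above, the following pure-Python functions compute one output feature map from several input maps. The sketch assumes a stride of 1 without padding, implements the convolution as cross-correlation (as most deep learning libraries do) and uses ReLU as the activation f; these choices and the names `relu`, `conv2d_valid` and `conv_output_map` are illustrative assumptions, not details taken from the paper.

```python
def relu(v):
    """Activation function f (ReLU is an assumed choice)."""
    return v if v > 0.0 else 0.0


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv_output_map(input_maps, kernels, bias):
    """j-th output map: f(sum over i of input_i * kernel_ij, plus bias_j)."""
    partial = [conv2d_valid(x, k) for x, k in zip(input_maps, kernels)]
    rows, cols = len(partial[0]), len(partial[0][0])
    return [[relu(sum(p[r][c] for p in partial) + bias)
             for c in range(cols)]
            for r in range(rows)]
```

Calling `conv_output_map` once per filter, each with its own list of kernels and its own bias, yields the K output maps of the layer.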

B. POOLING LAYER
Pooling layers are a form of downsampling. A pooling layer merges similar features in a local position to make detection more reliable [26]. In a pooling layer, assume that there are M input feature maps and M output feature maps. Generally, the output feature maps of the lth layer are calculated as follows [49]:
$$x_j^l = f\left(\beta_j^l \, \mathrm{down}\left(x_j^{l-1}\right) + b_j^l\right), \quad j = 1, \ldots, M$$
where f is the activation function, down is the sub-sampling function, $\beta_j^l$ and $b_j^l$ are the multiplicative bias and the additive bias corresponding to the jth filter, respectively, $x_j^l$ is the jth output map and $x_j^{l-1}$ is the jth input map.
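The pooling computation can be illustrated with a short sketch. Here the sub-sampling function is assumed to be non-overlapping mean pooling over 2 × 2 windows, and the activation defaults to the identity; the paper does not state either choice at this point, so both are assumptions, as are the function names.

```python
def down(feature_map, size=2):
    """Non-overlapping mean pooling with a size x size window (assumed choice)."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[sum(feature_map[r * size + a][c * size + b]
                 for a in range(size) for b in range(size)) / (size * size)
             for c in range(cols)]
            for r in range(rows)]


def pool_output_map(feature_map, beta=1.0, bias=0.0, f=lambda v: v):
    """j-th output map: f(beta_j * down(input_j) + bias_j)."""
    return [[f(beta * v + bias) for v in row] for row in down(feature_map)]
```

A 4 × 4 input map is reduced to a 2 × 2 output map, so each pooling layer halves both spatial dimensions under these assumptions.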

C. FULLY CONNECTED LAYER
In a CNN structure, after several convolutional layers and pooling layers, one or more FC layers are connected. The FC layer model is shown in Figure 2. Assuming that the length of the input vector is M and the length of the output vector is N, the output vector of the lth layer is calculated as follows [51]:
$$x_j^l = f\left(\sum_{i=1}^{M} x_i^{l-1} w_{ij}^l + b_j^l\right), \quad j = 1, \ldots, N$$
where f is the activation function, $x_i^{l-1}$ is the ith input value, $x_j^l$ is the jth output value, $b_j^l$ represents the bias corresponding to the jth output value and $w_{ij}^l$ is the weight connecting the ith input value to the jth output value.
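A fully connected layer of this kind reduces to a weighted sum per output followed by the activation. The sketch below assumes a sigmoid activation by default and stores the weights as a nested list indexed as `weights[i][j]`; both are illustrative assumptions.

```python
import math


def sigmoid(v):
    """Logistic activation (an assumed default for f)."""
    return 1.0 / (1.0 + math.exp(-v))


def fc_forward(x, weights, biases, f=sigmoid):
    """Return the N outputs; weights[i][j] connects input i to output j."""
    n_out = len(biases)
    return [f(sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j])
            for j in range(n_out)]
```

For an input of length M = 2 and an output of length N = 3, `weights` has two rows of three values each and `biases` has three entries.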
There are many famous CNN models, such as GoogLeNet [52], LeNet-5 [53], AlexNet [54], and Network in Network [55]. In this paper, the classical CNN architecture LeNet-5, which has been applied to handwritten character recognition, is adopted to solve the gray matrix image classification task of fault diagnosis. It consists of two alternating pairs of convolutional and pooling layers followed by a two-layer FC network.

III. PROPOSED MODEL FOR HYDROGEN SENSOR FAULT DIAGNOSIS
In this section, a new model is proposed for hydrogen sensor fault diagnosis. Firstly, a method for image conversion from raw data to a gray matrix image is proposed for processing of the sensor signal. Secondly, two strategies, dropout and zero-padding, are used to optimize the CNN structure. Finally, the novel CNN-RF method is proposed.

A. GRAY MATRIX IMAGE CONVERSION METHOD
Data pre-processing of the sensor signal is the first step in gas sensor fault diagnosis; the quality of data processing will directly affect the accuracy of the diagnosis. Traditional gas sensor fault signal processing methods mostly rely on expert experience to extract features from raw data and cannot directly handle raw signals [56]. Extracting features is not only exhausting work but also plays a key role in the results. In this study, a method for transforming the raw gas sensor signal into gray matrix images is proposed. The main aspects of the method are as follows.

Algorithm 1 Algorithm of Gray Matrix Image Conversion
Input: The measured raw data sequence X_i of each sensor in the sensor array.
The method of gray matrix image conversion is sketched in Algorithm 1. In this method, the raw signal of the gas sensor is first converted into a value between 0 and 1, followed by conversion into a gray matrix image of dimension M × N using uint8 encoding, where M is the width of the image and N is the height of the image. The gray matrix image conversion method is shown in Figure 3.
The advantage of this method is that it provides a representation for exploration of the 2-D features of the original sensor signal. It can retain the original features of the data as much as possible and does not depend on expert experience or artificial feature extraction.
In this paper, as each 1-D fault signal sample consisted of 2000 data elements, the gray matrix image size was set to 50 × 40 pixels.
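The conversion can be sketched as follows, under several assumptions that the text does not confirm: min-max normalization is used to map the signal to [0, 1], pixel values are rounded to integers in the uint8 range 0 to 255, and the 2000 samples fill the image row by row. The function name `signal_to_gray_image` and the `rows`/`cols` parameters are hypothetical.

```python
def signal_to_gray_image(signal, rows=50, cols=40):
    """Min-max normalize a 1-D signal, quantize to 0..255 and reshape to 2-D."""
    if len(signal) != rows * cols:
        raise ValueError("signal length must equal rows * cols")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    pixels = [int(round(255 * (v - lo) / span)) for v in signal]
    # Row-major fill: consecutive samples run along a row (an assumed layout).
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]
```

In practice an array library would typically hold the result as a true uint8 matrix; plain lists are used here only to keep the sketch dependency-free.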

B. DROPOUT
In CNN training, the problem of overfitting is often encountered. Reducing the interaction of feature detectors in a neural network can prevent overfitting and thereby improve its performance [57]. Dropout can be used as a strategy for training deep neural networks where, in each training batch, the overfitting phenomenon can be significantly reduced by ignoring half of the feature detectors; that is, the outputs of half of the hidden layer nodes are set to 0 during dropout.
In this study, dropout is used to effectively prevent overfitting. We set the retaining probability to p = 0.5 for dropout, so the output of each neuron in every layer was zero with probability 0.5. The dropout neural net model is shown in Figure 4. This strategy can reduce interactions between feature detectors.
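A minimal sketch of dropout with a retaining probability of 0.5 is given below. It uses the "inverted dropout" variant, in which surviving activations are scaled by 1/p during training; this scaling is an assumption borrowed from common practice and is not stated in the paper.

```python
import random


def dropout(activations, keep_prob=0.5, training=True, rng=random):
    """Zero each activation with probability 1 - keep_prob during training."""
    if not training:
        return list(activations)
    # Inverted dropout (assumed variant): survivors are scaled by 1 / keep_prob
    # so the expected activation is unchanged and inference needs no rescaling.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]
```

Passing a seeded `random.Random` instance as `rng` makes the mask reproducible, which is convenient when comparing training runs.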

C. ZERO-PADDING
In processing image information using a CNN, a majority of the edge pixels in the input image are only operated on by the convolutional kernel once, whereas pixels in the middle of the image are scanned many times. This reduces the contribution of boundary information to a certain extent. With zero-padding, the original boundary pixels are no longer at the edge of the padded image and therefore take part in more kernel operations, which alleviates this problem to a degree. At the same time, input images of different sizes can be padded such that they are the same size. Suppose the input size is (H, W), the filter size is (FH, FW), the output size is (OPH, OPW), the padding length is P, and the stride size is S. Then, the output size formula is as follows:
$$OPH = \frac{H + 2P - FH}{S} + 1, \qquad OPW = \frac{W + 2P - FW}{S} + 1$$
Therefore, in this study, zero-padding is used to control the feature dimension.
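The standard output size relation for a convolution with padding P and stride S can be checked with a few lines of code. The 5 × 5 filter used in the usage note is a hypothetical value chosen for illustration (it matches the classical LeNet-5 kernel size, but the paper's own layer parameters are listed in its Table 2 and are not reproduced here).

```python
def conv_output_size(h, w, fh, fw, padding=0, stride=1):
    """Return (OPH, OPW) for input (h, w), filter (fh, fw), padding P, stride S."""
    oph = (h + 2 * padding - fh) // stride + 1
    opw = (w + 2 * padding - fw) // stride + 1
    return oph, opw
```

For a 50 × 40 input and a 5 × 5 filter with stride 1, `conv_output_size(50, 40, 5, 5)` gives (46, 36) without padding, whereas `conv_output_size(50, 40, 5, 5, padding=2)` gives (50, 40), so a padding of 2 preserves the feature map size in this case.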

D. RF CLASSIFIER
The selection of a classifier plays a key role in the classification results. It is difficult for the traditional CNN, which is based on the softmax classifier, to achieve the best generalization capability, due to the local minimum, vanishing gradient, and overfitting problems in the training process.
The random forest algorithm is an ensemble learning method based on decision trees. Suppose there are N training samples; each tree randomly draws N samples from them with replacement as its sub-training set. If there are M features, m (m < M) features are selected at random, and the optimal feature among them is used for each split. In this way, each tree obtains its training result from a different sub-training set, and sampling with replacement also ensures the "integrity" of the training results. Each tree judges the input sample separately and the final classification is determined according to the voting results; that is, the results of several weak classifiers are combined to form a strong classifier. The proposed model makes use of the RF as an initial "mock test" in the algorithm, and thus has good application in large data sets and for input samples with high-dimensional features. The RF model is shown in Figure 5. Therefore, in this study, RF is used as a classifier. With only a few parameters to adjust, it enhances robustness against noise disturbance, improves the generalization ability and classification effect of the model and reduces overfitting.
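The three randomized ingredients of the random forest described above (bootstrap sampling, random feature subsets and majority voting) can be sketched independently of any particular decision tree implementation. The function names below are hypothetical, and tree construction itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (one sub-training set per tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, m, rng):
    """Pick m < M candidate features without replacement for one split."""
    return rng.sample(range(n_features), m)


def majority_vote(tree_predictions):
    """Combine the per-tree class labels into the final label of the forest."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For example, `majority_vote(["stuck", "bias", "stuck"])` returns `"stuck"`. Because the bootstrap draws with replacement, each tree sees some samples more than once and misses others, which is what makes the trees differ from one another.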

E. CNN-RF METHOD FOR FAULT DIAGNOSIS
This paper proposes the CNN-RF method, which contains four types of layers in its structure: a convolutional layer, a pooling layer, a fully connected layer and an RF classifier layer. It also uses two strategies, dropout and zero-padding, to prevent overfitting and enhance its performance. The CNN-RF method is used to process gray matrix images for fault diagnosis of hydrogen sensors. The samples are labeled by fault type, according to the sensor signal records. Then, the samples are converted into gray matrix images. Layers with dropout and zero-padding are added to deal with the overfitting problem. The sensor fault type features are captured by the convolutional, pooling, and fully connected layers. Then, the features are input into the RF classifier to obtain the fault diagnosis results, as it has demonstrated good performance in fault mode classification of hydrogen sensors. The overall structure of the proposed model based on the CNN-RF method introduced in this paper is shown in Figure 6.

IV. EXPERIMENT AND VALIDATION OF THE PROPOSED METHOD

A. EXPERIMENTAL SETUP
The data used for verification of the CNN-RF method were obtained through our experimental system. A system diagram of the hydrogen sensor array is shown in Figure 7. The experimental system included a standard hydrogen concentration cylinder, a standard air cylinder, a gas molecular flowmeter, a gas mixer, a two-way regulated power supply, a data collector, a constant temperature and constant humidity box, a computer system, a sensor array chamber and an SnO2 sensor array (sensor model: MQ-8). We used a commercially available, screen-printed six-sensor MQ-8 gas sensor array in our experiments, as shown in Figure 8. The MQ-8 gas sensor cylinder core structure is shown in Figure 9. When the test system works, the standard gas in the cylinder enters the gas molecule flowmeter through the pressure-reducing valve, and the flow rates of the air molecule flowmeter and the standard hydrogen molecule flowmeter are controlled in order to obtain the target hydrogen concentration in the gas mixer. After the hydrogen concentration in the gas mixer is homogenized, the gas flows into the gas chamber of the sensor array through the pipeline and is loaded onto the sensor array. The sensor array detects the concentration of hydrogen gas after the working and heater voltages are provided by the two-way regulated power supply. The detection signal of the sensor array is picked up by the data collector and transmitted to the computer system through a 232 data bus interface. The structure of the sensor signal pickup circuit is shown in Figure 10. The program was run on a computer with a 2.8 GHz Intel CPU and 8 GB RAM running Windows 10. A photographic image of the experimental setup of the hydrogen sensor array is shown in Figure 11.
According to long-term practical experience and related literature reports [7], [58], [59], six gas sensor fault types were selected: impact fault, stuck fault, heating wire disconnection (HWD) fault, bias fault, exfoliation of sensitive body (ESB) fault and false welding of sensitive body (FWSB) fault. The structures of the HWD and ESB faults are shown in Figure 12, which illustrates the fault structure of a hydrogen sensor. Through the experiments, we obtained MQ-8 gas sensor signals of seven modes in a normal environment (i.e., the six fault types and without fault), as shown in Figure 13.
However, in real-world industrial applications, hydrogen sensors inevitably operate in noisy environments. Because the noise varies considerably, labeled training samples cannot be collected for every noisy environment. Accordingly, additive white Gaussian noise is added to the original signals to form composite signals [24]. MQ-8 gas sensor signals of seven modes in a noisy environment (i.e., the six fault types and without fault) are shown in Figure 14.
The CNN-RF method requires a large amount of data for training and test samples, and sensor fault signals are difficult to obtain in large quantities; therefore, based on the acquired (normal and fault) signals, data simulation was carried out to increase the amount of data. The simulation samples were generated by overlapping a sensor fault signal onto the normal signal acquired by the sensor array.
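The noise injection step can be sketched as follows. The target signal-to-noise ratio `snr_db` is a hypothetical parameter, since the noise level used in the experiments is not given in this section, and the noise power is derived from the mean power of the clean signal, which is one common convention.

```python
import math
import random


def add_awgn(signal, snr_db, rng=random):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    power = sum(v * v for v in signal) / len(signal)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

Drawing a fresh noise realization for each copy of a recorded signal also provides a simple way to enlarge the set of noisy samples.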

B. VALIDATION OF CNN-RF METHOD
In this section, the results of CNN-RF training and inference are provided, in order to validate the performance of the proposed model in hydrogen sensor fault diagnosis.

1) CNN-RF TRAINING
As shown in Table 1, 137 sets of experimental samples were obtained for each sensor signal mode, of which 100 were training samples and 37 were test samples. There was no overlap between the training samples and test samples. The seven sensor signal modes in the noisy environment were first converted into gray matrix images, each of size 50 × 40 pixels (as shown in Figure 15). Then, the images were input into the proposed method for training.
The proposed method was trained for 1000 iterations. The relevant parameters for each layer of CNN-RF are listed in Table 2. The training accuracy reached 100% and the training loss approached 0 after 300 iterations, as shown in Figure 16. Furthermore, all values remained stable after 300 iterations. Thus, we obtained a well-trained model with 100% training accuracy through training of the CNN-RF.

2) CNN-RF INFERENCE
Inference for hydrogen sensor fault diagnosis was realized by the new method, according to the experimental data. To obtain more reliable results, the experiment was repeated six times, and the final diagnosis results of CNN-RF and CNN were compared in terms of accuracy in both the normal and noisy environments. The mean accuracy results of CNN-RF and CNN are listed in Table 3. The diagnosis results for the seven hydrogen sensor signal modes in the noisy environment using CNN-RF are shown in Figure 17, where the matching degree between the predicted and actual type is 100% in each row.
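The evaluation quantities mentioned above (accuracy per run, mean accuracy over repeated runs and the per-row matching degree between actual and predicted types) can be computed as in the sketch below. The function names are hypothetical, and any labels passed in are placeholders rather than results from the paper.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def row_normalized_confusion(y_true, y_pred, labels):
    """Entry [i][j]: fraction of samples of class labels[i] predicted as labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def mean_accuracy(run_accuracies):
    """Average the accuracies obtained from repeated experiments."""
    return mean(run_accuracies)
```

A matching degree of 100% in each row corresponds to a row-normalized confusion matrix whose diagonal entries are all 1.0.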
In order to further verify the advantage of RF as the classifier in the CNN-RF method, we replaced the RF classifier with a KNN (CNN-KNN), an SVM (CNN-SVM) and a BP (CNN-BP) classifier and compared the mean accuracy of all of these methods in the noisy environment (Table 4).
To evaluate the performance of the CNN-RF method, other traditional methods were selected for a comparison of prediction accuracy. The selected methods were KNN [11], ELM [15], SVM [17], LVQ [18], [19], BP [20], RF [39], and CNN-RF, the last of which (i.e., the proposed model) had higher accuracy than the other methods in the noisy environment. The comparison results, in terms of sensor fault diagnosis accuracy, are shown in Table 5.

V. CONCLUSION
In this study, we presented a novel method, CNN-RF, for the fault diagnosis of hydrogen sensors. The proposed method is able to fuse the three major blocks of traditional fault-detection approaches (feature extraction, feature selection and classification) into a single learning body without requiring expert intervention. The main contributions of this study are as follows. A raw sensor signal-to-gray matrix image conversion method is developed, which changes the default image size of LeNet-5 from 32 × 32 pixels to 50 × 40 pixels, according to the length of the hydrogen sensor signal data. Through dropout and zero-padding, the structure of a CNN with an RF classifier is optimized, which ensures that the training model structure of the proposed method has better generalization ability and robustness, compared to the traditional CNN method, for the fault diagnosis of hydrogen sensors. The experimental results show that the proposed method can learn features effectively and achieve convincing detection results for a hydrogen sensor with seven modes. The proposed method achieved a prediction accuracy of 100% on the seven modes studied, outperforming the CNN alone and other methods. The proposed method can also be applied to fault diagnosis of other gas sensors.
Some limitations of the new method are the following: optimization of the CNN parameters requires debugging on a case-by-case basis, and the effectiveness of the method has only been verified in this experiment. Based on the above limitations, we plan to consider how optimization techniques can be used to adjust the CNN parameters in fault detection for process monitoring and to apply the method to a larger range of sensor fault diagnosis scenarios in future work.