Accurate Identification Strategy of Coal and Gangue Using Infrared Imaging Technology Combined With Convolutional Neural Network

Accurate classification is an important prerequisite for effectively separating coal and gangue. Here, a new recognition solution for coal and gangue is proposed, in which a convolutional neural network (CNN) is trained to identify coal and gangue automatically from infrared images, without the need to select a feature extractor and a classifier. First, the architecture and detailed parameters of the model are optimized, and a CNN with a single Inception Block containing three convolution kernels of different sizes is found to be the most appropriate model. Next, the performance of the proposed identification model is evaluated on an infrared image dataset: the CNN correctly identifies all 192 training samples and all 48 test samples. Finally, comparisons with traditional recognition models and other CNN models show that the proposed CNN has superior recognition performance. The results show that combining infrared imaging with a CNN can identify coal and gangue quickly and accurately without complex image processing steps. In addition, the model shows some robustness to different types of noise. The approach may serve as a reference for the research and development of intelligent coal preparation equipment.


I. INTRODUCTION
China is a major coal country with very rich coal reserves [1]. Its energy resource profile, "rich in coal, poor in oil, and short of gas", gives coal an important position in primary energy [2]. Gangue, which is produced alongside coal during mining [3], [4], is harder than coal and has a low carbon content. It is mainly composed of SiO₂ and Al₂O₃ [5], [6] and contains not only high levels of sulphur but also many heavy metals (such as arsenic, chromium, and mercury). When coal blended with gangue is used as fuel, the utilization rate of coal is reduced and, worse, serious environmental pollution results [7]. It is therefore very important to separate gangue precisely from raw coal before the comprehensive utilization of coal, and accurate classification of coal and gangue is the key prerequisite for precise separation [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Amin Zehtabian.
In coal (gangue) separation technology [9], [10], apart from manual picking, automatic separation methods can be divided into dry and wet methods depending on whether water is used. Wet methods mainly include moving-sieve jigging and heavy-medium separation. Wet separation has a large footprint and a high investment cost, and it requires a great deal of water; it also produces large amounts of coal slime, which is difficult to treat. In recent years, dry separation has developed rapidly in both research and application. It mainly uses different sensing technologies to identify coal and gangue, including dual-energy gamma rays [11], X-rays [12], lasers [13], and images [14]. Ray-based detection is easy to integrate into separation equipment, but it emits radiation during use, so radiation shielding is necessary. Meanwhile, with the rapid development and wide application of image processing and pattern recognition [15], image-based recognition is considered to have broad application prospects for gangue separation [16]. However, this approach has a drawback: environmental factors such as lighting and dust affect the identification and separation results.
Infrared imaging [17]-[19] is a common image acquisition technology that is little affected by lighting and environmental conditions, and it is widely used in security monitoring, environmental monitoring, and medical detection [20]. Given the harsh environment in which coal and gangue are separated, we propose using infrared imaging to identify them. Recently, deep learning [21], [22], especially the convolutional neural network (CNN) [23], [24], has become a research hotspot and has been widely applied in face recognition [25], license plate recognition [26], spectral recognition [27], and other fields. CNNs have also been applied to the processing and recognition of infrared images. Kuang et al. [28] proposed a deep learning method that removes optical noise from a single infrared image using a fully convolutional network. Khellal et al. [29] introduced a method based entirely on the extreme learning machine (ELM) to learn useful CNN features, achieving fast and accurate image classification suitable for infrared recognition systems, and verified it on the VAIS dataset.
Taking these factors into account, we propose a new recognition strategy in which a CNN is trained to identify coal and gangue automatically from their infrared images, without the need to select a feature extractor and a classifier. We first introduce the experimental instrumentation and materials and briefly explain the structure and training of the CNNs used in this paper. We then focus on the experimental results. After comparing the performance of CNN models built on two basic structures, we select the better structure and optimize its parameters to obtain the CNN model best suited to identifying coal and gangue; its structure and training process are also presented. To verify the reliability of the model, we compare it with traditional strategies that combine image feature extraction [30] with a classifier and with state-of-the-art CNN models, and we analyze its robustness to different noises.
Finally, we summarize the study and outline directions for future work.

A. INSTRUMENTATION
To collect infrared images of coal and gangue, we first built an infrared imaging system. Figure 1 shows the experimental configuration, which consists mainly of an infrared light source, an infrared camera, and a computer. The light source is an LED array (Guangzhou Tianjian Electronics Co., Ltd., Guangzhou, China) of six infrared diodes with a peak wavelength of 940 nm. The infrared images of coal and gangue are captured by an infrared camera (S908; Shenzhen Linbaishi Technology Co., Ltd., Shenzhen, China) equipped with an OV2710, a true full HD (1080p) CMOS image sensor designed for digital video camcorders, PC webcams, and other applications. In addition, a 940 nm narrowband filter is installed in front of the CMOS sensor so that only light near the 940 nm band is imaged, which effectively suppresses the influence of other light on the images of coal and gangue.

B. MATERIALS AND SAMPLES
Because Huainan is one of the main coal-producing regions in China, we chose coal and gangue from Huainan as the research subjects. The coal and gangue samples used for infrared imaging were collected from the Huainan mining area, Anhui Province, on January 8, 2019. For each material, 120 pieces of similar size and shape were selected for infrared image acquisition, and all samples were imaged under the same conditions to keep the experimental data realistic and reliable. For each class, 96 of the 120 samples were randomly selected for training and the remaining 24 were used for testing, giving 192 training samples and 48 test samples in total. To ensure the effectiveness of the recognition model, 5-fold cross-validation was used during training: the data set is divided into 5 parts, each part in turn serves as the validation set while the other 4 are used for training, and the mean of the 5 results is taken as the estimated accuracy of the model.
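The sample split and cross-validation described above can be sketched in plain Python as follows. The per-class reading (96 of the 120 pieces of each material for training) is inferred from the 192/48 sample counts reported later, and applying the 5 folds to the training split is an assumption, since the text says only "the data set". The function names and the `evaluate` callback, which would train and score a model on one split, are hypothetical.

```python
import random

def split_per_class(ids_by_class, n_train, seed=0):
    """Randomly assign n_train samples of each class to training; the rest go to test."""
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in ids_by_class.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        train += [(i, label) for i in shuffled[:n_train]]
        test += [(i, label) for i in shuffled[n_train:]]
    return train, test

def k_fold(samples, k=5, seed=0):
    """Yield (train, validation) lists; every sample is used for validation exactly once."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for v in range(k):
        rest = [s for j, fold in enumerate(folds) if j != v for s in fold]
        yield rest, folds[v]

def cross_val_accuracy(samples, evaluate, k=5):
    """Mean of evaluate(train, val) over the k folds, used as the accuracy estimate."""
    scores = [evaluate(tr, va) for tr, va in k_fold(samples, k)]
    return sum(scores) / len(scores)

train, test = split_per_class({"coal": range(120), "gangue": range(120)}, n_train=96)
print(len(train), len(test))  # 192 48
```

Splitting per class keeps the two classes balanced in both the training and the test set, which a single random draw over all 240 images would not guarantee.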

C. CNN FOR INFRARED IMAGE CLASSIFICATION
A CNN, a central element of deep learning and a focus of current research, usually consists of convolutional, activation, and pooling layers. CNNs such as LeNet [31] and AlexNet [32] perform well in image recognition. Two ideas subsequently emerged for improving network performance: introducing the inception unit [33] and introducing the residual unit [34]. Accordingly, we designed two typical CNN structures, shown in Figure 2. Structure A enhances the adaptability of the network mainly by using multiple convolution kernels of different sizes (Inception Blocks). Structure B addresses gradient vanishing and the performance degradation that accompanies increasing network depth by computing residuals (Res Blocks). In either structure, a deeper CNN model can easily be built by increasing the number of Inception Blocks or Res Blocks. To determine the better structure, the performance of the different CNN models is compared in Section III.B.
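To make the residual idea of structure B concrete, the NumPy sketch below runs a forward pass through a Res Block: two 3 × 3 convolutions whose output is added back to the block input before the final ReLU. The exact layer order inside the paper's Res Block (for example, where BN sits) is given only in Fig. 2, so this layout, the helper names, and the random weights are assumptions for illustration. The shortcut term `+ x` is what lets the signal and its gradient bypass the convolutions when many blocks are stacked.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1, zero-padded 'same' 2-D convolution (cross-correlation, as in Keras).

    x: (H, W, C_in) feature map; w: (kh, kw, C_in, C_out) with odd kh, kw; b: (C_out,).
    """
    kh, kw = w.shape[:2]
    H, W = x.shape[:2]
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    out = np.zeros((H, W, w.shape[3])) + b
    for dy in range(kh):
        for dx in range(kw):
            # add the contribution of kernel tap (dy, dx) at every pixel at once
            out += xp[dy:dy + H, dx:dx + W, :] @ w[dy, dx]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def res_block(x, w1, b1, w2, b2):
    """Hypothetical Res Block: conv, ReLU, conv, identity shortcut, then ReLU.

    The shortcut needs C_out == C_in so that the sum is defined.
    """
    y = relu(conv2d_same(x, w1, b1))
    y = conv2d_same(y, w2, b2)
    return relu(y + x)

rng = np.random.default_rng(0)
x = rng.random((108, 192, 16))  # 16 channels, the default kernel count in the text
w1, w2 = (rng.normal(0.0, 0.05, (3, 3, 16, 16)) for _ in range(2))
b = np.zeros(16)
print(res_block(x, w1, b, w2, b).shape)  # (108, 192, 16)
```

With all weights set to zero the block reduces to `relu(x)`, which shows why a residual block can at worst pass its input through unchanged.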
The main components of the CNN are as follows. The original infrared images of coal and gangue are 1920 × 1080 pixels. To keep recognition fast enough for real-time use, we downscale each image to 10% of its original size, i.e., 192 × 108 pixels, which is used for recognition. The input to the classification model is therefore an infrared image of size 192 × 108 × 3.
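The downscaling step can be sketched in NumPy as below. The text does not say which resampling method was used, so averaging each 10 × 10 pixel block (area resampling) is an assumption; an image library's resize function would serve equally well. Note that image arrays are stored height first, so the 192 × 108 × 3 image in the text is a 108 × 192 × 3 array. Shrinking each side to 10% cuts the pixel count by a factor of 100, from 1920 × 1080 = 2,073,600 to 192 × 108 = 20,736.

```python
import numpy as np

def box_downscale(img, factor=10):
    """Shrink an (H, W, C) image by an integer factor by averaging factor x factor blocks."""
    H, W, C = img.shape
    if H % factor or W % factor:
        raise ValueError("image height and width must be divisible by factor")
    blocks = img.reshape(H // factor, factor, W // factor, factor, C)
    return blocks.mean(axis=(1, 3))

# A synthetic stand-in for one 1920 x 1080 RGB-encoded infrared frame (height first).
frame = np.random.default_rng(0).random((1080, 1920, 3), dtype=np.float32) * 255
small = box_downscale(frame)
print(small.shape)  # (108, 192, 3)
```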
In the Inception Block or the Res Block, features are extracted by convolutional layers, and each layer applies a configurable number of convolution kernels (16 by default) of a single size (typically 3 × 3).
Batch normalization (BN) [35] is used to improve the reliability and stability of the model, as it helps suppress gradient vanishing and over-fitting.
Max pooling, one of the most widely used pooling strategies, reduces the dimensions of the feature maps and hence the number of network parameters. Both structures designed in this paper use max pooling with a size of 3 × 3.
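Batch normalization and 3 × 3 max pooling can be sketched in NumPy as follows. The default `eps=1e-3` mirrors the Keras `BatchNormalization` default, and using a stride equal to the pool size with "valid" padding mirrors the Keras `MaxPooling2D` defaults; the paper states neither, so both are assumptions. The running averages that BN uses at inference time are omitted for brevity.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for a batch of feature maps x: (N, H, W, C).

    Each channel is normalized with its mean and variance over the batch and all
    pixels, then scaled by gamma and shifted by beta (both of shape (C,)).
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def max_pool(x, size=3):
    """Non-overlapping size x size max pooling of one (H, W, C) feature map.

    The stride equals the pool size, and leftover rows/columns are dropped ('valid').
    """
    H, W, C = x.shape
    h, w = H // size, W // size
    windows = x[:h * size, :w * size].reshape(h, size, w, size, C)
    return windows.max(axis=(1, 3))

rng = np.random.default_rng(0)
maps = rng.normal(5.0, 2.0, (4, 108, 192, 10))  # batch of 4 maps with 10 channels
normed = batch_norm(maps, np.ones(10), np.zeros(10))
print(max_pool(normed[0]).shape)  # (36, 64, 10)
```

With a 3 × 3 window and stride 3, each pooling layer reduces the number of spatial positions by a factor of 9, which is where the parameter savings in any following Dense layer come from.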
The top of the CNN is a fully connected (Dense) layer with as many outputs as there are sample classes. In both structures shown in Fig. 2, only the Dense layer uses softmax as its activation function; all other layers use rectified linear units (ReLU). The softmax activation produces values in [0, 1] that serve as confidence scores for the classification, and the classification loss is calculated by comparing these scores with the actual labels of the samples. The softmax function and loss function are expressed as follows, where z denotes the input of the softmax layer, m a sample, n a class, and K the total number of classes.
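The equations referred to in this paragraph are missing from this version of the text. Assuming the standard softmax and the categorical cross-entropy loss (the usual pairing for a softmax classifier in Keras), and writing $y_{m,n}$ for the one-hot label of sample $m$ and $M$ for the number of samples in a batch (both symbols are our additions), they would read:

```latex
p_{m,n} = \operatorname{softmax}(\mathbf{z}_m)_n
        = \frac{e^{z_{m,n}}}{\sum_{k=1}^{K} e^{z_{m,k}}}, \qquad n = 1, \dots, K

L = -\frac{1}{M} \sum_{m=1}^{M} \sum_{n=1}^{K} y_{m,n} \log p_{m,n}
```

For coal versus gangue, $K = 2$. Because the labels are one-hot, only the log-probability assigned to the true class of each sample contributes to $L$.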

D. TRAINING OF CNN MODEL
Two typical CNN models were first constructed in Keras (v2.2.4) with a TensorFlow backend (v1.10.0). The models were developed on a computer with an Intel Core i7-9700K processor and an NVIDIA GeForce RTX 2070 GPU. The CNNs were trained with the SGD algorithm and with adaptive optimizers such as Adam (the full set of optimizers compared is listed in Section III.B). First, the performance of the two CNNs with different optimizers is compared using the default Keras parameters; the parameters of the preferred optimizer are then tuned further. For optimization, the batch size was set to 128 and the number of training epochs to 1000.
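For readers less familiar with the two optimizers, the pure-Python sketch below contrasts one SGD step with one Adam step on a toy one-parameter loss. The defaults used here (SGD learning rate 0.01 without momentum; Adam learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-7) are, to our understanding, those of Keras 2.2.4, but they should be checked against the installed version; the helper names are hypothetical. The last helper makes a consequence of the stated batch size explicit: with 192 training images (or about 154 per cross-validation training split), a batch size of 128 yields two mini-batch updates per epoch, so 1000 epochs amount to roughly 2000 parameter updates.

```python
import math

def sgd_step(w, g, state, lr=0.01):
    """Plain SGD without momentum: move against the gradient by a fixed step."""
    return w - lr * g, state

def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update using bias-corrected first and second moment estimates."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * g      # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

def minimize(step, w=0.0, iters=30000):
    """Run `step` on the toy loss (w - 3)^2, whose gradient is 2 * (w - 3)."""
    state = (0.0, 0.0, 0)
    for _ in range(iters):
        w, state = step(w, 2.0 * (w - 3.0), state)
    return w

def updates_per_epoch(n_samples, batch_size=128):
    """Mini-batch updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

print(round(minimize(sgd_step), 4), round(minimize(adam_step), 2))  # both close to 3
print(updates_per_epoch(192), 1000 * updates_per_epoch(192))        # 2 2000
```

Adam divides each step by a running estimate of the gradient's magnitude, so its step size is roughly the learning rate regardless of the gradient scale, whereas the SGD step is proportional to the raw gradient.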

A. INFRARED IMAGE OF SAMPLES
The infrared images of coal and gangue were collected with the infrared imaging system shown in Figure 1 at the original size of 1920 × 1080; several sample images are shown in Figure 3. Some coal and gangue samples are easy to distinguish in the infrared images, but others are not. Image processing and pattern recognition methods are therefore needed to classify coal and gangue.

B. COMPARISON BETWEEN THE TWO CNN STRUCTURES
To test the two structures described in Section II.C, identification models with a single Inception Block or a single Res Block were built according to Fig. 2, and the performance of the resulting CNN models on the infrared images of coal and gangue was compared. For each structure and each optimizer (SGD, Adam, Adamax, and Nadam), the test-set accuracy and loss were recorded over 3 trials; their means are given in Table 1. Both structure A and structure B identify coal and gangue well. With structure A, the Adam optimizer gives the highest recognition rate, 100.00%; with structure B, the Nadam optimizer gives the highest recognition rate, 99.31%. This indicates that structure A is better suited to identifying coal and gangue from infrared images. Based on Table 1, we therefore use structure A with the Adam optimizer to build the CNN recognition model for coal and gangue.

C. OPTIMIZATION OF THE CNN ARCHITECTURE
The results in Section III.B show that a CNN model with only one Inception Block already achieves an identification rate of 100.00%, so there is no need to increase the depth of the network. To simplify the network, we gradually decreased the number of convolution kernels in the convolutional layers; the resulting test-set accuracy is shown in Figure 4. The best recognition performance is obtained with 10, 12, 14, and 16 kernels.
Next, to obtain the model with the best performance and the simplest structure, we also took the loss, training time, and number of trainable parameters into account when choosing the identification model. Table 2 gives the average results over three trials. First, comparing the loss, the CNN model with 14 convolution kernels reaches the minimum of 0.0007. Second, comparing the training time, the model with 10 convolution kernels trains faster and more consistently. Third, the identification model with 10 convolution kernels has almost two-thirds as many trainable parameters as the model with 14 kernels. In general, as the number of convolution kernels decreases, both the number of trainable parameters and the training time decrease. Considering all these indicators, the CNN identification model with 10 kernels in every convolutional layer is the most appropriate for classifying coal and gangue from infrared images.
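To see why the parameter count, and with it the training time, falls as the number of kernels per layer is reduced, the sketch below counts the trainable weights of a Conv2D layer and of a hypothetical Inception Block with 1 × 1, 3 × 3, and 5 × 5 branches. Which layers feed the block, and the size of the final Dense layer (whose weights scale with the flattened feature-map size and so depend on the pooling details in Table 3), are left out as unknowns; the numbers are illustrative and are not the values in Table 2.

```python
def conv_params(k, c_in, c_out):
    """Trainable weights of a k x k Conv2D layer with bias: (k * k * c_in + 1) * c_out."""
    return (k * k * c_in + 1) * c_out

def inception_params(c_in, n, sizes=(1, 3, 5)):
    """Hypothetical Inception Block: one Conv2D with n kernels per kernel size."""
    return sum(conv_params(k, c_in, n) for k in sizes)

for n in (10, 12, 14, 16):
    # fed by the 3-channel image: 108 * n weights (linear in n);
    # fed by n-channel feature maps: 35 * n**2 + 3 * n weights (quadratic in n)
    print(n, inception_params(3, n), inception_params(n, n))
```

When layers with n kernels feed one another, the count grows roughly with n², so trimming kernels saves parameters faster than linearly.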

D. PERFORMANCE OF THE DEVELOPED CNN IDENTIFICATION MODEL
Based on the above results, we obtained the CNN structure used to build the identification model for coal and gangue from infrared images. Further details of the CNN model are given in Figure 5.
The final architecture is a CNN with a single Inception Block, in which (1) the Inception Block contains three convolution kernels of different sizes, 1 × 1, 3 × 3, and 5 × 5, and (2) every convolutional layer has 10 convolution kernels. To make the type and parameters of each layer easier to inspect, Table 3 lists all layers and parameters used in the CNN model: the type of each layer, the connections between layers, the output size of each layer, and layer-specific parameters such as kernel size, number of kernels, activation function, and pool size.
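A forward pass through an Inception Block of this kind can be sketched in NumPy as follows: three parallel ReLU convolutions with 1 × 1, 3 × 3, and 5 × 5 kernels, 10 kernels each, whose outputs are concatenated along the channel axis. The assumption that all three branches act directly on the block input (with no 1 × 1 reductions or pooling branch, as in the original Inception design) is ours; the actual wiring is defined by Fig. 5 and Table 3. The function names and the He-style random initialization are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1 zero-padded 'same' convolution; x: (H, W, C_in), w: (k, k, C_in, C_out)."""
    k = w.shape[0]
    H, W = x.shape[:2]
    xp = np.pad(x, ((k // 2, k // 2), (k // 2, k // 2), (0, 0)))
    out = np.zeros((H, W, w.shape[3])) + b
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + H, dx:dx + W, :] @ w[dy, dx]
    return out

def inception_block(x, params):
    """Run parallel ReLU convolutions and concatenate their maps along channels."""
    return np.concatenate(
        [np.maximum(conv2d_same(x, w, b), 0.0) for w, b in params], axis=-1)

def init_branches(c_in, n_kernels=10, sizes=(1, 3, 5), seed=0):
    """Random weights for one branch per kernel size (He-style scaling)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / (k * k * c_in)), (k, k, c_in, n_kernels)),
             np.zeros(n_kernels)) for k in sizes]

x = np.random.default_rng(1).random((108, 192, 3))  # one downscaled infrared image
y = inception_block(x, init_branches(c_in=3))
print(y.shape)  # (108, 192, 30): 10 maps from each of the 1x1, 3x3 and 5x5 branches
```

"Same" padding keeps every branch at the input's spatial size, which is what allows the three outputs to be concatenated.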
Figure 6 shows the accuracy and loss of the proposed CNN model over the training epochs. Accuracy increases and loss decreases as the number of epochs grows, and after 200 epochs both gradually become stable. The final model achieves an accuracy of 100.00% on both the training set and the test set, i.e., it correctly identifies all 192 training samples and all 48 test samples. The proposed CNN recognition model is therefore feasible and effective for identifying coal and gangue from infrared images.

E. COMPARED WITH THE TRADITIONAL CLASSIFICATION MODEL
Traditional image recognition usually consists of two stages: feature extraction and classification. For feature extraction, three of the most commonly used features are local binary patterns (LBP), histograms of oriented gradients (HOG), and Haar features. These features describe three different types of local information and perform differently in different fields. As a traditional classification model, the support vector machine (SVM) is widely applied and performs well in image recognition, and grid search (GS), genetic algorithms (GA), and particle swarm optimization (PSO) are common methods for optimizing its parameters. Here, we therefore compare combinations of the feature extraction methods (LBP, HOG, and Haar) with the SVM classifiers (GS-SVM, GA-SVM, and PSO-SVM). Before being fed into the SVM, all image features are processed in two steps: normalization and principal component analysis (PCA) dimension reduction, retaining components up to a cumulative contribution rate of 95%.
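The two preprocessing steps applied to the hand-crafted features can be sketched in NumPy as follows. The text does not say which normalization was used, so z-scoring is an assumption, as is fitting both steps on the training features only and reusing them for the test features. Feature extraction itself and the SVM parameter search (GS, GA, PSO) are not shown, and the random feature matrices are placeholders.

```python
import numpy as np

def standardize(X, mu=None, sd=None):
    """Z-score features; pass the training mu/sd to transform test data the same way."""
    if mu is None:
        mu, sd = X.mean(axis=0), X.std(axis=0)
        sd = np.where(sd == 0, 1.0, sd)  # constant features stay at zero
    return (X - mu) / sd, mu, sd

def pca_fit(X, cumulative=0.95):
    """Keep the fewest principal components whose cumulative explained-variance
    ratio (contribution rate) reaches `cumulative`; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(ratio), cumulative)) + 1
    return mean, vt[:k]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

# Hypothetical usage: rows are images, columns are LBP, HOG or Haar feature values.
rng = np.random.default_rng(0)
train_feats, test_feats = rng.normal(size=(192, 50)), rng.normal(size=(48, 50))
Z_train, mu, sd = standardize(train_feats)
Z_test, _, _ = standardize(test_feats, mu, sd)
mean, comps = pca_fit(Z_train)
train_in, test_in = pca_transform(Z_train, mean, comps), pca_transform(Z_test, mean, comps)
# train_in / test_in would then be passed to the SVM classifier
```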
To ensure the reliability of the experimental results, we conducted three trials and averaged the results, which are shown in Fig. 7 as the average test-set accuracy of each recognition model. With LBP features, the highest average test accuracy, 88.19%, is obtained with GS-SVM. Haar features stand out: all three classifiers reach the same accuracy of 97.92%. The CNN model classifies the infrared images of coal and gangue with a test-set identification rate of 100.00%, outperforming the SVM-based models. The figure also shows that the choice of feature extraction method affects classification performance; in other words, traditional image recognition requires a careful choice of features to achieve good classification results. Overall, the CNN model performs better and identifies coal and gangue automatically from their infrared images without requiring a choice of feature extractor and classifier.
To analyse the performance of the different methods more intuitively, we use confusion matrices to count the recognition results for coal and gangue in three different experiments, as shown in Table 4. Note that the statistics in the table summarize the results of three different experiments, each using 5-fold cross-validation. The confusion matrices show more clearly how the methods differ in identifying coal and gangue. For example, with GS-SVM as the classifier, 22 coal samples are misidentified as gangue, whereas 43 gangue samples are misidentified as coal; with GA-SVM, 25 coal samples are misidentified as gangue, while 30 gangue samples are misidentified as coal. Overall, fewer coal samples are misidentified as gangue than gangue samples are misidentified as coal.
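Building and pooling confusion matrices of the kind summarized in Table 4 can be sketched in plain Python. Rows are true classes and columns are predicted classes, so the entry in row "coal", column "gangue" counts coal misidentified as gangue. The labels below are a hypothetical toy example, not data from Table 4, and the function names are ours.

```python
def confusion_matrix(y_true, y_pred, labels=("coal", "gangue")):
    """Rows are true classes and columns are predicted classes."""
    idx = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

def pool(matrices):
    """Element-wise sum, e.g. over cross-validation folds and repeated experiments."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

def accuracy(cm):
    """Fraction of samples on the diagonal (correctly identified)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

fold_a = confusion_matrix(["coal", "coal", "gangue", "gangue"],
                          ["coal", "gangue", "coal", "gangue"])
fold_b = confusion_matrix(["coal", "gangue"], ["coal", "gangue"])
total = pool([fold_a, fold_b])
print(total, accuracy(total))  # [[2, 1], [1, 2]] 0.6666666666666666
```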

F. COMPARED WITH THE STATE-OF-THE-ART CNN CLASSIFICATION MODEL
To further verify the effectiveness of the proposed coal and gangue recognition model, we compare it with LeNet5, AlexNet, VGG, ResNet, DenseNet, and other mainstream CNN models. Because the input sizes of these models differ from the size of the infrared images collected in this paper, we change their inputs to 108 × 192 × 3 (height × width × channels). We recorded the accuracy, training time, and number of parameters of each model, as shown in Table 5. Regarding accuracy, CNNs generally perform well at recognizing coal and gangue from infrared images: only VGG-16, GoogleNet V3, and ResNet 152 show poor recognition accuracy, while the other models all maintain recognition rates above 95.00%. In particular, our proposed model achieves 100.00% recognition accuracy. Regarding training time, the models differ greatly. Our CNN model has the shortest training time (only 87.63 s), whereas ResNet 50 needs 920.84 s and DenseNet as much as 5751.50 s. In general, our proposed model has advantages in both recognition accuracy and training time. We also observed an interesting phenomenon when comparing ResNet 50, ResNet 101, and ResNet 152: as the model depth increases, the training time increases but the accuracy decreases. Notably, GoogleNet V3 and ResNet 152 perform surprisingly poorly. This may be because these models are too complex for our data set to train them adequately.

G. ROBUSTNESS TEST OF THE IDENTIFICATION MODEL
Coal and gangue are usually sorted in a harsh, dusty environment, which interferes with the acquisition of infrared images, so the collected images inevitably contain some noise. To verify the robustness of the proposed identification model, we approximate the environmental disturbances in the coal preparation process by adding image noise and test the model in two sets of anti-interference experiments: in the first, noise is added to both the training and the test samples; in the second, noise is added to the test samples only. Gaussian noise, Poisson noise, and salt-and-pepper noise were used as the image noise signals. The results of the anti-interference experiments are presented in Table 6.
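The three noise types and the two experimental protocols can be sketched in NumPy as below. The noise levels (`sigma`, `amount`) are not given in the text, so the values here are assumptions, and the function names are hypothetical; scikit-image's `skimage.util.random_noise` offers equivalent "gaussian", "poisson", and "s&p" modes for images scaled to [0, 1].

```python
import numpy as np

def gaussian_noise(img, rng, sigma=10.0):
    """Additive zero-mean Gaussian noise with standard deviation sigma (grey levels)."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def poisson_noise(img, rng):
    """Signal-dependent shot noise: each pixel value is used as a Poisson mean."""
    return np.clip(rng.poisson(img).astype(np.float64), 0, 255)

def salt_and_pepper(img, rng, amount=0.05):
    """Set a fraction `amount` of pixels (all channels) to 0 or 255 with equal odds."""
    out = np.array(img, dtype=np.float64)
    u = rng.random(img.shape[:2])
    out[u < amount / 2] = 0.0
    out[(u >= amount / 2) & (u < amount)] = 255.0
    return out

def noisy_sets(train, test, noise, both=True):
    """Protocol 1 (both=True): noise on training and test images; protocol 2: test only."""
    new_test = [noise(x) for x in test]
    new_train = [noise(x) for x in train] if both else list(train)
    return new_train, new_test

rng = np.random.default_rng(0)
img = np.full((108, 192, 3), 128.0)  # a flat grey placeholder image
train, test = noisy_sets([img], [img], lambda x: salt_and_pepper(x, rng), both=False)
```

In the second protocol the model never sees the noise during training, which is why it is the harder test of robustness.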
In the first set of experiments (simulating the same environmental disturbance during training and testing), adding noise to the original infrared images of all samples lowers the test-set classification accuracy to different degrees, but the accuracy of the CNN model remains above 89.00%. In the second set (simulating the appearance of new environmental disturbances), adding noise only to the test samples lowers the recognition rate significantly, although the CNN recognition model still stays above 83.00%. Interestingly, with Poisson noise the CNN identification model maintains a high recognition rate (97.92%) in both experiments, i.e., it is highly robust to Poisson noise. In summary, the identification model that combines infrared imaging with a CNN shows some robustness to both familiar and new environmental disturbances.

IV. CONCLUSION
Here we have proposed a CNN model for classifying coal and gangue from infrared images that not only performs exceptionally well but also avoids the problem of choosing appropriate image features. We first designed two CNN models with typical structures. After determining the better structure, we optimized the parameters of the model and obtained the CNN model best suited to recognizing coal and gangue from infrared images. To verify the performance of the proposed CNN model, we also compared it with traditional classification models and other CNN classification models.
The results show that a CNN model with a single Inception Block containing three convolution kernels of different sizes (1×1, 3×3, and 5×5) is the most appropriate model. The proposed CNN model performs excellently on the identification of coal and gangue from infrared images. Compared with traditional classification models, the CNN model is more accurate, and analyzing infrared images with a CNN removes the need to select the features to extract (such as LBP, HOG, and Haar). Compared with other CNN models, our model offers advantages in recognition accuracy and training time. Furthermore, the proposed model shows some robustness to various noises. This is significant because accurate identification of coal and gangue is an important prerequisite for their intelligent separation, which in turn helps advance the intelligent development of the coal industry. The approach developed in this paper may serve as a reference for the research and development of intelligent coal preparation equipment.
This study identifies coal and gangue from infrared images with a CNN that has only one Inception Block containing three convolution kernels of different sizes. A natural extension is to attempt identification with an even simpler structure (fewer types and numbers of convolution kernels) and to design a more general CNN identification model that can identify a wider range of substances.

CONFLICT OF INTEREST
The authors declare that there are no conflicts of interest regarding the publication of this paper.
VOLUME 10, 2022