Experimental Study on the Inversion of Coal Concentration in Mine Water by Visible-Near Infrared Spectroscopy

The coal concentration in mine water is the main indicator for mine water discharge, and its accurate determination is of great significance for the purification and secondary utilization of mine water. To study the spectral inversion of coal concentration in mine water, samples with coal concentrations from 0 mg/L to 1000 mg/L were prepared, and an ASD FieldSpec 4 spectrometer (350-2500 nm) was used for spectral collection. The maximum influence of coal content on spectral reflectance was found to be 0.9. On this basis, the CK-CNN (C-K-Convolutional Neural Networks) inversion model of coal content in mine water is proposed. The model uses the CARS (Competitive Adaptive Reweighted Sampling) algorithm to extract sensitive wavebands, uses a CNN (Convolutional Neural Network) to establish the spectral inversion model on those wavebands, and uses K-fold cross-validation to optimize the model. The inversion accuracy of the model is R² = 0.9994, RMSE = 6.1401, and RPD = 41.9692. CK-CNN was compared with five models: SPA+RF, CARS+RF, SPA+CNN, All Band+CNN, and CARS+CNN. The results show that the CK-CNN model performs best. In addition, the coal concentration inverted for water from the Jiaozuo Zhongma Coal Mine is 18.75 mg/L, the actual concentration measured in the laboratory is 18.92 mg/L, and the inversion error is 0.17 mg/L. The inversion results meet the requirements of laboratory measurement in GB11901-1989. The results show that hyperspectral remote sensing in the visible-near-infrared band can quickly detect the coal concentration in mine water. The CK-CNN model provides a new method for determining the coal content in mine water and is of great significance for promoting research on the influence of coal concentration in mine water on the Vis-NIR spectrum.


I. INTRODUCTION
The coal pollution in mine water mainly comes from the accumulation and leaching of coal gangue, wastewater from coal washing, mine water seepage disasters, etc. [1], [2], [3], [4], and is mainly manifested as excessive coal content in the water. When such mine water is used to irrigate farmland, it causes the cumulative formation of "black soil," which leads to soil hardening, vegetation degradation, crop withering, and yield reduction [5], [6]. Mine water also seeps into groundwater or drains directly into rivers [7]. On the one hand, this wastes water resources and pollutes rivers [5], [8]. On the other hand, long-term discharge seriously affects the safety of local residents' drinking water, because mine water contains large amounts of coal dust, rock powder, and bacteria [9], [10], [11]. In soil, coal-derived carbon differs from plant-derived organic carbon. Coal-derived carbon originates from a long geological process (more than ten million years), and its elemental composition lacks nitrogen, phosphorus, potassium, and other mineral nutrients needed by plants and soil microorganisms. It is highly stable, which not only makes it extremely difficult for organisms to decompose and utilize but also interferes with the identification of soil organic carbon [12], [13]. At present, coagulation sedimentation and filtration is widely used to remove coal from mine water. Large amounts of active reagents, flocculants, and other chemicals are added during treatment, but there is no strict control standard for the dosage of these reagents. If the reagents are insufficient, residual pollutants remain in the mine water; if they are overused, the reagents themselves become pollutants. Therefore, if the coal content in the mine water cannot be accurately measured, secondary pollution will persist after treatment [5], [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Qingli Li.
With the rapid development of hyperspectral technology, it has become an important method for water pollution monitoring owing to its low cost and high efficiency [15]. Research on the remote sensing inversion of optically active substances in water, such as chlorophyll, heavy metal ions, and soluble organic matter, is relatively mature [16], and many inversion models have been established for these parameters. Frau [17] used electromagnetic sensors operating at microwave frequencies to monitor Pb concentrations in mine water in real time by generating a Pb-specific EM spectrum. Li and Li [18] measured the spectral data of mine water and established a quantitative inversion model for water quality parameters such as pH value, hardness, and the ratio of magnesium ion to alkalinity by using principal component regression. Chander et al. [19] applied a semianalytical simulation method to AVIRIS-NG hyperspectral reflectance data to model and monitor turbidity, suspended sediment concentration, and chlorophyll concentration in the River Ganga at Buxar (Bihar) and Howrah (West Bengal), and in Chilika lagoon (Odisha). Pahlevan et al. [20] used mixture density networks (MDNs) combined with hyperspectral data to retrieve the concentration of chlorophyll in water. Wang et al. [21] proposed a semi-supervised regression model and a collaborative training algorithm based on a support vector machine to detect the concentrations of four representative pollution indicators, namely permanganate index (CODmn), ammonia nitrogen (NH3-N), chemical oxygen demand (COD), and dissolved oxygen (DO), in the Weihe River of Shaanxi Province. Zou et al. [22] proposed an improved broad learning (BL) method to realize the nondestructive identification of coal in the near-infrared spectral band. However, among mine water quality parameters, an inversion model for the coal content in the water has not yet been studied.
To achieve accurate measurement of the coal concentration in mine water, this paper uses a portable ground object spectrometer to measure the Vis-NIR spectral data of mine water and proposes a CK-CNN model to retrieve the coal content. The model aims to support reasonable control of the dosage of chemical reagents and thereby reduce the impact of secondary pollution. In the CK-CNN model, CARS (Competitive Adaptive Reweighted Sampling) is first used to extract the sensitive wavebands. Then, combined with K-fold cross-validation, a CNN is used to establish the spectral inversion model of coal concentration in mine water. After the spectral data were measured in the mine water simulation experiment, CK-CNN, SPA+RF, CARS+RF, SPA+CNN, All Band+CNN, and CARS+CNN were used to invert the coal content in mine water, and the proposed CK-CNN model was found to have the best inversion accuracy.

A. SAMPLE COLLECTION
Jiaozuo Coal Mine is an important anthracite production base in China and the earliest modern coal mine in Henan, with a coal output of 15 million tons. The samples used in this research were collected from the Zhongma Coal Mine of Jiaozuo Coal Industry (Group) Co., Ltd., Henan Province, and the types of collected samples are representative.

B. SAMPLE PREPARATION
In this study, to better analyze the spectral characteristics of mine drainage and explore a coal concentration monitoring method, the coal samples from the mining area were ground and dried in the laboratory. Yang et al. [23] analyzed the particle size of coal in the mine water of 11 mining areas in Henan Province and found that particles below 50 µm account for about 85% and particles of 50-100 µm account for about 15%. The coal particles are therefore screened with 50 µm and 100 µm sieves and combined in the proportions shown in Table 1.
The coal particles are mixed evenly, and the concentrations are set evenly. Coal is black and has low reflectivity. To reduce the influence of the background on the spectrum, a pure white square container is selected for the experiment, because a pure white container reflects almost all of the incident light across the spectrum. When there is no coal in the container, the measured spectrum is the reflection spectrum of the container itself. As pulverized coal is added, the spectral absorption characteristics are enhanced. The white container therefore makes it possible to measure how the spectral reflectance changes with coal content. Additionally, the size of the experimental container is related to the observation range of the spectrometer; containers of 62.5 cm × 41 cm × 37 cm are selected, about 32 times the observed area of the instrument. The coal-containing samples are prepared according to the concentration gradient of the experimental design.

C. SPECTRAL TEST
In the experiment, the spectral resolution of the data measured by the portable ground object spectrometer ASD FieldSpec 4 is 3 nm in 350-1000 nm and 10 nm in 1000-2500 nm, and the final resampling resolution of the spectrometer is 1 nm. The measurements are taken at noon under sunny, windless, and cloudless conditions, with a solar elevation angle greater than 45°. The experimental water sample is mixed evenly, and the water surface is kept free of large fluctuations. The probe is aimed vertically at the water sample from a height of 30 cm, and the radius of the observation area is about 6.65 cm. Calibration is conducted every 15 minutes with a standard whiteboard. After the spectral test is completed, preprocessing is carried out to eliminate erroneous points. To reduce the influence of the background environment on the spectral curve, all data are smoothed and denoised with a Savitzky-Golay filter with a 9-point window.
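The smoothing step can be illustrated with a minimal sketch. The text states only the 9-point window; the quadratic/cubic polynomial order (and hence the tabulated coefficients), the untouched edge points, the sample spectrum, and the name `savgol_smooth_9` are assumptions made for illustration and are not the authors' processing code.

```python
# 9-point Savitzky-Golay smoothing coefficients for a quadratic/cubic fit
# (assumed order; the paper gives only the window length).
SG9_COEFFS = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231  # sum of the coefficients


def savgol_smooth_9(reflectance):
    """Smooth a 1-D reflectance list; the first and last 4 points are left unchanged."""
    half = len(SG9_COEFFS) // 2
    smoothed = list(reflectance)
    for i in range(half, len(reflectance) - half):
        window = reflectance[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG9_COEFFS, window)) / SG9_NORM
    return smoothed


# Hypothetical noisy spectrum: a linear trend plus one spike at index 6.
spectrum = [0.10 + 0.01 * i for i in range(13)]
spectrum[6] += 0.05
smoothed = savgol_smooth_9(spectrum)
```

Because the filter fits a low-order polynomial inside each window, a smooth trend passes through unchanged while an isolated spike is damped, which is the behavior wanted for removing noise without flattening broad reflectance features.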

A. SENSITIVE BAND EXTRACTION
Vis-NIR spectral data are high-dimensional, with a large data volume, high measurement complexity, and strong correlation between bands. These features may lead to poor accuracy and reliability when the analysis model is established over the full spectrum [24]. Therefore, to establish a stable analytical model from Vis-NIR spectral data, the wavebands sensitive to the sample characteristics should be selected first. The CK-CNN model uses CARS to extract the sensitive bands of the spectrum. The CARS method imitates the principle of "survival of the fittest" in Darwin's theory and uses Monte Carlo sampling to select the wavelength variables of a PLS (partial least squares regression) model. In each run, the variables with the largest absolute PLS regression coefficients are retained, and the variables with smaller weights are removed. After N sampling runs, the CARS algorithm obtains N candidate subsets of characteristic wavelengths and the corresponding root mean square error of cross-validation (RMSECV) values. The wavelength subset with the minimum RMSECV is selected as the set of characteristic wavelengths.
When the sampled variables are used to establish the model, each variable is taken out once; before a variable is taken out, the prediction and RMSECV of the model that still includes it are obtained. Fig. 1 shows how the number of sampled variables, the RMSECV, and the regression coefficients change with the number of sampling runs. Fig. 1(a) shows the number of sampled variables against the number of sampling runs. From run 0 to run 25, the number of sampled variables declines rapidly; from run 25 to run 50, once the redundant wavelengths have been removed, the curve gradually flattens. Fig. 1(b) shows the change of RMSECV with the number of sampling runs. The smaller the RMSECV, the better the band combination, so the lowest point of the curve in Fig. 1(b) corresponds to the optimal band combination.
In Fig. 1(c), each line records the regression coefficient of one variable over the sampling runs. When the coefficient of a variable drops to 0, the variable has been eliminated. If the RMSECV jumps to a high value when a coefficient drops to 0, the eliminated variable is a key variable; in other words, the performance of the model is lower without it. The asterisk marks the optimal subset of key variables, corresponding to the minimum RMSECV in Fig. 1(b).
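The decline described for Fig. 1(a) and the selection rule described for Fig. 1(b) can be sketched as follows. The exponentially decreasing function is the form commonly used in CARS implementations; the band count of 2151 (350-2500 nm at 1 nm), the 50 sampling runs, the RMSECV values, and both function names are assumptions for illustration. The PLS fitting and the adaptive reweighted sampling steps are deliberately omitted.

```python
import math


def cars_retained_counts(n_vars, n_runs):
    """Number of wavelengths kept after each run under an exponentially decreasing function."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    # ratio = a * exp(-k * i) equals 1 at run 1 and 2 / n_vars at the last run
    return [max(2, round(n_vars * a * math.exp(-k * i))) for i in range(1, n_runs + 1)]


def best_run(rmsecv_per_run):
    """Index of the run whose wavelength subset gives the smallest RMSECV."""
    return min(range(len(rmsecv_per_run)), key=lambda i: rmsecv_per_run[i])


counts = cars_retained_counts(n_vars=2151, n_runs=50)
rmsecv = [30.0, 22.0, 15.0, 9.5, 7.2, 8.1, 12.4]  # hypothetical values
print(counts[0], counts[24], counts[-1])
print(best_run(rmsecv))
```

Under this schedule most variables are discarded in the early runs and only a few per run later on, which matches the fast-then-flat shape of the curve; the subset finally reported is the one at the RMSECV minimum, not the smallest subset.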

B. ESTABLISHMENT OF INVERSION MODEL OF COAL CONCENTRATION IN MINE WATER
After the sensitive wavebands are extracted in the CK-CNN model, a convolutional neural network (CNN) is used to establish the inversion model of coal concentration in mine water, and K-fold cross-validation is used to optimize the model. The CNN is one of the typical models of deep learning: a deep feedforward neural network with convolution calculations. Its basic structure consists of an input layer, convolution layers, activation layers, pooling layers, and fully connected layers. A bias is usually added after convolution, and a nonlinear activation function is then applied to obtain the output, as shown in Formula (1):

$$H^{k}_{x,y} = f\left(\left(W^{k} \ast X\right)_{x,y} + b^{k}\right) \tag{1}$$

In Formula (1), $X$ is the input matrix of the convolution layer; $H^{k}_{x,y}$ is the output vector of the convolution layer in row $x$ and column $y$; $W^{k}$ is the weight matrix of the $k$th convolution kernel; $b^{k}$ is the bias vector of the $k$th convolution kernel; and $f$ is the nonlinear activation function.
In the convolutional neural network model adopted in this study, the learning rate was set to 0.001, and the model was iterated 2500 times. Table 2 shows the network structure. As the experimental data are one-dimensional, Conv1D was used to realize one-dimensional convolution, and five one-dimensional convolution layers were established, each with a convolution kernel of size 3; the output channels are set to 16, 32, and 64 across the layers, with 64 channels used in two of them. After the convolution layers, pooling with a size of 3 reduces the dimensionality of the features to form the final features. A Flatten layer then converts the multidimensional features into a one-dimensional array and serves as a transition before the two fully connected layers. To prevent overfitting, a Dropout layer is added to randomly deactivate 15% of the neurons.
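The convolution, activation, and pooling steps described above can be illustrated for the one-dimensional case with a minimal sketch of a single channel. The kernel size of 3 and the pooling size of 3 follow the text; ReLU as the activation, max pooling as the pooling type, the kernel and input values, and the function names are assumptions. This is not the authors' network, whose full structure is given in Table 2.

```python
def relu(value):
    return max(0.0, value)


def conv1d(signal, kernel, bias=0.0, activation=relu):
    """Valid 1-D convolution (cross-correlation form, as in most CNN libraries)."""
    width = len(kernel)
    return [
        activation(sum(w * x for w, x in zip(kernel, signal[i:i + width])) + bias)
        for i in range(len(signal) - width + 1)
    ]


def max_pool1d(features, size=3):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size]) for i in range(0, len(features) - size + 1, size)]


band_values = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]  # hypothetical input
feature_map = conv1d(band_values, kernel=[1.0, -2.0, 1.0], bias=-1.0)
pooled = max_pool1d(feature_map, size=3)
```

Each output element is the activation applied to a weighted sum of three neighboring inputs plus the bias, so a layer with 16, 32, or 64 output channels simply repeats this operation with that many independent kernels.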
To improve the effectiveness of the model, K-fold cross-validation is used to optimize the CNN model. K-fold cross-validation uses sampling without repetition: in each iteration, every sample point is assigned to either the training set or the validation set exactly once. The original data are divided into K groups; in turn, one subset is taken as the validation set, and the remaining K-1 subsets are combined as the training set. The training set is first used to train the model, and the validation set is then used to test the trained model; the result serves as the performance index for evaluating the model.
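The partitioning described above can be sketched as follows. The value K = 5, the random seed, and the function name `k_fold_indices` are assumptions, since the text does not state K; the sample count of 370 is the number of spectra reported for the experiment.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=370, k=5))
```

In use, the model would be trained K times, once per `(train, validation)` pair, and the K validation scores averaged to give the performance index.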
To verify the feasibility of the CK-CNN model, sensitive bands were also selected by SPA (successive projections algorithm), and the modeling effects of CARS, SPA, and all bands were compared. In addition, RF (Random Forest), which is commonly used in water color remote sensing inversion, was compared with CNN as the modeling method. The number of trees in RF is set to 200, the depth of the trees is set to the default 100, and the estimator in the model is set to 25%.

A. SPECTRAL CHARACTERISTICS ANALYSIS
When light penetrates water, the coal particles present may block the light to some extent. According to the mechanism of water color remote sensing, part of the solar radiation reaching the water body penetrates into it, and the rest is directly reflected by the water surface. Water color remote sensing operates by receiving the reflected energy. As radiation propagates into the water, absorption and scattering by water molecules and by coal in the water consume part of it. The radiation received by the sensor is divided into three parts, namely

$$L_A = L_W T_A + L_g + L_p \tag{2}$$

where $L_A$ is the radiation energy received by the sensor, $L_W T_A$ is the radiation energy attenuated by the atmosphere, $L_g$ is the surface flare, and $L_p$ is the atmospheric radiation. The reflectance of water is mainly concentrated in the blue-green band (450-520 nm, 520-600 nm), while the reflectance of other bands is low, especially in the near-infrared band, where radiation is almost completely absorbed by water.
VOLUME 11, 2023
In the near-infrared range, the absorption peaks of coal are overtone and combination bands of various organic absorption groups; many of these peaks overlap, and most of them are weak. At the same time, coal contains small amounts of mineral components such as quartz and clay minerals, which also affect absorption in the near-infrared band [25]. In the visible band, the absorption characteristics of coal are mainly affected by transition metal minerals such as Fe [26], [27]. As a result, coal shows few pronounced absorption valleys in its Vis-NIR reflectance spectrum.
The sample spectra are shown in Fig. 2. The coal content has a great impact on the spectral reflectance of water. The spectral curves of water samples with different coal concentrations are similar in shape. In the visible region the reflectance is high, while in the near-infrared region absorption is very strong and the reflectance is almost 0. Because coal has low reflectance, the spectral reflectance gradually decreases with increasing coal concentration in the 350-1100 nm range. A reflection peak is formed at 500-550 nm (green), and this reflection feature gradually weakens as the coal content increases. The reason is that when the coal concentration is very low, the chlorophyll in the water forms a reflection peak here; as the coal concentration increases, the characteristics of coal in the murky coal-containing water mask those of chlorophyll. In addition, because water absorbs almost completely in the near-infrared band, an absorption valley is formed at the critical point of 760 nm, and this absorption feature gradually weakens as the coal content increases. After 760 nm, the reflectance increases rapidly and reaches a second peak at 810 nm, which is mainly caused by the weakening of absorption by water molecules and the strengthening of backscattering by suspended matter. The absorption characteristics also gradually weaken with increasing coal content. The reflectance then drops rapidly from 810 nm to almost zero.

B. INVERSION OF COAL CONCENTRATION IN MINE WATER
The CK-CNN model uses 54 sensitive bands, which are extracted with CARS and have the strongest correlation with the coal content. Ten spectra were measured for each sample, giving 370 spectra in total. The proportion of verification data was set to 5% of the total data, and the coal concentrations were set so as not to be repeated. The sensitive bands selected by the CARS step of the CK-CNN model are shown in Fig. 3. The analysis shows that the spectrum of coal has sensitive bands in both the visible and near-infrared regions. However, owing to the strong absorption of water in the near-infrared band, the differences between the spectral curves in that band are not obvious.
Six inversion models were compared in this study, namely SPA+RF, CARS+RF, SPA+CNN, All Band+CNN, CARS+CNN, and CK-CNN. Table 3 shows the parameters of the six models. The comparison of SPA+RF with CARS+RF shows that CARS greatly improves the model effect, and the comparison of SPA+RF with SPA+CNN shows that CNN performs better than RF. Accordingly, combining CARS with CNN improves the accuracy of the model significantly. The CK-CNN model proposed in this study not only takes advantage of CARS to extract the sensitive bands but also adds K-fold cross-validation to optimize the inversion model. As can be seen in Table 3, the CK-CNN model is superior to the other five models and gives the best inversion effect, as shown in Fig. 4.
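The three accuracy indices reported for the models (R², RMSE, and RPD) can be computed as in the following sketch. The paper does not give its formulas, so the definitions here are assumptions: RMSE uses the mean squared residual, and RPD is taken as the sample standard deviation of the measured values divided by RMSE. The function name and the example concentrations are hypothetical.

```python
import math
import statistics


def inversion_metrics(measured, predicted):
    """Return R2, RMSE, and RPD for paired measured and predicted concentrations."""
    residual_ss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    mean_measured = statistics.fmean(measured)
    total_ss = sum((m - mean_measured) ** 2 for m in measured)
    rmse = math.sqrt(residual_ss / len(measured))
    return {
        "r2": 1 - residual_ss / total_ss,
        "rmse": rmse,
        # assumed definition: sample standard deviation over RMSE
        "rpd": statistics.stdev(measured) / rmse,
    }


# Hypothetical concentrations in mg/L, not data from the paper.
metrics = inversion_metrics([0, 100, 200, 300], [10, 90, 210, 290])
```

A higher RPD means the prediction error is small relative to the spread of the reference concentrations, which is why it is reported alongside R² and RMSE when ranking the six models.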
A spectral test was carried out on the mine water of the Zhongma Coal Mine, and the CK-CNN model was used for prediction. The predicted coal content in the mine water of the Jiaozuo Zhongma Coal Mine was 18.75 mg/L. For the laboratory reference, the water sample was dried in an oven at 105 °C in accordance with the National Standard of the People's Republic of China GB11901-1989 (Water quality - Determination of suspended substance - Gravimetric method), taken out and cooled to room temperature, and then repeatedly dried, cooled, and weighed until the difference between two successive weighings was less than 0.2 mg. The coal content determined in this way was 18.92 mg/L. The inversion error is therefore 0.17 mg/L, which indicates that the inversion is accurate.

V. CONCLUSION
In this paper, the coal samples from the Zhongma Coal Mine of Coking Coal Group are taken as the research object, and the Vis-NIR spectra of water samples with different coal concentrations are measured with the portable ground object spectrometer ASD FieldSpec 4 to study the spectral characteristics of coal concentration in mine water. The coal concentration in mine water is then inverted with the CK-CNN content estimation model. This provides a new method for monitoring the coal content in mine water and is of importance for promoting the study of the influence of coal concentration in water on the Vis-NIR spectrum. The study concludes that:
(1) The spectral reflectance of the water samples is concentrated in the visible band and is almost 0 in the near-infrared band. The spectral reflectance decreases as the coal content increases. A reflection peak and an absorption valley are formed around 500-550 nm and 760 nm, respectively, and both gradually weaken as the coal concentration increases.
(2) The CK-CNN content estimation model has the best inversion effect compared with five methods: SPA+RF, CARS+RF, SPA+CNN, All Band+CNN, and CARS+CNN. The CK-CNN model based on hyperspectral data can be used as a method to predict the coal concentration in mine water.
However, the actual mine water not only contains coal but also may contain suspended solids such as sediment and a variety of heavy metal ions, which will also affect the spectral characteristics of mine water. Therefore, there are some limitations in this study, and these factors will be taken into account in future research.
WENWEN HONG is currently pursuing the master's degree with Henan Polytechnic University. Her current research interest includes rock and mineral remote sensing.
TIANZI LI received the bachelor's degree in surveying and mapping, the master's degree in geographic information systems and cartography, and the Ph.D. degree from Northeast University, in 2003, 2006, and 2019, respectively. He is currently an Associate Professor with Henan Polytechnic University. He has presided over or participated in almost ten scientific projects, coauthored more than 30 journal articles, and written three books. His current research interests include hyperspectral remote sensing and photogrammetry.
KAILIN YAN is currently pursuing the master's degree with Henan Polytechnic University. His current research interest includes rock and mineral remote sensing.
YUANHANG LIU is currently pursuing the master's degree with Henan Polytechnic University. His current research interest includes rock and mineral remote sensing.