Study on Deep Learning Model for Online Estimation of Chlorophyll Content Based on Near Ground Multispectral Feature Bands

Chlorophyll content in plant leaves is an essential indicator of crop growth status. This study focuses on nondestructive estimation of the chlorophyll content of maize using near ground multispectral data. We propose a one-dimensional convolutional neural network-gated recurrent unit (1D-CNN-GRU) model, which combines a 1D-CNN, which has strong feature expression capacity, with a gated recurrent unit (GRU) neural network, which has strong memory capacity, to estimate the chlorophyll content of maize directly from multispectral images. Furthermore, the iteratively retaining informative variables-successive projections algorithm (IRIV-SPA) is first used to select the feature wavebands from the 11 available wavebands of the two datasets in the experiment. The experimental results show that, with the same model, the selected feature wavebands yield more accurate estimates than the raw wavebands; based on these feature wavebands, the 1D-CNN-GRU model has smaller errors than conventional models such as support vector regression (SVR) and random forest (RF), with a mean relative error (MRE) of 0.069 and a root mean square error (RMSE) of 3.473 on Dataset I, and an MRE of 0.108 and an RMSE of 7.568 on Dataset II. The real-time performance is also validated in the experiment. These investigations can provide valuable guidelines for online monitoring of chlorophyll content in maize based on near ground multispectral band data. Although the method was tested only on maize, where it provided reliable results, it is also an important reference for the development of intelligent agricultural monitoring systems for general crops.


I. INTRODUCTION
In the last several decades, the demand for food has risen with the growth of the global population. Therefore, countries around the world have made great efforts to develop digital agriculture to increase crop yields and improve the ecological environment [1]. Within this effort, near ground multispectral analysis, as a nondestructive method for detecting crop growth, has become increasingly popular because it is easily integrated into agricultural network systems [2], [3], [4]. In contrast to high resolution remote sensing image analysis, it has three advantages: i) high spatial and temporal resolution, ii) low cost, and iii) easy integration into agricultural network systems. In this study, we aimed to determine the chlorophyll content of maize leaves because chlorophyll is the most crucial pigment in maize plant photosynthesis, reflecting the intensity, nutritional quality, and physiological function of crop photosynthesis. Therefore, chlorophyll content can be used to monitor and evaluate the growth status of maize [1]. Another reason why we chose maize as our research target is that maize, as one of the most important staple foods at home and abroad, also plays an important role in industry and animal husbandry [5], and attracts the attention not only of farmers but also of some investors. We hope to construct an intelligent prediction model of maize chlorophyll content based on specific multispectral bands in this study, and then integrate it into an agricultural Internet of things (IoT) monitoring system to monitor the growth condition of field crops nondestructively in the near future.
The associate editor coordinating the review of this manuscript and approving it for publication was Nurul I. Sarkar.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In fact, several related IoT monitoring systems have already been designed; for instance, Andrianto et al. [6] developed and integrated an IoT-based service system platform and chlorophyll meter, and they expect their research to be further developed using multispectral image data. Wuqian et al. [7] also designed crop monitoring systems that can nondestructively monitor the growth conditions of field crops. At present, there are two classical methods to measure chlorophyll content: chemical analysis (such as spectrophotometry) and chlorophyll meters (such as SPAD-502). Spectrophotometry is a traditional high precision experimental method for measuring crop chlorophyll content, but it is time-consuming, costly, and complex to operate. Conventional chlorophyll meters, such as SPAD-502, are simpler and faster than chemical analysis. Dong et al. [8] assessed different portable chlorophyll meters and found that the coefficient of determination (R2) for SPAD-502 was 0.90, and the RMSE was 3.68 for the chlorophyll of maize, showing that the readings of SPAD-502 are better at restricting interference from other factors. Nevertheless, SPAD-502 also has drawbacks: (i) it requires contact between the samples and the device, and (ii) it covers only limited sampling points on the leaf surface [9].
On the other hand, non-contact methods based on high resolution remote sensing image data can estimate growth status parameters such as chlorophyll content and help farmers manage their fields conveniently. In particular, Singhal et al. [10] attempted to estimate the leaf chlorophyll concentration of standing maize plants from high resolution (5 cm) multispectral Unmanned Aerial Vehicle (UAV) images (350-2500 nm), and found that Kernel-Ridge regression was the most robust method for developing a chlorophyll estimation model, with minimal RMSE (0.057 mg/gm) and a coefficient of determination of R2 = 0.904. Guo et al. [1] also found that the vegetation index from images acquired at a flight altitude of 50 m was better for estimating leaf chlorophyll content using the DJI UAV platform with this specific camera (5472 × 3648 pixels). However, remote sensing images still have the problems of high acquisition cost and low resolution. Therefore, near ground multispectral images (visible and near-infrared) have been explored as a tool to estimate leaf chlorophyll concentration to deliver time critical information for farming management [8]. In practice, many researchers have put great effort into the construction of estimation models. For instance, Cavallo et al. [12] proposed a nondestructive method for detecting the chlorophyll content of fresh-cut rocket leaves by processing RGB images combined with random forest regression; its coefficient of determination R2 of 0.90 far exceeded the 0.79 obtained by SPAD-502. Lv and Yan [13] used random forests to construct a hyperspectral (400-2500 nm) estimation model for crop chlorophyll content. The result, with a coefficient of determination of 0.9317 and a mean square error (MSE) of 74.2569, indicated that this model based on random forest and field imaging spectra could accurately estimate the chlorophyll content of soybean leaves. Liu et al. [14] developed several quantitative models to estimate pigment concentration in the jujube canopy using hyperspectral data (450-2300 nm); among them, the support vector regression (SVR) model for chlorophyll and carotenoid had higher prediction accuracy than Partial Least Squares Regression (PLSR).
Later, to reduce costs and simplify the model, many researchers chose to find sensitive wavelengths. Sun et al. [15] used hyperspectral reflectance (900-1700 nm) to estimate the water content of maize leaves. The Partial Least Squares (PLS) model with competitive adaptive reweighted sampling (CARS) combined with random frog had the best performance and extracted 23 feature wavelengths. Xia et al. [16] classified protected tomato plants using cloud-computing technology based on three spectral datasets. They used the successive projections algorithm (SPA) to select six wavebands (483, 557, 674, 783, 869, and 964 nm) as feature wavebands with good performance. Tang et al. [17] proposed a low cost method to detect the nitrogen content of natural rubber leaves using the correlation-based successive projections algorithm (CB-SPA) with better efficiency; the explanatory variables selected by CB-SPA were 1090 nm, 1108 nm, 2193 nm, 2112 nm, 2072 nm, and 1921 nm. Li et al. [18] concluded that reflectance at 550 nm may characterize the amount of chlorophyll. These findings motivated us to construct an intelligent prediction model based on near ground multispectral band data for agricultural IoT monitoring systems.
Thus far, the available estimation models make limited use of historical data and are not as effective as they could be in terms of cost control and integration with IoT in agriculture. There is still a need for methods suited to practical applications, which is the focus of this paper. In this study, we propose an intelligent algorithm for chlorophyll content detection in maize, in which the feature wavelengths are selected and then fed into the 1D-CNN-GRU model to estimate the chlorophyll content of maize leaves; the approach is not limited to this crop. Compared with traditional estimation models [13], our model can extract more features by utilizing a 1D-CNN module and make full use of historical data by using GRU memory characteristics to obtain more stable and accurate estimation results.

A. COLLECTION OF MULTISPECTRAL DATA AND MEASUREMENT OF CHLOROPHYLL CONTENT
Two real datasets (Datasets I and II) were obtained to validate our method, corresponding to two maize varieties, Mitiannuo-4 and Qingyu-11. The two datasets were collected in a trial field (111.697° W, 40.817° N) at Inner Mongolia University. The multispectral images were acquired using a SpectroCam multispectral camera produced by Ocean Thin Films (OTF), which can deliver eight images corresponding to eight spectral bands at a rate of up to 25 frames per second and a resolution of 1408 × 1044 per image. The chlorophyll content of the corresponding maize leaves was measured using SPAD-502, covering the entire growth period. To reduce error, every point was measured three times, and the results were averaged in our experiments. Although SPAD-502 measures only a relative amount of chlorophyll, derived from the light transmittance of the plant leaf at two wavelengths, 650 nm and 940 nm, the SPAD-502 value is still a reference standard in this domain [19], [20]. In particular, the readings of SPAD-502 can reveal the change trend of chlorophyll content during the entire crop growth period [1].
The sample sizes of the two datasets were 297 and 1145, respectively. Considering the varying characteristics of chlorophyll content [21], we collected Dataset I every two days from May 1, 2018, to August 31, 2018, using eight spectral bands between 400 and 775 nm (400 nm, 475 nm, 550 nm, 575 nm, 627 nm, 675 nm, 700 nm, and 775 nm, as shown in Fig.1), together with the corresponding chlorophyll contents. Maize samples were marked in different locations in advance, approximately 50 cm away from each other. This planting method met the requirements of local field management. We then extracted the gray values of the leaf images of maize samples at different growth stages using a software tool integrated into the multispectral camera system. Thus, 33 candidate points per image were chosen to cover the entire maize leaf as far as possible, and 297 candidate points were taken for raw data collection. Compared to Dataset I, the maize samples in Dataset II have a longer growth cycle; therefore, more data were collected during the growth period, from June 1, 2014, to October 31, 2014, using eight wavelengths between 425 and 850 nm (425 nm, 475 nm, 550 nm, 575 nm, 615 nm, 675 nm, 775 nm, and 850 nm, as shown in Fig.2), together with the corresponding chlorophyll contents. Completely expanded leaves of well-growing maize were measured using SPAD-502, cut off, and quickly brought back to a laboratory (near the maize trial field), where images of the leaves were captured. A total of 1145 candidate-point samples were obtained for Dataset II. To find another group of wavebands for online detection of the chlorophyll content of maize, we explored the wavebands included in the two datasets (which do not include 650 nm and 940 nm), instead of relying on the conventional portable SPAD-502 device or complex chemical detection methods in the laboratory.
The new waveband combination, which extends the multispectral range (including the ultraviolet and near-infrared ranges), serves as the input for the hybrid lightweight deep learning model designed in this study. Meanwhile, we separately compared the RGB bands (675 nm, 550 nm, and 475 nm) and the RGB-NIR bands (675 nm, 550 nm, 475 nm, and 775 nm) with the new waveband combination on both Dataset I and Dataset II using the different models to validate our idea.
Figs.1 and 2 show that the spectral images chosen in our experiments are clearly visible, providing sufficient image grayscale information that can reflect the maize growth process [22]. Furthermore, this provides an opportunity to approximate the accuracy of the SPAD-502 device; therefore, we used the selected spectral bands online to replace this device in an agricultural IoT system to monitor the growth of maize or other crops.

B. SENSITIVE BANDS EXTRACTION
At present, there are many commonly used methods for selecting the feature bands that are sensitive to crops. Among them, the successive projections algorithm (SPA) is a wavelength selection algorithm that minimizes the covariance between variables, thereby reducing redundancy [23]. Through projection operations, SPA can select a subset of variables with minimal multicollinearity and high accuracy [24]. Our goal is to find the sensitive spectral bands with rich spectral information related to the maize chlorophyll content.
However, SPA only considers projections in the spectral matrix to minimize covariance, and the variable with the largest projection length is not necessarily the most informative one in the experiments. In addition, the subset selected by SPA may contain some uninformative or even interfering variables. Iteratively retaining informative variables (IRIV) can remove these confounding variables ahead of time and thus compensates well for the shortcomings of SPA [25]. For this reason, Cheng and Chen [25] proposed IRIV-SPA to retain informative variables, and it outperformed both SPA and IRIV alone in terms of accuracy. We therefore employ the IRIV-SPA algorithm to classify spectral variables and remove irrelevant and interfering ones when analyzing multispectral data. As a classification scheme, it uses the mean root mean square error of cross-validation (RMSECV) together with the p-value of the Mann-Whitney U-test [26] to sort the spectral variables into four categories: strongly informative variables, weakly informative variables, uninformative variables, and interfering variables. Thus, uninformative and interfering variables can be removed so that informative variables are retained. Subsequent experimental results validated our idea.
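To make the projection step of SPA concrete, the following is a minimal sketch in plain Python. It covers only the successive-projection selection, not the IRIV stage or the RMSECV-based choice of subset size, and it is not the implementation used in this study. The function name `spa_select`, the fixed starting column, and the toy matrix are assumptions made for illustration.

```python
def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X (rows = samples) by successive projections.

    Illustrative sketch: assumes no selected column becomes all zeros.
    """
    n_cols = len(X[0])
    # Work on a column-wise copy so the projections do not alter the input.
    cols = [[row[j] for row in X] for j in range(n_cols)]
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best, best_norm2 = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Project column j onto the subspace orthogonal to the last pick.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm2 = sum(v * v for v in cols[j])
            # Keep the column with the largest remaining (non-collinear) part.
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


# Toy data: column 1 is almost a multiple of column 0, column 2 is not.
toy = [[1.0, 2.0, 1.0],
       [2.0, 4.0, -1.0],
       [3.0, 6.0, 1.0],
       [4.0, 8.1, -1.0]]
chosen = spa_select(toy, n_select=2, start=0)
```

On the toy matrix, the nearly collinear column is skipped in favor of the one that carries new information, which is the redundancy-reduction behavior described above.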

C. 1D CNN-GRU MODEL
It is well known that the convolutional neural network (CNN) can automatically extract essential features from raw high dimensional data. Weight sharing and sparse connections are its two main advantages over conventional neural network algorithms: weight sharing helps avoid overfitting, and sparse connections improve the model's efficiency [27]. However, the CNN process is often time-consuming and unsuitable for time series data. In contrast, the 1D-CNN not only has the advantages of the CNN but is also suitable for time-series data. Therefore, we first adopt the 1D-CNN to extract sensitive bands from the spectral data in the experiments.
Assume $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$, where $n$ is the number of training samples and $x_i$ is the $i$-th sample in $m$ dimensions. The convolution layer with multiple filters convolves the raw input data to generate the corresponding local features, as shown in Eq. (1), where $k$ denotes the number of convolutional kernels, $w_j$ denotes the parameters of the $j$-th convolutional kernel, $f$ is a nonlinear activation function, and $b$ denotes the bias. The number of channels of the convolution kernel is the same as that of the input data, and the output at each point is obtained as the dot product of $w_j$ and $x_i$. Sliding the window over the input yields the output feature map.
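The sliding-window operation described here can be sketched in a few lines of plain Python. The example below assumes a single input channel, stride 1, no padding, and a ReLU activation; these choices, the function name `conv1d`, and the sample values are illustrative and are not taken from the paper. Each output value has the general form f(w_j · window + b), where the window is a slice of the input sample.

```python
def relu(v):
    """Rectified linear activation, used here as the nonlinearity f."""
    return max(0.0, v)


def conv1d(x, kernels, biases, activation=relu):
    """Apply k one-dimensional kernels to a single-channel sequence x (stride 1)."""
    feature_maps = []
    for w, b in zip(kernels, biases):
        width = len(w)
        fmap = []
        # Slide the window over x; each output is f(dot(w, window) + b).
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            dot = sum(wi * xi for wi, xi in zip(w, window))
            fmap.append(activation(dot + b))
        feature_maps.append(fmap)
    return feature_maps


# One sample with four band values and k = 2 kernels of size 2 (made-up numbers).
sample = [0.2, 0.4, 0.1, 0.5]
maps = conv1d(sample, kernels=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, 0.1])
```

With k kernels the layer returns k feature maps, each one element shorter than the input per unit of kernel width beyond 1 when no padding is applied.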
Additionally, the recurrent neural network (RNN) is suitable for processing time-series data. Although its hidden layers can preserve information from the previous moment and its output is determined together with the current input, it has been pointed out that RNNs have difficulty learning long term dependencies, not only because of the variation in gradient magnitudes, but also because the effects of long term dependencies are hidden by the effects of short term dependencies [28]. Therefore, RNNs are prone to "gradient explosion" and "gradient vanishing" problems during back propagation [29]. Later, the GRU was proposed, building on the Long Short Term Memory (LSTM) neural network, to solve these problems. It can store historical information from time series data; at each step, the GRU is updated by partially forgetting the existing memory and adding new memory content through two gates (the update gate and the reset gate). It can therefore memorize important features of input sequences and capture potential dependencies over long distances. At the same time, GRU has fewer training parameters than LSTM and is therefore faster [30], which is why we chose it in this paper. The structure of the GRU is illustrated in Fig. 3, where $h_t$ represents the output of the current layer and $h_{t-1}$ represents the output of the previous hidden unit, which serves as an input of the current layer, as shown in Eq. (2-3). The update gate $u$ is responsible for carrying the previous information into the current state; the closer its value is to 1, the less of the previous information is retained, as shown in Eq. (4). The reset gate $r$ determines whether the current state is combined with the previous information; the smaller its value, the more easily the previous information is ignored, as shown in Eq. (5). In Eq. (3), $b_y$ represents the bias vector of the candidate activation and $w_y$ the corresponding trainable weight matrix; in Eq. (4-5), $b_u$ and $b_r$ represent the bias vectors of the update gate and reset gate, respectively, and $\sigma$ is the activation function.
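Because Eqs. (2)-(5) are not reproduced in this text, the sketch below uses one common GRU formulation rather than the paper's exact equations. It is chosen so that an update gate close to 1 favors the new candidate state over the previous state, which matches the description above. The dictionary keys `w_u`, `w_r`, `w_y`, `b_u`, `b_r`, `b_y` mirror the symbols in the text; the function names and all numeric values are illustrative assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, vec, b):
    """Return W @ vec + b for a matrix W given as a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, vec)) + bi for row, bi in zip(W, b)]


def gru_step(x_t, h_prev, p):
    """One GRU update (assumed standard form, not the paper's exact equations)."""
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    u = [sigmoid(v) for v in affine(p["w_u"], concat, p["b_u"])]  # update gate
    r = [sigmoid(v) for v in affine(p["w_r"], concat, p["b_r"])]  # reset gate
    # The reset gate scales how much of h_{t-1} enters the candidate state.
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    cand = [math.tanh(v) for v in affine(p["w_y"], gated, p["b_y"])]
    # u near 1 -> mostly the candidate; u near 0 -> mostly the previous state.
    return [(1.0 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h_prev, cand)]


def toy_params(update_bias):
    """Hidden size 1, input size 1, zero weights; only the biases act."""
    zero = [[0.0, 0.0]]
    return {"w_u": zero, "w_r": zero, "w_y": zero,
            "b_u": [update_bias], "b_r": [0.0], "b_y": [0.5]}


kept = gru_step([1.0], [0.3], toy_params(-50.0))     # gate ~ 0: state is kept
replaced = gru_step([1.0], [0.3], toy_params(50.0))  # gate ~ 1: state is replaced
```

Driving the update-gate bias to large negative or positive values shows the two extremes: the hidden state is either carried over almost unchanged or almost fully replaced by the candidate.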
Based on the above, we propose to fuse the 1D-CNN with the GRU into what we call the 1D-CNN-GRU model, which preserves the advantages of both algorithms in time series data processing for sensitive band extraction in the experiments.
The architecture designed in this study is shown in Fig.4, where the sensitive bands of multispectral data are obtained by the aforementioned IRIV-SPA method and then used as the input of the 1D-CNN-GRU. The front end of the architecture is composed of one max pooling layer with a stride of 1 and three one-dimensional convolutional layers, each with 20 filters, a kernel size of 2, padding of 1, and a stride of 1. The middle of the architecture is composed of five GRU layers. Two fully connected layers are then used at the back end of the entire structure, with output sizes of 12 and 10, respectively. Compared with ordinary convolution, the 1D convolution module can fully capture the relationship between the time-series data, and the information extracted by the 1D convolution is used as the input of the GRU module, which can selectively retain historical information and use it for the detection of subsequent data during the training process. This significantly improves the quantity and quality of the feature data, and is conducive to improving the performance of the model.
The network was implemented in PyTorch 1.3.1 and trained in a Python 3.6.9 environment. All experiments were performed on a server computer equipped with a 2.2 GHz Intel® Xeon® Silver 4210 processor, 64 gigabytes (GB) of random access memory (RAM), and an Nvidia Quadro P6000 graphical processing unit (GPU). The system ran the Ubuntu MATE 16.04 LTS operating system, and all spectral processing and manipulation, such as interpolation and augmentation, was performed using the Pandas, NumPy, and Scikit-learn libraries for Python.
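As a rough check on the sequence lengths implied by the convolution settings described above (kernel size 2, padding 1, stride 1), the sketch below applies the usual output-length formula for a 1-D convolution without dilation. It assumes that the input to the first convolution is the vector of selected band values (length 4 for Dataset I and 5 for Dataset II), which the text does not state explicitly, and it leaves out the max pooling layer because its kernel size is not given. The function names are illustrative.

```python
def conv1d_out_len(length, kernel_size=2, padding=1, stride=1):
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def stacked_lengths(n_bands, n_layers=3):
    """Sequence length after each of n_layers identical convolutions."""
    lengths = [n_bands]
    for _ in range(n_layers):
        lengths.append(conv1d_out_len(lengths[-1]))
    return lengths


dataset1_lengths = stacked_lengths(4)  # four selected wavebands
dataset2_lengths = stacked_lengths(5)  # five selected wavebands
```

Under these assumptions each convolution lengthens the sequence by one element, so the padding keeps short band vectors from shrinking before they reach the GRU layers.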

D. PERFORMANCE EVALUATION MEASURES
Three evaluation indicators were selected to evaluate the accuracy and generalization of the different models.
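The mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| Y_i - y_i \right|$$
The mean relative error (MRE):
$$\mathrm{MRE} = \frac{1}{m} \sum_{i=1}^{m} \frac{\left| Y_i - y_i \right|}{Y_i}$$
The root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( Y_i - y_i \right)^2}$$
where $m$ represents the total number of samples, $Y_i$ represents the true values, and $y_i$ represents the predicted values.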
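A minimal sketch of the three indicators in plain Python follows, using their conventional definitions (assumed here, since they are standard). The function names and the two sample readings are illustrative and are not data from the experiments.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mre(y_true, y_pred):
    """Mean relative error; assumes no true value is zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Two made-up SPAD-like readings and their estimates.
truth = [40.0, 50.0]
estimate = [44.0, 45.0]
scores = {"MAE": mae(truth, estimate),
          "MRE": mre(truth, estimate),
          "RMSE": rmse(truth, estimate)}
```

MRE is dimensionless, while MAE and RMSE are in the units of the measured values, which is why the reported MRE values are much smaller than the other two.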

A. MULTISPECTRAL IMAGE EQUALIZATION
Histogram equalization was used to redistribute the unevenly distributed gray levels of the original image. It spreads the pixel intensities over a wider range to enhance the contrast of the target pixels, so that the information they carry is more abundant [31]. We used histogram equalization to process the multispectral images: as shown in Fig.5(a-b), the original image is very dark and its brightness is concentrated in a narrow range. Equalization enhances the contrast and gives the brightness a more uniform distribution, so that more detailed information can be seen in Fig.5(c-d).
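The mapping behind histogram equalization can be sketched as follows. This is the textbook cumulative-histogram method written in plain Python for a flat list of gray values; it is not the routine used to produce Fig.5, and the function name `equalize` and the six sample pixels are illustrative.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels - 1]."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative count of pixels at or below each gray level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to stretch
    # Rescale the cumulative counts to the full gray range.
    lut = [round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]


# A dark, low-contrast patch whose values sit in a narrow range.
dark_patch = [10, 10, 11, 12, 12, 13]
stretched = equalize(dark_patch)
```

The narrow range 10 to 13 is stretched to cover 0 to 255 while the ordering of gray levels is preserved, which is the contrast enhancement described above.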

B. FEATURE WAVEBANDS SELECTION
IRIV-SPA was first applied to remove redundant wavebands and select feature wavebands based on the raw spectral data. We divided the raw spectral data into a training set and a testing set at a ratio of 7:3 in the experiment. The training set was used to select feature wavebands, and the corresponding experimental results showed that the 700 nm band from Dataset I and the 615 and 675 nm bands from Dataset II were removed. The selection process of the feature wavebands based on the IRIV-SPA method is shown in Fig.6, in which there is a sharp variation trend in Fig.6(a) and Fig.6(c), indicating that the RMSE value drops sharply as the number of selected wavelength variables gradually increases. This reveals that some wavebands unrelated to chlorophyll content were eliminated during the band selection process. As shown in Table 1, four wavebands are selected in Dataset I according to Fig.6(a), including 400 nm, 475 nm, 575 nm, and 627 nm, and five wavebands are selected according to

C. RESULTS OF DIFFERENT MODELS USING RAW WAVEBANDS DATA
In this section, we compared 1D-CNN-GRU with typical algorithms such as SVR and RF in estimating the chlorophyll content of maize. The parameters of the SVR model mainly include the inner product kernel function, penalty parameter C, and kernel parameter G. Here, the polynomial
Considering the variations in chlorophyll content over the entire growth period, both datasets were divided into training and test sets in a ratio of approximately 5:1. Specifically, the number of test samples was 50 for Dataset I and 200 for Dataset II. Subsequent results were obtained on the test set. As shown in Fig.7, the 1D-CNN-GRU model has better stability and lower estimation errors than the other models established on the raw band data. Furthermore, it can be seen in Fig.8
Although the 1D-CNN-GRU model designed in this study performs well on raw spectral band datasets, there is still information redundancy in the raw band data. We hope to find the simplest estimation model for detecting the chlorophyll content by reducing the number of redundant bands. According to the results of the IRIV-SPA method above, the selected feature wavebands were 400 nm, 475 nm, 575 nm, and 627 nm for Dataset I, and 425 nm, 550 nm, 575 nm, 775 nm, and 850 nm for Dataset II. Thus, these feature wavebands not only help speed up operation in an online detection system, but also estimate the chlorophyll content better than the raw wavebands. As shown in Fig.9 and Fig.10, we compared 1D-CNN-GRU with the conventional RF and SVR models based on the feature wavebands, RGB wavebands, and RGB-NIR wavebands data, and found that the results of 1D-CNN-GRU were more consistent with the trend of the true values, with an MRE of 0.108 (Dataset I) and 0.069 (Dataset II).
Furthermore, Fig.9 shows that the estimation errors of 1D-CNN-GRU are much lower than those of the other two methods in terms of evaluation indicators such as MRE, RMSE, and MAE. In addition, from Figs. 8 and 9, we see that the 1D-CNN-GRU using feature wavebands data has a lower error than the same model based on the raw data. For the RGB wavebands and RGB-NIR wavebands, Fig.10 shows that the error of our model is still significantly lower than those of the other models, with an MRE of 0.108 (Dataset I) and 0.069 (Dataset II), RMSE of 7.568 (Dataset I) and 3.473 (Dataset II), and MAE of 6.344 (Dataset I) and 3.573 (Dataset II). Meanwhile, Fig.10(a-d) also shows that the selected feature wavebands are more sensitive to chlorophyll than the RGB wavebands: for the same models, the errors are lower than those based on RGB wavebands. Additionally, from Fig.10(c-f), we also see that the estimation accuracy of the model using RGB and NIR bands is generally higher than that using only RGB, which indicates that the NIR wavebands are sensitive to the chlorophyll content of maize. The 1D-CNN-GRU model based on the feature wavelengths selected by the IRIV-SPA was validated successfully in the experiments.
At the same time, we recorded the time consumption on the testing set for each model to verify whether our model can accomplish the real-time processing task. We tested each model ten times and used the average value as a reference.
The results are shown in Fig.11. The time consumption of all three models is within 0.2 ms on both datasets and differs little between them. For the smaller testing set from Dataset I, all three models are very fast, with an average of 0.15 ms per image; for the larger testing set in Dataset II, although the RF and SVR models run faster than our model, all three models finish within 0.2 ms, showing that our model can easily meet real-time processing requirements, such as online prediction of chlorophyll content in a smart agriculture system. Furthermore, with the use of feature wavebands, our model reduces the average time to less than 0.07 ms per image, which indicates the feasibility of feature waveband selection for real-time tasks, especially when the amount of data is larger.
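A timing protocol of this kind (ten repetitions, averaged, reported per sample) can be sketched with the standard library alone. The function `mean_time_per_sample_ms` and the stand-in `dummy_predict` callable are hypothetical; in practice the callable would wrap the trained model's prediction on the testing set.

```python
import time


def mean_time_per_sample_ms(predict, samples, repeats=10):
    """Average wall-clock time per sample, in milliseconds, over several runs."""
    per_run = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(samples)
        elapsed = time.perf_counter() - start
        # Convert seconds for the whole batch into milliseconds per sample.
        per_run.append(elapsed * 1000.0 / len(samples))
    return sum(per_run) / len(per_run)


calls = []


def dummy_predict(batch):
    """Stand-in for a model: records the batch size and returns a constant."""
    calls.append(len(batch))
    return [0.0 for _ in batch]


avg_ms = mean_time_per_sample_ms(dummy_predict, samples=[[0.1, 0.2, 0.3, 0.4]] * 50)
```

Averaging over repeated runs smooths out scheduling noise, which matters when the per-sample times being compared are well below a millisecond.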

E. RESULTS OF THE CROSS-VALIDATION OPTIMIZATION
To check the ability of the 1D-CNN-GRU model on the multispectral waveband datasets at different growth periods of the maize crop, we used cross-validation to partition the test set and the training set. However, classical cross-validation techniques, such as K-fold, assume that the samples are independent and identically distributed. On time series data such as the datasets in our experiments, this may lead to a situation where the 'future' predicts the 'past'. Even if the results are highly accurate, they are meaningless because we do not know future data in practice. Therefore, the model's accuracy must be evaluated on observations from different periods in such a way that the 'past' is used to predict the 'future'. Fortunately, TimeSeriesSplit in the scikit-learn library (version 0.24.2) provides a means of doing so. The data in our experiments were split as shown in Fig.12. The entire dataset was evenly divided into six consecutive subsets. Except for the first subset, each can be left out and then predicted using the preceding subsets. The results are shown in Table 2, from which we can see that our model achieved high accuracy in estimating the chlorophyll content of maize at different stages for the two datasets. This indicates that our model can accomplish the detection of chlorophyll content over the whole growth stage of maize, providing the possibility of application in smart agriculture monitoring systems.
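The six-subset scheme described here can be illustrated with a small stand-alone helper. The function `expanding_window_splits` is a hypothetical, standard-library stand-in written for this illustration; with scikit-learn installed, `TimeSeriesSplit(n_splits=5)` should yield a comparable partition. The sketch assumes the samples are already ordered in time and that there are at least as many samples as subsets.

```python
def expanding_window_splits(n_samples, n_subsets=6):
    """Yield (train, test) index lists in which every test block follows its training data."""
    test_size = n_samples // n_subsets
    # Any remainder is absorbed by the first training block.
    first_test = n_samples - (n_subsets - 1) * test_size
    for start in range(first_test, n_samples, test_size):
        train = list(range(start))
        test = list(range(start, start + test_size))
        yield train, test


# Twelve time-ordered samples split into six consecutive subsets of two.
splits = list(expanding_window_splits(12))
```

Six subsets give five train/test rounds, and in each round every training index precedes every test index, so the model is never asked to predict the past from the future.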

IV. CONCLUSION
In this study, 1D-CNN-GRU combined with IRIV-SPA is proposed to estimate the chlorophyll content of maize during the entire growth period. Our experimental results indicate that IRIV-SPA can remove redundant wavebands and improve the performance of the 1D-CNN-GRU model. Compared with conventional SVR and RF, the 1D-CNN-GRU model with IRIV-SPA showed the best performance in terms of MRE, RMSE, MAE, and time consumption on the two time-series datasets of the chlorophyll content of maize because it can reduce the model complexity and extract more significant information related to the chlorophyll content of maize. In particular, GRU was successfully used to enhance the memory capacity of the 1D-CNN-GRU model. The experimental results show that the 1D-CNN-GRU model with IRIV-SPA-based multispectral data is an effective method for the online estimation of the chlorophyll content of maize. It can replace the conventional portable SPAD-502 equipment or complex chemical detection methods, and it provides smart agriculture monitoring systems with a new detection method based on near-ground multispectral waveband processing.