Enhanced Leaf Area Index Estimation With CROP-DualGAN Network

Quantitative estimation of regional leaf area index (LAI) is an important basis for large-scale crop growth monitoring and yield estimation. With the development of deep learning, neural networks can, in theory, effectively improve the accuracy of LAI estimation, but they usually require sufficient training samples because of their large number of parameters. In actual regional quantitative LAI estimation, however, only a few samples are available, which makes such networks difficult to train. Therefore, a crop dual-learning generative adversarial network (CROP-DualGAN) is proposed in this article for data enhancement of small samples to estimate regional LAI. The method uses dual learning to generate hyperspectral reflectance and the corresponding LAI. It comprises two groups of generative adversarial networks, in which the generators produce data that conform to the distribution of the training set and the discriminators judge whether the generated samples are true or false. The generators and discriminators are constantly optimized in the confrontation so that the distribution of the generated data moves closer to that of the training samples. In the single-crop-type experiments, 30 training samples with enhancement achieved $R^{2}$ values of 0.921, 0.990, and 0.956 for cereal, maize, and rape seed in VGG16, and 0.971, 0.991, and 0.962 in SSLLAI-Net. In the multiple-crop-type experiments, the results are lower than those of individual crop estimation but higher than those without enhancement. Finally, a non-parametric test is used to show that most of the improvements in LAI estimation are significant, and the accuracy does not decrease when an improvement is not significant. In all, the proposed method is universal and can effectively help benchmark models improve regional LAI estimation accuracy with neural networks.


I. INTRODUCTION
LEAF area index (LAI) reflects crop growth as a significant biological parameter [1], thereby providing structured quantitative information to describe the conversion of matter and energy on the vegetation canopy. LAI plays an important role in the quantitative remote sensing of vegetation, ecosystem carbon cycling, vegetation productivity, energy balance among vegetation, soil, and the atmosphere, and so on [2]. LAI is a crucial input parameter in ecological models and land surface models. It is often used as an indicator of vegetation conditions and is also an important agricultural index for monitoring crop growth and estimating production [3]. Therefore, quantitatively acquiring spatiotemporally continuous regional LAI is of great significance for crop growth monitoring and yield estimation [4].
Field LAI measurement methods include conventional direct measurement on the ground and remote sensing technology [5]. Because direct measurement has difficulty handling long time-series LAI observations over a large area [6], [7], remote sensing technology, which provides an effective way to quickly and timely obtain regional LAI, has become the trend in LAI monitoring [8], [9]. At present, LAI estimation methods mainly include statistical models, physical models, and data assimilation. Statistical methods, such as highly correlated statistical models, have high coefficients of determination but generalize poorly across regions [10], [11], [12], [13]. Physical models face ill-posed problems due to their complexity and are highly dependent on the authenticity of radiative transfer model simulations and proper initialization of model parameters [14], [15], [16]. The data assimilation method is affected by the observation variables and crop growth models used, and each assimilation method has its own application scope and conditions [17], [18].
Recently, with the development of machine learning, many machine learning methods have emerged to realize large regional LAI estimation based on subsets of bands or vegetation indices [5], [19], [20]. The widely used machine learning methods for LAI estimation are artificial neural networks (ANNs), support vector machines (SVMs), random forests (RFs), ensembles of trees (ETs), regression trees (RTs), radial basis functions (RBFs), generalized regression neural networks (GRNNs), Gaussian process models (GPMs), and deep belief networks (DBNs) [20], [21], [22]. ANNs fit complex, high-dimensional, and nonlinear data well and achieve high accuracy. SVMs similarly support high-dimensional inputs in regression models, but they need fewer training samples than ANNs. RFs have high precision, high calculation speed, and robustness in parameter estimation, and they can rank variables according to their importance in LAI estimation [13], [19], [23]. Machine learning can improve estimation precision over traditional estimation methods, but it is very dependent on the number of measured training samples. Therefore, sufficient training samples are needed for model training, and the uncertainty of band combinations may affect the accuracy of LAI estimation [21], for which Bayesian networks are normally used to select hyperspectral bands for modeling [24], [25].
This article utilizes the self-learning characteristic of deep learning to solve the problems of machine learning in LAI estimation. We select the reflectance of all hyperspectral bands as the network inputs for LAI estimation, according to prior knowledge. Usually, neural networks need sufficient training samples [21] because they contain many parameters (the number of parameters generally reaches 10^9). However, in actual regional LAI estimation, only a few measured samples are available, which makes general neural networks difficult to train well. Estimating parameters from small samples can be approached in two ways: one is to use Bayesian estimation that incorporates prior information to reduce severely biased estimates [26], or a lightweight network to complete the regression estimation [27], [28]; the other is to realize data enhancement for small samples by deep learning [29], [30]. Adversarial networks can be used in computer vision to realize unsupervised dual learning in image-to-image translation [31]. Yi et al. [32], Qu et al. [33], and Omdal [34] utilized a dual generative adversarial network (DualGAN) to complete image-to-image translation. Li et al. [35] detected outliers, and Prokopenko et al. [36] used an improved DualGAN to generate synthetic computed tomography images. In addition, adversarial networks have many more applications [37], [38]. Thus, we consider utilizing a DualGAN to realize data enhancement for small samples and then achieve LAI estimation with benchmark models.

II. METHODOLOGY

A. CROP-DualGAN
A DualGAN is a GAN with dual-learning capabilities that contains two groups of generators and discriminators. The generators are used to generate data whose distribution follows the training sample distribution, and the discriminators judge whether the generated samples are true or false. The generators and discriminators are continuously optimized during the confrontation so that the distribution of the generated data moves closer to that of the training samples. We aim to generate pairs of hyperspectral reflectance and LAI with the DualGAN. The original DualGAN is modified according to our research data and is called the crop dual-learning generative adversarial network (CROP-DualGAN), as shown in Fig. 1. The generator for generating LAI is denoted as G_A, the generator for generating hyperspectral reflectance is denoted as G_B, and the corresponding discriminators are denoted as D_A and D_B.
The domain U contains all original and generated hyperspectral reflectance, and the domain V contains all original and generated LAI. G_A is a network for generating LAI from hyperspectral reflectance, and it contains seven basic residual blocks. The structure of the G_A network is shown in Fig. 2(a). In the basic residual blocks, the main path stacks two convolution layers with 1-D convolution kernels of size 3. The padding is set to "same," and all convolution layers are activated by a rectified linear unit (ReLU). There are two shortcut cases. One is that when the numbers of channels in the main path and the shortcut differ, a 1-D convolution with a kernel size of 1 is needed to make the number of channels in the shortcut equal to that of the main path [the residual block is shown in Fig. 2(b)]. The other is that when the numbers of channels are the same, the shortcut's value is the output of the last residual block [the residual block is shown in Fig. 2(c)]. The features learned by the residual block are obtained by adding the values of the main path and the shortcut, and these features, activated by the ReLU, are the final outputs of the residual block. The outputs are used as the inputs of the maximum pooling layer for downsampling. Maximum pooling reduces dimensionality and removes redundant information; its size is normally set to 2, and the step size is consistent with it [39]. The padding size is set to 1 to fill the boundary, and other parameters are set to 0 by default. G_B is a network for generating hyperspectral reflectance from LAI, where the inputs are LAI and the outputs are hyperspectral reflectance with 244 bands. The generation procedure is a mapping from low-dimensional data to high-dimensional data. The structure of the G_B network is shown in Fig. 3.
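The basic residual block described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the channel counts in the usage example and the exact interaction between the convolution padding and the pooling padding are our assumptions.

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """Basic residual block of G_A: two 1-D convs (kernel 3, 'same' padding)
    activated by ReLU, a shortcut with a kernel-1 conv only when the channel
    counts differ [Fig. 2(b) vs. Fig. 2(c)], and max pooling (size 2,
    stride 2) on the ReLU-activated sum."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # kernel-1 conv aligns the shortcut's channel count with the main path
        self.shortcut = (nn.Conv1d(in_ch, out_ch, kernel_size=1)
                         if in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)

    def forward(self, x):
        out = self.relu(self.main(x) + self.shortcut(x))
        return self.pool(out)

# a batch of 4 spectra with 244 bands, one input channel (channel count 8 is illustrative)
x = torch.randn(4, 1, 244)
y = ResBlock1D(1, 8)(x)
print(y.shape)  # torch.Size([4, 8, 122])
```

Each block halves the sequence length, so stacking seven of them progressively compresses the 244-band spectrum toward the scalar LAI output.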
The first layer of G_B is a fully connected layer with 16 neurons, meaning that the LAI values are mapped to 16-D vectors. The network then repeats deconvolution and convolution operations, finally producing hyperspectral reflectance at the output layer. G_B is inspired by the fusion of decoded and encoded features in U-Net [39]. However, it does not use channel stitching to fuse features; it only adds the convolutional features in G_A to the corresponding deconvolution features in G_B. The prerequisite is that the dimensions of the features in the convolution step and the corresponding up-sampling step are the same; when they differ, the features with the higher dimension need to be center cropped. The features extracted by deconvolution possess only one dimension more than the features extracted by convolution, so we remove the last 1-D feature. In short, G_B repeats deconvolution, feature fusion, and convolution operations to obtain more reliable hyperspectral reflectance.
The size of the 1-D deconvolution kernel is 2, which equals its step size. The size of the 1-D convolution kernel is 3, the step size is set to 1, and the padding is the same as above.
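The additive feature fusion with trimming can be sketched as below. This is a hedged illustration of the rule described above; the tensor shapes in the example are hypothetical.

```python
import torch

def fuse(deconv_feat, conv_feat):
    """Feature fusion in G_B: add (rather than concatenate) the convolutional
    features from G_A to the corresponding deconvolution features in G_B.
    When the deconvolution features are one element longer, the last 1-D
    feature is removed so the dimensions match."""
    if deconv_feat.size(-1) == conv_feat.size(-1) + 1:
        deconv_feat = deconv_feat[..., :-1]   # drop the extra 1-D feature
    return deconv_feat + conv_feat

a = torch.randn(4, 8, 123)   # deconvolution features (one element longer)
b = torch.randn(4, 8, 122)   # convolution features from G_A
fused = fuse(a, b)
print(fused.shape)  # torch.Size([4, 8, 122])
```

Additive fusion keeps the channel count unchanged, unlike U-Net's channel concatenation, which doubles it.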
D_A is a fully connected network with four layers activated by LeakyReLU (its constant λ is set to 0.2). In D_A, the numbers of neurons in the successive layers are 512, 256, 128, and 1. The input of D_A is hyperspectral reflectance, and its output is a probability that discriminates generated data from real data. D_B differs slightly from D_A: it has three fully connected layers with 16, 16, and 1 neurons.
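The two discriminators can be sketched as simple multilayer perceptrons. This follows the layer sizes stated above; the input dimensionality of 244 bands for D_A and the absence of an activation after the final layer are our assumptions.

```python
import torch
import torch.nn as nn

def make_discriminator(sizes, leak=0.2):
    """Fully connected discriminator with LeakyReLU (constant 0.2) between
    layers; the final layer outputs a single score."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:          # no activation after the last layer
            layers.append(nn.LeakyReLU(leak))
    return nn.Sequential(*layers)

# D_A: four layers with 512, 256, 128, and 1 neurons on 244-band reflectance
D_A = make_discriminator([244, 512, 256, 128, 1])
# D_B: three layers with 16, 16, and 1 neurons on scalar LAI
D_B = make_discriminator([1, 16, 16, 1])

score = D_A(torch.randn(4, 244))
print(score.shape)  # torch.Size([4, 1])
```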

B. Loss Functions
The loss functions of G_A and G_B are optimized simultaneously, and their optimization minimizes the reconstruction errors

L_G = -D_A(G_A(u, z)) - D_B(G_B(v, z')) + λ_U ||u - G_B(G_A(u, z), z')|| + λ_V ||v - G_A(G_B(v, z'), z)||

where ||u - G_B(G_A(u, z), z')|| and ||v - G_A(G_B(v, z'), z)|| are the two reconstruction losses, and λ_U and λ_V are two constant parameters; in our study, both are equal to 5. The loss functions of D_A and D_B are optimized separately. Both add gradient penalties to the initial discriminator loss functions of the GAN [40]. The reason for this setting is that if the discriminators become optimal during training, the generators encounter vanishing gradients and lack diversity. Based on previous research and summaries [41], [42], [43], these problems can be prevented by adding gradient penalties. The losses of D_A and D_B can be defined as

L_{D_A} = D_A(G_A(u, z)) - D_A(v) + λ E[(||∇_{x̂_V} D_A(x̂_V)||_p - 1)^2]

L_{D_B} = D_B(G_B(v, z')) - D_B(u) + λ E[(||∇_{x̂_U} D_B(x̂_U)||_p - 1)^2]

where λ is a gradient penalty constant, and ∇, E, and ||·||_p represent the gradient, the mathematical expectation, and the p-norm, respectively. Here, λ = 10 and p = 2. The interpolated samples x̂_V and x̂_U are defined as

x̂_V = εv + (1 - ε)G_A(u, z),    x̂_U = εu + (1 - ε)G_B(v, z')

where ε ∈ [0, 1].
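The gradient penalty term (λ = 10, p = 2) can be computed as in the sketch below, following the WGAN-GP recipe cited as [40]. The toy discriminator and the data ranges are illustrative assumptions.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0, p=2):
    """Gradient penalty [40]: interpolate between real and fake samples with
    epsilon ~ U[0, 1] and penalize the deviation of the gradient p-norm
    from 1 at the interpolated points."""
    eps = torch.rand(real.size(0), 1)               # epsilon in [0, 1]
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = D(x_hat)
    grads, = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)
    return lam * ((grads.norm(p, dim=1) - 1) ** 2).mean()

# toy discriminator on scalar LAI values
D = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.LeakyReLU(0.2),
                        torch.nn.Linear(16, 1))
real = torch.rand(8, 1) * 7.0    # LAI values in [0, 7]
fake = torch.rand(8, 1) * 7.0
gp = gradient_penalty(D, real, fake)
```

Because the penalty is a squared deviation, it is always non-negative and vanishes only when the discriminator's gradient norm is exactly 1 along the interpolation path.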

C. Hyperparameter Settings
We utilize the root mean squared propagation (RMSProp) algorithm to optimize the loss functions. Because the choice of hyperparameters is always a challenge, we use trial and error to manually search for optimal parameters within a given scope. The hyperparameters of RMSProp in D_A and D_B are set as follows: β_2 is 0.99, and the weight decay is 0.9; they are set to 0.95 and 0.9 in G_A and G_B, respectively. In addition, we provide a series of settings. For example, the initial weights follow a Gaussian distribution with a mean of 0 and a standard deviation of (2/n)^{1/2} (where n is the number of weights in every layer) [44]. All biases are 0, the learning rate η is 2 × 10^{-4}, and the batch size is 4.
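The weight initialization and optimizer setup can be sketched as below. Mapping the stated β_2 onto `torch.optim.RMSprop`'s `alpha` argument is our assumption, and the toy network is only a stand-in.

```python
import math
import torch
import torch.nn as nn

def init_weights(module):
    """Initialization per [44]: weights ~ N(0, (2/n)^{1/2}) where n is the
    number of weights in the layer; all biases start at 0."""
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        n = module.weight.numel()
        nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(2.0 / n))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# toy stand-in for one of the networks
net = nn.Sequential(nn.Linear(244, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
net.apply(init_weights)

# RMSProp with the stated learning rate 2e-4; interpreting beta_2 = 0.99 as
# the squared-gradient smoothing constant (alpha) is an assumption
optimizer = torch.optim.RMSprop(net.parameters(), lr=2e-4, alpha=0.99)
```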

D. Training
The generators and discriminators update alternately in the networks. When generators G_A and G_B are fixed, discriminators D_A and D_B are trained: D_A is optimized by correctly discriminating v as true while G_A(u, z) is false, and, similarly, D_B is optimized by correctly discriminating u as true while G_B(v, z') is fake. Likewise, when discriminators D_A and D_B are fixed, G_A and G_B are trained. G_A and G_B are optimized simultaneously to produce "fake" outputs that fool the corresponding discriminators. To obtain better generators, we train the discriminators for one step and the generators for five steps. The network tends to be stable when the losses stay within a certain range.
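The alternating schedule can be sketched as follows. The four networks here are toy linear stand-ins, the noise inputs z and z' are omitted, and the gradient penalty from Section B is left out for brevity; which domain each discriminator judges follows the training description above.

```python
import torch

# toy stand-ins: reflectance u is 244-D, LAI v is a scalar
G_A = torch.nn.Linear(244, 1)   # reflectance -> LAI
G_B = torch.nn.Linear(1, 244)   # LAI -> reflectance
D_A = torch.nn.Linear(1, 1)     # judges LAI (v vs. G_A(u, z))
D_B = torch.nn.Linear(244, 1)   # judges reflectance (u vs. G_B(v, z'))

opt_D = torch.optim.RMSprop(list(D_A.parameters()) + list(D_B.parameters()), lr=2e-4)
opt_G = torch.optim.RMSprop(list(G_A.parameters()) + list(G_B.parameters()), lr=2e-4)

u = torch.rand(4, 244)          # batch of real reflectance
v = torch.rand(4, 1) * 7.0      # batch of real LAI

for step in range(10):
    # one discriminator step: push real scores up and fake scores down
    loss_D = (D_A(G_A(u).detach()).mean() - D_A(v).mean()
              + D_B(G_B(v).detach()).mean() - D_B(u).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # five generator steps: fool the discriminators while minimizing the
    # reconstruction errors with lambda_U = lambda_V = 5
    for _ in range(5):
        loss_G = (-D_A(G_A(u)).mean() - D_B(G_B(v)).mean()
                  + 5 * (u - G_B(G_A(u))).abs().mean()
                  + 5 * (v - G_A(G_B(v))).abs().mean())
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```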

E. Enhanced Data Selection
Because DualGAN is an unsupervised dual-learning network for image-to-image translation, its generators G_A and G_B are structured like U-Net, and its discriminators can contain fully connected layers or appropriately add convolution layers [31]. The discriminator outputs evaluate the similarity between fake data and true data. When training is finished, many generated data can be obtained. Generally, images with precisely aligned pixel features in domains U and V are screened manually, as by Yi et al. [32], which is a subjective process.
To solve the problems of existing data screening methods and to ensure that the distribution of the selected generated samples is closer to the training sample distribution, we propose a more objective and reasonable method to select the generated samples. According to the training samples X = {x_1, x_2, ..., x_n}, the generated samples are divided into n sets G_{x_1} = {x_11, x_12, ..., x_{1n_1}}, G_{x_2} = {x_21, x_22, ..., x_{2n_2}}, ..., G_{x_n} = {x_n1, x_n2, ..., x_{nn_n}}. The sets G_{x_1}, G_{x_2}, ..., G_{x_n} are each sorted by error from small to large. Then, the generated samples with the smallest errors are selected from each set, so that every training sample contributes equally to the enhanced set.
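The per-group, error-ranked selection can be sketched as below. How the per-sample error is computed is not fully specified here and is treated as a given input; the group sizes and feature dimension in the example are hypothetical.

```python
import numpy as np

def select_enhanced(generated_groups, group_errors, k):
    """Group the generated samples by the training sample they correspond
    to, sort each group by error from small to large, and keep the k
    lowest-error samples per group, so that every training sample
    contributes equally to the enhanced set."""
    selected = []
    for group, errors in zip(generated_groups, group_errors):
        order = np.argsort(errors)              # errors, small to large
        selected.extend(np.asarray(group)[order[:k]])
    return np.array(selected)

# toy example: 3 training samples, 5 generated candidates each, 4 features
rng = np.random.default_rng(0)
groups = [rng.normal(size=(5, 4)) for _ in range(3)]
errs = [rng.random(5) for _ in range(3)]
enhanced = select_enhanced(groups, errs, k=2)
print(enhanced.shape)  # (6, 4): 2 samples kept from each of the 3 groups
```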

III. EXPERIMENTS AND RESULTS

A. Experimental Flow
CROP-DualGAN realizes data enhancement for the initial training samples. Then, the generated samples selected by the proposed rules are combined with the initial training samples to estimate LAI with benchmark models. Finally, the LAI estimation in this article is compared with the results obtained without enhancement and with random selection.

B. Experimental Data
EnMAP data, including hyperspectral reflectance and LAI products, are openly published by the European Space Agency (http://www.enmap.org/). The research area is located in the alpine foothills of Germany (48.0514°N, 11.0760°E), and the data were acquired on July 22, 2006. The scene shows cereal, maize, and rape seed in the middle and late growth stages. The atmospherically corrected [45] hyperspectral reflectance has 244 bands in the range of 420-2460 nm with a spatial resolution of 30 m. The LAI products are obtained by inverting the coupled soil-leaf-canopy model [46].

C. Preprocessing
In the EnMAP data, the original hyperspectral reflectance values lie within [0, 10000], and the LAI ranges from 0 to 7. The data are preprocessed in two ways. First, for CROP-DualGAN sample enhancement, the hyperspectral reflectance is divided by 1000, while the original LAI needs no preprocessing. Second, for benchmark model estimation, the generated hyperspectral reflectance is divided by a further 10, the original hyperspectral reflectance is divided by 10000, and the generated and original LAI are normalized by

y = (LAI − LAI_min) / (LAI_max − LAI_min)    (6)

where y is the normalized LAI. Then, the preprocessed samples are fed into the LAI estimation benchmark models for training.
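The two preprocessing paths can be sketched as below; the reflectance and LAI values in the example are illustrative.

```python
import numpy as np

# raw EnMAP reflectance is scaled within [0, 10000]; LAI lies in [0, 7]
refl_raw = np.array([[5230.0, 812.0, 4410.0]])
lai_raw = np.array([3.5, 0.7])

# for CROP-DualGAN enhancement: divide reflectance by 1000, keep LAI as-is
refl_gan = refl_raw / 1000.0

# for the benchmark models: generated reflectance is divided by a further 10
# (i.e., original reflectance by 10000), and LAI is min-max normalized
refl_bench = refl_raw / 10000.0
lai_min, lai_max = 0.0, 7.0                            # LAI product range
lai_norm = (lai_raw - lai_min) / (lai_max - lai_min)   # Eq. (6)
print(lai_norm)  # [0.5 0.1]
```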

D. Benchmark Models
Original training samples together with the selected enhanced data are put into benchmark models to estimate the LAI. In this article, Visual Geometry Group 16 (VGG16) [47] and the small-samples-learning LAI-Net (SSLLAI-Net) [27] are regarded as the benchmark models for estimating LAI. VGG16 and SSLLAI-Net represent two different types of networks: VGG16, with its many parameters, is a classical general-purpose regression network that can be applied to this research, while SSLLAI-Net is a lightweight neural network dedicated to LAI estimation that supports training on small samples.
VGG16 consists of 16 weight layers, including 13 convolution layers and three fully connected layers. In addition, VGG16 has five pooling layers without weights. In the convolution layers, the 1-D convolution kernel size is 3, and the feature dimensions remain unchanged after the convolution layers. In the pooling layers, the pool size and step size of the 1-D pooling are set to 2. In the fully connected layers, the numbers of neurons in the three layers with dropout are 512, 512, and 1. The initialization weights of VGG16 follow a Gaussian distribution with a mean of 0 and a standard deviation of (2/n)^{1/2} (where n is the number of weights in every layer). The learning rate η is 0.0001.
SSLLAI-Net is a lightweight neural network containing two convolution layers, one pooling layer, and three fully connected layers. The kernel size and step size of the 1-D convolution are both 3. The numbers of channels in the first and second convolution layers are 4 and 16, respectively. The maximum pooling layer, whose pooling size and step size are 3, follows the second convolution layer. The fully connected layers possess 32, 8, and 1 neurons, and the first fully connected layer uses dropout. Similar to VGG16, the initialization weights of SSLLAI-Net also follow a Gaussian distribution with a mean of 0 and a standard deviation of (2/n)^{1/2}. Its initial learning rate η is 0.01, which decreases as the iterations increase.
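A minimal sketch of SSLLAI-Net following the description above is given below. The ReLU activations and the flattened size of 16 × 9 = 144 (derived from a 244-band input with kernel/stride 3 throughout) are our assumptions, not details confirmed by the text.

```python
import torch
import torch.nn as nn

class SSLLAINet(nn.Module):
    """Sketch of SSLLAI-Net [27]: two 1-D convolutions (kernel and stride 3;
    4 then 16 channels), one max pooling layer (size and stride 3), and
    fully connected layers of 32, 8, and 1 neurons with dropout after the
    first fully connected layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=3, stride=3), nn.ReLU(),
            nn.Conv1d(4, 16, kernel_size=3, stride=3), nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=3),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(144, 32), nn.ReLU(), nn.Dropout(),  # 144 = 16 * 9
            nn.Linear(32, 8), nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, x):
        return self.regressor(self.features(x))

net = SSLLAINet()
out = net(torch.randn(4, 1, 244))   # batch of 4 spectra, 244 bands
print(out.shape)  # torch.Size([4, 1])
```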

E. Results
Two experiments are presented in this section. In the first, 200, 100, 50, and 30 initial training samples are selected randomly according to the crop proportions of cereal (66.6%), maize (15.6%), and rape seed (17.8%), separately, and enhanced to 500 samples by CROP-DualGAN with the proposed selection method. In the second, similarly, 300 and 200 initial training samples are selected randomly according to the proportions of the three crops and enhanced to 1000. After training, the test accuracy in this article is compared with the accuracy obtained without enhancement and with random selection. To avoid accidental results, every experiment is repeated ten times; the average is taken as the final accuracy, and the standard deviation is the measure of stability. Finally, the Kruskal-Wallis H test, a non-parametric test, is used to determine whether the improvement in LAI estimation is significant.
In the single-crop-type experiments, Tables I and II list the R^2 and RMSE of VGG16, and Tables III and IV list those of SSLLAI-Net. The proposed method achieves higher accuracy and is more stable in both VGG16 and SSLLAI-Net; most improvements in LAI estimation are significant, and the accuracy does not decrease when an improvement is not significant. For Tables I and II, cereal and rape seed with 30, 50, and 100 initial training samples show that the estimation accuracy is improved significantly with data enhancement (p-value smaller than 0.05), while the results of cereal with 200 initial training samples are not significant but are not lower than those of VGG16. The results of maize in CROP-DualGAN-VGG16 are also improved. Similarly, for Tables III and IV, cereal and maize with 30, 50, and 100 initial training samples, and rape seed with 30, 50, 100, and 200 initial training samples, show that the estimation accuracy is improved significantly with CROP-DualGAN-SSLLAI (p-value smaller than 0.05). The results of cereal and maize with 200 initial training samples are not significant because SSLLAI-Net is a lightweight network supporting small-sample training, and the R^2 of cereal and maize already reaches 0.989 and 0.995, leaving limited room for improvement; nevertheless, the accuracy of LAI estimation in CROP-DualGAN-SSLLAI is still higher than that of SSLLAI-Net. In addition, SSLLAI-Net outperforms VGG16 for LAI estimation because it is a lightweight network designed for small-sample training. VGG16 cannot reach ideal accuracy with small samples, but it can improve crop estimation accuracy with the help of CROP-DualGAN. For the crop type, the accuracy of LAI estimation is related to the LAI distribution of the various crops, which is shown in Fig. 4. Most of the LAI of cereal, maize, and rape seed lies below LAI = 3.
In addition, some LAI values of cereal and rape seed are also distributed around LAI = 7, and a few lie between LAI = 4 and LAI = 7. This causes data imbalance, and the estimation errors mainly come from around LAI = 7. Compared with cereal and rape seed, more maize LAI values are distributed between LAI = 4 and LAI = 7 and fewer around LAI = 7, so the estimation accuracy of maize is higher than those of cereal and rape seed. Figs. 5 and 6 show the LAI estimation of VGG16 and SSLLAI-Net, respectively, based on 30 initial training samples.
In the multiple-crop-type experiments, Tables V and VI list the R^2 and RMSE obtained for the three crops, respectively. Compared with the single-crop-type experiments, the accuracy for rape seed estimated together with cereal and maize is obviously lower than the individual crop estimation. Because the benchmark models estimating LAI are data-driven methods that learn the relationship between hyperspectral reflectance and the corresponding LAI, the accuracy of LAI estimation is affected by the data distribution. As shown in Fig. 4, the LAI distributions of cereal, maize, and rape seed differ. Therefore, the accuracy of cereal estimated together with maize and rape seed is lower than that of cereal estimated individually, as is the case for maize and rape seed. For Tables V and VI, the improvement of LAI estimation is significant in CROP-DualGAN-SSLLAI. In CROP-DualGAN-VGG16, cereal and rape seed with 200 initial training samples show that the estimation accuracy is improved significantly, while the results for maize are not significant because maize already performs well in VGG16; nevertheless, the accuracy of LAI estimation in CROP-DualGAN-VGG16 is not lower than that of VGG16. Figs. 7 and 8 show the LAI estimation of VGG16 and SSLLAI-Net, respectively, based on 200 initial training samples. In terms of network analysis, CROP-DualGAN can help benchmark models estimate LAI more accurately owing to its network structure consisting of two pairs of generators and corresponding discriminators. The generators produce hyperspectral reflectance and LAI, while the discriminators judge whether the generated samples are true or false. The generators and discriminators are continuously optimized during the confrontation to bring the distribution of the generated data closer to that of the training samples. Moreover, the proposed data selection ensures that the samples are more balanced.
In this article, the Kolmogorov-Smirnov test is used to examine whether the initial training samples and the selected generated samples have the same distribution, as shown in Fig. 9. It can be seen that the p-value is larger than 0.05, the given significance level. That is, the samples enhanced by CROP-DualGAN have the same distribution as the initial training samples, which demonstrates the rationality of the data augmentation and its effectiveness for improving LAI estimation.
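The distribution check can be carried out with SciPy's two-sample Kolmogorov-Smirnov test, as sketched below on synthetic stand-in data (the sample sizes and uniform LAI values are illustrative, not the paper's data).

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_lai = rng.uniform(0.0, 7.0, size=30)        # initial training LAI
generated_lai = rng.uniform(0.0, 7.0, size=470)   # selected generated LAI

stat, p_value = ks_2samp(train_lai, generated_lai)
# a p-value above the 0.05 significance level means we cannot reject that
# both samples come from the same distribution
same_distribution = p_value > 0.05
print(same_distribution)
```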

IV. CONCLUSION
Currently, the problem in estimating LAI with deep learning is that measured samples are insufficient; the CROP-DualGAN proposed in this article solves this problem through data enhancement. Experiments prove that most of the samples enhanced by CROP-DualGAN are effective for improving LAI estimation. The proposed method is a universal solution to insufficient samples. However, in the multiple-crop-type experiments, the accuracy of LAI estimation is obviously lower than in the single-crop-type experiments. In future work, we will focus on solving data imbalance to improve LAI estimation with multiple crop types. Data enhancement for small samples to improve LAI estimation can further support crop growth condition monitoring, crop stress (drought, flood, pest, disease, etc.) detection, and yield estimation. Moreover, since LAI plays an important role in calculating ecosystem carbon sinks, this data enhancement method can be extended to estimating vegetation parameters of crops, forests, grasses, and other vegetation to support future research.