Recognition of Drought Stress State of Tomato Seedling Based on Chlorophyll Fluorescence Imaging

Drought has become one of the main challenges facing global agricultural production and crop safety. Severe drought stress can halt crop photosynthesis and seriously impair crop growth and development. This study aimed to develop a method for identifying drought stress in tomato seedlings using chlorophyll fluorescence imaging. Chlorophyll fluorescence parameters and their corresponding chlorophyll fluorescence images were collected at four drought stress levels. Three feature optimization algorithms, namely the Successive Projections Algorithm, Iteratively Retains Informative Variables and the Variable Iterative Space Shrinkage Approach, were then used to select important parameters. Five parameters common to all three algorithms were obtained, and their corresponding chlorophyll fluorescence images were selected. Two types of image features, histogram features and texture features, were used to analyze the drought stress classes. The Pearson correlations between the features and the drought stress state were calculated, and the highly correlated features were input into three models, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM) and k-Nearest Neighbor (KNN), to identify the drought stress classes. The recognition accuracies of LDA, SVM and KNN were 86.8%, 87.1% and 76.5%, respectively. The experimental results showed that the five common fluorescence parameters and their corresponding image features could be used to evaluate the drought stress classes of tomato seedlings with good effect. This research provides a new method for monitoring drought stress classes and has considerable prospects for the non-destructive diagnosis of plant drought stress.


I. INTRODUCTION
Agricultural drought has become one of the main challenges facing global agricultural production and crop safety [1], [2]. As a multi-dimensional stress, drought can cause changes in crops at the phenotypic, physiological, biochemical and molecular levels. In severe cases, it leads to the termination of photosynthesis and to metabolic disorders, and ultimately to crop death [3]-[5]. As one of the important vegetable crops widely planted worldwide, tomato is vulnerable to drought stress during cultivation, which limits its productivity [6]-[8]. Therefore, timely identification of tomato drought stress is of great significance for improving tomato growth, yield and quality.
The associate editor coordinating the review of this manuscript and approving it for publication was Nadeem Iqbal.
Traditional methods for diagnosing plant stress are complicated to operate, strongly affected by human operation, damage the crops to varying degrees, and cannot be used for automated detection [9]-[11]. Chlorophyll fluorescence technology has been used as a probe for photosynthesis research and has become an important method for diagnosing plant stress because it is non-destructive, fast and automatable [12]-[14]. A large number of studies have shown the correlation between chlorophyll fluorescence and crop physiological status. Chen, JL et al. [15] studied the effects of drought and rehydration on the chlorophyll fluorescence parameters (CFP) and physiological responses of Artemisia halodendron to reveal the mechanisms responsible for A. halodendron's tolerance of drought stress. Saric-Krsmanovic, M et al. [16] studied the effect of the parasitic plant Cuscuta on the chlorophyll fluorescence and chlorophyll content of infested alfalfa and sugar beet, and found that the parameters Fo, Fv/Fm, phi, Fv and IF could be regarded as sensitive indicators of the influence of dodder on its host plants. Sun, Z et al. [17] used chlorophyll fluorescence to detect salt stress in wheat, and the results showed that salt stress caused the chlorophyll content and chlorophyll a fluorescence parameters of 43 wheat varieties and breeding lines to decrease to varying degrees. The salt-induced reductions in PI and Fv/Fm varied greatly among the 43 wheat varieties. Li, H et al. [18] investigated whether chlorophyll fluorescence imaging can identify herbicide stress in soybeans shortly after application. The results showed that maximal PSII quantum yield and shoot dry biomass were significantly reduced in soybean by herbicides compared with the untreated control plants.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Chlorophyll fluorescence imaging is the visualization technique within chlorophyll fluorescence technology. It reflects photosynthesis information of crops such as the absorption and conversion of light energy by leaves, the transmission and distribution of energy, and the state of reaction centers [19]-[21]. Chlorophyll fluorescence imaging can obtain the CFP and the spatial heterogeneity of chlorophyll fluorescence non-destructively, simply and quickly, and has become an important method of chlorophyll fluorescence diagnosis [22]-[24]. Many scholars have used chlorophyll fluorescence imaging to study the effects of stress on plants. Zhou, CY et al. [25] developed a classification method based on an Artificial Neural Network for chlorophyll a fluorescence induction imaging analysis. The effects of three water treatments and three nitrogen treatments on fluorescence parameters were obtained from hundreds of time-resolved fluorescence images. The results showed that chlorophyll a fluorescence images could recognize the different water and nitrogen states of plants with high accuracy. Yao, JN et al. [26] combined dynamic chlorophyll fluorescence with multi-color fluorescence imaging for plant phenotype analysis; combined with a support vector machine, this gave classification accuracies of 93.3% and 99.1% for distinguishing control plants from drought-stressed ones after 3 and 7 days of treatment, respectively. Cen, HY et al. [27] used chlorophyll fluorescence imaging combined with feature selection to characterize and detect HLB disease. The chlorophyll fluorescence images of citrus leaf samples were measured with an in-house chlorophyll fluorescence imaging system. The results showed that the new data-driven method combining average fluorescence parameters and image features provided the best classification performance, with 97% accuracy. Dong, ZF et al. [28] used chlorophyll fluorescence imaging to identify chilling injury in tomato seedlings. By calculating the Pearson correlation between the fluorescence parameters and the degree of chilling injury, they found that six fluorescence parameters, including the actual photochemical quantum yield and the steady-state light-adapted photochemical quenching coefficient, could be used to evaluate the chilling injury of tomato seedlings, and a neural network was used to construct a prediction model. The recognition accuracies on the training set and validation set were 90.3% and 90%, respectively.
In this study, chlorophyll fluorescence imaging was used to collect the fluorescence parameters and fluorescence images of the plant canopy. The Successive Projections Algorithm (SPA), Iteratively Retains Informative Variables (IRIV) and the Variable Iterative Space Shrinkage Approach (VISSA) were used to select important parameters, and the five fluorescence parameters selected by all three algorithms were used to evaluate the drought stress classes. Identification models of the drought stress state were established separately from the values of the five common fluorescence parameters and from those image features of the five parameters that correlated strongly with the drought stress state. The models were compared to obtain the best one, which demonstrated the feasibility of using chlorophyll fluorescence imaging to identify the drought stress state of tomato seedlings. The specific contributions of this study are threefold: (1) the parameters that effectively identify the drought stress state of tomato at the seedling stage were found among all 98 CFP; (2) fluorescence image features strongly correlated with drought stress were extracted; (3) different drought stress identification models were established and their performance was compared. The general block flow diagram of the proposed method is illustrated in Fig. 1.

II. EXPERIMENT AND DATA COLLECTION

A. PLANT GROWTH ENVIRONMENT AND STRESS CONDITIONS
The drought stress experiment was conducted in May 2021 at the Institute of Water-saving Agriculture in Arid Areas of China (IWSA), Northwest A&F University. A total of 140 tomato seedlings with similar growth were transferred to an artificial climate room. In the artificial climate room, the ambient temperature was 24 °C during the day and 14 °C at night, the relative humidity was 60%, and the photoperiod was 14/10 h light/dark. First, all plants received sufficient water to maintain the soil water content close to field capacity. After 2 days, 35 tomato seedlings continued to receive sufficient water (well-watered plants), and the remaining plants became the water-stressed plants. Both well-watered and water-stressed plants were weighed every day. The water-stressed plants were subjected to three degrees of drought. Thirty-five seedlings were irrigated with a water volume of 55%-65% of field capacity (mild drought plants), another 35 with a water volume of 35%-45% of field capacity (moderate drought plants), and the last 35 with a water volume of 15%-25% of field capacity (severe drought plants) [29]. The chlorophyll fluorescence measurements were made after 5 days of differential watering.

B. CHLOROPHYLL FLUORESCENCE IMAGE ACQUISITION
In this study, a PlantScreen plant phenotype imaging analysis system (Beijing Ecotech Co., Ltd.) was used to collect chlorophyll fluorescence images of the plants. The system consists of a measuring light source (610-620 nm, red light), a photochemical light source (610-620 nm, red light; 470-480 nm, blue light), a saturating light source (470-480 nm, blue light), a dark adaptation room, a computer with control software, and other components. The tomato seedlings were dark-adapted for 20 min before the images were collected. The system used FluorCam 7.0 software to collect and analyze the data. The images of the minimum fluorescence parameter under the different drought stress conditions are shown in Fig. 2.
A total of 98 fluorescence images were acquired for each sample, one for each of the 98 CFP. The index numbers, symbols and descriptions of the parameters are listed in Tab. 1 [30].

C. DATA PREPROCESSING
First, the data were normalized. Data normalization is the dimensionless processing of data, which mainly addresses the comparability of data [31]. It converts the original data into dimensionless index values so that all index values are on the same quantitative level, which can effectively improve the accuracy of the mathematical model [32]. In this study, the commonly used linear normalization method was selected to normalize each parameter. SPSS 20.0 software was then used to eliminate outliers lying more than 3 standard deviations from the mean.
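The two preprocessing steps can be illustrated with a minimal sketch. It assumes that "linear normalization" means min-max scaling to [0, 1] and that the 3-standard-deviation rule is applied per parameter around its mean; the SPSS procedure actually used is not described in detail, and the sample values and function names below are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant parameter: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def three_sigma_mask(values, k=3.0):
    """Return True for values within k standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return [abs(v - mu) <= k * sigma for v in values]


# Hypothetical readings of one fluorescence parameter; the last one is an outlier.
raw = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.71, 0.70, 0.69, 0.72, 0.71, 2.50]
scaled = min_max_normalize(raw)
keep = three_sigma_mask(scaled)
cleaned = [v for v, ok in zip(scaled, keep) if ok]
print(len(raw), "->", len(cleaned), "samples after outlier removal")
```

Because the z-score is unchanged by linear rescaling, the same samples are flagged whether the rule is applied before or after normalization.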

III. FEATURE EXTRACTION
To obtain the important fluorescence parameters that can effectively identify the drought stress state of tomato at the seedling stage among all 98 CFP, SPA, IRIV and VISSA were used to select the useful CFP. The changes of the common CFP with deepening drought stress were analyzed to prove the feasibility of using the common fluorescence parameters to predict the drought stress state. Furthermore, the image features of the common fluorescence parameters were extracted and Pearson correlation analysis was performed. The image features strongly correlated with the drought stress state were used as the input of the subsequent models.

A. CHLOROPHYLL FLUORESCENT PARAMETERS SELECTION
(1) SPA is a forward variable selection method that can effectively select feature variables and reduce the data dimension. The method sets the maximum and minimum numbers of selected parameters and then iterates: in each loop it calculates the projection of the current variable onto the remaining unselected variables and adds the variable with the largest projection vector to the variable combination. A Partial Least Squares (PLS) regression model is established for each parameter combination and its Root Mean Square Error (RMSE) is calculated. The loop ends when the number of characteristic parameters reaches the value corresponding to the minimum RMSE [33], [34].
(2) IRIV divides variables into four categories based on the idea of Model Population Analysis (MPA): strongly informative variables, weakly informative variables, uninformative variables and interfering variables. The interfering and uninformative variables are eliminated and the strongly and weakly informative variables are retained; finally, a backward elimination strategy is applied to the retained strongly and weakly informative variables. The variables that remain are the characteristic variables [35], [36].
(3) VISSA is an optimization algorithm for variable selection based on MPA and Weighted Binary Matrix Sampling (WBMS). First, WBMS is used to draw a number of sub-training data sets from the original data set, and a PLS model is built for each variable subset. The sub-models are sorted by their RMSE of cross-validation (RMSECV), the best models are extracted, and a new sub-training data set is obtained from them. This process is repeated until the weights of all variables are constant (1 or 0), which gives the best model and the best set of variables [37], [38].
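The projection step at the core of SPA can be sketched as follows. This is an illustrative sketch only: it assumes the commonly described formulation in which every unselected variable is projected onto the subspace orthogonal to the most recently selected one and the variable with the largest remaining norm is picked next. The PLS/RMSE step that chooses the final number of variables is omitted, and the toy data and function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_projection_order(columns, start, n_select):
    """Return indices picked by successive projections, starting from `start`."""
    work = [list(col) for col in columns]  # residual vectors, deflated step by step
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = dot(ref, ref)
        if ref_norm2 == 0.0:
            break  # nothing left to project against
        best, best_norm2 = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Remove the component of this variable that lies along `ref`.
            coef = dot(col, ref) / ref_norm2
            work[j] = [c - coef * r for c, r in zip(col, ref)]
            norm2 = dot(work[j], work[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        if best is None:
            break  # every variable has already been selected
        selected.append(best)
    return selected


# Four toy variables measured on three samples; variable 1 is nearly
# collinear with variable 0 and should therefore be skipped.
columns = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
]
order = spa_projection_order(columns, start=0, n_select=3)
print(order)
```

In a full SPA run, this ordering would be computed for each candidate starting variable and each subset size, and the subset with the lowest RMSE of the regression model would be kept.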
When SPA was used to optimize the CFP, the number of selected parameters was set to range from 1 to 38. The number of parameters and the corresponding RMSE are shown in Fig. 3a. When the number of selected parameters was greater than 12, the RMSE remained essentially stable, so a total of 12 chlorophyll fluorescence kinetic parameters were extracted.
When IRIV was used to optimize the CFP, a PLS model was established with 5-fold cross-validation, and RMSECV was used as the evaluation index to select the characteristic variables. The parameter selection process is shown in Fig. 3b. After the first iteration, the number of parameters was reduced from 98 to 45. After the second and third iterations, the number of parameters stabilized at 36. After the irrelevant or interfering parameters were eliminated by backward elimination, 29 chlorophyll fluorescence kinetic parameters were retained.
When VISSA was used to optimize the CFP, the number of variable combinations generated by WBMS was set to 2,000, the sub-model ratio to 5%, and the initial weight of the variables to 0.5. Fig. 3c shows the trend of RMSECV with the number of selected parameters. As the number of parameters increased, RMSECV first decreased sharply and then increased slightly. A small number of variables corresponded to a larger RMSECV, indicating that too few parameters cannot accurately express the degree of drought stress. When the number of parameters was too large, RMSECV increased again, indicating that the parameters then contained redundant information, which is detrimental to modeling. Therefore, 25 chlorophyll fluorescence kinetic parameters were selected in this study.
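The weight-update loop of VISSA can be sketched with the settings quoted above (2,000 samplings, a 5% sub-model ratio, initial weights of 0.5). The sketch rests on two assumptions that are not stated in the text: the new weight of a variable is taken as its frequency among the best sub-models, and a hypothetical `toy_error` function stands in for the PLS RMSECV, which is not reproduced here.

```python
import random


def wbms(weights, n_models, rng):
    """Weighted binary matrix sampling: column j holds round(w_j * n_models) ones."""
    columns = []
    for w in weights:
        ones = round(w * n_models)
        col = [1] * ones + [0] * (n_models - ones)
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that each row is the variable mask of one sub-model.
    return [list(row) for row in zip(*columns)]


def update_weights(masks, errors, ratio):
    """New weight of a variable = its frequency among the best sub-models."""
    n_best = max(1, int(len(masks) * ratio))
    best = sorted(range(len(masks)), key=lambda i: errors[i])[:n_best]
    n_vars = len(masks[0])
    return [sum(masks[i][j] for i in best) / n_best for j in range(n_vars)]


def toy_error(mask, informative):
    """Hypothetical stand-in for RMSECV: penalize missing and redundant variables."""
    missing = sum(1 for j in informative if mask[j] == 0)
    extra = sum(v for j, v in enumerate(mask) if j not in informative)
    return missing + 0.1 * extra


rng = random.Random(0)
informative = {1, 4}          # hypothetical "truly useful" variables
weights = [0.5] * 8           # initial weight of every variable
for _ in range(20):
    masks = wbms(weights, 2000, rng)
    errors = [toy_error(m, informative) for m in masks]
    weights = update_weights(masks, errors, 0.05)
    if all(w in (0.0, 1.0) for w in weights):
        break                 # every weight is constant: selection has converged
selected = [j for j, w in enumerate(weights) if w == 1.0]
print(selected)
```

The loop stops once every weight is exactly 0 or 1, which mirrors the stopping condition described for the algorithm.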

B. ANALYSIS OF 5 COMMON PARAMETERS
Based on the above three parameter extraction algorithms, five common CFP were obtained: No. 55 QY_L2, No. 64 NPQ_L3, No. 87 qL_L2, No. 90 qL_Lss and No. 93 qL_D3. Fig. 4 shows the changes of these 5 fluorescence parameters under different drought degrees. QY_L2 is the quantum efficiency under light adaptation; it showed a downward trend with increasing drought stress. NPQ_L3 is non-photochemical quenching, which increased with increasing drought stress. qL_L2, qL_Lss and qL_D3 are photochemical quenching parameters. qL_Lss first increased slightly under mild drought and then decreased sharply under moderate and severe drought. qL_L2 and qL_D3 fell continuously as the degree of drought stress increased. The five common parameters thus changed regularly as drought stress deepened, which further supports the feasibility of using them to recognize the drought stress state of tomato seedlings.
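Taking the parameters common to the three algorithms amounts to a set intersection, as the short sketch below illustrates. The three index sets are hypothetical and much smaller than the 12, 29 and 25 parameters actually selected; only the five shared indices and their names are taken from the text.

```python
# Hypothetical selections; only the five shared indices come from the study.
spa_selected = {12, 55, 64, 87, 90, 93, 97}
iriv_selected = {3, 55, 60, 64, 87, 90, 93}
vissa_selected = {55, 64, 70, 87, 90, 93}

# Parameters chosen by all three algorithms.
common = sorted(spa_selected & iriv_selected & vissa_selected)

names = {55: "QY_L2", 64: "NPQ_L3", 87: "qL_L2", 90: "qL_Lss", 93: "qL_D3"}
common_names = [names[i] for i in common]
print(common_names)
```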

C. IMAGE FEATURE ACQUISITION
Chlorophyll fluorescence imaging technology provides information on the distribution of plant chlorophyll fluorescence signals in two-dimensional space. Therefore, image feature extraction can better reflect the information of photosynthesis at the canopy level. Based on the above five common fluorescence parameters, this study used histogram features and texture features to extract image information.
In this study, a total of 30 histogram features were extracted. Fifteen were statistical features: the mean, standard deviation and third-order moment of each of the 5 common CFP images. The other fifteen were Gaussian curve fitting parameters: the peak value, the x-coordinate of the peak and the standard deviation of the first-order Gaussian curve fitted to the histogram of each of the 5 common CFP images. Fig. 5 shows the first-order Gaussian curve fitting of QY_L2 under the four drought stress states. The peak value of the fitted Gaussian curve, its corresponding abscissa and the curve trend differed between drought stress states and could therefore represent them.
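The three statistical histogram features can be computed directly from the pixel values of one parameter image, as in the minimal sketch below. It assumes that the third-order moment is the central moment (taken about the mean), which the text does not specify; the pixel values and the function name are hypothetical, and the Gaussian fitting step is not covered.

```python
def histogram_statistics(pixels):
    """Return the mean, standard deviation and third central moment of pixel values."""
    n = len(pixels)
    mu = sum(pixels) / n
    variance = sum((p - mu) ** 2 for p in pixels) / n
    third_moment = sum((p - mu) ** 3 for p in pixels) / n
    return mu, variance ** 0.5, third_moment


# Toy pixel values with a long right tail, so the third moment is positive.
pixels = [1.0, 1.0, 1.0, 5.0]
mean_value, std_value, third_value = histogram_statistics(pixels)
print(mean_value, std_value, third_value)
```

A positive third moment indicates a histogram skewed toward high values and a negative one a histogram skewed toward low values, which is why it complements the mean and standard deviation.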
Texture is one of the main features of image information [39]. In this study, a total of 35 texture features were extracted. The Gray-Level Co-occurrence Matrix (GLCM) is a commonly used texture calculation method. Twenty GLCM features were extracted: the energy, entropy, inertia and correlation of the GLCM of each of the 5 common CFP images. Compared with the GLCM, the Gray-Gradient Co-occurrence Matrix (GGCM) can also capture the texture arrangement direction and pixel gradient change information [40]. Therefore, 15 GGCM features were extracted: the large gradient dominance, the non-uniformity of gray distribution and the non-uniformity of gradient distribution of the GGCM of each of the 5 common CFP images.
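The four GLCM features can be illustrated on a small quantized image. The sketch below assumes a single horizontal offset of one pixel, a non-symmetric matrix, and energy defined as the sum of squared probabilities; the offsets, gray-level quantization and averaging over directions used in the study are not stated, so these choices and the toy image are illustrative only.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Energy, entropy, inertia (contrast) and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    inertia = sum((i - j) ** 2 * v for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    correlation = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j)
    return {"energy": energy, "entropy": entropy,
            "inertia": inertia, "correlation": correlation}


# Toy image already quantized to 4 gray levels.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(image, levels=4))
print(features)
```

A "mean" texture feature, as named in the correlation analysis, would be obtained by averaging such values over several offsets or directions.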

D. IMAGE FEATURE CORRELATION ANALYSIS
The correlation coefficient is a statistical indicator that describes how closely two things or data variables are related [41]. For bivariate data, the Pearson correlation coefficient is usually used; it is calculated by formula (1):
$$ r = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} \tag{1} $$
where N is the number of samples, x_i and y_i are the paired values of the two variables, and \bar{x} and \bar{y} are their means. The correlation coefficient r lies in the interval [−1, 1]. When the absolute value of the correlation coefficient is greater than 0.6, the two sets of data are strongly correlated; when it is greater than 0.8, they are extremely strongly correlated [42]. To further improve the accuracy of modeling and reduce the number of modeling parameters, Pearson correlation analysis was used in this study to extract the image features related to the drought stress state. Fig. 6 shows the correlation between the image features and the drought stress state. One image feature had a correlation coefficient greater than 0.8, showing an extremely strong correlation, and 7 image features had correlation coefficients greater than 0.6, showing a strong correlation. The image features with a Pearson correlation coefficient greater than 0.6, that is, with a strong correlation with the drought stress state, were selected for building the drought stress recognition models. These 7 image features are the histogram means of NPQ_L3, qL_Lss and qL_D3, the standard deviation of the Gaussian curve fitted to the qL_Lss histogram, the mean entropy of qL_L3, the mean moment of inertia of QY_L2 and the large gradient dominance of qL_Lss.
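The screening step can be sketched as follows: compute the Pearson coefficient between each image feature and the drought level, and keep the features whose absolute coefficient exceeds 0.6. The sketch assumes that the four drought states are coded as 0 to 3, which the text does not specify; the feature names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_features(features, labels, threshold=0.6):
    """Names of features whose |r| with the labels exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, labels)) > threshold]


# Assumed coding: 0 = suitable water, 1 = mild, 2 = moderate, 3 = severe drought.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "feat_a": [1.0, 1.1, 1.9, 2.2, 3.0, 2.9, 4.1, 4.0],  # tracks the drought level
    "feat_b": [0.5, 0.9, 0.4, 0.8, 0.6, 0.7, 0.5, 0.8],  # nearly unrelated to it
}
strong = select_features(features, labels)
print(strong)
```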

IV. DROUGHT STRESS RECOGNITION METHOD
This study used three machine learning algorithms, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM) and k-Nearest Neighbor (KNN), to establish three drought stress recognition models, and used K-fold cross-validation to evaluate model performance. A 5-fold cross-validation was chosen: in each fold, 80% of the samples were randomly selected to construct the training set and the remaining 20% formed the testing set. The K-fold cross-validation accuracy and the confusion matrix were used as the evaluation indicators of the models. The SVM used a Gaussian kernel function and the one-versus-one multi-classification method. The KNN was of the weighted type, with the number of neighbors set to 10.
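The evaluation protocol can be illustrated with a self-contained sketch of 5-fold cross-validation around a weighted KNN classifier with 10 neighbors. The weighting scheme is an assumption (inverse squared Euclidean distance), since the text only states that a weighted KNN was used; the synthetic two-dimensional data and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def weighted_knn_predict(train_x, train_y, query, k=10):
    """Vote among the k nearest samples, weighting each by 1 / distance^2."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d * d + 1e-12)  # small constant avoids division by zero
    return max(votes, key=votes.get)


def cross_validated_accuracy(xs, ys, k_folds=5, k_neighbors=10):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(
            weighted_knn_predict(train_x, train_y, xs[i], k_neighbors) == ys[i]
            for i in test
        )
    return correct / len(xs)


# Synthetic data: four well-separated classes standing in for the drought levels.
rng = random.Random(1)
xs, ys = [], []
for label in range(4):
    for _ in range(10):
        xs.append((10.0 * label + rng.uniform(-0.5, 0.5),
                   10.0 * label + rng.uniform(-0.5, 0.5)))
        ys.append(label)

accuracy = cross_validated_accuracy(xs, ys)
print(accuracy)
```

With five folds, each held-out fold contains 20% of the samples and the other 80% form the training set, matching the split described above.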

A. DROUGHT STRESS RECOGNITION RESULTS
Using the common fluorescence parameters and the image features separately as model inputs, drought stress state discrimination models were established. The accuracies obtained with the different inputs were compared to further confirm the feasibility of modeling with image features. The drought stress recognition accuracy of each method is shown in Tab. 2. When only the five common parameters were used for modeling, the SVM algorithm had the highest recognition accuracy, 83.7%. When the selected image features were used, the SVM algorithm again had the highest recognition accuracy, 87.1%.

B. CONFUSION MATRIX ANALYSIS
Although the overall recognition accuracy reflects the overall classification effect of a model, it does not show the classification effect for the individual drought degrees. Therefore, the specific classification effect must be analyzed with a confusion matrix [43]. Confusion matrix analysis was conducted on the three models based on image features, as shown in Fig. 7. For the LDA algorithm, the recognition accuracy exceeded 85% for suitable water, mild drought and severe drought, reaching 90% for severe drought. For the SVM algorithm, the recognition accuracy was 90% for suitable water and severe drought and at least 82% for mild drought and moderate drought. For the KNN algorithm, the recognition accuracy was more than 80% for suitable water and severe drought but only 69% for mild and moderate drought. Therefore, the SVM algorithm was the most suitable for recognizing the drought stress degree of tomato at the seedling stage based on chlorophyll fluorescence images, with recognition rates of 90%, 82%, 87% and 90% for suitable water, mild drought, moderate drought and severe drought, respectively.
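The per-class recognition rates read from a confusion matrix are the diagonal counts divided by the row totals, as the sketch below illustrates. The true and predicted labels are hypothetical and do not reproduce the results in Fig. 7.

```python
def confusion_matrix(true, pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_rate(matrix):
    """Share of each true class that was recognized correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


labels = ["suitable", "mild", "moderate", "severe"]
# Hypothetical outcome for 10 plants per class.
true = ["suitable"] * 10 + ["mild"] * 10 + ["moderate"] * 10 + ["severe"] * 10
pred = (["suitable"] * 9 + ["mild"]
        + ["mild"] * 8 + ["moderate"] * 2
        + ["moderate"] * 9 + ["mild"]
        + ["severe"] * 10)

matrix = confusion_matrix(true, pred, labels)
rates = per_class_rate(matrix)
overall = sum(matrix[i][i] for i in range(len(labels))) / len(true)
print(rates, overall)
```

In this toy outcome the errors fall between neighboring drought levels, the kind of confusion that an overall accuracy figure alone would hide.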

C. DISCUSSION
In this study, five chlorophyll fluorescence parameters related to the drought stress state of tomato seedlings were obtained, namely QY_L2, NPQ_L3, qL_L2, qL_Lss and qL_D3. QY_L2 is the quantum efficiency under light adaptation. It decreased with increasing drought stress, indicating that drought stress reduced the photochemical quenching ability of the leaves [44]. NPQ_L3 is non-photochemical quenching, which reflects the proportion of the energy absorbed by PSII that is dissipated as heat [45]. NPQ_L3 increased with increasing drought stress, indicating that the natural pigments of the leaves released excess energy through heat dissipation and reduced the excess light energy absorbed by PSII, thus slowing down senescence. qL_L2, qL_Lss and qL_D3 are photochemical quenching parameters, which indicate the proportion of the energy absorbed by PSII that is used for photochemical reactions [46]. Except for qL_Lss, which increased slightly under mild drought, these parameters decreased as the degree of drought stress increased, indicating that drought stress reduced the share of the light energy captured by the PSII natural pigments that was allocated to electron transfer. This analysis of the photosynthetic physiology of the 5 common fluorescence parameters further supports their effectiveness in identifying the drought stress state of tomato at the seedling stage.
For each of the 5 chlorophyll fluorescence parameters obtained in this study, 13 image features could be extracted, giving a total of 65 image features. Pearson correlation analysis was used to calculate the correlation coefficient between each image feature and the drought stress state. A Pearson correlation coefficient greater than 0.6 indicates a strong correlation between an image feature and the drought stress state, so that the feature can represent the stress state. Among the 65 image features, 7 had correlation coefficients greater than 0.6 and could represent the drought stress status of tomato at the seedling stage well. These 7 image features were then used to classify the degree of drought stress.
The LDA, SVM and KNN algorithms were used to establish recognition models of the drought stress state of tomato at the seedling stage. The 5 common fluorescence parameters and the 7 image features were used separately as model inputs. The SVM algorithm had the highest recognition accuracy for both types of input. The recognition accuracy based on image features was 3.4 percentage points higher than that based on parameters. The identification accuracy of the model based on image features was 90% for suitable water and severe drought, 82% for mild drought and 87% for moderate drought. The lower accuracy for mild and moderate drought may be because the difference in water supply between mild drought and moderate drought had little impact on the tomato plants.

VI. CONCLUSION
In this study, chlorophyll fluorescence imaging was used to collect chlorophyll fluorescence images of the plant canopy. Identification models of the drought stress state were established from chlorophyll fluorescence parameters and images, and the feasibility of using chlorophyll fluorescence imaging to recognize the drought stress degree of tomato seedlings was demonstrated. The results showed that: (1) The chlorophyll fluorescence parameters QY_L2, NPQ_L3, qL_L2, qL_Lss and qL_D3 were highly correlated with the drought stress degree of tomato seedlings and could be used to identify their drought status. (2) The histogram means of NPQ_L3, qL_Lss and qL_D3, the standard deviation of the Gaussian curve fitted to the qL_Lss histogram, the mean entropy of qL_L3, the mean moment of inertia of QY_L2 and the large gradient dominance of qL_Lss could represent the drought stress state of tomato seedlings. (3) Compared with the LDA and KNN modeling methods, the SVM method was more suitable for recognizing the drought stress degree of tomato seedlings, with recognition rates of 90%, 82%, 87% and 90% for suitable water, mild drought, moderate drought and severe drought, respectively. Therefore, chlorophyll fluorescence imaging can be used as a powerful tool to identify the drought stress state of tomato seedlings, which is of great significance for the healthy growth of crops and for automated production management.