Classification of Germination Images of Pear Pollen Using Random Forest and Convolution Neural Network Models

Verifying pollen germination using microscopic images is a difficult task. It is usually time-consuming and may entail reduced accuracy and reproducibility. Therefore, in this study, we used random forest (RF) and convolutional neural network (CNN) models to perform image classification on raw data corresponding to pollen with different germination rates; the data were obtained via flow cytometry. A heat map based on the RF analysis results showed that the variables that significantly influenced the classification decision between the non-germinated (NG) and over-60%-germinated (60G) categories were mainly located in the center and top-right regions of the $30\times30$ pixel image. Additionally, a variable importance plot showed that among the 900 input variables, pixel_316 contributed the most toward prediction. Gradient-weighted class activation mapping was used to visualize the class activation maps of the CNN model. In the NG image, the bottom-left region of the activation map was activated. In the 60G image, however, not only the bottom-left region but also the top-right region was activated. Both models classified the input images into the NG and 60G categories with high accuracy. However, considering that the RF model does not reflect the characteristics of adjacent variables, the CNN model is more appropriate for classifying germination images of pollen with various germination rates into distinct classes. Taken together, these results suggest that the CNN model can provide a reliable method for verifying pollen performance.


I. INTRODUCTION
Most pear cultivars require insect-mediated pollination, but insect populations are declining rapidly because of recent environmental changes such as habitat loss, environmental pollution, and climate change [1]-[5]. Artificial pollination has therefore become necessary to offset the decline in insect pollinators and to ensure satisfactory crop yields in many commercial orchards [6], [7]. Because pollen viability, which affects pollination efficiency, may decrease during storage, an easy and reliable method is needed to assess the viability of stored pollen before it is used for artificial pollination.
The associate editor coordinating the review of this manuscript and approving it for publication was Ioannis Schizas.
Numerous methods have been developed to assess pollen viability, of which in vitro germination is the most commonly used [8]. This method is simple, but the optimal conditions for pollen germination must first be determined. Furthermore, the in vitro germination rate is determined using only the pollen grains identified in the microscopic image, not all of the pollen used in the assay [9]. In this method, the germination rate is calculated as the proportion of germinated pollen grains to the total number of pollen grains present in the microscopic image.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In a previous study [9], we proposed a method of evaluating pollen viability through flow cytometry to reduce the drawbacks of the conventional method (i.e., in vitro germination) with respect to reliability, subjectivity of counting, and analysis time, and to improve measurement accuracy. The presence or absence of pollen tubes (i.e., whether the pollen has germinated) can be determined by flow cytometry, which distinguishes individual pollen grains according to their size (FSC parameter) or internal complexity (SSC parameter). This became evident in our previous results, which confirmed a difference in the density distribution on the dot plot between pollen samples with different viability. In addition, because 50,000 pollen grains are used per flow cytometric analysis, reproducibility and accuracy can be ensured, unlike the conventional method, which measures the germination rate from 200-300 pollen grains. The coordinate values of the FSC (x-axis) and SSC (y-axis) parameters derived from 50,000 pollen grains are transformed, through a data refining process, into a germination image suitable for fitting a neural network classifier. Because germination images of pear pollen vary with the germination rate, this approach may provide a new perspective for establishing a reliable method of verifying pollen performance.
Unlike structured data, image data are high-dimensional. Consequently, the number of parameters in a conventional machine-learning model can grow excessively when the model is applied to image data. For this reason, shallow neural networks built from fully connected layers are not well suited to classifying image data.
Random forests (RFs), which are among the most widely used machine-learning models, apply a random subspace method based on decision trees [10], [11]; a forest of uncorrelated trees is built using a CART-like procedure combined with randomized node optimization and bagging. RFs offer many advantages, namely, satisfactory predictive performance, robustness to both noise and overfitting, and extraction of information regarding the variables that are important for the classification [11]-[13].
Generally, adjacent pixels in image data are spatially correlated [14], and this spatial information may be lost when the image is vectorized. Conventional machine-learning techniques do not reflect the spatial characteristics of the image; they treat each pixel as a separate variable, which increases the time and the number of parameters required for learning [15]. Therefore, LeCun et al. [16] proposed a convolutional neural network (CNN) that reflects the spatial information of the image and overcomes the limitations of the existing methods. Unlike fully connected neural network models, CNNs can effectively recognize the relationship between pixels while maintaining the spatial information of the input image. In the 2012 ImageNet Large Scale Visual Recognition Challenge, AlexNet [17], which used a CNN-based architecture, ranked first, showing exemplary performance compared with conventional computer-vision techniques. This effectiveness is achieved by using convolution layers with multiple filters and by intensifying the features extracted by the stacked convolution layers using pooling layers [17]. The filter parameters used for feature extraction are shared across the entire input; thus, fewer parameters are required and the learning time is shorter than for a regular neural network [18]. In a CNN model, the convolution and pooling layers used to extract features are freely configurable. Finally, the extracted features are fed to a fully connected layer, following which the input data are classified.
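As a rough illustration of the parameter saving obtained from filter sharing, the following sketch counts the trainable parameters of a fully connected layer applied to a vectorized $30\times30$ image and of a convolution layer applied to the same image. The layer sizes (64 units, and 64 filters of size $3\times3$ on a single input channel) are hypothetical values chosen for illustration; they are not taken from this study.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a 2-D convolution layer.

    The count does not depend on the image size, because every filter
    is reused at each position of the input.
    """
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Hypothetical layer sizes, for illustration only.
dense = dense_params(30 * 30, 64)   # every pixel connected to every unit
conv = conv2d_params(3, 3, 1, 64)   # 64 shared 3x3 filters, 1 input channel

print(f"dense: {dense} parameters, convolution: {conv} parameters")
```

Under these assumed sizes, the fully connected layer requires 57,664 parameters, whereas the convolution layer requires 640.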
Therefore, machine-learning techniques such as RF and CNN, which perform well in image recognition and classification, can be used to develop a novel validation system that discriminates pollen performance. This can be achieved by training both models with germination images extracted from pollen with different germination rates and then classifying the germination images into distinct classes according to germination rate.
Over the past few years, several studies have used machine-learning algorithms to monitor airborne pollen or to develop automatic classification systems for pollen grains [19]-[21]. Many studies have also used CNN models to classify various pollen species [19]-[24]. However, our research focuses on establishing a reliable method for assessing pollen viability using germination images, rather than on identifying pollen species as in palynology.
The purpose of this study is to determine whether RF and CNN models trained on germination images extracted from two pollen groups with different germination rates can classify input images into two categories: non-germinated (NG) and germinated to over 60% (60G). To achieve this goal, 1) the raw input data were transformed into matrix and vector formats for machine learning; 2) the RF and CNN models were trained on the transformed data; 3) the input images were classified into two class labels by the trained classifier of each model; and 4) the extracted features were visualized to identify which part(s) of the pollen germination image were considered in the final classification decision of each model. The results obtained here may thus provide a framework for establishing a reliable method to assess pollen performance from germination images. Classification of pollen germination images into multiple labels according to germination rate can be achieved through a classifier that has learned features reflecting the differences between germination images.

A. POLLEN COLLECTION
Unopened, balloon-stage flowers were collected in April 2019 (5 days before full bloom) from pear trees of the 'Wonwhang' cultivar grown in commercial orchards (Gongsan, Naju, Korea). The anthers were harvested from the flowers that were not completely open. Subsequently, to dry the anthers and release their pollen within 24 to 36 h, we laid them out on black Kent paper in the anther dehiscence room at 20 °C and a relative humidity of 50%. When more than 80% of the anthers were dehiscent, they were transferred from the Kent paper to a stainless-steel bowl equipped with a 100-mesh sieve (aperture: 0.149 mm). Acetone was then poured into the bowl, and the pollen was gently sifted through the 100-mesh sieve. Subsequently, the acetone supernatant was carefully discarded from the bowl, and the residual solvent was volatilized. The pollen samples, each weighing 10 g, were placed in tightly sealed containers and frozen at -60 °C until further use.

B. FLOW CYTOMETRY
Unless otherwise stated, an assay was conducted by removing the pollen container from a deep-freezer and thawing the pollen overnight at 4 °C in a desiccator. Subsequently, pollen at a concentration of 2.5 mg·mL$^{-1}$ was suspended in a liquid medium that contained 10% sucrose (w/v), 0.4 mM boric acid, and 1 mM calcium nitrate, and the mixture was then incubated at 25 °C for 3 h. Data were acquired by analyzing 1 mL of the pollen culture per sample every hour by using the Accuri C6 flow cytometer (Becton Dickinson, USA), according to the instructions provided by the manufacturer. The Accuri C6 flow cytometer was generously provided for this study by Damyang Agricultural Technology Center. The germinated pollen grains were plotted using the FSC and SSC parameters and visualized as dots on a dot plot by analyzing 50,000 pollen grains per hour. The CSV data of the germinated pollen obtained here were used as raw data for machine-learning analysis; they contained the coordinate values of the FSC and SSC parameters for 50,000 pollen grains.

C. DATA REFINING
To reduce the range of the data, the raw data were log-transformed before being applied to the classification model. A grid was established to convert the CSV data, which contained the coordinates of the FSC and SSC parameters, into matrix form. Notably, the spatial features of an image are reflected increasingly well as the grid size increases. However, larger grid sizes often result in increased processing time. We therefore binned the CSV data into a grid of 30 × 30 pixels, which is sufficient to represent the density distribution on the dot plots reflecting whether the pollen has germinated. The number of pollen grains falling within each grid cell was counted and used as the value of the corresponding row and column of the matrix; all values were then divided by the maximum value in the matrix so that the pixel values lie between 0 and 1. The matrix-form data were either used to train the CNN model to extract the spatial features of the image, or applied to the RF model after conversion into vector form to identify the influence of individual pixels on the classification decision. The process of image transformation required for machine learning is illustrated in Fig. 1.
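The refining steps described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function names are hypothetical, the base-10 logarithm is an assumption, and the grid range is assumed to span the minimum and maximum of the log-transformed values, which the text does not specify. All FSC and SSC values are assumed to be positive.

```python
import math


def germination_image(fsc, ssc, grid=30):
    """Bin log-transformed (FSC, SSC) events into a grid x grid density
    image whose values are scaled to lie between 0 and 1."""
    # Log-transform to reduce the range of the data (base 10 is assumed).
    xs = [math.log10(v) for v in fsc]
    ys = [math.log10(v) for v in ssc]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def cell(value, lo, hi):
        # Map a value in [lo, hi] to a cell index in 0..grid-1.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * grid), grid - 1)

    # Count the pollen grains that fall into each grid cell
    # (row index from SSC, column index from FSC; an assumed convention).
    counts = [[0] * grid for _ in range(grid)]
    for x, y in zip(xs, ys):
        counts[cell(y, y_lo, y_hi)][cell(x, x_lo, x_hi)] += 1

    # Divide by the maximum count so that all pixel values lie in [0, 1].
    peak = max(max(row) for row in counts)
    return [[c / peak for c in row] for row in counts]


def flatten(image):
    """Row-major vector form (900 variables for a 30 x 30 image)."""
    return [value for row in image for value in row]
```

The matrix returned by `germination_image` corresponds to the CNN input, and `flatten` gives the vector form supplied to the RF model. In practice, a library routine such as NumPy's `histogram2d` performs the same binning more efficiently.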
The entire data set was randomly divided into training and testing sets in the ratio of 70:30. The images of the pollen samples analyzed on the dot plot immediately after suspension in the germination medium were used as image data of non-germinated pollen (NG). Additionally, the images of the samples analyzed after incubation for 1 h were used as image data of pollen that had germinated to the extent of over 60% (60G). The average germination rate of the pollen samples used in this study was calculated as approximately 60% after the incubation for 1 h (data not shown). For the training set, 168 images for 60G and 104 for NG were used. For the testing set, 72 images for 60G and 45 for NG were used.
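The reported image counts are consistent with a 70:30 split applied within each class, given the class totals implied by the numbers above (240 images for 60G and 149 for NG). Whether the split was stratified in this way is not stated, so the following sketch is an assumption; the function name and the random seed are likewise hypothetical.

```python
import random


def stratified_split(labels, train_tenths=7, seed=0):
    """Split sample indices class by class into training and testing
    sets in the ratio train_tenths : (10 - train_tenths)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Integer arithmetic; the training share is rounded down.
        cut = len(idx) * train_tenths // 10
        train += idx[:cut]
        test += idx[cut:]
    return train, test


# Class totals implied by the reported counts: 168 + 72 and 104 + 45.
labels = ["60G"] * 240 + ["NG"] * 149
train_idx, test_idx = stratified_split(labels)

for name, idx in (("training", train_idx), ("testing", test_idx)):
    n_60g = sum(labels[i] == "60G" for i in idx)
    print(name, "60G:", n_60g, "NG:", len(idx) - n_60g)
```

Under this assumption, the split reproduces the 168/104 training images and the 72/45 testing images reported above.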

III. RESULTS AND DISCUSSION
To fit the RF model, each 30 × 30 pixel input image was vectorized, and the model was fitted using 272 images and 900 pixel variables. The optimal parameters were obtained via hyperparameter tuning with three-fold cross validation. The final parameters used in the RF model are as follows: bootstrap = True; n-estimators (i.e., the number of trees used) is 1,000; the maximum number of leaf nodes is 64; criterion is entropy. The heat map in Fig. 2a indicates the relative importance of the individual pixels that affect the classification decision. The variables that significantly affect the classification between NG and 60G were mostly located in the center and top-right regions of the images. A variable importance plot (VIP) shows the top 10 pixels that contributed the most to the prediction among the input variables applied to the RF model (see Fig. 2b). According to the plot, pixel_316 is the most significant variable, followed by pixel_345, pixel_317, pixel_344, and so on. These results show the information that the RF model has learned to distinguish NG from 60G, and they may provide additional insight into the major contributing pixels that indicate whether the pollen has germinated.
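To relate the pixel variables named above to positions in the 30 × 30 image, the following sketch converts a flattened pixel index into a row and column. The numbering convention is not stated in the text; zero-based, row-major ordering is assumed here, and the function name is hypothetical.

```python
def pixel_position(index, width=30):
    """Map a flattened pixel index to (row, column), assuming
    zero-based, row-major ordering of the 30 x 30 image."""
    return divmod(index, width)


# The four most important variables named in the text.
top_pixels = [316, 345, 317, 344]
positions = {f"pixel_{i}": pixel_position(i) for i in top_pixels}

for name, (row, col) in positions.items():
    print(f"{name}: row {row}, column {col}")
```

Under this assumption, the four pixels fall in rows 10 and 11 and columns 14 to 17, forming a contiguous block near the middle columns of the image. With one-based numbering, each column index would shift by one, and whether row 0 is the top or the bottom of the displayed image depends on the plotting convention.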
The CNN architecture used and its training process are illustrated in Fig. 3. It comprises convolution layers to extract features and a global average pooling (GAP) layer to classify the input data into two distinct classes, i.e., NG and 60G. The input data were transformed into the output by passing through several stacked convolution and pooling layers. For the CNN analysis, 272 images (i.e., 168 images for 60G and 104 for NG), transformed into matrix form, were processed through three convolution layers. Feature maps, ranging from low to high level, were extracted as the input passed through each convolution layer. In a typical CNN, the high-level features extracted from the last convolution layer are transferred to a fully connected layer and used for the classification task. The feature maps become increasingly abstract as they pass through each layer. Therefore, it is typically difficult to know which part(s) of the input image influenced the final classification decision.
Additionally, it remains difficult to explain the predictions of a CNN owing to its lack of interpretability. Therefore, even when the CNN makes accurate predictions, visualization of the activated features is necessary to verify that the resulting classifier makes decisions on the basis of appropriate features present in the training data. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the image regions that influenced the classification decision for each input image [25]. For feature visualization, the gradient and output information of the last convolution layer was used. The resulting heat map is a class activation map: it indicates the importance of each region of the image for the classification decision by highlighting the regions responsible for a particular prediction, as illustrated in Fig. 4. In the image that corresponded to NG, the bottom-left region of the activation map was activated. In the image that corresponded to 60G, however, not only the bottom-left region but also the top-right region was activated. This makes it easier to understand intuitively how the network produced its output, because the class prediction mainly depends on the activation of the last convolution layer.
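The Grad-CAM computation described above can be sketched in a few lines. The sketch assumes that the feature maps of the last convolution layer and the gradients of the class score with respect to those maps have already been obtained from a deep-learning framework; it is not the implementation used in this study. The function name is hypothetical, and the final scaling to the range 0 to 1 is a common display convention rather than part of the Grad-CAM definition.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map.

    activations: K feature maps (each an H x W nested list) from the
                 last convolution layer.
    gradients:   K maps of the same shape holding the gradient of the
                 class score with respect to each activation.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight: spatial average of the gradient of each feature map.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]

    # ReLU keeps only the regions that support the class of interest.
    cam = [[max(v, 0.0) for v in row] for row in cam]

    # Scale to [0, 1] for display (assumed convention).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The heat map has the spatial size of the last convolution layer; it is typically resized to the input size before being overlaid on the image, a step omitted here.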
For both models used in this study, almost all of the pollen germination images in the testing set were classified into NG and 60G with high accuracy by using the above-mentioned parameters and classification rules. For the 272 training images, the RF model yielded a classification accuracy of 100.0%, and the CNN model obtained 99.6% owing to one misclassification. For the 117 testing images, the RF model achieved a classification accuracy of 99.1% with one misclassification, whereas the CNN model classified all the samples correctly (100.0%) (Tables 1 and 2). This indicates that both models could appropriately classify the input data into distinct classes. Additionally, both models may be trained on multiple categories to classify pollen germination images with various germination rates in a future study. Visualizing the activated features helps overcome the limitation that CNNs function as black-box models whose results are difficult to interpret [26], and it shows what the model has learned in order to classify an image. The RF model learns from input variables converted into vector form. The RF analysis revealed that the RF model classified the input data into the NG and 60G categories with high accuracy, although it could not reflect the characteristics of adjacent pixel variables. In machine-learning data processing, image data are susceptible to distortions such as rotation or translation. Additionally, when the number of samples or the number of classification categories increases, classification errors may occur in the RF model, which uses individual pixels as variables, owing to the lack of spatial information. In contrast, important CNN concepts, including sparse connectivity, parameter sharing, subsampling, and local receptive fields, make the CNN robust to translation, scaling, and distortion of the input data [16], [17], [27].
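The accuracies reported in this paragraph follow directly from the image counts and the number of misclassifications. The short check below reproduces them; the helper name is hypothetical.

```python
def accuracy_percent(n_correct, n_total):
    """Classification accuracy in percent, rounded to one decimal."""
    return round(100 * n_correct / n_total, 1)


# Counts taken from the text: 272 training and 117 testing images,
# with one misclassification for the CNN (training) and the RF (testing).
reported = {
    ("RF", "training"): accuracy_percent(272, 272),
    ("CNN", "training"): accuracy_percent(271, 272),
    ("RF", "testing"): accuracy_percent(116, 117),
    ("CNN", "testing"): accuracy_percent(117, 117),
}

for (model, split), value in reported.items():
    print(f"{model} {split}: {value}%")
```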
The VIP lists the most important variables in descending order of the mean decrease in entropy. The importance of each variable is assessed on the basis of the difference that it represents across the entire data set. The center and top-right regions of the pixel images, which show the difference between NG and 60G, significantly affected the classification (see Fig. 2a). This shows that the decision rules apply to the input data as a whole, indicating that the algorithm used in the model is globally interpretable. Notably, global interpretability helps in understanding the overall relationship between the input and output; however, it may be approximate or based on average values [28]. In contrast, the Grad-CAM result (see Fig. 4), which shows the features activated in each sample, provides a local interpretation, unlike the VIP.
The local interpretation focuses on the details in each image that might be overlooked in the global interpretation [29]. Thus, compared with the RF model, the CNN model, which extracts features from each image and uses them for classification, is considered more appropriate for classifying pollen germination images with various germination rates into distinct classes.
Robustness is important because image classification can be sensitive to even small distortions. To compare the robustness of the two models, classification was performed after the germination image corresponding to 60G was shifted up, down, and right by 3 and 6 pixels. The CNN model classified the distorted images robustly, whereas the RF model misclassified the D6, U3, and U6 images as NG (Table 3). Therefore, when the input image is distorted, the RF model is not suitable for classifying pollen germination images because it does not reflect spatial information. Classification of pollen germination images using the CNN model proved suitable for establishing a reliable method to assess pollen performance.
In this study, image classification was performed on raw data from pollen with different germination rates by using both RF and CNN models. The results of the VIP and Grad-CAM showed which regions of the image affected the final classification decision in each model. Both models showed high classification accuracy on the training and testing data. Whereas the RF analysis shows the general relationship between the input and the output learned by the model, the CNN analysis focuses on how individual predictions are made. Considering the spatial characteristics of the image, the CNN model is therefore more suitable for classifying pollen germination images into multiple distinct classes in future studies. However, for a detailed prediction of the germination rate, further studies are required to develop a classification and prediction model that reliably validates pollen performance.