Monitoring the Change Process of Banana Freshness by GoogLeNet

Freshness is the most critical indicator of fruit quality; it directly affects consumers’ health and their desire to buy, and it is an essential factor in market price. Methods for evaluating fruit freshness are therefore urgently needed. Taking the banana as an example, in this study we analyzed the process of freshness change using transfer learning and established the relationship between freshness and storage date. Features of banana images were extracted automatically by the GoogLeNet model and then classified by the classifier module. The results show that the model can detect banana freshness with an accuracy of 98.92%, which is higher than that of human detection. To study the robustness of the model, we also used it to detect the change process of strawberries and found that it remained effective. These results indicate that transfer learning is an accurate, non-destructive, and automated technique for monitoring fruit freshness, and it may be further applied to vegetable detection.


I. INTRODUCTION
The banana is a giant monocotyledonous perennial herb that grows in moist and sub-humid tropical areas at low and middle latitudes. It is considered the most important traded fruit globally in terms of volume [1]. Bananas are sold and enjoyed all over the world because they are rich in vitamins, fiber, and phenolic compounds and have many health benefits [2]. The demand for bananas is enormous, so their freshness deserves close attention, as it is directly related to the health of consumers.
Many researchers have studied fruit classification and quality detection. Recently, [3] used computer vision and color characteristics to sort peeled pistachios. [4] showed experimentally that hyperspectral imaging can detect pits in cherries, which can be applied to online sorting systems. [5] proposed an apple quality detection and variety identification method based on multi-spectral imaging technology. [6] used an artificial neural network to classify and identify fruit images. [7] used artificial neural network (ANN) technology to solve the problem of automatic visual classification of apples. [8] proposed two new fruit classification methods based on machine learning using wavelet entropy, principal component analysis (PCA), and a feed-forward neural network (FNN). [9] proposed a fuzzy classification method that used a decision tree to classify the maturity of tomatoes and obtained a high accuracy rate. In most of these studies, fruits are graded from images or other features using machine learning tools.
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
Recently, many studies have been carried out on the ripening and classification of bananas. [10] applied clustering and classification to predict the maturity and shelf life of bananas. [11] designed an intelligent banana sorting system based on computer vision. [12] conducted a similar study that used image processing to distinguish immature bananas successfully but failed to distinguish ripe from overripe fruits. [13] used optical properties to predict the quality attributes and maturity classification of bananas. [14] proposed a classification method for yellow bananas based on biogeographic optimization and a feed-forward neural network. In [15], a general machine learning method was proposed that successfully classified banana tiers based on color and shape features using a random forest classifier. Juliana et al. surveyed the colorimetric index of banana ripening. [16] studied the application of laser-induced backscatter imaging to the prediction and classification of the banana ripening stage. [17] used the VGGNet network to classify different types of fruits on two different data sets. However, few studies focus on the post-harvest freshness of bananas. The yield and maturity of bananas are critical, but research on post-harvest freshness is also necessary because it is directly related to consumer safety; this is the motivation of our experiment. In our experiments, we wanted to relate the quality of fruits and vegetables, in particular their freshness, to how long they had been stored. The storage time of bananas was used as the index for evaluating their quality. The results of this experiment can inform banana storage practice, which is one of the purposes of this study.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
With the development of transfer learning, its combination with image recognition in agriculture has become a hot topic in recent years. A recent agricultural survey showed that transfer learning provides better accuracy than traditional image processing and data analysis techniques [18]. Transfer learning reduces the workload of data collection and can achieve higher precision with a small amount of data. The ImageNet data set is an extensive training data set on which a model with high accuracy can be pre-trained [19]. Fine-tuning can then improve the running time and precision of the model. In our experiment, we therefore used transfer learning to identify changes in bananas during storage. This paper is organized as follows. Section II introduces the experimental process and methods. Section III presents the experimental results. Section IV discusses the positive and negative aspects of the approach. Finally, Section V concludes the study.

A. IMAGE DATASET
In this experiment, we selected 103 bananas of two varieties that were transported from the production area on the same day: 71 were from Goodfarmer and 32 were from Chiquita. Figure 1 shows sample images of the same banana at different shooting times.
Bananas were stored at room temperature for 11 days. We placed the bananas on A4 paper and took pictures with a handheld Canon DS126491 camera. The position of each banana was fixed. We photographed the bananas every other day in the order of their serial numbers. The bananas were placed randomly, just as they would be stored in the real world. The shooting time was from 9:30 to 10:00 every morning. We recorded the temperature and humidity during the daily shooting to better reflect the experimental conditions, as shown in Figure 2.

B. TRANSFER LEARNING
Transfer learning is a new machine learning method that uses existing knowledge to solve different but related problems. By transferring existing knowledge, it aims to solve learning problems in which the target domain has only a small amount of labeled sample data, or even none [20]. To ensure the accuracy and reliability of the trained model, traditional machine learning makes two basic assumptions: (1) enough clean training samples are available to learn a good classification model; (2) the training samples and the new test samples are independent and identically distributed. Transfer learning relaxes both assumptions. In this study, we use the GoogLeNet [21] architecture, a prominent deep CNN structure proposed by Christian Szegedy et al. Before it, AlexNet [22], VGGNet [23], and other structures obtained better training results by increasing the depth (number of layers) of the network. However, increasing the number of layers brings adverse effects such as overfitting, vanishing gradients, and exploding gradients. The Inception module of GoogLeNet improves training results from another perspective: it uses computing resources more efficiently and extracts more features with the same amount of computation.
GoogLeNet is an excellent model obtained by training on the ImageNet data [21]. Although the pre-trained network cannot be used directly to identify bananas, it provides good initial values, which are critical to network training. The GoogLeNet architecture was selected because of its superior performance in identifying fruits and vegetables. The network structure of GoogLeNet is shown in Figure 4. We kept all layers before the last output layers and transferred them to the new classification task by replacing the final layers with a new fully connected layer, a softmax layer, and a classification output layer. The options of the new fully connected layer are specified according to the new data: its output size is set to the number of classes in the new data, which is 6 in our case. The weight learning rate factor and the bias learning rate factor of the fully connected layer were increased so that the new layer learns faster than the transferred layers. These changes make the network applicable to our data set and speed up the training process.
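The role of the new classification head can be illustrated with a minimal Python sketch. It applies a softmax to six class scores and shows how a learning rate factor scales the base learning rate for the new layer. The factor of 10, the example scores, and all function names are assumptions made for illustration; the text states only that the factors were increased, and this is not the MATLAB code used in the experiment.

```python
import math

NUM_CLASSES = 6             # six storage-day classes, as stated in the text
BASE_LEARNING_RATE = 1e-4   # base learning rate reported in the experimental setup
NEW_LAYER_LR_FACTOR = 10    # hypothetical value; the text only says it was increased


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def effective_learning_rate(base_rate, factor):
    """Learning rate actually applied to a layer with the given rate factor."""
    return base_rate * factor


# Hypothetical scores from the new fully connected layer for one image.
scores = [0.2, 2.5, 0.1, -1.0, 0.0, -0.5]
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))

transferred_lr = effective_learning_rate(BASE_LEARNING_RATE, 1)
new_layer_lr = effective_learning_rate(BASE_LEARNING_RATE, NEW_LAYER_LR_FACTOR)
```

With a factor above 1, the new layer takes larger update steps than the transferred layers, which keep the base rate.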

C. CLASSIFIER BLOCK
In AlexNet and VGGNet, fully connected layers are used as the classifier. However, the parameters of the fully connected layers account for almost 90% of all network parameters, which easily causes over-fitting. Unlike these two classical network models, GoogLeNet uses a global average pooling layer instead of large fully connected layers, so that only a small fully connected layer remains before the softmax output. In this way, GoogLeNet reduces the number of network parameters, makes the network faster, and reduces over-fitting. The final classifier module of the GoogLeNet network is shown in Figure 3.
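The parameter saving from global average pooling can be checked with a short Python sketch. It assumes the standard 7 × 7 × 1024 feature map that GoogLeNet produces for a 224 × 224 input and a six-class output; the toy feature map and the function names are hypothetical.

```python
def global_average_pool(feature_maps):
    """Average each channel's H x W map down to a single number."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_parameter_count(inputs, outputs):
    """Number of weights plus biases in a fully connected layer."""
    return inputs * outputs + outputs


# Tiny hypothetical feature map: 2 channels of size 2 x 2.
toy_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
toy_pooled = global_average_pool(toy_maps)

HEIGHT, WIDTH, CHANNELS = 7, 7, 1024  # final GoogLeNet feature map for a 224 x 224 input
NUM_CLASSES = 6

# Classifier fed by the flattened feature map versus by the pooled vector.
params_after_flatten = dense_parameter_count(HEIGHT * WIDTH * CHANNELS, NUM_CLASSES)
params_after_pooling = dense_parameter_count(CHANNELS, NUM_CLASSES)
```

Under these assumptions, pooling first leaves 6,150 classifier parameters instead of 301,062.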

D. EXPERIMENTAL SETUP
In this experiment, transfer learning was applied to the pre-trained CNN (GoogLeNet) using the neural network toolbox provided by MATLAB 2018a. After fine-tuning, the training parameters were set as follows: base learning rate, 0.0001; power, 0.9; mini-batch size, 10; max epochs, 10. All experiments were performed on an NVIDIA GeForce GTX 950 graphics processing unit (GPU). The data set was pre-processed and imported into GoogLeNet for training. The recognition accuracy and loss of the network were obtained after training for an appropriate number of iterations, and the experimental results were then analyzed in detail. All images were resized before training to meet GoogLeNet's input dimension requirement (224 × 224 × 3 pixels).
First, we photographed the banana samples every other day during the 11 days. One image of each banana was taken per session, so six photos were obtained for each banana and 618 pictures were collected in total. Because the number of samples was relatively small, we applied eight data augmentation operations, including rotation, translation, and mirroring, to expand the data set. After data augmentation, the data set contained 4944 images.
Data augmentation thus increased the size of the data set eightfold, which meets the data set size requirement of transfer learning. Second, the augmented data set was randomly divided into a training set and a validation set at a ratio of 7:3. The two sets were independent of each other, and no data in the training set were reused in the validation set. The divided data set was then imported into the pre-trained GoogLeNet network, in which some parameters had been fine-tuned. Training was stopped after an appropriate number of iterations. We evaluated the performance of our model by accuracy and cross-entropy loss. Cross-entropy loss measures the distance between the predicted probability distribution and the probability distribution of the true labels. The smaller the cross-entropy loss, the closer the two distributions are and the better the model fits. In addition, the superiority of the model was verified by banana freshness identification experiments at different time intervals.
The performance of the classifier is presented using the confusion matrix. Each column of the confusion matrix represents the predicted category, each row represents the actual category to which the data belong, and the diagonal entries give the number of correct predictions. True positive (TP) means that the prediction is positive and the truth is positive. False positive (FP) means that the prediction is positive and the truth is negative. False negative (FN) means that the prediction is negative and the truth is positive. True negative (TN) means that the prediction is negative and the truth is negative. Based on the confusion matrix, the performance of the classifier is measured by accuracy (AC), precision (PR), sensitivity (SE), and balanced F score (F1).
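The data set arithmetic and the idea of producing eight variants per photo can be sketched in Python as follows. The particular set of eight transforms (four rotations, two mirrored versions, and two one-pixel translations) is an assumption, since the text names only rotation, translation, and mirroring; images are represented as small nested lists so the sketch needs no image library.

```python
def rotate_90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def mirror(image):
    """Flip a 2-D image left to right."""
    return [row[::-1] for row in image]


def translate_right(image, shift, fill=0):
    """Shift pixels right by `shift` columns, padding the left edge with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]


def augment(image):
    """Return eight variants of one image (hypothetical choice of transforms)."""
    r90 = rotate_90(image)
    r180 = rotate_90(r90)
    r270 = rotate_90(r180)
    return [
        image, r90, r180, r270,
        mirror(image), mirror(r90),
        translate_right(image, 1), translate_right(mirror(image), 1),
    ]


BANANAS = 103
PHOTOS_PER_BANANA = 6

toy_image = [[1, 2, 3], [4, 5, 6]]  # stand-in for one banana photo
variants = augment(toy_image)

original_count = BANANAS * PHOTOS_PER_BANANA
augmented_count = original_count * len(variants)
```

Here the eight variants include the unmodified image, which is one way to reconcile 618 photos with a final set of 4944 images.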
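A minimal Python sketch of the 7:3 split and of the cross-entropy loss for a single image is given below. The random seed, the rounding rule for the split point, and the example probability vectors are assumptions; the experiment itself was run in MATLAB.

```python
import math
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it into disjoint training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def cross_entropy(true_class, predicted_probs):
    """Cross-entropy between a one-hot true label and a predicted distribution."""
    return -math.log(predicted_probs[true_class])


# Identifiers standing in for the 4944 augmented images.
image_ids = list(range(4944))
train_ids, val_ids = split_dataset(image_ids)

# Hypothetical predictions for an image whose true class index is 2.
confident_loss = cross_entropy(2, [0.01, 0.01, 0.95, 0.01, 0.01, 0.01])
uncertain_loss = cross_entropy(2, [0.10, 0.10, 0.50, 0.10, 0.10, 0.10])
```

The loss is smaller when more probability is placed on the true class, which matches the interpretation given above.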
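Using the standard definitions AC = (TP + TN) / (TP + FP + FN + TN), PR = TP / (TP + FP), SE = TP / (TP + FN), and F1 = 2 · PR · SE / (PR + SE), the per-class measures can be read off a confusion matrix as in the Python sketch below. The 3 × 3 matrix is a made-up example, not data from this study, and the function assumes each class has at least one actual and one predicted sample.

```python
def per_class_metrics(matrix, k):
    """Compute TP, FP, FN, TN and derived scores for class k.

    Rows of `matrix` are actual classes; columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # predicted k, actually another class
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical 3-class confusion matrix (not taken from the paper).
example = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 5, 45],
]
metrics = per_class_metrics(example, 1)

# Overall accuracy: correct predictions on the diagonal over all samples.
overall_accuracy = sum(example[i][i] for i in range(len(example))) / sum(map(sum, example))
```

In a multi-class setting such as the six storage-day classes, each class is treated in turn as the positive class and all others as negative.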

III. THE PERFORMANCE OF MODEL

A. FEATURE VISUALIZATION
It is well known that the shallow layers of a network extract textures and fine details. As the network becomes deeper, more abstract features are generated; details are lost in the deeper layers, and only important information is retained [24]. Feature visualization gives some idea of what a neural network relies on for recognition. The feature maps extracted from different layers of GoogLeNet are shown in Figure 5.
The Gradient-weighted Class Activation Mapping (Grad-CAM) method is used to display the areas of a given image that contribute features for predicting the image's class [25]. In Figure 6, we successively visualized the last convolutional layers of the modules from Inception3a to Inception5b. As shown in Figure 6, low-level features such as the banana's color and shape are activated in the first few Inception modules. As the network deepens, the high-level features of the image are gradually activated, and features of particular interest to the network are strongly marked. Figure 6 shows the locations on which the network concentrates during identification: the focus was mainly on the two ends of the bananas.
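For reference, the standard Grad-CAM formulation can be written as follows, where $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the score of class $c$ before the softmax, and $Z$ is the number of spatial positions in the feature map. This is the general definition of the method; the symbols are not taken from the present text.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The weights $\alpha_k^c$ measure how important each feature map is for class $c$, and the ReLU keeps only the regions that contribute positively to that class.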

B. EXPERIMENTAL RESULT
The curves of recognition accuracy and cross-entropy loss (Figure 7) illustrate the performance of our model. The training curves were smoothed with a smoothing factor of 0.6 so that the trend can be observed easily. We also added some interfering images to the training set to avoid over-fitting and to help the classification model generalize.
The model was trained for 4320 iterations on recognizing the freshness of bananas, and the recognition accuracy was 98.92%. As can be seen from the confusion matrix (Table 1), most misidentified images were assigned to the class immediately before or after the correct one, without much deviation. For example, most bananas on day three are correctly identified as day three; only a few are recognized as day one or day five, and none as day nine or day eleven. Even when the recognition was wrong, the predicted class differed little from the correct one.
When the time interval was extended from two days to three days, the prediction accuracy of the model reached 99.39%. We measured the accuracy at three different intervals, and the results are shown in Table 2. Table 2 shows that the recognition accuracy of the model increases as the interval widens. The recognition accuracy with data augmentation is also higher than that without it, which shows that the data augmentation is effective. In Table 2, "One-day" means a shooting interval of one day, "Two-day" means a shooting interval of two days, and "Three-day" means a shooting interval of three days.
The ROC curve reflects the relationship between the false positive rate (1 − specificity) and the true positive rate (sensitivity). The curve divides the whole graph into two parts. The area under the curve (AUC) indicates prediction accuracy: the higher the AUC value, the higher the accuracy. The closer the curve is to the upper left corner, the higher the prediction accuracy and the better the model's performance. Figure 8 shows that the curves of the six classes almost overlap. This indicates that our model has high recognition accuracy for bananas of different freshness, with almost no misjudgment. The statistical analysis of the ROC curves again demonstrates the superior classification performance of our model.
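The text does not specify which smoothing method was used. One common choice, assumed here purely for illustration, is exponential smoothing, where each plotted point is a weighted mix of the previous smoothed value and the new raw value. The accuracy values below are hypothetical.

```python
def smooth(values, factor=0.6):
    """Exponentially smooth a sequence; larger factors give smoother curves."""
    smoothed = []
    previous = values[0]  # start from the first raw value
    for value in values:
        previous = factor * previous + (1 - factor) * value
        smoothed.append(previous)
    return smoothed


# Hypothetical per-iteration training accuracies.
raw_accuracy = [0.20, 0.60, 0.50, 0.90, 0.85, 0.95]
smoothed_accuracy = smooth(raw_accuracy)
```

With a factor of 0.6, each smoothed point keeps 60% of the previous smoothed value, which damps iteration-to-iteration noise while preserving the trend.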
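How one ROC point and the AUC are obtained for a single class in a one-vs-rest setting can be sketched in Python as follows. The scores and labels are hypothetical, and the AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_point(scores, labels, positive_class, threshold):
    """Return (false positive rate, true positive rate) at one score threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == positive_class:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return fp / (fp + tn), tp / (tp + fn)


def auc_one_vs_rest(scores, labels, positive_class):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == positive_class]
    negatives = [s for s, y in zip(scores, labels) if y != positive_class]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of "day 3" and the true day of each image.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [3, 3, 1, 3, 5, 1]

fpr, tpr = roc_point(scores, labels, positive_class=3, threshold=0.5)
auc = auc_one_vs_rest(scores, labels, positive_class=3)
```

Sweeping the threshold from 1 down to 0 traces the full curve; repeating the procedure for each of the six classes gives one curve per class.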

C. COMPARISON OF THE MODEL WITH HUMAN EXPERT
To evaluate the performance of our model, we designed an experiment comparing computer and human recognition. In the test, five people, who had been trained beforehand, were asked to identify 120 images. At the beginning of the experiment, each participant was able to recognize the pictures accurately. However, as time went on, they became exhausted, and their judgment accuracy declined to different degrees. The final result of this "man-machine war" is shown in Figure 9. Our model performs much better than the human participants.

D. EXPANDING APPLICATION
To study the model's general adaptability, we also applied it to classifying the freshness of strawberries. The method is the same as that used for bananas, and the strawberry data were collected in the same environment. A partial sample of the strawberry data set is shown in Figure 10.
A total of 104 strawberries were collected, and 312 photos of them were taken as the data set. After the same operation steps and 2100 iterations, the recognition accuracy of the model was 92.47%. The accuracy and loss curves during training are shown in Figure 11. The change in strawberries is more difficult to recognize than that in bananas because the strawberry itself is red and turns deep red when it goes bad, so the before-and-after changes are less obvious. Nevertheless, our model still achieves high accuracy. The results show that the model has good adaptability and can be used to identify the freshness of various fruits.

IV. DISCUSSION
We put forward the idea of relating the freshness of fruit to storage time. Based on this idea, we used transfer learning to learn banana features from the banana data set and then transferred the knowledge to recognizing fruit freshness. The results showed a correlation between banana freshness and storage date. Bananas become less and less fresh over time, and the change is most noticeable in the medium term. The freshness of a banana can therefore be judged by the number of days it has been stored. GoogLeNet can effectively identify different types of fruits and vegetables because it has the Inception and multi-dimensional convolution aggregation modules [26], which account for its superior performance. In reality, people's evaluation of a particular fruit's freshness depends mainly on individual subjective judgment. Our model can judge the freshness of fruit more accurately and provide decision support for people. Combining human and computer judgment can make the assessment more accurate and useful.
Some researchers use electrochemical methods to analyze the freshness of fruits and vegetables and to study the relationship between freshness and age (storage time). This approach is destructive: a probe must be inserted into the fruit or vegetable, which is still very difficult in practical engineering applications. Others use the chlorophyll fluorescence method to measure the freshness of leafy vegetables, which also involves destructive detection.
Analysis of the experimental data shows some differences in the degree of freshness between individual bananas. For example, some bananas go bad on the fifth day after picking and some on the ninth day. We also found that the two varieties did not change at the same rate (the Goodfarmer bananas spoiled sooner and the Chiquita bananas later), which posed a significant challenge to our model. We will do further research on this issue in the future. In addition, our model can work with wider recognition intervals. For example, when we expand the recognition interval from one day to two days, the performance of the model improves. This indicates that the freshness of most bananas follows the same trend.
The freshness detection experiments on bananas and strawberries show that different fruits change in different ways. Compared with strawberries, bananas change relatively slowly. The experimental results showed that even when the peel of a banana becomes black, the pulp can still be eaten; however, the blackened peel dramatically reduces the consumer's desire to buy. By testing freshness, we can quickly determine the freshness of a batch of fruit, thus saving costs. Growers can also use this method to decide how quickly they must transport their crops to market or whether to use packaging that slows the ripening process. This allows fruit to reach consumers at peak value, thereby achieving higher economic benefits. It is therefore important to study the detection and evaluation of fruit freshness.
Our experiments demonstrate the effectiveness of transfer learning in detecting the freshness of fruits, which is better than people's subjective judgment. This fast and scalable method can be deployed on mobile devices to put the technology into practice, and it has high practical value. With the widespread use of smartphones, a corresponding mobile phone app could be developed to obtain the freshness state of fruit at any time. This would be a convenient process with low cost and simple operation.
In this study, we applied data augmentation to the data set. Experimental results show that the accuracy improved after data augmentation, indicating that data augmentation is effective. However, some researchers do not think data augmentation should be used. Nowadays, many people use data augmentation when the data set is insufficient, but it only performs operations such as rotation and translation on the original data, so the new data are not far from the original data. To the best of our knowledge, no research in the relevant literature explains why experimental results after data augmentation are better than before, or whether data augmentation is appropriate. This will be our next research direction.
One deficiency of our experiment is that the data set is still too small. The experimental data come from only 103 bananas, which is not sufficient compared with the large ImageNet data set. Therefore, we will expand the number and types of data sets to verify the performance of our approach. It is always challenging to identify fine-grained changes, such as those between the first day and the second day. We have found experimentally that the model's accuracy decreases to about 70 percent when daily photos (without intervals) are used as the data set. Furthermore, this experiment fine-tunes the model weights and biases locally in order to train the model quickly. We believe that fine-tuning the weights and biases globally can further improve the model's performance, although the training time will increase accordingly; even so, this method would still take less time than training a traditional CNN model. The next step is to fine-tune the model globally to test this hypothesis.

V. CONCLUSION
In this paper, the GoogLeNet network was used to predict banana freshness. The data were augmented through rotation and translation to improve the model's generalization ability and to avoid over-fitting of the network. The experimental results showed that the model achieved good results in banana freshness classification, with a recognition accuracy of 98.92%. Our model can identify the freshness of bananas very well, with high accuracy and good applicability at different time intervals. To evaluate the generality of the model, we also trained and validated it on strawberry images, and the accuracy was 92.47%.
In the future, we will apply our model to the recognition of more fruits and improve the network's general usability.
JIANGONG NI is currently pursuing the master's degree with the School of Science and Information Science, Qingdao Agricultural University. His current research interests include deep learning, artificial intelligence, and image processing.
JIYUE GAO is currently pursuing the master's degree with the School of Science and Information Science, Qingdao Agricultural University. Her current research interests include deep learning, aflatoxin detection, and image processing.