IHDS: Intelligent Harvesting Decision System for Date Fruit Based on Maturity Stage Using Deep Learning and Computer Vision

Date is the main fruit crop of the Kingdom of Saudi Arabia (KSA), covering approximately 72% of the total area under permanent crops. According to the Food and Agriculture Organization, worldwide date production was 3,430,883 tons in 1990 and has increased yearly, reaching 8,526,218 tons in 2018. Date production in KSA was around 527,881 tons in 1990 and reached approximately 1,302,859 tons in 2018. Harvesting date fruits at the appropriate time, according to a specific maturity stage or level, is a critical decision that significantly affects profit. In the present study, we propose an intelligent harvesting decision system (IHDS) based on date fruit maturity level. The proposed decision system uses computer vision and deep learning (DL) techniques to detect seven different maturity stages/levels of date fruit (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar). In the IHDS, we developed six different DL systems, each of which produced a different accuracy level with respect to the seven aforementioned maturity stages. The IHDS used datasets collected by the Center of Smart Robotics Research. The maximum performance metrics of the proposed IHDS were 99.4%, 99.4%, 99.7%, and 99.7% for accuracy, F1 score, sensitivity (recall), and precision, respectively.


I. INTRODUCTION
According to the Ministry of Agriculture in Saudi Arabia, an estimated 24-25 million palm trees produce approximately a million tons of dates yearly, accounting for an estimated 15% of the global date production [1], [2]. The estimated average annual yield of dates per palm tree in Saudi Arabia is 48.0 kg, with a selling price estimated at SR 4.00/kg. Several Saudi farmers suffer from a lack of skilled labor; hence, around 23.00% of the farmers sell their produce from the farm itself to foreign labor for a cheap price [1]. According to the Food and Agriculture Organization of the United Nations, global date production is increasing annually. Date production in Saudi Arabia was around 527,881 tons in 1990 and reached approximately 1,302,859 tons in 2018. However, despite the increase in cultivated areas, productivity per hectare has declined in recent years. This may be due to the lack of skilled labor. Saudi Arabia was the second largest date-producing country in 2018 and the third in 1990, with a cultivated area of about 1,116,125 hectares in 2018. According to these statistics, the palm productivity of Saudi Arabia is relatively low given the number of palms. This may be attributed to several reasons: the inability to estimate the weight and maturity level of dates per palm before harvesting, while the crop is still on the trees, so that the farmer sells the crop on the palm trees without knowing its weight and degree of maturation; weak pre-harvesting maintenance; and the lack of skilled laborers.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. DATE HARVESTING
Date harvesting involves several tasks before, during, and after harvesting for better yield and tree maintenance. The following subsections briefly describe these tasks.

B. PRE-HARVESTING TASKS
In this stage, many pre-harvesting care tasks are performed, including dethorning, thinning the date palm tree, aligning the bunch, bunch attaching, removing dust, exterminating date spiders, bagging, and estimating the weight and yield. Pre-harvesting tasks are done to ensure the quality of the date fruit, making the fruits ready for the next stage, which is the harvesting stage.

C. HARVESTING DATE PALM BUNCHES
There are different types of harvesting: picking the date fruits one by one, shaking the bunch so that most of the dates fall down, or cutting down the whole bunch at a certain time. In this work, we focus on date palm trees requiring full bunch cutting.

D. POST-HARVESTING
Post-harvesting consists of many operations that happen after the dates are removed. In this step, the palm trees no longer contain date fruits. The remaining brown dead leaves are cut using a circular saw at a very precise angle (avoiding sharp cuttings) for the safety of the manual workers. In KSA, the traditional practice is to cut only a few leaves (an average of six leaves per tree) to limit the vertical growth of the palm trees, keeping them as short as possible so that the next harvest is easier for the manual workers.
Various cleaning operations, such as brown-leaf cutting and trunk cleaning, can be automated. These operations require less effort and precision than the harvesting process itself. To address the inability to estimate the maturity level of dates per palm before harvesting, we developed a smart system that uses DL techniques to predict the maturity level of the dates before harvesting. Specifically, we propose an intelligent harvesting decision system (IHDS) based on the maturity level detection of date fruits. The proposed decision system uses computer vision and DL techniques to detect seven different maturity stages/levels of date fruit (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar) before harvesting. This paper is organized as follows: Section 2 presents the literature review. Section 3 explains the methodology and dataset. Section 4 presents the proposed system, and Section 5 explains the training and testing parameters. Section 6 illustrates the experimental results, Section 7 compares the proposed system with other systems, and Section 8 gives the conclusion.

II. LITERATURE REVIEW
Many studies have been conducted to classify fruit maturity levels using image processing technologies. In 2014, Zhang et al. [4] used a color-grading method to determine the quality and maturity of date fruits. Their method used 2-D histograms with a color-grading category to define the co-occurrence frequency. In 2015, Gokul et al. [5] used image processing to estimate the maturity of sweet lime. They classified maturity through RGB color coding based on the RG ratio. In 2013, Prabha and Kumar introduced a maturity classification system for banana fruit using image processing techniques based on the color and size values of the images [6]. They classified the maturity of banana into three different stages, namely, under-mature, mature, and over-mature. The mean color intensity from the histogram, together with the area, perimeter, and major and minor axis lengths from the size values, was extracted from calibration images to classify the maturity stage. However, most of these techniques rely on thresholds for features such as color, shape, and size. In 2014, Yamamoto et al. [7] used machine learning (ML) approaches to detect tomato fruit maturity stages without adjusting threshold values for the fruit. They proposed a method containing three steps: pixel-based segmentation, blob-based segmentation, and X-means clustering. They achieved precision levels of 1.00 and 0.80 for mature and immature fruits, respectively.
Several other studies used robotics technology and machine vision in agricultural applications, referring to the resulting systems as harvesting robots. These harvesting robots can be used for fruit picking [8] and for detecting fruit-bearing branches [9]. Another study [10] developed a detection algorithm based on color, depth, and shape information. Chen et al. [11] introduced a multi-camera scheme for agricultural applications to increase the perception range of vision systems.
Several studies have classified date fruits. Nasiri et al. [12] used computer vision and ML techniques to classify three maturity stages (Khalal, Rutab, and Tamar) and one defective stage. The dataset was built using single dates with a uniform background. This study used the VGG-16 architecture with max pooling, dropout, batch normalization, and dense layers. They collected the dataset with a smartphone, and their system achieved an overall accuracy of 96.98%. Altaheri et al. [13] proposed a framework that uses a vision system to classify date fruits in an orchard environment. They used the proposed framework to classify date fruit images based on type and maturity. This study used the VGG-16 and AlexNet architectures and achieved accuracy levels of 99.01% for type classification and 97.25% for a five-level maturity classification system. Several other studies have classified fruits other than dates. In 2020, Behera et al. [14] introduced two methods based on ML techniques to classify papaya fruit maturity stages. They used a very small dataset of 300 papaya fruit images, consisting of 100 images for each of the three maturity stages. They used seven pretrained architectures: VGG-19, VGG-16, ResNet101, ResNet50, ResNet18, AlexNet, and GoogleNet. In 2019, Pacheco and López [15] classified the maturity of the Milano and Chonto varieties of tomatoes using ML techniques. In 2020, Caladcad et al. introduced a system to classify the maturity of Philippine coconut using ML techniques [16]. They classified the Philippine coconut into three different maturity levels (pre-mature, mature, and over-mature) using random forest and support vector machine (SVM) classification systems. Also in 2020, de Luna et al. [17] monitored the growth stage of tomatoes using SVM, artificial neural network (ANN), and k-nearest neighbors (KNN) classifiers, which achieved maximum accuracy levels of 99.81% for SVM, 99.32% for KNN, and 99.32% for ANN. Another ML-based study was introduced in 2020 by Chen et al. [11] to classify the maturity levels of sweet red and yellow peppers. They achieved 98.2% and 97.3% accuracy for red and yellow pepper maturity classification, respectively, for two maturity stages, and 89.5% and 97.3% for red and yellow pepper maturity classification, respectively, for four maturity stages.

III. METHODOLOGY
In general, DL works better with huge datasets than with smaller ones. For applications with a small dataset, the transfer learning concept is used to enhance the efficiency and outcomes of the system.
In the proposed IHDS, we started by building the dataset named "DATE FRUIT DATASET FOR AUTOMATED HARVESTING AND VISUAL YIELD ESTIMATION" [18]. Then, we used this dataset to train and evaluate the proposed IHDS, which uses three types of convolutional neural network (CNN): VGG-19 [19], Inception-v3 [20], and NASNet [21]. The IHDS takes live videos from video sources and extracts and manipulates the images. The manipulated images are then entered into the maturity level detection system (MLDS) to identify the date fruit maturity level (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar).

Selected CNN Architecture
In this work, instead of using traditional image processing techniques, we used CNNs to detect the maturity stages/levels of date fruit from the images because of their high accuracy. To save time, obtain better accuracy, and detect high-level features, such as edges and patterns, we used pretrained CNN models instead of an ad hoc model, and then added five more layers to each pretrained CNN model, as described in the remainder of this section. In the proposed system, we use three models, namely, VGG-19 [19], Inception-v3 [20], and NASNet [21]. The VGG model was developed to recognize visual patterns directly from pixel images with minimal pre-processing. The ImageNet project, on which the model is pretrained, was designed for visual object recognition research. The VGG network is characterized by its simplicity, using only 3 × 3 convolutional layers stacked on top of each other in increasing depth. Volume size reduction is handled by max pooling. Two fully connected layers, each with 4,096 nodes, are then followed by a Softmax classifier. In the proposed system, we froze all layers from 1 to 15 of the VGG-19 architecture. Then, we added five more layers (Global average pooling, Dropout (0.3), Dense (128), Dense (64), and Softmax (2/3/4/5/6/7 classes)) before the last layer. In the end, the VGG-19 architecture has a total of 20,098,759 parameters, with 7,153,799 trainable parameters and 12,944,960 non-trainable parameters for the seven-stage MLDS (TABLE 2).
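The parameter counts quoted for the seven-stage VGG-19 model can be reproduced with simple arithmetic. The sketch below is a sanity check rather than the authors' code: it assumes the standard VGG-19 convolutional layout with an RGB input, a 512-channel output feeding the global average pooling layer, and it assumes that the frozen part consists of the first 13 convolutional layers, which is the choice that matches the reported non-trainable count. How that maps onto "layers 1 to 15" depends on the layer-counting convention, which the text does not specify.

```python
# Parameter-count check for the VGG-19 based seven-stage MLDS.
# Standard VGG-19 layout: (filters, number of 3x3 conv layers) per block.
VGG19_BLOCKS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]


def conv3x3_params(in_channels, out_channels):
    """Weights plus biases of one 3x3 convolution."""
    return 3 * 3 * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def vgg19_conv_layer_params():
    """Per-layer parameter counts of the 16 convolutional layers, in order."""
    counts, in_channels = [], 3  # assumption: RGB input
    for filters, n_layers in VGG19_BLOCKS:
        for _ in range(n_layers):
            counts.append(conv3x3_params(in_channels, filters))
            in_channels = filters
    return counts


def head_params(n_classes, features=512):
    """Added head: pooling and dropout carry no weights; Dense(128), Dense(64), Softmax."""
    return (dense_params(features, 128)
            + dense_params(128, 64)
            + dense_params(64, n_classes))


conv_counts = vgg19_conv_layer_params()
total = sum(conv_counts) + head_params(n_classes=7)
# Assumption: the first 13 convolutional layers are frozen.
frozen = sum(conv_counts[:13])
trainable = total - frozen

print(f"total={total:,} trainable={trainable:,} non-trainable={frozen:,}")
# prints: total=20,098,759 trainable=7,153,799 non-trainable=12,944,960
```

Under these assumptions the three numbers agree with the values reported above, which suggests that the frozen portion covers the first four convolutional blocks plus the first convolution of the fifth block.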
Initially, the Inception CNN architecture was introduced as GoogleNet and called Inception-v1. Then, Ioffe and Szegedy enhanced the Inception architecture by introducing batch normalization and called it Inception-v2 [22]. Later, Szegedy et al. (2015) enhanced the Inception-v2 CNN by adding factorization and called it Inception-v3 [20]. The main idea of the Inception architecture was to find the optimal local construction of the convolutional network and repeat it spatially [20]. In general, Inception was introduced based on the idea that several connections between layers are ineffective and carry redundant information due to the correlation between them. Therefore, the Inception architecture used 22 layers in a parallel manner (Figure 3), which benefited from the several auxiliary classifiers within the intermediate layers, thereby improving the discrimination capacity in the lower layers [23]. For Inception-v3, we added five more layers (Global average pooling, Dense (1,024), Batch normalization, Dense (1,024), and Softmax (2/3/4/5/6/7 classes)) before the last layer. In the end, the Inception-v3 architecture had a total of 23,916,327 parameters, with 23,877,799 trainable parameters and 38,528 non-trainable parameters for the seven-stage MLDS. NASNet is a Google DL model introduced in May 2017. It produces a small network architecture. Google introduced NASNet mainly for image classification applications. For NASNet, we added five more layers (Global average pooling, Dense (1,024), Batch normalization, Dense (1,024), and Softmax (2/3/4/5/6/7 classes)) before the last layer. In the end, the NASNet architecture had a total of 6,417,051 parameters, with 6,376,217 trainable parameters and 40,834 non-trainable parameters for the seven-stage MLDS.

Dataset
We use a dataset named "DATE FRUIT DATASET FOR AUTOMATED HARVESTING AND VISUAL YIELD ESTIMATION" [18] that was built by the Center of Smart Robotics Research (www.CS2R.ksu.edu.sa). The date fruit dataset was introduced for use in the pre-harvesting and harvesting stages. It consists of two different datasets, namely, Dataset-1 and Dataset-2. Dataset-1 contains about 8,079 pictures captured from 350 bunches belonging to 29 palms using two Canon cameras (EOS-1100D and EOS-600D), with resolutions of 4,272 × 2,848 and 5,184 × 3,456, respectively. The images were taken under different natural daylight conditions: in the morning (9:00-11:00) or afternoon (3:00-5:00). Dataset-1 covers all the maturity levels of date fruits: Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar (Figure 5 and Figure 6). Dataset-1 was labeled according to type and maturity. Dataset-1 and its annotation files are available at https://ieee-dataport.org/open-access/date-fruit-dataset-automated-harvesting-and-visual-yield-estimation. Dataset-2 was built for weight estimation and consists of 152 date bunches from 13 palms. These bunches were weighed after harvesting, and their images were captured against a white background.

A. PROPOSED SYSTEM
In this paper, we propose an IHDS based on maturity level detection of date fruits. As shown in Figure 7, the IHDS takes live videos from video sources (unmanned aerial vehicles or any other source) and extracts images from the live video stream. Image manipulation is then performed on the extracted images. Finally, the manipulated images are entered into the MLDS, which identifies the date fruit maturity level (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar), as shown in Figure 8.
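The flow described above (video stream, frame extraction, image manipulation, MLDS) can be sketched as a small generator pipeline. This is only an illustration under stated assumptions: the function names `sample_frames` and `run_ihds`, the `every_n` sampling interval, and the `preprocess` and `classify` callables are hypothetical placeholders, not part of the published system, and the demo uses integers in place of real video frames.

```python
from typing import Callable, Iterable, Iterator, Tuple

# The seven maturity stages, in the order the text lists them.
STAGES = [
    "Immature stage 1", "Immature stage 2", "Pre-Khalal", "Khalal",
    "Khalal with Rutab", "Pre-Tamar", "Tamar",
]


def sample_frames(stream: Iterable, every_n: int) -> Iterator[Tuple[int, object]]:
    """Yield (index, frame) for every `every_n`-th frame of the stream."""
    for index, frame in enumerate(stream):
        if index % every_n == 0:
            yield index, frame


def run_ihds(stream: Iterable,
             preprocess: Callable,
             classify: Callable,
             every_n: int = 30) -> Iterator[Tuple[int, str]]:
    """Extract frames, manipulate them, and label each with a maturity stage."""
    for index, frame in sample_frames(stream, every_n):
        stage_id = classify(preprocess(frame))  # expected: integer in 0..6
        yield index, STAGES[stage_id]


# Toy demo: integers stand in for video frames (made-up data), and the
# stand-in classifier simply derives a stage id from the frame number.
fake_stream = range(90)
results = list(run_ihds(fake_stream,
                        preprocess=lambda frame: frame,
                        classify=lambda frame: (frame // 30) % 7))
print(results)
```

In a real deployment, `preprocess` would hold the resizing and other image manipulation, and `classify` would wrap the trained seven-stage model; both are left abstract here because the text does not give their interfaces.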

B. THE MATURITY LEVEL DETECTION SYSTEM (MLDS)
The MLDS was designed to detect seven different maturity types or levels of date fruits (Figure 8) (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar) based on DL techniques. In the MLDS, we developed six different DL systems with different accuracy levels, as follows: a two-stage maturity detection system to determine two maturity stages (Immature and Tamar); a three-stage maturity detection system to determine three maturity stages (Immature, Khalal, and Tamar); a four-stage maturity detection system to determine four maturity stages (Immature, Khalal, Khalal with Rutab, and Tamar); a five-stage maturity detection system to determine five maturity stages (Immature, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar); a six-stage maturity detection system to determine six maturity stages (Immature, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar); and a seven-stage maturity detection system to determine seven maturity stages (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar). In the IHDS, we used the seven-stage MLDS to determine seven maturity stages. All maturity level systems used an end-to-end DL framework to detect the date fruit maturity level from the gathered images. We developed an ML system that detects the date fruit maturity level directly from raw images without requiring explicit feature extraction. As illustrated in Figure 8, we started by collecting dataset images (thousands of date fruit images) at different maturity levels (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar). Then, we augmented the images and resized them to the standard input size of their respective CNN models. After that, we divided the dataset into a training dataset and a testing dataset, and then applied the retrained CNN models (VGG-19, Inception-V3, and NASNet) to determine date fruit maturity levels.

C. TRAINING AND TESTING PARAMETERS
In the proposed MLDS, three well-known pretrained deep learning CNNs (NASNet, Inception-V3, and VGG-19) were trained, evaluated, and tested using the Keras framework to detect the date fruit maturity level from the gathered images. The models were trained on a computer with an Intel Core i9-9880H processor @ 2.3 GHz, 32 GB of RAM, and an 8 GB graphics processing unit (GPU), running 64-bit Windows 10. In the present study, we used the ImageDataGenerator for augmentation with the following parameters: rotation range = 40, width shift range = 0.2, height shift range = 0.2, shear range = 0.2, and zoom range = 0.2. We also resized all images to 224 × 224 to fulfill the requirement of the pretrained models. We used the Anaconda 4.8.3 environment, the Spyder 3.7 development environment, and Keras 2.2.4 with a TensorFlow 2.1.0 backend. We used the following training parameters: batch size = 16, number of epochs = 30, and the Adam optimizer with learning rate = 0.0001. For training and testing, we used a five-fold cross-validation method. We also benefited from the Python implementation by Talha Anwar [24].
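The five-fold cross-validation mentioned above can be illustrated with a minimal index-splitting sketch in plain Python. The settings in `TRAINING_CONFIG` are copied from the paragraph above; everything else is an assumption made for illustration, including the helper name `k_fold_indices`, the shuffling seed, and the use of a simple non-stratified split (the text does not say whether folds were stratified by maturity stage). The sample count of 1,302 is used only as an example size.

```python
import random

# Settings as listed in the text; the dictionary itself is just a convenient container.
TRAINING_CONFIG = {
    "batch_size": 16,
    "epochs": 30,
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "image_size": (224, 224),
    "augmentation": {
        "rotation_range": 40,
        "width_shift_range": 0.2,
        "height_shift_range": 0.2,
        "shear_range": 0.2,
        "zoom_range": 0.2,
    },
}


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: indices are shuffled once with a fixed seed and dealt
    round-robin into k folds; no stratification by class is applied.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1302, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train)} test={len(test)}")
```

Each image index appears in exactly one test fold, so averaging the per-fold metrics gives one overall estimate per model. A stratified split would be a reasonable alternative when the maturity classes are imbalanced.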

IV. RESULTS
The evaluation of the proposed IHDS is based on Dataset-1 (https://ieee-dataport.org/open-access/date-fruit-dataset-automated-harvesting-and-visual-yield-estimation). For each MLDS, we tested the VGG-19, Inception-V3, and NASNet models for the two-stage, three-stage, four-stage, five-stage, six-stage, and seven-stage maturity detection systems. Well-known performance metrics (TABLE 4), such as F1 score, accuracy, recall, precision, and the confusion matrix, were used to evaluate the models and to compare them with other reported results.
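As a reminder of how these metrics relate to a confusion matrix, the following sketch computes accuracy and macro-averaged precision, recall, and F1 score in plain Python. It is an illustration, not the evaluation code used in the study: the function name `metrics_from_confusion` is hypothetical, rows are assumed to be true classes and columns predicted classes, macro averaging is an assumed choice (the text does not state the averaging scheme), and the example matrix is made-up data.

```python
def metrics_from_confusion(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a square matrix.

    Assumption: cm[t][p] counts samples of true class t predicted as class p.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    precisions, recalls, f1_scores = [], [], []
    for c in range(n):
        true_positive = cm[c][c]
        predicted_as_c = sum(cm[t][c] for t in range(n))  # column sum
        actually_c = sum(cm[c])                           # row sum
        precision = true_positive / predicted_as_c if predicted_as_c else 0.0
        recall = true_positive / actually_c if actually_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Made-up two-class example: 10 samples per class.
example_cm = [
    [8, 2],  # true class 0: 8 correct, 2 predicted as class 1
    [1, 9],  # true class 1: 1 predicted as class 0, 9 correct
]
scores = metrics_from_confusion(example_cm)
print({name: round(value, 4) for name, value in scores.items()})
```

For this toy matrix the accuracy is 17/20 = 0.85 and the macro recall is (0.8 + 0.9)/2 = 0.85. The same function applies unchanged to the 2×2 through 7×7 matrices of the six maturity detection systems.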
The VGG-19, Inception-V3, and NASNet architecture models were trained using Dataset-1. For the two-stage maturity detection, we used 1,302 images, with 661 images
We performed a five-fold cross-validation with 50 epochs per run for all maturity level detection systems and for all of the VGG-19, Inception-v3, and NASNet models, and took the overall average of all the results. Figure 9 illustrates the learning performance (accuracy) of VGG-19 in a single fold of the cross-validation, with 50 epochs, for all stages of the maturity level detection systems.
As shown in Figure 9, the VGG-19 model has a good fit and stable performance. The training and validation losses decreased to a point of stability with a minimal gap between the two final loss values. Figure 10 shows the confusion matrix for VGG-19 for one random fold for all maturity stage detection systems.

V. DISCUSSION
In this section, we compare the proposed system with several reference studies that used the same dataset (Dataset-1), as well as other datasets. The comparison is based on well-known performance metrics (F1 score, accuracy, sensitivity (recall), and precision). Our study and the reference study by Altaheri et al. [13] used the same dataset of date fruit bunches captured in an orchard (farm) environment, whereas other studies used different datasets of single dates with a uniform background. TABLE 7 compares the evaluation parameters of the proposed system with those of the reference study of Nasiri et al. [12]. In the proposed system, VGG-19 outperformed the other models and showed outstanding results for all performance metrics for all maturity detection systems. As shown in TABLE 7, our proposed system using VGG-19 outperformed the other systems. The reference study [13] had values of 97.25%, 89.56%, 96.1%, and 97.2% for accuracy, F1 score, sensitivity (recall), and precision, respectively, for five maturity levels using VGG-16, whereas our proposed system gave 98.3%, 98.6%, 98.9%, and 98.24% for accuracy, F1 score, sensitivity (recall), and precision, respectively, for five maturity levels with the same dataset. The reference study [13] achieved 92.3%, 96.71%, 86.98%, and 92.3% for accuracy, F1 score, sensitivity (recall), and precision, respectively, using VGG-16 for seven maturity levels, whereas our proposed system gave 97%, 97.6%, 98%, and 96.9% for accuracy, F1 score, sensitivity (recall), and precision, respectively, for seven maturity levels with the same dataset. Our proposed system also outperformed the reference study [12] with a four-stage maturity detection system. The reference study [12] achieved 98.49%, 97.33%, and 97.33% for accuracy, sensitivity (recall), and precision, respectively, using VGG-16 for four maturity levels, whereas our proposed system achieved 98.5%, 98.6%, 98.5%, and 98.5% for accuracy, F1 score, sensitivity (recall), and precision, respectively, for four maturity levels.

VI. CONCLUSION
The present study proposed an intelligent harvesting decision system called IHDS to harvest date fruits at an appropriate time based on a specific maturity stage using DL and computer vision. Harvesting date fruits at the proper time is a critical decision that significantly affects profit. In the present study, we were able to classify all maturity stages of date fruit (Immature stage 1, Immature stage 2, Pre-Khalal, Khalal, Khalal with Rutab, Pre-Tamar, and Tamar). We used the VGG-19, Inception-V3, and NASNet architectural models for pretraining. The maximum performance metrics of the proposed IHDS were 99.4%, 99.4%, 99.7%, and 99.7% for accuracy, F1 score, sensitivity (recall), and precision, respectively. The proposed IHDS was compared with two other studies from the literature and outperformed both. In the future, we plan to enhance the system to estimate the date fruit type, maturity level, and weight of date fruits per palm in the pre-harvesting phase.

MOHAMMED ARAFAH received the Ph.D. degree in computer engineering from the University of Southern California, Los Angeles, USA. He is currently an Associate Professor with the Department of Computer Engineering, King Saud University, Riyadh, Saudi Arabia. He has published in the areas of multistage interconnection networks, MPLS networks, and LTE networks. His current research interests include robotics, cooperative communication, 5G mobile communications, software defined radios, and multiple antenna systems.
MOHAMED AMINE MEKHTICHE (Member, IEEE) was born in Medea, Algeria, in 1987. He received the B.S. and M.S. degrees in electronics engineering from the University of Blida, in 2010 and 2012, respectively. Since 2014, he has been a Researcher with the Center of Smart Robotics Research, King Saud University, Saudi Arabia. His current research interests include image processing and stereo vision.