Real-Time Detection of Ripe Oil Palm Fresh Fruit Bunch Based on YOLOv4

Fresh Fruit Bunch (FFB) is the main ingredient in palm oil production. Harvesting FFB from oil palm trees at its peak ripeness stage is crucial to maximise the oil extraction rate (OER) and quality. In current harvesting practices, misclassification of FFB ripeness can occur due to human error, resulting in OER loss. Therefore, a vision-based ripe FFB detection system is proposed as the first step in a robotic FFB harvesting system. In this work, live camera input is fed into a Convolutional Neural Network (CNN) model known as YOLOv4 to detect the presence of ripe FFBs on the oil palm trees in real-time. Once a ripe FFB is detected on the tree, a signal is transmitted via ROS to the robotic harvesting mechanism. To train the YOLOv4 model, a large number of ripe FFB images were collected using an Intel Realsense Camera D435 with a resolution of $1920\times 1080$. During data acquisition, a subject matter expert assisted in classifying the FFBs as ripe or unripe. During the testing phase, the model achieved a mean Average Precision (mAP) of 87.9 % and a recall of 82 %, with detections satisfying an Intersection over Union (IoU) threshold of 0.5 after 2000 training iterations, and the system operated in real-time at roughly 21 Frames Per Second (FPS).


I. INTRODUCTION
(The associate editor coordinating the review of this manuscript and approving it for publication was Utku Kose.)

Malaysia is one of the biggest palm oil-producing countries in the world. The palm oil industry is a significant contributor to the country's Gross Domestic Product (GDP). Palm oil companies have more than a million hectares of plantation land to produce Fresh Fruit Bunches (FFBs), which are harvested when ripe to extract their valuable oil. Therefore, several rules and guidelines were developed to achieve the maximum oil extraction rate (OER) according to the guideline of the Malaysian Palm Oil Board (MPOB). In the oil palm estate, the FFBs can only be harvested once the trees reach maturity at three years old. The field workers harvest the FFBs on the 10th-14th days of the harvesting interval. The harvesters search for oil palm trees with a certain number of detached fruitlets that have dropped to the ground. According to the current guideline, this indicates that there are FFBs on the trees that have ripened and should be harvested. The ripe FFBs are usually identified by their colour, which is a bright red and yellow, in contrast to the brown and black of unripe FFBs. The harvested FFBs are then collected and transported to the palm oil mill for oil extraction [1]. The general rule is that the FFBs are to be delivered to the mill within 24 hours after harvesting to ensure the quality of the fruit is at its highest level. However, this is not guaranteed due to factors such as rain during the harvesting process and other unforeseen logistical issues.

The problem of labour shortage has had a tremendous impact on the economic growth of the oil palm industry, which is traditionally very labour intensive [2].
Oil palm estates have reported labour shortages of approximately 20-30 %, which caused the potential yield to decrease by around 15 % due to post-harvest losses [3]. Therefore, the plantation industry should resolve this problem by implementing the latest technology in the FFB harvesting process.

The advantages of the YOLO model are that it is able to perform detection with high accuracy and in real-time due to its fast processing technique. In the past, YOLO series models have been implemented to detect agricultural fruits such as apples [25], tomatoes [26], and pears [27]. In the oil palm sector, Junos et al. [28] developed an automatic detection system that included the YOLO model to detect FFBs. The authors compared the performance of YOLOv3 series models, and the results show the model is feasible for object detection in the oil palm sector.

In this project, the objective is to develop a system to automatically detect ripe, unharvested FFBs in real-time using a combination of computer vision and artificial intelligence (AI). Real-time operation is considered an important feature of this system because it is meant for on-field application as part of a robotic harvesting mechanism. Firstly, an RGB camera (Intel Realsense D435) captures the view of the oil palm tree and transmits the data to a Single Board Computer (Nvidia Jetson NX) that is loaded with an inference model based on YOLOv4. The trained algorithm identifies the target object, records its coordinates, and sends its positional information to the robotic harvesting mechanism within the Robot Operating System (ROS). Based on the coordinates received from the detection module, the positional information of the FFB is obtained using a kinematic equation. Since it has been released for some time, YOLOv4 has become highly compatible with ROS, which is a core component of the robotic harvesting system that is being developed separately. This was the main factor in the selection of YOLOv4 for this work. Also, although it is not the latest iteration of the YOLO model, it is still able to perform object detection with high accuracy and speed.
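The kinematic mapping from a detected pixel coordinate to a physical position is not detailed here; a minimal sketch, assuming a pinhole camera model with hypothetical intrinsic values (the real D435 parameters come from its factory calibration, not from this paper), could look like:

```python
def pixel_to_camera_frame(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a known depth into the
    camera coordinate frame using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Hypothetical intrinsics for a 1920x1080 colour stream; these are
# illustrative placeholders, not measured calibration values.
FX, FY = 1380.0, 1380.0   # focal lengths in pixels
CX, CY = 960.0, 540.0     # principal point

# Example: bounding-box centre at pixel (1100, 600), FFB 3 m away
x, y, z = pixel_to_camera_frame(1100, 600, 3.0, FX, FY, CX, CY)
```

The resulting camera-frame coordinates would then be transformed into the robot's base frame before being published over ROS.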

In order to develop an AI-based vision system to detect ripe FFBs on the oil palm trees, the algorithm must be trained using visual data or samples of ripe FFBs on the trees. In this section, the work done for data acquisition, preparation, and training is explained in detail.

The oil palm trees selected for data capture range from 8 to 13 years old because this is when the trees produce the most FFBs. The sample data was recorded from November to December 2021 at an oil palm plantation in the state of Selangor, Malaysia. During data acquisition, a subject matter expert assisted in identifying ripe and unripe FFBs on the trees. Fig. 1 shows the hardware setup used for this work. An Intel Realsense D435 camera was used to capture the FFB visual data. The camera was mounted onto an adjustable platform to enable it to capture images at the same height or level as the FFBs on the trees. The captured images have a resolution of 1920 × 1080 pixels and are stored on a laptop computer connected via a USB-C cable. The laptop computer used for capturing and storing data is fitted with an Intel Core i7-8750H processor and a GeForce GTX 1070 graphics card.

In the beginning, the camera was lifted to the same elevation as the targeted FFB, at a distance of 3 metres.

In this study, the parameters applied in the YOLOv4 model are the default values provided by Bochkovskiy et al. [29]. Table 2 shows the values of the parameters used in the training. The data augmentation includes the exposure and saturation parameters with a factor of 1.5, which randomly change the intensity and brightness of the colours present in the image during training, while the hue parameter is set to 0.1. In addition, another data augmentation method, Mosaic, was included during training, which contributes to the overall improvement of the YOLOv4 model. The Mosaic method mixes 4 training images to generate a new image that is contextually different from the originals. However, the size of the anchor box, S, is required to be calculated according to the number of object classes detected. The formula to calculate the size of the anchor box is:

S = (N + C) × A

where N represents the number of object classes detected, C represents the predicted coordinates, and A represents the number of anchor boxes per grid.
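The Mosaic idea can be illustrated with a short NumPy sketch. This is a simplified toy version, assuming a fixed output canvas and omitting the bounding-box remapping and exact cropping logic of the darknet implementation used in training:

```python
import numpy as np

def mosaic(images, out_h=608, out_w=608, rng=None):
    """Sketch of Mosaic augmentation: tile 4 training images into one
    composite around a randomly chosen split point."""
    assert len(images) == 4
    if rng is None:
        rng = np.random.default_rng()
    # Choose a split point somewhere in the middle of the canvas
    sy = int(rng.uniform(0.3, 0.7) * out_h)
    sx = int(rng.uniform(0.3, 0.7) * out_w)
    canvas = np.zeros((out_h, out_w, 3), dtype=np.uint8)
    regions = [(0, sy, 0, sx), (0, sy, sx, out_w),
               (sy, out_h, 0, sx), (sy, out_h, sx, out_w)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        # Naive resize by index sampling (real code would use cv2.resize)
        h, w = y1 - y0, x1 - x0
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas
```

Because each composite mixes four scenes, the network sees objects against contexts and scales that never occur in any single training photograph.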

In this study, a few variations of the YOLOv4 model are evaluated to compare their performance in the task of FFB detection. The additional model applied in this study is Scaled-YOLOv4, which was specially designed to suit various GPUs during operation. For example, the YOLOv4-CSP and YOLOv4-tiny architectures are both categorised under Scaled-YOLOv4. The YOLOv4-CSP was designed with an emphasis on balancing execution speed and accuracy, rather than the general YOLOv4, which is more focused on fast operating speed and optimization for parallel computation. On the other hand, YOLOv4-tiny was designed for implementation in low-spec devices, as its computational complexity and model size have been reduced [30]. Thus, the performance of FFB detection using these three types of models, namely YOLOv4, YOLOv4-CSP, and YOLOv4-tiny, is examined.

VOLUME 10, 2022

The results are analysed after training. The training time of YOLOv4-tiny was the shortest, roughly three times faster than YOLOv4 and YOLOv4-CSP. This is due to the low number of convolutional layers in the model architecture of YOLOv4-tiny, allowing it to be trained at low computational cost. Table 4 shows the sizes of the weight files of the different models. The YOLOv4-tiny weight file is remarkably small at 23.5 MB. This confirms that weight size is correlated with computational speed, as the original YOLOv4 models have the largest weight files and also the longest training times.
The confusion matrix of the detection performance for the several YOLOv4 models is shown in Table 5. The original YOLOv4 models have the best performance from several perspectives. The precision, recall, and F1-score are each as high as 97 %. The average IoU and mAP achieved above 75 % and 96 % respectively in both the 512 × 512 and 608 × 608 input network sizes. When analysing the YOLOv4-CSP model, the mAP has similar accuracy to the original YOLOv4 models. However, even though the YOLOv4-CSP models have higher mAP than YOLOv4, they are less predictive and sensitive, as the precision, recall, F1-score, and average IoU are slightly lower than YOLOv4. Next, the precision, recall, and F1-score are much lower in the YOLOv4-tiny models, which was expected since these models were optimized for speed and not accuracy. This affects the mAP of the YOLOv4-tiny-512 and YOLOv4-tiny-608 models, which scored 48.89 % and 55.60 % respectively, lower than the other models.

Testing was conducted at 9.00 am when the weather was sunny and the temperature was around 32 °C. In the field, a total of 20 trees were selected for the testing, where 10 trees had unripe FFBs only and 10 trees had at least 1 ripe FFB. The targeted trees are similar in age and height to the trees that were used to train the model. Fig. 4 shows the output of the detection system on a ripe FFB during on-site testing. In addition, the model operated in real-time at an average of 21 frames per second. Table 6 shows the evaluation of the YOLOv4 model from the on-site testing. During the testing phase, the YOLOv4 model recognised the ripe FFBs most of the time, with the mAP reaching 87.9 %. Moreover, the average IoU achieved was 70.19 %. This study confirms that this autonomous detection system can detect ripe FFBs in real-time.
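For reference, the evaluation quantities used throughout can be computed as follows. This is a generic sketch of IoU for axis-aligned boxes and of precision, recall, and F1 from raw detection counts, not the exact evaluation script used in the study:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from counts; a detection counts as a
    true positive when its IoU with a ground-truth box exceeds 0.5."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

mAP then averages the precision over the recall curve per class (here a single class, ripe FFB) at the chosen IoU threshold.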

The work presented in this paper is the first example of a real-time ripe oil palm FFB detection system based on the YOLOv4 model. Although no changes were made to the architecture of the model, it is shown that with the selected methodology and hyperparameters for training, the detection output is very encouraging. Based on the analysis, the trained YOLOv4 model obtained a mAP of 87.9 % in detecting ripe FFBs. The recall and F1-score were 82 % and 88 % respectively, with detections satisfying an IoU threshold of 0.5 after 2000 iterations. During testing in the oil palm estate, the system operated in real-time at roughly 21 FPS and achieved a mAP of 87.9 %. The performance can potentially be improved further with improvements in the training data, model architecture, and hyperparameter optimization. For future work, the system will be expanded to assist in the harvesting process by identifying the palm fronds surrounding the FFB and the FFB stalk that needs to be cut. This information is then relayed through ROS to a robotic arm that will proceed to harvest the FFB autonomously using a fitted cutting mechanism.