Non-Destructive Post-Harvest Tomato Mass Estimation Model Based on Its Area via Computer Vision and Error Minimization Approaches

Tomato commercialization in Mexican and Latin-American markets is economically affected by three main physical aspects of the fruit: ripening time, size, and mass. Digital image processing combined with mathematical models and machine learning approaches allows the development of prediction models to minimize fruit waste, among other applications. In particular, cross-validation, linear and non-linear fitting by minimum mean square error approximation, and digital image processing are used to obtain a post-harvest mass loss estimation model based on the fruit's area. A database for fruit characterization of 97,200 images, with mass (kg) and area (cm²) measurement entries over a continuous post-harvest timeline of 54 days, was considered in the methodology. The linear (polynomial) fitting model presented an efficiency of 94.65%, while the non-linear (exponential and potential) fitting models gave efficiencies of 99.21% and 99.82%, respectively. It was concluded that the best mass loss estimation model was the potential fit, with an approximation error of just 0.18% between actual and estimated data.

[...] of the harvest in kilograms; the distributor sells in kilograms, the consumer purchases in kilograms. In the same way, the waste of the fruit is estimated.

Tomato waste represents a great problem for both Mexico and the rest of the world, reaching between 20 and 50% of fruit losses [13]. The fruit undergoes mass loss during harvest, distribution, commercialization, and consumption. Today, producers, distributors, and consumers still do not have computational tools to estimate the mass of the tomato fruit, either pre-harvest or post-harvest. Currently, fruit monitoring is carried out daily by physical inspection, and the fruit is handled by personnel who are not properly trained or qualified, both at harvest and in distribution and marketing. On the other hand, the consumer selects the fruit by observing physical properties such as size, firmness, and color. These three physical aspects, added to the ripening time, are responsible for directly affecting its mass. As the fruit matures, it loses firmness and color and becomes deformed until its mass declines dramatically, to the point that it is considered waste. This deterioration has a direct and proportional effect on the economic value of the fruit until it is considered a complete economic loss. Therefore, establishing a computational tool for estimating fruit mass through its area allows producers, [...]

[...] a lack of adequate mathematical models or analytical processes allowing us to establish mass based on the fruit area. [...]

[...] technique focused on the minimum mean square error to obtain the best fit in data approximation. When comparing the proposed methods, results presented an efficiency of 94.65% for the polynomial fitting, 99.21% for the exponential fitting, and 99.82% for the potential fitting. Therefore, it was determined that the optimal model for estimating the mass of the tomato fruit as a function of the area, with an approximation error of 0.18%, is the non-linear method by potential approximation.

The remainder of this document is divided into four sections. Section I presents the introduction of the work, including the main aspects of the tomato fruit, the problems to be solved, related research works, and a brief description of the obtained results. Section II establishes the methods and materials used to carry out the non-destructive analysis of the tomato fruit, including the description of the physical morphometry system by computer vision, the digital image processing techniques, and the mathematical models and cross-validation technique used for their comparison. Section III presents the results obtained from the image processing when obtaining the fruit area, as well as the results and comparisons of the linear and non-linear mathematical methods. Finally, Section IV presents some concluding remarks and discussion.

The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Caldelli.

Each fruit was randomly selected, directly from the plants, at a commercial greenhouse tomato production facility. Statistically, the mass of each tomato in the sample is considered to meet the criteria for an independent and identically distributed variable. For this reason, one sample of 50 tomatoes was analyzed, since it would be representative not only of that harvest but of any harvest of this specific variety (ball tomato). The mathematical model developed for estimating post-harvest tomato fruit mass based on its area was built on a labeled sample of 50 ball-type tomato fruits. The sample was provided by a production facility owned by the High Tech Farm Group. Experimentation and data acquisition ran from each fruit's harvest date until it was discarded as waste. The effective period of data acquisition was 54 days, during which relative humidity and room temperature ranged between 30-34% and 23-29 °C, respectively. The mass of each fruit in the sample was recorded with a Taylor model TE32C digital scale with a precision of four decimal places (0.01 g). For the acquisition and processing of images, a physical morphometry system for fruits was developed and implemented, see Figure 1. The system is made up of the following elements: a stepper motor with advance control that stops every 10°; a Logitech high-definition digital camera, model C920, at 1920 × 1080 pixels with USB connection; and a diffuse lighting system to avoid reflections on the object. Finally, a computational algorithm was developed and implemented in MATLAB version 2015 for digital image processing and control of the morphometric system of the fruit.

The methodology proposed in this work is shown in Figure 1 and includes the following stages: [...]

The process involved a daily recording of the tomato mass values, starting with the cut date of the fruit until it was discarded as waste, see Figure 2. The recorded data were subjected to a statistical process to obtain the daily mass average of the sample, see Figure 9. At the same time, the area value of each fruit was acquired and recorded by means of digital image processing, based on the number of pixels that make up the segmented image of the fruit, see Figure 3.

The data gathered were also subjected to statistical processing to obtain the daily average of the sample area, see Figure 10.

The digital images acquired were processed with the mor[...]

On the other hand, in order to obtain the fruit area, the area of a circular object was taken as reference, see Figure 7.
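As a rough illustration of how a circular reference object can turn pixel counts into physical area, the following Python sketch derives a cm²-per-pixel scale from the reference and applies it to a binarized fruit image. It is not the authors' MATLAB implementation. The function names, the list-of-rows image representation, and the simple proportional conversion are assumptions made for illustration, and the sketch assumes the fruit and the reference are photographed at the same camera distance and resolution.

```python
import math


def area_per_pixel(diameter_cm, reference_pixel_count):
    """Return the area (cm^2) represented by one pixel.

    diameter_cm: caliper-measured diameter of the circular reference object.
    reference_pixel_count: white pixels inside the reference silhouette.
    """
    reference_area_cm2 = math.pi * (diameter_cm / 2.0) ** 2  # area of a circle
    return reference_area_cm2 / reference_pixel_count


def count_white_pixels(binary_image):
    """Count pixels equal to 1 in a binarized image given as a list of rows."""
    return sum(sum(1 for px in row if px == 1) for row in binary_image)


def fruit_area_cm2(binary_fruit_image, cm2_per_pixel):
    """Estimate the fruit's projected area (assumed proportional conversion)."""
    return count_white_pixels(binary_fruit_image) * cm2_per_pixel


if __name__ == "__main__":
    # Hypothetical numbers: a 2 cm reference disc covering 100 pixels.
    scale = area_per_pixel(2.0, 100)
    toy_mask = [[0, 1, 1], [0, 1, 0]]  # toy binarized fruit silhouette
    print(fruit_area_cm2(toy_mask, scale))
```

Because the scale factor is tied to one camera geometry, it would need to be recomputed whenever the camera distance or resolution changes.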

The process is as follows: a) The object's diameter (D) is measured with a caliper. b) An image of that reference object is binarized in order to obtain its area as a function of the number of white pixels inside the object's silhouette. c) Eq. 1 is used to obtain the reference object's area:

$$A_0 = \pi \left(\frac{D}{2}\right)^2 \quad (1)$$

where $A_0$ is the area of the real object and $D/2$ the radius [...]

In this work, data mining techniques were applied, such as [...]

In the normalization of the data, the equation, [...]

[...] linear adjustment, equation (5), $f(x) = a_n x^n + a_{n-1} x^{n-1}$ [...]

Based on equation (6), a set $n = \{1, \dots, 10\}$ was applied to determine the optimal degree $n$ of the polynomial fitting, as well as the numerical coefficients $a_0, \dots, a_n$ of the polynomial:

$$P(x) = a_0 + a_1 x + \dots + a_n x^n \quad (6)$$

On the other hand, for the non-linear approximation modeling, the potential adjustment method was applied according to equation (7), obtained from [16]: [...] while for the exponential adjustment method, equation (8), obtained from [9], was applied: [...] where $c$ and $a$ are the numerical coefficient and the exponent value, respectively, for both the exponential and the potential fit.
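To make the two non-linear fits concrete, here is a minimal Python sketch that estimates a coefficient c and an exponent a for a potential (power) model and for an exponential model. Several things here are assumptions rather than the paper's method: the power form c·x^a is taken from the shape of equation (22), the exponential form c·e^(a·x) is a common convention that the text does not confirm, and the log-linearized least-squares procedure is a swapped-in simplification of the minimum mean square error fit the authors ran in MATLAB. All function names are hypothetical.

```python
import math


def linear_least_squares(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_power(xs, ys):
    """Fit y = c * x**a (assumed form) by regressing ln(y) on ln(x).

    Requires strictly positive xs and ys. Returns (c, a).
    """
    slope, intercept = linear_least_squares(
        [math.log(x) for x in xs], [math.log(y) for y in ys]
    )
    return math.exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = c * exp(a * x) (assumed form) by regressing ln(y) on x.

    Requires strictly positive ys. Returns (c, a).
    """
    slope, intercept = linear_least_squares(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


if __name__ == "__main__":
    # Synthetic, noise-free data; not measurements from the paper.
    areas = [0.2, 0.4, 0.6, 0.8, 1.0]
    masses = [1.5 * x ** 1.2 for x in areas]
    print(fit_power(areas, masses))
    print(fit_exponential(areas, masses))
```

Fitting in log space minimizes squared error on the logarithms rather than on the raw values, so on noisy data the coefficients can differ slightly from a direct non-linear least-squares fit. Normalized data containing zeros would also need to be shifted or filtered before taking logarithms.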

The learning stage applied data mining techniques based on cross-validation using the K-subsets method. The normalized data of the sample were divided into subsets $K_1$, $K_2$, $K_3$. Each subset $K_i$ draws the data randomly under a 70-30 split of the total sample data, where 70% of the data is used for learning (A) of the model and the remaining 30% for validating (V) the results obtained by learning, see Figure 8. Therefore, for each $K_i$: [...]

[...] maturity, and it is discarded when the maturation time has concluded.
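The 70-30 random partition used in the learning stage can be sketched in Python as follows. This reads each subset K_i as an independent random shuffle of the whole normalized data set followed by a 70/30 cut into learning (A) and validation (V) parts. That reading is an interpretation of the text, and the function name, the seed parameter, and the rounding rule are assumptions.

```python
import random


def make_subsets(data, k=3, train_fraction=0.7, seed=0):
    """Return k (learning, validation) pairs, each a random 70/30 split.

    data: list of samples, e.g. (normalized_area, normalized_mass) tuples.
    seed: fixed only so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(data))
    subsets = []
    for _ in range(k):
        shuffled = list(data)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        subsets.append((shuffled[:n_train], shuffled[n_train:]))
    return subsets


if __name__ == "__main__":
    toy_data = list(range(20))  # stand-in for normalized records
    for learning, validation in make_subsets(toy_data):
        print(len(learning), len(validation))
```

Each model would then be fitted on the learning part of a subset and scored on its validation part, so no record is used for both within the same K_i.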

This stage consisted of standardizing the tomato mass and area data using the normalization process of equation (4). The results obtained are plotted in Figure 11.

Subsequently, the normalized values of both parameters were used to establish a relationship between the area and the mass of the fruit. The result is presented in the graph of Figure 12, which shows that the loss of mass of the fruit directly affects its area.

The data obtained from the mass-area relationship were evaluated through the mathematical process of equations (5) and (6), obtaining as a result a second-degree polynomial of the form

$$P(x) = a_0 + a_1 x + a_2 x^2 \quad (10)$$

2) LEARNING STAGE

The learning stage of the model applied equations (7), (8) [...] (7) and (8). [...] coefficients $a_2$, $a_1$, $a_0$ of the linear fit by algebraic polynomials. The results are observed in Table 2. [...] Table 3.

Taking as a reference the values of [...]

Each equation obtained in the learning stage represents a mathematical model that approximates tomato mass as a function of its area. The objective of the validation stage is to determine the model that best approximates the real fruit mass data, using the data of each $K_i(V)$ subset.

The data of the $K_i(V)$ subsets were first evaluated with equations (11), (12), and (13). Subsequently, the results of the evaluation were subjected to equation (9), which defines the minimum mean square error. The results obtained are presented in Table 4, where the mathematical model that presents the minimum approximation error is the function: [...]

In the same way, the $K_i(V)$ data were evaluated with equations (14), (15), and (16). Afterward, the data obtained from the evaluation were processed with equation (9). The results obtained are shown in Table 5.
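The validation step, in which each candidate model is evaluated on held-out data and the one with the smallest error is kept, can be sketched in Python as below. Equation (9) is not reproduced in this text, so the standard mean square error, the average of the squared differences between actual and predicted values, is assumed here. The function names and the toy candidate models are hypothetical and do not correspond to the fitted equations of the paper.

```python
def mean_square_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(models, xs, ys):
    """Score each model on validation data and return the best one.

    models: dict mapping a model name to a callable f(x) -> predicted y.
    Returns (best_name, errors), where errors maps each name to its MSE.
    """
    errors = {
        name: mean_square_error(ys, [f(x) for x in xs])
        for name, f in models.items()
    }
    best_name = min(errors, key=errors.get)
    return best_name, errors


if __name__ == "__main__":
    # Toy validation data and toy candidates, for illustration only.
    val_x = [0.0, 0.5, 1.0]
    val_y = [x ** 2 for x in val_x]
    candidates = {"identity": lambda x: x, "square": lambda x: x ** 2}
    print(select_best_model(candidates, val_x, val_y))
```

In the setting described here, the candidates would be the models fitted on each learning subset and the validation data would be the matching $K_i(V)$ records.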

According to the results recorded in Table 5, the optimal model based on the exponential fit, obtained with the least mean square error, is the function [...]

Finally, following the same process described for the previous models, equations (17), (18), and (19) were evaluated, and equation (9) was applied. The results obtained were recorded in Table 6, where the optimal model of potential adjustment is the function:

$$P(x) = 1.0060\,x^{1.4617} \quad (22)$$

VOLUME 10, 2022

Finally, a comparison was established between the mathemat[...]

[...] tomato is the third vegetable with the highest production and consumption in the country. Since the product is commercialized in kilograms, its logistics require a good estimate of the product mass along the entire post-harvest economic time frame, in order to minimize the mass of product wasted during that process. The advantage of the methodology proposed here over weight/mass scale measurements is mainly related to fruit manipulation. Fruit manipulation, and the damage it brings to the product, is much lower (practically non-existent) with the former methodology than with the latter. This will result in lower economic losses when choosing one methodology over the other. Finally, one can consider both methodologies as non-destructive, but the one proposed here is more convenient for the aforementioned reasons.

On another point of discussion, these results were obtained by gathering data with a fixed distance between camera and fruit. Since the acceptance rates were good compared to other methods (see Table 8), other distances will be considered in future work. On the other hand, the proposed method is time-consuming, since it needs individual mass measurements as well as data processing. This certainly represents one of the potential drawbacks of the method. However, we consider that this could be solved with logistic schemes such as developing the method once per harvest.

The mass estimation model based on the area of the tomato fruit obtained in this work is intended as a computational tool that allows producers, distributors, and marketers to improve and optimize product selection processes, with the aim of reducing both economic and product losses. Likewise, the model is intended to allow estimating the economic value of the product during marketing based on the time and area of the fruit. Today this tool is in the process of consensus and evaluation by the High Tech company, which is responsible for producing, marketing, and exporting the product. This evaluation seeks to establish the efficiency of the mass estimation model and, if necessary, to identify improvements to the proposed model.

This work presented a non-destructive analysis to estimate the mass of the tomato fruit as a function of its area during the post-harvest process. The analysis was based on digital image processing techniques, the construction of a computer vision physical morphometry system, and a comparison of mathematical models, namely the linear method by polynomial fitting and the non-linear methods by exponential and potential fitting, developed with computational algorithms programmed in MATLAB. The results determined the efficiency of the linear model by polynomial fit to be 94.65%, while the efficiency of the non-linear models, exponential and potential fit, was 99.21% and 99.82%, respectively. Finally, it is concluded that the best [...]