Learning by Breaking: Food Fracture Anticipation for Robotic Food Manipulation

This study aimed to anticipate fractures of fragile foods during robotic food manipulation. Anticipating fractures allows a robot to manipulate ingredients without irreversible failure. Fracture models investigated in the food-texture field explain the properties of fragile objects well; however, they may not directly apply to robot manipulation because of the variance in physical properties even within the same ingredient. To this end, we developed a fracture-anticipation system with a tactile sensing module and a simple recurrent neural network. The key idea was to allow the robot to break ingredients during training-sample collection. The timing of fractures was identified via simple signal processing and used for supervision. We performed real-robot experiments with three typical fragile foods: tofu, potato chips, and bananas. As a first step toward flexible fragile-object manipulation, we evaluated the proposed method on the fundamental task of object picking. The method successfully grasped the fragile foods without fractures in an online demonstration. In an offline evaluation, the method predicted fractures with a recall of approximately 80% for all ingredients, using 60 breaking trials per ingredient. We believe that our method can be used to avoid breakage in other types of food manipulation, e.g., holding, pressing, and rolling.

sive force when gripping the object. One popular approach is slip detection [4], in which the minimum force needed to move an object is determined by detecting a current slip. In combination with these minimum-force identification techniques, knowing the margin force before fracture is beneficial for robotic manipulation. To this end, we propose a new task of fracture anticipation, which identifies the maximum force in fragile-object manipulation before fracturing the object.

Food fractures have been investigated mainly in food-texture studies [5], with the aim of establishing models that explain how fracture occurs according to the relationship between force and deformation. Such models explain the properties of fragile objects well. However, studies have revealed the difficulty of applying such knowledge to robotics applications due to the significant variance in physical properties even within the same ingredient [6].

The associate editor coordinating the review of this manuscript and approving it for publication was Zheng H. Zhu.

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

FIGURE 1. Proposed framework. We allow the robot to explore several ingredients by loading them to the point of deformation and breakage. After training, the object can be grasped safely by anticipating the fracture point.

To overcome these significant differences in physical properties, we designed a learning-based framework for robots to anticipate fractures. It consists of tactile sensors mounted on a two-finger gripper, which determine the physical properties of individual foods. Human infants explore the physical properties of objects by touching them [7]. Similarly,
we let the robot break the food ingredients to determine when the fracture occurs (Fig. 1). We adopted a long short-term memory (LSTM) classifier [8] and an LSTM-based Seq2Seq model [9] to prove the concept of the proposed framework. We experimentally evaluated the framework with real food data and a robot. Three fragile ingredients were adopted: tofu, potato chips, and peeled bananas. In an online demonstration, we applied the proposed framework to food-ingredient picking. The robot trained by our framework successfully grasped the objects without fractures. In an offline evaluation, we evaluated the fracture-anticipation model using samples with known fracture points. We prepared test samples with within-category differences in physical properties (e.g., different poses and stiffnesses) to evaluate the robustness of the learning-based approach.

The contributions of the study are threefold:

• We propose a new task of fracture anticipation, which aims to estimate the maximum force during object manipulation.

• We developed a data-collection framework in which food objects are broken to train the anticipation model. At the data-collection stage, the robot breaks fragile objects, and the fracture timing is automatically identified via simple signal processing. We experimentally validated the concept using simple LSTM models trained with the identified timing.

• We applied the fracture-anticipation model to robotic food-picking tasks. The robot successfully grasped the fragile foods without causing fractures.

The remainder of this paper is organized as follows. Section II introduces related studies. Sections III and IV present the problem formulation and the proposed method. Sections V and VI present our experiments. Section VII discusses the experimental results. Finally, Section VIII concludes the paper.

II. RELATED WORK
This section introduces studies related to robotic food manipulation and fracture analysis for robot applications.

A. ROBOTIC FOOD MANIPULATION
Robots have performed various food-handling tasks, such as picking [10], [11], [12], cutting [6], rolling pizza dough [13], and flipping a pancake [14], some of which require careful manipulation to avoid breaking ingredients. Many physically soft robotic grippers that allow large deformation have been developed to adapt to food products that vary widely in size, texture, weight, fragility, and shape. Magnetorheological fluids [15], viscoelastic fluids [10], [16], pneumatic fluids [17]

Researchers have proposed detecting the beginning of fracture of fragile foods in a rule-based manner [16] or by using the estimation error of the pressures in fingers based on polynomial models [10] for robotic food manipulation. The proposed task of fracture anticipation can be used in collaboration with these methods, where the detected fracture is used as supervision.

III. DEFINITION OF FRACTURE AND PROBLEM FORMULATION

FIGURE 3. Mathematical notation for fracture anticipation. The input X_{t_w:t} slides along the time axis and predicts ŷ_{t_δ}, where δ represents the time taken to stop the gripper. The ground truth y_t is True when T_p − m ≤ t_δ, where m represents the safety margin for avoiding fracture.

and breaks food ingredients. The horizontal axis indicates time, and the vertical axis indicates the norm of the three-axis force signal. The line colors correspond to the locations of the taxels. As the forces on the taxels increased, the peak force appeared at different times for the different taxels. Although identifying the exact moment of fracture was difficult, we noted that fracture could only be observed visually after the forces had peaked. We observed similar tendencies for tofu, potato chips, and bananas. Thus, in this study, fracture was considered to occur when the first peak force appeared among the 32 taxels of the two tactile sensors. In this breaking trial, fracture occurred at approximately 5.5, 2.5, and 5.0 s for tofu, potato chips, and banana, as shown in Fig. 2. Our framework is not limited to this simple definition of fracture; it is compatible with more sophisticated fracture-detection methods [10], [16].

Fig. 3 shows the mathematical notation for the fracture-anticipation problem. The robot must stop before it creates a crack or fracture during food manipulation. To ensure this, we stop the robot's motion at the timestep T_p − m, where T_p represents the timing of the fracture (the peak time) and m is a safety margin. Let t_w = t − w be the first timestep of the input observation and t_δ = t + δ be the target timestep of fracture prediction. Then, the fracture-anticipation problem is formulated as y_{t_δ} = f(X_{t_w:t}), where X_{t_w:t} = {x_{t_w}, ..., x_t} is a sequence of observations from the tactile sensors and y_{t_δ} is a binary value that indicates whether the robot exceeds the fracture timing (True if T_p − m ≤ t_δ, False otherwise).
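The labeling rule above can be sketched in a few lines. The helper names below are ours, not from the paper's code; this is a minimal illustration of how y_{t_δ} and X_{t_w:t} are built from the peak time T_p, margin m, window length w, and horizon δ.

```python
def fracture_label(t, peak_time, margin, delta):
    """Ground truth y_{t+delta}: True once the target timestep
    t + delta reaches the safety-adjusted fracture time T_p - m."""
    return peak_time - margin <= t + delta

def window(observations, t, w):
    """Input sequence X_{t_w:t} = {x_{t-w}, ..., x_t} (w + 1 steps)."""
    return observations[t - w : t + 1]

# Example: fracture peak at step 100, margin 10, horizon 5.
# Labels turn True from t = 85 onward (85 + 5 = 90 = 100 - 10).
labels = [fracture_label(t, peak_time=100, margin=10, delta=5)
          for t in range(120)]
```

A training pair for timestep t is then (window(obs, t, w), labels[t]): the model sees only observations up to t but is supervised on the label δ steps ahead.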

IV. PROPOSED METHOD
This section describes the fracture-anticipation network and its application to robotic manipulation.

A. FRACTURE-ANTICIPATION NETWORK

We aimed to predict the fracture at least δ timesteps in advance. To achieve this, we adopted two simple recurrent neural network (RNN) models: a simple LSTM classifier and a Seq2Seq model [9], which are denoted as Proposed1 and Proposed2, respectively.

To realize this intention, we trained Proposed1 to minimize the loss function L_ce(ŷ_{t_δ}, y_{t_δ}), where L_ce represents the binary cross entropy. We trained Proposed2 using a loss function that adds a reconstruction term, where L_mse represents the mean squared error (MSE).

To train and test our model offline, we collected a dataset. During the data collection, the robot repeated the following motions:
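The two stated loss terms, binary cross entropy L_ce and the MSE L_mse, can be written out explicitly. The exact combination and weighting of Proposed2's full objective are not shown in this excerpt, so the sketch below covers only the two individual terms:

```python
import math

def bce(y_hat, y):
    """Binary cross entropy L_ce for a predicted probability y_hat
    in (0, 1) and a binary target y (True/False)."""
    eps = 1e-12  # numerical guard against log(0)
    t = 1.0 if y else 0.0
    return -(t * math.log(y_hat + eps) + (1 - t) * math.log(1 - y_hat + eps))

def mse(x_hat, x):
    """Mean squared error L_mse between a reconstructed tactile
    vector x_hat and the observed vector x."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
```

For Proposed1, bce is applied to the sigmoid output at the target timestep t_δ; for Proposed2, mse penalizes the decoder's reconstruction of future tactile frames alongside the classification term.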

The tactile data obtained by our system were not always recorded at regular intervals, included noise, and could lead to incorrect peak-point detection, making the prediction task difficult. Therefore, we preprocessed the tactile data.

The threshold was set as 5.0 × 10⁻⁵. After the noise was removed from the tactile data, we detected each peak point and defined the first peak among the 32 taxels as the ground-truth force peak. To make the peak detection more reliable, we removed the tactile data where the timing of the peak force and the visibly observed fracture were significantly different. We ignored peak points as noise when their L2 norm was below 0.004. Finally, we set T_p as the first force peak among the 32 taxels.

We recorded tactile data for tofu, potato chips, and banana in 60 trials for each object. Diverse samples were used, with different brands, shapes, and positions (Fig. 7). We aimed to place the objects at the same initial location, but this was done by hand, and there was natural variance. We prepared

For the models in Figs. 4 and 5, we used two-layered LSTM networks for the encoder E and decoder D. All the layers had 32 hidden units. The output z_t of E, which was the input to the dense layer M, was set as a 32-dimensional vector.
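The peak-based labeling described above can be sketched as follows: for each taxel we take its force-norm series, find the first local maximum, discard peaks whose norm falls below the 0.004 threshold, and take the earliest surviving peak across taxels as T_p. This is a simplified reconstruction under our own assumptions; the paper's actual filtering pipeline is not fully shown in this excerpt.

```python
def first_peak(series, min_height=0.004):
    """Index of the first local maximum whose value exceeds
    min_height, or None if no qualifying peak exists."""
    for i in range(1, len(series) - 1):
        v = series[i]
        if v > min_height and series[i - 1] < v >= series[i + 1]:
            return i
    return None

def fracture_time(taxel_norms, min_height=0.004):
    """Ground-truth T_p: the earliest qualifying peak across
    all 32 taxel force-norm series."""
    peaks = [p for s in taxel_norms
             if (p := first_peak(s, min_height)) is not None]
    return min(peaks) if peaks else None
```

The min_height filter implements the rule of ignoring peaks with L2 norm below 0.004, so small sensor fluctuations do not produce spurious fracture labels.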

With the LSTM encoder, we must flatten the {4 × 4 × 2 × 3}-dimensional signals into a 96-dimensional vector. Thus, ConvLSTM [35] networks may be considered more efficient. However, their effect may be small because the size of the tactile map (4 × 4) is similar to that of the convolution kernel (3 × 3). Hence, we implemented the model with the simple RNN architecture. Similarly, we avoided using a transformer owing to the limited number of training samples. D had two outputs: a 96-dimensional vector x̂_{t+k} and a logit. We processed the logit with the sigmoid function and obtained ŷ_{t+k}.
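The flattening step is a single reshape: 4 × 4 taxels × 2 fingers × 3 force axes collapse into the 96-dimensional vector fed to the LSTM encoder at each timestep. A minimal sketch (variable names are ours):

```python
import numpy as np

# One timestep of tactile data: 4 x 4 taxel grid, 2 fingers, 3 force axes.
frame = np.arange(4 * 4 * 2 * 3, dtype=float).reshape(4, 4, 2, 3)

# Flatten into the 96-dimensional input vector x_t for the LSTM encoder.
x_t = frame.reshape(-1)
```

The reshape preserves all taxel readings but discards the 4 × 4 spatial layout, which is the trade-off the ConvLSTM discussion above refers to.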

The absolute value of the signals has little meaning on our robot platform because it depends on the alignment between sensing points and the contact surfaces. Thus, to reduce alignment-dependent variance, we used the difference between two consecutive observations as x_t.

We compared our method with a manually designed baseline for the picking task. We expected our method to be more

The baseline used a gradient: the difference of the con-

Table 1 presents the picking performance achieved using a

≥ 80% for all objects. Fig. 8 shows examples of the successful and failed cases of picking up objects.

Table 2 presents the prediction accuracies of our models, which were used to evaluate the model performance. Here, Accuracy represents the percentage of agreement between the predicted labels and the ground truth. Precision represents the percentage of timesteps with True (i.e., the target object was fractured) labels among the timesteps the model predicted as True. Recall represents the percentage of timesteps predicted as True by the model among the timesteps with the True label. The F-measure is the harmonic mean of the Precision and Recall scores. Each value is the mean score of fivefold cross-validation.
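The four metrics defined above are standard per-timestep binary-classification quantities and can be computed directly from the predicted and ground-truth label sequences (function name is ours):

```python
def metrics(pred, truth):
    """Accuracy, Precision, Recall, and F-measure over per-timestep
    binary labels. pred, truth: equal-length boolean sequences."""
    tp = sum(p and t for p, t in zip(pred, truth))       # true positives
    fp = sum(p and not t for p, t in zip(pred, truth))   # false positives
    fn = sum(not p and t for p, t in zip(pred, truth))   # false negatives
    tn = sum(not p and not t for p, t in zip(pred, truth))
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure
```

Recall is the critical quantity here: a false negative (a True timestep predicted False) means the gripper keeps squeezing past the fracture point.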

Because our goal was to grasp target objects without fracturing them, the priority for our model was to avoid passing the fracture point. Thus, the Recall score was the most important evaluation metric. As indicated by

and rolling pizza dough [13].

We investigated how the objects were picked. Fig. 9 shows tactile signals for successful and failed trials of tofu picking using Proposed1 and Proposed2. As shown in Fig. 9(a), when the robot grasped tofu without fracturing it, the tactile peak point did not appear before the moment the gripper stopped moving, and the peak points were aligned for all the sensor data in the time-axis direction. Although some peak points appeared (red points), we can ignore them because the forces were insufficient to break the tofu, as described in Section V-B. However, as shown in Fig. 9(b), when the robot grasped the tofu too strongly and fractured it, some tactile values of the gripper exceeded their peaks and began to decrease.

Fig. 10 shows the prediction accuracy with respect to the number of training samples for Proposed1. We prepared 160 banana pieces: 120 for training, 20 for validation, and 20 for testing. The accuracy increased with the number of training samples; however, it stopped increasing significantly once the number of samples reached 50.

Although our method could manipulate food ingredients without fracture, there is room for further investigation and improvement. First, although grasping was challenging because the softness varies even within the same ingredient, as shown in this study, we need to extend the scalability of our method by tackling a wider variety of food objects and brands. Furthermore, we will investigate whether our fracture-anticipation network can be used for different tasks, for example, whether the robot can continue grasping the food object without slippage and fracture even when the gripper is moved faster or shaken. Our method could also learn the picking strategy more efficiently; to this end, we will consider transfer-learning approaches [36], [37] for our fracture-anticipation network. Finally, it would be