Deformation-Aware Data-Driven Grasp Synthesis

Abstract: Grasp synthesis for 3D deformable objects remains a little-explored topic, with most works aiming to minimize deformations. However, deformations are not necessarily harmful: humans, for example, are able to exploit deformations to generate new potential grasps. Yet how to achieve this on a robot remains an open question. This paper proposes an approach that uses object stiffness information in addition to depth images to synthesize high-quality grasps. We achieve this by incorporating object stiffness as an additional input to a state-of-the-art deep grasp planning network. We also curate a new synthetic dataset of grasps on objects of varying stiffness using the Isaac Gym simulator for training the network. We experimentally validate our approach and compare it against the case where object stiffness is not incorporated, on a total of 2800 grasps in simulation and 420 grasps on a real Franka Emika Panda. The experimental results show a significant improvement in grasp success rate using the proposed approach on a wide range of objects with varying shapes, sizes, and stiffness. Furthermore, we demonstrate that the approach can generate different grasping strategies for different stiffness values, such as pinching for soft objects and caging for hard objects. Together, the results clearly show the value of incorporating stiffness information when grasping objects of varying stiffness. Code and video are available at: https://irobotics.aalto.fi/defggcnn/.

I. INTRODUCTION
In the last decade, advances in robotic grasping have enabled robots to autonomously grasp an unprecedented range of objects. However, most works on grasp synthesis still assume specific object properties such as uniform friction or rigidity. These assumptions do not hold for multi-material [2] or deformable objects and can lead to grasp failures in real-world scenarios.
Grasping non-rigid objects is difficult because objects deform under interaction forces, which means that the 3D contact locations also depend on the forces exerted on the object. Furthermore, the effect of the deformation varies across deformable objects and tasks. In some scenarios, such as grasping a water bottle, it is useful to minimize the object's deformation so as not to spill the liquid. For other objects, such as the triangular-shaped object shown in Fig. 1, one can take advantage of the deformation to grasp them successfully. To date, most existing works focus only on minimizing the object deformation [3]-[7]. Although a few works take advantage of the deformation [8], [9], they mainly focus on control strategies given an initial grasp configuration. Thus, it remains an open question how object stiffness affects the choice of grasp configuration and how to harness object deformation to generate better grasps.
Fig. 1: The robot executes the same grasp candidate on two objects with similar shapes but different stiffness. (a) The rigid object pushes itself out of the gripper. (b) The deformable object gently conforms to the shape of the gripper, which leads to a successful grasp.
To address these open issues, we propose to incorporate stiffness as an additional input to a state-of-the-art deep grasp planning pipeline, as shown in Fig. 2. Given an input depth image and a stiffness image, our system generates a grasp candidate and a corresponding grasp quality for every pixel. Combined with the depth information, the model outputs can be reprojected into 3D space, allowing a robot to execute a generated grasp in the real world.
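A minimal sketch of this reprojection, assuming an undistorted pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a known 4x4 camera-to-world transform (for a wrist camera this would typically come from the robot's kinematics and a hand-eye calibration). The function names and the example camera pose are hypothetical and not taken from the authors' code.

```python
import numpy as np


def pixel_to_camera_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with measured depth (metres) into the camera frame.

    Assumes an undistorted pinhole camera: fx, fy are focal lengths in pixels,
    (cx, cy) is the principal point in pixels.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


def camera_to_world(p_cam, T_world_cam):
    """Express a camera-frame point in the world frame using a 4x4 homogeneous transform."""
    return (T_world_cam @ np.append(p_cam, 1.0))[:3]


# Example: grasp centre at pixel (380, 240), 0.5 m from a camera with fx = fy = 600.
p_cam = pixel_to_camera_point(380, 240, 0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 1.0]  # hypothetical camera pose: pure 1 m offset along z
p_world = camera_to_world(p_cam, T)
```

In a real system the grasp orientation would be mapped through the same transform; only the grasp centre is shown here.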
We experimentally evaluated the proposed grasp synthesis method with a Franka Emika Panda robot in simulation and in the real world, comparing it to a method that ignores stiffness. In simulation, we evaluated 2800 grasps on shake and twist tasks, which measure the grasp's robustness to linear and angular disturbances, respectively. In the real world, we measured the success rate of 420 grasps on 14 objects under three different cases: with the correct stiffness information, assuming all objects are rigid, and assuming all objects are deformable. In both simulation and the real world, the proposed approach improves the grasp success rate. Furthermore, the approach generates different grasping strategies for different stiffness values, such as pinching for soft objects and caging for hard objects, even though no pinch grasps were included in the training data.
In summary, the main contributions of this paper are:
• The first generative stiffness-aware deep grasping approach that adapts the grasp location depending on the object's stiffness.
• The first stiffness-dependent image-based grasping dataset, consisting of labeled top-down grasps on objects with varying stiffness.
• A thorough empirical evaluation of the proposed method showing, both in simulation and on real hardware, improvements in grasp ranking and grasp success rate over a method that ignores stiffness.

II. RELATED WORK
To put our work in context, we review three complementary viewpoints: grasping of deformable objects, data-driven grasp synthesis, and simulation of deformable object interactions.

A. Grasping of Deformable Objects
Most approaches for grasping deformable objects aim to minimize the deformation. For a given grasp, the minimization can be performed on-line by employing a control strategy that regulates the force at each contact [6], [7]. When planning grasps, minimum deformation can be achieved by placing the fingers at locations with maximal local stiffness, determined, e.g., using simulation [3]. The deformation can also be integrated as an additional component of a wrench-space grasp quality metric [4].
In contrast to minimizing deformation, some works, like this paper, propose to utilize the object deformation. Analytical grasp planning approaches in this line of study include bounded force closure [17], which guarantees force closure under a bounded external force, and deform closure [18], which generalizes form closure to deformable objects with frictionless contacts. In the on-line case, finger displacements can be regulated to retain force closure; this was originally proposed for planar objects [8] and later extended to 3D [9] by using the Finite Element Method (FEM) to continuously model changes in shape and contact geometry during object lifting.
Although [8], [9] utilize object deformations for stable grasping, they focus on regulating either the force or the displacement of the fingers given an initial grasp configuration. In contrast, our work focuses on choosing the grasp configuration itself by taking advantage of the object deformation.

B. Data-driven Grasp Synthesis
Recent rapid advances in deep learning have shifted the paradigm of robotic grasping from analytical methods to data-driven ones. The main reason for this shift is that data-driven methods have been shown to generate grasps that typically achieve a high success rate on a wide range of objects within seconds, much faster than analytical methods [19]-[24]. For example, Mahler et al. [20] used a dataset of millions of synthetic antipodal top-down grasps to train a Grasp Quality Convolutional Neural Network (GQ-CNN) that predicts the probability of grasp success from depth images. GQ-CNN was further improved through on-policy data and a fully convolutional network (FCN) structure, called FC-GQ-CNN [25]. The FCN structure has recently been found to perform well in grasp synthesis [21], [25]-[27] because it efficiently generates dense, pixel-wise predictions for an input image.
Although the aforementioned works achieve impressive results on rigid objects, none of them explicitly investigates their applicability to deformable objects, especially 3D deformable objects. In this work, we tackle this problem by incorporating object deformation into a state-of-the-art deep data-driven grasp planning pipeline.

C. Soft-body Simulation
Data-driven grasping requires training data, which in this work we generate in simulation. Simulating the dynamics of deformable objects relies heavily on their geometric representation; for instance, a particle representation is a good choice for simulating fluid dynamics. Yin et al. [28] present three primary deformable object modelling approaches, Mass-Spring Systems (MSS), Position-Based Dynamics (PBD), and FEM, together with their limitations. In this work, we use FEM because it is often used to model 3D objects such as food [29] and tissue [30] and, compared to other modeling approaches, offers a more physically accurate representation of a deformable object in a continuous domain, at the expense of higher computational cost.
Robotic simulators that support FEM include PyBullet [31], SOFA [32], and NVIDIA's recent version of the Isaac Gym simulator [33], which supports soft-body simulation through the NVIDIA Flex backend. We chose Isaac Gym because it combines the advantages of the other two. Specifically, similar to SOFA, Isaac Gym includes a co-rotational linear model for precise modeling and simulation of object deformation under interaction. Furthermore, similar to PyBullet, Isaac Gym provides robot-related functionality, making it easier to build robotic applications. Huang et al. [16] also provide a grasping framework to automatically perform and evaluate grasp tests on an arbitrary target object. We use this framework to generate training data and to test grasps.

III. PROBLEM FORMULATION
This work addresses the problem of generating antipodal grasps on unknown objects with different stiffnesses lying on a supporting surface. The goal is to compute a grasp for each pixel in the depth image while taking object stiffness into account. More formally, we train a model $M$ that takes as input a depth image $I_d$ and a stiffness image $I_s$ and produces a grasp map $G = M(I_d, I_s)$ that contains grasp quality and grasp parameters (orientation, gripper width) for grasps centered at each pixel of the input.
To achieve this goal, we propose to use the Deep Neural Network (DNN) in Fig. 2 to map from depth and stiffness images to grasps G in the image, which we can easily transform to the real world using known coordinate transforms.

A. Network
Our solution is based on GG-CNN because it is orders of magnitude smaller than other recent grasping networks, which makes it faster to train and evaluate, while it achieves state-of-the-art results in grasping rigid objects.
We propose Def-GG-CNN (Fig. 2), a fully convolutional network that synthesizes grasps on objects with different stiffness, including deformable ones. To enable Def-GG-CNN to learn stiffness-dependent grasping strategies, it takes an additional stiffness image as input alongside the depth image. The stiffness image represents the Young's modulus of the object at each pixel. The output of the network is the grasp map G, which represents the grasp quality and the gripper parameters (orientation, gripper width) for each pixel of the depth image. The network is trained with supervised learning on a synthetic dataset, explained further in Section V.

B. Grasp map representation
Each pixel in the grasp map G represents a 4-DoF grasp. We use the same representation of G as defined in [21]. As shown in Fig. 2, the grasp map G consists of three images: grasp quality Q, orientation φ, and gripper width W.
Q denotes the quality of a grasp centered at each pixel. The quality is a scalar in [0, 1], where higher values indicate better grasps. φ is the orientation image, representing the pixel-wise orientation of a grasp around the image normal. Because an antipodal grasp is symmetric under a rotation of 180 degrees, we limit the orientation to [−π/2, π/2] radians. Finally, W is the width image, which describes the pixel-wise gripper width in the range [0, 150] pixels. We transform the gripper width from pixels to real-world units using the measured depth and the camera parameters.
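The following sketch illustrates how such a grasp map might be decoded into a single grasp: it picks the pixel with maximal Q, wraps the orientation into the antipodal range, and converts the width from pixels to metres. The metric conversion assumes a pinhole camera and a grasp roughly parallel to the image plane at the measured depth; the function names are hypothetical, and the wrap uses the half-open interval [−π/2, π/2), which is equivalent because −π/2 and π/2 describe the same antipodal grasp.

```python
import numpy as np


def wrap_antipodal_angle(theta):
    """Wrap an angle into [-pi/2, pi/2): an antipodal grasp rotated by pi is the same grasp."""
    return (theta + np.pi / 2) % np.pi - np.pi / 2


def best_grasp(Q, phi, W, depth, fx):
    """Return ((u, v), angle, width_m, quality) for the highest-quality pixel.

    Q, phi, W, depth: HxW arrays; fx: horizontal focal length in pixels.
    """
    v, u = np.unravel_index(np.argmax(Q), Q.shape)
    width_px = np.clip(W[v, u], 0.0, 150.0)  # width image range is [0, 150] px
    width_m = width_px * depth[v, u] / fx     # pinhole scaling: size = pixels * depth / focal
    return (u, v), float(wrap_antipodal_angle(phi[v, u])), float(width_m), float(Q[v, u])


# Toy grasp map: one good pixel at row 2, column 3.
Q = np.zeros((4, 5))
Q[2, 3] = 0.9
phi = np.full((4, 5), 0.3)
W = np.full((4, 5), 60.0)
depth = np.full((4, 5), 0.5)
(u, v), angle, width_m, quality = best_grasp(Q, phi, W, depth, fx=600.0)
```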

V. DATASET
To train Def-GG-CNN, we need a dataset consisting of depth, stiffness, quality, orientation, and width images. To date, there exists no such dataset, and, thus, we opted to curate our own synthetic dataset.
The pipeline for generating training data is visualized in Fig. 3. We first convert the triangular mesh of an object into a tetrahedral mesh using fTetWild [34] and load the tetrahedral mesh into the Isaac Gym simulator to enable its soft-body simulation. The stiffness of an object can then be varied by adjusting the material parameters, i.e., Young's modulus and Poisson's ratio. Next, using the object's triangular mesh, we sample grasp candidates, which are then evaluated in Isaac Gym using the quality metric described below. Based on the performance of the grasps, we label them, convert them to the desired representation, and store them in the training dataset.
Depth and stiffness input: We captured depth images of the target objects with a virtual camera viewing the scene top-down. To model variable object stiffness, we used four values of Young's modulus between $2 \cdot 10^{4}$ and $2 \cdot 10^{9}$. The Young's modulus is normalized to the [0, 1] range, and the corresponding stiffness value is assigned to every pixel of the stiffness image that the object occupies.
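The text does not state how the Young's modulus is normalized. The sketch below assumes a log-scale min-max normalization over $[2 \cdot 10^{4}, 2 \cdot 10^{9}]$, because a linear scaling would compress the three softest values to nearly 0; under this assumption, $2 \cdot 10^{4}$, $2 \cdot 10^{5}$, $2 \cdot 10^{6}$, and $2 \cdot 10^{9}$ map to 0, 0.2, 0.4, and 1.0. The background value of 0 and the channel order are also assumptions; note that 0 then coincides with the softest stiffness, which a real implementation may want to disambiguate.

```python
import numpy as np

E_MIN, E_MAX = 2e4, 2e9  # range of Young's modulus values used in the dataset


def normalize_youngs_modulus(E):
    """Map Young's modulus to [0, 1] on a log scale (the scaling is an assumption)."""
    return (np.log10(E) - np.log10(E_MIN)) / (np.log10(E_MAX) - np.log10(E_MIN))


def stiffness_image(object_mask, E):
    """Assign the normalized stiffness to every object pixel; background stays 0 (assumed)."""
    img = np.zeros(object_mask.shape, dtype=np.float32)
    img[object_mask] = normalize_youngs_modulus(E)
    return img


def network_input(depth, object_mask, E):
    """Stack depth and stiffness into a 2-channel array (channel order is an assumption)."""
    return np.stack([depth.astype(np.float32), stiffness_image(object_mask, E)], axis=0)


# A single object pixel with E = 2e6 in a 3x3 scene at 0.6 m depth.
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True
x = network_input(np.full((3, 3), 0.6), mask, 2e6)
```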
Grasp candidates: Grasps are sampled with an antipodal grasp sampler, yielding approximately 200 grasp candidates per target object. All grasps that collide with the mesh are filtered out, resulting in a final set of 25 to 40 collision-free grasps per object.
Fig. 3: The training data generation pipeline.
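The core test behind an antipodal sampler can be sketched as follows. This is the generic friction-cone condition for two contacts, not necessarily the exact sampler used here: the line connecting the contacts must lie inside both friction cones, whose half-angle is arctan(mu). The friction coefficient of 0.5 is an arbitrary example value, and collision filtering is not shown.

```python
import numpy as np


def is_antipodal(p1, n1, p2, n2, mu=0.5):
    """Check the antipodal condition for two point contacts.

    p1, p2: contact points; n1, n2: outward unit surface normals;
    mu: Coulomb friction coefficient (illustrative value).
    """
    axis = p2 - p1
    dist = np.linalg.norm(axis)
    if dist == 0:
        return False
    axis = axis / dist
    half_angle = np.arctan(mu)
    # Finger 1 pushes along +axis, which must stay within the cone around the inward normal -n1.
    ang1 = np.arccos(np.clip(np.dot(axis, -n1), -1.0, 1.0))
    # Finger 2 pushes along -axis, which must stay within the cone around the inward normal -n2.
    ang2 = np.arccos(np.clip(np.dot(-axis, -n2), -1.0, 1.0))
    return bool(ang1 <= half_angle and ang2 <= half_angle)


# Opposite faces of a box form an antipodal pair; adjacent faces do not.
p1, n1 = np.array([-1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])
p2, n2 = np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
p3, n3 = np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0, 0.0])
```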
Quality metrics: None of the standard grasp quality metrics, such as the Ferrari & Canny L1 metric [35], is directly applicable to both rigid and deformable objects. As a quality metric, we therefore use a shake task, which measures how easily an object is displaced in hand under linear accelerations. A higher value means a better grasp, as it indicates that the grasp can withstand higher accelerations. We use this metric to label a grasp as positive or negative by checking whether the linear acceleration it can withstand is above or below a threshold. Specifically, after successfully lifting the object with a grasp candidate, we linearly increase the acceleration of the grasp along 16 directions until the gripper loses contact with the object or the acceleration reaches the upper limit of 50 m/s². We then compute the average acceleration over all directions; if this value is higher than the threshold of 25 m/s², we label the grasp candidate as positive.
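The labeling rule can be sketched as below. The simulator is replaced by a hypothetical callback `holds(direction, acc)` that reports whether the gripper still has contact at a given acceleration; the 16 directions (here a Fibonacci lattice on the sphere) and the 1 m/s² ramp step are illustrative assumptions, since the text does not specify them. Only the 50 m/s² limit, the 25 m/s² threshold, and the averaging come from the description above.

```python
import numpy as np

ACC_LIMIT = 50.0      # m/s^2: upper acceleration limit of the shake ramp
ACC_THRESHOLD = 25.0  # m/s^2: label threshold on the mean over directions


def sphere_directions(n=16):
    """Roughly uniform unit vectors (Fibonacci lattice); the actual 16 directions are not specified."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z ** 2)
    theta = np.pi * (1.0 + np.sqrt(5.0)) * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)


def max_withstood_acceleration(holds, direction, step=1.0):
    """Ramp the acceleration linearly until contact is lost or ACC_LIMIT is reached.

    holds(direction, acc) -> bool is a hypothetical simulator query returning True
    while the gripper keeps contact with the object at acceleration `acc`.
    """
    acc = 0.0
    while acc + step <= ACC_LIMIT and holds(direction, acc + step):
        acc += step
    return acc


def label_grasp(holds, n_directions=16):
    """Return (is_positive, mean_acceleration) for one lifted grasp candidate."""
    accs = [max_withstood_acceleration(holds, d) for d in sphere_directions(n_directions)]
    mean_acc = float(np.mean(accs))
    return mean_acc > ACC_THRESHOLD, mean_acc


# Toy stand-in: contact is lost above 30 m/s^2 in every direction.
positive, mean_acc = label_grasp(lambda d, a: a <= 30.0)
```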
Ground-truth grasp map: To further simplify data generation, we only use positively labeled grasps as ground-truth grasps for training the network. To generate the ground-truth grasp maps, we first transform all grasps to image space by representing them as rectangles in the image, as displayed in Fig. 4. Four parameters define each rectangle: grasp center, grasp orientation, grasp width, and finger height. We then use the rectangles as image masks to generate the ground-truth grasp maps G. Specifically, all pixels of the quality image Q, angle image φ, and width image W within a rectangle are set to the values of the corresponding grasp, with the quality derived from the shake task, whereas all pixels outside the rectangles are set to invalid.
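A minimal sketch of this rasterization, assuming the rectangle's long side spans the gripper width along the grasp axis and its short side spans the finger height. The encoding of "invalid" (Q = 0 and NaN for orientation and width) and the rule that later grasps overwrite earlier ones where rectangles overlap are assumptions of this sketch, not details given in the text.

```python
import numpy as np


def rectangle_mask(shape, center, angle, width, finger_height):
    """Boolean mask of a rotated grasp rectangle in an image of size `shape` = (H, W).

    center: (u, v) pixel coordinates; angle: grasp orientation in radians;
    width: gripper opening in pixels, measured along the grasp axis;
    finger_height: rectangle extent across the grasp axis, in pixels.
    """
    h, w = shape
    vv, uu = np.mgrid[0:h, 0:w]
    du, dv = uu - center[0], vv - center[1]
    along = du * np.cos(angle) + dv * np.sin(angle)    # coordinate along the grasp axis
    across = -du * np.sin(angle) + dv * np.cos(angle)  # coordinate across the grasp axis
    return (np.abs(along) <= width / 2) & (np.abs(across) <= finger_height / 2)


def ground_truth_maps(shape, grasps):
    """Rasterize positive grasps into Q, phi, W maps.

    grasps: iterable of (center, angle, width_px, finger_height_px, quality).
    Invalid pixels get Q = 0 and NaN orientation/width (assumed encoding).
    """
    Q = np.zeros(shape, dtype=np.float32)
    phi = np.full(shape, np.nan, dtype=np.float32)
    W = np.full(shape, np.nan, dtype=np.float32)
    for center, angle, width, finger_height, quality in grasps:
        m = rectangle_mask(shape, center, angle, width, finger_height)
        Q[m] = quality
        phi[m] = angle
        W[m] = width
    return Q, phi, W


# One horizontal grasp, 8 px wide with 4 px fingers, centred at pixel (10, 10).
Q, phi, W = ground_truth_maps((20, 20), [((10, 10), 0.0, 8, 4, 1.0)])
```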
Training dataset: We generate and label grasps on 30 objects for the training dataset: 13 primitive objects provided in Isaac Gym, 5 objects from the YCB dataset [36], and 12 objects with adversarial geometry from the EGAD! dataset [37]. Because we set the stiffness of each object to four different values, the training set contains 120 object-stiffness combinations in total. We use the Franka Emika Panda gripper model to execute grasps in the simulator. To counteract the small size of the training set, we further augment the dataset with random crops, zooms, and rotations, creating a set of 5400 depth and stiffness images with 27000 corresponding labeled grasp maps.
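One subtlety of this augmentation is that geometric transforms must be applied consistently to all input and label images, and rotating the image also rotates every grasp, so the orientation map has to be updated. The sketch below shows a random crop and a random 90-degree rotation only; restricting rotations to multiples of 90 degrees (to avoid interpolation) and the function names are assumptions of this sketch.

```python
import numpy as np


def wrap_angle(theta):
    """Wrap grasp orientations into [-pi/2, pi/2) (antipodal grasps repeat every pi)."""
    return (theta + np.pi / 2) % np.pi - np.pi / 2


def augment(depth, stiffness, Q, phi, W, crop, rng):
    """Random crop plus random 90-degree rotation, applied identically to all maps.

    Rotating the image rotates every grasp by k * pi/2, so the orientation map is
    shifted accordingly (mod pi). Cropping and rotation leave pixel widths
    unchanged; a zoom by factor s would also require scaling W by s.
    """
    h, w = depth.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    k = int(rng.integers(0, 4))

    def transform(img):
        return np.rot90(img[top:top + crop, left:left + crop], k)

    phi_rotated = wrap_angle(phi - k * np.pi / 2)
    return transform(depth), transform(stiffness), transform(Q), transform(phi_rotated), transform(W)


rng = np.random.default_rng(0)
depth = np.arange(16.0).reshape(4, 4)
phi = np.full((4, 4), 0.2)
d, s, q, p, w = augment(depth, depth, depth, phi, depth, crop=2, rng=rng)
```

Because antipodal orientations are only defined modulo π, shifting by +k·π/2 or −k·π/2 yields the same wrapped result, so the rotation direction convention does not affect the labels.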

VI. EXPERIMENTS AND RESULTS
The experiments address the following four questions:
• Can Def-GG-CNN synthesize high-quality grasps for deformable objects that succeed in simulation?
• Is Def-GG-CNN robust against errors in the stiffness input?
• Can Def-GG-CNN, trained purely on synthetic data, generate successful grasps in the real world?
• How does the stiffness information affect the choice of grasp configuration?

A. Grasping in Simulation
To investigate the quality of synthesized grasps in simulation, we evaluated the approach on two sets of objects: 7 common objects shown in Fig. 5, and 28 adversarial objects from the recent EGAD! test dataset [37]. For each object and stiffness, we evaluated the five highest-quality grasps on a shake task and a twist task. While the shake task measures how easily an object slips out of the gripper under linear accelerations, the twist task measures the same under angular accelerations. A grasp is successful if the object stays in the gripper during the whole procedure and the grasp withstands the linear acceleration limit of 25 m/s² in the shake task and the angular acceleration limit of 500 rad/s² in the twist task. This allows us to quantify how the generated grasps behave under different disturbances. To demonstrate the importance and effect of the stiffness input, we compared the stiffness-aware grasps against grasps generated by an identical approach without stiffness information. In total, this amounts to 1400 grasps per method. Table I shows the simulation results on both test sets. The proposed approach, which takes the stiffness input into account, achieves a higher grasp success rate across all object sets and disturbances. Notably, although we trained the network on quality labels from the shake task only, the generated grasps also performed well on the twist task.
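The reported grasp counts follow from this protocol; the breakdown below is our reconstruction from the stated numbers (35 test objects, four stiffness values, five grasps, two tasks):

```latex
N_{\text{method}} = \underbrace{(7 + 28)}_{\text{objects}} \times \underbrace{4}_{\text{stiffnesses}} \times \underbrace{5}_{\text{grasps}} \times \underbrace{2}_{\text{tasks}} = 1400,
\qquad
N_{\text{total}} = 2\ \text{methods} \times 1400 = 2800 .
```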
Focusing on the shake task results, the average grasp success rate of our approach over all stiffnesses is 35% higher than the baseline on the Common objects and 11% higher on the EGAD! objects. Moreover, the performance of the baseline deteriorates significantly when moving from a high to a low Young's modulus. For instance, on the Common test set, the relative performance drop of the baseline when changing the Young's modulus from $2 \cdot 10^{6}$ to $2 \cdot 10^{5}$ is 10%, and from $2 \cdot 10^{6}$ to $2 \cdot 10^{4}$ it is 26%. This decline is much larger than the 0% and 13% drops of our approach. Similar performance differences are observed on the EGAD! test set. The primary reason for the baseline's larger performance drop is that it generates the same grasps for a target object regardless of its stiffness. Although these grasps often picked the objects successfully, the objects usually slipped out of the gripper during the shake or twist task. In contrast, the network that took the stiffness input into account learned to avoid areas with a high probability of slippage, resulting in a higher grasp success rate. Both approaches also show some deterioration at the highest stiffness ($2 \cdot 10^{9}$). The primary reason is that both test sets contain objects with very complex shapes that are extremely hard to grasp when rigid. This observation strengthens the idea of taking advantage of object deformation to successfully grasp complex-shaped objects.
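The text does not define the relative performance drop explicitly. A natural reading, assumed here, normalizes the decrease in success rate by the success rate at the reference stiffness $E_{\mathrm{ref}} = 2 \cdot 10^{6}$, where $S(E)$ denotes the success rate at Young's modulus $E$. With hypothetical values $S(E_{\mathrm{ref}}) = 0.80$ and $S(E) = 0.72$, this gives a 10% relative drop:

```latex
\Delta_{\mathrm{rel}}(E) = \frac{S(E_{\mathrm{ref}}) - S(E)}{S(E_{\mathrm{ref}})},
\qquad
\text{e.g.}\quad \frac{0.80 - 0.72}{0.80} = 0.10 .
```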
We also evaluated the models on the multi-material object 7 shown in Fig. 5, where the stiffness of the red part can differ from that of the blue part. When object 7 was assumed to be entirely rigid, the method could not generate any good grasps on it. However, when the red part was assumed to be softer than the blue part, the method generated successful grasps that usually targeted the softer area of the object. This simple example demonstrates the benefit of planning stiffness-aware grasps on irregularly shaped multi-material objects.

B. Sensitivity Analysis
To examine the robustness of our approach to uncertainty, we conducted a sensitivity analysis in which we perturbed the input stiffness images by varying the stiffness parameter, i.e., Young's modulus, across an error range of [-60%, +60%]. We then evaluated the generated grasps on the Common test set under the shake task. Fig. 6 shows the results of the sensitivity analysis. The grasp success rate decreases consistently with increasing error, but small errors in the range of 5-15% have only a limited negative effect on performance. This experiment indicates that our method is robust to small errors in the expected stiffness, which suggests potential for real-world application even when the stiffness is not precisely known.
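A sketch of how perturbed stiffness inputs could be produced for such an analysis. It assumes a log-scale min-max normalization of Young's modulus over $[2 \cdot 10^{4}, 2 \cdot 10^{9}]$ (the text only states that the modulus is normalized to [0, 1]), clipping to [0, 1], and a 15% grid spacing; all three are illustrative choices. Under the log-scale assumption, a relative error $e$ becomes a constant additive shift of $\log_{10}(1+e)/5$ in the normalized input, regardless of the true stiffness.

```python
import numpy as np

E_MIN, E_MAX = 2e4, 2e9  # Young's modulus range used for training


def normalize_youngs_modulus(E):
    """Log-scale min-max normalization to [0, 1] (an assumption of this sketch)."""
    x = (np.log10(E) - np.log10(E_MIN)) / (np.log10(E_MAX) - np.log10(E_MIN))
    return float(np.clip(x, 0.0, 1.0))


def perturbed_stiffness_value(E_true, rel_error):
    """Normalized stiffness fed to the network when E is misestimated by rel_error (e.g. -0.6 ... 0.6)."""
    if rel_error <= -1.0:
        raise ValueError("relative error must be greater than -100% to keep E positive")
    return normalize_youngs_modulus(E_true * (1.0 + rel_error))


def error_grid(max_error=0.6, step=0.15):
    """Symmetric grid of relative errors; the 15% spacing is an illustrative choice."""
    n = int(round(2 * max_error / step)) + 1
    return np.linspace(-max_error, max_error, n)


# Perturbed inputs for an object with E = 2e6 across the error range.
inputs = [perturbed_stiffness_value(2e6, e) for e in error_grid()]
```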

C. Grasp Transfer to Physical Robot
To investigate how well the synthesized grasps perform in the real world, we evaluated the grasp success rate on a Franka Emika Panda equipped with a parallel-jaw gripper. This allows us to study whether grasps generated by the approach, trained only on synthetic data, transfer to real objects. We chose 14 objects (Fig. 7) that represent a high variation in size, shape, and stiffness.
We used an Intel RealSense D345 camera mounted on the robot's wrist to capture the RGB-D image. In addition to the depth image, we need to provide a stiffness image of the object. To obtain it, we segmented the object by subtracting the background and the table from the image and then assigned the same stiffness value to each pixel the object occupies. We manually set the stiffness magnitude of each object according to its perceived stiffness. The best grasp pose is then computed using the proposed method.
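A minimal sketch of this segmentation step, assuming a depth image of the empty table captured from the same camera pose; the 5 mm height threshold and the function names are hypothetical, and the actual segmentation used in the experiments may differ.

```python
import numpy as np


def segment_by_background(depth, background_depth, min_height=0.005):
    """Object mask via depth background subtraction.

    depth, background_depth: HxW depth images in metres taken from the same camera pose,
    the latter of the empty table. Pixels at least `min_height` (illustrative 5 mm)
    closer to the camera than the background are labelled as object; zero
    (invalid) depth readings are ignored.
    """
    valid = (depth > 0) & (background_depth > 0)
    return valid & ((background_depth - depth) > min_height)


def stiffness_image_from_mask(mask, stiffness_value):
    """Assign one manually chosen stiffness value to every object pixel."""
    img = np.zeros(mask.shape, dtype=np.float32)
    img[mask] = stiffness_value
    return img


# Toy scene: a 2x2-pixel object 5 cm above a table 0.6 m from the camera.
background = np.full((4, 4), 0.6)
depth = background.copy()
depth[1:3, 1:3] = 0.55
depth[0, 0] = 0.0  # invalid reading
mask = segment_by_background(depth, background)
stiffness = stiffness_image_from_mask(mask, 0.4)
```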
The robot executed the best grasp by moving to a pre-grasp position approximately 25 mm above the grasp pose. It then moved linearly downwards until it reached the grasp pose or detected contact with the table. From there, the robot closed its gripper, lifted the object, performed a predefined trajectory, and finally placed the object at the goal position. A grasp is considered successful if the robot picks up the object and moves it without dropping it; otherwise, it is unsuccessful.
To single out the effect of the stiffness input on grasp performance for each object, we ran the experiments with three different stiffness inputs: the correct stiffness, high stiffness only, and low stiffness only. For each object and stiffness input, we placed the object randomly ten times and evaluated the best grasp candidate. In total, this setup amounts to 420 grasps on 14 objects.
The experimental results are presented in Fig. 8. The results show that with the correct stiffness information, the grasp success rate of the proposed approach is approximately 17% higher than when the stiffness is assumed to be either high or low. This result indicates that grasps generated on rigid objects do not necessarily transfer successfully to deformable objects and vice versa.
For instance, if we assumed that deformable objects, such as objects 8, 9, 11, and 14, were rigid, many generated grasps targeted specific parts of them, such as the arms or legs of the toy or the wheel of the car. These grasps usually picked up the object successfully but, due to the deformation, dropped it when the robot started to accelerate. The same experiment on objects 6 and 7 showed that grasps generated on the top or bottom of the object usually failed due to the object's elasticity. Conversely, if we assumed that rigid objects, such as objects 1, 3, 4, and 5, were soft, the network generated pinch grasps that failed due to collisions with the object. Some failed grasps are shown in Fig. 10a.
One interesting finding was that, given the correct stiffness information, the method generated different grasping strategies depending on the stiffness of the object, as shown in Fig. 9. Specifically, for the soft sponge (Fig. 9a), the method learned that the grasp quality is high across the whole object thanks to its deformation, which in turn enables pinch grasps. In contrast, for the hard sponge (Fig. 9b), high-quality grasps tend to be generated at the center of the object, with a grasp width almost as large as the object in order to cage it successfully.

D. Discussion
All experimental results show the benefit of generating stiffness-aware grasps. Compared to the case where the stiffness information is ignored, the proposed approach achieves higher grasp success rates. The primary reason for this difference is that the object stiffness helps the network learn where to generate grasps that minimize slippage caused by deformation. When the object's stiffness is ignored, the network generates the same grasps regardless of stiffness. Together, these results support the claim made in [9] that grasps do not transfer well between rigid and deformable objects. Therefore, incorporating object stiffness into robotic grasping pipelines is beneficial when dealing with a wide range of unknown objects.
Another interesting finding is that our approach can generate different grasp types, such as pinch or cage grasps, depending on the stiffness, even though there were no pinch grasps in the training dataset. This behavior is shown for object 6 in Fig. 9. Specifically, the sponge with a low Young's modulus admits pinching behavior, where the fingers press on the object and pinch it, while the hard sponge only admits caging grasps. One potential reason for this behavior is that the network learned that the grasp quality is almost uniform across soft objects thanks to their deformation. Similar behaviors were also reported in [38], where data was collected from 14 robots over the course of two months. However, our approach learned to produce the same behavior from a completely synthetic dataset with orders of magnitude less data. Furthermore, our approach provides more insight into the relationship between object deformation and grasps.

VII. CONCLUSIONS AND FUTURE WORK
Grasping deformable objects has not been studied as extensively as rigid object grasping due to the complexity of modeling and simulating the dynamic behavior of such objects. However, with the rapid development of physics-based simulators that support soft bodies, the research gap between rigid and deformable objects is shrinking. To leverage the capabilities of such simulators and to challenge the rigidity assumption that has dominated robotic grasping, we presented an approach that synthesizes grasps on objects with varying stiffness using a deep neural network trained on purely synthetic data. The key idea of this work is to integrate the object stiffness into the grasp planning pipeline to study the relationship between object deformation and the generated grasps. To train the proposed network, we generated our own training dataset using the Isaac Gym simulator. We demonstrated the performance of the generated grasps through experiments in both simulation and real-world scenarios on a wide range of objects with varying sizes, shapes, and stiffness. The results show a clear improvement in grasp success rate when taking stiffness into account. Furthermore, the proposed approach can generate different grasp strategies depending on object stiffness. Generalization to objects with non-uniform stiffness remains an open question; although the method should be able to account for all variability captured in the training data, this makes simulation quality and efficiency a central bottleneck.
(b) Cage grasp on rigid sponge, $E = 2 \cdot 10^{9}$.
Fig. 9: Stiffness input image, grasp quality map, and best synthesized grasp candidate (indicated by two white fingers). In the stiffness input image, darker colors indicate stiffer objects. In the grasp quality map, red indicates higher quality, and the green point denotes the best grasp.
The idea of exploiting deformations for grasping opens many interesting avenues. Our method relies on FEM simulations, whose computational cost may limit their use in general-purpose systems. Devising analytical quality measures that exploit deformations is therefore an important direction for future work.