Probabilistic Surface Friction Estimation Based on Visual and Haptic Measurements

Accurately modeling local surface properties of objects is crucial to many robotic applications, from grasping to material recognition. Surface properties such as friction are, however, difficult to estimate, as visual observation of the object does not convey enough information about them. Haptic exploration, in contrast, is time consuming, as it only provides information about the explored parts of the object. In this letter, we propose a joint visuo-haptic object model that enables the estimation of the surface friction coefficient over an entire object by exploiting the correlation of visual and haptic information, together with a limited haptic exploration by a robotic arm. We demonstrate the validity of the proposed method by showing its ability to estimate varying friction coefficients on a range of real multi-material objects. Furthermore, we illustrate how the estimated friction coefficients can improve the grasping success rate by guiding a grasp planner toward high-friction areas.


I. INTRODUCTION
Nowadays, robots are used extensively to perform various tasks, from simple pick-and-place to sophisticated object manipulation, in complex environments from factory floors to hospitals. For such tasks, robots are required to interact with, and adapt to, unknown environments and objects. In order to successfully accomplish these tasks, robots need to identify various properties of the objects to be handled. For these reasons, identifying object models that can represent the properties of objects has become a crucial issue in robotics.
Many object-modelling approaches have focused on object shape and geometry by utilizing vision [1]–[3]. However, other physical properties also play an important role in characterizing object behavior during interaction and handling. In particular, surface properties such as friction, texture, and roughness are vital for manipulation planning.
Various methods have been proposed to learn object surface properties from vision [4]–[6] or haptic feedback [7]–[11]. The combination of both visual and haptic cues [12] has also been proposed, similar to human perception [13]. However, most published works assume the surface properties to be identical across the whole surface of the object. This assumption does not hold for many real objects, since objects often consist of multiple materials.
To address this, we propose a method to estimate the surface properties of a multi-material object by combining a visual prior with haptic feedback obtained while performing haptic exploration on the object. We focus on one property, the surface friction coefficient, but the proposed method could be applied to other properties such as texture and roughness. The approach is based on the assumption that visual similarity implies similarity of surface properties. By measuring a property directly using haptic exploration over a small part of the object, the joint distribution of visual and haptic features can be constructed. Using the joint distribution, the measurement can then be generalized over all parts of the object that are visible. The inference allows recovering both the expected value of friction for each part as well as a respective measure of prediction confidence.

Fig. 1: After the shape of an object has been captured from a camera, surface friction on a small part of the object is estimated using haptic exploration. The friction over the entire object is then predicted by coupling visual information with the local haptic measurements.
The main contributions of this paper are:
• a probabilistic method to estimate the object friction coefficient based on a visual prior and haptic feedback, without the restrictive assumption that objects have uniform friction everywhere;
• a set of experiments on a physical robot showing the proposed method working on a wide range of objects, including multi-material objects;
• a case study demonstrating the ability of the estimated friction coefficients to guide grasp planning towards areas of high friction, improving grasping success rate.

II. RELATED WORK

A. Friction estimation from vision
In the context of friction estimation from vision, most recent improvements come from the adoption of deep learning. The majority of works focus on recognizing the material from images and then assigning a friction coefficient based on a material dataset. For instance, Zhang et al. [4] presented a material recognition method based on a deep reflectance code that encodes information about the physical properties of a surface from reflectance measurements. The predicted friction coefficient of a sample is then assigned as the average of the friction coefficients of corresponding samples in the dataset. Another approach for material recognition was proposed by Xue et al. [5], who developed a deep neural network architecture using angular variation features. Brandão et al. [6] proposed a solution that combines a state-of-the-art deep Convolutional Neural Network (CNN) architecture, predicting broad material classes from images, with known distributions of material friction. The predicted friction coefficient is then used to plan the locomotion of a biped robot on different terrains.
One downside of the aforementioned approaches is that incorrect material recognition leads to incorrect friction coefficient estimation. As these works rely on different visual features to recognize the materials, the recognition result depends heavily on the quality of the visual input. However, vision is often impaired by occlusions, lighting conditions, and the location of the sensor. Furthermore, even assuming the visual input to be perfectly gathered, these approaches may fail in cases where 1) different visual features correspond to the same friction coefficient, or 2) the same visual features correspond to different friction coefficients. In this paper, we overcome these limitations by combining the visual input with haptic feedback to directly estimate the friction coefficients of an object. The addition of haptic feedback offers access to information that is hardly perceptible visually. In addition, the aforementioned works only estimate surface properties for outdoor scenes in the mobile robotics domain, while in this work we target household objects.

B. Friction estimation from haptic feedback
The idea of using exploratory actions to estimate the physical properties of objects has been pursued in many works. One of the earliest works on friction estimation from haptic feedback was proposed by Yoshikawa et al. [7], who described a method to estimate the friction distribution and the centre of friction of an object by pushing it with a mobile manipulator. Similar works on estimating the friction coefficient employed different exploratory actions such as pushing [14], pressing [15], or lateral sliding [8]–[11]. However, all these works are only valid under the assumption that the surface properties are identical across the whole object surface. This limitation makes it difficult to apply the methods to a wider range of objects, including multi-material objects, something that we address in this work. Rosales et al. [12] attempted to lift this limitation by proposing a representation consisting of both shape and friction, gathered through an exploration strategy. They then used a Gaussian Process to approximate the distribution of the friction coefficient over the surface. However, the results presented in that work show that the friction coefficient is only estimated for the regions explored by the robot; unexplored areas of the object surface are assigned an uninformative default value. Our method, on the other hand, also estimates the friction coefficient of unexplored areas based on information gathered from explored areas. Additionally, [12] only considers single-material objects, such as a paperboard box and a metallic can, for its experimental evaluation. In this work, we experimentally evaluate our method on both single-material and multi-material objects.

III. PROBLEM FORMULATION
Identifying the friction coefficients of an unknown object is difficult. A typical approach is to utilize haptic sensory feedback. However, haptic sensing is often noisy and unreliable, and introduces practical issues such as low durability, high cost, and limited compatibility with other sensors [16]. Another way of gathering data for friction estimation is visual sensing. Although visual feedback is cheap and intuitive, it does not always provide enough information to identify the friction coefficient of an unknown object. In this work, we address the problem of estimating the friction coefficient of unknown objects lying on a supporting plane by combining visual and haptic sensing. The goal of this paper is to capture the correlation between the known visual and haptic features of the target object and, based on that correlation, to extrapolate the unknown haptic features from the visual features.
Visual information about the scene is obtained by an RGB-D camera whose pose is known relative to the robot. Haptic feedback is gathered while performing exploratory actions on the target object using a robotic arm equipped with a plastic finger and a force/torque sensor. For ease of reference, the regions touched by the robot during the haptic exploration are called explored regions, and those that are not touched are called unexplored regions. Let V and H denote visual and haptic features, respectively. For the explored regions, both V and H are known, while for unexplored regions only V is known. The goal is to infer the haptic feature H from the visual feature V of unexplored regions based on the joint (V, H) model of the explored regions. In other words, the objective is to find the conditional probability P(H | V) for unexplored regions.
To this end, we propose to model the joint distribution P(V, H) by fitting a Gaussian Mixture Model (GMM) over the visuo-haptic features of all points in the explored regions. More precisely, for an object with n materials, we build the C-component GMM that best fits the visuo-haptic data from the explored regions, with one component for each of the n materials, plus one background component describing an uninformative visuo-haptic prior, i.e., C = n + 1. Formally, we build a multivariate GMM M to estimate the joint probability distribution P(V, H) from the data gathered in the explored regions, i.e.,

$$P(V, H) \approx \sum_{c=1}^{C} \pi_c\, \mathcal{N}\!\left(\begin{bmatrix} V \\ H \end{bmatrix} \,\middle|\, \mu_c, \Sigma_c\right),$$

where $\pi_c$, $\mu_c$, and $\Sigma_c$ denote the prior probability, mean, and covariance of the c-th Gaussian component, respectively. Let us decompose the GMM parameters $\mu_c$ and $\Sigma_c$ into their visual and haptic blocks as

$$\mu_c = \begin{bmatrix} \mu_c^{V} \\ \mu_c^{H} \end{bmatrix}, \qquad \Sigma_c = \begin{bmatrix} \Sigma_c^{V} & \Sigma_c^{VH} \\ \Sigma_c^{HV} & \Sigma_c^{H} \end{bmatrix}.$$

After the model M is fitted to a given dataset, Gaussian Mixture Regression (GMR) allows us to estimate the haptic feature $H_i$ at each data point $i = 1, \ldots, N$ in the unexplored regions given the input $V_i$, by means of the conditional probability

$$P(H \mid V) = \sum_{c=1}^{C} h_c(V)\, \mathcal{N}\!\left(H \,\middle|\, \hat{\mu}_c^{H}(V), \hat{\Sigma}_c^{H}\right).$$

Then, given an input $V_i$, the mean $\hat{\mu}_i^{H}$ and covariance $\hat{\Sigma}_i^{H}$ of its corresponding output $H_i$ are computed by

$$\hat{\mu}_i^{H} = \sum_{c=1}^{C} h_c(V_i)\, \hat{\mu}_c^{H}(V_i), \qquad \hat{\Sigma}_i^{H} = \sum_{c=1}^{C} h_c(V_i)\left(\hat{\Sigma}_c^{H} + \hat{\mu}_c^{H}(V_i)\, \hat{\mu}_c^{H}(V_i)^{\top}\right) - \hat{\mu}_i^{H} (\hat{\mu}_i^{H})^{\top},$$

where

$$\hat{\mu}_c^{H}(V) = \mu_c^{H} + \Sigma_c^{HV} (\Sigma_c^{V})^{-1} (V - \mu_c^{V}), \quad \hat{\Sigma}_c^{H} = \Sigma_c^{H} - \Sigma_c^{HV} (\Sigma_c^{V})^{-1} \Sigma_c^{VH}, \quad h_c(V) = \frac{\pi_c\, \mathcal{N}(V \mid \mu_c^{V}, \Sigma_c^{V})}{\sum_{k=1}^{C} \pi_k\, \mathcal{N}(V \mid \mu_k^{V}, \Sigma_k^{V})}.$$

IV. IMPLEMENTATION

The system pipeline shown in Fig. 2 consists of: (i) filtering and pre-segmenting the real objects, (ii) conducting a haptic exploration process to gather haptic data and couple it with visual data, (iii) building the variable friction model from the visuo-haptic data, and (iv) inferring the friction coefficients and inference confidence of unexplored regions from the modelled distribution.

A. Visual filtering and pre-segmentation
The scene in the original point-cloud Y contains a target object lying on a table. As the pose of the camera with respect to the robot is known, we can first remove the points that are part of the supporting surface, as well as the points that belong to the background, indicated by their distance exceeding a certain threshold. We then obtain a filtered point-cloud Ȳ containing only the view of the object to be pre-segmented.
Formally, let y = (x_p, V) denote a point in the filtered point-cloud Ȳ (y ∈ Ȳ), where x_p ∈ R³ is the position of the point with respect to the camera frame, and V ∈ R³ is the RGB component vector representing its visual feature. It should be noted that for objects with a textured appearance, visual texture features could be used instead of color.
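As an illustration, the following minimal sketch shows how such plane removal and background filtering could be implemented with Open3D; the RANSAC parameters and the 1.5 m background cutoff are example values, not those used in this work.

```python
import numpy as np
import open3d as o3d

def filter_scene(pcd, background_dist=1.5):
    """Remove the supporting plane and distant background points.

    background_dist: assumed cutoff (meters) along the camera z-axis;
    points farther away are treated as background.
    """
    # Drop background points whose depth exceeds the threshold.
    pts = np.asarray(pcd.points)
    keep = np.where(pts[:, 2] < background_dist)[0]
    pcd = pcd.select_by_index(keep)

    # Fit the dominant plane (the table) with RANSAC and discard its inliers.
    _, inliers = pcd.segment_plane(distance_threshold=0.01,
                                   ransac_n=3,
                                   num_iterations=1000)
    return pcd.select_by_index(inliers, invert=True)
```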
As discussed in the previous section, the main goal of this work is to estimate haptic features for unexplored regions based on the gathered visual and haptic features of explored regions. Thus, we consider a region-based representation of the target object. The idea is to divide the target object into a large number N of connected regions, wherein each region i = 1, ..., N has its own visual feature and haptic feature. We achieve this using a state-of-the-art supervoxel segmentation named VCCS [17]. Given the filtered point-cloud Ȳ, we partition it into N regions $R_1, \ldots, R_N$ such that $\bigcup_{i=1}^{N} R_i = \bar{Y}$ and $R_i \cap R_j = \emptyset$ for $i \neq j$. These constraints guarantee that all points in the filtered point-cloud Ȳ belong to a region, and that no point belongs to two regions. Fig. 3 shows the result after the filtering and pre-segmentation step. The next step is to gather haptic data and couple it with the visual data through haptic exploration.

Fig. 3: A sample object, together with the filtered point-cloud of the target object divided into regions through supervoxel segmentation [17].

B. Haptic exploration and tracking
In this work, haptic data is gathered using a force/torque (F/T) sensor attached to the wrist of a robot arm. The haptic data consists of the contact point on the object surface x_c ∈ R³ with respect to the robot base frame, calculated using forward kinematics, and the contact force between the finger and the object f ∈ R³ expressed in the contact frame. The kinetic friction coefficient can then be estimated using the Coulomb friction model as

$$\mu = \frac{\lVert f_t \rVert}{\lVert f_z \rVert},$$

where f_t and f_z are the tangent and normal forces at the contact point, respectively. We chose the Coulomb friction model because it produces less noisy point-wise estimates and requires less computation than other friction models such as the LuGre model [18]. This claim is based on experiments in which we used both the Coulomb and LuGre friction models to estimate the friction coefficient of an object, one of which is shown in Fig. 4.
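A minimal sketch of this point-wise estimate, assuming the F/T reading has already been rotated into a contact frame whose z-axis is the surface normal (the array layout and the minimum-normal-force gate are illustrative assumptions):

```python
import numpy as np

def coulomb_mu(f_contact, min_normal=0.5):
    """Point-wise kinetic friction estimate from a contact-frame force reading.

    f_contact:  (3,) force vector [fx, fy, fz], z along the surface normal.
    min_normal: assumed threshold (N) below which contact is too light to trust.
    """
    f_t = np.linalg.norm(f_contact[:2])   # tangential force magnitude
    f_z = abs(f_contact[2])               # normal force magnitude
    if f_z < min_normal:
        return None                       # no reliable contact
    return f_t / f_z
```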
To obtain the haptic data of an object, we perform an exploratory action called lateral sliding, in which the robot arm equipped with an F/T sensor slides over the object surface along a linear path. Hybrid force and position control is used during the exploration to guarantee contact with the object surface. The exploration relies on the object remaining immobile; in case the supporting surface does not immobilize the object, this could also be achieved using a dual-arm setup.
During the exploration, we also need to map the gathered haptic data to the corresponding position in the pre-segmented visual data obtained from the previous step. This is done by first finding the point y closest to the current contact point (x_p ≈ x_c); y is then assigned the friction coefficient calculated at the current contact point with the Coulomb model above. The benefit of this approach is that some uncertainties caused by the calibration procedure are reduced. After the exploration and tracking, the explored points contain both visual data V and haptic data H, and can be represented as y = (x_p, V, H). The next step is to learn a model from the explored regions and use it to infer the haptic features of unexplored regions.
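One simple way to realize this nearest-point mapping is a KD-tree query, sketched below; the gating radius and the dictionary output format are illustrative assumptions, and the clouds are assumed to be expressed in a common frame.

```python
import numpy as np
from scipy.spatial import cKDTree

def attach_friction(cloud_xyz, contact_xyz, contact_mu, max_dist=0.01):
    """Map each haptic sample to its nearest point in the filtered cloud.

    cloud_xyz:   (N, 3) point positions x_p of the filtered point-cloud.
    contact_xyz: (M, 3) contact points x_c from forward kinematics.
    contact_mu:  (M,) friction estimates at those contacts.
    max_dist:    assumed gating radius (m) to reject spurious matches.
    Returns a dict {point index: friction value}.
    """
    tree = cKDTree(cloud_xyz)
    dists, idx = tree.query(contact_xyz)
    return {int(i): float(mu)
            for d, i, mu in zip(dists, idx, contact_mu) if d <= max_dist}
```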

C. Variable friction model using visuo-haptic measurements
We consider a visuo-haptic dataset $\xi = \{\xi_j\}_{j=1}^{N}$ defined by N observations $\xi_j \in \mathbb{R}^{D}$. Each datapoint $\xi_j$ is composed of input/output components indexed by V and H, so that $\xi_j = [\xi_j^{V}; \xi_j^{H}]$ and $D = D_V + D_H$. In this work, $\xi_j$ is a concatenation of visual features (input component) and haptic features (output component).
As the haptic exploration is conducted only once, along a linear path over the object, the tip of the robot touches only a few points in each explored region. Thus, the number of points that have been assigned a friction coefficient value is always smaller than the total number of points in each explored region. Therefore, the covariance between the RGB color components is computed using all of the points in a region, while the covariance between the friction coefficient and the color components is computed using only the points touched by the robot. Then, given the visuo-haptic dataset ξ, we use a GMM with C components, optimized through Expectation-Maximization (EM), to encode the joint probability distribution $P(\xi^{V}, \xi^{H})$. After the GMM is fitted to the dataset, GMR can subsequently be used to estimate the haptic features $\xi^{H}_{*}$ for the visual features $\xi^{V}_{*}$ of unexplored regions, as described in Section III; a minimal sketch of this fit-then-condition step is given below.
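The following sketch, using scikit-learn and NumPy, shows one way to implement the fit and the GMR conditioning from Section III; the variable names and the choice of scikit-learn are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(xi, n_components):
    """Fit the joint GMM M over visuo-haptic datapoints xi of shape (N, D)."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(xi)

def gmr(gmm, v, d_v):
    """Condition the joint GMM on a visual input v (shape (d_v,)) and return
    the mean and covariance of the haptic output, following Section III."""
    mu_v, mu_h = gmm.means_[:, :d_v], gmm.means_[:, d_v:]
    s_vv = gmm.covariances_[:, :d_v, :d_v]
    s_hv = gmm.covariances_[:, d_v:, :d_v]
    s_hh = gmm.covariances_[:, d_v:, d_v:]

    # Responsibilities h_c(v) of each component for the visual input.
    h = np.array([w * multivariate_normal.pdf(v, mu_v[c], s_vv[c])
                  for c, w in enumerate(gmm.weights_)])
    h /= h.sum()

    # Per-component conditional means and covariances.
    mu_c = np.array([mu_h[c] + s_hv[c] @ np.linalg.solve(s_vv[c], v - mu_v[c])
                     for c in range(gmm.n_components)])
    cov_c = np.array([s_hh[c] - s_hv[c] @ np.linalg.solve(s_vv[c], s_hv[c].T)
                      for c in range(gmm.n_components)])

    # Moment-match the mixture of conditionals to a single mean/covariance.
    mean = h @ mu_c
    cov = sum(h[c] * (cov_c[c] + np.outer(mu_c[c], mu_c[c]))
              for c in range(gmm.n_components)) - np.outer(mean, mean)
    return mean, cov
```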
Additionally, as discussed in Section III, we include a background component in the GMM to reflect the estimation uncertainty. The background component is constructed using measurements from the entire scene, thus representing the full visual variability. If an input, i.e., a visual feature, is not close to any of the main components, the corresponding output will depend primarily on the background component. This allows the model to capture the estimation uncertainty, such that a high variance is predicted when a particular region is not visually similar to any of the regions with haptic measurements. This uncertainty measure can be used to actively request new haptic explorations in uncertain regions.

Fig. 5: The experimental setup.

V. EXPERIMENTS AND RESULTS
To demonstrate that the proposed approach can estimate surface friction without the restrictive assumption of uniform friction across the object, we first evaluate the method on different objects, including multi-material objects. Furthermore, we report on the repeatability of the results. Afterwards, we present a grasping case study to demonstrate the benefits of accurate friction estimation in practical robotic applications.

A. Experimental setup
The experiments are performed using a Franka Emika Panda robot and a Kinect 360 camera to capture the input point-clouds, as shown in Fig. 5. We used an ArUco marker for the extrinsic calibration of the camera. Once the input point-cloud was captured, it was filtered and pre-segmented as explained in Section IV. To perform the haptic exploration, we used a six-axis force/torque sensor (ATI Mini45) attached between the robot's wrist and the gripper.

B. Model representation with real robot and objects
To study the capability of the proposed method, we ran the experiment on seven different objects, shown in Fig. 6. Of these, the book and the cereal box represent single-material objects, whose friction coefficient is identical across the whole surface, while the remaining objects are composed of multiple materials.
Fig. 6 shows qualitative results obtained by applying the proposed method to the target objects. For each target object we show the original object, the estimated friction, and the estimated uncertainty. For the estimated friction, regions are colored according to their friction coefficient value, i.e., redder corresponds to higher friction. For the estimated uncertainty, the level of uncertainty is represented in green, such that the greener the color, the higher the uncertainty. The results show that the proposed method successfully estimated the friction coefficient for all target objects, even in the presence of multiple materials. Specifically, the method not only produces similar friction coefficients across the object surface for single-material objects (i.e., the book and the cereal box), but also provides different friction coefficients for different parts of multi-material objects. For example, in the case of the yellow mug, the method estimated a higher friction coefficient for the rubber band and recovered an accurate boundary between the rubber and ceramic parts of the object.
Another interesting observation is that our method works even when the visual features vary considerably but the haptic features are identical across the surface, as with the book and the cereal box. As this scenario would likely cause problems for methods estimating friction from vision alone, as discussed in Section II, this experiment shows the benefit of combining visual and haptic feedback for friction estimation. Additionally, in the case of the butcher knife with a rubber handle, without incorporating haptic feedback one could hardly know whether the handle has a lower (plastic) or higher (rubber) friction coefficient than the steel blade. Since we conducted haptic exploration across the object, our method correctly estimates a higher friction coefficient for the handle, which is consistent with the ground-truth object.
Furthermore, the experimental results also show the uncertainty map produced by the proposed method. For the book and the cereal box, the uncertainty is very low because there are no unusual visual features: the visual features of the unexplored regions closely match those of the explored regions. On the other hand, for the yellow mug, the uncertainty is high at the right edge of the mug where the blue pattern is located. We attribute this to the haptic exploration being carried out only across the middle of the mug, where the color is always yellow or brown. As the blue color is far from the explored visual features, that region is assigned to the background component, which in turn produces a high uncertainty for the prediction. A similar result is observed for the butcher knife, where the uncertainty is high in a region with different lighting conditions. However, in both cases, even though the uncertainty is high, the estimated friction is still similar to that of the other regions of the same material.

C. Repeatability of the results
Next, we evaluated the repeatability of the proposed method by running the experiment five times on each object. The estimated friction coefficients are recorded and plotted as density plots for analysis, as shown in Fig. 7. For brevity, only the results of the cereal box, representing single-material objects, and the yellow mug, representing multi-material objects, are plotted for discussion. The results show that the proposed method produces repeatable results. Despite small variations between repetitions, both density plots accurately reflect the number of materials in each case. For example, the density plot of the cereal box has a single peak, indicating a single-material object, while the density plot of the yellow mug contains two peaks, indicating a multi-material object.

Fig. 6: Real objects, along with the estimated friction coefficient and uncertainty returned by the proposed method. Red indicates a higher friction coefficient value, while green denotes higher uncertainty (best viewed in color).

D. Grasping case study
In robotic grasping, grasp selection is complex, as grasp stability depends on factors such as gripper geometry, object shape, mass, and surface friction. Typically, a set of candidate grasps is sampled and ranked by a grasp sampling and evaluation algorithm, with parameters like surface friction kept constant during the process; as a result, selected grasps executed in the real world may still fail due to contact with a low-friction surface. To demonstrate the usefulness of our method, we conducted a grasping case study in which the estimated friction information is used to sample and evaluate grasps.
In this demonstration, we first capture the target object from different viewpoints and merge the views to obtain a multi-view point-cloud of the object. The proposed method is then applied to the point-cloud to produce the surface friction estimate. As the input of the grasp sampler is typically a mesh, we converted the estimated point-cloud to a mesh using MeshLab. Since the mesh does not contain any information about the estimated friction, we compute the center of each mesh face, find its closest point in the point-cloud, and assign the friction coefficient of that point to the corresponding face. Next, the mesh with the assigned friction coefficients is fed to a grasp sampler to generate grasp candidates. Grasp candidates are sampled using an antipodal grasp sampling method. Specifically, we randomly select a point on the mesh as the first contact point. At this first contact point, a direction ray that lies inside the friction cone is generated. If the ray intersects another face, the intersection point becomes the second contact point and the closing vector is determined. Next, we randomly generate an approach vector about the closing vector. The generated grasps are then checked for different types of collisions: grasps that collide with the mesh are filtered out, and the remaining grasps are evaluated using the Ferrari and Canny L1 quality metric [19]. A sketch of the antipodal sampling step is shown below.
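The following is a minimal sketch of antipodal sampling with per-face friction cones, using trimesh; the helper names, the ray offset, and the cone test details are illustrative assumptions rather than the exact sampler used in this work.

```python
import numpy as np
import trimesh

def random_tangent(n, rng):
    """Unit vector orthogonal to n, uniformly random in the tangent plane."""
    t = np.cross(n, rng.normal(size=3))
    return t / np.linalg.norm(t)

def sample_antipodal(mesh, face_mu, n_samples=1000, rng=None):
    """Sample antipodal grasp candidates on a mesh with per-face friction.

    mesh:    a watertight trimesh.Trimesh.
    face_mu: (len(mesh.faces),) per-face friction coefficients, assumed
             precomputed by projecting the estimated point-cloud onto faces.
    """
    rng = rng or np.random.default_rng()
    grasps = []
    for _ in range(n_samples):
        # First contact: a random face, contacted at its centroid.
        f1 = rng.integers(len(mesh.faces))
        p1 = mesh.triangles_center[f1]
        n1 = mesh.face_normals[f1]

        # Direction ray inside the friction cone: perturb the inward normal
        # by at most the cone half-angle arctan(mu) at this contact.
        half_angle = np.arctan(face_mu[f1])
        d = -n1 + np.tan(rng.uniform(0, half_angle)) * random_tangent(n1, rng)
        d /= np.linalg.norm(d)

        # Cast the ray through the object to find the second contact.
        locs, _, tris = mesh.ray.intersects_location([p1 + 1e-4 * d], [d])
        if len(locs) == 0:
            continue
        p2, f2 = locs[0], tris[0]

        # Keep the pair only if the closing line also lies inside the
        # friction cone at the second contact.
        if np.arccos(np.clip(mesh.face_normals[f2] @ d, -1, 1)) \
                <= np.arctan(face_mu[f2]):
            grasps.append((p1, p2))
    return grasps
```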
In this study, we sampled 1000 grasps on each of two object models: one with a uniform friction coefficient and one with the non-uniform friction coefficients computed using the proposed model. The object used in this study is the black cup shown in Fig. 6. Good grasp candidates, together with the friction cones at the contact points, are presented for both cases in Fig. 8. These results show that the sampler behaves as expected: in the case of the model with a uniform friction coefficient (Fig. 8a), the grasp candidates are distributed across the entire object, while the grasp candidates on the non-uniform friction model (Fig. 8b) appear only around the left side of the object, where the friction coefficient is high.
To study the effect of the proposed method on grasp performance in the real world, we chose the five best grasps in each case and executed them with the real robot. For the uniform friction case, we chose the five best grasps on the right side of the object; this lets us observe how the grasps actually behave when they make contact with the low-friction area of the object. The grasps in both cases were executed with the same grasping force of 20 N. To evaluate whether a grasp was successful, the robot moved to the planned grasp pose, closed its fingers, and moved the arm back to the starting position. Once there, the gripper was rotated around the last joint (Fig. 9). A grasp was considered successful if the object was held stably for the whole procedure and unsuccessful if the object was dropped.
All evaluated grasp candidates generated with the sampler that utilized the proposed method were successful, as they grasped the object at the high-friction area. The chosen grasp candidates generated in the uniform friction case failed to grasp the object, since they targeted the low-friction area of the object, which produced slippage during grasping. These results show that even when the grasp sampler generates high-quality grasps under the uniform friction assumption, the grasps may still fail when executed in the real world due to the incorrect friction assumption.

VI. CONCLUSIONS AND FUTURE WORK
We presented an approach that enables the estimation of local physical object properties, such as the surface friction coefficient, from visual and haptic cues, going beyond the state of the art by lifting the assumption that the target object has uniform friction across its surface. The key component of this work is the use of a probabilistic model to estimate the surface friction coefficient of unexplored areas from visuo-haptic data gathered through haptic exploration. Furthermore, we also presented an approach to represent the level of uncertainty of the estimate, which could be used in future work to actively request new haptic explorations in regions with high uncertainty. We demonstrated the capability and repeatability of the approach through experiments on a wide range of objects, including single-material and multi-material objects. The results show that the proposed approach is capable of providing object representations with a varying surface friction coefficient. Moreover, the friction coefficients can be used to guide grasp planning towards areas of high friction, improving the robotic grasping success rate.

Fig. 2: The proposed pipeline: the object's visual properties are first acquired as a point-cloud, which is then filtered and pre-segmented into regions. The robot then performs a haptic exploration over some of the regions. The proposed model is then used to estimate the friction coefficient over the whole object, together with the corresponding confidence.

Fig. 4: A qualitative comparison between friction models. The orange line denotes the friction coefficient estimated using the Coulomb friction model, while the blue line denotes the one estimated using the LuGre friction model (best viewed in color).

Fig. 7: The density plots of the cereal box and the yellow mug, showing consistent friction coefficient estimation over five repetitions of the experiment.
Fig. 8: The grasps generated from the grasp samplers: (a) uniform friction case, (b) non-uniform friction case. In the non-uniform case, the left part of the object has higher friction than the right part. The color of the gripper represents the quality of the grasp, the greener the better.

Fig. 9: The robot executing different grasps on an object. The grasps are proposed by a sampler employing (a) the proposed friction model, or (b) a uniform friction model.