Precise Robotic Manipulation of Bulky Components

Increasing the flexibility of robots requires systems that are more capable of perceiving and interacting with the environment. Designing the robotic system around the application remains challenging, especially when the objects to be manipulated are bulky, the relative positions between the robot and the objects are uncertain, and high precision is required to complete the task. This paper presents a possible guideline for designing a system capable of localizing itself, identifying a bulky target object, and manipulating it. A method for tuning the impedance control parameters is shown, to keep interaction forces below dangerous values. The autonomous localization, grasping, and assembly of an airplane sidewall panel is used as a test case. Experiments show that combining vision perception and force control increases the task success rate with respect to using visual localization and position control alone.


I. INTRODUCTION
Robotic manipulation and assembly tasks are widely studied and implemented in industry [1]. Typically, the goal in industrial automated assembly applications is to reduce the cycle time, reducing costs and increasing productivity [2], [3]. In many applications it is already possible to achieve high-speed motion with autonomous collision-free trajectory generation [4] and, at the same time, high positioning performance [5], [6]. Many examples can be found in automotive applications [7]-[9] and in electronics manufacturing, where high positioning precision is required, as well as the flexibility to manipulate different small components [10].
In a wide range of applications, however, the relative position of the robot and the objects to be picked and handled is known with limited accuracy. In such conditions, when the inaccuracies to be compensated are a few millimeters or tenths of a degree, impedance control is widely used to supervise the interaction [12]-[14]. These algorithms have been proven in applications that require high-accuracy force tracking [15]-[21], but they cannot compensate for large deviations, since they rely only on the information provided by force and torque measurements.

The associate editor coordinating the review of this manuscript and approving it for publication was Okyay Kaynak.
Large position deviations are an issue not only when the position of the objects to be manipulated is partially unknown, but also when the manipulated objects are large, since a small error in the picking orientation introduces large errors at the object extremities. In such conditions, the use of visual feedback in robot control has been widely investigated since the beginning of the 1990s [23], with the camera mounted either on the robot wrist (eye-in-hand) or in a fixed position with respect to the robot base (hand-to-eye). On the one hand, eye-in-hand is better in terms of achievable accuracy, since the calibration of the camera position with respect to the robot hand is often linearizable and less sensitive to modelling errors. However, this configuration suffers from occlusions and, when the handled object is texture-less, the closer the camera is, the fewer references it can see. Therefore, when large objects have to be handled, the eye-in-hand configuration cannot be implemented. On the other hand, the hand-to-eye configuration allows greater flexibility in the positioning of the camera (or cameras), making it possible to avoid occlusion. However, this configuration is less accurate, since the calibration model is strongly non-linear [24], and if the vision system is mounted far from the object, the accuracy of the estimated position decreases.

FIGURE 1. Example of manipulation of bulky objects in unstructured environment from EURECA project [11].
The coupling of impedance controllers and visual servoing has been investigated since the early 1990s to overcome these limitations [25], [26]. For example, in [27] a vision system is used to generate online the set point of a robotic manipulator under impedance control during a peg-in-hole task. In [28] the authors use impedance control to improve the camera-based estimate of the position of a known planar object. The authors of [29] propose a hybrid force and vision based impedance control in which vision is used to generate fictitious forces that modify the robot motion according to the impedance law. Similarly, [30] proposes using a vision system to detect obstacles and to generate fictitious forces to perform an insertion. In [31] the authors use a vision system together with impedance control to implement a hybrid position/impedance control for the assembly of a small component. This approach requires teaching the robot and the vision system through a demonstration assembly, to calibrate and close the vision-robot-object chain; moreover, impedance control is active only in the direction of assembly, to avoid exceeding the assembly forces, while position control is used in the other directions, which makes it impossible to compensate for any error due to the vision system. An important step forward in the state of the art is [22], where a position-based visual impedance control is proposed, shown in figure 2. Specifically, the methodology inserts a motion planner between the low-level impedance controller and the pose estimation stage, resulting in a framework that can be adapted to many industrial applications. The drawback of this approach is that the vision system must have a continuous, clear view of the scene to feed the impedance controller.
This assumption is far from realistic when bulky objects are manipulated, due to occlusions and to the fact that even a small error in the pose estimate grows proportionally with the distance to the edges (e.g., an error of 1° becomes ∼17 mm at a distance of 1 m). With the aim of deploying a methodology that increases the performance of a robotized system when picking and handling bulky objects, the approach proposed here extends the approach in [22]: the target position is computed asynchronously, triggering a new behavior of the robot. Generating the set point asynchronously makes the tuning of the impedance controller challenging, since the accuracy of the vision system cannot guarantee the compensation of minor errors. Indeed, when large objects are manipulated, a precision of 10 mm and ∼0.5° in pose identification can be considered a technical challenge. Impedance control is typically already implemented by the robot manufacturer, and in any case is easy to implement for a typical user; hence a fine tuning of its parameters is introduced to fulfill tolerances and force constraints when handling large objects. This work therefore describes the criteria for selecting the best methodology for each step, and how to tune the different modules in an easy way. Specifically, the aim of this paper is to show how precise positioning of a robotic manipulator and low interaction forces can coexist, exploiting a vision system and impedance control that compensate for each other's errors, in the manipulation and assembly of a huge, bulky component under restrictive tolerances. This approach is tested in a real industrial case, the autonomous assembly of an airplane sidewall panel, showing the power of the cooperation.
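The ∼17 mm figure quoted above follows from simple lever-arm geometry. The sketch below uses a hypothetical helper, `lever_arm_error_mm`, which is not part of the original work. It computes the lateral displacement produced at a given distance by an angular error using the tangent of the angle. For small angles this is practically the same as the arc-length approximation.

```python
import math


def lever_arm_error_mm(angle_error_deg: float, distance_mm: float) -> float:
    """Lateral displacement (mm) at `distance_mm` caused by an angular error.

    Uses distance * tan(angle); for small angles this is practically the same
    as the arc-length approximation distance * angle_in_radians.
    """
    return distance_mm * math.tan(math.radians(angle_error_deg))


if __name__ == "__main__":
    # 1 degree at 1 m, the example quoted in the text
    print(f"{lever_arm_error_mm(1.0, 1000.0):.1f} mm")  # 17.5 mm
    # 0.5 degree at 1 m, the orientation precision mentioned as challenging
    print(f"{lever_arm_error_mm(0.5, 1000.0):.1f} mm")  # 8.7 mm
```

The same relation shows why a sub-degree orientation error on a metre-scale panel already consumes a tolerance of a few millimetres at its edges.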

II. METHOD
In this section, easy, ready-to-use rules for choosing and tuning the various modules for the precise robotic manipulation of a bulky object in a cluttered environment are presented. A method for choosing the hardware/software components according to the requirements is introduced first. The driving choices for selecting the vision system are then presented, along with a fine tuning of the classical impedance control parameters.

A. APPROACH
The flowchart in figure 3 presents the driving choices that lead to the proposed method when commonly used approaches are not feasible.
In pick-and-place tasks in industrial environments, fixed-base robots are usually used to pick an object and place, or assemble, it. In assembly lines the positions are predefined, and precomputed collision-free trajectories are easily executed by the robot. When the object and assembly poses are not known a priori, a vision sensor is typically introduced to localize them.
VOLUME 8, 2020
Many previous studies suggest 2D vision as a reliable approach. Nevertheless, in the absence of strong features such as textures, colors, or high contrast, 2D approaches perform poorly. Moreover, in the considered case, where the distance and orientation of the object are not fixed, different viewpoints can produce a different perception of the same object, making a 2D vision approach infeasible. In unstructured environments the lighting cannot be fully controlled, and the object has different locations and orientations with respect to the camera. Under these conditions, 3D vision overcomes many of the challenges of 2D systems related to low contrast and poor lighting, as well as to distance and orientation uncertainties. Popular techniques for object recognition and pose estimation based on 3D data exploit 3D CAD models for object recognition and shape matching [32]. Many of these techniques exploit local descriptors [33], [34]. Local descriptors rely on strong local geometrical features: they have no notion of the whole object, but describe the local geometry around key points. As an alternative, global descriptors are used; they are computed for a set of points (i.e., a cluster) that represents an object. In the presented use case, an object with poor geometrical features is considered, hence a pipeline based on object segmentation and principal component analysis can be used as a global descriptor. A segmentation step is required before extracting the cluster that represents the object.
Consider first the repeatability and accuracy of the vision system and algorithm. If the tolerances required by the process are respected, an easy way to perform the manipulation is to compute the trajectories online from the visual information and execute them with a pure position trajectory-following controller. If, on the contrary, corrections are required, visual servoing is a widely used approach to compensate the motion online for pose errors. Visual servoing may not be feasible, in particular when manipulating large objects that occlude the camera field of view. In such cases, providing the robot with additional sensing can solve the problem. Since interaction is involved, a properly tuned impedance control can give the robot the compliance necessary to successfully perform the manipulation task.
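The decision sequence described above can be sketched as a small function. The field names, the single scalar error/tolerance comparison, and the returned labels are hypothetical simplifications of that reasoning, not a transcription of the flowchart in figure 3.

```python
from dataclasses import dataclass


@dataclass
class TaskRequirements:
    """Hypothetical inputs to the strategy choice."""
    vision_error_mm: float         # worst-case pose error of the vision pipeline
    tolerance_mm: float            # tolerance required by the process
    camera_view_unoccluded: bool   # can the camera keep seeing the target during motion?
    involves_contact: bool         # does the task involve interaction with the environment?


def select_strategy(req: TaskRequirements) -> str:
    # Vision alone is accurate enough: plan online, track with position control
    if req.vision_error_mm <= req.tolerance_mm:
        return "vision + position control"
    # Corrections are needed and the target stays visible: close the loop on vision
    if req.camera_view_unoccluded:
        return "visual servoing"
    # Occluded view but contact is involved: add compliance via impedance control
    if req.involves_contact:
        return "vision + impedance control"
    return "additional sensing required"


# Bulky panel: ~10 mm vision error, 1 mm tolerance, view occluded by the panel
print(select_strategy(TaskRequirements(10.0, 1.0, False, True)))
```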
The remainder of this section describes the vision system and algorithm, and a tuning method for the impedance control parameters subject to geometrical and force constraints.

B. VISION SYSTEM AND POSE ESTIMATION
For a bulky object, the camera specifications and the camera position with respect to the object are crucial. To frame the entire part, the vision system must be in Eye-to-Hand configuration, meaning that the camera is rigidly placed with respect to the robot arm base.
The field of view (FOV) of the camera sets a limit on the minimum distance between the camera and the object, which can be easily computed as

$$d^{camera}_{min} = \frac{s}{2\tan\left(\frac{FOV}{2}\right)} \tag{1}$$

where s is the maximum size of the object in one direction and FOV is the camera field of view in the same direction. Hence the object must be placed at a distance from the camera greater than $d^{camera}_{min}$. Moving the camera (e.g., with a PTU) can partially relax the constraint on the minimum camera distance. This comes at the cost of increased computational time and a more complex hand-eye calibration, which introduces calibration errors. Moreover, every 3D sensor has its specific optimal working range, which must be taken into account as an additional constraint when choosing the position of the camera.

Once the position of the camera is defined, pose estimation algorithms can be designed to find the coordinates of the object with respect to the camera frame (position P = (x, y, z) and orientation O = (rx, ry, rz)). The developed algorithm is divided into three main steps: Segmentation, Matching, and Registration. Segmentation exploits the Region Growing algorithm [35]. This algorithm merges points that are close enough in terms of a smoothness constraint, so that each extracted cluster is a set of points considered part of the same smooth surface. Region Growing also eliminates clusters whose number of points is below a defined threshold. This threshold can be computed as a function of the distance and size of the object. Here d is the distance of the object from the camera, $size_x$ and $size_y$ are the sizes of the object along the x and y axes, respectively, and $\#points$ is the number of points in the cluster of the point cloud. In the matching phase, the first step is to compute the Oriented Bounding Box (OBB) of each cluster. The OBB is the smallest bounding box containing the set of points that compose the cluster. The OBB dimensions are compared with those of the part. If they are not comparable within a tolerance, the corresponding cluster is rejected, whereas clusters whose OBB fulfills the geometric constraints are selected as possible matches. Finally, Iterative Closest Point (ICP) [36] is used to refine the alignment and improve the pose estimation accuracy for each possible match. The distance between the reference object point cloud and the cluster of each possible match is also computed. The cluster with the lowest distance from the CAD model is selected, and the corresponding pose is taken as the reference for the pose estimation.
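The OBB comparison in the matching step can be sketched as follows, assuming each cluster is an N×3 NumPy array in metres. This sketch makes two simplifying assumptions. First, the OBB is approximated by the PCA-aligned box, a common approximation that is not guaranteed to be the smallest box. Second, a single absolute tolerance `tol_m` is used for all dimensions. ICP refinement is not shown, and all names are hypothetical.

```python
import numpy as np


def obb_extents(points: np.ndarray) -> np.ndarray:
    """Extents of the PCA-aligned bounding box, sorted from largest to smallest."""
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    local = centered @ eigvecs
    return np.sort(local.max(axis=0) - local.min(axis=0))[::-1]


def candidate_clusters(clusters, part_dims, tol_m=0.05):
    """Indices of clusters whose OBB matches the part dimensions within tol_m."""
    target = np.sort(np.asarray(part_dims, dtype=float))[::-1]
    return [i for i, c in enumerate(clusters)
            if np.all(np.abs(obb_extents(c) - target) <= tol_m)]


def sample_box(rng, dims, n=3000):
    """Uniform points inside an axis-aligned box centred at the origin."""
    half = np.asarray(dims, dtype=float) / 2.0
    return rng.uniform(-half, half, size=(n, 3))


rng = np.random.default_rng(1)
clusters = [sample_box(rng, [1.0, 0.4, 0.01]),   # panel-like cluster
            sample_box(rng, [0.2, 0.2, 0.2]),    # compact object
            sample_box(rng, [0.6, 0.4, 0.01])]   # smaller flat object
print(candidate_clusters(clusters, part_dims=(1.0, 0.4, 0.01)))
```

Sorting the extents makes the comparison independent of which principal axis happens to come first, so only the part's dimensions need to be known.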
To evaluate the accuracy of the registration, the RMSE between the reference point cloud and the extracted point cloud is computed. However, the pose estimation accuracy is also affected by other error sources, such as the camera calibration and the hand-to-eye calibration.
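Here is a minimal sketch of the registration RMSE. It assumes both clouds are N×3 NumPy arrays already expressed in the same frame (e.g., after ICP), and it uses each extracted point's closest reference point as its correspondence. The brute-force distance matrix is suitable only for small clouds; a k-d tree would normally be used for real scans. The same quantity could also serve as the distance used to pick the best candidate cluster, which is an assumption of this sketch.

```python
import numpy as np


def registration_rmse(extracted: np.ndarray, reference: np.ndarray) -> float:
    """RMSE of closest-point distances from extracted points to the reference cloud."""
    diff = extracted[:, None, :] - reference[None, :, :]   # (N, M, 3) differences
    sq_dist = np.einsum("nmk,nmk->nm", diff, diff)          # squared distances
    return float(np.sqrt(sq_dist.min(axis=1).mean()))


def best_match(candidates, reference):
    """Index of the candidate cluster with the lowest RMSE, plus all scores."""
    scores = [registration_rmse(c, reference) for c in candidates]
    return int(np.argmin(scores)), scores


# Reference: planar 20 x 20 grid with 5 cm spacing (a stand-in for the CAD cloud)
g = np.arange(20) * 0.05
xx, yy = np.meshgrid(g, g)
reference = np.column_stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)])

offset_3mm = reference + np.array([0.0, 0.0, 0.003])
offset_2cm = reference + np.array([0.0, 0.0, 0.02])
idx, scores = best_match([offset_2cm, offset_3mm], reference)
print(idx, np.round(scores, 4))
```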
Ultimately, the goal of the vision system presented above is to provide the robot with good pose information, i.e., within the tolerances imposed by the external features.
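The two placement constraints of this subsection can be combined in a short check: the FOV-based minimum distance and the sensor's working range. This sketch assumes a pinhole model, with the object centred in the image and its largest dimension s spanning the FOV in the same direction. The function names and the example FOV are hypothetical. The default range of 0.5 to 6 m is the one quoted for the ToF camera in the case study, although the optimal working range of a sensor may be narrower.

```python
import math
from typing import Optional, Tuple


def min_camera_distance(object_size_m: float, fov_deg: float) -> float:
    """Distance at which an object of size s exactly fills the field of view."""
    return object_size_m / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


def feasible_distance_range(object_size_m: float, fov_deg: float,
                            sensor_min_m: float = 0.5,
                            sensor_max_m: float = 6.0) -> Optional[Tuple[float, float]]:
    """Interval of camera-object distances satisfying both constraints, or None."""
    d_low = max(min_camera_distance(object_size_m, fov_deg), sensor_min_m)
    if d_low > sensor_max_m:
        return None   # the object cannot be framed within the sensor's range
    return d_low, sensor_max_m


# Hypothetical example: a 2 m object seen with a 90 degree field of view
print(feasible_distance_range(2.0, 90.0))
```

When the returned interval is empty, the options discussed above apply: move the camera with a PTU or choose a sensor with a wider FOV or a longer range.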

C. CARTESIAN IMPEDANCE CONTROLLER
The Cartesian impedance loop modifies the tool set point so that the robot end effector behaves as an equivalent mass-spring-damper system, described by

$$M_i(\ddot{x}-\ddot{x}_d) + C_i(\dot{x}-\dot{x}_d) + K_i(x-x_d) = F_{ext} \tag{4}$$

where $x$ and $x_d$ are the actual and reference end-effector poses. Matrices $M_i$, $C_i$, $K_i$ represent the impedance control inertia, damping, and stiffness, respectively, and can be defined according to the desired compliance of the robot. $F_{ext}$ is the vector of external forces. From (4) it is possible to obtain the desired Cartesian accelerations of the end effector [37]:

$$\ddot{x} = \ddot{x}_d + M_i^{-1}\left(F_{ext} - C_i(\dot{x}-\dot{x}_d) - K_i(x-x_d)\right) \tag{5}$$

Figure 4 shows the block diagram of the control scheme. The end effector of the robot now behaves as a mass-spring-damper system, reacting and adapting to external forces, which makes the contact with the external environment compliant instead of rigid.
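Here is a minimal single-axis sketch of how the acceleration law obtained from (4) can be used in a discrete-time loop. It assumes diagonal impedance matrices, so each axis can be treated independently, and a fixed control period `dt`. Semi-implicit Euler integration turns the commanded acceleration into the next set point. The function name and numerical values are illustrative; robot manufacturers implement this loop internally.

```python
import math


def impedance_step(x, v, x_d, v_d, a_d, f_ext, M, C, K, dt):
    """One control period of the single-axis impedance law.

    Computes the desired acceleration from the impedance relation, then
    integrates with semi-implicit Euler (velocity first, then position).
    """
    a = a_d + (f_ext - C * (v - v_d) - K * (x - x_d)) / M
    v_new = v + a * dt
    x_new = x + v_new * dt
    return x_new, v_new


# Illustrative values: constant 10 N push against a stationary reference at 0
M, K = 2.0, 1000.0
C = 2.0 * math.sqrt(M * K)    # critically damped
x, v = 0.0, 0.0
for _ in range(2000):         # 2 s at dt = 1 ms
    x, v = impedance_step(x, v, 0.0, 0.0, 0.0, 10.0, M, C, K, 1e-3)
print(round(x, 4))            # settles near F / K = 0.01 m
```

The end effector yields by F/K under a steady push instead of resisting rigidly, which is the compliant behavior exploited during grasping and insertion.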

1) IMPEDANCE PARAMETER TUNING
To ensure low interaction forces with a desired dynamic behavior and good position compensation, a method for tuning $M_i$, $C_i$, and $K_i$ is presented in this section. Without loss of generality, only the translational direction along one axis is considered in the rest of this section. With Cartesian impedance control applied, the system equations correspond to the motion of a mass-spring-damper system. For the tuning of the parameters, the reference set point is chosen to be zero, with the environment applying a force. The environment is modeled as a purely stiff wall with stiffness $K_{env}$. The applied force is then produced by the relative motion between the environment and the system:

$$F_{ext} = K_{env}\,(x_{env} - x) \tag{6}$$

where $x_{env}$ and $x$ are the environment and system positions, respectively. Considering a funnel-shaped reference as the external environment, the relative motion is described by a ramp limited in time,

$$x_{env}(t) = \begin{cases} \alpha t, & t \le \bar{t} \\ \alpha \bar{t}, & t > \bar{t} \end{cases}$$

where $\alpha$ is the slope, $t$ is the time, and $\bar{t}$ is the time at which the slope flattens.
If the environmental stiffness K env is not known, an estimate can be provided by an observer as in [38].
The system behavior (i.e., the end-effector position) in a single direction of motion is described by

$$M_i\ddot{x} + C_i\dot{x} + K_i x = F_{ext} \tag{7}$$

Substituting (6) in (7) and rearranging, for $t \le \bar{t}$, leads to

$$M_i\ddot{x} + C_i\dot{x} + K_{aug}\, x = K_{env}\,\alpha t \tag{8}$$

where $K_{aug} = K_i + K_{env}$ is the equivalent stiffness of the augmented system. For an overdamped augmented system, the solution of the second-order differential equation (8) is

$$x(t) = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t} + m t + q, \qquad \lambda_{1,2} = \frac{-C_i \pm \sqrt{C_i^2 - 4 M_i K_{aug}}}{2 M_i} \tag{9}$$

where $c_1$ and $c_2$ depend on the initial conditions. For sufficiently large times the two exponential terms go to zero. This leaves the equation of a straight line with coefficients $m = \frac{K_{env}\alpha}{K_{aug}}$ and $q = -\frac{C_i K_{env}\alpha}{K_{aug}^2}$. The relative position between the environment and the system is then $x_{env} - x = \alpha t - m t - q$. Consider the maximum allowed force $F_{max} = K_{env}(x_{env} - x_{max})$, and evaluate the distance at $t = \bar{t}$ (where $x = x_{max}$). The following inequality must then hold to avoid force overshoot:

$$K_{env}\left(\alpha\bar{t} - m\bar{t} - q\right) \le F_{max} \tag{10}$$
Since the solution (9) holds for overdamped systems, in the following $\zeta_{aug}$ is taken to be larger than one, and the corresponding damping can be computed as

$$C_i = 2\,\zeta_{aug}\sqrt{M_i K_{aug}} \tag{11}$$

Substituting (11) in $q$ and solving (10) for $M_i$ gives the critical mass at which the maximum force is reached.
Equation (12) satisfies (10) only for positive values of the term $\frac{F_{max}}{K_{env}} - \alpha\bar{t}\left(1 - \frac{K_{env}}{K_{aug}}\right)$. Beyond that point, i.e., for higher stiffness values with the same mass, the transient is the same but the system ''penetrates'' further into the environment, leading to forces that exceed the maximum allowed.
A limit stiffness ratio $k_{lim}$ can be computed as the value at which the critical mass equals zero. Given $F_{max}$, $\delta x$, $\bar{t}$, and $\alpha = \delta x/\bar{t}$, it is then possible to tune the values of $M_i$, $C_i$, $K_i$ of the desired impedance behavior that guarantee no force overshoot. Two safety coefficients, $k_s < 1$ and $m_s < 1$, are used for this purpose: the first avoids too high a natural frequency $\omega$, and the second avoids force overshoot due to imperfect modeling.
Figure 6 shows the time response of the system and the exchanged force for different values of $k_s = [1.1, 0.9, 0.5, 0.1]$, with $m_s = 1$, $\zeta_{aug} = 1.2$, $F_{max} = 30$ N, $\delta x = 0.01$ m, and $\bar{t} = 2.2$ s. The incorrect behavior of the system for $k_s > 1$ is evident. The masses are very similar in the first two cases, but the difference in stiffness produces different behaviors and hence different forces.
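The tuning procedure can be checked numerically. The sketch below is our own derivation under the assumptions of this section: a single axis, a purely stiff wall, an overdamped augmented system, and transients that have died out by $\bar{t}$. It solves (10) with (11) for the largest admissible mass and computes the stiffness at which that mass vanishes. It then simulates (7) with the force (6) and the ramp, from rest, to verify the peak contact force. $F_{max}$, $\delta x$, $\bar{t}$, and $\zeta_{aug}$ are the values quoted for figure 6. The wall stiffness $K_{env}$ = 10 kN/m and the choice of half the limit stiffness are hypothetical. The closed forms are not a transcription of the paper's equations (12) to (14).

```python
import math


def critical_mass(F_max, K_env, K_i, delta_x, t_bar, zeta_aug):
    """Largest M_i satisfying (10) with C_i from (11); None if no mass does."""
    alpha = delta_x / t_bar
    K_aug = K_i + K_env
    margin = F_max / K_env - alpha * t_bar * (1.0 - K_env / K_aug)
    if margin <= 0.0:
        return None   # stiffness too high: the force bound fails for any mass
    return K_aug**3 * margin**2 / (4.0 * zeta_aug**2 * K_env**2 * alpha**2)


def limit_stiffness(F_max, K_env, delta_x):
    """K_i at which the margin in critical_mass() reaches zero (alpha * t_bar = delta_x)."""
    if K_env * delta_x <= F_max:
        return math.inf   # the wall can never push harder than F_max
    return F_max * K_env / (K_env * delta_x - F_max)


def peak_force(M_i, K_i, K_env, delta_x, t_bar, zeta_aug, t_end=6.0, dt=1e-4):
    """Simulate (7) from rest against the ramp-limited wall; return the peak force."""
    K_aug = K_i + K_env
    C_i = 2.0 * zeta_aug * math.sqrt(M_i * K_aug)   # damping from (11)
    alpha = delta_x / t_bar
    x = v = peak = 0.0
    for k in range(int(t_end / dt)):
        x_env = alpha * min(k * dt, t_bar)           # ramp that flattens at t_bar
        f_ext = K_env * (x_env - x)                  # (6); contact loss not modelled
        peak = max(peak, f_ext)
        a = (f_ext - C_i * v - K_i * x) / M_i        # (7) with zero set point
        v += a * dt
        x += v * dt
    return peak


F_max, delta_x, t_bar, zeta = 30.0, 0.01, 2.2, 1.2   # values quoted for figure 6
K_env = 1.0e4                                        # hypothetical wall stiffness [N/m]
K_i = 0.5 * limit_stiffness(F_max, K_env, delta_x)   # safety factor of 0.5 on stiffness
M_i = critical_mass(F_max, K_env, K_i, delta_x, t_bar, zeta)
print(round(K_i, 1), round(M_i, 1),
      round(peak_force(M_i, K_i, K_env, delta_x, t_bar, zeta), 2))
```

With these hypothetical numbers, the simulated peak stays just below $F_{max}$ when the critical mass is used. Exceeding the limit stiffness leaves no admissible mass, which mirrors the incorrect behavior reported for $k_s > 1$.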
The steady-state position error after the ramp flattens can be computed by imposing a constant force $F_{ext} = K_{env}\,\delta x$ on the system. Solving (8) leads to the final position

$$x_{ss} = \frac{K_{env}\,\delta x}{K_{aug}} \tag{15}$$

and the steady-state error is

$$e_{ss} = \delta x - x_{ss} = \frac{K_i}{K_{aug}}\,\delta x \tag{16}$$

D. CONSIDERATIONS
The 3D vision system estimates the poses within a certain tolerance, hence the maximum errors produced must be considered. Errors can be introduced by (i) an incorrect estimate of the pose and (ii) an incorrect impedance compensation, for both the grasping and the place positions. The errors in pose estimation by the camera are due to its resolution, hence the maximum $\delta x$ to be compensated by the impedance control depends on it. During the grasping phase, in the worst case of maximum error in the position estimate, the error after impedance compensation is $e^{g}_{ss}$ as in (16). The same applies to the place position, giving $e^{p}_{ss}$, where the superscripts g and p stand for grasping and place, respectively. Knowing the repeatability range rr of the camera for both poses and assuming a normal distribution, the maximum displacements to be compensated, due to the linear position estimate, can be computed as δx
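As a quick check of the steady-state relations, the sketch below evaluates the final position and the error (16) for a worst-case displacement δx. It then compares the residual error with a tolerance. The stiffness values and the 3 mm tolerance are hypothetical. The residual contact force follows from (6) as $K_{env}\,e_{ss}$.

```python
def steady_state(K_i: float, K_env: float, delta_x: float):
    """Steady-state position, error (16) and residual contact force after the ramp."""
    K_aug = K_i + K_env
    x_ss = K_env * delta_x / K_aug      # final position of the augmented system
    e_ss = delta_x - x_ss               # (16): equals delta_x * K_i / K_aug
    f_ss = K_env * e_ss                 # residual contact force, from (6)
    return x_ss, e_ss, f_ss


def within_tolerance(e_ss: float, tolerance: float) -> bool:
    """True if the residual error after impedance compensation fits the tolerance."""
    return abs(e_ss) <= tolerance


# Hypothetical values: 10 mm worst-case displacement, K_i = 2 kN/m, K_env = 8 kN/m
x_ss, e_ss, f_ss = steady_state(2000.0, 8000.0, 0.010)
print(round(x_ss, 4), round(e_ss, 4), round(f_ss, 1), within_tolerance(e_ss, 0.003))
```

Lowering $K_i$ relative to $K_{env}$ shrinks both the residual error and the residual force. This is why a soft impedance helps the assembly features guide the panel.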

III. CASE STUDY
The proposed methodology is implemented in a real industrial scenario, presented in this section. The results shown here are intended as an anecdotal description of a use case where the suggested approach yields a substantial improvement over two other typical approaches to robotic manipulation.

A. SYSTEM DESCRIPTION
The chosen robot manipulator is a KUKA iiwa 14 R820. Thanks to its redundancy, it is suitable for performing motions in a small, cluttered environment such as the interior of an airplane fuselage. Moreover, it has a high payload (14 kg), required for the manipulation of heavy objects. The robot is equipped with a tool with suction cups and a vacuum system to grasp, manipulate, and assemble the panel. Two independent circuits can be activated via an electrovalve, to manipulate curved objects (sidewall panels) as well as planar objects (cargo panels). The vision system consists of a pan-tilt unit (PTU), to cover the whole working area, and a depth camera to acquire 3D data. The selected pan-tilt unit is a FLIR PTU-46, which provides a pan range of ±159° and a tilt range from −47° to 31° with a resolution of 0.003°. 3D data are acquired with an ODOS StarForm Swift Time-of-Flight (ToF) depth camera. It has a resolution of 640×480 px at 40 fps, a working range of 0.5 to 6 m, and a precision of about 1 cm. The robot and the camera, as well as the sidewall panel, are mounted on passive wheeled mobile carts, needed to move them easily inside the aircraft fuselage. The carts introduce uncertainties in the relative positions between the robot and the panel and between the robot and the assembly point.

B. TASK DESCRIPTION
The task the robot has to perform is the autonomous assembly of an aircraft sidewall panel, depicted in figure 7.
Manipulating bulky components inside narrow spaces, such as the interior of an aircraft fuselage, limits the robot's movements, which must be precise to avoid collisions. Correct localization of the robot and of the object is fundamental to build, online, a correct model of the environment for trajectory planning.
The vision system first identifies the fuselage windows to determine the position of the robot in the environment, then it scans the panel and localizes it with respect to the robot pose. A 3D map of the environment is then created from the camera information, and the trajectories are planned online. Impedance control, suitably tuned, is adopted for the tasks that require contact and interaction between the robot gripper and the panel or the environment, which in this application are grasping and assembly.
The grasping task begins with the robot positioned in front of the panel, according to the position received from the camera system. A forward motion is performed along the outgoing direction of the tool while impedance control is active. All four suction cups must be properly aligned and fully in contact with the surface to guarantee a successful grasp, so the compliant controller allows small reorientations. Indeed, even a small error in the estimated position and/or orientation of the panel, including one due only to the camera resolution, may lead to a misalignment of the tool with respect to the curved surface of the panel.
The assembly of the sidewall panel is the most complex step of the whole process: since the tolerances of the assembly features are very tight (1 to 10 mm depending on the direction), the allowed error is very small. Relying on pose estimation alone, it is very difficult, usually impossible, to fit the features properly. As for the grasping task, the success of the assembly strongly relies on interaction control to ensure the correct position.

C. EXPERIMENTAL RESULTS
This section presents the experimental results. First, a measurement of the accuracy and repeatability of the camera in estimating the poses is presented. Then three different approaches are compared, keeping the poses fixed: the vision system without impedance control, impedance control only, and the vision system and interaction control working together. Finally, the robustness of the combined approach is shown in terms of success rate, with randomly changed poses.

1) VISION SYSTEM
The most important values that the vision system has to estimate, for both the assembly and the panel positions, are x and y distances and rotation θ around z axis, with respect

2) EXPERIMENTAL TESTS WITH QUASI-FIXED POSITIONS
In this section, three different approaches are presented and compared to show the robustness of our approach, considering quasi-fixed positions. Small positioning errors can be introduced, since the panel position was manually restored after each trial. Table 2 summarizes the success of each approach in localizing the robot and the panel through the vision system (when used), as well as the success in performing the grasping and assembly tasks.
a. Vision/position control: The first 10 trials relied on vision information only: the robot target positions were exactly the positions given by the camera. Although the success rate in identifying the panel and the windows is very high, the success rate of the complete task is zero. Even when the grasp succeeds, it is not made in the correct position. Because of the complex shape of the sidewall panel, even a small error in the position estimate makes the grasping infeasible, or makes the coupling between the tool and the panel differ from the nominal one. It should be underlined that the 0/10 assembly successes are counted over the 10 trials. However, since the assembly directly depends on the 4/10 grasping successes, it was actually attempted in only 4 trials.
b. Impedance control: If only impedance control is used, without precise information from the vision system, the success rate remains very low. In this case, the grasping is more robust thanks to the capability of impedance control to adapt to small errors, but the assembly is very complex, since the tolerances do not allow any large error. Moreover, the relative position between the panel and the robot, or between the robot and the fuselage, may change even slightly. In that case there is no way to compensate for the error, which makes this approach inflexible.
c. Vision/impedance control: When the capabilities of both approaches are combined, the vision system compensates for large errors and estimates a good initial position, while impedance control is very good at compensating for small errors in the camera estimate. In this case, the grasping task poses no problem, and the success rate of the assembly task is much higher than in the two other cases. Nevertheless, some failures may still occur because of the very restrictive tolerances on the assembly features.
The time responses of the position and force of the robot end-effector, in the direction of the environmental correction, are shown in figure 9.

3) ERRORS ANALYSIS
In this section, the errors causing process failures are discussed with respect to the three approaches illustrated above.
a. Error in pose estimation: For vision/position and vision/impedance control it is not correct to speak of errors in the pose estimate; rather, a resolution limit arises. The repeatability is presented above, in table 1. The experiments showed that errors occur only when the camera field of view is partially occluded and the object can be seen only partially. When the vision system is not present, the poses are not estimated but taken for granted, even though small errors may occur due to manual or autonomous (AGV-based) positioning.
b. Error in panel grasping: Errors in panel grasping occur when the coupling between the tool and the panel is not properly achieved and the panel is not grasped. They can result from an improper estimate or knowledge of the pose. This occurs mainly with vision/position control, since a small error in the pose estimate cannot be compensated. The panel surface has a complex, non-planar shape. All four suction cups must be in contact with the surface to grasp the panel, so a small error in the estimate can produce a misalignment that prevents the cups from properly touching the surface. Impedance control performs better in rejecting this error, since the compliance of the robot allows small reorientations, achieving the correct alignment and contact between the tool and the panel surface. Errors can still occur if the pose of the panel is not the nominal one. Vision/impedance control provides a good estimate of the pose and compensates for small errors (in particular in orientation), increasing the success rate.
c. Error in assembly: Errors in the assembly phase occur when the panel is not properly inserted in the features or not properly fixed.
They can be caused by an improper estimate of the assembly position and/or by grasping in a non-nominal position, that is, an incorrect coupling between the manipulator tool and the panel.
In these cases, even if the assembly pose is properly estimated, the panel is not properly positioned during the insertion phase. With vision/position control no compensation is possible. With a compliant robot behavior (impedance control, vision/impedance control), the insertion can be properly completed if the error is within the allowed tolerances, because the compliance absorbs small errors.
In conclusion, vision/impedance control combines the capability of the vision system to estimate a sufficiently accurate initial position with the compensation of small errors through the compliant behavior of the robot, achieving a very high success rate.

4) ROBUSTNESS WITH DIFFERENT POSES
Having proven the reliability of the combined approach of vision information and impedance control, the robustness of this solution is tested over 10 repetitions. In each repetition, the position of the robot with respect to the panel and to the assembly position is changed. The positions are randomly selected, with respect to the nominal ones, in the range ±100 mm in both the x and y directions and ±5° for the orientation about the z axis, for both the panel and assembly positions. The success rate in this test is the same as in the previous case, with 9/10 successful assemblies. The main condition that must be satisfied is that both the panel and the assembly pose are reachable by the robot. A second important condition is to keep the robot far enough from singular configurations while force control is used, because the forces are estimated through the Jacobian matrix, which becomes singular there. With a force/torque sensor at the end-effector, this constraint can be relaxed.
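The singularity issue can be illustrated on a planar two-link arm, used here only as a stand-in for the 7-DoF iiwa; the link lengths and joint values are hypothetical. End-effector forces are estimated from joint torques by solving $J^T F = \tau$. The determinant of $J$, equal to $l_1 l_2 \sin q_2$ for this arm, tends to zero as the elbow straightens. As a result, a small torque error produces an increasingly large force error.

```python
import math


def jacobian_2link(q1, q2, l1=0.4, l2=0.4):
    """Planar 2-link Jacobian mapping joint rates to end-effector (x, y) velocity."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def torques_from_force(J, F):
    """Joint torques balancing an end-effector force: tau = J^T F."""
    return [J[0][0] * F[0] + J[1][0] * F[1],
            J[0][1] * F[0] + J[1][1] * F[1]]


def force_from_torques(J, tau):
    """Solve J^T F = tau for F (2 x 2 case); fails at a singularity."""
    a, b = J[0][0], J[1][0]   # first row of J^T
    c, d = J[0][1], J[1][1]   # second row of J^T
    det = a * d - b * c       # equals det(J) = l1 * l2 * sin(q2)
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is singular: force cannot be estimated")
    return [(d * tau[0] - b * tau[1]) / det,
            (-c * tau[0] + a * tau[1]) / det]


# Force error caused by a 0.05 N*m error on joint 1, far from and near the singularity
for q2_deg in (90.0, 10.0, 2.0):
    J = jacobian_2link(math.radians(30.0), math.radians(q2_deg))
    dF = force_from_torques(J, [0.05, 0.0])
    print(f"q2 = {q2_deg:5.1f} deg -> |dF| = {math.hypot(*dF):.2f} N")
```

An end-effector force/torque sensor measures $F$ directly, so it avoids this inversion altogether.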

IV. CONCLUSIONS
By combining the capabilities of vision and force control, it is shown that a complex task can be successfully completed, thanks to their ability to compensate for each other's errors.
This paper aims to show how precise positioning of a robotic manipulator and low interaction forces can coexist, exploiting a vision system and impedance control that compensate for each other's errors, in the manipulation and assembly of a huge, bulky component under restrictive tolerances. The approach is tested in a real industrial case, the autonomous assembly of an airplane sidewall panel, showing the power of the cooperation.
In conclusion, this paper shows how vision-assisted impedance control can solve a real industrial application. In the selected application, the object to be manipulated is huge and bulky and the workspace is very confined, so the allowed errors are very small. Visual servoing was precluded because the huge dimensions of the panel occluded the camera's view of the references. Finally, the robustness of this approach with respect to positioning errors is demonstrated: a priori knowledge of the positions of the robot and the panel is not required.
This work considers the manipulation of a huge, bulky, rigid object and its precise assembly under strict tolerances. Future work will address the manipulation of different objects, including non-rigid ones, where some degree of deflection must be taken into account. Moreover, a more complex environment model with a damping component will be investigated. This should give a better understanding of the interaction forces that may lead to an unsuccessful assembly or to damage to the object.