Modeling and Control of Sampled-Data Image-Based Visual Servoing With Three-Dimensional Features

Image-based visual servoing (IBVS) with 3-D features refers to the use of 3-D feature vectors in the visual control loop. The availability of low-cost and lightweight RGB-D cameras makes it natural to use the 3-D point coordinates of the acquired image to construct the feature vector. In this article, by using 3-D features, we first propose a novel sampled-data model of the feature dynamics, which, in contrast to the usual forward Euler approximation, retains the rigid motion constraint. The need for the sampled-data model arises from the fact that, due to the limited camera frame rate and actuation delays, the effects of a finite sampling time of the visual control system cannot be neglected. The stability analysis of the resulting discrete-time control system is carried out, showing that the desired equilibrium point of the visual control system is almost globally asymptotically stable. Finally, a new IBVS control algorithm is designed by resorting to the Lyapunov direct method. It explicitly takes the camera velocity limits into account while simultaneously ensuring stability. Furthermore, despite large sampling times, it guarantees the absence of hidden oscillations and a smooth approach of the camera to the prescribed target configuration. The experiments are carried out in an emulated in-store logistic scenario by performing pick&place and object handover tasks, testifying to the effectiveness of the approach.


I. INTRODUCTION
VISUAL control schemes use visual feedback from a camera to navigate a mobile robot or to position the end-effector of a manipulator. This objective is achieved by extracting features from the camera image. Features used in visual servoing can be points, lines, and contours (see [1], [2], [3], and references therein) or 3-D features generated by combining 2-D and 3-D information [4], [5]. If these are used to estimate the camera pose directly in the 3-D space, the control strategy is called position-based visual servoing (PBVS), and it aims at minimizing the error between the current and target poses. On the other hand, the so-called image-based visual servoing (IBVS) aims at directly bringing the current image features to the target ones. Reference tutorials for such approaches are [6], [7], and [8]. The IBVS approach is usually preferred to the PBVS one, owing to its robustness against camera calibration errors. IBVS controllers require the extraction of visual features from the camera image and their matching with the features of the given target image.

The authors are with the Dipartimento di Ingegneria, Università degli Studi della Campania Luigi Vanvitelli, 81031 Aversa, Italy (e-mail: marco.costanzo@unicampania.it; giuseppe.demaria@unicampania.it; ciro.natale@unicampania.it; antonio.russo1@unicampania.it).

Digital Object Identifier 10.1109/TCST.2023.3292311
Then, the tracking module provides the visual controller with the feature vector frame by frame. To avoid the bottleneck of the feature extraction and matching steps, so-called dense approaches have been proposed in the literature. For instance, the photometric approach [9] is based on the luminance of all the pixels in the image and requires stable lighting conditions. Dense depth map strategies [10], [11] require accurate camera calibration to provide satisfactory accuracy in the camera positioning.
For large camera transformations, feature-based visual control results in inaccurate matching. To overcome this difficulty, data-driven visual servoing approaches have been proposed recently. Deep neural network-based methods [12], [13], [14] are adopted to estimate the relative pose between the current and desired images and to perform the camera positioning. The mentioned methods generalize to novel and dynamic scenes [15], but the camera positioning accuracy is still unsatisfactory, especially in dynamic scenes [16].
In this article, with reference to applications where pick&place and handover of objects are dominant, as in intralogistic scenarios, we adopt the 3-D IBVS control scheme because it guarantees high camera positioning accuracy in the short range of interaction among the actors of a shared task and the object to handle and, last but not least, it is robust against changes of the scene over time, provided that the visibility constraint is satisfied.
The feature-based visual control strategies proposed in the literature (refer to [17] for an up-to-date textbook on visual servoing) are based on the continuous-time model describing the relationship between the feature rate and the camera velocity screw. The control law aims at giving the feature error a decoupled and monotonically decreasing behavior by acting on the camera velocity as input to the robot controller. An interesting first discussion on stability and convergence problems of such control strategies can be found in [18]. In [19], with reference to the IBVS with 3-D features, the equilibrium points of the 3-D IBVS closed-loop system are identified, and the instability of the undesired equilibrium points is proven by exploiting the isomorphism between SE(3) and the subset of the feature space to which the feature vector must belong to comply with the rigid motion constraint. This result has been extended in [20], where the basin of attraction of such undesired equilibria has been characterized for the classical continuous-time visual servoing scheme.
Due to the limited camera frame rate, actuation delays, and the time needed for computations, the sampling time of the visual control system is not negligible, and its intrinsic sampled-data nature should be taken into account to improve the visual servoing quality and the overall performance of the robotic system, while ensuring stability, as required in specific scenarios. For instance, in human-robot collaborative scenarios, the acceptability of the collaboration and the optimal sharing of the task require that the robot is able to perform actions at a speed comparable to that of the human partner. Costanzo et al. [21], [22] present the shelf replenishment task performed by the robot autonomously or in a collaborative way, respectively, by using standard algorithms and libraries for object detection [23] and visual servoing [24].
A first discrete-time analysis and design of a visual control scheme aimed at regulating the relative pose between a robotic camera and a rigid object can be found in [25]. The authors use the forward Euler finite-difference approximation to formulate a discrete-time approximation of the visual servoing system model. Other discrete-time design approaches refer to multirate visual control [26], or they resort to model predictive control (MPC) strategies. MPC techniques are used with the aim of improving visual servoing performance and of taking into account workspace and robotic system constraints in the control law design. In [27], three sets of constraints are considered, namely: 1) robot mechanical limitations, i.e., workspace limits and joint speed saturation; 2) visibility constraints, to ensure that the visual measurements stay in the image plane; and 3) control constraints, such as actuator limitations. Recently, a lightweight MPC architecture using a slim neural network, which can be trained on the fly, has been proposed in [14] to solve the IBVS objective of minimizing the error between the current and target images. In [28], the model predictive path integral (MPPI), developed for autonomous robot navigation tasks, is proposed for coping with the PBVS control goal. The cited discrete-time designs are based on the approximation of the feature dynamic model derived via the forward Euler method. However, the obtained discrete-time model does not retain the rigid motion constraint, which underlies the continuous-time model. Furthermore, the discrete-time IBVS model obtained via the forward Euler method approximates the continuous-time model with a relatively good degree of accuracy only in a neighborhood of the desired equilibrium point. This, in turn, implies that the stability analysis can be performed only locally.
In this article, we first propose an exact nonlinear sampled-data model of the feature dynamics by suitably exploiting the properties of the interaction matrix. Then, we inspect the stability properties of the equilibrium points of the error dynamics. Extending the results presented in [19] and [20] to the sampled-data case, we prove that, for a suitable choice of the sampling time and control parameters, the sampled-data IBVS closed-loop system exhibits a desired equilibrium point that is almost globally asymptotically stable [29], implying that the only trajectories not converging to it are those belonging to zero Lebesgue measure sets. Based on the stability results for the sampled-data visual control system specifically developed in this article, we propose a novel control algorithm aimed at improving the accuracy and execution speed of the visual servoing task. The algorithm is based on a discrete-time Lyapunov analysis, and it allows the designer to guarantee end-effector Cartesian velocity constraints, depending on the robot configuration and joint velocity limits. Moreover, the algorithm enforces the monotonic convergence of the feature error over time, i.e., it does not exhibit any hidden oscillations, and it guarantees a prescribed smooth approach of the camera toward the target configuration. These control objectives are achieved through an adaptation of the control gain obtained as the solution of a constrained optimization problem.
The proposed IBVS control schemes are experimentally evaluated in a lab-scale emulated in-store logistic scenario, where pick&place operations for shelf replenishment are executed by a robot in collaboration with a human partner, by utilizing object handover and dexterous handling maneuvers [30].

II. IBVS MODEL
In this section, to make this article self-contained, we recall the notation and the continuous-time dynamics of the IBVS from [20], which are necessary to derive the corresponding exact nonlinear sampled-data model.

A. Continuous-Time Model
The IBVS model is based on the data provided by RGB-D cameras, consisting of RGB images and per-pixel depth information. Let p_i be the ith point on the image captured by the camera, having pixel coordinates (u_i, v_i)⊤, and let z_i be the corresponding depth. Denote with K the camera calibration matrix

K = [F_x 0 C_x; 0 F_y C_y; 0 0 1]

where C_x and C_y represent the center of the camera in pixel coordinates and, by denoting with f the focal distance and with l_x and l_y the width and the height of the pixel, F_x = f/l_x and F_y = f/l_y. Then, the point p_i in 3-D coordinates with respect to the camera frame is expressed as

p_i = z_i K⁻¹ [u_i v_i 1]⊤.

We assume that the 3-D feature vectors of the captured RGB image, indicated hereafter with s_i, are composed of the 3-D coordinates of the points p_i, i.e., s_i = p_i. The IBVS approach aims at designing a closed-loop control scheme that regulates the current image feature set to a given target set of an image acquired beforehand. Let s_i, s⋆_i ∈ R³, i = 1, ..., n, be the ith current and target feature vectors, respectively; then, the vectors s, s⋆ ∈ R^{3n} stack the n matched image features. Denote with v = [v⊤ ω⊤]⊤ ∈ R⁶ the body velocity screw of the camera, where v ∈ R³ is the linear velocity and ω ∈ R³ is the angular velocity. Then, the link between the rate of s(t) and v(t) is described by the following relationship:

ṡ(t) = L(s(t)) v(t)   (5)

where L(s) ∈ R^{3n×6} is the so-called interaction matrix and, due to the particular feature set selected, it takes the structure

L(s) = [−I₃ S(s_1); −I₃ S(s_2); ...; −I₃ S(s_n)]   (6)

where I₃ indicates the 3 × 3 identity matrix and S(s_i) is the skew operator applied to the vector s_i such that S(s_i)ω = s_i × ω.
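As a concrete illustration (a minimal numpy sketch, not code from the article; the helper names are ours), the stacked interaction matrix for 3-D point features can be assembled blockwise from (6):

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix S(p) such that S(p) @ w == np.cross(p, w)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def interaction_matrix(s):
    """L(s) in R^{3n x 6} for 3-D point features: the ith block row is [-I_3, S(s_i)]."""
    s = np.asarray(s, dtype=float).reshape(-1, 3)
    return np.vstack([np.hstack((-np.eye(3), skew(si))) for si in s])
```

With at least three distinct nonaligned features, the resulting matrix has full rank 6, consistent with the assumption stated below.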
Assuming that there exist at least three distinct nonaligned features, the matrix L(s) has full rank, i.e., rank(L(s)) = 6.
By defining the error feature vector e ∈ R^{3n} as

e(t) = s(t) − s⋆   (7)

the distance between the current feature and the reference one is represented by ∥e(t)∥. Moreover, denoting with L_e = L(e + s⋆), the dynamics of the current feature deviation from the target one can be easily derived from (5) and (7) as

ė(t) = L_e v(t).   (8)

Hereafter, we will drop the dependence on time for the sake of notation simplicity.
The basic approach to IBVS control, aimed at guaranteeing a decoupled and exponential decrease of the error, is to design the velocity screw as [7]

v = −λ L†_e e,  λ > 0   (9)

where L†_e ∈ R^{6×3n} is the Moore-Penrose pseudoinverse of L_e and λ is a positive constant or a positive time-varying gain. Therefore, the visual servoing error dynamics is

ė = −λ L_e L†_e e.   (10)

The matrix L_e L†_e ∈ R^{3n×3n} has rank 6; thus, L_e L†_e has a nontrivial null space, and the configurations of the feature vector s such that e ∈ ker(L†_e) = ker(L⊤_e) are those in which the velocity controller output is zero and the error gets stuck in an undesired equilibrium point with s ≠ s⋆.
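The classical law (9) can be sketched with a pseudoinverse as follows (an illustrative, self-contained snippet, not the authors' implementation):

```python
import numpy as np

def skew(p):
    """S(p) such that S(p) @ w == np.cross(p, w)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def interaction_matrix(s):
    """Stacked interaction matrix for 3-D point features: block rows [-I_3, S(s_i)]."""
    s = np.asarray(s, dtype=float).reshape(-1, 3)
    return np.vstack([np.hstack((-np.eye(3), skew(si))) for si in s])

def ibvs_velocity(s, s_star, lam=0.5):
    """Classical IBVS law v = -lambda * pinv(L_e) @ e; returns the 6-D screw [v; omega]."""
    s = np.asarray(s, dtype=float).ravel()
    e = s - np.asarray(s_star, dtype=float).ravel()
    L_e = interaction_matrix(s)          # L_e = L(e + s*) = L(s)
    return -lam * (np.linalg.pinv(L_e) @ e)
```

Note that exactly at the undesired equilibria discussed below, where e lies in ker(L⊤_e), this control output vanishes even though s ≠ s⋆.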
The configurations of the feature vector s and, consequently, of the error e are subject to the rigid motion constraint, that is,

s = (I_n ⊗ R(r, θ)) s⋆ + 1_n ⊗ p   (11)

where the symbol ⊗ indicates the Kronecker product, and I_n and 1_n are the n × n identity matrix and the n-dimensional vector of ones, respectively. Furthermore, r and θ are the unit axis and the rotation angle of the rotation matrix R, which, together with the translation vector p, describes the displacement from the target feature vector to the current one.

B. Sampled-Data Model
The camera frame rate and the sampling rate that can be achieved by the visual perception pipeline are usually much lower than the sampling rate of the low-level robot control loop. With standard hardware and taking into account any possible latency of the robot control interface, the sampling time of the visual control loop can be as high as 100 ms. Therefore, the sampled-data nature of the control loop should be explicitly taken into account in the design to achieve the maximum possible performance of the robotic system while, at the same time, ensuring the stability and precision of the closed-loop system. Fig. 1 shows a block scheme of the visual control system, where the feature dynamic system is commanded with a sampled-data control input through a zero-order-hold (ZOH) block. Here, we assume that the robot accepts Cartesian velocity commands and can be considered an ideal Cartesian motion device. To verify that this assumption is reasonable, the following experiment has been carried out. Considering the same setup that will be used in Section V, the robot is real-time controlled via velocity commands at 1 kHz, and the velocity commands are updated at 20 Hz. The robot is commanded with sinusoidal translational and rotational velocity commands, sampled at 50 ms, with amplitudes of 0.4 m/s and 0.8 rad/s, respectively, and a frequency of 0.5 Hz. The left and right plots of Fig. 2 show the norms of the translational and rotational velocities, respectively. The blue piecewise constant line is the command, while the measured velocity is reported in red. The results clearly show that the measured velocity reaches the set-point value in a time much shorter than the command sampling period.
The first step toward the design of the proposed IBVS control algorithm is to derive the sampled-data model of the feature dynamics without resorting to the forward Euler approximation adopted in the previous literature. Taking into account the structure of L(s) in (6) and the properties of the skew-symmetric matrix S(s_i), i.e., S(s_i)ω = −S(ω)s_i = S⊤(ω)s_i, the term at the right-hand side of (5) can be written as L(s)v = S̄⊤(ω)s − v̄, and the feature dynamics in (5) can be expressed as

ṡ = S̄⊤(ω)s − v̄   (13)

where S̄(ω) ∈ R^{3n×3n} and v̄ ∈ R^{3n} have the following expressions:

S̄(ω) = I_n ⊗ S(ω),  v̄ = 1_n ⊗ v.   (14)

Denoting with T the control sampling time, the velocity control v is instantaneously updated at each step k and is kept constant at the value v(kT) within the interval [kT, (k + 1)T). This makes the system in (13) a linear piecewise constant system, and the feature evolution in this time interval can be computed as

s(kT + t) = e^{S̄⊤(ω_k) t} s(kT) − (∫_0^t e^{S̄⊤(ω_k)(t − τ)} dτ) v̄_k,  t ∈ [0, T].   (15)

Assuming that t = T and by a suitable change of variable in the integral, the feature dynamics at the sampling instants are governed by

s_{k+1} = e^{S̄⊤(ω_k) T} s_k − (∫_0^T e^{S̄⊤(ω_k) τ} dτ) v̄_k   (16)

where s_k stands for s(kT).
By expanding the exponential matrices in the previous equation in power series, it holds that

e^{S̄⊤(ω_k) T} = Σ_{j=0}^{∞} (T^j / j!) S̄⊤(ω_k)^j,  ∫_0^T e^{S̄⊤(ω_k) τ} dτ = Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^j   (17)

and the discrete-time feature dynamics results in

s_{k+1} = Σ_{j=0}^{∞} (T^j / j!) S̄⊤(ω_k)^j s_k − Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^j v̄_k.   (18)

By rearranging the previous equation, we obtain

s_{k+1} = s_k + Σ_{j=1}^{∞} (T^j / j!) S̄⊤(ω_k)^j s_k − Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^j v̄_k   (19)

and, by changing the index in the first summation, we have

s_{k+1} = s_k + Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^{j+1} s_k − Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^j v̄_k.   (20)

Finally, by following the same arguments used to derive (13), in view of (17), (20) can be written in the more compact form

s_{k+1} = s_k + P_k L(s_k) v_k   (21)

with P_k ∈ R^{3n×3n} defined as

P_k = ∫_0^T e^{S̄⊤(ω_k) τ} dτ = Σ_{j=0}^{∞} (T^{j+1} / (j+1)!) S̄⊤(ω_k)^j.   (22)

Equation (21) represents the sampled-data feature dynamic model, which exactly describes the behavior of the IBVS system at the sampling instants. Now, let us define the sampled-data feature error vector e_k ∈ R^{3n} as

e_k = s_k − s⋆.   (23)

Then, it is straightforward to derive the feature error dynamics as

e_{k+1} = e_k + P_k L_{e_k} v_k   (24)

where L_{e_k} = L(e_k + s⋆). Let the velocity control law in (24) be defined as

v_k = −λ L†_{e_k} e_k   (25)

with, again, λ being either a positive constant or a positive time-varying scalar. Then, the sampled-data closed-loop feature error dynamics in (24) becomes

e_{k+1} = e_k − λ P_k L_{e_k} L†_{e_k} e_k.   (26)
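To make the exact ZOH discretization concrete, here is a hedged numpy sketch (ours, not the authors' code) that propagates a single 3-D feature under a constant velocity screw over one sampling interval, evaluating the matrix exponential and its integral in closed form via the Rodrigues formula; since S̄(ω) is block diagonal, applying this update per point reproduces the stacked model:

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exact_step(s_i, v, w, T):
    """Exact ZOH propagation of one 3-D feature over a sampling interval T:
    s_i(T) = expm(S(w).T * T) @ s_i - P @ v, with P = int_0^T expm(S(w).T * tau) dtau.
    Both factors are evaluated in closed form (Rodrigues formula)."""
    s_i, v, w = (np.asarray(x, dtype=float) for x in (s_i, v, w))
    n = np.linalg.norm(w)
    if n < 1e-12:                      # no rotation: the dynamics reduce to a pure integrator
        return s_i - T * v
    S = skew(w)
    th = n * T
    # expm(S(w).T * T) = expm(-S(w) * T)
    R = np.eye(3) - (np.sin(th) / n) * S + ((1.0 - np.cos(th)) / n**2) * (S @ S)
    P = (T * np.eye(3)
         - ((1.0 - np.cos(th)) / n**2) * S
         + ((th - np.sin(th)) / n**3) * (S @ S))
    return R @ s_i - P @ v
```

A quick consistency check is to compare one exact step against a fine-grained Euler integration of the continuous dynamics ṡ_i = −ω × s_i − v; the two agree up to the integration tolerance, whereas a single forward Euler step with the same T does not.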

III. STABILITY PROPERTIES

A. Geometric Characterization
The dynamics in (26) presents two families of equilibrium points: the desired equilibrium e_k = 0 and the set of undesired equilibrium points such that e_k ∈ ker(L⊤_{e_k}), that is, the set of e_k such that L_{e_k} L†_{e_k} e_k = 0. In [19], it is proven that, given the target s⋆, there exist only three feasible configurations ŝ_x, x = a, b, c, of the feature vector s corresponding to the undesired equilibrium configurations with e ∈ ker(L⊤_e), s ≠ s⋆ and satisfying the rigid motion constraint (11). The three configurations are characterized by the triples (θ, p, r) = (θ̂, p̂, r̂) that satisfy the conditions in (27). By virtue of the rigid motion constraint expressed in (11), the visual servoing dynamics can be characterized in SE(3) by means of the variables θ, r, and p. In the SE(3) space, the desired equilibrium configuration s = s⋆ corresponds to θ = 0, r ∈ R³ (that is, R(r, θ) = I₃), and p = 0. The undesired equilibrium configurations, instead, are those characterized by θ, r, and p satisfying conditions (27).
Following the same idea used for the continuous-time analysis in [20], we denote with e_I the projection of e on Im(L_e), that is, e_I = L_e L†_e e, and with e_N the projection of e on ker(L⊤_e), that is, e_N = (I_{3n} − L_e L†_e) e. Note that, even if the feature vector s ∈ R^{3n}, the rigid motion constraint (11) imposes the features to belong to the subset R⋆ ⊂ R^{3n} [isomorphic to SE(3)] defined as the set of all the feature vectors s satisfying (11). Fig. 3 graphically illustrates this constraint in the plane (∥e_N∥, ∥e_I∥). The figure has been numerically constructed with the procedure described in [20]. The red areas correspond to points that do not satisfy the constraint (11); thus, no point of the system trajectories reported in the plane (∥e_N∥, ∥e_I∥) can belong to such regions.
Denoting with ρ_x the vertical lines passing through ∥e_N∥ = ∥ê_x∥, with ê_x = ŝ_x − s⋆ and x = a, b, c indexing the three undesired configurations, in [20], we proved the following results.
1) The boundaries of the red areas in Fig. 3 are tangent to the vertical lines ρ_x.
3) In view of [20, Proposition 1], for all three undesired equilibria ŝ_x (the subscript x will be dropped to simplify the rest of the treatment), the following property holds. Any point s obtained as a pure translation of ŝ belongs to the manifold N = {s | s = ŝ + p̄}, where p̄ = 1_n ⊗ (p − p̂). Such a set, for any given s⋆, corresponds to the ρ lines in the plane (∥e_N∥, ∥e_I∥) in Fig. 3. In other words, considering a pure translation starting from ŝ, the feature error projection e_N remains constant and equal to ê.

B. Dynamics Characterization
In our former paper [20], we proved that the trajectories of the continuous-time closed-loop system (10) starting from e_0 such that s_0 ∈ N converge to the undesired equilibrium ê for any value of the gain λ. Here, we prove that such behavior is exhibited also by the sampled-data closed-loop system (26), but with a limit on the gain λ depending on the sampling time. Note that the proof of the next proposition is considerably more involved due to the presence of the matrix P(ω_k) in the sampled-data system dynamics.
Given the above reasoning, we are now ready to analyze the behavior of the system (26) when s 0 ∈ N .
Proposition 1: With reference to the system (26), the only trajectories e k that asymptotically converge to ê are those starting from e 0 such that s 0 ∈ N if and only if 0 < λT < 2.
Proof: Recalling result 3), any s_0 ∈ N can be generated as a pure translation of ŝ, that is, s_0 = ŝ + p̄_0, with p̄_0 = 1_n ⊗ (p_0 − p̂). Thus, for any given s⋆, e_0 = ê + p̄_0. Then, given the dynamics in (26), for k = 0, it holds that

e_1 = ê + p̄_0 − λ P_0 L_{e_0} L†_{e_0} (ê + p̄_0).

By result 3), the projection of e_0 on ker(L⊤_{e_0}) equals ê, so that L_{e_0} L†_{e_0} ê = 0. Then,

e_1 = ê + p̄_0 − λ P_0 L_{e_0} L†_{e_0} p̄_0

which, considering that p̄_0 ∈ Im(L_{e_0}), becomes

e_1 = ê + p̄_0 − λ P_0 p̄_0.

Since s_0 ∈ N, then ω_0 = 0 and, from (22), P_0 = T I_{3n}, providing

e_1 = ê + (1 − λT) p̄_0

so that e_1 represents a mere translation with respect to ê. This, in turn, implies that no rotation is generated. Hence, by iterating the above reasoning, it is possible to write the dynamics of trajectories e_k with e_0 such that s_0 ∈ N as

e_{k+1} = ê + (1 − λT)(e_k − ê)

or, equivalently,

e_k = ê + (1 − λT)^k p̄_0.

Thus, e_k converges to ê if and only if |1 − λT| < 1, that is, 0 < λT < 2. Hence, the feature trajectories starting from the set N converge to the undesired equilibrium points ŝ without leaving N, which means that the error converges to ê.
To prove that they are the only ones, consider a slight perturbation of e_0 obtained by perturbing θ = θ̂ + δθ and r = r̂ + δr. Since, on the manifold N, the angular velocity ω = 0, a slight perturbation from N generates a small angular velocity.
The matrix P_k = P(ω_k) in the system dynamics (26), by virtue of the Euler-Rodrigues rotation formula R = e^{S(θ r)} = I₃ + sin θ S(r) + (1 − cos θ) S²(r) [31], can be written as

P_k = T I_{3n} − ((1 − cos(∥ω_k∥T)) / ∥ω_k∥²) S̄(ω_k) + ((∥ω_k∥T − sin(∥ω_k∥T)) / ∥ω_k∥³) S̄²(ω_k).   (38)

For very low values of ∥ω_k∥, it holds that

P_k ≈ T I_{3n} + (T²/2) S̄⊤(ω_k) + (T³/6) S̄²(ω_k).   (39)

Now, consider the Lyapunov candidate function

V(e_k) = (e_k − ê)⊤ (e_k − ê)   (40)

whose first-order variation along the error dynamics (26), with e_{I,k} = L_{e_k} L†_{e_k} e_k and e_{N,k} = e_k − e_{I,k}, reads

ΔV_k = −2 λ (e_k − ê)⊤ P_k e_{I,k} + λ² e_{I,k}⊤ P_k⊤ P_k e_{I,k}.   (41)

Substituting (39) into (41), it holds that

ΔV_k ≈ 2 λ T (ê − e_{N,k})⊤ e_{I,k} − λ T² (e_k − ê)⊤ S̄⊤(ω_k) e_{I,k} − λ (T³/3) (e_k − ê)⊤ S̄²(ω_k) e_{I,k} − λ T (2 − λT) ∥e_{I,k}∥².   (42)

Note that the approximation (42) of (41) is obtained by neglecting terms of the order ∥ω_k∥²; therefore, there exists a neighborhood of ω_k = 0 where they have the same sign. Furthermore, in view of (39), system (26) becomes

e_{k+1} ≈ e_k − λ (T I_{3n} + (T²/2) S̄⊤(ω_k) + (T³/6) S̄²(ω_k)) e_{I,k}.   (43)

As long as ω_k is negligible, considering e_k = e_{I,k} + e_{N,k}, the solution of the system can be approximated as

e_{N,k} ≈ e_{N,0}   (44)
e_{I,k} ≈ (1 − λT)^k e_{I,0}   (45)

implying that e_{I,k} decreases. By evaluating the single terms at the right-hand side of (42), we note that, being s_0 close to the manifold N, the vectors e_{N,k} = e_{N,0} and ê are almost aligned and such that ∥ê∥ ≈ ∥e_{N,k}∥. Thus, the difference ê − e_{N,k} can be considered orthogonal to both vectors, i.e., it is almost aligned with e_{I,k}, implying that the first term in (42) is positive definite. Regarding the second term in (42), we cannot evaluate its sign, but we can evaluate its norm. In fact, comparing the first and second terms, it can be observed that the norm of the second term is definitely smaller than the norm of the first one due to the presence of the terms S̄(ω_k) (with ω_k having negligible norm) and T² (being T smaller than 1). Thus, the second term can be neglected compared to the first one. Regarding the third and last terms, the term characterized by T³ S̄²(ω_k) is positive semidefinite, and it can be neglected due to its small magnitude. Therefore, the first-order variation ΔV_k in (42) can be further approximated as

ΔV_k ≈ 2 λ T (ê − e_{N,k})⊤ e_{I,k} − λ T (2 − λT) ∥e_{I,k}∥²   (46)

where the first term is linear with respect to e_{I,k}, while the second term is quadratic. As long as the norm of ω_k is negligible, the solution of e_{I,k} evolves as in (45), and there exists a time instant k̄
such that the linear term definitely dominates the quadratic one, thus yielding a positive first-order variation of the Lyapunov function. This implies that ∥e_k − ê∥ increases, and the trajectory moves away from the manifold N, thus proving that any trajectory starting outside N cannot converge to the undesired equilibria.
The arguments on the character of the feature error dynamics, discussed in the proof of Proposition 1, are elucidated by means of the simulation results illustrated in Figs. 4 and 5. The results in Fig. 4 are obtained by starting the feature dynamics from an initial condition close to the manifold N, that is, θ_0 = 179.9°, r_0 = r̂, and p_0 = [0.3, 0, 0]⊤. The left plot in Fig. 4 shows the error norm trajectory in the plane (∥e_N∥, ∥e_I∥). It can be observed that the trajectory
initially evolves due to the translational velocity control only; in fact, the angular velocity ∥ω∥ in the right plot is negligible. In correspondence with the sign variation of ΔV, in the right plot, ∥ω_k∥ begins to grow; at the same time, ∥e_N∥ starts to change noticeably, and ∥e_I∥ begins to increase. For this reason, the error trajectory in the left plot moves away from the ρ line, which means that the feature dynamics moves away from the manifold N.
The analysis in Proposition 1 and the simulation have been carried out so far assuming the absence of any camera measurement noise. It is clear that, in real cases, in the presence of noise, the feature trajectory cannot lie on the zero Lebesgue measure manifold N, even if the initial condition belongs to it. Anyway, the importance of Proposition 1 is to highlight a pathology in the trajectory convergence when θ_0 ≈ θ̂ and r_0 ≈ r̂, due to the very low norm of the camera angular velocity in the initial part of the feature trajectory. Fig. 5 shows the simulation results carried out assuming θ_0 = θ̂, r_0 = r̂, and a random uniform uncertainty of 1% on the depth measurement. As illustrated in the left plot, in the first part of the trajectory, the camera moves only due to a translational velocity control; thus, it translates only and reaches the target at around t = 3 s, but θ is still close to θ̂. Between 3 and 6 s, the angular velocity is almost zero, and the error norm is practically constant (see the right plot in Fig. 5). At t = 6 s, the angular velocity starts increasing, and the error trajectory moves away from the manifold N. This behavior entails a drawback: the end-effector rotates only after getting very close to the target. For example, in an object pick&place task, this implies that the robot may collide with the object or the desk on which it lies, and/or, depending on the robot configuration, a collision between the camera and the robot arm may occur. The above analysis suggests that initial conditions of the feature points close to the manifold N should be avoided in order to prevent the generation of purely translational trajectories followed by pure rotations, the latter performed when the camera is very close to the target configuration. By simulation, we will show later that the basin of convergence of the trajectories that do not present the described pathology is very large.
We now discuss the stability properties of the desired equilibrium e = 0.
Proposition 2: Consider the sampled-data closed-loop system (26) with the gain λ = λ_k selected, at each step k, in the range (47); then, the desired equilibrium point e = 0 of system (26) is almost globally asymptotically stable.

Proof:
Since the skew matrix is normal, i.e., S⊤(ω_k) S(ω_k) = S(ω_k) S⊤(ω_k), by resorting to [32, Th. 2.5.3], the skew matrix S⊤(ω_k) can be factorized as

S⊤(ω_k) = U_k Λ_k U*_k,  Λ_k = diag(0, i∥ω_k∥, −i∥ω_k∥)   (48)

where U_k is unitary and U*_k is the conjugate transpose of U_k. Since the diagonal matrix at the right-hand side of (48) is a 3 × 3 singular diagonal matrix with the first element of the diagonal equal to zero, the following integral can be written as:

∫_0^T e^{Λ_k τ} dτ = T diag(1, e^{i ω̄_{T,k}} sinc(ω̄_{T,k}), e^{−i ω̄_{T,k}} sinc(ω̄_{T,k}))   (49)

where ω̄_{T,k} = ∥ω_k∥(T/2). Thus, the matrix P_k can be expressed as

P_k = Ū_k Λ̄_k Ū*_k   (50)

where Λ̄_k, Ū_k ∈ C^{3n×3n} are defined as

Λ̄_k = I_n ⊗ ∫_0^T e^{Λ_k τ} dτ,  Ū_k = I_n ⊗ U_k.   (51)

By selecting as Lyapunov function candidate V(e_k) = e_k⊤ e_k, its first-order variation ΔV_k = e_{k+1}⊤ e_{k+1} − e_k⊤ e_k along the error dynamics (26) can be written as

ΔV_k = −2 λ_k e_k⊤ P_k L_{e_k} L†_{e_k} e_k + λ_k² e_k⊤ L_{e_k} L†_{e_k} P_k⊤ P_k L_{e_k} L†_{e_k} e_k.   (52)

Denoting with B_k the matrix P_k − T I_{3n}, we can write

P_k = T I_{3n} + B_k.   (53)

Then, ΔV_k in (52) can be rewritten as

ΔV_k = −2 λ_k T e_k⊤ L_{e_k} L†_{e_k} e_k − 2 λ_k e_k⊤ B_k L_{e_k} L†_{e_k} e_k + λ_k² e_k⊤ L_{e_k} L†_{e_k} P_k⊤ P_k L_{e_k} L†_{e_k} e_k.   (54)

Recalling that L_{e_k} L†_{e_k} is the projector matrix on Im(L_{e_k}) and e_{I,k} = L_{e_k} L†_{e_k} e_k, ΔV_k can be upper bounded as follows:

ΔV_k ≤ −2 λ_k T ∥e_{I,k}∥² + 2 λ_k ∥B_k∥ ∥e_k∥ ∥e_{I,k}∥ + λ_k² T² ∥e_{I,k}∥².   (55)

As the matrix

Λ̄*_k Λ̄_k ≤ T² I_{3n}   (56)

then we have P_k⊤ P_k ≤ T² I_{3n}. In view of (53), the norm of the matrix B_k is such that

∥B_k∥ ≤ T γ(ω̄_{T,k}),  γ(ω̄_T) = (1 − 2 sinc(ω̄_T) cos(ω̄_T) + sinc²(ω̄_T))^{1/2}.   (57)

The graph of the function γ(ω̄_T) is reported in Fig. 6, where the plot clearly shows that γ(ω̄_{T,k}) ≤ ω̄_{T,k}.
In view of the above arguments and the definition α_k = ∥e_k∥/∥e_{I,k}∥, the upper bound for ΔV_k can be rewritten as

ΔV_k ≤ λ_k T ∥e_{I,k}∥² (λ_k T − 2 + 2 α_k γ(ω̄_{T,k}))   (58)

and, by virtue of (29), the three terms in parenthesis tend to zero with the same rate. This, in turn, implies that, for ∥e_{I,k}∥ ≠ 0, it is always possible to select λ_k T such that ΔV_k ≤ 0. Hence, for such a selection of λ_k T, the rate of variation of the Lyapunov function is negative definite almost everywhere (since, apart from ∥e_k∥ = 0, ΔV_k = 0 only in the undesired equilibrium points). Since the only trajectories converging to the undesired equilibria are those belonging to the zero Lebesgue measure sets introduced in Proposition 1, by virtue of (29), almost global asymptotic stability is achieved for the desired equilibrium point if, for each k, λ_k is such that

0 < λ_k T < 2 (1 − α_k γ(ω̄_{T,k}))   (59)

implying that ΔV_k < 0, ∀k.

IV. SAMPLED-DATA VISUAL SERVOING CONTROL
In this section, a novel visual servoing control algorithm is presented to pursue the fulfillment of the following requirements.
1) The feature error dynamics should exhibit a monotonic decrease over the continuous time t.
2) The camera velocity control should comply with the joint velocity limits.
3) The camera velocity control should guarantee a prescribed smooth approach of the camera toward the target configuration.
4) The visual control should adapt to the specific robotic system without the need of experimentally tuning the control parameters.
The first three requirements aim at fully exploiting the robot's dynamic performance, while the last requirement ensures the portability of the visual controller.
The control algorithm proposed in the following proposition requires the definition of two scalar functions: the cost function π(·, ·) and the landing function l(·). The first one is defined as

π(e_k, λ_k T) = (1 + α_k ω̄_{T,k}) (λ_k T)² − 2 l(e_k) λ_k T   (60)

which, as a function of λ_k T, being λ_k > 0, is a concave up parabola passing through the origin and the point (2 l(e_k)/(1 + α_k ω̄_{T,k}), 0). The landing function l(e_k), which is designed to allow a smooth approach to the feature target configuration, is defined as follows. Denote with e_{M,k} the root-mean-square error e_{M,k} = ∥e_k∥/√n, with e_L and e_H two positive scalars such that e_L < e_H, and let µ ∈ (0, 1]; then

l(e_k) = 1, if e_{M,k} ≥ e_H
l(e_k) = µ + (1 − µ) q((e_{M,k} − e_L)/(e_H − e_L)), if e_L < e_{M,k} < e_H
l(e_k) = µ, if e_{M,k} ≤ e_L

where q(·) is a fifth-order polynomial whose coefficients are determined such that the first- and second-order derivatives are zero at the extremes of the interval [0, 1] and q(0) = 0, q(1) = 1. Note that 0 < l(e_k) ≤ 1, and µ = 1 implies that l(e_k) = 1. Finally, denote with v_{M,k} and ω_{M,k} the maximum allowed norm of the linear and angular velocities at the kth step, respectively.
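A possible realization of the landing function is sketched below (an illustrative snippet, not the authors' code). The quintic q here is the standard "smootherstep" polynomial q(x) = 6x⁵ − 15x⁴ + 10x³, which satisfies the stated boundary conditions; the piecewise structure of l is an assumption consistent with the described properties:

```python
import numpy as np

def q(x):
    """Quintic with q(0)=0, q(1)=1 and zero first/second derivatives at both ends."""
    return 6 * x**5 - 15 * x**4 + 10 * x**3

def landing(e, e_L, e_H, mu, n):
    """Landing function l(e_k) in (0, 1]: full gain far from the target, smoothly
    reduced to mu inside the deceleration band [e_L, e_H]."""
    e_M = np.linalg.norm(e) / np.sqrt(n)   # root-mean-square feature error
    if e_M >= e_H:
        return 1.0
    if e_M <= e_L:
        return mu
    return mu + (1.0 - mu) * q((e_M - e_L) / (e_H - e_L))
```

In this sketch, e_H sets where the deceleration begins and e_L where the minimum gain fraction µ is reached.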
Remark 1: The term "landing function" for l(e_k) has been chosen because an appropriate setting of the values e_L and e_H and of the parameter µ allows the designer to shape the deceleration profile for a smooth approach to the target configuration. Guidelines to select these values will be provided in Section V.
The design of the visual servoing controller is then presented in the following proposition.
Proposition 3: Consider the sampled-data closed-loop system (26) with λ = λ_k. Assuming that s_0 ∉ N, if the gain λ_k is determined, for each sampling time interval, by solving

P1: min_{λ_k T} π(e_k, λ_k T)   (61a)
subject to: 0 < λ_k T < 2 (1 − α_k γ(ω̄_{T,k}))   (61b)
∥v_k∥ ≤ v_{M,k}, ∥ω_k∥ ≤ ω_{M,k}   (61c)

then the following holds.
1) The equilibrium point e = 0 of the sampled-data closedloop system is almost globally asymptotically stable.
2) The feature error norm ∥e(t)∥ exhibits a monotonic decrease over the time t.
Proof: In Proposition 2, we have proven that the sampled-data closed-loop system in (26) is almost globally asymptotically stable if λ_k T is selected in the range defined in (47). The proof has been performed via the Lyapunov direct method by selecting the Lyapunov function candidate V(e_k) = e_k⊤ e_k. An upper bound of its first-order variation ΔV_k can be written by rearranging (57) and, since l(e_k) belongs to the range (0, 1], the equilibrium point e = 0 is almost globally asymptotically stable if λ_k T is in the range (61b). Now, we have to prove that the proposed gain adaptation algorithm allows us to achieve objective 2).
By multiplying and dividing the last term in (54) by T² and by dividing both sides by ∥e_k^I∥², we obtain the bound where a_k, by virtue of (50), is given as follows. Let us define the function g_k(λ_k t) = ∥e(kT + t)∥² − ∥e(kT)∥², t ∈ [0, T], which has the following properties. The function g_k(λ_k T) is upper bounded by the right-hand side of (63), a concave-up parabola that is negative for λ_k T belonging to the open interval in (47).
The function f k (λ k T ) and its upper bound are represented in Fig. 7.
Define λ_k^⋆ T as λ_k^⋆ T = arg min. Then, for each k, f_k(λ_k T) is monotonically decreasing in the interval [0, λ_k^⋆ T]. Now, we claim the following. 1) As a consequence of the monotonic decrease of f_k(λ_k T) and by virtue of properties (67), we have a bound which ensures the monotonic decrease of g_k(λ_k^⋆ t) in [0, λ_k^⋆ T] and, hence, of ∥e(kT + t)∥ for t ∈ [0, T], ∀k.
Proof of Claim 2: In this case, g_k(λ_k t) is not monotonically decreasing in [0, λ̄_k T] ∀k, and the sampled-data visual servoing system (26) exhibits hidden oscillations. Since the function π(e_k, λ_k T) attains its minimum at λ̄_k, with λ̄_k < λ_k^⋆, the solution of problem P1 ensures the absence of hidden oscillations.
If λ_k = λ̄_k is such that constraint (61c) is not satisfied, the minimization algorithm selects λ_k < λ̄_k; since the camera velocity depends linearly on λ_k, this again ensures that g_k(λ_k t) is monotonically decreasing for t ∈ [0, T], ∀k ∈ N_0.
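Because the camera velocity scales linearly with the gain, selecting λ_k amounts to taking the unconstrained minimizer of the cost parabola and clipping it against the velocity limits. A sketch under an assumed parameterization in which v_norm_unit and w_norm_unit are the commanded velocity norms obtained with λ_k = 1 (these helper names are ours, not the paper's):

```python
def solve_P1(T, l_e, alpha_k, omega1_k, v_norm_unit, w_norm_unit, v_max, w_max):
    # Unconstrained minimizer of the cost parabola, as stated in the text:
    # lambda_k * T = l(e_k) / (1 + alpha_k * omega1_k).
    lam = l_e / ((1.0 + alpha_k * omega1_k) * T)
    # The velocity constraints are linear in lambda_k: since the commanded
    # camera twist is proportional to the gain, each limit translates into
    # a simple upper bound on lambda_k.
    if v_norm_unit > 0.0:
        lam = min(lam, v_max / v_norm_unit)
    if w_norm_unit > 0.0:
        lam = min(lam, w_max / w_norm_unit)
    return lam
```

Clipping below the parabola's minimizer preserves the monotonic decrease of g_k, which is the mechanism the proof exploits.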
Note that problem P1 in (61) has a simple closed-form solution in λ_k T. In fact, it is a scalar minimization of the parabola defined by the function π. The unconstrained minimum of the parabola is λ_k T = l(e_k)/(1 + α_k ω_k^1), which satisfies constraint (61b). Moreover, the constraints in (61c) are linear inequality constraints in λ_k T, so the closed-form solution of P1 follows directly. After defining the novel visual servoing control algorithm, which guarantees almost global asymptotic stability of the closed-loop system and the absence of hidden oscillations, it is of interest to investigate for which initial orientation errors the pathological behavior described in Section III actually affects the closed-loop system, causing a longer convergence time (see Figs. 4 and 5). This has been verified through a simulation campaign, presented in Figs. 8 and 9, carried out to show the behavior of the controlled system for different values of the initial orientation error θ_0. All the simulations have been performed by adopting the control algorithm proposed in Proposition 3 with v_k^M = 0.1 m/s and ω_k^M = 0.75 rad/s for all k. The response of the closed-loop system has been evaluated considering six different values of the initial axis-angle error θ_0 and r(0), i.e., θ_0 = 180°, 179°, 170°, 160°, 150°, and 120° and r(0) = r_a, r_b, r_ab, with r_a and r_b the rotation axes corresponding to the undesired equilibria ê_a and ê_b, respectively, and r_ab a rotation axis obtained as the mean of r_a and r_b. These values are combined as shown in the legends of Figs.
8 and 9. In addition, it is assumed that the depth measurements are affected by white noise amounting to 1% of the current depth value. Fig. 8 displays the error convergence for each value of the initial angle error, with the left plot showing the plane (∥e_N∥, ∥e_I∥), while ∥e∥ is shown in the right plot. As expected, in all the considered cases, the error asymptotically converges to zero, but the initial angle plays a major role in determining the actual error trajectory. Consider first the cases where r(0) = r_a. When θ_0 = 170°, the camera starts with a lower rotational velocity [see Fig. 9 (bottom right)]. In fact, until t = 5 s, the camera motion is predominantly a translation, and the error trajectory decreases mostly along e_I [see again Fig. 8 (left)]. As the initial angle θ_0 decreases, this behavior becomes less noticeable, and when θ_0 = 120°, the error trajectory evolves from the very beginning along both e_I and e_N, and the camera rotates and translates at the same time. As a result, the overall error convergence is faster, as shown in Fig. 8 (right). Therefore, comparing the convergence times obtained for different values of the initial rotation error, it can be concluded that, for θ_0 < 140°, the effect of the pathological behavior is almost negligible. Hence, only trajectories with large initial rotation errors are truly affected by the latency generated by small angular velocities. Unsurprisingly, such behavior is evident also for a different initial rotation axis. When r(0) = r_b and θ_0 = 179°, the trajectory seems to evolve toward the undesired equilibrium, but, similar to the simulation in Fig.
4, it slowly converges to the origin. The same pathological behavior also appears when the angle is the critical one, θ(0) = 180°, but the rotation axis is none of the three equilibrium axes (brown curves), showing that the angle plays a major role in determining such behavior. Note that, although the brown and cyan curves in Fig. 8 cross the ρ_a line in the (∥e_N∥, ∥e_I∥) plane, this does not mean that the trajectory crosses the manifold N_e, since the represented plane is just a convenient 2-D visual representation of the 3n-dimensional state space.
Finally, Fig. 9 shows the value α = ∥e∥/∥e_I∥ used to define the upper bound of the gain in Proposition 2. From (29), we know that, near the origin, α = 1. The plot shows that α remains finite and close to 1 in all the simulated cases, reaching a maximum value of α = 1.75 in the θ_0 = 170° case.

A. Experimental Setup
The visual servoing algorithm has been tested in a collaborative in-store logistic scenario for shelf replenishment tasks. A Kuka LBR iiwa 7 robot has to pick various objects and place them on the shelves. The objects can be picked either from a desk or exchanged with the clerk through a handover maneuver. The end-effector is equipped with an Intel RealSense D435i RGB-D camera mounted in an eye-in-hand configuration. The grasping device is a WSG50 gripper by Weiss Robotics equipped with the SUNTouch force/tactile sensors [33]. The small clearance between the shelves limits the robot's collision-free workspace. To successfully execute the placing task, the robot's dexterity is enhanced by the in-hand manipulation capabilities enabled by the slipping control algorithm described in [34], based on the dynamic friction model and the nonlinear observer proposed in [35]. To perform controlled sliding, the robot is required to grasp the object at prescribed grasping points with a maximum positioning error of 2 mm [21]. The visual servoing algorithm is considered correctly executed if the root mean square error e_M falls below 2 mm. A video of the experiments described in this section is available at https://youtu.be/unNwifaBD4A.
The algorithm runs on a ROS network. The iiwa robot is controlled via the fast robot interface (FRI) by a PC running Ubuntu with the PREEMPT_RT real-time patch, ROS Melodic, an Intel i7-8700K CPU @ 3.70 GHz, and 64 GB of RAM. The robot control rate is set to 1 kHz. The joint reference position commands are filtered by an additional first-order filter with a cutoff frequency of 100 Hz.
The RealSense camera is connected to the ROS network via the REALSENSE-ROS wrapper, the camera resolution is set to 640 × 480 pixels, and the camera rate is set to its upper limit of 30 Hz. The default ROS wrapper publishes the depth image data with a depth resolution of 1 mm, whereas, by setting the camera driver to a depth resolution of 0.1 mm, we obtain an accuracy of 1% of the depth measurement over a range of 6 m, which is adequate for a collaborative task.
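With the RGB-D data, each 3-D feature point is obtained by pairing a tracked pixel with its measured depth through the pinhole camera model. A minimal back-projection sketch, where the intrinsics fx, fy, cx, cy are generic placeholders and not the D435i calibration:

```python
import numpy as np

def backproject(u, v, Z, fx, fy, cx, cy):
    # Pinhole back-projection: pixel (u, v) with depth Z [m] maps to the
    # 3-D point (X, Y, Z) in the camera frame, i.e., one 3-D feature s_i.
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])
```

The 0.1 mm depth resolution mentioned above directly sets the quantization of the Z component of each feature.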
Since the FRI interface imposes a maximum joint displacement of 0.001 rad between two consecutive commands and the robot control rate is 1 kHz, the joint velocity limit q̇_M results to be 1 rad/s. This has to be translated into Cartesian
velocity limits. The relationship between the Cartesian and joint velocity commands is given in (76), where q ∈ R^7 is the joint position and J(q) ∈ R^{6×7} is the robot Jacobian. From (76), the squared norm of the joint velocity is obtained as in (77), which is a quadratic form depending on the robot Jacobian. From (77), it is straightforward to find the time-varying Cartesian velocity limits (78) and (79) that keep the joint speed below the constraint value q̇_M, where σ_1(J(q)) is the largest singular value of the robot Jacobian and the vectors σ_v11(J(q)), σ_v21(J(q)) ∈ R^3 contain the first and last three elements of the corresponding singular vector, respectively.
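The exact limits (78) and (79) split the bound between translational and rotational parts via the singular-vector components. A simpler, more conservative variant that already guarantees ∥q̇∥ ≤ q̇_M under a minimum-norm inverse kinematics assumption can be sketched as follows; this is a simplification of ours, not the paper's formula:

```python
import numpy as np

def cartesian_speed_limit(J, qdot_max):
    # With minimum-norm resolution qdot = pinv(J) v, the worst-case
    # amplification from Cartesian twist to joint speed is the largest
    # singular value of pinv(J), i.e., 1 / sigma_min(J). Bounding the
    # twist norm by qdot_max * sigma_min(J) therefore keeps
    # ||qdot|| <= qdot_max for any twist direction.
    sigma = np.linalg.svd(J, compute_uv=False)
    sigma_min = sigma[-1]          # smallest of the 6 singular values
    return qdot_max * sigma_min    # admissible norm of the 6-D twist
```

Recomputing this bound at every control step yields time-varying limits, as in the paper, since J(q) changes with the configuration.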

B. Algorithm Implementation Details
We assume that the robot already knows which object it has to grasp and the desired grasping pose corresponding to the target image. To acquire the target image, we manually brought the robot to the grasp location so that the fingertips touch the object at the prescribed grasping points.
The 3-D feature points s_i and the corresponding targets s_i^⋆, i ∈ {1, …, n}, are extracted and matched at the beginning (t = 0) via the keypoint matching algorithm available in the ViSP library [24] (vpKeyPoint class), using the first frame acquired by the camera and the target image. After the initial matching, the features are tracked on the current image only, using the keypoint tracking algorithm available in the same library (vpKltOpencv class).
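ViSP handles the matching internally; purely as an illustration of the idea behind descriptor-based keypoint matching (not the vpKeyPoint implementation), a nearest-neighbor matcher with a ratio test can be sketched as:

```python
import numpy as np

def match_descriptors(desc_cur, desc_tgt, ratio=0.8):
    # For each descriptor in the current image, find the closest target
    # descriptor and accept the match only if it is sufficiently better
    # than the second closest (Lowe's ratio test), which rejects
    # ambiguous correspondences.
    matches = []
    for i, d in enumerate(desc_cur):
        dist = np.linalg.norm(desc_tgt - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, int(j1)))
    return matches
```

After this one-time matching step, a KLT-style tracker only needs the previous and current frames, which is why the paper switches to vpKltOpencv for the remainder of the task.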

C. Simulation and Real Experiment Comparison
The sampled-data model of the visual servoing system and the proposed controller are available in a Simulink library at the following URL: https://github.com/Vanvitelli-Robotics/MATLAB_visual_servo. All the simulated plots in this article are generated with this library.
This section compares the experimental visual servoing task with the simulated one to show that modeling the robot as a pure Cartesian motion device is reasonable.
The robot is commanded to grasp an object from the desk by using the algorithm proposed in Proposition 3 to synthesize the control action. The initial conditions are selected as θ(0) = 17°, r(0) = [0.0415 −0.9989 −0.0208]^⊤, and p(0) = [0.0647 0.0146 0.1145]^⊤ m. The landing function l(e_k) in (60) is designed taking into account the robot's capability to decelerate. Assuming a uniformly decelerated motion at the end of the visual servoing task, an estimate of the deceleration space is 3(v^M)²/(2a^M), where a^M is the maximum Cartesian acceleration that the robot can generate. Assuming that a^M is about 5 m/s², e_H is set to 5 mm, while e_L = 2 mm, since it corresponds to the accepted bound on the steady-state error. Concerning µ, when the root mean square error e_M is below e_L, assuming a pure translational motion, the translational velocity is upper bounded as ∥v∥ ≤ µ e_L √n / T. Therefore, µ = 0.25/√n has been selected in order to have a maximum velocity of about 1 cm/s inside the bound defined by e_L.
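The tuning rules of this paragraph can be collected in a small helper; the sampling time T is an input here, and the defaults are assumptions reflecting the values quoted in the text:

```python
import numpy as np

def landing_tuning(n, T, v_M=0.1, a_M=5.0, e_ss=0.002):
    # Deceleration space for a uniformly decelerated stop from speed v_M:
    # 3 * v_M^2 / (2 * a_M); e_H is chosen at least this large
    # (the paper rounds up to 5 mm).
    decel_space = 3.0 * v_M**2 / (2.0 * a_M)
    e_H = max(decel_space, 0.005)      # 5 mm, as selected in the text
    e_L = e_ss                         # accepted steady-state error bound
    mu = 0.25 / np.sqrt(n)             # caps ||v|| at mu*e_L*sqrt(n)/T below e_L
    v_cap = mu * e_L * np.sqrt(n) / T  # residual translational speed bound
    return e_L, e_H, mu, v_cap
```

The returned v_cap makes the trade-off explicit: a smaller µ slows the final approach but reduces the residual velocity inside the e_L bound.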
The initial matched features s(0) and the corresponding target s^⋆ are used to initialize the simulator as well. Moreover, the same time-varying Cartesian velocity limits v_k^M and ω_k^M of the real experiment [computed as in (78) and (79)] are used in the simulator. Fig. 10 shows the results of the comparison. The blue and black lines represent the root mean square error e_M resulting from the simulation and the real experiment, respectively.
Remarkably, the blue and black lines exhibit very similar behavior. The simulated e_M converges to zero asymptotically, while that of the real experiment is ultimately bounded in a neighborhood of e = 0 whose radius is less than 1 mm, despite the feature error measurement uncertainty and the limited bandwidth of the robot control interface.
Fig. 11 (top) and (bottom) shows the norms of the translational and rotational controller output velocities, respectively, for both the real and simulated experiments, together with the velocity limits v_k^M and ω_k^M. Note that the proposed visual servoing algorithm not only ensures convergence of the feature error very close to zero but also pushes the robot to the limit of its performance.

D. Tracking Experiment
This section presents a tracking experiment in a collaborative scenario. The robot is commanded to align the camera with a target object while a human partner moves the object. Note that, in this context, the word "tracking" refers to the task of following a moving object, while the reference features s^⋆ are kept constant for the whole experiment. Thus, the object motion is a disturbance that the controller has to compensate for. This experiment is aimed at showing that, with the proposed control algorithm, the robot is capable of following the velocity imposed by the human partner up to its velocity limits. Fig. 12 shows snapshots of the experiment; the first image shows the equilibrium configuration, and the robot has to track the object and maintain the same relative camera-object pose during the whole experiment. Fig. 13 shows the resulting root mean square error e_M and the computed gain λ, while Fig. 14 shows the norm of the
translational and rotational control input and the respective limits.
In the beginning, between t = 5 s and t = 9 s, the human partner slowly raises the object [see Fig. 12(2)], and the robot is able to follow it within its velocity limits; at t = 9 s, the human partner stops, and the robot again reaches the desired relative pose [see Fig. 12(3)]. At t = 10 s, the human partner rapidly moves the object down [see Fig. 12(4)]; this time, the robot is pushed to the edge of its translational velocity limits [see Fig. 14 (top)], and, correspondingly, the error exhibits a peak. At the same time, the gain λ [see Fig. 13 (bottom)], automatically computed by the control algorithm, increases to push the velocity up to the robot limits. Nevertheless, the robot is still able to counteract this disturbance, and, at t = 11 s, it again reaches the equilibrium relative pose [see Fig. 12(5)].
The same motion has been repeated in the orthogonal direction. The human partner first slowly moves the object toward the robot and then moves it back with a fast motion. The results are very similar to the previous ones; during the slow motion, between t = 13 s and t = 16 s, the robot is able to follow the object while keeping the camera velocity within its limits. Once again, at t = 16 s, when the human partner rapidly moves the object back, the robot is pushed to its translational velocity limits [see Fig. 14 (top)]. Finally, at the end of this phase (t = 18 s), the robot reaches the equilibrium pose again, as shown in Fig. 12(6).
The remaining part of the experiment is devoted to rotational motions. The human partner slowly rotates the object [see Fig. 12(7)] to reach the configuration in Fig. 12(8); then, with a fast rotational motion [see Fig. 12(9)], the human brings the object back to its original orientation [see Fig. 12(10)]. Three repetitions of this rotational perturbation are performed and shown in the graphs between t = 19 s and t = 50 s. The behavior is qualitatively similar to the translational case: the rotational velocity reaches lower values (about 0.1 rad/s) during the slow human motions, followed by higher peaks during the fast motions. Note how, during the last two fast rotational motions (t = 32 s and t = 44 s), the robot is pushed to its rotational velocity limits, as the ∥ω∥ signal in Fig. 14 (bottom) saturates at ω^M.
The whole experiment, followed by an additional phase in which the human operator moves the object randomly in all the rotational and translational directions, is available in the accompanying video.

E. Pick&Place Experiments
This section shows pick&place experimental tasks in a collaborative scenario.
The object to pick is placed on the picking desk [see Fig. 15 (top left)], and the robot is commanded to pick it and place it on a selected shelf [see Fig. 15 (top right)]. Two experiments are executed; in the first one, the robot performs the pick&place task autonomously. In the second one, the object position is perturbed by a human operator during the visual servoing subtask [see Fig. 15 (bottom)]. Fig. 16 shows the results of the two experiments. The top plot reports the root mean square error e_M, while the bottom one reports the control gain λ. The blue lines represent the results obtained when the object pose is not perturbed by the human partner. The red lines represent the perturbed case. At about 0.5 s, the human partner changes the position of the object to be grasped to avoid a collision between the robot and another object present on the picking desk. The object position perturbation lasts 0.5 s and ends at t = 1 s. During this time interval, the robot follows the moving object, and the root mean square error is almost constant. When the perturbation ends, the error decreases and converges toward zero. When the root mean square error e_M reaches the prescribed value of 2 mm, the visual servoing subtask is considered completed, and the robot grasps the object, moves it to the target location, and places it on the destination shelf to complete the pick&place task. The placing trajectory, after the visual servoing phase, has been planned offline by using the motion planners available in MOVEit [36].

F. Pick&Place Experiments With Human-Robot Handover
The human partner hands the object, picked from the desk, over to the robot. The human operator holds the object in the camera's field of view, and the robot reaches the grasping location by means of the visual controller. Since the object is never perfectly still, its pose is subject to continuous perturbations due to the unavoidable shaking of the human hand. A plastic bottle and a carton box are picked and placed in sequence.
The human operator shows the carton box to the robot, which grasps it by using the visual controller [see Fig. 17 (top left)]. Fig. 18 shows the error and the norms of the commanded translational and rotational velocities. Unsurprisingly, the plots are similar to those of the previous experiments, and once again, the robot is pushed close to its velocity limits. Note that, in less than 1 s, the visual servoing reaches the value e_M = 2 mm, and the remaining time in the plot corresponds to the gripper closing and grasping phase. During the gripper closing, the visual servoing controller is still active to track the object and reject any perturbation introduced by the human operator (see the accompanying video). This time, the robot has to place the object between the two shelves on the right (see Fig. 17). Before the placing phase, the robot moves in front of the prescribed shelf in a configuration that is not feasible for the placing operation [see Fig. 17 (top right)]. By using a gripper-pivoting maneuver, the robot is able to change on the fly the relative orientation between the gripper and the object to reach a feasible configuration [see Fig. 17 (bottom left); see also the accompanying video]. Finally, the robot places the object on the shelf [see Fig. 17 (bottom right)].
The same experiment is repeated with the plastic bottle. Fig. 19 shows the results of the visual servoing subtask. This time, the human operator intentionally applies larger
perturbations to the object pose. At about t = 0.5 s, a first perturbation is applied, but the robot is already at the edge of the velocity constraint, and the controller cannot counteract the perturbation; thus, the error increases. When the perturbation ends, the error decreases again, as visible in Fig. 19 (top). Between t = 3 s and t = 4 s, another perturbation is applied when the error is very close to the equilibrium point; in this case, the visual controller counteracts it by increasing the velocity command, as visible in the two peaks in the bottom plot that correspond to the perturbation peaks in the top plot.

VI. CONCLUSION
In this article, we presented a novel IBVS scheme with 3-D features. A sampled-data model is introduced to describe the exact closed-loop dynamics between sampling instants while retaining the rigid motion constraint. Useful insights into the well-known undesired equilibria of an IBVS scheme are provided to characterize pathological situations in the application of an IBVS control strategy based on 3-D feature points.
The control design aims at pushing the robotic system's performance in terms of accuracy and execution speed. The algorithm is based on an exact sampled-data model of the visual servoing system and a discrete-time Lyapunov analysis. The input to the robot controller has been synthesized by adapting the control gain through the solution of a constrained optimization problem. The constraints account for the camera Cartesian velocity limits, which depend on the robot configuration and joint velocity limits, and for the monotonic decrease, in continuous time, of the feature error. Even though the algorithm assumes that the robot is an ideal Cartesian motion generator, the experimental results in the real environment have shown that such an assumption is entirely reasonable.
Three pick&place experiments, involving a human partner and requiring handover maneuvers and dexterous in-hand handling, have been presented. The accompanying video shows that the robot exhibits high accuracy in grasping and placing the objects and behaves, in terms of execution speed, in a way comparable to the human operator, satisfying the requirement for the optimal sharing of a collaborative logistic task.

Fig. 1. Block scheme of the controlled sampled-data IBVS system. The block VC represents the visual controller.

Fig. 2. Test to show that the ideal Cartesian motion device assumption is reasonable. Left: norm of the translational velocity. Right: norm of the rotational velocity.

Fig. 3. Plane (∥e_N∥, ∥e_I∥); the undesired equilibria are reported on the horizontal axis. The points in the red area do not satisfy the rigid motion constraint.

Fig. 7. Plot of f_k(λ_k T) and its upper bound in (57), normalized with respect to ∥e_I∥².

Fig. 10. Comparison of e_M between simulation and real experiment. Black line: real experiment. Blue line: simulation. The inner plot is a zoomed-in view showing the difference in the neighborhood of e_M = 0.

Fig. 14. Tracking experiment. Translational (top) and rotational (bottom) control velocity norms and the respective maximum values v^M and ω^M.

Fig. 15. Pick&place experiments. Top left: initial configuration. Top right: final placing configuration. Bottom: snapshots of a disturbance induced by a human operator who moves the object to be picked.

Fig. 17. Handover experiment performed with Object B. Top left: handover phase. Top right: preplacing configuration. Bottom left: after the change of the relative gripper-object orientation by means of the slipping control algorithm (the so-called pivoting maneuver). Bottom right: actual placing phase.

Fig. 18. Handover subtask performed with the carton box. Top: root mean square error e_M. Bottom: norm of the commanded camera translational velocity (left axis, blue curve) and norm of the commanded camera rotational velocity (right axis, red curve). The dashed lines represent the velocity limits.

Fig. 19. Handover subtask performed with the plastic bottle. Top: root mean square error e_M. Bottom: norm of the commanded camera translational velocity (left axis, blue curve) and norm of the commanded camera rotational velocity (right axis, red curve). The dashed lines represent the velocity limits.