Dynamic Visual Tracking for Robot Manipulator Using Adaptive Fading Kalman Filter

This paper addresses visual tracking of a moving target whose image features are temporarily occluded. A dynamic visual tracking control system for a robot manipulator is developed using an adaptive fading Kalman filter (AFKF). The estimate of the residual covariance is used to compute the forgetting factor, which automatically adjusts the weight of the image observation data and improves the accuracy of visual state estimation. When the target features are occluded, a prediction of the missing observation sequence is generated from the predicted compensation noise and the preorder observation sequence, and it is used to determine the forgetting factor for estimating the missing visual states. Then, a parameter adaptive law with projection error compensation is designed to realize visual tracking with uncertain camera parameters. Finally, trajectory tracking experiments on a real robot platform are carried out to verify the performance of the proposed state estimator and tracking controller. The results show that the proposed method accurately realizes visual tracking with an occluded trajectory and inaccurate camera parameters, which improves the flexibility of dynamic visual tracking for robot manipulators.


I. INTRODUCTION
Target tracking is widely used in mobile robot navigation and in object grasping and capture by robot manipulators. Visual tracking of a moving target uses a camera to acquire the target image features, establishes the mapping between the visual space and the task space, and derives the control command of the robot, thereby realizing dynamic visual servoing control of the robot manipulator [1]. Vision sensors can improve robot tracking performance in complex environments.
Generally, moving targets can be divided into three types: (1) Identified objects, such as ground lines and landmarks, which are dynamic relative to the robot. Such targets are often used in navigation applications of mobile robots [2], [3]. (2) Targets with a predictable trajectory, which are mostly used for object capture [4], [5]. (3) Dynamic targets, whose trajectory is unpredictable and whose motion state can only be obtained in real time through an observer to realize the robot tracking control. This type often arises in grasping tumbling satellites [6]-[9] and in face tracking [10], [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Saeid Nahavandi.
This paper focuses on dynamic target tracking of a robot manipulator using the visual servoing method. In [12], the visual tracking control system for dynamic targets is divided into a visual state estimator and a visual tracking controller. The visual state estimator directly estimates the optimal system state and the object motion state in the image plane through a real-time observer. The visual tracking controller generates the robot's control command directly in the image plane. This design is also adopted in [13], [14], where an observer is designed to estimate the position and velocity of the image feature points, and a tracking controller is designed to track the object trajectory using a robot manipulator with uncertain model parameters.
In fact, the joint design of state estimators and tracking controllers is not limited to visual tracking of dynamic targets; it originates from work on state estimators and controllers for uncertain control variables in more general mechanical systems. For example, in [15], for the tracking problem of a flexible joint manipulator, a composite observer is designed to estimate the fault and disturbance, and a sliding mode controller is designed for fault-tolerant control. In [16], for the vibration of space manipulators in on-orbit service, a neural network controller and a disturbance observer are designed to suppress the influence of friction and dynamic coupling on joint control performance.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In the study of visual state estimation, the basic problem is the effect of sensor measurement noise on the extraction of visual features. In [17], for visual tracking of micro-agents in minimally invasive surgery, Kalman state estimators are used to estimate the inter-sample states of the micro-agents, because medical imaging modalities have a low acquisition rate. In [18], the extended Kalman filter (EKF) is used to estimate the object velocity state under visual measurement noise, and it is applied to grasping a tumbling satellite. Reference [19] also addresses satellite tracking, where the Kalman filter estimates not only the motion state of the grapple fixture but also the inertial parameters of the satellite. The above studies adopt the Kalman filter as the state estimator; the extended state observer (ESO) is also commonly used [20].
A hot topic in visual tracking is how to track the target features under occlusions, feature losses, and a limited camera field of view (FOV). In the case of sensor failure (unreliable data) or loss of feature measurements, it is common to use additional sensors as compensation, such as laser rangefinder scanning in mobile robot research [21]. This requires a state observer to estimate the unmeasurable visual features. For feature occlusion, [22] formulates the issue as a visual intermittent measurement problem. It discusses the convergence conditions on the dwell time, namely the minimum time the object is visible and the maximum time it is outside the camera FOV. The results ensure that the image state estimator converges to a reasonable bound under visual intermittent measurement. On the basis of [22], [23] designs an observer and a predictor for image features outside the camera FOV. When the image features are unavailable, the target motion state is estimated by switching the system between the observer and the predictor. Similarly, the observers and predictors designed in [24] can estimate the position and orientation of 3D moving targets.
On the other hand, the design of the tracking controller needs to consider parameter uncertainties, which include the robot dynamics model, the grasp model, and the camera model. For the dynamics and grasp model parameters, adaptive neural network methods are often used to compensate for parameter uncertainty. In [25], multi-layer neural networks are used to approximate the unknown nonlinear dynamics of autonomous underwater vehicles (AUVs), and adaptive robust control is adopted to compensate for environmental disturbances. Reference [26] uses adaptive neural networks to approximate the unknown nonlinear dynamics of a robot manipulator with dead-zone input. Reference [27] develops an adaptive neural network-based visual tracking method to handle an uncertain object grasp position in dual-arm manipulation. The visual tracking task usually requires the desired image features to be determined in advance; [28] discusses a visual tracking controller without desired image features, in which the reference frame is defined by visual targets and a planar motion constraint, and a pose estimation algorithm is designed for a mobile robot.
In addition, inaccurate calibration of the camera model parameters reduces the performance of visual tracking. In [29], an adaptive visual servo tracking controller is designed to compensate for the unknown external parameters and the visual feature depth when these parameters are completely unknown. The camera model parameters and the depth of the image feature are contained in the image Jacobian matrix. When a few parameters are uncertain, they are linearly extracted from the image Jacobian, and an adaptive law is used to estimate them to maintain the performance of the visual tracking system. Reference [30] focuses on robot visual tracking on an unknown constraint surface, where the depth of the image feature point is unknown and time-varying. The depth-independent interaction matrix framework and depth parameter adaptive laws are developed to compensate for the unknown depth. In [31], the visual tracking task is treated as a nonlinear optimization problem. The depth-independent interaction matrix is used for the linear extraction of the unknown camera model parameters, and model predictive control (MPC) is used to estimate the parameters. Furthermore, on the basis of [31], [32] introduces visual constraints into the predictive control method to keep the visual features inside the camera FOV.
In this paper, we focus on visual state estimation and tracking control for a dynamic target. The adaptive fading Kalman filter (AFKF) algorithm is used for state estimation under temporary occlusion of the image features. The forgetting factor is computed to adjust the weight of the image observation data and improve the accuracy of visual state estimation. When the target feature is lost, the preorder observation sequence is used to determine the forgetting factor for estimating the missing visual states. Compared with [22] and [23], the proposed method handles tracking under temporary and complete occlusion of the image features by solving for one parameter (the forgetting factor), without the additional design of a trajectory predictor, which makes it convenient to apply on a real robot platform. For the design of the visual tracking controller, based on the depth-independent interaction matrix in [30] and [31], we design a parameter adaptive law that includes projection error compensation to further improve the estimation accuracy of the camera model parameters.
This paper is organized as follows. Section II discusses the design of the visual state estimator, with a new computation method for the forgetting factor proposed in Section II-B. The design of the visual tracking controller is presented in Section III, which provides a new adaptive update law for uncalibrated camera parameters in Section III-B. In Sections IV and V, simulation and experimental results are presented to demonstrate the effectiveness of the proposed control method. Finally, the conclusion and future work are provided in Section VI.

II. VISUAL STATE ESTIMATION OF DYNAMIC TARGET
Since the trajectory of a moving target is unknown, an observer is needed to estimate its motion state. The image position of the target feature point can be obtained directly from the camera, and its velocity in the image plane can be calculated by differentiation. However, differentiating the image position amplifies noise, so a Kalman filter is used to reduce the noise and improve the accuracy of state estimation.
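The noise amplification caused by differentiation can be illustrated with a minimal sketch. The sampling interval, noise level, and target speed below are assumed values chosen for illustration only; they are not taken from the paper.

```python
import random
import statistics

def finite_difference(samples, dt):
    """Backward-difference velocity estimate from successive position samples."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

random.seed(0)
dt = 0.001          # assumed sampling interval in seconds
sigma = 0.5         # assumed standard deviation of the pixel noise
true_speed = 100.0  # assumed constant image velocity in pixel/s

true_pos = [true_speed * k * dt for k in range(2000)]
noisy_pos = [p + random.gauss(0.0, sigma) for p in true_pos]

velocity = finite_difference(noisy_pos, dt)
velocity_noise_std = statistics.pstdev(velocity)

# Independent position noise of std sigma gives a differenced noise std of
# about sqrt(2) * sigma / dt, which is far larger than the true speed here.
print(f"velocity noise std: {velocity_noise_std:.1f} pixel/s")
```

With these assumed values, half a pixel of position noise turns into several hundred pixel/s of velocity noise, which is why the raw differenced velocity is filtered rather than used directly.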

A. AFKF ALGORITHM
The normal Kalman filter is a linear Gaussian filtering algorithm based on the state space, and it relies on accurate model parameters. In practical applications, however, the model parameters often deviate from the real system to a certain extent, so the normal Kalman filter cannot guarantee filter convergence. The Kalman filter is a recursive algorithm in which the state estimate at a given instant is influenced by all observation data [33]. The adaptive fading Kalman filter (AFKF) uses a forgetting factor to adjust the weights of the observation data in real time, which increases the use of new observation data and reduces the influence of old observation data on the state estimate.
Let the Cartesian trajectory of the target point $o$ be $(x_o(t), \dot{x}_o(t)) \in \mathbb{R}^3$, and the corresponding image trajectory be $(y_o(t), \dot{y}_o(t)) \in \mathbb{R}^2$. AFKF is used to estimate the unknown motion state of the moving target in the image plane, which includes the position and velocity of the feature point. In the state-space model of the feature point, at instant $k$, $y_k$ is the measurement vector in the visual space, $A$ and $C$ are the state transition matrix and the observation matrix, respectively, and $W_k$ and $V_k$ are the state and measurement noise sequences. The adaptive fading Kalman filter (AFKF) algorithm follows [33]. Compared with the normal Kalman filter, the forgetting factor $\alpha_k$ is introduced into the propagation of $\tilde{P}_k$ in (5), which is the covariance between the predicted and real values, and into the update of the gain $K_k$ in (6), so as to update the predicted state vector $O_k$, the estimated state vector $\hat{O}_k$, and the error covariance $P_k$ between the estimated and real values. Since the predicted error covariance matrix is enlarged by a factor of $\alpha_k$, the new measurement data are weighted more heavily. The forgetting factor $\alpha_k$ is defined by (9), where $\beta_k$ is a scalar factor and $P_k$ is the time-propagated error covariance of the normal Kalman filter.
Computing $\alpha_k$ requires $\beta_k$, which is obtained from $S_k$ and $\tilde{S}_k$, the covariance of the residual and its estimate, respectively. Here $r_i$ is the residual, with $r_i = y_i - C O_i$.
In the AFKF algorithm, the estimated residual covariance $\tilde{S}_k$ is used to calculate the forgetting factor $\alpha_k$, and $\alpha_k$ is introduced into the predicted error covariance matrix $\tilde{P}_k$, which increases the weight of new observation data in the filtering process and improves the accuracy of state estimation. When the gain $K_k$ is ideal, $r_k$ is a white noise vector and its auto-correlation is zero. When the model parameters are inaccurate, the auto-correlation of $r_k$ is not zero, and the residual covariance at instant $k$ must be estimated. The covariance of the residual represents the effect of the current error on the system. When the dynamic model is unknown or partially unknown, the covariances of the residual and of the predicted error increase because of the unknown part. It can be seen from (14) that the increase of the residual covariance due to $\beta_k$ can be regarded as the increase of the predicted error covariance due to $\alpha_k$. Therefore, incomplete information in the dynamic system can be compensated by increasing $P_k$. Since $\beta_k \geq 1$, it follows that $\alpha_k \geq 1$. When the increase of $P_k$ is small, the value $(1 - \beta_k)$ is small and $(1 - \beta_k) R_k$ can be ignored; when the increase of $P_k$ is large compared with $R_k$, the term $(1 - \beta_k) R_k$ can also be ignored because $R_k$ is relatively small. Thus $\alpha_k$ is approximately equal to $\beta_k$, and $\tilde{P}_k = \beta_k P_k$. Therefore, this algorithm is called AFKF based on adjusting the covariance of the predicted error.
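A minimal sketch of one filter step may help make the role of the forgetting factor concrete. It follows the description above (inflate the predicted error covariance by $\alpha_k \approx \beta_k$), but several details are assumptions, not the paper's equations: the scalar factor is taken as the trace ratio of the windowed sample residual covariance to the model-predicted residual covariance, clipped at 1, and the function name `afkf_step`, the window length, and the demo matrices are hypothetical.

```python
import numpy as np

def afkf_step(x_est, P, y, A, C, Q, R, residuals, window=10):
    """One adaptive fading Kalman filter step (illustrative sketch).

    residuals is a list of past residual vectors and is updated in place.
    Returns the new state estimate, its error covariance, and the fading factor.
    """
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Q                 # propagation of the normal Kalman filter
    r = y - C @ x_pred                       # residual of the current observation
    residuals.append(r)
    recent = residuals[-window:]
    S_model = C @ P_pred @ C.T + R           # residual covariance implied by the model
    S_sample = sum(np.outer(ri, ri) for ri in recent) / len(recent)
    beta = max(1.0, np.trace(S_sample) / np.trace(S_model))  # assumed trace-ratio rule
    alpha = beta                             # approximation alpha ~ beta from the text
    P_fade = alpha * P_pred                  # enlarged predicted error covariance
    K = P_fade @ C.T @ np.linalg.inv(C @ P_fade @ C.T + R)
    x_new = x_pred + K @ r
    P_new = (np.eye(len(x_est)) - K @ C) @ P_fade
    return x_new, P_new, alpha

# Hypothetical one-axis constant-velocity model with a 1 ms sampling interval.
A = np.array([[1.0, 0.001], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1.0]])
x, P, history = np.zeros(2), np.eye(2), []
x, P, alpha = afkf_step(x, P, np.array([30.0]), A, C, Q, R, history)
```

In this demo the observation lies far from the prediction, so the factor rises well above 1 and the estimate moves almost all the way to the new measurement; when the residual is consistent with the model, the factor stays at 1 and the step reduces to the normal Kalman filter.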
From the updated state vector $\hat{O}_k$, the position and velocity trajectories of the target point in the image space are obtained. They are input to the visual controller in Section III as the desired trajectories of the target point.

B. COMPUTATION OF THE FORGETTING FACTOR
In Section II-A, we reviewed the AFKF algorithm, whose core is the computation of the forgetting factor. How the factor is computed depends on the specific task. In this study, the above AFKF algorithm is adopted, and a dedicated computation method is proposed for the visual tracking task under target occlusion. Specifically, we explain how to compute the forgetting factor when image observation data are unavailable.
When the image features of the target point are obscured, the AFKF algorithm fails because image observation data are missing. The usual strategy is linear prediction, which approximates the target motion over the invisible interval as linear, i.e., assumes a constant velocity. However, when the velocity changes greatly, this assumption does not allow effective target tracking. We instead use the preorder observation sequence to estimate the target state in the invisible stage, which improves the tracking performance.
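The limitation of the constant-velocity assumption can be seen in a small sketch. The elliptical trajectory, its axes, and its period are hypothetical values chosen for illustration; only the 1 ms sampling interval and the 0.3 s to 0.5 s occlusion window mirror the simulation settings reported later in the paper.

```python
import math

def ellipse(t, a=100.0, b=50.0, period=1.5):
    """Position and velocity on a hypothetical anticlockwise elliptical image trajectory."""
    w = 2.0 * math.pi / period
    pos = (a * math.cos(w * t), b * math.sin(w * t))
    vel = (-a * w * math.sin(w * t), b * w * math.cos(w * t))
    return pos, vel

def linear_prediction(last_pos, last_vel, elapsed):
    """Constant-velocity extrapolation from the last visible sample."""
    return tuple(p + v * elapsed for p, v in zip(last_pos, last_vel))

dt = 0.001
t_hide, t_show = 0.3, 0.5
last_pos, last_vel = ellipse(t_hide)

errors = []
steps = round((t_show - t_hide) / dt)
for k in range(1, steps + 1):
    elapsed = k * dt
    predicted = linear_prediction(last_pos, last_vel, elapsed)
    true_pos, _ = ellipse(t_hide + elapsed)
    errors.append(math.dist(predicted, true_pos))

print(f"error at end of occlusion: {errors[-1]:.1f} pixels")
```

Because the true velocity keeps turning along the ellipse, the extrapolation error grows roughly with the square of the occlusion time, which is the behavior the proposed method aims to reduce.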
Assume the image feature is obscured at instant $k$, so that the state vector cannot be observed. The observation equation under image feature occlusion is then given by (16), where $y_k$ denotes the prediction of the missing observation vector and $O_k$ denotes the prediction of the state vector. According to (16), an observation noise sequence with covariance $R_k$ lies between the observation and the real value at instant $k$. $Y_k$ is not the real observation noise sequence but a compensation noise, which approximates the real observation vector by introducing the covariance $R_k$. Since there is an estimation error with covariance $P_k$ between the estimated state vector $\hat{O}_k$ and the real state vector $O_k$, the covariance of the observation noise sequence, $U_k$, can be written in terms of $P_k$. Because the current observation $y_k$ is unknown, its covariance $P_k$ is also unknown, so $U_k$ cannot be determined directly. We therefore use the preorder observation $y_{k-1}$ in place of $y_k$ and its error covariance $P_{k-1}$ in place of $P_k$, which gives the approximate error covariance matrix of the observation noise sequence at instant $k$. By (16), the new residual can be calculated. Considering (18), the AFKF update process under occlusion follows, and substituting (22) into (19) gives the new residual. According to the new residual $r_k$ and its covariance $S_k$, the forgetting factor $\alpha_k$ can be computed by (9) and (10). Therefore, under image feature occlusion, the state vector $O_k$ keeps updating through the computation of the forgetting factor in AFKF.
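The substitution idea can be sketched as follows. This is only one plausible reading of the procedure and not the paper's equations: it assumes the compensated noise covariance takes the form $C P_{k-1} C^{T} + R_k$, and the function name `occluded_update` and all numerical values are hypothetical.

```python
import numpy as np

def occluded_update(x_pred, P_pred, y_prev, P_prev, C, R):
    """Measurement update when the current observation is missing (sketch).

    The previous observation y_prev stands in for the missing one, and its
    noise covariance is inflated by the previous estimation error covariance.
    """
    U = C @ P_prev @ C.T + R             # assumed form of the compensated noise covariance
    r = y_prev - C @ x_pred              # new residual built from the substitute observation
    S = C @ P_pred @ C.T + U             # covariance of the new residual
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ r
    P_new = (np.eye(len(x_pred)) - K @ C) @ P_pred
    return x_new, P_new, r, S

# Hypothetical one-axis example: state is [position, velocity], position is observed.
C = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
x_new, P_new, r, S = occluded_update(
    x_pred=np.array([10.0, 2.0]), P_pred=np.eye(2),
    y_prev=np.array([8.0]), P_prev=0.5 * np.eye(2), C=C, R=R)
```

The returned residual and its covariance are the quantities from which a forgetting factor could then be computed, so the filter keeps running through the occlusion while trusting the substitute observation less than a real one.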

III. UNCALIBRATED VISUAL TRACKING CONTROLLER DESIGN
Inaccurate calibration of the camera model parameters reduces the performance of visual tracking. Therefore, the uncertainty of the camera model parameters should be considered in the design of the tracking controller. In this section, the image-based visual servoing (IBVS) method [29] is adopted, in which the image Jacobian matrix, containing the camera internal and external parameters, is linearly parameterized, and a parameter adaptive law is designed to handle the parameter uncertainty of the camera model.

A. MODEL OF DEPTH-INDEPENDENT INTERACTION MATRIX
The image Jacobian matrix contains the depth parameters of the target feature points as well as the camera internal and external parameters. Since the depth appears nonlinearly, the camera model parameters cannot be linearly parameterized directly. The depth-independent interaction matrix model is introduced to separate the depth, so that the inaccurate camera parameters can be linearly parameterized.
First, the camera perspective projection model (24) is given, which represents the mapping between the feature point coordinates in the image plane and in Cartesian space.
In (24), $y(t)$ and $x(t)$ are the image coordinates and the Cartesian coordinates relative to the robot base coordinate system, and $z_c(t)$ is the depth of the feature point in the camera coordinate system. $F$ consists of the first two rows of the projection matrix $M$, and $m_3$ is its third row. Differentiating (24) gives (25). Considering the kinematic constraint of the robot arm, (25) can be rewritten as (26). In (26), $N(y(t)) \in \mathbb{R}^{2 \times 4}$ is defined as the depth-independent interaction matrix, and it satisfies the linearization condition, i.e., for any vector $\zeta \in \mathbb{R}^{4 \times 1}$, $N(y(t))\zeta$ can be linearly parameterized, where $B(y(t), \zeta)$ is the regression matrix, which is independent of the camera parameters, and $\theta$ is the parameter vector composed of the unknown parameters in the projection matrix $M$.
Since the depth $z_c(t)$ enters nonlinearly, $z_c(t)$ must be separated from (26) so that $\theta$ can be extracted linearly. The depth-independent Jacobian matrix $D(t) \in \mathbb{R}^{2 \times 3}$ is defined, and (26) can then be rewritten accordingly. The unknown parameter vector $\theta$ is contained in $D(t)$, and an adaptive law must be designed to determine its values so as to obtain the joint control commands of the robot arm.
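The separation of the depth can be checked numerically with a short sketch. It assumes the common homogeneous form of the perspective model, $y = F x_h / z_c$ with $z_c = m_3^{T} x_h$ and $x_h = [x^{T}, 1]^{T}$, for which differentiation gives $\dot{y} = (F - y\, m_3^{T})\, \dot{x}_h / z_c$. The projection matrix and the point below are hypothetical, and the robot kinematics that map joint velocity to $\dot{x}$ are left out.

```python
import numpy as np

def project(M, x):
    """Perspective projection of a 3-D point x with a 3x4 projection matrix M."""
    xh = np.append(x, 1.0)
    depth = M[2] @ xh
    return (M[:2] @ xh) / depth, depth

def depth_independent_matrix(M, y):
    """2x4 matrix F - y m_3^T, which contains no depth term."""
    return M[:2] - np.outer(y, M[2])

def image_velocity(M, x, x_dot):
    """Image velocity of the feature point for a Cartesian velocity x_dot."""
    y, depth = project(M, x)
    x_dot_h = np.append(x_dot, 0.0)
    return depth_independent_matrix(M, y) @ x_dot_h / depth

# Hypothetical projection matrix (focal length 800 pixels, camera 2 m from the origin).
M = np.array([[800.0, 0.0, 400.0, 800.0],
              [0.0, 800.0, 300.0, 600.0],
              [0.0, 0.0, 1.0, 2.0]])
x = np.array([0.1, -0.2, 0.5])
x_dot = np.array([0.3, 0.1, -0.2])

y, depth = project(M, x)
y_dot = image_velocity(M, x, x_dot)
```

The depth appears only as the scalar divisor, while the matrix itself is linear in the entries of the projection matrix, which is the property that makes a linear parameterization in the unknown camera parameters possible.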

B. DESIGN OF ADAPTIVE LAW FOR CAMERA PARAMETERS
When the camera model parameters are uncertain, the elements of the projection matrix $M$ are uncertain. In this section, a new adaptive update law for the uncertain camera parameters is proposed, which introduces compensation based on the projection error estimate. Let the parameter vector $\theta$ consist of the elements of the projection matrix $M$. The tracking error of the feature points in the image space is defined with respect to the desired image trajectories $y_d(t)$ and $\dot{y}_d(t)$, which are obtained by (15). We define the reference velocity in the image plane, $\dot{y}_r(t)$, in (32); it is obtained from the position tracking error $\Delta y(t)$ and the desired velocity trajectory $\dot{y}_d(t)$, and $\kappa$ is a positive constant. For (31), we introduce the estimate of the velocity tracking error, $\Delta\hat{\dot{y}}(t)$, in the image plane. The adaptive sliding mode vector $s_y(t)$ in the image plane is then defined, where $\hat{\dot{y}}(t)$ denotes the estimate of the image velocity trajectory, which is calculated from the estimate of the projection matrix $\hat{M}(t)$. By (29), the pseudo-inverse $\hat{D}^{+}(t)$ of the estimated depth-independent Jacobian matrix is used to map the reference velocity $\dot{y}_r(t)$ to the joint space of the robot arm, which gives the reference in the joint space. Similarly, the adaptive sliding mode vector $s_q(t)$ in the joint space is defined. Both $\hat{D}(t)$ and $\hat{z}(t)$ are obtained from $\hat{\theta}(t)$, which is linearly extracted from $\hat{M}(t)$. To ensure the existence of $\hat{D}^{+}(t)$, $\hat{M}(t) \in \mathbb{R}^{3 \times 4}$ must be full rank, i.e., the rank of $\hat{M}(t)$ is 3. A potential field force, defined as a function of $\hat{\theta}(t)$, is therefore introduced to keep $\hat{M}(t)$ full rank. Its negative gradient with respect to $\hat{\theta}(t)$ is used to update $\hat{\theta}(t)$ and keep it away from the singularity region of $\hat{M}(t)$, which also guarantees the existence of $\hat{D}^{+}(t)$; see [31] for details. Next, we define the estimation error of the projection matrix, $\hat{M}(t) - M$. By (24) and (27), the error is obtained, where $\hat{m}_3(t) - m_3$ is the estimation error of the third row of $\hat{M}(t)$, $\hat{F}(t) - F$ is the estimation error of the first two rows, and $\Delta\theta(t) = \hat{\theta}(t) - \theta$ is the estimation error of the parameter vector.
In order to improve the accuracy of parameter estimation, new estimation errors of the depth $z(t)$ and the depth-independent Jacobian matrix $D(t)$ are introduced as compensation for the projection error estimate. The error is defined according to (29). Using the regression matrices associated with $(x(t), y(t))$ and $(\dot{y}(t), x(t), \dot{x}(t))$, the adaptive update law of $\hat{\theta}(t)$ is designed in (39), where $P_1$ and $P_2$ are positive definite weighting matrices. According to the dynamic equation of the robot arm, the visual tracking control law is designed in (40), where $K_1$, $K_2$, and $K_3$ denote symmetric positive definite matrices. In (40), the sliding mode vectors $s_q(t)$ and $s_y(t)$ are regarded as the control errors, and the gradient of the potential field function is used to ensure the existence of $\hat{D}^{+}(t)$.
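The role of the reference velocity and the constant $\kappa$ can be illustrated with a small sketch. It assumes the usual form $\dot{y}_r = \dot{y}_d - \kappa\,\Delta y$ with $\Delta y = y - y_d$, which may differ in sign convention or detail from the paper's definition, and it assumes the image velocity follows the reference exactly, so the adaptive law, the joint-space mapping, and the robot dynamics are not modeled. All numerical values are hypothetical.

```python
import numpy as np

def reference_velocity(y, y_d, y_d_dot, kappa):
    """Image-plane reference velocity: desired velocity minus a position-error correction."""
    return y_d_dot - kappa * (y - y_d)

kappa, dt = 5.0, 0.001           # assumed gain and integration step
y = np.array([120.0, 60.0])      # hypothetical initial feature position (pixels)
y_d = np.array([100.0, 80.0])    # hypothetical desired position (pixels)
y_d_dot = np.array([50.0, 0.0])  # hypothetical constant desired velocity (pixel/s)

initial_error = np.linalg.norm(y - y_d)
for _ in range(1000):            # 1 s of motion with perfect velocity tracking
    y = y + dt * reference_velocity(y, y_d, y_d_dot, kappa)
    y_d = y_d + dt * y_d_dot
final_error = np.linalg.norm(y - y_d)

print(f"error ratio after 1 s: {final_error / initial_error:.4f}")
```

Under these assumptions the position error shrinks by the factor $(1 - \kappa\,dt)$ at every step, so a larger $\kappa$ gives faster convergence once the sliding mode vector is driven to zero.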

IV. SIMULATION AND RESULTS

A. THE BEHAVIOR OF VISUAL STATE ESTIMATOR
This section verifies the performance of the AFKF algorithm in visual state estimation in two parts: the target trajectory without and with occlusion. In the simulation, the origin of the robot end-effector coordinate system is set as the feature point $O$ of the dynamic target. The model parameters and initial values of the robot arm are shown in Table 1, where the true trajectory $O(t)$ of the target is given for comparison. The simulation time is 1.5 s and the sampling interval is 1 ms. To show the advantage of the proposed method, results obtained with the normal Kalman filter (KF) are included for comparison, and both methods use the same noise parameters. For the desired trajectory (an anticlockwise ellipse) in Table 1, the results obtained by the two methods are shown in Fig. 1. The estimated position trajectories are shown in Fig. 1(a). Since the initial position of the end-effector point differs from that of the target feature point, both trajectories move rapidly toward the target position in the initial stage.
Over the whole tracking process, the KF trajectory clearly does not coincide with the true trajectory and shows large deviations in some stages, whereas the AFKF trajectory basically coincides with the true trajectory. Fig. 1(b) shows the variation of the estimation error. The AFKF error exceeds 10 pixels in the initial stage, and its average over the whole trajectory stays within 2 pixels. The estimated velocity trajectory is shown in Fig. 1(c). Because the differential operation introduces noise, the estimated trajectory oscillates. The AFKF trajectory moves toward the true trajectory at a practical velocity in the initial stage and then follows the desired velocity trajectory with oscillation. In contrast, the KF trajectory moves toward the desired trajectory at a very large velocity in the initial phase: the maximum reaches 35 pixel/s and the average exceeds 20 pixel/s from 0 s to 0.5 s, which is impractical for a real application. After 0.5 s, the KF trajectory follows the desired trajectory with a tracking error of more than 5 pixel/s, as shown in Fig. 1(d).
Although the error of the AFKF trajectory reaches 35 pixel/s in the initial stage, it recovers rapidly to less than 5 pixel/s within 0.2 s, and its average error is less than 4 pixel/s. This is consistent with a real tracking task of a robot arm.
The performance of AFKF is then verified under target feature occlusion, and the simulation results are shown in Fig. 2. For comparison, linear prediction combined with KF (linear + KF) is used to estimate the motion state under occlusion. The occlusion lasts from 0.3 s to 0.5 s, during which no observation can be obtained, and the two methods are used to estimate the invisible motion state. As shown in Fig. 2(a), both trajectories deviate from the true trajectory under occlusion, and the tracking error of linear + KF is much greater than that of AFKF. This is because the proposed method considers the preorder observation sequence and predicts the noise $Y_k$ to compensate the estimated state, so the deviation remains moderate. When the target features become visible at 0.5 s, the AFKF trajectory rapidly moves toward the true trajectory owing to the update of $\tilde{S}_k$. Since the forgetting factor is corrected immediately, the propagation of $P_k$ in (5) is rapidly adjusted and the estimate converges to the true trajectory. After the feature becomes visible, the linear + KF trajectory adjusts only slowly toward the true trajectory. In the occlusion stage, the position tracking error of AFKF has a maximum of 20 pixels and an average of 10 pixels, whereas that of linear + KF has a maximum of 45 pixels and an average of 20 pixels, as shown in Fig. 2(b). The velocity tracking error of the two methods is larger than the position error because of the larger noise in velocity estimation. As shown in Fig. 2(c), the overall trend of the estimated velocity trajectory is similar to that of the position trajectory. When the target feature becomes visible again, the estimated velocity trajectory moves slowly toward the true trajectory rather than immediately, because the large noise slows the convergence of the propagation of $P_k$. In the occlusion stage, the velocity tracking error of AFKF has a maximum of 25 pixel/s and an average of 15 pixel/s.
For KF, the maximum is 40 pixel/s and the average is 22 pixel/s. This indicates that the introduced noise $Y_k$ plays an obvious role in the occlusion stage.
In the tracking control simulation, the estimated trajectories from Section IV-A, obtained by the AFKF and linear + KF methods, are used as the desired trajectories. The visual tracking results without occlusion are shown in Fig. 3. Compared with the desired trajectory in Fig. 1, the AFKF trajectory follows the desired trajectory with a very small tracking error. Since the estimate $\hat{M}$ of the projection matrix is used, the imprecise parameters are adjusted by the parameter adaptive law (39) in the initial stage to meet the tracking requirements. Thus the AFKF trajectory converges gradually to the desired trajectory, as shown in Fig. 3(a). The KF trajectory can also follow its desired trajectory through the proposed parameter adaptive law. In the upper half of the true trajectory, the KF trajectory deviates significantly, which corresponds to the estimated desired trajectory in Fig. 1(a).
Compared with the results of trajectory estimation, the KF trajectory converges to the true trajectory with a smaller error, which indicates that the proposed adaptive method can achieve the desired trajectory tracking through parameter updates, although some error remains between the estimated desired trajectory and the true trajectory.
The tracking errors are collected in Table 2, where the maximum value is the tracking (estimation) error at the initial time, which is the moment of maximum error in the tracking process. The average error of the AFKF position trajectory is 0.43, which is about 81% lower than the KF error of 2.31, and the average error of the velocity trajectory is 71% lower. The results indicate that the advantage of AFKF is mainly reflected in position estimation; because of the oscillation in velocity estimation, the velocity accuracy is relatively lower.
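The position improvement quoted above can be checked directly from the two averages reported in the text. The helper name `relative_reduction` is hypothetical; the two inputs are the values stated in the paragraph.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

position_reduction = relative_reduction(2.31, 0.43)
print(f"{position_reduction:.1f}%")
```

The same helper applies to the velocity averages in Table 2, which are not reproduced in the text.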
The tracking results in the presence of occlusion are shown in Fig. 4. Again, the overall trend of the curves is similar to that of Fig. 2. In the initial stage, the parameter adaptive law adjusts the parameters of the inaccurate camera model, and the tracking trajectory moves toward the true trajectory. As shown in Fig. 4(a), the trajectory follows the desired trajectory of Fig. 2(a) during the occlusion stage, which indicates that the inaccurate camera parameters do not affect the actual tracking performance. When the features become visible, the AFKF trajectory converges to the true trajectory with small fluctuations, which indicates that some error remains in the actual tracking control. In the occlusion stage, the average tracking error of the AFKF trajectory is less than 10 pixels and that of KF is more than 15 pixels, as shown in Fig. 4(b).
The tracking and estimation errors with occlusion are collected in Table 3. The average AFKF position estimation error is 6.46 pixels, which is significantly larger than in Table 2 because of the lack of observations. Compared with the linear prediction and KF methods, AFKF reduces the estimation errors of position and velocity by 65.8% and 54.2%, respectively. The estimation error increases during the occlusion stage, and the maximum is the deviation between the estimated and true values at the moment the target becomes visible again; the AFKF trajectory estimate clearly has the smaller deviation from the true trajectory. Over the whole process, AFKF estimates the target trajectory better, and thus its tracking error is smaller.

V. EXPERIMENTS AND RESULTS
In the experiment, a monocular color camera with a resolution of 800 × 600 pixels is fixed above the robot's workspace, as shown in Fig. 5. The camera acquires images of the working area of the robot manipulator in real time, and the feature points attached to the end-effector are obtained through object recognition and feature extraction. A dual-arm 7-DOF robot platform is adopted, in which the left arm simulates the unknown motion of the target feature (target arm) and the right arm follows the motion of the target feature (tracking arm). Feature points are attached to the end-effectors of both the target arm and the tracking arm, as red and green squares, respectively.
The image coordinates of the square centers are obtained by a fast image processing algorithm. Each end-effector carries four feature points, which avoids singularity of the image Jacobian [27]. The coordinate transformation matrix of each feature point relative to the end-effector coordinate system is precisely calibrated, so that the Cartesian coordinates of the feature points can be obtained through the robot kinematics.
The experiments are divided into two groups. The first group verifies the performance of the proposed uncalibrated visual tracking controller: given the desired image trajectory, the end-effector features of the tracking arm are controlled to achieve visual tracking. The second group verifies the overall performance of the visual estimator and the tracking controller: the features on the target arm move along a given random image trajectory (the trajectory data are recorded for error calculation only), and the features on the end-effector of the tracking arm are controlled to follow the motion of the target features.

A. EXPERIMENT 1: GIVEN THE KNOWN VISUAL TRAJECTORIES
The system parameters and initial state are shown in Table 4. The initial estimate of the camera projection matrix, M̂(0), is given arbitrarily, and its true value M is unknown. The desired image trajectory of the feature point on the target arm is an ellipse. Only the image trajectory of one feature point is given, since the trajectories of the other three feature points are similar and differ only by their spatial offsets.
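A desired elliptical image trajectory of this kind can be generated as follows. This is only an illustrative sketch: the center, semi-axes, and period are hypothetical values chosen to fit an 800 × 600 image, not the parameters listed in Table 4.

```python
import numpy as np

def ellipse_trajectory(t, center=(400.0, 300.0), semi_axes=(120.0, 80.0), period=20.0):
    """Return desired image positions and velocities (pixels, pixels/s) on an ellipse.

    All default values are assumptions for illustration only.
    """
    t = np.asarray(t, dtype=float)
    w = 2.0 * np.pi / period
    u = center[0] + semi_axes[0] * np.cos(w * t)
    v = center[1] + semi_axes[1] * np.sin(w * t)
    du = -semi_axes[0] * w * np.sin(w * t)
    dv = semi_axes[1] * w * np.cos(w * t)
    return np.stack([u, v], axis=-1), np.stack([du, dv], axis=-1)

t = np.linspace(0.0, 20.0, 201)   # one full period sampled every 0.1 s
pos, vel = ellipse_trajectory(t)
```

Returning the velocity alongside the position is convenient because a trajectory tracking controller typically uses the desired feature velocity as a feedforward term.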
The snapshots of experiment 1 are shown in Fig. 6. The tracking arm moves along the labeled desired trajectory under the visual tracking controller. The path point A is the initial position, and the process is divided into two parts: adjustment stage A-C and tracking stage D-F. Throughout the process, the camera parameters are continuously updated to achieve accurate visual tracking.
The experimental results are shown in Fig. 7. Fig. 7(a) shows the image trajectory tracking results. Owing to motion state estimation error and camera measurement noise, the tracking error in the experiment is larger than that in simulation, but the maximum stays within 6 pixels and the average is below 3 pixels, as shown in Fig. 7(b). Fig. 7(c) shows the tracking trajectory in Cartesian space. There is an obvious change in the depth direction (z-axis), which indicates that the proposed tracking method achieves 3-D target tracking. However, the motion along the z-axis of the camera frame is limited to a small scale. This is because the proposed method adopts image-based visual servoing (IBVS) with a monocular camera and does not estimate depth, so it is not suitable for large-scale motion along the z-axis. Fig. 7(d) shows all joint trajectories. The range of joint angle variation is small, which reflects the stability of the tracking process.
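The average and maximum tracking errors quoted above can be computed from logged image trajectories as Euclidean pixel distances. The sketch below assumes the desired and actual trajectories are sampled at the same instants; the sample values are hypothetical.

```python
import numpy as np

def pixel_error_stats(desired, actual):
    """Return (mean, max) Euclidean image error in pixels between two (n x 2) trajectories."""
    diff = np.asarray(desired, dtype=float) - np.asarray(actual, dtype=float)
    err = np.linalg.norm(diff, axis=1)
    return float(err.mean()), float(err.max())

# Hypothetical samples: per-sample errors are 5, 0 and 1 pixels.
desired = [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]
actual = [[3.0, 4.0], [10.0, 0.0], [20.0, 1.0]]
mean_err, max_err = pixel_error_stats(desired, actual)
```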

B. EXPERIMENT 2: UNKNOWN AND OCCLUDED VISUAL TRAJECTORIES
In experiment 2, the feature trajectory of the target arm is shown in Table 5. Again, the image trajectory of only one feature point is given. The tracking arm moves solely to follow the estimated motion state of the target feature observed by the camera. The feature points are occluded at random: at a certain moment, a sheet of white paper is manually placed over the target feature points so that they are completely covered. After a suitable time interval, the paper is removed and the feature points become visible again. This interval is determined by the actual situation, i.e., when the relative pose between the two arms reaches a certain range or the features are close to leaving the camera field of view, the paper is removed immediately. The controller parameters are the same as in experiment 1, and the experimental snapshots are shown in Fig. 8.
To avoid collisions, a safe distance between the two end-effectors is required. Therefore, the visual tracking error is regulated not to zero but to a constant image distance between the corresponding feature points. The path point A is the initial state, C is the start of the stable tracking stage, F is the start of occlusion, and H is the point at which the target becomes visible again. The process is divided into four stages. In the A-C stage, the motion state of the target features is estimated by AFKF, and the visual controller drives the tracking arm to quickly approach the target arm. In the C-F stage, the uncertain camera parameters are updated to achieve accurate target tracking. In the F-H stage, the features are not visible, and AFKF estimates the motion state and guides the tracking arm. Although the tracking arm deviates significantly from the target trajectory, its general trend remains consistent with the target trajectory, which verifies the effectiveness of the AFKF trajectory prediction. In the H-I stage, the target is visible again, the motion state estimated by AFKF becomes accurate, and the tracking arm rapidly approaches the target trajectory once more.
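The behavior in the visible and occluded stages can be illustrated with a simplified fading Kalman filter. The sketch below rests on several assumptions that are not taken from this paper: a constant-velocity model for one image point, a forgetting factor computed from a recursively averaged residual covariance in one commonly used trace-ratio form, and plain prediction during occlusion instead of the predicted observation sequence used by the proposed method. All noise levels and the test trajectory are hypothetical.

```python
import numpy as np

class FadingKalmanFilter:
    """Constant-velocity fading KF for one image point; state = [u, v, du, dv]."""

    def __init__(self, dt, q=1.0, r=4.0, rho=0.95):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.hstack([np.eye(2), np.zeros((2, 2))])
        self.Q = q * np.eye(4)      # assumed process noise covariance
        self.R = r * np.eye(2)      # assumed measurement noise covariance
        self.x = np.zeros(4)
        self.P = 100.0 * np.eye(4)
        self.C = None               # running estimate of the residual covariance
        self.rho = rho              # weighting of older residuals (assumption)

    def step(self, z=None):
        """Advance one frame; pass z=None when the feature is occluded."""
        x_pred = self.F @ self.x
        P_prop = self.F @ self.P @ self.F.T
        if z is None:
            # Occlusion: no observation, so only the prediction is propagated.
            self.x, self.P = x_pred, P_prop + self.Q
            return self.x, 1.0
        resid = np.asarray(z, dtype=float) - self.H @ x_pred
        outer = np.outer(resid, resid)
        self.C = outer if self.C is None else (self.rho * self.C + outer) / (1.0 + self.rho)
        # Forgetting factor from the ratio of observed to predicted residual spread.
        N = self.C - self.H @ self.Q @ self.H.T - self.R
        M = self.H @ P_prop @ self.H.T
        lam = max(1.0, float(np.trace(N) / np.trace(M)))
        P_pred = lam * P_prop + self.Q
        S = self.H @ P_pred @ self.H.T + self.R
        K = P_pred @ self.H.T @ np.linalg.inv(S)
        self.x = x_pred + K @ resid
        self.P = (np.eye(4) - K @ self.H) @ P_pred
        return self.x, lam

def truth(k):
    """Hypothetical target: starts at (100, 50), moves (2, 1) pixels per frame."""
    return np.array([100.0 + 2.0 * k, 50.0 + 1.0 * k])

kf = FadingKalmanFilter(dt=1.0)
for k in range(1, 51):              # visible stage: noise-free observations
    kf.step(truth(k))
for k in range(51, 61):             # occluded stage: prediction only
    x_occ, lam_occ = kf.step(None)
occlusion_error = float(np.linalg.norm(x_occ[:2] - truth(60)))
```

A forgetting factor larger than one inflates the predicted covariance, so recent observations receive more weight when the residuals are larger than the model expects. During occlusion the estimate relies entirely on the motion model, which is why the error grows with occlusion time when the real target changes velocity.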
The experimental results are shown in Fig. 9(a) and (b), which present the image trajectory tracking and the tracking error, respectively. The experiment lasted close to 40 s. At the initial state A, the image distance between the two groups of feature points is more than 200 pixels. In the A-C stage (0-8 s), the tracking arm moves toward the target at 25 pixels per second. In the C-F stage (8-22 s), the tracking arm performs stable tracking with an error within 2 pixels. In the F-H stage (22-33 s), the average tracking error is <9 pixels while the target is not visible. In the H-I stage (33-38 s), the target is visible again, the tracking arm moves toward the target at more than 10 pixels per second, and the tracking error returns to within 2 pixels in 3 s.
We further analyze the stable tracking stage C-F and the occlusion stage F-H. The results are shown in Table 6. In the C-F stage, the average error is 2.76 pixels, which comes from several sources. The main source is the AFKF position estimation (1.86 pixels), and the rest, which is less than 1 pixel, comes from the measurement and camera parameter update errors. The maximum corresponds to the error at the moment the features become invisible. Generally, a tracking error below 3 pixels is acceptable for practical robot applications. In the F-H stage, where the features are invisible, the average error is 14.05 pixels, to which the position and velocity predictions of AFKF contribute 5.37 and 12.62 pixels, respectively. These errors grow with the occlusion time, with the maxima (at point H) reaching 16.58 and 30.45 pixels. Overall, during the 11 s occlusion the target feature moves more than 50 pixels and the final error is about 8 pixels, which indicates that the proposed method is best suited to short occlusions.
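A stage-wise analysis of this kind can be reproduced from a logged error signal by grouping samples by time interval. In the sketch below the stage boundaries follow the times quoted above, while the error log itself is synthetic and purely hypothetical.

```python
import numpy as np

# Stage boundaries in seconds, taken from the stage description in the text.
STAGES = {"A-C": (0.0, 8.0), "C-F": (8.0, 22.0), "F-H": (22.0, 33.0), "H-I": (33.0, 38.0)}

def stage_error_stats(t, err, stages=STAGES):
    """Return {stage: (mean, max)} of the error samples falling in each interval [t0, t1)."""
    t = np.asarray(t, dtype=float)
    err = np.asarray(err, dtype=float)
    stats = {}
    for name, (t0, t1) in stages.items():
        mask = (t >= t0) & (t < t1)
        stats[name] = (float(err[mask].mean()), float(err[mask].max()))
    return stats

# Synthetic error log (assumption): 10 pixels during occlusion, 2 pixels otherwise.
t = np.arange(0.0, 38.0, 1.0)
err = np.where((t >= 22.0) & (t < 33.0), 10.0, 2.0)
stats = stage_error_stats(t, err)
```

Reporting the mean and maximum per stage separates the steady tracking accuracy from the error accumulated while the features are hidden.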

VI. CONCLUSION
In this paper, an AFKF-based visual tracking method for moving targets in robot manipulation is developed to solve the problems of visual state estimation under occlusion and visual servoing with parameter uncertainty. A computation method for the forgetting factor is proposed to improve the prediction accuracy of the occluded trajectory. Projection error compensation is introduced into the adaptive law of the uncertain camera parameters to improve the tracking accuracy. The simulation results indicate the effectiveness of the AFKF state estimator and the uncalibrated visual tracking controller.
Two experiments are carried out on a real robot platform. The results show that ellipse trajectory tracking is achieved with an average error <3 pixels when the real camera parameters are unknown. Tracking of the unknown and occluded trajectory is achieved with an average error <3 pixels without occlusion and <15 pixels with occlusion. These results satisfy the requirements of common robot manipulation applications. In particular, the proposed method is suitable for visual tracking tasks in robot manipulation where the extracted target features are not robust and are easily lost due to environmental interference. Furthermore, the target occlusion in the experiment is applied randomly, and the occlusion time will be further discussed in future work to expand the application range of the method.