AFAM-PEC: Adaptive Failure Avoidance Tracking Mechanism Using Prediction-Estimation Collaboration

In recent years, correlation tracking has been considered fast and effective by virtue of the circulant structure of the sampling data in the filter learning phase and the computation of correlation in the Fourier domain. However, under occlusion, motion blur, or out-of-view movement of the target, most correlation filter based trackers start learning from erroneous samples and begin to drift. Currently, adaptive correlation filter based tracking algorithms are being combined with redetection modules; this hybridization helps redetect the target in long term tracking. The redetection modules are mostly classifiers, which identify the true object after a tracking failure occurs. These methods perform favorably during short term or partial occlusion. To further increase tracking efficiency, specifically during long term occlusion, while maintaining real time processing speed, this study proposes a tracking failure avoidance method. We first propose a strategy to detect occlusion using two cues from the response map, i.e., the peak correlation score and the peak to side lobe ratio. After successful detection of tracking failure, a second strategy is proposed to keep the target model from becoming more erroneous. A Kalman filter based predictor continuously predicts the target location during occlusion and passes this result to a support vector machine (SVM). When the target reappears in the frame, the SVM based classifier identifies the correct object using the location predicted by the Kalman filter. This decreases the chance of tracking failure, as the Kalman filter continuously updates itself during occlusion and predicts the next location from its own previous prediction. Once the true object is detected by the classifier after the occlusion clears, the result is forwarded to the correlation filter tracker, which resumes tracking and updating its parameters.
Together, these two proposed schemes show a significant improvement in tracking efficiency. Furthermore, this collaboration in the redetection phase yields a significant improvement in tracking accuracy on videos containing six challenging aspects of visual object tracking reported in the literature.


I. INTRODUCTION
Visual object tracking has always been considered an active area of interest in computer vision research because of its widespread applications and challenging issues such as motion blur, object deformation, noisy environments, fast motion, clutter and, finally, occlusion [1], [2]. Long term tracking is considered effective if an algorithm tracks an object of interest for a long duration of time in all or any of the above challenging scenarios. Without considering orientation estimation of the object, the tracking process can be divided into two sub-parts, i.e., i) translation estimation and ii) scale estimation of the target in the next frame [3].

(The associate editor coordinating the review of this manuscript and approving it for publication was Mehul S. Raval.)
For translation estimation, tracking algorithms can broadly be divided into two groups: i) generative and ii) discriminative. The generative scheme uses information about the object while treating tracking as a search problem. The discriminative scheme treats tracking as a classification problem, using both the object and its background information. Discriminative tracking using correlation filters has been studied by a number of researchers in the field of object tracking [4]-[11]. By exploiting the circulant structure and computing correlation in the frequency domain, where it reduces to element-wise multiplication, an extremely fast tracker is presented in [12]. Due to the adaptive nature of the correlation filter, its fast online learning mechanism makes it suitable for tracking objects with fast appearance changes. Although correlation filters are very successful in visual object tracking, two major limitations remain. First, they do not have the inherent capability of resuming tracking once the object is lost or moves out of the camera's field of view. Second, a less reliable tracked frame causes the correlation filter to learn a wrong target appearance, and this learning error accumulates over frames. The first limitation is addressed in [9], [13] by adding a redetection module, where redetection is carried out in every frame, which in turn increases the computational cost. Another approach, which reduces the computational cost by defining a threshold to activate the redetection module, is presented in [14]. The second limitation of correlation filter based tracking is addressed in [3] by learning multiple correlation filters with different learning rates. To overcome the fixed template size problem of kernel filters, a correlation filter adaptive to scale changes is presented in [15].
Dense spatio-temporal context information is used in [16] to increase the efficiency and robustness of correlation filters. A simple tracking approach with an appearance model based on multi-scale image features extracted with a data-independent basis is presented in [17]. Particle filters have also been incorporated into kernelized correlation filters to redetect the target when the response map becomes less reliable [18]. Fusion of multiple features in the correlation filter framework is proposed in [19]; in this method, adaptive weights are assigned to each feature to minimize the interference of noise. To enhance the quality of the response map in correlation filter based algorithms, a metric learning model strategy is given in [20]. Convolutional neural network based tracking strategies have also been proposed by numerous researchers in recent years; for recent examples, see [21], [22]. These neural network based algorithms require a lot of training data and large computational time.
In this paper we propose a new scheme that incorporates the Kalman filter and a support vector machine (SVM) into discriminative correlation tracking. Moreover, a multi-cue reliability detection scheme is presented, which determines the frames that are reliable enough to update the target model.
The main contributions of the scheme proposed in this paper are: 1) The Kalman filter based prediction under heavy occlusion shows better tracking efficiency compared with [14] in the case of linear motion (because a linear Kalman filter is used in this paper). 2) It is shown that the peak correlation score alone is not sufficient to detect heavy occlusion, motion blur, scale variation, background clutter, out-of-plane rotation and deformation. We provide a comparison with the previous works [14], [16], [17]. Hence, the peak to side lobe ratio is combined with the peak correlation score in the proposed strategy.
3) The algorithm presented in this paper is able to avoid template updates under erroneous input (a wrongly tracked object) and gives a significant improvement in tracking accuracy.
In the end we compare the results of the proposed algorithm with state-of-the-art trackers on selected challenging videos covering six attributes, from the benchmark datasets OTB50 [23], OTB100 [24], TColor-128 [25], and UAV-123 [26].
The remainder of the paper is organized as follows: Section II presents the related work and background. Section III presents the proposed tracking scheme. Section IV presents implementation details. Section V presents the analysis and evaluation of the proposed tracking scheme. Finally, Section VI presents the conclusion.

II. RELATED WORK AND PROBLEM BACKGROUND
In recent years there has been increasing interest among computer vision researchers in visual object tracking (VOT), because of the availability of high-speed computational resources along with benchmark datasets and results. This section gives insight into the tracking schemes most closely related to the proposed method: i) correlation filter based tracking and ii) tracking, learning and detection, also known as tracking by detection. Further detailed discussions are available in [2], [27]-[30].

A. TRACKING LEARNING AND DETECTION (TRACKING BY DETECTION)
This method treats object tracking as a detection problem in every frame. To make the correlation filter adaptive to appearance changes of the target of interest, recently proposed methods draw positive and negative samples around the expected target to update the classifier, as discussed in [3]. However, slightly erroneous labeling of samples accumulates over time and the tracker starts drifting; this problem is known as sampling ambiguity. To handle it, many methods have been proposed, such as ensemble tracking [30], randomized ensemble tracking [31], adaptive randomized ensemble tracking [32], online multiple instance learning [33], and transfer learning based tracking [34], [35]. Another problem with the approach explained at the start of this paragraph is the tradeoff between stability and adaptivity. To keep the system stable while retaining reasonable model adaptivity, the tracking scheme has been decomposed into three modules, i.e., tracking, learning and detection, in [36], [37]. The basic idea of this method is to update the detector at a conservative rate using extra samples obtained from the results of an aggressively updated tracker. This online detector can be used when a tracking failure occurs; examples of such trackers are given in [9], [13], [14]. An online detector for reinitialization of the tracker after tracking failure is also proposed in [3], where the detection module is activated only if the response is lower than a specified threshold. In our proposed tracking scheme, we also use a support vector machine based, online trained detector module, which differs from the already proposed techniques [14]. We activate the SVM based detector module on the basis of two parameters rather than the peak correlation value alone.
In our approach, the Adaptive Failure Avoidance Tracking Mechanism using Prediction-Estimation Collaboration (AFAM-PEC), the response map is utilized to calculate the peak to side lobe ratio along with the peak correlation value.
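The two cues can be computed directly from the response map. The following Python sketch is illustrative only (the paper's implementation is in MATLAB); the 11×11 exclusion window around the peak and the example threshold values are our assumptions:

```python
import numpy as np

def psr(response, exclude=11):
    """Peak-to-sidelobe ratio of a correlation response map.

    The sidelobe is the response with a small window around the peak
    excluded; PSR = (peak - mean(sidelobe)) / std(sidelobe).
    """
    r, c = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r, c]
    mask = np.ones_like(response, dtype=bool)
    h = exclude // 2
    mask[max(0, r - h):r + h + 1, max(0, c - h):c + h + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def tracking_reliable(response, t_peak=0.25, t_psr=10.0):
    # Both cues must agree for the frame to count as reliable.
    return response.max() > t_peak and psr(response) > t_psr
```

A sharp, isolated peak yields a high PSR, while a flat or multi-modal response (typical under occlusion) yields a low one, which is why the two cues together are a stronger failure indicator than the peak value alone.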

B. CORRELATION TRACKING
Correlation filters are applied in many application areas such as object detection and recognition [38]. Since this operator reduces to element-wise multiplication in the frequency domain, researchers have applied correlation filters extensively to visual object tracking in the last decade due to their low computational cost. The minimum output sum of squared error (MOSSE) filter is proposed in [4] for tracking on grayscale images, where the filter is updated on every frame. This filter is computationally inexpensive, with a processing speed of more than one hundred frames per second. The kernelized correlation filter is proposed in [12], [39], which employs the properties of circulant matrices for extremely fast learning and detection with the help of the fast Fourier transform. Many efforts have been made to enhance tracking performance using correlation filters; examples include multi-channel filters [39]-[41], spatio-temporal context learning [16], scale handling and estimation [42]-[44] and spatial regularization [45]-[47]. Most of these techniques are very good at adapting to fast appearance changes of the model, but due to the lack of a long term memory of the target appearance, they are susceptible to drift in case of occlusion or out-of-view movement of the target object. This problem is solved in [3] by keeping a long-term memory of the target and deploying two filters, one for short term memory and the other for long term memory; at the same time, this increases the computational cost and memory consumption. Compressive tracking algorithms have also been presented in recent years, which extract features from a multi-scale feature space whose basis functions do not depend on the data; one example of this type of tracker is given in [17].
Unlike existing techniques that employ only a correlation filter for translation estimation, even during occlusion, we introduce a predictor module to handle drifting/tracking failure in case of occlusion, motion blur or out-of-view movement of the target. In our approach (AFAM-PEC), the predictor is incorporated with the short term memory correlation filter. The peak to side lobe ratio and the peak correlation score are calculated from the response map to detect occlusion, motion blur or out-of-view movement of the object. Based on these two parameters, a confidence score is calculated, which decides the reliability of the tracking result for that specific frame. The short term memory filter stops updating its weights if the tracking reliability is less than a certain threshold, say T_r. The measurement follower predictor is used to predict the next position of the object during the occlusion period. Once the tracking result reliability again reaches a specified threshold, the short term memory filter is reactivated to estimate the next state of the object.

Algorithm 1: Proposed Tracking Scheme
Input: box at x_0 containing the target.
Output: estimated position and scale of the object x̂_t = (x̂_t, ŷ_t, ŝ_t).
Models: context regression model R_con, target appearance regression model R_tar, support vector machine based classifier D_svm, and measurement follower predictor P_kf.
Repeat
    Extract the search window in the current frame t at (x̂_{t-1}, ŷ_{t-1}) and extract its features;
    // Translation estimation
    Compute the correlation response map y_t using R_con and (4), and estimate the new position (x̂_t, ŷ_t);
    // Scale estimation
    Build the target pyramid around (x̂_t, ŷ_t) and compute the correlation map y_s using R_tar and (4);
    Extract the parameters from the response map y_s and estimate the optimal scale ŝ using (7);
    x̂_t = (x̂_t, ŷ_t, ŝ);
    // Target prediction and redetection
    if max(y_ŝ) < T_r or PSR(y_ŝ) < T_psr then
        do
            Stop updating R_tar;
            Use P_kf to predict the next state of the object;
            Use the detector D_svm to find all possible states x_i;
            Compute y_i for each state x_i;
            if max(y_i) > T_t then
                x̂_t = x_i, where i = argmax_i y_i; give x̂_t to the Kalman filter as the measurement in (13);
                Update the parameters of the detector using (9);
            else
                x̂_t = x_t(P_kf); give x̂_t to the Kalman filter as the measurement;
            end
        while max(y_ŝ) < T_r or PSR(y_ŝ) < T_psr;
    end
    Update R_con using (5), (6);
    if max(y_ŝ) > T_a and PSR(y_ŝ) > T_psr then
        Update R_tar using y_ŝ and (5), (6);
    end
Until end of video sequence;

III. THE PROPOSED TRACKING SCHEME
Our objective is to develop a robust, online training based visual object tracking algorithm that handles long term occlusion more effectively than other previously proposed long term tracking methods. Without considering the orientation of the object, tracking is simply the estimation of the translation and scale of an object [3], [14]. In our proposed framework, translation estimation is based on the correlation of the temporal context, and scale estimation is based on a discriminative correlation filter.
In this section, the main components of the proposed tracking procedure are described. First, long term correlation tracking [3], [14] is described in Section III-A. Next, the estimator module is described in Section III-B, where we use a support vector machine classifier [14]. Section III-C describes the Kalman filter based predictor. Finally, Section III-D describes the predictor-estimator collaboration strategy, which assists the translation estimation filter during long term occlusion. The notations and variables used in the following sections are given in Table 1.
A. LONG TERM CORRELATION TRACKING
The correlation filter w is learned by minimizing the ridge regression objective

w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ − y(m,n)|^2 + λ ||w||^2    (1)

where φ denotes the mapping to kernel space and λ denotes the regularization parameter, which is always greater than or equal to zero. As the labeling is not binary, w contains the coefficients of a Gaussian ridge regression model [48]. Expressing the solution in the dual form

w = Σ_{m,n} c(m,n) φ(x_{m,n})    (2)

the above objective function is minimized in closed form using the fast Fourier transform.
The coefficients c are calculated by (3) using the discrete Fourier transform as follows:

C = F(c) = F(y) / (F(k_x) + λ)    (3)

where F denotes the discrete Fourier transform (DFT), k_x is the kernel correlation of the target appearance with itself, and y is the Gaussian-shaped regression label. In the new frame, the response map over an image patch u of size P×Q is calculated using the inverse DFT as per (4):

ŷ = F^{-1}( F(k_u) ⊙ C )    (4)

where ⊙ is element-wise multiplication, k_u is the kernel correlation between the learned target appearance model x̂ and the patch u, and the maximum value of ŷ gives the new target location. Two correlation filters are trained from a single frame: one to model the target appearance solely and the other to model the surroundings along with the target. As the surrounding information does not change quickly and remains temporally stable, it is very useful for differentiating the target from the background in case of occlusion [3], [14], [16]. A weighted cosine window is applied to the feature channels to remove the boundary discontinuities of the response map. The context regression model R_con is adapted to cater for occlusion, abrupt motion and deformation with learning rate β:

x̂_t = (1 − β) x̂_{t−1} + β x_t    (5)
C_t = (1 − β) C_{t−1} + β C_t'    (6)

The target appearance regression model R_tar is learned only from the most reliable and confidently tracked frames. In [14], reliability is determined using the maximal value of ŷ alone. Unlike the existing techniques [3], [14], to maintain model stability two thresholds are defined for updating the target regression model R_tar using (5) and (6): the first threshold, T_a, is on the peak correlation value, and the second, T_psr, is on the peak to side lobe ratio of the response map. Only if both criteria are met, i.e., max(ŷ) > T_a and PSR(ŷ) > T_psr, is the target appearance regression model updated. Note that the peak correlation value alone is not enough to ensure model stability in case of long term occlusion, as can be seen in Fig. 11. Since we update the target appearance regression model only when the tracker results are above this reliability criterion, we can keep the learning rate β aggressive.
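The learning and detection steps of (3)-(6) can be sketched compactly. The Python snippet below is a minimal single-channel sketch using a linear kernel for brevity (the paper uses a Gaussian kernel and multi-channel features); the function names are ours:

```python
import numpy as np

def train(x, y, lam=1e-4):
    # (3) with a linear kernel: k_x is the circular autocorrelation of x,
    # and the filter C = F(y) / (F(k_x) + lam) is formed in the Fourier domain.
    X = np.fft.fft2(x)
    k_x = np.real(np.fft.ifft2(X * np.conj(X))) / x.size
    return np.fft.fft2(y) / (np.fft.fft2(k_x) + lam)

def detect(C, x_hat, u):
    # (4): response = F^-1( F(k_u) * C ), where k_u is the circular
    # cross-correlation between the patch u and the learned appearance x_hat.
    k_u = np.real(np.fft.ifft2(np.fft.fft2(u) * np.conj(np.fft.fft2(x_hat)))) / u.size
    return np.real(np.fft.ifft2(np.fft.fft2(k_u) * C))

def update(old, new, beta=0.01):
    # (5)/(6): linear interpolation of model parameters with learning rate beta.
    return (1 - beta) * old + beta * new
```

Detecting on a circularly shifted copy of the training patch moves the response peak by the same shift, which is the property the tracker exploits for translation estimation.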
For optimal scale selection of the tracked target, an image pyramid technique is implemented using the concept of [34]. If P × Q is the size of the target and N is the number of scales, the scale search space is S = {α^n | n = ⌊−(N−1)/2⌋, ..., ⌊(N−1)/2⌋}. For each s ∈ S, an image patch of size sP × sQ is extracted around the estimated position and its response map y_s is computed; the optimal scale is

ŝ = argmax_{s ∈ S} max(y_s)    (7)

Unlike [34], we make the updating of the target regression model more robust: R_tar is updated using (5) and (6) only if the condition max(ŷ_ŝ) > T_a and PSR(ŷ_ŝ) > T_psr is satisfied.
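The scale search of (7) can be sketched as follows; α = 1.08 and N = 21 are the values stated in the implementation details, and the function names are ours:

```python
import numpy as np

def scale_space(alpha=1.08, N=21):
    # Scale search space of (7): N scales spaced by factor alpha around 1.0.
    return [alpha ** n for n in range(-(N - 1) // 2, (N - 1) // 2 + 1)]

def best_scale(peak_scores, scales):
    # (7): pick the scale whose response map has the highest peak.
    return scales[int(np.argmax(peak_scores))]
```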

B. SUPPORT VECTOR MACHINE BASED ESTIMATOR
To increase the robustness of the tracking algorithm, a detection module is necessary to recover the target when tracking failure occurs, e.g., due to long term occlusion, erroneous input to the model update module, or out-of-view movement and re-entry of the object into the camera view. Researchers have proposed online detectors that carry out redetection in each frame [13], [49], [50]. To decrease the computational cost, a threshold can be defined so that the detector is activated only if the maximum value of the response map falls below a certain predefined value [3]. Unlike these two approaches, in our method a support vector machine based detector works in collaboration with a Kalman filter based predictor: the proposed detector is activated if either of the following two conditions is true, i.e., i) max(y_ŝ) < T_r or ii) PSR(y_ŝ) < T_psr. The SVM is trained incrementally by considering dense training samples around the estimated position. Binary labels are assigned with respect to the overlap ratio, as given in [50]. We assume a training set {f_i, c_i | i = 1, 2, ..., N} with N samples in the frame, where f_i is the feature vector of the i-th sample and c_i ∈ {+1, −1} is its binary class label. The SVM classifier is learned by minimizing the regularized hinge loss:

h = argmin_h (λ/2) ||h||^2 + Σ_i l(h; (f_i, c_i))    (8)

where h is the hyperplane of the SVM detector, l(h; (v, c)) = max{0, 1 − c⟨h, v⟩}, and ⟨h, v⟩ is the inner product between v and h. A passive aggressive algorithm is applied to update the hyperplane parameters:

h_t = h_{t−1} − τ ∇l(h_{t−1}; (v, c)),  τ = min(T, l / ||∇l||^2)    (9)

where T ∈ (0, +∞) controls the rate of updating of h and ∇l is the gradient of the loss function. Unlike existing techniques, in our proposed work the parameters of the detector are updated using (9) only when max(y_{i,t}) > T_t, where y_i is the response value of the i-th candidate state evaluated by the detector in frame t.
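One passive-aggressive step of the form in (9) can be sketched as below. This is an illustrative sketch of a standard PA-I hinge-loss update, not the paper's exact implementation; note that for an active hinge loss the gradient is −c·v, so ||∇l||^2 = ||v||^2:

```python
import numpy as np

def pa_update(h, v, c, T=1.0):
    # Hinge loss l(h; (v, c)) = max(0, 1 - c<h, v>); if the sample is
    # already classified with sufficient margin, the update is "passive".
    loss = max(0.0, 1.0 - c * float(h @ v))
    if loss == 0.0:
        return h
    # Aggressive step along -grad = c*v, with step size capped by T.
    tau = min(T, loss / float(v @ v))
    return h + tau * c * v
```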

C. KALMAN FILTER BASED PREDICTOR
Different Kalman filter [51] based tracking algorithms have been proposed in the literature, for example [52]-[55]. To increase the efficiency of tracking algorithms, Kalman filters have also been hybridized with many other tracking algorithms; some examples are [56]-[58]. Different from the existing techniques, we incorporate the Kalman filter in synchronization with the estimator module to avoid tracking failure caused by long term occlusion, motion blur or background clutter. The Kalman filter works in a closed loop cycle with prediction and correction steps given by (10)-(14). In our proposed tracking framework, the Kalman filter is activated in case of tracking failure caused by any of the above-mentioned issues. During occlusion, the Kalman filter takes the current state from the main tracking algorithm (in our case KCF, defined by (1)-(4)) and predicts the next state using (10) and (11); the main tracking algorithm (KCF) stops updating its parameters and the target appearance regression model. In the next frame, the Kalman filter corrects itself using the location it predicted by (10) and (11) during occlusion.
Prediction:

x_t^- = A x_{t−1} + B u_{t−1}    (10)
S_t^- = A S_{t−1} A^T + Q    (11)

where x_t^- is the predicted state at frame t, A is the state transition matrix, B is the input matrix, S_t is the a posteriori error covariance matrix and Q is the covariance matrix of the dynamic noise.
Correction:

K_t = S_t^- H^T (H S_t^- H^T + R)^{-1}    (12)
x_t = x_t^- + K_t (Y_t − H x_t^-)    (13)
S_t = (I − K_t H) S_t^-    (14)

where K_t is the Kalman gain, H is the measurement matrix, R is the measurement noise covariance and Y_t is the measurement given to the Kalman filter. Depending on the conditions i) max(y_ŝ) < T_r or PSR(y_ŝ) < T_psr and ii) max(y_i) > T_t, this measurement may come from one of three sources: i) the main tracking algorithm, ii) the estimator module, or iii) the Kalman filter's own prediction in the previous frame. The Kalman filter continues to predict the next state during occlusion and sends the predicted state to the predictor-estimator collaboration module.
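A constant-velocity Kalman filter implementing (10)-(14) over the state (x, y, vx, vy) can be sketched as follows; the noise magnitudes q and r are illustrative assumptions (the control term B·u of (10) is omitted, as no control input is used here):

```python
import numpy as np

class ConstantVelocityKF:
    """Constant-velocity Kalman filter over the state (x, y, vx, vy)."""
    def __init__(self, q=1e-2, r=1.0):
        self.A = np.array([[1, 0, 1, 0],    # x += vx per frame
                           [0, 1, 0, 1],    # y += vy per frame
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],    # only position is measured
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)              # dynamic (process) noise covariance
        self.R = r * np.eye(2)              # measurement noise covariance
        self.x = np.zeros(4)                # state estimate
        self.S = np.eye(4)                  # error covariance

    def predict(self):                      # equations (10)-(11)
        self.x = self.A @ self.x
        self.S = self.A @ self.S @ self.A.T + self.Q
        return self.x[:2]

    def correct(self, y):                   # equations (12)-(14)
        K = self.S @ self.H.T @ np.linalg.inv(self.H @ self.S @ self.H.T + self.R)
        self.x = self.x + K @ (np.asarray(y, dtype=float) - self.H @ self.x)
        self.S = (np.eye(4) - K @ self.H) @ self.S
        return self.x[:2]
```

During occlusion only `predict` is called, so the filter coasts along the learned velocity; when a reliable measurement becomes available again, `correct` pulls the state back toward it.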

D. PREDICTOR-ESTIMATOR COLLABORATION
Unlike existing techniques, a collaboration module is proposed to handle long term occlusion, motion blur and background clutter. Most existing methods model the target using its appearance; the major problem associated with these methods is their inability to predict the state of the object during occlusion. Different tracking techniques have been proposed to recapture the object when it re-enters the field of view after occlusion, such as [3], [14]. Different from these frameworks, our proposed scheme (AFAM-PEC) activates the predictor and the estimator at the same time when the object gets occluded, i.e., when max(y_ŝ) < T_r or PSR(y_ŝ) < T_psr (here P_kf represents the predictor, D_svm the detector, y_ŝ the estimated position at the estimated scale, R_con the context regression model, and R_tar the target appearance regression model). During the occlusion period, the predictor predicts the location of the target while the SVM based classifier estimates the position of the object. If the position estimated by the SVM based classifier satisfies the condition max(y_i) > T_t, this position is considered a correct estimate and is given to the Kalman filter as a measurement to predict the next location. However, if the estimated position does not satisfy max(y_i) > T_t, the Kalman filter's predicted position is given to the estimator to estimate the location in the next frame, and the same position is fed back to the Kalman filter as its measurement. This approach shows a significant improvement in results in comparison to [14], [16], [17]. A flow chart of this module is given in Fig. 3.
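The decision logic of one redetection-phase step can be sketched as below; `kf_predict`, `kf_correct` and `svm_score` are hypothetical stand-ins for the predictor P_kf and the detector D_svm, passed in as callables to keep the sketch self-contained:

```python
import numpy as np

def collaboration_step(kf_predict, kf_correct, svm_score, frame, t_t=0.5):
    """One predictor-estimator collaboration step during occlusion."""
    pos = kf_predict()                           # Kalman prediction
    candidates, scores = svm_score(frame, pos)   # SVM scores candidate states
    i = int(np.argmax(scores))
    if scores[i] > t_t:
        # Confident redetection: it becomes the Kalman measurement.
        return kf_correct(candidates[i]), True
    # Otherwise the prediction feeds back as the filter's own measurement.
    return kf_correct(pos), False
```

The key design choice is the fallback branch: because the prediction is fed back as a measurement, the Kalman filter keeps extrapolating the target trajectory through long occlusions instead of freezing at the last confident position.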

IV. IMPLEMENTATION DETAILS
The complete flow of the proposed tracking scheme is presented in Algorithm 1, and the corresponding flow chart of the novel detector-estimator collaboration module is presented in Fig. 3. In this paper, we compute multilayer features at a fraction of the cost using the technique presented in [59]. A histogram of oriented gradients (HOG) with 31 bins is implemented, along with a histogram of local intensities (HoI) with 6 × 6 windows and 8 bins. To cater for fast illumination variations, the HoI is applied on the brightness channel and the transformed brightness channel as given in [60]. The context regression model R_con is trained using a 47-channel feature vector, whereas the target appearance regression model R_tar is trained using the 31-bin HOG features only. A constant velocity model of the Kalman filter is implemented. A Gaussian kernel is used in both the target appearance regression model and the context aware regression model. The correlation in (2) and (3) is computed in the Fourier domain. Detection is done in a sliding window scanning fashion similar to [61]. The SVM classifier is trained considering a very large number of samples around the estimated location: samples having an overlap ratio with the target model bounding box greater than 50% are given positive labels, whereas samples having an overlap ratio less than 10% are assigned negative labels. The regularization parameter in (1) is set to 10^-4, the search window is 180% of the target object size, the kernel width is set to 0.1, and the learning rate β is set to 0.01. For scale handling, 21 scales are considered and the scale factor α is set to 1.08. To turn on the SVM based detector and the Kalman filter based predictor, the threshold T_r is set to 0.25. The detector's results are considered reliable only if max(y_i) > T_t, where T_t = 0.5. The threshold T_a for updating the target regression model R_tar is 0.5.
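For reference, the stated hyper-parameters can be gathered in one place; the values are from the implementation details above, while the key names are our own shorthand:

```python
# Hyper-parameter summary (values from the implementation details).
AFAM_PEC_PARAMS = {
    "lambda_reg": 1e-4,    # regularization parameter in (1)
    "search_window": 1.8,  # search window relative to target size (180%)
    "kernel_width": 0.1,   # Gaussian kernel width
    "beta": 0.01,          # learning rate in (5)/(6)
    "num_scales": 21,      # N in (7)
    "alpha": 1.08,         # scale factor in (7)
    "T_r": 0.25,           # activates the SVM detector and Kalman predictor
    "T_t": 0.5,            # detector reliability threshold
    "T_a": 0.5,            # peak threshold for updating R_tar
    "pos_overlap": 0.50,   # overlap ratio above which SVM labels are positive
    "neg_overlap": 0.10,   # overlap ratio below which SVM labels are negative
}
```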
Most of the parameters are based on [3], [14], with slight or no variation. The proposed tracking scheme is implemented in MATLAB (2019) on an Intel Core i7 (7th generation, 2.80 GHz) machine with 16 GB RAM and the 64-bit Windows 10 operating system.

V. RESULTS AND DISCUSSIONS
Our proposed tracking scheme is evaluated and compared on a number of selected videos from the benchmark datasets OTB50 [23], OTB100 [24], TColor-128 [25], and UAV-123 [26]. OTB50 contains 50 videos, OTB100 contains 100 videos, TColor-128 contains 128 color sequences and UAV-123 contains 123 videos captured using an unmanned air vehicle. Each video has one or more object tracking challenges associated with it. We chose videos having seven attributes, namely i) occlusion, ii) scale variation, iii) motion blur, iv) fast motion, v) out-of-plane rotation, vi) deformation and vii) background clutter, to evaluate our proposed tracker. An explanation of each attribute is given in TABLE 2. Qualitative evaluation is given in Fig. 4a, Fig. 5a, Fig. 6a, Fig. 7, and Fig. 8. We evaluate our tracker quantitatively using: i) the distance precision metric, as per Fig. 4b, Fig. 4c, Fig. 5b, Fig. 5c, Fig. 6b, Fig. 6c and TABLE 3, and ii) the overlap success rate metric, as per Fig. 9, Fig. 10 and TABLE 4. A processing time comparison is also given in TABLE 5. We compared our tracker on benchmark dataset videos with the long term correlation tracker (LCT) [14], spatio-temporal context learning (STC) [16], and real time compressive tracking (CT) [17]. This paragraph explains the attributes of each video, also known as challenging aspects by the visual tracking community. Six videos shown in Fig. 7, i.e., Jogging1, Jogging2, Walking2, Human3, Girl2 and Skating2, are selected from the OTB100 dataset; these six videos are also part of the TColor-128 dataset. Fig. 8 shows five videos, i.e., Bike3, Car4, Car9, Busstation and Building3; out of these, four are part of the UAV-123 dataset and the single video Busstation is from the TColor-128 dataset. Hence a total of six videos from OTB100, seven video sequences from TColor-128 and four video sequences from UAV-123 are used to evaluate our proposed AFAM-PEC tracker. The Jogging1 and Jogging2 sequences have occlusion, deformation and out-of-plane rotation.
The Walking2 sequence has the attributes of scale variation, occlusion and low resolution. The Girl2 and Human3 video sequences have the maximum number of challenges, i.e., five each: the challenges associated with Girl2 are scale variation, occlusion, deformation, motion blur and out-of-plane rotation, whereas Human3 contains scale variation, occlusion, deformation, out-of-plane rotation and background clutter. The Skating2 sequence has four attributes associated with it, i.e., scale variation, occlusion, fast motion and out-of-plane rotation. Bike3 contains fast motion, occlusion and out-of-plane rotation. Car4 and Car9 have occlusion and scale variation. The Busstation video sequence has background clutter and occlusion. Finally, the Building3 video sequence contains out-of-plane rotation. Therefore, a total of seven attributes are associated with these selected eleven videos. Each attribute is explained in TABLE 2.
Quantitative results in TABLE 3 and TABLE 4 show that our tracker performs well on the long term occlusion challenge. For the Jogging2 sequence, the proposed tracker gives a distance precision of 100% at the 20-pixel threshold and the LCT [14] tracker gives the second highest precision of 97%, whereas all the remaining trackers fail to track the object after the occlusion. Likewise, on the Jogging1 sequence, the proposed scheme again achieves 99% precision. On the Walking2 sequence, the proposed tracking scheme again gives 100% distance precision, whereas all the remaining three trackers lose the target when the girl in the video sequence is occluded by the boy at frame number 202. On the Girl2 sequence, the proposed tracking algorithm again achieves good performance, with a distance precision of 95%, while all the other trackers fail to track the target object. On the Human3 video sequence our tracker outperforms the remaining three trackers by achieving a distance precision of 99%; it is also worth mentioning that Human3 contains five challenging attributes out of the total of nine attributes given in [21]. The Skating2 video sequence has the extra challenging attribute of fast motion. Although the proposed tracker, along with all the other trackers, fails to track the target object, our tracker still achieves the second highest distance precision and tracks the target for a greater number of frames than LCT [14]; on this sequence the CT [17] tracker tracks the target object for more frames than any other tracker.

FIGURE 6. Results without the predictor-estimator collaboration. 6(a) Qualitative analysis: after the occlusion at frame number 234, LCT [14] misguides the Kalman filter based tracker, which starts following LCT [14] and predicting false positions. 6(b) Distance precision plot for the Walking2 sequence using the Kalman filter based tracker taking its measurement from LCT [14]. 6(c) Distance precision plot for the Walking2 sequence using LCT [14] without the Kalman filter based tracker.

The Busstation video sequence, selected from the TColor-128 dataset, contains severe occlusion and clutter; hence all the other trackers lose the target very early, while the proposed tracking scheme AFAM-PEC achieves a distance precision of 100%. On the Bike3 video sequence all the trackers fail to track the target; the proposed AFAM-PEC achieves the highest precision among them, 38%. On the Car4 video sequence our proposed tracking scheme again achieves 100% distance precision, outperforming all the other trackers. Similarly, on the Car9 video sequence our proposed tracker and LCT [14] both give a distance precision of 98%, but CT [17] and STC [16] lose the target and give a precision of less than 25%. The Building3 sequence is relatively simple, without many challenging aspects, so all the trackers successfully track the target, achieving 100% distance precision. Mean distance precision is also given in TABLE 3: our proposed AFAM-PEC achieves the highest mean distance precision of 85%, LCT [14] the second highest at 54%, STC [16] the third at 38%, while CT [17] has the lowest at 26%.
We first implemented a Kalman filter based tracking algorithm, since the Kalman filter is a measurement-follower algorithm. The output of the LCT [14] algorithm is given as the measurement to the Kalman filter. Fig. 4 shows the results for LCT [14] and the Kalman based tracking algorithm. The Kalman filter based tracking algorithm achieves a distance precision of almost 100%, while LCT [14] and most traditional algorithms achieve less. This is because when the object gets occluded, LCT [14] stops estimating the correct position of the object, and only when the object comes out of occlusion does the LCT [14] tracker re-detect the target, as shown in Fig. 4. In our implementation the Kalman based tracker continuously predicts the new state of the target object even during occlusion, which increases the distance precision. To further investigate this behavior, these algorithms have been applied to other videos and the results are shown in Fig. 5 along with the distance precision plot; this behavior can be observed in frames 62, 65 and 67. Fig. 6 (Walking2 video sequence) illustrates another interesting fact: if the measurement (from the baseline tracker, in our case LCT [14]) given to the Kalman filter is wrong, the Kalman filter will predict a false state in the next frame, as it is a measurement follower. If the baseline tracker keeps giving wrong measurements to the Kalman filter based tracker even after the occlusion of the target is over, the Kalman filter will keep predicting false states and the target will be lost. This phenomenon is clearly visible in Fig. 6, where after frame 234 the baseline tracker misguides the Kalman filter and both algorithms start following the wrong object. This behavior is corrected by the proposed algorithm, which works through the collaboration of the predictor and estimator; TABLE 3 shows a distance precision of 100% on this video sequence. To further strengthen our argument, Fig. 11 presents the peak to side lobe ratio and peak correlation score.
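The measurement-follower behavior described above can be illustrated with a minimal constant-velocity Kalman filter; the noise covariances, window of visible frames, and coasting loop below are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal 2-D constant-velocity Kalman filter.
    State: [x, y, vx, vy]. During occlusion the update step is
    skipped, so the filter coasts on its own predictions."""
    def __init__(self, x, y, dt=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0              # initial state uncertainty (assumed)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt       # position += velocity * dt
        self.H = np.eye(2, 4)                  # we measure position only
        self.Q = np.eye(4) * 0.01              # process noise (assumed)
        self.R = np.eye(2) * 1.0               # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.s   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKalman(0.0, 0.0)
for t in range(1, 6):            # visible frames: feed tracker measurements
    kf.predict()
    kf.update((2.0 * t, 0.0))    # target moves ~2 px/frame in x
for _ in range(3):               # occluded frames: predict only, no update
    pos = kf.predict()
print(pos)  # x keeps advancing at the learned velocity despite no measurements
```

Feeding the baseline tracker's output as `z` reproduces both effects discussed above: the filter bridges occlusions by coasting, but it also faithfully follows a wrong measurement if the baseline tracker drifts.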
This figure shows that the proposed tracking algorithm achieves a higher PSR earlier than the LCT [14] algorithm. Distance precision plots of the Jogging1, Jogging2 and Walking2 sequences are given in Fig. 4, Fig. 5 and Fig. 6 respectively, which show that the proposed tracking scheme outperforms the other three tracking algorithms.
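The peak to side lobe ratio compared here is conventionally computed by excluding a small window around the response-map peak. The sketch below combines it with the peak correlation score to flag occlusion; the 11x11 exclusion window, the thresholds, and the either-cue-weak rule are our illustrative assumptions, not the paper's values:

```python
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio of a correlation response map.
    Sidelobe = all values outside an 11x11 window around the peak."""
    r = np.asarray(response, dtype=float)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    peak = r[py, px]
    mask = np.ones_like(r, dtype=bool)
    mask[max(0, py - 5):py + 6, max(0, px - 5):px + 6] = False
    side = r[mask]
    return float((peak - side.mean()) / (side.std() + 1e-12))

def occlusion_detected(response, psr_thresh=20.0, peak_thresh=0.3):
    """Flag occlusion when either response-map cue is weak.
    Thresholds are illustrative placeholders."""
    return bool(psr(response) < psr_thresh or response.max() < peak_thresh)

# A sharp synthetic peak over a flat background: no occlusion flagged.
r = np.full((50, 50), 0.01)
r[25, 25] = 1.0
print(occlusion_detected(r))  # False
```

A flat response map (no confident peak) trips both cues and is flagged as occlusion, which is when the Kalman predictor takes over.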
Before the qualitative analysis, let us analyze the overlap success rate metric. Fig. 9 shows the comparison of the proposed algorithm with each of LCT [14], STC [16], and CT [17], whereas Fig. 10 compares all four tracking algorithms on a single plot. TABLE 4 shows the overlap success rate for the various video sequences at a threshold of 0.5. For the Jogging1 video, LCT [14] and the proposed algorithm give an almost equal success rate of 97%. CT [17] and STC [16] give success rates of 20% and 22%, as they fail to track after occlusion. The algorithm proposed in this work achieves a success rate of 99% for the Jogging2 video, whereas LCT [14] achieves 97% and the other two algorithms achieve less than 20%. Similarly, on all the other remaining video sequences AFAM-PEC achieves the highest success rate; details are given in TABLE 4. The mean overlap success rate is also calculated: the proposed AFAM-PEC outperforms the other trackers with a mean overlap success rate of over 75%, while LCT [14] achieves 51%, STC [16] achieves 30%, and CT [17] achieves 20%.

FIGURE 7. Comparison with LCT [14], STC [16], and CT [17] on six challenging sequences selected from OTB50, OTB100 and TColor-128. First row to the last row: Jogging1, Jogging2, Girl2, Human3, Walking2 and Skating2 video sequences are presented respectively.
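The overlap success rate used above can be sketched as the fraction of frames whose bounding-box intersection-over-union with the ground truth meets the threshold; the function names, box convention (x, y, w, h), and toy data are our own illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def overlap_success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames where IoU with ground truth >= threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)

# Toy example: one perfect hit, then one complete miss.
pred = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt   = [(0, 0, 10, 10), (20, 20, 10, 10)]
print(overlap_success_rate(pred, gt))  # 0.5
```

The success plots in Fig. 9 and Fig. 10 correspond to sweeping `threshold` from 0 to 1; TABLE 4 reports the value at 0.5.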
The proposed tracking scheme AFAM-PEC adds little computational cost while maintaining higher tracking efficiency in terms of mean precision and overlap success rate. For example, our proposed AFAM-PEC processes 27.89 frames per second, whereas LCT [14] processes 28.30 frames per second, a difference of less than one frame per second; the case is similar on all the other videos. CT [17] and STC [16] lose the target in most of the video sequences, which is why TABLE 5 shows high FPS for CT [17] and STC [16].
For the qualitative analysis, the results of four trackers, i.e., the proposed AFAM-PEC, LCT [14], STC [16] and CT [17], over eleven video sequences are presented in Fig. 7 and Fig. 8. Top to bottom, the rows of Fig. 7 contain the Jogging1, Jogging2, Girl2, Human3, Walking2 and Skating2 sequences; a row-wise analysis is given in this paragraph. In the first row all the trackers successfully track the object until frame 71, while in frame 79 the proposed algorithm is the only one to track the object exactly; all the others place their windows on a pole instead of the target object. At frame 91, the LCT [14] tracker successfully redetects the target. After this frame the proposed algorithm and LCT [14] successfully track the object until the end of the video, while the other two algorithms fail to track after the occurrence of occlusion. Similarly, in the second row of Fig. 7, the proposed algorithm successfully tracks the object and redetects it after occlusion earlier than the three remaining algorithms. LCT [14] shows the second best behavior on this sequence by tracking the object successfully until the end, whereas the remaining two algorithms fail to track the object when it reappears after the occlusion. The only issue reported for LCT [14] on Jogging1 and Jogging2 is the estimation of the position of the object during occlusion. In the third row of Fig. 7, for the Girl2 video sequence, all the trackers successfully track the object until the occurrence of cluttering. As can be seen in frame 98, after the cluttering all the trackers except CT [17] continue to track successfully. When the second challenge associated with this video occurs, i.e., full occlusion, all the trackers except the proposed tracker fail after the reappearance of the target object.

FIGURE 8. Comparison with LCT [14], STC [16], and CT [17] on five challenging sequences selected from the TColor-128 and UAV-123 databases. First row to the last row: Bike3, Car4, Car9, Busstation, and Building3 video sequences are presented respectively.
The tracker proposed in this paper successfully tracks the target object after occlusion, which is visible in frame 168; we run all the trackers over 600 frames of this video. The fourth row of Fig. 7 contains images from the Human3 sequence. On this sequence our proposed algorithm again shows better results: all the other trackers lose the object at the start of the sequence, but the proposed tracker successfully tracks it, as visible in frames 252, 301 and 410. All the trackers are tested over 590 frames of this sequence. In the Walking2 sequence, the target object gets occluded at frame 50; after this frame, CT [17] and LCT [14] lose the target and start following the wrong object. STC [16] is still able to track the object after occlusion but fails to handle the scale properly, whereas our proposed scheme successfully tracks the object at the right scale. The last sequence shown in Fig. 7 is Skating2. Although no tracker is able to track the object over the full video sequence, our proposed algorithm tracks the object over more frames than the state of the art, i.e., LCT [14]. The proposed algorithm tracks the object until frame 125 and then starts drifting, whereas CT [17] tracks the object for the maximum number of frames. As can be seen in frame 484, all the trackers lose the target: first STC [16] loses the target, then LCT [14], then our proposed tracker, and finally CT [17]. Although our proposed tracker loses the target before CT [17] in this video sequence, it has the benefit of performing better than CT [17] on all the other videos, which contain six challenges as per TABLE 2, excluding fast motion. Top to bottom, the rows of Fig. 8 contain the Bike3, Car4, Car9, Busstation, and Building3 video sequences; these sequences are captured using a UAV. In the Bike3 sequence the target is relatively small compared to the other video sequences. In this video no tracker is able to track the target correctly, but the proposed tracker still performs better and tracks the target correctly up to frame 43. The second row of Fig. 8 shows the Car4 video sequence. In this video our proposed AFAM-PEC and LCT [14] both track the target successfully until the end of the video, but the proposed AFAM-PEC achieves a better overall success rate, which can be seen in frames 270, 274 and 600. STC [16] is also able to track the target until the end of the video, but CT [17] fails to track the object after the occlusion, as visible in frame 235. In the third row of Fig. 8, all the trackers successfully track the car until the occurrence of occlusion at frame 42. At the occlusion LCT [14] gets stuck, while our AFAM-PEC successfully tracks the car even when it is occluded. After the occlusion STC [16] also fails to track correctly, as seen in frame 270, while the proposed AFAM-PEC and LCT [14] track the object until the end of the video sequence. Although visually both trackers seem to have similar performance, the quantitative analysis shows that our AFAM-PEC gives better distance precision and overlap success rate. In the second last row of Fig. 8, AFAM-PEC outperforms all the other trackers by successfully tracking the target after occlusion at frame 51; the three remaining trackers all fail to track the object after occlusion in this video sequence. The last row of Fig. 8 shows the Building3 sequence, where all the trackers successfully track the target because of the simplicity of the video.

FIGURE 9. Comparison of AFAM-PEC with LCT [14], STC [16] and CT [17] in 9(a), 9(b) and 9(c) respectively. Black lines represent AFAM-PEC, while yellow, green and red lines represent LCT [14], STC [16] and CT [17], respectively. The proposed AFAM-PEC clearly outperforms all the other trackers on all the videos except Skating2.

FIGURE 10. Comparison of AFAM-PEC with LCT [14], STC [16] and CT [17]. Black lines represent AFAM-PEC, whereas yellow, green and red lines represent LCT [14], STC [16] and CT [17], respectively.

FIGURE 11. Peak correlation score of AFAM-PEC and LCT [14] for the Jogging2 video, shown in blue and red respectively.

VI. CONCLUSION
In this study, an adaptive correlation filter based tracking failure avoidance mechanism is presented. A kernelized correlation filter is used as the baseline tracker, and the proposed failure avoidance mechanism is integrated with it. In our proposed scheme we first address the occlusion detection problem by extracting two parameters from the response map: i) the peak to side lobe ratio and ii) the peak correlation value. These two parameters work together to detect the occlusion. Second, we incorporate a Kalman filter based predictor and an SVM based estimator into the kernelized correlation filter. Third, we propose a collaboration module between the predictor and the estimator to avoid tracking failure. To perform the experiments, we chose videos with six challenging attributes from three standard datasets (OTB100, TColor-128, and UAV-123). The experiments show that the proposed approach (AFAM-PEC) performs better than state-of-the-art tracking algorithms in terms of distance precision and overlap threshold. However, the Kalman filter continuously predicts the location of the object whether or not the object is present in the frame. The efficiency of the tracker may be further enhanced by devising a criterion to stop the Kalman filter's prediction if the object is out of view or occluded for some specified time.
KHIZER MEHMOOD received the B.Sc. degree in electronics engineering from International Islamic University Islamabad (IIUI), Islamabad, Pakistan, in 2010, and the M.Sc. degree in electrical engineering from the University of Engineering and Technology at Taxila, Pakistan, in 2013. He is currently pursuing the Ph.D. degree with International Islamic University Islamabad. His areas of interests are computer vision, visual object tracking, and image processing.
MARIA MURAD received the bachelor's degree in telecommunication engineering from the University of Engineering and Technology (UET), Peshawar, Pakistan, in 2010, and the master's degree in electrical engineering from the UET at Taxila, Taxila, Pakistan, in 2012. She is currently pursuing the Ph.D. degree in image processing with International Islamic University Islamabad (IIUI), Islamabad, Pakistan. She has been working as a Lecturer with International Islamic University Islamabad, since April 2011. Her research interests are signal and image processing.
HAMDAN AWAN (Member, IEEE) received the Ph.D. degree in computer science and engineering from the University of New South Wales (UNSW), Sydney, Australia, in August 2017. He stayed with York University, Canada, as a Post-Doctoral Fellow, for two years, from December 2017 to December 2019, where he worked on DARPA's Radio Bio Project. In December 2019, he joined the Telecommunications Systems and Software Research Group, Waterford Institute of Technology, Ireland, as an H2020 Research Fellow, where he is working on the FET-Open Gladiator Project. He has so far published more than 25 research papers in highly selective IEEE TRANSACTIONS and IEEE/ACM conferences. He also co-authored the article that received the Best Paper Award from the IEEE/ACM NanoCom Conference, in 2018. His major research interests include molecular communications, nano-networks, information theory aspects of biological communication, and computer vision. He was also a recipient of a Post-Doctoral Writing Fellowship at UNSW, for three months, from September to November 2017.