Robust Visual Tracking With Occlusion Judgment and Re-Detection

A detection algorithm is often used to remedy tracking failures in a typical single-target visual tracking algorithm. In practice, when a target is occluded for a long time, neither the tracking module nor the detection module can accurately predict its position. To accurately locate the target, we first introduce the $\ell _{1}\ell _{2}$ loss function to reduce the sensitivity of correlation filter-based methods to local occlusion. To overcome the instability of algorithms based on a single feature in complex scenes, we train separate filters on histogram of oriented gradient (HOG) features and color names (CN) features, and compute fusion weights from the difference between each filter's response and the expected response. At the same time, we adaptively update the model online by calculating the sensitivity of the different filters. Following the re-detection idea in long-term tracking, the peak to sidelobe ratio (PSR) is used to judge severe occlusion, and a support vector machine (SVM) performs re-detection after severe occlusion or target out-of-view. In this paper, 34 sets of sequences are selected to evaluate the proposed algorithm. The experimental results demonstrate that our algorithm has strong anti-occlusion ability and robust performance. We also compare our method with several state-of-the-art algorithms on all sequences of OTB100, where it yields highly competitive tracking performance.


I. INTRODUCTION
Visual tracking is one of the most basic problems in computer vision. In brief, given the target information in the first frame, the visual tracking task estimates the target location in subsequent frames. It is used in a wide range of scenarios, such as vehicle navigation, human-computer interaction, and automatic surveillance, to name just a few.
Although visual tracking has been studied for decades and great progress has been made, designing a robust and efficient tracker remains difficult due to both foreground and background variations. Currently, the mainstream approach is the discriminative correlation filter (DCF) method combined with deep learning [1]-[5]. However, tracking algorithms based on the traditional correlation filter (CF) have also achieved top-ranked performance and drawn increasing attention because of their low computational cost and fair robustness to photometric and geometric variations [6]-[8]. CF-based trackers convert correlation operations in the spatial domain into element-wise multiplications in the frequency domain, which substantially reduces the complexity and improves the tracking speed [9]-[12].

(The associate editor coordinating the review of this manuscript and approving it for publication was Abdullah Iliyasu.)
Evidence shows that occlusion and deformation are still the two most difficult problems in visual tracking [13], [14]. When a slight, simple partial occlusion happens to a target during tracking, the target appearance changes only slightly, which induces a small error. In such scenes, robust tracking can be achieved by spatially regularized methods or reliable-patch models [9], [10], [15]. Nevertheless, when a severe occlusion or target out-of-view occurs, the errors caused by occlusion and disocclusion tend to accumulate and become very large. When the target is out of view, its appearance is completely invisible and cannot be observed at all. Existing tracking methods cannot achieve good performance on such intricate local occlusion challenges [15]-[17]. Some long-term tracking algorithms introduce re-detection strategies to solve the out-of-view problem [18]-[20]. These methods use a global search or re-detection strategy to retrieve the target, and therefore also work well when a target is heavily occluded.
In this work, we follow the idea of long-term visual tracking and learn a support vector machine (SVM) detector based on [21] for re-detection. Considering the sensitivity of conventional CF-based trackers to local occlusion, we introduce the $\ell _{1}\ell _{2}$ loss function. We combine the histogram of oriented gradient (HOG) feature and the color names (CN) feature with the CF method to train two different models, and weight them based on their responses to obtain a robust tracker. Meanwhile, we develop an occlusion judgment strategy and an adaptive online model update strategy for robust visual tracking, which are discussed in detail in Section III.
The contributions of this paper can be summarized in the following three aspects:

1. Unlike other CF-based methods, we introduce the $\ell _{1}\ell _{2}$ loss function to reduce the sensitivity to local occlusion, which is a great challenge for object tracking.

2. A new visual tracking framework combined with re-detection is proposed to perform robust tracking, including a novel adaptive online model update strategy based on feature fusion. It iteratively conducts adaptive learning on a variety of features and can adapt to a variety of challenging scenarios.

3. We carefully select 34 sets of video frame sequences from the OTB dataset covering 11 interference factors, including illumination variation (IV), scale variation (SV), occlusion (OCC), etc., to demonstrate the advantage of our proposal. The results show that the proposed algorithm has satisfactory anti-occlusion ability and robustness.

II. RELATED WORK
In this work, we build our proposal on the CF method, combining the HOG features and the CN features.
In this section, we revisit the related work, including: A) correlation tracking, B) tracking-by-detection, and C) occlusion judgment based on the peak to sidelobe ratio (PSR) [22].

A. CORRELATION TRACKING
The correlation filter originated in the field of signal processing and has been widely used for target detection and recognition. In visual tracking, given the training samples, a correlation tracker trains a filter to recognize the target in subsequent frames. Bolme et al. [17] first introduced the correlation filter into visual tracking. Henriques et al. [23], [24] further improved the performance of CF-based tracking by using approximate dense sampling, ridge regression, and the kernel trick. Subsequently, Danelljan et al. [25] and Dai et al. [10] respectively learned a specific regularization term to penalize large background responses and an adaptive spatial regularization to impose spatial constraints. Danelljan et al. [9] raised tracking performance to a new level by learning continuous convolution filters for visual tracking. To further address the scale variation problem, three CF-based trackers, namely SAMF [26], DSST [27], and RAJSSC [28], achieve good accuracy and real-time performance. With the development of increasingly many CF-based trackers [29]-[32], their robustness has been well demonstrated. However, for severe occlusion and out-of-view challenges, these algorithms do not achieve ideal performance.

B. TRACKING-BY-DETECTION
To alleviate the stability-plasticity dilemma of online model updates in visual tracking, Kalal et al. [33] proposed a mechanism combining tracking and detection, which helps perform long-term tracking. Ma et al. [18] decomposed the tracking task into translation and scale estimation of the object, and also trained an online random fern classifier to re-detect objects in case of tracking failure. Hua et al. [34] trained an additional re-detector for significant geometric changes of the object. As stated in [35], the observation model and the feature extractor play important roles in visual tracking. Therefore, researchers have improved the feature extractor by selecting features [36], [37] and fusing features [38]. Supancic and Ramanan [39] used self-paced learning to select reliable frames from which to extract additional training data as tracking progresses, which is more effective than a strong motion model. Overall, the tracking-by-detection mechanism is helpful for long-term occlusion as well as significant appearance change.
In our proposal, we train an SVM detector based on [21] to deal with long-term occlusion as well as significant appearance changes.

C. JUDGMENT OCCLUSION BASED ON THE PSR
We use the PSR, originally proposed to evaluate compressed radar signals, to assist in detecting occlusion [22], [40]. The PSR is calculated as follows:

$$\mathrm{PSR} = \frac{\max(r) - \mu}{\sigma} \tag{1}$$

where $\max(r)$ represents the maximum value of the main-lobe peak, $\mu$ represents the mean value of the Gaussian response map, and $\sigma$ represents the standard deviation of the Gaussian response values. When a target is occluded by the background, the Gaussian response of the target exhibits multiple peaks, some of which may exceed the target peak, resulting in tracking failure. The difference between the response maps of two consecutive frames is very small, which implies that the temporal information of the video frames can help the tracker locate the target more accurately. The temporal information between video frame sequences can be represented by the sensitivity value of the PSR:
$$S = \frac{\left|\mathrm{PSR}_{t} - \bar{P}_{n}\right|}{\bar{P}_{n}} \tag{2}$$

where $\bar{P}_{n}$ denotes the average PSR of the response maps of $n$ consecutive frames. Therefore, serious occlusion and partial occlusion are separated by the PSRs, and the sensitivity of the temporal-context information is used to address visual tracking problems other than serious occlusion.
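As a concrete illustration, the PSR defined above can be computed from a response map as follows. This is a minimal sketch: the main-lobe window size (an 11×11 region) is our own illustrative choice, and the sidelobe statistics follow common practice rather than the paper's exact settings.

```python
import numpy as np

def psr(response, sidelobe_margin=5):
    """Peak-to-sidelobe ratio of a correlation response map.

    The main lobe is excluded by masking a small window around the
    maximum; mu and sigma are taken over the remaining sidelobe region.
    """
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - sidelobe_margin):py + sidelobe_margin + 1,
         max(0, px - sidelobe_margin):px + sidelobe_margin + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)
```

A sharp, isolated peak yields a high PSR, while a flat or multi-peaked map (as under occlusion) yields a low one.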
In this paper, we combine the judgment occlusion and the re-detection strategy for severe occlusion.

III. THE TRACKING METHOD USING PSR-BASED OCCLUSION JUDGMENT AND SVM RE-DETECTION
In this section, we detail our method in four aspects: A) the $\ell _{1}\ell _{2}$ loss function, B) the occlusion judgment strategy, C) the re-detection strategy, and D) implementation details.

A. $\ell _{1}\ell _{2}$ LOSS FUNCTION
Given the feature map, a correlation tracker aims at learning filter weights that regress to a Gaussian label. A classical correlation filter model solves the ridge regression problem:

$$\min_{w}\ \sum_{i}\left\|f(x_{i})-y_{i}\right\|_{2}^{2}+\lambda\|w\|_{2}^{2} \tag{4}$$

where $f(x_{i})$ is the regression function obtained by training on the mapping $\varphi(x_{i})$ of the feature space, $y_{i} \in \mathbb{R}^{n \times n}$ is the desired Gaussian-shaped response, $\lambda$ is the regularization coefficient, $w$ is the filter template, and $\|\cdot\|$ is the $\ell_{2}$ norm. The goal is to find a function $f(x)=w^{T}\otimes x$, where $\otimes$ is the correlation operation, that minimizes the error between the regression output $f(x)$ and the target $y_{i}$.
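For a linear kernel, the ridge-regression problem above admits a well-known closed-form solution in the Fourier domain (as in MOSSE/KCF-style trackers). The following single-channel sketch is an illustration of that baseline, not the paper's multi-feature implementation:

```python
import numpy as np

def train_filter(x, y, lam=1e-3):
    """Closed-form ridge regression in the Fourier domain (linear kernel).

    x : single-channel feature patch; y : Gaussian-shaped label.
    Returns the filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # element-wise division replaces spatial-domain correlation
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(w_hat, z):
    """Correlate a search patch z with the learned filter."""
    Z = np.fft.fft2(z)
    return np.real(np.fft.ifft2(w_hat * Z))
```

Correlating the training patch with its own filter reproduces (approximately) the Gaussian label, with the peak at the target location.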
When the target appearance changes significantly, the error in some feature dimensions may become very large, which is the reason for the instability of the mean-square-error loss. Therefore, to improve CF-based methods, which are sensitive to local occlusion, and to allow large errors when the appearance changes significantly during filter learning, we replace the loss function of the conventional CF-based method with an $\ell _{1}\ell _{2}$ loss function of appropriate sparsity [41]. Equation (4) is thus converted into the following formula:

$$\min_{w,e}\ \left\|f(X)+e-y\right\|_{2}^{2}+\lambda\|w\|_{2}^{2}+\tau\|e\|_{1} \tag{5}$$

where $\lambda$ and $\tau$ are weight parameters. Equation (5) can be split into two subproblems, both of which have globally optimal solutions; therefore, (5) can be solved by alternately optimizing the two subproblems until the objective function values converge. Here $e$ is the difference between the regression values and the expected response, obtained via the shrinkage operator:

$$e=\xi_{\tau/2}\left(y-f(X)\right) \tag{6}$$

When we alternately optimize the two subproblems, we use the dual space to minimize $\|f(X)+e-y\|_{2}^{2}+\lambda\|w\|_{2}^{2}$ with respect to $w$. Denoting the dual conjugate of $w$ as $\alpha$, the problem with respect to $\alpha$ has the closed-form solution

$$\hat{\alpha}=\frac{\hat{y}-\hat{e}}{\hat{k}^{1}+\lambda} \tag{7}$$

where $\hat{\cdot}$ denotes the Fourier transform (and $\mathcal{F}^{-1}$ the inverse Fourier transform), $\hat{y}$ is the expected response in the Fourier domain, the division is element-wise (Hadamard), $k^{1}$ denotes the first row of the kernel matrix $K$, and $\xi$ is the shrinkage operator, defined as:

$$\xi_{\kappa}(x)=\operatorname{sign}(x)\max\left(|x|-\kappa,\,0\right) \tag{8}$$
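The alternating optimization of Eq. (5) can be sketched as follows under simplifying assumptions of our own (linear kernel, single feature channel): the $\alpha$-step solves the ridge problem against the shifted label $y - e$ in the Fourier domain, and the $e$-step applies the shrinkage operator $\xi$ to the residual.

```python
import numpy as np

def shrink(x, kappa):
    """Shrinkage operator: sign(x) * max(|x| - kappa, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def train_l1l2(x, y, lam=1e-3, tau=0.05, n_iter=5):
    """Alternating optimization of the l1-l2 loss, a single-channel sketch."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    K = np.conj(X) * X                    # linear-kernel spectrum
    e = np.zeros_like(y)
    alpha_hat = None
    for _ in range(n_iter):
        E = np.fft.fft2(e)
        alpha_hat = (Y - E) / (K + lam)   # dual closed form against y - e
        f = np.real(np.fft.ifft2(alpha_hat * K))  # regression output
        e = shrink(y - f, tau / 2.0)      # sparse error absorbs outliers
    return alpha_hat, e
```

The sparse variable `e` absorbs the few large residuals caused by local occlusion, so they no longer dominate the quadratic term.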

B. OCCLUSION JUDGMENT STRATEGY
In this paper, the PSR of the fused response map is used to judge the occlusion or out-of-view condition of each frame; the frame is then processed according to the judgment result. When the target is severely occluded, the target appearance is mostly contaminated, and the PSR of the response map obtained by correlating with the target appearance model is consistently low. Over time, the target in Fig.1 (c) becomes seriously occluded by the environment, as reflected by the PSR = 5.957 in the 109th frame. At this point, peaks around the target become prominent, even exceeding the target peak, which causes the tracker's prediction to contain a position error and ultimately leads to tracking failure. Therefore, the PSR correctly reflects the state of the target and is an effective indicator for judging whether the target is occluded.
Because the PSR varies over a large range, in order to find a better occlusion judgment criterion, this paper normalizes the PSR against that of the Gaussian expected response of the first frame of the video, used as a reference value:

$$\eta=\frac{\mathrm{PSR}_{t}}{\mathrm{PSR}_{ref}}, \qquad flag=\begin{cases}0, & \eta \geq \theta \\ 1, & \eta < \theta\end{cases}$$

where $\eta$ is the similarity coefficient of the current frame's PSR and the reference PSR, and $\theta$ is a preset threshold. When $flag = 0$, the tracked target is considered not occluded or only partially occluded; when $flag = 1$, the target is severely occluded. At this point, it is necessary to judge whether the target reappears and whether the model needs to be updated by the re-detection method.
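The judgment above reduces to a ratio test; a minimal sketch, in which the threshold value is an illustrative assumption rather than the paper's tuned setting:

```python
def occlusion_flag(psr_t, psr_ref, theta=0.4):
    """Occlusion judgment via normalized PSR.

    eta is the ratio of the current frame's PSR to the reference PSR of
    the first frame; theta is an assumed threshold. Returns 1 for severe
    occlusion, 0 otherwise.
    """
    eta = psr_t / psr_ref
    return 1 if eta < theta else 0
```

For example, a frame whose PSR is close to the reference (no occlusion) yields 0, while a sharply reduced PSR yields 1 and triggers re-detection.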

C. RE-DETECTION STRATEGY
FIGURE 1. In (a), the target is not occluded, and the PSR stays in a range of larger values, e.g., PSR = 13.98 in the 77th frame. In (b), the target is partially occluded by the background, and PSR = 9.859 in frame 106; the peak of the response map begins to weaken and a multi-peak situation appears, but the target peak has not yet been exceeded. Over time, in (c), the target is seriously occluded by the background, and PSR = 5.957 in the 109th frame. In (d), the curve shows the variation of the PSR over the girl2 sequence.

When the target is severely or even fully occluded, the appearance information of the target is difficult to obtain. In this situation, it is not very reliable to use the tracker to predict the position of the target. When the target appears again, the re-detection strategy can help the tracker detect the position where the target reappears; that is, the object detection algorithm can determine whether the target is occluded and can also obtain candidate target positions. The SVM is essentially a classifier that distinguishes between the target and the environment. Therefore, this paper uses an SVM classifier to judge whether the target has reappeared in view and to determine the candidate position of the target, which is very helpful for subsequent tracking after occlusion.

D. IMPLEMENTATION DETAILS
The algorithm mainly builds on the LCT algorithm [18] and the Struck algorithm [21]. Here, $(x_{0},y_{0})$ is the initial target position; $target_{sz}$ denotes the target scale; $X_{t}$ is the target state, which includes the position $(\hat{x}_{t},\hat{y}_{t})$ and scale $\hat{s}_{t}$ at time $t$; $CF_{i}$ is the appearance model; $CF_{s}$ is the scale model; $\otimes$ is the correlation operator; $*$ is the multiplication operation; and $HSvm$ denotes the SVM classifier. The whole algorithm consists of three parts: the position-prediction correlation filter model, the scale correlation filter model, and the SVM classifier. The correlation filter model for position prediction uses different features for training and updating. The HOG features and CN features of the target, denoted $feats$, are first extracted and used to train two correlation filters; we then obtain two response maps using the different filters $CF_{i}\ (i=1,2)$ and fuse them as follows:

$$r_{t}=\sum_{i=1}^{2}m_{i}\,r_{i} \tag{9}$$

where $r_{t}$ denotes the final fused response map, and $r_{i}$ are the response maps obtained by correlating the two features with the corresponding filters:

$$r_{i}=\mathcal{F}^{-1}\left(\widehat{feat}_{i}\odot \widehat{CF}_{i}\right) \tag{10}$$

where $feat_{i}$ denotes the $i$th feature and $CF_{i}$ the corresponding model. The weight $m_{i}$ of $r_{i}$ is calculated as follows:

$$m_{i}=\frac{d_{i}^{-1}}{\sum_{j}d_{j}^{-1}} \tag{11}$$

To obtain $m_{i}$, we calculate $d_{i}$ in (12), the difference between the expected response and the real response of the $i$th filter:

$$d_{i}=\left\|y-\oplus(r_{i})\right\|_{2} \tag{12}$$

where $\oplus$ represents the shift operator that translates the peak of $r_{i}$ to the center of the response map. The weights calculated by the sensitivity method of the PSR are used to adaptively update the template. To obtain a more robust model, we use an adaptive learning rate $\gamma_{t}$ for each frame:

$$CF_{i}^{t}=(1-\gamma_{t})\,CF_{i}^{t-1}+\gamma_{t}\,CF_{i}^{new} \tag{13}$$

where $\gamma_{t}$ is reduced when the PSR sensitivity $S$ indicates an unreliable frame. In this way, we obtain a robust model and, subsequently, accurate position predictions over the sequence. The scale correlation filter is analogous to the position correlation model, but its training uses only the HOG features of the target.
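The peak-centered fusion weighting can be sketched as follows; the inverse-distance normalization of the weights is our assumption, chosen to be consistent with the text (weights according to the difference from the expected response), not the paper's exact formula:

```python
import numpy as np

def fuse_responses(responses, y):
    """Fuse per-feature response maps into one map, a sketch of Eq. (9).

    For each response map, the peak is translated to the center (the
    shift operator in Eq. (12)), d_i is the distance to the expected
    response y, and weights are inversely proportional to d_i.
    """
    ds = []
    for r in responses:
        py, px = np.unravel_index(r.argmax(), r.shape)
        cy, cx = np.array(r.shape) // 2
        centered = np.roll(np.roll(r, cy - py, axis=0), cx - px, axis=1)
        ds.append(np.linalg.norm(y - centered))
    inv = np.array([1.0 / (d + 1e-12) for d in ds])
    m = inv / inv.sum()
    return sum(w * r for w, r in zip(m, responses)), m
```

A response map that matches the expected Gaussian closely receives a larger weight than a weak or distorted one.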
The scale pyramid has 33 layers and is used to predict the scale of the target. The SVM detector is specifically designed for the case in which the target is severely occluded. We minimize the following convex objective to learn the re-detector:

$$\min_{w}\ \frac{1}{2}\|w\|^{2}+C\sum_{i}\xi_{i} \quad \text{s.t.}\ \left\langle w,\,\delta\Phi_{i}(t)\right\rangle \geq \Delta(t_{i},t)-\xi_{i},\ \forall i,\ \forall t\neq t_{i}$$

where $\Phi$ is the kernel map; $\delta\Phi_{i}(t)=\Phi(feats_{i},t_{i})-\Phi(feats_{i},t)$; $(feats_{i},t_{i})$ specifies the correct transformation of the object; $w$ is the weight vector learned by the SVM; $C$ is the coefficient of $\xi_{i}$ (following Struck [21], we set $C=100$); and $\xi_{i}$ denotes the slack variables of the SVM. In effect, $HSvm$ learns a map $f: \mathcal{F}\times \mathcal{T}\rightarrow \mathbb{R}$, where $\mathcal{F}$ is the feature space and $\mathcal{T}$ is the transformation space. In our $HSvm$ re-detector, the $feats_{i}$ are features sampled from reliable locations for re-detection. The re-detection procedure is as follows. First, occlusion is judged by the PSR of the target's response map in the current frame. Then, we obtain candidate target positions through the SVM detector. Finally, the response value of the optimal position detected by the SVM is compared with the response value of the position predicted by the correlation filter, which determines whether the target is out of occlusion; if so, we locate the target at that position. The model obtained through iterative online incremental training therefore has better robustness.
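The re-detection decision described above can be sketched as plain control flow; the tuple conventions for the (hypothetical) SVM candidate list are our own:

```python
def redetect(flag, cf_pos, cf_response, svm_candidates):
    """Re-detection decision, a sketch of the procedure in the text.

    svm_candidates: list of (response_value, (x, y)) pairs from an
    assumed SVM detector. If the best SVM response beats the CF
    response, the target is judged to have reappeared at the SVM
    location; otherwise the previous CF position is kept and the
    appearance model is not updated.
    Returns (position, update_model).
    """
    if not flag:                        # no severe occlusion judged
        return cf_pos, True
    r_max, pos_max = max(svm_candidates, key=lambda c: c[0])
    if r_max > cf_response:             # target is out of occlusion
        return pos_max, True
    return cf_pos, False                # stay put, freeze the model
```

This mirrors the comparison between the SVM's best response and the correlation filter's predicted response.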

Algorithm 1: The Proposed Algorithm

Input: $(x_{0},y_{0})$, $target_{sz}$.
Output: $X_{t}=(x_{t},y_{t},s_{t})$, $CF_{i}$, $CF_{s}$, $HSvm$.
Repeat:
1. Extract the $feats$ (HOG, CN) from $(x_{t-1},y_{t-1})$ in the last frame.
2. Yield the response maps; calculate the sensitivity $S$ of the PSR by (2), and the final position $(x_{t},y_{t})$ and final response value $r_{t}$ by (9).
3. Obtain the target size: $\hat{s}_{t}=feats_{x_{t},y_{t}}\otimes CF_{s}$, $s_{t}=target_{sz}*\hat{s}_{t}$.
4. if $flag$ then:
5.&nbsp;&nbsp;&nbsp;&nbsp;$r_{i},(x_{i},y_{i})=HSvm(feats)$;
6.&nbsp;&nbsp;&nbsp;&nbsp;$r_{max},(x_{max},y_{max})=\max\left(r_{i},(x_{i},y_{i})\right)$;
7.&nbsp;&nbsp;&nbsp;&nbsp;if $r_{max}>r_{t}$ then:
8.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Update $CF_{i}$;
9.&nbsp;&nbsp;&nbsp;&nbsp;else:
10.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$(x_{t},y_{t})=(x_{t-1},y_{t-1})$;
11.&nbsp;&nbsp;&nbsp;end
12. end
13. Update $CF_{s}$ and $HSvm$;
Until the end of the video frame sequence.
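The overall control flow of Algorithm 1 can be sketched as a loop over frames. Every callable name here ('extract', 'cf', 'cf_s', 'svm', 'update') is a stand-in of our own for the corresponding component, not an interface from the paper:

```python
def track_sequence(frames, init_pos, init_sz, models):
    """High-level skeleton of the proposed tracking loop (illustrative).

    models: assumed dict of callables -- 'extract' (HOG+CN features),
    'cf' (fused position filters, returning peak response, position,
    and occlusion flag), 'cf_s' (scale filter), 'svm' (re-detector),
    'update' (adaptive model update).
    """
    pos, sz = init_pos, init_sz
    for frame in frames:
        feats = models['extract'](frame, pos, sz)
        r_cf, pos, flag = models['cf'](feats)   # fusion by (9), flag via PSR
        sz = models['cf_s'](frame, pos)         # 33-layer scale pyramid
        if flag:                                # severe occlusion judged
            r_max, cand = models['svm'](frame)
            if r_max > r_cf:                    # target has reappeared
                pos = cand
        models['update'](feats, flag)
        yield pos, sz
```

The generator yields the predicted position and scale per frame, mirroring the Repeat/Until structure of Algorithm 1.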

IV. EXPERIMENT RESULTS AND ANALYSIS
To demonstrate the robustness and real-time performance of the proposed algorithm under severe occlusion, we select six sets of video frame sequences with severe occlusion for experimental verification. The results are compared with those of two long-term tracking algorithms (TLD: tracking-learning-detection, and LCT: long-term correlation tracking) and the Struck (structured output tracking with kernels) algorithm. The six test sequences are coke, tiger2, girl, basketball, lemming, and liquor, with resolutions of 640 × 480, 640 × 480, 128 × 96, 576 × 432, 640 × 480, and 640 × 480 pixels, respectively, and 4,958 frames in total. The proposed algorithm is implemented in MATLAB on a Windows 10 x64 computer with a 3.60 GHz i7 processor and 8 GB of memory.

Experiment 1:
This experiment provides the results of the different algorithms when the moving target is occluded. To verify tracking performance under occlusion, the standard test sequence ''coke'' is used. The results of each algorithm are shown in Fig.2. The TLD, Struck, and LCT algorithms and ours all correctly predict the position of the target in Fig.2 (a), in which the target is not occluded. In Fig.2 (b), the moving target is severely occluded by the leaves. The TLD algorithm experiences a tracking failure: its detection module scans the entire image via a cascade classifier consisting of a variance classifier, a random fern classifier, and a KNN classifier, but since the target is mostly occluded by the environment and the detection takes a long time, the confidence of the target position is very low. The Struck algorithm uses the prediction function learned by the online structured-output SVM to predict the change of the target position and avoids the intermediate classification phase. The LCT algorithm sets a fixed threshold on the response value of the target appearance model to determine whether the target is occluded and whether the appearance model should be updated. Our algorithm uses the PSR of the current frame's response map to judge whether the target is severely occluded and requires re-detection; the model judges whether the target is partially occluded prior to the adaptive update according to the peak value of the current frame. In Fig.2 (c), both the TLD and LCT algorithms drift, while the Struck algorithm and ours can locate the target well.
Experiment 2: The target with deformation is occluded. To verify the performance of the proposed algorithm when the moving target is deformed, the sequence ''basketball'' is used. Fig.3 (a) shows the state of the target when it is not occluded by the environment; the TLD, Struck, LCT, and proposed algorithms all correctly locate the target. In the 19th frame, the target is deformed due to a pose change and, in addition, is severely occluded by another player's body. The TLD algorithm uses the pyramid optical flow algorithm to predict the target's motion direction; due to interference from the occluding object's motion, when the target reappears in Fig.3 (c), the tracker treats the wrong moving object as the tracking target, resulting in drift. The Struck algorithm uses the prediction function to predict the change of the target position between the current frame and the previous frame; since the target is severely occluded, when it reappears in Fig.3 (c), the accumulated position error prevents the tracker from adapting to subsequent tracking, eventually resulting in failure. The LCT algorithm utilizes the spatial-temporal context to track the target. Therefore, when the 42nd frame in Fig.3 (c) is reached, both the LCT algorithm and ours can accurately track the target.

Experiment 3: The stationary target is occluded. To verify the performance of the proposed algorithm when a stationary target is occluded, the test sequence ''liquor'' is used.

FIGURE 4. Contrast test when the stationary target is occluded. In (a), the 720th frame shows that the target is not occluded by the environment. In (b), the target to be tracked is completely occluded by the background, eventually losing the features associated with the tracker and causing the tracker to fail to find the target. The target is out of occlusion in (c).

The results of each algorithm are shown in Fig.4. Fig.4 (a) shows that the target is not occluded by the environment; the TLD, Struck, LCT, and our proposal all correctly track the target. In the 728th frame in (b), the target to be tracked is completely occluded by the background, eventually losing the features associated with the tracker and causing the tracker to fail to find the target. The TLD and Struck algorithms have not yet determined that the target is occluded, and so the occluding object is mistakenly regarded as the tracking target. Conversely, the LCT and proposed algorithms successfully determine that the tracking target is occluded. In the 733rd frame (the target is out of occlusion), the TLD and Struck algorithms continue to erroneously track the wrong object, causing severe drift and eventually a tracking failure. The LCT algorithm and ours can accurately track the target again.

B. COMPARATIVE ANALYSIS OF EXPERIMENTAL RESULTS
To better illustrate the superiority of the proposed algorithm, we compare it with related algorithms. The proposed algorithm, TLD, Struck, and LCT are tested on the 6 video frame sequences, and we then compare the results of the different algorithms on each sequence. We do not modify the parameters of the existing algorithms. The overlap precision (OP) and the distance precision (DP) of the four algorithms are shown in Table 1.
We follow the common tracking evaluation criteria.

1) Success rate: First, the overlap score is calculated from each frame's predicted area ($ROI_{t}$) and manually labeled area ($ROI_{gt}$):

$$score=\frac{\left|ROI_{t}\cap ROI_{gt}\right|}{\left|ROI_{t}\cup ROI_{gt}\right|} \tag{17}$$

Then, after setting different overlap thresholds, the success rate under each threshold is statistically calculated. Finally, the overall success rate is obtained as the area under the curve (AUC).

2) Accuracy: First, calculate the distance between the predicted center position and the ground-truth position, i.e., the center location error (CLE). Then, calculate the accuracy under different error thresholds. Finally, the precision at a position error of no more than 20 pixels is taken as the final precision of the precision plot.

TABLE 1. Success rate and accuracy. The first column is the video frame sequence and the total number of frames. The following columns are the success rate and accuracy of the different algorithms on the different sequences, where red represents the best result and blue the second-best result. ''-'' indicates that the algorithm's result on that sequence is too poor, and so the data are ignored during the statistical analysis to avoid adverse effects on the overall results.

Table 1 shows that the overall tracking performance of the TLD and Struck algorithms is not good, while the proposed algorithm's performance is the best, followed by LCT. Among the 6 sequences, the TLD algorithm performs well only on the girl sequence, because the other 5 sequences are also affected by factors such as deformation, illumination, and scale. The TLD algorithm uses the pyramid optical flow method, which cannot conduct accurate visual tracking in complex scenes, so its robustness is limited. The Struck algorithm uses only Haar features to compute the integral map and cannot stably predict the target position under the influence of various factors, especially when the target is occluded; Struck still takes the prediction with the highest probability as the correct location, and the accumulation of wrong tracking information eventually leads to errors. The LCT algorithm judges target occlusion through the response value of the target appearance model, which can avoid wrong model updates to some extent, so it can stably track the target over a long time. However, since LCT trains using only the gray feature of the target and its illumination-invariant gray feature, the tracking performance is not ideal when the target pose changes or similar objects interfere (as in the tiger2 and liquor sequences).
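The two criteria above can be computed as follows; a minimal sketch with axis-aligned boxes given as (x, y, w, h):

```python
import numpy as np

def overlap_score(roi_a, roi_b):
    """Intersection-over-union of two boxes, as in Eq. (17)."""
    ax, ay, aw, ah = roi_a
    bx, by, bw, bh = roi_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def distance_precision(pred_centers, gt_centers, threshold=20):
    """Fraction of frames whose center location error is <= threshold px."""
    errs = np.linalg.norm(
        np.asarray(pred_centers, dtype=float) - np.asarray(gt_centers, dtype=float),
        axis=1)
    return float(np.mean(errs <= threshold))
```

Sweeping the overlap threshold yields the success plot (whose AUC is the reported success rate), and sweeping the pixel threshold yields the precision plot.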
Our approach uses the idea of LCT, trains with multiple fused features, and uses the SVM re-detection strategy to track the target in complex scenes with occlusion. Compared with LCT, the comprehensive performance is improved, which is mainly due to the selection of the target features and the PSR-based occlusion judgment strategy in the model training process.

TABLE 2. Center location error and FPS. The first column represents the different video frame sequences, the other columns represent the CLE and the FPS of the different algorithms, and the last row represents the average CLE and average FPS. The best result in each row is marked in red, and the second-best result is marked in blue.

Table 2 compares the CLE and the FPS (frames per second) of the four algorithms. (The real-time performance is computed from the total time consumed by the tracking algorithm and the total number of frames of the video sequences; the larger the FPS, the higher the real-time performance.) The proposed algorithm has the smallest center location error on the ''coke'', ''tiger2'', ''girl'', ''basketball'', and ''lemming'' sequences, and is worse than the LCT algorithm on the ''liquor'' sequence; however, its average center location error is the smallest at 11.91 pixels. On ''tiger2'', our FPS is the largest at 13 FPS. On ''coke'' and ''basketball'', the FPS of the LCT algorithm is the largest, followed by the proposed algorithm. On ''girl'', ''lemming'', and ''liquor'', the FPS of TLD is the largest, with the proposed algorithm second. LCT has the largest average frame rate at 19.84 FPS, and our algorithm is second at 16.68 FPS. Therefore, the comprehensive performance of the proposed algorithm is the best.
To better illustrate the tracking performance of the proposed algorithm in various complex scenarios, we quantitatively analyze the selected 34 video frame sequences. The experimental results are shown in Fig.5, which shows that, compared with the other tracking algorithms, the success rate and precision of the proposed algorithm are the highest at 0.625 and 0.764, respectively; therefore, the robustness of the proposed algorithm is the best.

C. STATE-OF-THE-ART COMPARISON ON OTB100
To be more persuasive, we compare our algorithm with other state-of-the-art methods related to this paper on all sequences of OTB100, which contains 100 test sequences covering 11 challenges: illumination variation, scale variation, occlusion, deformation, motion blur, fast motion, in-plane rotation, out-of-plane rotation, out-of-view, background clutter, and low resolution. We follow the two criteria, success rate and precision, described in detail in Section IV.B. We compare our algorithm with 7 state-of-the-art tracking methods relevant to our proposal: LMCF [42], SRDCF [25], Staple [7], STAPLE_CA [43], Struck [21], TLD [33], and LCT [18]. In general, the precision and success rate of our algorithm are 1.8% and 0.7% higher than those of the second-best algorithms on the two indices. Compared with Struck, our algorithm achieves improvements of 17% and 16.1% in precision and success rate, respectively. Compared with LCT, our algorithm obtains improvements of 2.2% and 0.7% in precision and success rate, respectively. As shown in Fig.6, our method is superior to the compared algorithms in both precision and success rate on all sequences of OTB100, which demonstrates the highly competitive performance of our proposal.

V. CONCLUSION
In this paper, we propose an occlusion judgment tracker based on the CF framework. To overcome the instability of algorithms based on a single feature in complex scenes, we extract HOG and CN features to train the model and track the target; therefore, our algorithm has advantages before and after occlusion that traditional methods lack. Furthermore, we introduce the $\ell _{1}\ell _{2}$ loss function to reduce the sensitivity of CF-based methods to local occlusion. Next, we propose an adaptive online model update strategy based on the sensitivity value $S$ of the PSR to obtain a robust appearance model. To handle target out-of-view, the PSR is used to determine whether the target is severely occluded, and disocclusion is then detected by the SVM. According to the experimental results, compared with existing related algorithms, our algorithm has clear advantages and robust performance. However, compared with deep learning-based tracking algorithms, the accuracy of our algorithm still needs to be improved, mainly because traditional features are not comprehensive enough for training and learning the model.