A Novel Incremental Multi-Template Update Strategy for Robust Object Tracking

In the field of correlation-filter object tracking, the traditional template-update method easily causes template drift and therefore performs poorly in complex scenes. To enhance the robustness of the template, a novel incremental multi-template update strategy is proposed in this paper. We find that reliability varies among the historical filters and that highly reliable filters are key to achieving accurate tracking. The incremental multi-template update strategy incrementally combines the local maximum-reliability filter template with the historical filter template, which distinguishes it from the traditional update method. We apply this strategy to two high-performing trackers. Experimental results on three benchmarks, the VOT2016, OTB100 and UAV123 datasets, show that our trackers outperform the compared state-of-the-art trackers.


I. INTRODUCTION
Object tracking is an important area in the field of computer vision, and it is also one of the most difficult areas to study. Object tracking refers to selecting an object in an image or video sequence and then determining the position, movement trajectory, and morphology of the object in subsequent frames according to the characteristics of the object [1]. Object tracking has been widely used in various fields, such as intelligent security, autonomous driving, smart homes, and robot navigation [2]-[6]. With the advent of 5G, the range of object-tracking applications is expected to become even broader.
In recent years, researchers have proposed many methods for tracking objects. These tracking methods can be divided into two types: generative methods and discriminative methods [7]. A generative method describes the object to be tracked through an object-representation method from computer vision and then extracts the corresponding object features from the current frame containing the object to establish an object template [8], [9]. The method then searches subsequent frames for the area most similar to the object template and iterates to locate and track the object in those frames [10]-[17]. A discriminative method applies both the object template and background information to the tracking system. The method treats the feature information of the object as positive samples and the background information as negative samples, and a classifier is trained on them with machine-learning methods. The discriminative method then uses the trained classifier to make classification judgments in subsequent frames and determines the optimal object area [18]-[22]. The discriminative method is more accurate than the generative method because it takes both the background and foreground into account [21], [23]-[25]. Bolme et al. introduced correlation filtering into the discriminative method, which greatly improved its accuracy [26]. The Kernelized Correlation Filter (KCF) tracker, proposed by Henriques et al., enriched the training samples through cyclic shifts represented by a circulant matrix and improved the real-time performance of the tracking algorithm by using kernel methods for training and detection in the frequency domain [27]. Danelljan et al. proposed efficient convolution operators for tracking (ECO) [28], which greatly improved the robustness and accuracy of object tracking and effectively reduced the amount of computation in the tracking process through a factorized convolution operator, a compact generative model of the sample space, and sparse updating of the template.

The associate editor coordinating the review of this manuscript and approving it for publication was Jeon Gwanggil.
Template updating is critical to the tracking performance of correlation-filter trackers, and numerous scholars have studied how to update templates [29]-[35]. Niu et al. proposed that the template be updated regularly as long as no occlusion occurs [36]. However, it is difficult to judge occlusion accurately, so this update method can easily omit information. Wang et al. proposed that the CF template be updated only when the HOG and color scores are both above the average, and that the histograms of the foreground and background as well as the sample templates used for redetection be updated as well [37]. However, this method performs poorly when the target and background information are similar. Yuan et al. proposed an adaptive structural convolutional filter model (ASCT) to enhance the robustness of deep regression trackers [4]. ASCT can effectively improve the robustness of the tracker through adaptive weighted fusion of local filters. Feng et al. proposed using 3 classifiers to simultaneously track the target and updating the models of the different classifiers according to different environments [38]. However, this update method greatly increases the computational cost.
The importance of different historical templates has not been considered in previous DCF algorithms [39]-[41]. However, the reliability of object tracking differs at different times in the tracking process, so we believe that the templates generated during tracking differ in reliability, and that templates of different reliability affect object tracking differently. To improve the robustness of the correlation-filter tracking algorithm, a multi-template adaptive updating strategy is proposed. In this strategy, we treat historical templates differently according to their reliability. The contributions of this strategy mainly include the following three points:
1) The reliabilities of historical filters are analyzed. We make an important assumption: each historical filter has a different reliability. If this assumption is true, then the reliability of the filter will certainly be reduced when occlusion occurs. Through analysis, we find that high-reliability filter templates are the key to achieving accurate tracking.
2) We propose a novel multi-template update strategy. In contrast to the traditional updating methods, this strategy merges the local maximum-reliability filter into the tracking-filter template and can greatly enhance the robustness of the template. Furthermore, the tracking performance is improved although only the memory for one additional filter is added to the algorithm.
3) An adaptive learning rate, which is set by the degree of occlusion of the object, is used to update the tracking-filter template. The degree of occlusion of the object can be calculated from the reliability difference between nearby frames. The adaptive learning rate can overcome template drift to some extent in complex scenes, such as occlusion.

II. RELATED WORK
In this section, we review previous methods for improving the template in correlation-filter target tracking.
The quality of the template determines the performance of the tracker. Previous modifications of the tracker template mainly involved template updating and feature fusion. Many trackers have greatly improved the tracking performance by improving the template of a classic tracker.
First, we introduce related work on template construction. MKCF [42] proposed an approach to fusing different types of features by means of multiple-kernel learning. MKCF makes full use of the discriminative invariance of the power spectra of various features through multiple-kernel learning, thereby effectively improving the performance of the tracker. MKCFup [43] [44]. MFFT uses an adaptive weighting function based on the response maps of different features to suppress the noise of the individual features. This method makes full use of the positioning information in different features and effectively improves the robustness of tracking and hence the performance of the tracker. Yuan et al. proposed a target-focused convolutional regression model for visual object-tracking tasks (TFCR) [6]. TFCR uses a target-focus loss function to effectively reduce the influence of background noise on the response map of the correlation filter and improves the tracking accuracy. The MvCFT [45] tracker performs tracking by fusing multiple view models and selecting highly distinguishable functions. MvCFT effectively improves the robustness of the tracker by making full use of the characteristics of different features.
We now introduce related work on template updating. Bibi et al. proposed Multi-Template Scale-Adaptive Kernelized Correlation Filters (MTSA) [46]. MTSA shows that it is possible to incorporate multiple multi-dimensional templates to compute the optimal filter taps. By reformulating the kernel-correlation problem and using fixed-point optimization, MTSA demonstrated that using multiple templates improves the performance of a baseline tracker that uses only one template and a heuristic update scheme. Wang et al. proposed a multi-cue correlation filter tracker (MCCT) [47]. During the tracking process, MCCT selects the most reliable of many experts as the tracking result and updates the template adaptively according to the robustness of the tracking.
Although previous template-transformation methods have effectively improved the performance of the tracker, they ignore the differing importance of historical templates. These trackers generate a filter template for each frame during the tracking process, and the reliabilities of these filter templates differ. If these templates are simply integrated linearly into the tracker template, the tracker template is easily contaminated. Therefore, we believe that timely fusion of the filter template of a highly reliable historical frame into the current tracking template can enhance the robustness of the tracker when the template is updated in a complex environment.

III. BASELINE APPROACH: ECO
In this work, we adopt the recently introduced Efficient Convolution Operators for Tracking (ECO) as our baseline. The ECO algorithm is an excellent representative of correlation-filter tracking algorithms. The performance of the ECO algorithm is comparable to that of many excellent recent deep-learning tracking algorithms.
The ECO algorithm analyzes previous correlation-filter tracking algorithms, identifies three factors that affect object-tracking performance, and proposes a solution for each. These three factors are the template size, the training-set size, and the template update. The corresponding solutions of ECO are as follows:
(1) ECO uses a factorized convolution operator to reduce dimensionality by extracting a subset of the features. The ECO algorithm decomposes the filters of D channels into the product of a D × C matrix P and the filters of C channels, where C < D. In this way, the score computed with the filter f of the corresponding layer, given in Equation (1), simplifies to Equation (2), where J{x} is the feature map converted into the continuous domain by interpolation. The matrix P can be learned in the first frame, and the resulting objective function is given in Equation (3). Through the factorized convolution operator, ECO transforms the original D-dimensional filter into a C-dimensional filter, which greatly reduces the computational complexity.
(2) ECO uses a generative model of the sample space to merge similar samples into components. ECO training samples are then selected from multiple components. The ECO algorithm groups the samples with a Gaussian mixture model (GMM) to obtain different components, each of which has a weight and a feature mean. When a new sample $x_j$ arrives, a new component m is initialized with $\pi_m = \gamma$ and $\mu_m = x_j$, where $\gamma$ is the learning rate. If the number of components exceeds the limit L, then a component whose weight $\pi_l$ is below the threshold is discarded; otherwise, the two closest components k and l are merged into a common component n. The merge is defined in Equations (4) and (5), and the objective function then simplifies to Equation (6). ECO uses this method to reduce the training set while preserving sample diversity.
(3) ECO sets the update interval ($N_s$) of the template to 6 frames. The update rule is given in Equation (7), where $F_t$ is the template of the current frame, $F_{t-1}$ is the template of the previous frame, and $\mu_c$ is the learning rate of the current frame.
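The three ECO ingredients described above can be illustrated with a short sketch. It is not the ECO implementation: the function names are hypothetical, the merge rule assumes that weights add and means are weight-averaged, and the template update assumes the usual linear interpolation between the previous template and the current filter (here called `FC_t`), applied on every `Ns`-th frame.

```python
import numpy as np


def project_features(x, P):
    """Project a D-channel feature map x of shape (H, W, D) to C channels
    using a D x C matrix P (the factorization idea, with C < D)."""
    return x @ P


def merge_components(pi_k, mu_k, pi_l, mu_l):
    """Merge two mixture components k and l into one component n.
    Assumption: the weights add and the means are weight-averaged."""
    pi_n = pi_k + pi_l
    mu_n = (pi_k * mu_k + pi_l * mu_l) / pi_n
    return pi_n, mu_n


def update_template(F_prev, FC_t, mu_c, frame_idx, Ns=6):
    """Sparse template update. Assumption: linear interpolation with
    learning rate mu_c, applied only when frame_idx is a multiple of Ns."""
    if frame_idx % Ns != 0:
        return F_prev  # skip the update on intermediate frames
    return (1.0 - mu_c) * F_prev + mu_c * FC_t
```

For example, `update_template(F_prev, FC_t, 0.012, frame_idx=7)` returns `F_prev` unchanged under these assumptions, because frame 7 falls between two update frames.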

IV. OUR STRATEGY
In this paper, "multi-template" refers to the filter template of the current frame, the tracking template of the previous frame, and the filter template with the highest local historical confidence. Among these, the filter template with the highest local historical confidence is the incremental template. We believe that the more reliable the tracking template, the higher the tracking accuracy. Moreover, the reliability of filter templates close to the current frame t has a more significant impact on the tracking accuracy. In this strategy, the location of the object is determined from the response map of the current frame t, and the tracking reliability of frame t is calculated from the same response map. Then, the filter template $F_{\mathrm{Max}}$ of the most reliable frame is searched for in the range from frame t to frame t-k-1; that is, $F_{\mathrm{Max}}$ is the local maximum-reliability template in this range. Concurrently, the possibility of occlusion in the current frame is judged according to the reliability. In the subsequent template fusion, $F_{\mathrm{Max}}$ is fused into the tracking template according to its learning rate $\mu_{\mathrm{Max}}$, which is adjusted according to the occlusion condition. The baseline algorithms for this strategy are ECO and ECO-hc. Figure 1 shows the structure of the proposed tracking algorithm.

A. RELIABILITY ANALYSIS OF FILTER TEMPLATES
The formula used to calculate the response map in the discriminative correlation filter (DCF) is given in Equation (8), where $\hat{w}$ is the Fourier transform of $w$, and $w$ is the feature extracted from the search area of the t-th frame. $F_{t-1}$ is the tracking template of frame t-1, which is obtained by fusing the filter templates of the historical frames. The tracked object position is determined by the location of the maximum value of the response map G.
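As an illustration of how a response map is evaluated and the object located, the sketch below correlates a multi-channel feature map with a template in the Fourier domain and takes the position of the maximum. It is a generic DCF-style sketch under stated assumptions, not the exact form of Equation (8): the conjugation convention, the array shapes, and the function names `response_map` and `locate_peak` are chosen here for illustration.

```python
import numpy as np


def response_map(features, template):
    """Correlate features (H, W, D) with a template (H, W, D) in the
    Fourier domain and return the real-valued spatial response (H, W)."""
    X = np.fft.fft2(features, axes=(0, 1))
    F = np.fft.fft2(template, axes=(0, 1))
    # Sum the per-channel correlations, then go back to the spatial domain.
    return np.real(np.fft.ifft2(np.sum(X * np.conj(F), axis=2)))


def locate_peak(response):
    """Return the (row, col) index of the maximum response."""
    return np.unravel_index(np.argmax(response), response.shape)
```

With this convention, a feature map that is a circularly shifted copy of the template produces a peak at the shift, which is how the displacement of the object between frames is read off.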
In previous correlation-filter algorithms, the different historical filter templates $FC_1, \cdots, FC_{t-2}$ and $FC_{t-1}$ were not treated differently. In fact, the relative importance of different historical filter templates varies: the more reliable a filter template is, the more important it is. For example, when the object is affected by occlusion, the current filter template contributes less to object tracking, and the reliability of the filter is generally low. Therefore, the weighted peak-to-sidelobe ratio (WPSR) is used to measure the reliability of the filter template. The value range of WPSR is [0, 1]. The formula used to calculate WPSR is given in Equation (9), where $R_t$ is the response map of the t-th frame, $\max(R_t)$ is its maximum value, and $R'_t$ is the response map after removing the 11 × 11 pixel range around the center position. $\mathrm{mean}(R'_t)$ is the average of that response, and $\mathrm{Std}(R)$ is the standard deviation of the frame response. The ratio $\frac{\max(R_t) - \mathrm{mean}(R'_t)}{\mathrm{Std}(R)}$ is the traditional peak-to-sidelobe ratio (PSR), and $\max(R_t)$ acts as the weight. Therefore, Equation (9) can be considered the weighted peak-to-sidelobe ratio.
Figure 2 shows the reliability of different historical templates for the skating2 sequence. Because the female dancer in frame 293 is not occluded, the WPSR value reaches a local maximum. The importance of a filter template can be expressed by its reliability. A large value of WPSR(t) indicates that the reliability of the filter template is relatively high, and so is its importance. Conversely, a small value of WPSR(t) indicates that the reliability and importance of the filter template are relatively low. Observation of a large amount of data shows that the larger WPSR(t) is, the more reliable the tracking is.
VOLUME 8, 2020
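The reliability measure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' exact Equation (9): the sidelobe is taken as the response outside an 11 × 11 window around the peak, both the mean and the standard deviation are computed on that sidelobe, the result is the PSR multiplied by the peak value, and the normalization to [0, 1] is not included. The function name `wpsr` is hypothetical.

```python
import numpy as np


def wpsr(response, exclude=11):
    """Weighted peak-to-sidelobe ratio of a 2-D response map (unnormalized)."""
    peak = response.max()
    r, c = np.unravel_index(np.argmax(response), response.shape)
    half = exclude // 2
    # Mask out the exclude x exclude window around the peak.
    mask = np.ones(response.shape, dtype=bool)
    mask[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = False
    sidelobe = response[mask]
    # Classic PSR; the small constant guards against a perfectly flat sidelobe.
    psr = (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)
    # Weight the PSR by the peak value itself.
    return peak * psr
```

Under these assumptions, a response map with a high, isolated peak scores higher than one whose peak is low relative to the same background, which matches the intended use of WPSR as a per-frame reliability score.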

B. INCREMENTAL MULTI-TEMPLATE UPDATE STRATEGY
In previous correlation-filter tracking algorithms, some or all of the filter templates of historical frames are fused into the tracking template to make better use of the historical information. The template fusion method of previous algorithms is shown in (7), which is itself a multi-template update, because Equation (7) can be expanded into (10). However, the traditional update method in (10) has obvious shortcomings. It does not take into account the effectiveness and importance of the different historical filter templates. If a filter template with low reliability is still fused into the tracking template when the tracking reliability is low, then the tracking template will be contaminated. Tracking failure easily follows if the source of contamination cannot be eliminated afterwards.
We propose that updating the template with the local maximum-reliability template can effectively improve the robustness of the template, especially in complex scenes such as occlusion. This assumption is reasonable because the locally most reliable template is closer to the appearance and characteristics of the real object and can resist the negative effects of occlusion. Therefore, a new method for updating the tracking template is proposed in Equation (11). This is an incremental update of the template, in which the filter template with the highest local confidence is incrementally fused into the tracker template. In (11), $F_t$ is the tracking template of the current frame t, and $F_{t-1}$ is the tracking template of frame t-1. $F_{\mathrm{Max}}$ is the most reliable filter template in the k-frame range from the t-th frame to frame (t-k-1), that is, the filter template of the frame with the highest WPSR score in this range. $FC_t$ is the filter template of the current frame. $\mu_{\mathrm{Max}}$ is the learning rate of the filter template $F_{\mathrm{Max}}$, and $\mu_c$ is the learning rate of the filter template $FC_t$ of the current frame.
To ensure that the fused template will not drift, we have taken the following two measures:
(a) Equation (11) uses an adaptive weighted fusion method. In the process of tracking, we judge whether the target is in a complex environment by calculating the confidence of the current frame. According to the target confidence score, we adaptively adjust the weights of the weighted fusion in (11).
(b) When the incremental template is updated, we choose the filter template with the largest local historical confidence as the increment. This template is close to the current frame and is more similar to it than the templates of older historical frames, so its ability to distinguish the target in the current frame is stronger. Additionally, having the maximum confidence among the local historical frames ensures that the filter template is more reliable.
The physical meaning of this update strategy is that the local maximum-reliability template is used as an increment when the template is updated, which differs from the traditional update methods and has the following advantages:
(a) The incremental update ensures the richness of the filter template sources, which include not only the historical template $F_{t-1}$ and the current template $FC_t$ but also the local maximum-reliability template $F_{\mathrm{Max}}$. In fact, the traditional update method in (10) is a special case of our multi-template update strategy.
(b) Our update strategy only needs one extra auxiliary memory space for the local maximum-reliability template $F_{\mathrm{Max}}$, so it does not cause a significant increase in space complexity.
(c) The incremental update can enhance the robustness of the filter and overcome template drift to some extent because $F_{\mathrm{Max}}$ is a high-reliability filter template.
The traditional template update method fuses only the filter template of the previous frame into the tracker template of the current frame, as shown in (10). If only the filter template with the highest local confidence is merged into the tracker template of the current frame when the template is updated, then the target information will be lost. If the fusion of the filter template with the highest confidence in all the historical frames into the tracker template of the current frame can also improve the performance of the tracker, then it can also be regarded as a special case of our strategy. That is, the template with the highest local confidence is extended to the template with the highest global confidence.
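The incremental update can be sketched as follows. This is a minimal illustration, assuming that (11) is a weighted combination of the three templates with weights 1 − µ_c − µ_Max, µ_c and µ_Max (so that µ_Max = 0 recovers the traditional update), and that the local window holds the filter templates of the most recent k frames including the current one. The class name `IncrementalTemplateUpdater` and its interface are hypothetical.

```python
from collections import deque

import numpy as np


class IncrementalTemplateUpdater:
    """Keeps the last k (WPSR, filter) pairs and fuses the most reliable one."""

    def __init__(self, k, mu_c):
        self.history = deque(maxlen=k)  # oldest entries drop out automatically
        self.mu_c = mu_c

    def update(self, F_prev, FC_t, wpsr_t, mu_max):
        # Record the current frame's filter together with its reliability.
        self.history.append((wpsr_t, FC_t))
        # Local maximum-reliability filter: highest WPSR inside the window.
        _, F_max = max(self.history, key=lambda item: item[0])
        # Assumed form of the update: weights sum to one.
        return ((1.0 - self.mu_c - mu_max) * F_prev
                + self.mu_c * FC_t
                + mu_max * F_max)
```

Only the window of recent filters is stored in addition to the tracking template; an implementation that tracks the running maximum could reduce this to a single extra template, as the memory argument in the text suggests.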

C. THE STRATEGY OF THE ADAPTIVE LEARNING RATE
During the tracking process, if $F_{\mathrm{Max}}$ is always fused with a small learning rate, then the ability of the tracking algorithm to correct errors is reduced when a template error is encountered. Alternatively, if $F_{\mathrm{Max}}$ is always fused with a large learning rate, then template drift occurs easily. To solve this problem, this paper adaptively adjusts the learning rate of the local optimal template. This strategy determines the learning rate of $F_{\mathrm{Max}}$ according to whether the object is affected by occlusion. In the past, most algorithms used a fixed threshold to determine whether the tracked object was affected by occlusion, which is not effective when the object's background is complex. Therefore, we propose a new method of discrimination.
In general, when an object is affected by occlusion during tracking, the reliability of the tracking decreases sharply. Sometimes, a background change during tracking also causes the reliability of two adjacent frames to decrease suddenly. To eliminate such interference fluctuations, we determine whether the t-th frame is affected by occlusion based on the differences between the reliability score of the current frame t and those of frames t-2 and t-3. The t-th frame is considered to be affected by object occlusion if WPSR(t) < τ_1 and |WPSR(t) − WPSR(t − 2)| > τ_2, or if WPSR(t) < τ_1 and |WPSR(t) − WPSR(t − 3)| > τ_3.
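The occlusion test above translates directly into code. The sketch below uses the threshold values reported in the implementation details as defaults; the function name `is_occluded`, the list-based interface, and the choice to return `False` when fewer than three earlier frames exist are assumptions made for illustration.

```python
def is_occluded(wpsr, t, tau1=0.325, tau2=0.125, tau3=0.175):
    """Decide whether frame t is affected by occlusion.

    wpsr: sequence of normalized reliability scores indexed by frame.
    t:    index of the current frame.
    """
    if t < 3:
        return False  # assumption: not enough history to compare against
    low = wpsr[t] < tau1                           # reliability is low
    drop2 = abs(wpsr[t] - wpsr[t - 2]) > tau2      # sharp change vs. frame t-2
    drop3 = abs(wpsr[t] - wpsr[t - 3]) > tau3      # sharp change vs. frame t-3
    return low and (drop2 or drop3)
```

Requiring both a low score and a sharp change is what separates occlusion from a sequence whose reliability is low throughout, for which a single fixed threshold would fire on every frame.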
Existing algorithms cannot accurately determine the effect of occlusion on the object. Although our judgment method has high accuracy, it is still far from 100% accurate. Occluded templates can adversely affect tracking, but discarding a correct template can easily lead to tracking-template drift. By contrast, $F_{\mathrm{Max}}$ has higher reliability. This paper considers that fusing $F_{\mathrm{Max}}$ into the tracking template as an increment can ensure both the richness of the template and the accuracy of the tracking template when the object is affected by occlusion. However, if $F_{\mathrm{Max}}$ is fused into the tracking template with the same weight every time the template is updated, template overfitting can easily occur. Therefore, this strategy increases the learning rate $\mu_{\mathrm{Max}}$ of $F_{\mathrm{Max}}$ when the object is affected by occlusion and decreases it otherwise. In this way, the learning rate $\mu_{\mathrm{Max}}$ adapts to the influence of occlusion, and the adaptive learning rate adjusts the update parameters to overcome the template drift caused by occlusion.
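The rule described above can be written compactly as a piecewise learning rate. The form below is an assumption inferred from the text and from the two parameters α_0 and α_1 reported in the implementation details; it is a sketch and not necessarily the exact expression used in the paper.

```latex
\mu_{\mathrm{Max}} =
\begin{cases}
\alpha_0, & \text{if frame } t \text{ is judged to be affected by occlusion},\\
\alpha_1, & \text{otherwise},
\end{cases}
\qquad \alpha_0 > \alpha_1 \ge 0 .
```

With α_1 = 0, as in the reported settings, this reading would mean that the incremental template is fused only on frames judged to be occluded.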

V. EXPERIMENT
To accurately analyze the performance of our proposed strategy, we tested it on three object-tracking datasets: VOT2016 [48], UAV123 [49] and OTB100 [50]. All experiments in this paper were run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz with 64-bit Windows 10 Professional and MATLAB 2016a.
The VOT2016 dataset consists of 60 videos compiled from a set of more than 300 videos. This article uses the four main evaluation measures of the VOT2016 challenge as quantitative indicators of tracking performance: Expected Average Overlap (EAO), Accuracy (average overlap during active tracking), Robustness (failure rate) and Average Overlap (AO). AO is similar to EAO but not exactly the same. For EAO, the tracker is re-initialized 5 frames after the predicted box stops overlapping the object, whereas AO has no such restart mechanism. The trackers are ranked using the area-under-the-curve (AUC) score in AO. The greater the value of Robustness, the worse the stability in the VOT2016 challenge. Robustness is calculated from F(i, k), the number of failures of the i-th tracker in the k-th repetition. We refer to [48] for details.
UAV123 is a dataset of videos captured by low-altitude drones. UAV123 contains 123 video sequences with more than 110k frames in total. As a dataset of drone footage, UAV123 is dominated by small objects. In this paper, the precision plot shows, for different thresholds, the percentage of frames in which the distance between the predicted and the actual object position is within the threshold. The success plot shows, for different thresholds, the percentage of frames in which the overlap rate between the tracking result and the real object exceeds the threshold. We refer to [49] for details.
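The evaluation measures described above can be sketched for a single threshold; a precision or success plot is obtained by sweeping the threshold. This is a minimal illustration and not the official benchmark toolkits: boxes are assumed to be (x, y, width, height), the default thresholds of 20 pixels and 0.5 overlap are common conventions rather than values from this paper, and robustness is assumed to be the failure count averaged over repetitions. All function names are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Overlap ratio (intersection over union) of two (x, y, w, h) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose center location error is within threshold."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    centers_p = pred[:, :2] + pred[:, 2:] / 2.0
    centers_g = gt[:, :2] + gt[:, 2:] / 2.0
    err = np.linalg.norm(centers_p - centers_g, axis=1)
    return float(np.mean(err <= threshold))


def success_at(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
    return float(np.mean(overlaps > threshold))


def robustness(failures):
    """Assumed form: failures[k] = F(i, k), averaged over the repetitions."""
    return float(np.mean(failures))
```

For instance, a tracker that is exact on one frame and far off on a second frame scores 0.5 on both `precision_at` and `success_at` at the default thresholds.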
OTB100 is a commonly-used benchmark library in the field of target tracking. OTB100 contains 100 video sequences containing various complex scenes, including Fast Motion, Illumination Variation, Occlusion, etc. The main evaluation indicators of OTB100 are the success rate and precision rate. We refer to [50] for details.

A. IMPLEMENTATION DETAILS
ECO and ECO-hc are our baseline trackers. We applied our strategy to these baseline trackers, and the resulting tracking algorithms are termed the IMUS (Incremental Multi-template Update Strategy) and the IMUS-hc (the hand-crafted feature version of the IMUS).
In the IMUS, we apply the same feature representation as ECO: a combination of the first (conv-1) and last (conv-5) convolutional layers of the VGG-m [51] network, along with HOG [52] and Color Names (CN) [53]. We found that the local frame range k works best between 18 and 20. When testing on the VOT2016 dataset, we set k = 18 in the IMUS. The learning-rate parameters of the local optimal template $F_{\mathrm{Max}}$ in the IMUS were α_0 = 0.017 and α_1 = 0. The learning rate $\mu_c$ was set to 0.012 for the IMUS. We normalized the value of WPSR(t) so that its range is [0, 1]. The thresholds for determining occlusion in Section IV-C were set to τ_1 = 0.325, τ_2 = 0.125 and τ_3 = 0.175. The other main parameters are the same as for ECO.
In the IMUS-hc, we applied the same feature representation as ECO-hc, namely HOG and Color Names (CN). We found that the value of k for the local frame range was again better between 18 and 20. When testing on the VOT2016 dataset, we set k = 20 in the IMUS-hc. The learning-rate parameters of the local optimal template $F_{\mathrm{Max}}$ in the IMUS-hc were α_0 = 0.018 and α_1 = 0. The learning rate $\mu_c$ of the current filter template $FC_t$ was set to 0.012 for the IMUS-hc. In this experiment, the occlusion judgment parameters are the same as those set in the IMUS. The other main parameters are the same as for ECO-hc.

B. INCREMENTAL TEMPLATE ANALYSIS
In this experiment, the choice of the incremental template in the incremental multi-template update strategy has an important impact on the performance of the tracker. In this section, we discuss the choice of the incremental template in the incremental multi-template fusion strategy.
To study the effect of the incremental template on the tracker in the strategy, we selected the global and local optimal templates as the incremental templates. Except for the initial frame, the template with the highest global confidence has higher reliability but is relatively different from the current frame. The template with the highest local confidence is different from that with the highest global confidence. The template with the highest confidence in a small range closer to the current frame has a smaller gap with the current frame and has higher reliability as well.
In Table 1 and Table 2, "$FC_t$" is the result without the incremental multi-template update strategy. "$FC_t$ + $F_{\mathrm{Max\_global}}$" uses the template with the highest confidence among all historical templates except the initial frame as the increment in the incremental multi-template update. "$FC_t$ + $F_{\mathrm{Max\_local}}$" selects the template with the highest confidence among the historical templates close to the current frame as the increment in the incremental multi-template update.
It can be observed from Tables 1 and 2 that the performance of the tracker improves after adopting the incremental multi-template fusion strategy. The performance of the tracker is best when the local optimal template is used as the incremental template in the incremental multi-template fusion strategy. The template with the maximum global confidence can improve the performance of the tracker to a certain extent by virtue of its high reliability. The template with the largest local confidence is more similar to the target in the current frame, so it improves the performance of the tracker more significantly. From this ablation experiment, we can also observe that our strategy yields a larger improvement for the tracker that uses hand-crafted features.

C. VOT DATASET EVALUATION
Here, we analyze the effect of our strategy on the VOT2016 dataset. Table 3 shows the performance of our trackers compared to the most advanced trackers.
Compared with the baseline tracker, our tracker performs better. Table 3 shows that the EAO score of the IMUS is 0.38. Compared with the baseline tracker (ECO), our strategy achieves a relative gain of 2.7% in EAO. The accuracy score of the IMUS is 0.55, a 1.9% relative gain in accuracy over the baseline tracker (ECO). The Robustness score of the IMUS is 10.17. Because a lower Robustness score means fewer failures, our strategy is more stable than the baseline tracker (ECO). The IMUS has an AO score of 0.45, which is a clear advantage over the baseline tracker (ECO). Table 3 also shows that the EAO score of the IMUS-hc is 0.33. Compared with the baseline tracker (ECO-hc), our strategy achieves a relative gain of 3.1% in EAO. The accuracy score of the IMUS-hc is 0.54, a 1.9% relative gain in accuracy over the baseline tracker (ECO-hc). The Robustness score of the IMUS-hc is 10.17. The robustness of our strategy is improved compared with that of the baseline tracker (ECO-hc), with a relative robustness gain of 2.7%. The IMUS-hc achieves an AO score of 0.41, which is a clear advantage over the baseline tracker (ECO-hc). Compared with state-of-the-art trackers other than the baseline trackers, our trackers also have a clear advantage. These state-of-the-art trackers include UDT_plus [54], UDT [54], SiamFC [55], DNT [56], HCF [57], Staple [23], SRDCF [22] and DSST [25]. As shown in Table 3, among the compared deep-feature trackers, UDT_plus provides the best results, with an EAO score of 0.30 in the baseline experiment. The EAO of the IMUS is 26.7% higher than that of UDT_plus in relative terms. Among the compared trackers using hand-crafted features, Staple provides the best results, with an EAO score of 25% in the baseline experiment. The EAO of the IMUS-hc is 3% higher than that of Staple.
Among the compared deep-feature trackers, DNT provides the best results, with an AO score of 43% in the unsupervised experiments. The AO of the IMUS is 2 percentage points higher than that of DNT. Among the compared trackers using hand-crafted features, SRDCF provides the best results, with an AO score of 40% in the unsupervised experiments. The AO of the IMUS-hc is 1 percentage point higher than that of SRDCF. Figure 4 shows the expected overlap scores for the baseline in the VOT2016 dataset in unsupervised experiments. As shown in Figure 4, the expected overlap scores of the IMUS are greater than those of the other state-of-the-art trackers. The IMUS-hc performs well too. Figure 5 shows the accuracy and robustness scores of the IMUS, the IMUS-hc, and eight state-of-the-art algorithms under occlusion in the baseline experiment. As shown in Figure 5, the IMUS and IMUS-hc are more robust than the other algorithms. Figure 6 shows the ranking of failures under different influencing factors in the baseline experiment. The ability of the IMUS and IMUS-hc to handle occlusion exceeds that of the other algorithms. Figure 7 shows examples of unmanned VOT2016 results, which indicate that the IMUS and IMUS-hc algorithms are better than the other state-of-the-art trackers. Because our strategy uses the local maximum-reliability filter to update the template, it can overcome template drift to a great extent. Therefore, the IMUS and IMUS-hc algorithms can achieve better tracking performance than the other state-of-the-art trackers.
From Figures 8 and 9, we can observe that our IMUS and IMUS-hc trackers have excellent precision performance. Of the two, our strategy improves the IMUS-hc tracker, which uses hand-crafted features, more noticeably. Compared with ECO-hc, our IMUS-hc tracker improves precision by 1.1%, and its success rate improves greatly as well. Compared with SiamRPN, our IMUS-hc tracker improves precision by 1.6% and the success rate by 1.2%. Compared with MKCFup, our IMUS-hc tracker improves precision by 8.1% and the success rate by 5.5%. Compared with MMTS, our IMUS-hc tracker improves precision by 10.4% and the success rate by 14.0%. From the experimental results, we observe that our strategy improves the performance of the tracker using hand-crafted features (IMUS-hc) more significantly.
As shown in Figures 11 and 12, the precision score of the baseline tracker improves significantly after our strategy is applied to it. The IMUS and IMUS-hc scores are significantly higher than those of the baseline trackers ECO and ECO-hc. As shown in Figure 11, the precision scores of the IMUS and IMUS-hc are 0.753 and 0.735, respectively. GCT, the best performing of the eight state-of-the-art trackers other than the baseline trackers, has a precision score of 0.732, which is lower than those of both the IMUS and the IMUS-hc. As shown in Figure 12, the success scores of the IMUS and IMUS-hc are 0.634 and 0.610, respectively. In terms of success score, there is little difference between GCT and the IMUS-hc, whereas the IMUS has a clear advantage over GCT. Figure 13 shows the precision plots on the UAV123 dataset. We find that the IMUS and IMUS-hc algorithms have advantages in some challenging attributes, such as background clutter, camera motion, fast motion, partial occlusion and illumination variation. In the success plots of Figure 14, the IMUS and IMUS-hc algorithms also achieve good performance in some challenging attributes, such as background clutter, camera motion, partial occlusion, low resolution and illumination variation.
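For readers unfamiliar with the precision and success scores used above, the following minimal sketch shows how such scores are commonly computed in OTB-style evaluation. It assumes the usual conventions, which are not restated in this section: boxes are given as (x, y, width, height), precision is the fraction of frames whose center location error is within 20 pixels, and the success score is the area under the success curve over overlap thresholds from 0 to 1. The function names are ours, and the exact toolkit settings used in the experiments may differ.

```python
import numpy as np


def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are (x, y, w, h)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_center = pred[:, :2] + pred[:, 2:] / 2.0
    gt_center = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_center - gt_center, axis=1)


def overlap(pred, gt):
    """Intersection over union per frame; assumes boxes with positive area."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision_score(pred, gt, threshold=20.0):
    """Fraction of frames with center error within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))


def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = overlap(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    rates = [np.mean(overlaps > t) for t in thresholds]
    return float(np.mean(rates))


# Toy example: the first frame is tracked perfectly, the second is lost.
ground_truth = [[0, 0, 10, 10], [0, 0, 10, 10]]
predicted = [[0, 0, 10, 10], [30, 0, 10, 10]]
print(precision_score(predicted, ground_truth))
print(success_auc(predicted, ground_truth))
```

A tracker that drifts away from the target, as in the second toy frame, is penalized by both measures, which is why template drift directly lowers the reported scores.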

F. DISCUSSION
Our tracker is effective in complex environments. In a correlation filter tracker, the quality of the template determines the tracking performance, and in complex environments the template's impact on performance is even more pronounced. The template of a correlation filter tracker is constantly updated during tracking. Therefore, when the target is in a complex environment, part of the environmental information is inevitably fused into the template. To counter this, we fuse into the tracking template an incremental template taken from the frame whose local tracking is accurate and little influenced by background information. In this way, the effect of complex environments on tracking is greatly reduced. For example, as shown in Figures 11 and 12, when the target moves rapidly, the background is cluttered, or the viewing angle changes, our tracker is significantly better than the other state-of-the-art trackers. In individual cases, however, our tracker is slightly worse than other advanced trackers, for example, when the target is out of view. We believe that the main reason for this finding is the selectable range of the incremental template, as we showed in the ablation experiment. In different scenes, the targets and backgrounds change at different speeds. In practical applications, the selectable range of the incremental template can be adjusted according to how quickly the target and background change in the scene. The range we choose in this article guarantees superior overall performance.
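The idea discussed above can be illustrated with a minimal sketch. It is not the exact update formula of this paper: it assumes that each historical filter carries a scalar reliability score, that the selectable range is a window over the most recent frames, and that the local maximum-reliability filter is blended into the conventionally updated template by linear interpolation. The names `select_local_best`, `incremental_update`, `window`, `lr` and `lr_inc` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def select_local_best(filters, reliabilities, window):
    """Return the most reliable filter among the last `window` frames.

    `window` stands in for the selectable range of the incremental
    template (hypothetical parameter name).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent_filters = filters[-window:]
    recent_scores = reliabilities[-window:]
    return recent_filters[int(np.argmax(recent_scores))]


def incremental_update(template, current, local_best, lr=0.01, lr_inc=0.01):
    """Assumed update rule, shown for illustration only.

    Step 1: conventional linear update with the current-frame filter.
    Step 2: blend in the local maximum-reliability filter.
    """
    updated = (1.0 - lr) * template + lr * current
    return (1.0 - lr_inc) * updated + lr_inc * local_best


# Toy usage with constant-valued filters so the result is easy to follow.
history = [np.full(4, v) for v in (1.0, 2.0, 3.0, 4.0)]
scores = [0.9, 0.2, 0.7, 0.4]  # made-up reliability scores
best = select_local_best(history, scores, window=3)
template = incremental_update(np.zeros(4), history[-1], best, lr=0.5, lr_inc=0.5)
print(best, template)
```

In this sketch, a smaller `window` tracks fast-changing scenes more closely, while a larger `window` allows an older but more reliable filter to be selected, which mirrors the trade-off described for the selectable range.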
Qualitative and quantitative experiments on the VOT2016, OTB100 and UAV123 datasets show that our strategy improves trackers using hand-crafted features more noticeably. We believe that this is because deep features are themselves highly robust, so the performance improvement for the tracker using deep features is less pronounced. Hand-crafted features are less robust, so our strategy improves trackers that use them more significantly.

VI. CONCLUSION
We propose a novel incremental multi-template update strategy that incrementally fuses the local maximum-reliability template into the update of the tracking filter template. This incremental template-update method enhances the robustness of the tracker and overcomes template drift. The strategy also improves the accuracy of object tracking by adaptively adjusting the learning rate of the local maximum-reliability template according to the occlusion state of the object. Qualitative and quantitative experiments on the VOT, OTB and UAV datasets demonstrate that the novel tracking strategy performs favorably against other state-of-the-art trackers in terms of success rate and precision plots.