A Multi-Information Fusion Correlation Filters Tracker

In recent years, trackers based on correlation filters have attracted increasing attention because of their impressive tracking accuracy and real-time performance. However, in real scenarios, tracking results are often disturbed by occlusion, illumination variation, appearance variation, and background clutter. To obtain better tracking performance, this paper proposes a multi-information fusion correlation filter tracker that uses channel and spatial reliabilities together with temporal regularization information of the samples for filter training. The tracker not only extends the target search area but also has a stronger ability to track targets with significant appearance variations. Results from extensive experiments conducted on the OTB100, VOT2016, TC128, and UAV123 data sets show that our tracker, using only histogram of oriented gradients (HOG) and color name (CN) features, performs favorably against state-of-the-art trackers in terms of tracking precision, tracking success rate, tracking accuracy, and A-R rank.


I. INTRODUCTION
Target tracking has received significant attention in recent years owing to the rapid development of artificial intelligence technologies [1]-[3]. Target tracking refers to the continuous estimation of the target position and scale in subsequent video frames, given the target position and scale in the first frame. Target tracking technology has made substantial progress as a result of improved computer hardware and the introduction of new tracking algorithms. However, because numerous adverse factors occur in real scenarios, such as target occlusion, scale variation, illumination variation, background variation, and appearance variation, it is still a major challenge for a tracker to achieve high precision, a high success rate, and reasonable robustness.
Correlation filter trackers train classifiers by minimizing errors. By extracting the target features and correlating them with the correlation filters, a group of possible target response values is obtained, and the position with the highest response value is taken as the center of the target. Furthermore, to reduce the computational burden, the fast Fourier transform is often used to transform the loss function of the tracker into the frequency domain.
The associate editor coordinating the review of this manuscript and approving it for publication was Mu-Yen Chen.
In recent years, correlation filter-based trackers have been widely used in visual tracking because they are more computationally efficient and more robust than many other trackers [3]-[5]. Bolme et al. [6] proposed the minimum output sum of squared error (MOSSE) tracker, which trains the filter with gray-scale features of the target in the initial frame and was the first to use correlation filters for target tracking. Building on MOSSE, Henriques et al. [7] added dense sampling and the kernel trick in circulant structure tracking with kernels (CSK); the dense sampling exploits the redundancy of the training samples by cyclically shifting the image vector, which yields a circulant matrix. Furthermore, based on CSK, the histogram of oriented gradients (HOG) feature, which is more robust to illumination changes, was introduced in the kernelized correlation filter (KCF) [5] to achieve better tracking performance. Unlike KCF, which uses a Gaussian kernel function, the discriminative correlation filter (DCF) uses a linear kernel, which has advantages in merging multi-channel features, and thereby gains faster tracking speed.
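The train-then-detect cycle described above can be illustrated with a minimal single-channel sketch in the spirit of MOSSE. This is an illustration under stated assumptions, not the implementation of any cited tracker: the function names, the Gaussian label width `sigma`, and the regularization value `lam` are hypothetical choices.

```python
import numpy as np

def gaussian_label(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the target center."""
    rows, cols = np.mgrid[:shape[0], :shape[1]]
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form single-channel filter in the Fourier domain (MOSSE-style)."""
    patch_hat = np.fft.fft2(patch)
    label_hat = np.fft.fft2(label)
    # Ridge-regression solution; lam (assumed value) keeps the division stable.
    return label_hat * np.conj(patch_hat) / (patch_hat * np.conj(patch_hat) + lam)

def detect(filter_hat, patch):
    """Correlate a new patch with the filter; return the response and its peak."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_hat))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(peak[0]), int(peak[1]))
```

If the filter is trained on a patch with the label centered at (16, 16) and then applied to the same patch circularly shifted by (3, -2), the response peak moves by the same offset, which is how the new target center is read off.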
At present, many researchers are still studying and improving DCF-based tracking models. Danelljan et al. [8] proposed the spatially regularized DCF (SRDCF) algorithm, which imposes a spatial penalty on the DCF coefficients to mitigate the boundary effect caused by cyclically shifted samples. Li et al. [9] proposed the spatial-temporal regularized correlation filter (STRCF), which introduces temporal regularization into SRDCF; it handles the boundary effect without loss of efficiency and, at the same time, copes better with large appearance variations and occlusion.
By introducing channel and spatial reliabilities into DCF, Lukežic et al. [10] proposed the discriminative correlation filter with channel and spatial reliabilities (CSR-DCF), which can effectively track irregularly shaped targets. In the adaptive spatial regularization correlation filter proposed by Dai et al. [11], two correlation filter (CF) models are used: one with complex features locates the target, whereas the other, with shallow features, estimates the target scale. To address the drawbacks of DCF-based trackers, namely that the generated samples produce some negative effects and that the response map is vulnerable to noise interference, Yuan et al. [49] proposed a target-focusing convolutional regression model for visual object tracking. To enhance the robustness of deep regression trackers in complicated situations (e.g., occlusion, background clutter, and deformation), Yuan et al. [50] further proposed an adaptive structural convolutional filter model. Considering that the appearance model is easily disturbed by noise in tracking algorithms that use a single feature, Yuan et al. [51] incorporated a multiple-feature fusion model into a correlation filter framework for object tracking.
Although many improvements have been made to correlation filters and good tracking results have been achieved, currently available correlation filter trackers still fail to completely solve the boundary effect caused by the cyclically shifted samples used for training correlation filters.
To solve the above-mentioned problem, this paper proposes a multi-information fusion correlation filter tracker to boost tracking performance and robustness. In this tracker, the channel and spatial reliabilities and the temporal regularization information of the samples are used together to train correlation filters for the first time, whereas previous correlation filter trackers use only one or two of them. In the proposed tracker, the spatial reliability is used to restrict the filter to the areas suitable for target tracking, which helps overcome the limitation of the rectangular target region; the channel reliability is used to weight the response of each feature channel according to its contribution to target localization; and the temporal regularization helps, to some extent, in dealing with the boundary effects and in improving the tracking robustness for targets with large appearance variations and heavy occlusion. In addition, the alternating direction method of multipliers (ADMM) is used to solve our objective function, which improves the time performance of our tracker. Extensive experiments conducted on four widely used data sets, OTB100 [12], VOT2016 [13], TC128 [52], and UAV123 [2], whose video sequences cover multiple attributes, suggest that our tracker performs favorably against many state-of-the-art trackers in terms of precision rate, success rate, tracking accuracy, A-R rank, pixel error, and overlap rate.
The main contributions of this paper are as follows:
1) By combining temporal regularization with the channel and spatial reliabilities of the samples for the first time, a multi-information fusion correlation filter tracker is proposed. The spatial reliability allows the target to be searched for in larger areas, and the channel reliability helps to locate the target better; meanwhile, the temporal regularization alleviates the influence of boundary effects caused by cyclically shifted samples and improves the tracking robustness for targets with large appearance variations and heavy occlusion.
2) The alternating direction method of multipliers (ADMM) [21] is used to solve for the filter and the Lagrange multiplier iteratively, after the augmented Lagrangian of our tracker is decomposed into sub-problems related to the filters; this significantly reduces the computational complexity of our tracker.
The remainder of this paper is structured as follows: Section II reviews the literature related to object tracking; Section III presents the proposed tracker; Section IV evaluates the proposed tracker on publicly available data sets; and finally, Section V concludes the study.

II. RELATED WORK
This section reviews the literature on object tracking, with a specific focus on correlation filter-based trackers and non-correlation filter trackers.

A. CORRELATION FILTER-BASED TRACKER
In recent years, CF-based trackers have been widely used in visual tracking. Considering that tracking is adversely affected by the environment surrounding the target, Mueller et al. [15] added context information to the CF in the learning stage for the first time, which significantly improved the tracking performance. Observing that CF filters are dominated by the salient regions of the feature map, which leads to model degradation, Sun et al. [17] introduced a local response consistency regularization term to emphasize the equal contribution of different regions.
To address tracking drift and even failure caused by background clutter or target appearance variations, Li et al. [17] proposed training correlation filters with background patches selected by affinity propagation to maximize the margin between foreground and background, while using a multi-level scale supervision mechanism to adjust the target scale. Zhang et al. [18] used the interdependence between different features of the target and the spatial constraints between its parts to learn multiple correlation filters, combining the advantages of correlation filters and particle filters to effectively track targets with scale variations and occlusion. Liu et al. [19] proposed a tracker based on template matching via mutual buddy similarity and memory filtering, which matches targets with reciprocal k-nearest neighbors in complex situations and selects representative and reliable results to learn different types of templates in the memory filtering scheme.
VOLUME 8, 2020
To improve the feature extraction, sample training process, and tracking performance of the traditional kernelized correlation filter, Yang et al. [20] proposed a joint correlation filter tracker with multi-feature and scale adaptation, which consists of two parts: a position correlation filter tracker and a scale correlation filter tracker. Zhang et al. [21] proposed a scale-adaptive tight correlation filter to address the background interference caused by large-scale training samples and the target representation errors caused by appearance variations of the training samples.
Part-based trackers perform poorly on partially occluded targets because they ignore the overall appearance of the target. To solve this problem, Ruan et al. [22] integrated a part-based strategy into the CF framework and proposed a multi-part correlation tracker with triangle structure constraints (MCTTC), which constructs multiple CFs from the global and local appearance of the target. Wang et al. [23] proposed an effective multi-cue analysis framework, in which multiple experts are constructed with discriminative correlation filters and the most appropriate expert is selected to track the target.
Recently, many researchers have chosen to combine CF models with deep features for target tracking. Danelljan et al. [24] used convolutional neural network (CNN), CN, and HOG features of the target to generate the feature maps for training CF models. Dai et al. [11] used a CF model with single-scale robust deep features to locate the target accurately. Danelljan et al. [25] used continuous convolution filters to combine feature maps with different spatial resolutions. Li et al. [9] combined the output of the conv3 layer of the VGG-M network with HOG and CN features to train correlation filters. Sun et al. [17] used deep features from the conv1 layer of VGG-M and the conv4-3 layer of VGG-16, together with HOG and CN features, to train correlation filters [26].
He et al. [27] combined autocorrelation and cross-correlation with convolutional neural networks to represent the target features, jointly exploiting the advantages of CFs and CNNs, and obtained excellent tracking performance. To enhance the ability of correlation filters to recognize and track occluded and deformed targets, Pu et al. [28] constructed a spatial reliability map from deep features by using convolutional neural networks and introduced temporal regularization to train DCFs.

B. NON-CORRELATION FILTER TRACKER
Although several studies suggest a preference for CF-based trackers, some non-correlation filter trackers have also shown good tracking performance. Li et al. [29] proposed a multi-stream deep similarity learning network to learn a strictly offline similarity comparison model, which can still effectively identify the target even when it is disturbed by background clutter and appearance variations. Bhat et al. [30] proposed a particle filter tracking algorithm based on multi-feature fusion, in which the color distribution, which is robust to scale variation and partial occlusion within the particle filter framework, and the KAZE (a Japanese word that means wind) features of the target structure are used to track the target.
To design a tracking model with effective online observation and model updating capabilities, Huang et al. [31] proposed representing the target with a combination of directional gradient change and color histograms, while a single-hidden-layer feed-forward neural network and a recursive orthogonal least-squares algorithm are used as the target observation model. To reduce the high time complexity of Siamese trackers when estimating the scale and angle of the target, Lee [32] proposed a single-shot Siamese network that can estimate the size and angle of the target from a single search area. Li et al. [33] proposed a lightweight particle filter tracking method that not only retains the robust tracking ability of particle filters but also reduces the sampling time cost by using correlation filters.
Inspired by anchor-free detectors, Chen et al. [34] proposed a Siamese box adaptive tracking network, composed of a Siamese network backbone and multiple box adaptive heads, which addresses the accurate estimation of target scale and aspect ratio by casting tracking as a classification and regression problem. Danelljan et al. [35] proposed a probabilistic regression formulation for target tracking that can model the label noise caused by incorrect annotations and ambiguities, thus leading to improved tracking performance. Xu et al. [36], analyzing the unique characteristics of the target tracking problem, proposed a set of practical guidelines for target state estimation and designed the fully convolutional Siamese tracker++ (SiamFC++) according to them: decomposition of the classification and target state estimation branches (G1), a non-ambiguous classification score (G2), tracking without prior knowledge (G3), and an estimation quality score (G4).
Considering that fully convolutional Siamese networks based on template matching cannot capture the temporal variation of the target and background clutter, Li et al. [37] proposed a gradient-guided network that updates the template of the current frame with the discriminative information of the gradients. In addition, Li et al. [38] put forward a tracking algorithm in which two complementary trackers run in parallel: the Bayesian tracker (B-tracker), with an adaptive learning rate, handles target appearance variations, and the S-tracker, which uses an improved incremental subspace learning method, handles target occlusion.
To deal with illumination variation and occlusion in visual tracking, Li et al. [39] suggested using only the bright pixels to compare the similarity between candidate and training samples, and updating the model with an online strategy after a new target is obtained. Moreover, to address the performance degradation of single-classifier tracking when the target is occluded, Li et al. [40] suggested that a group of related classifiers should first be derived by combining particle filters and sample sets, and that a classifier query mechanism should then be established to select the appropriate classifier for tracking the target in the next frame.

III. OUR PROPOSED TRACKER
In this section, we discuss the following in turn: correlation filters (CF) [5], spatially regularized discriminative correlation filters (SRDCF) [8], spatial-temporal regularized correlation filters (STRCF) [9] and discriminative correlation filter with channel and spatial reliability (CSR-DCF) [10]. Finally, the correlation filter tracker with channel and spatial reliabilities and time regularization proposed in this paper is discussed.

A. CORRELATION FILTERS
By minimizing the sum of the squared differences between the channel-wise correlation output and the desired output (ground truth) $g \in \mathbb{R}^{d_w \times d_h}$, the optimal filter is obtained in the learning stage.
Here, $(\cdot)^H$ denotes the Hermitian transpose, $*$ represents the convolution operator, and $\lambda$ is a regularization constant. Because the CF model suffers from boundary effects caused by the circularly shifted samples used for filter learning, its tracking performance is unavoidably degraded.
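The learning objective that this description refers to can be sketched as follows. This is a hedged reconstruction in the form commonly used in the CSR-DCF literature [10], not a quotation of the paper's own equation: the channel count $N_d$, the per-channel feature $f_d$ and filter $h_d$, and the placement of $\lambda$ are assumptions. Hats denote Fourier transforms, and the two forms agree up to a constant factor by Parseval's theorem.

```latex
\arg\min_{h} \sum_{d=1}^{N_d} \left( \left\| f_d * h_d - g \right\|^2 + \lambda \left\| h_d \right\|^2 \right)
\;\;\Longleftrightarrow\;\;
\arg\min_{\hat{h}} \sum_{d=1}^{N_d} \left( \left\| \hat{h}_d^{H} \operatorname{diag}(\hat{f}_d) - \hat{g} \right\|^2 + \lambda \left\| \hat{h}_d \right\|^2 \right)
```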

B. SPATIALLY REGULARIZED DISCRIMINATIVE CORRELATION FILTERS
To mitigate the undesirable boundary effects in the CF model, Danelljan et al. [8] proposed spatially regularized discriminative correlation filters (SRDCF) with spatial constraints.
In SRDCF, a larger image region is used for the channel feature $f_d$ to retain more real information about the target, and filter coefficients far from the target center are penalized through a spatial weight matrix $w$. SRDCF is obtained by minimizing an objective in which ''·'' denotes the Hadamard product, $*$ stands for the convolution operator, $w$ is the spatial regularization matrix, $f_d$ is the channel feature, and $h_d$ and $g$ are the target template and desired output, respectively. Although SRDCF [8] can effectively suppress the adverse boundary effects, the spatial regularization over multiple training images destroys the circulant matrix structure, resulting in a higher computational burden.
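As a hedged sketch, the SRDCF objective described above is usually written in the following form. The number of training images $T$, the per-image weights $\alpha_k$, and the channel count $D$ are assumed symbols that do not appear in the text; $f_{k,d}$ is channel $d$ of the $k$-th training image and $g_k$ is its desired output.

```latex
\arg\min_{h} \; \sum_{k=1}^{T} \alpha_k \left\| \sum_{d=1}^{D} f_{k,d} * h_d - g_k \right\|^2
+ \sum_{d=1}^{D} \left\| w \cdot h_d \right\|^2
```

The sum over $k$ is what couples multiple training images and breaks the circulant structure mentioned above.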

C. SPATIAL-TEMPORAL REGULARIZED CORRELATION FILTERS
By introducing temporal regularization into SRDCF [8], Li et al. [9] proposed the spatial-temporal regularized correlation filters (STRCF), which reduce the multiple training samples of SRDCF to a single sample and thereby solve the problem of the large amount of computation in the SRDCF [8] model.
In formula (3), $f_{t-1}$ denotes the CFs utilized in the $(t-1)$-th frame, and $\mu$ denotes the regularization parameter. The second term in formula (3) denotes the spatial regularization, and the third term denotes the temporal regularization. STRCF [9] can adaptively balance the trade-off between aggressive and passive model learning, and it tracks more robustly when the appearance of the target varies greatly.
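For reference, the STRCF objective is commonly written as below. This sketch follows the notation of the STRCF paper [9], in which $f$ denotes the filter (consistent with $f_{t-1}$ above) rather than the feature; the sample $x_t^d$, the label $y$, and the channel count $D$ are assumed symbols not defined in the text.

```latex
\arg\min_{f} \; \frac{1}{2} \left\| \sum_{d=1}^{D} x_t^{d} * f^{d} - y \right\|^2
+ \frac{1}{2} \sum_{d=1}^{D} \left\| w \cdot f^{d} \right\|^2
+ \frac{\mu}{2} \left\| f - f_{t-1} \right\|^2
```

The three terms are, in order, the data term on the single current sample, the spatial regularization, and the temporal regularization.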

D. DISCRIMINATIVE CORRELATION FILTER WITH CHANNEL AND SPATIAL RELIABILITY
To alleviate the unwanted boundary effects in the CF model, Lukežic et al. [10] introduced a dual variable $h_c$ into the CF model with the constraint $h_c - m \odot h = 0$; here, $m$ is the spatial reliability map, which identifies the pixels in the filter that should be ignored in learning. The augmented Lagrangian form of CSR-DCF involves a complex Lagrange multiplier $\hat{l}$ and a penalty $\mu > 0$; for compact notation, $h_m = m \odot h$ is defined. At the target localization stage, the channel reliability is computed as the product of the learning channel reliability $\omega_d = \zeta \max(f_d * h_d)$ and the detection channel reliability $\omega_d^{(\mathrm{det})} = 1 - \min(\rho_{\max 2}/\rho_{\max 1}, 1/2)$, where $\rho_{\max 2}/\rho_{\max 1}$ is the ratio between the second and the first major modes in the response map.
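The detection channel reliability above can be sketched in a few lines. This is a minimal illustration, not the CSR-DCF code: treating the "major modes" as strict 8-neighbour local maxima with wrap-around borders, clamping a negative second mode to zero, and returning 0.5 when no positive peak exists are all assumptions made here.

```python
import numpy as np

def major_modes(response):
    """Strict local maxima of a 2-D response map, largest first.

    Neighbours wrap around the borders (assumption), matching the circular
    correlation that produces the response.
    """
    is_peak = np.ones(response.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_peak &= response > np.roll(response, shift=(dy, dx), axis=(0, 1))
    return np.sort(response[is_peak])[::-1]

def detection_reliability(response):
    """1 - min(rho_max2 / rho_max1, 1/2) for one channel's response map."""
    modes = major_modes(response)
    if modes.size == 0 or modes[0] <= 0:
        return 0.5  # assumption: no usable peak counts as least reliable
    second = max(modes[1], 0.0) if modes.size > 1 else 0.0
    return float(1.0 - min(second / modes[0], 0.5))
```

A response with one dominant peak yields a reliability close to 1, whereas two peaks of similar height push the value down to its floor of 0.5.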

E. OUR PROPOSED TRACKER
To mitigate as far as possible the boundary effects caused by the cyclically shifted samples used for correlation filter training, and to obtain better tracking performance for targets with appearance variation and occlusion, we propose a multi-information fusion correlation filter tracker in which the channel and spatial reliabilities and the temporal regularization information of the samples are used for correlation filter training; the channel and spatial reliabilities follow the corresponding definitions in CSR-DCF [10]. $m \in [0, 1]^{d_w \times d_h}$ is the spatial reliability map, whose elements indicate the learning reliability of each pixel. In CSR-DCF [10], from the perspective of a probabilistic model, Lukežic et al. [10] compute the probability that pixel $x$ is reliable, conditioned on its appearance $y$, as in formula (5). The first term on the right-hand side of (5) is the appearance likelihood, which is computed from the foreground and background color histograms of the target; the second term is the probability of the high-reliability area in which the object is located, whose value is determined by the distance between pixel $x$ and the object center; the third term can be regarded as a prior probability, which is determined by the sizes of the extracted foreground and background models. The spatial consistency of the labeling $m$ is enforced in a Markov random field by using (5) as unary terms.
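The three-term posterior described above can be written compactly as follows. This is a hedged sketch in the form used by CSR-DCF [10]; the event notation $m = 1$ for "pixel is reliable" is an assumption about the original typesetting.

```latex
p(m = 1 \mid y, x) \;\propto\; p(y \mid m = 1, x)\; p(x \mid m = 1)\; p(m = 1)
```

The factors are, in order, the appearance likelihood, the spatial prior that decays with distance from the object center, and the foreground prior.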
In multi-channel correlation filters, considering that the channels differ in their importance to filter training, Lukežic et al. [10] suggest weighting the filter $h$ of each channel, where the weight is determined by the product of the learning and detection reliabilities. The learning reliability of each channel is determined by the maximum response of a discriminative feature channel $f_d$ with its filter $h_d$, i.e., $w_d = \zeta \max(f_d * h_d)$; the detection reliability is determined by the ratio between the second and the first major modes in the response map, i.e., $w_d^{(\mathrm{det})} = 1 - \min(\rho_{\max 2}/\rho_{\max 1}, 1/2)$; finally, the weight of each channel is the normalized product $w_d \, w_d^{(\mathrm{det})}$.
In the augmented Lagrangian function of our objective function, formula (6), $\lambda$ and $\gamma$ are regularization parameters, $\mu$ is the constraint penalty factor, and $h_c$ is a dual variable with the constraint $h_c - m \odot h \equiv 0$.
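The channel weighting described above can be sketched as follows. This is an illustrative assumption-laden version rather than the authors' code: the function names are hypothetical, the normalizer $\zeta$ is absorbed into the final normalization, negative maxima are clipped to zero, and uniform weights are used if every product is zero.

```python
import numpy as np

def channel_weights(channel_responses, detection_reliabilities):
    """Normalized product of learning and detection reliabilities.

    channel_responses: array of shape (D, H, W), one response map per channel.
    detection_reliabilities: array of shape (D,).
    """
    num_channels = channel_responses.shape[0]
    # Learning reliability: maximum response of each channel (clipped at 0).
    learning = channel_responses.reshape(num_channels, -1).max(axis=1)
    learning = np.maximum(learning, 0.0)
    product = learning * np.asarray(detection_reliabilities, dtype=float)
    total = product.sum()
    if total == 0:
        return np.full(num_channels, 1.0 / num_channels)  # assumed fallback
    return product / total

def fuse_responses(channel_responses, weights):
    """Weighted sum of the per-channel response maps."""
    return np.tensordot(weights, channel_responses, axes=1)
```

A channel with a strong but ambiguous response (low detection reliability) therefore contributes no more than a weaker channel whose response has a single clear peak.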
Let $h_m = m \odot h$ and $\hat{h}_m = \sqrt{D} F M h$; then equation (6) can be rewritten as equation (7), where $F$ is an orthogonal matrix composed of Fourier coefficients. Equation (6) can be minimized iteratively by the alternating direction method of multipliers (ADMM) [14]. In each iteration, the sub-problems (8) and (9) are solved. Their closed-form solutions are obtained by setting the partial derivatives of $L$ in equation (7) with respect to $\hat{h}_c$ and $h$ to zero, respectively.
In equation (7), we denote the first, second, third, fourth, and fifth terms on the right-hand side as $L_1$, $L_2$, $L_3$, $L_4$, and $L_5$, respectively. After some derivation, we obtain equation (10), from which the closed-form solution of equation (8), equation (11), follows. The partial derivative of $L$ with respect to $\hat{h}_c$ is given by equation (12), from which the closed-form solution of equation (9), equation (13), follows. The Lagrange multiplier $\hat{l}$ and the constraint penalty $\mu$ are updated according to equations (14) and (15), respectively. Algorithm 1 gives a brief overview of our proposed tracking framework.
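The alternating structure of ADMM described above can be illustrated on a deliberately simplified problem. The sketch below is not the tracker's objective: it is a real-valued, single-channel, 1-D toy that omits the temporal term and works in the spatial domain, solving $\min \tfrac{1}{2}\|A h_c - g\|^2 + \tfrac{\lambda}{2}\|m \odot h\|^2$ subject to $h_c = m \odot h$ with a binary mask $m$. The function names, the problem itself, and the multiplier and penalty updates ($l \leftarrow l + \mu (h_c - m \odot h)$, $\mu \leftarrow \beta \mu$) are assumptions in the standard ADMM form.

```python
import numpy as np

def circulant(f):
    """Matrix whose product with h is the circular convolution f * h."""
    return np.stack([np.roll(f, k) for k in range(f.size)], axis=1)

def admm_masked_filter(A, g, m, lam=0.01, mu=5.0, beta=3.0, iters=15):
    """Toy ADMM for a mask-constrained filter (m must be a binary mask)."""
    n = A.shape[1]
    h = np.zeros(n)   # filter, forced to zero outside the mask
    l = np.zeros(n)   # Lagrange multiplier
    AtA, Atg = A.T @ A, A.T @ g
    for _ in range(iters):
        # h_c sub-problem: quadratic in h_c, solved as a linear system.
        h_c = np.linalg.solve(AtA + mu * np.eye(n), Atg - l + mu * (m * h))
        # h sub-problem: element-wise closed form inside the mask.
        h = m * (l + mu * h_c) / (lam + mu)
        # Multiplier and penalty updates.
        l = l + mu * (h_c - m * h)
        mu = beta * mu
    return h_c, h
```

Because the penalty grows geometrically, the two variables are driven to agree within a few iterations, which is why a small, fixed iteration count is typically sufficient in this family of trackers.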

IV. EXPERIMENTS
All experiments are conducted on the OTB100 [12], VOT2016 [13], TC128 [52], and UAV123 [2] data sets in MATLAB R2018a on a PC with a 3.6 GHz Intel Core i7 processor and 8 GB of RAM. The proposed tracker is compared with several state-of-the-art trackers using precision plots of one-pass evaluation (OPE), success plots of OPE, accuracy rate, A-R (accuracy-robustness) rank, overlap rate, and pixel error.
In the experiments, we set the reliability map estimation parameter to $\alpha_{\min} = 0.05$, the histogram adaptation rate to $\eta_c = 0.04$, the correlation filter adaptation rate to $\eta = 0.02$, the regularization parameter to $\lambda = 0.01$, the step-size parameter to $\Upsilon = -1$, and the augmented Lagrangian optimization parameters to $\mu = 5$ and $\beta = 3$. These parameters remain constant in all experiments. More detailed parameter settings can be found in the code of the CSR-DCF algorithm, which is the main foundation of our tracker's code.
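For readers re-implementing the setup outside MATLAB, the parameter values listed above can be collected in one place, as in the sketch below. The class and field names are hypothetical. The `adapt` helper shows how an adaptation rate is typically applied as an exponential moving average in CSR-DCF-style trackers; that update rule is an assumption, since the text only states the rates.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class TrackerParams:
    alpha_min: float = 0.05   # reliability map estimation parameter
    eta_c: float = 0.04       # histogram adaptation rate
    eta: float = 0.02         # correlation filter adaptation rate
    lam: float = 0.01         # regularization parameter (lambda)
    step_size: float = -1.0   # step-size parameter, as stated in the text
    mu: float = 5.0           # initial augmented Lagrangian penalty
    beta: float = 3.0         # penalty growth factor

def adapt(previous, current, rate):
    """Exponential moving average update (assumed form of the adaptation)."""
    return (1.0 - rate) * previous + rate * current
```

With `eta = 0.02`, each new frame contributes 2% to the running filter, so the model changes slowly and is less likely to absorb a short occlusion.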

A. COMPARISON OF TRACKERS
To fully demonstrate its tracking performance, the proposed tracker is compared with the following state-of-the-art trackers.
AutoTrack [41], in which the local response map variation is introduced as spatial regularization so that the DCF mainly learns the reliable parts of the object; ARCF [42], in which background patches are added as negative training samples to expand the target search area and alleviate the boundary effects, where ARCF_H [42] uses the HOG feature and ARCF_HC [42] uses the HOG, CN, and grayscale features; the HOG-based BACF [43], in which negative samples generated by real shifts of the foreground are obtained through a zero-padding operation so as to include a larger search area and more real background; C-COT [25], in which the deep neural network VGG-Net is used to extract the target features, and the feature maps of different resolutions are interpolated into the continuous spatial domain through implicit interpolation;
CF2 [44], in which the HOG feature in KCF is replaced with deep convolutional features extracted from the conv3-4, conv4-4, and conv5-4 layers of VGG-Net; CSR-DCF [10], in which channel and spatial reliabilities are introduced and the standard HOG and CN features are used to train correlation filters; DAT_USABLE [45], based on color statistical features, in which distractor-aware tracking (DAT) calculates the color histograms of the foreground and background to obtain their color probability models; ECO [24], based on CNN, HOG, and CN features, which improves C-COT [25] by reducing the number of DCF parameters;
GFSDCF [46], in which a correlation filter tracking method with joint group feature selection across both the channel and spatial dimensions is proposed, and CN, HOG, intensity channel (IC), and CNN features are used; MCCT [23], in which multiple independent DCF-based experts are used to track the target, each constructed with a different combination of deep and HOG features, whereas the experts in MCCT_H are constructed with different combinations of CN and HOG features; SCT4 [47], in which the decomposition and integration of attention modulations are used to track the target; STAPLE [48], in which two complementary features, HOG and color, are used to learn the target; and STRCF [9], in which HOG and CN features are used and DCF model learning and updating are carried out simultaneously, where STRCF_Deep [9] is STRCF with CNN features.
Algorithm 1 Overview of the proposed tracking framework
3. Extract foreground and background histograms of the object ground-truth area in the current frame.
4. Calculate foreground prior with foreground and background histograms.
5. Calculate spatial reliability map $m$ with foreground prior.
6. Extract HOG and CN features $f_t$ of object.
7. Calculate filter $h_t$ and dual variable $\hat{h}_c$ by Eqn. 11 and Eqn. 13.
8. Update Lagrange multiplier $\hat{l}$ by Eqn. 14 and constraint penalty $\mu$ by Eqn. 15.
9. Calculate response with features $f_t$ and filter $h_t$.
10. Calculate per-channel learning reliability $w_d$ with response.
11. Construct tracker and output the object ground-truth $g_t$ as the object bounding box.
12. else
13. Extract HOG and CN features $f_t$ of the object in previous frame with previous tracker.
14. Calculate response with features $f_t$ and previous filter $h_{t-1}$.
14. Find the position of the maximum response $rc_{\max}$.
15. Calculate displacement distance of the object center with $rc_{\max}$ and get the new center of the current frame.
16. Calculate the bounding box of the object with the new center and previous frame tracker.
17. Extract foreground histogram and background histogram of the bounding-box area.
18.

B. THE OTB100 DATA SET
The OTB100 data set [12] also known as OTB2015 [12], contains various types of tracked targets, and includes 100 fully annotated video sequences with 11 different attributes, such as illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutters (BC) and low resolution (LR). Accuracy rate and success rate based on precision plot and success plot are the commonly used evaluation indicators for the OTB data set. The precision plot is used to show the percentage of frames, whose tracking bounding box center positioning error is better than the given threshold, to the total number of frames; whereas the success plot is used to show the percentage of frames, whose tracking bounding box overlap rate is greater than the given threshold, to the total number of frames. The center positioning error on the other hand is the average Euclidean distance between the center of the predicted target bounding box and the center of the artificially labeled ground truth. whiles tThe bounding box overlap rate refers to the ratio of the intersection and union between the target bounding box estimated by tracking algorithm and the ground truth. Thus, the study evaluated the tracking performances of all trackers in this paper on OTB100 with precession plots of OPE, success plots of OPE, overlap rate, and pixel error. The results of the experiment are shown below: Figure 1 shows the precision plots of each tracker for video sequences with different attributes on the OTB100 data set [12] under different location error thresholds. It could be seen from Fig. 
1 that when tracking targets in OCC and BC video sequences, the tracking precisions of our proposed are the best, which are 3.0% and 2.0% higher than CSR-DCF [10]; for the video sequences with OV, our proposed ranks second only to GFSDCF [46] in terms of tracking precision; for targets in the video sequence with DEF, the tracking precision of our proposed tied with ECO [24] for the second place; and when tracking targets in the video sequences with IV, SV, MB, and FM, although our tracking precisions have fallen to some extent, the worst ranking of our proposed in tracking precisions is sixth, which is still better than some state-of-theart trackers, such as STRCF [9], BACF [43], AutoTrack [41], and MCCT_H [24]. Figure 2 also shows the success plots of each tracker to video sequences with different attributes in OTB100 data set under different overlap thresholds. Results from Fig. 2, suggest that when tracking targets in video sequences with OCC, DEF and BC attributes, the AUC scores of our proposed all rank first, and 0.026, 0.035 and 0.027 higher than the tackers in the second place, meanwhile, 0.046, 0.035 and 0.027 higher than CSR-DCF [10], respectively; for the video sequences with OV, FM, and SV, our proposed ranks second only to GFSDCF [46] in AUC scores, and 0.020, 0.009 and 0.056 higher than CSR-DCF [10] respectively; when tracking targets in the video sequences with OPR, IPR, IV and MB attributes, the AUC scores of our proposed all ranks third and all higher than CSR-DCF [10]; for the video sequences with LR attributes, the AUC score of our proposed ranks fourth, but still 0.045 higher than CSR-DCF [10]. On the whole, the tracking success rate of our proposed on OTB100 [12] ranks second with an average AUC score of 0.856, which is only 0.010 lower than the first-ranked GFSDCF [46], but 0.033 higher than the sixth-ranked CSR-DCF [10]. Figure 3 shows trackers' overlap rates and pixel errors on some OTB100 sequences. 
Table 1 shows the average overlap rates and pixel errors of some trackers on all OTB100 sequences; to present the figures clearly, we list only the 11 best-performing trackers out of 17. It can be seen from Fig. 3 and Table 1 that our proposed tracker performs better in these two metrics, with 67% and 13.0 pixels, respectively.
Figure 4 shows the tracking results of some trackers on some OTB100 [12] frames in which the targets are more difficult to track. For the targets in sequences with SV, OCC, DEF, MB, and BC attributes, the results in Fig. 4 show that our proposed tracker tracks better than AutoTrack [41], BACF [43], CSR-DCF [10], MCCT [23], and STRCF [9].
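The precision and success metrics defined for OTB100 above can be sketched in a few lines. This is a minimal illustration and not the benchmark toolkit: the `(x, y, w, h)` box format, the function names, the 20-pixel and 0.5 thresholds, and the sample boxes are all assumptions made for the example, and the AUC is approximated as the mean of the success curve over evenly spaced overlap thresholds.

```python
import math

def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    px, py = pred[0] + pred[2] / 2.0, pred[1] + pred[3] / 2.0
    gx, gy = gt[0] + gt[2] / 2.0, gt[1] + gt[3] / 2.0
    return math.hypot(px - gx, py - gy)

def overlap_rate(pred, gt):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(pred[0] + pred[2], gt[0] + gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[1] + pred[3], gt[1] + gt[3]) - max(pred[1], gt[1]))
    inter = ix * iy
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0

def precision_curve(preds, gts, thresholds):
    """Fraction of frames whose center error is within each pixel threshold."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return [sum(e <= t for e in errors) / len(errors) for t in thresholds]

def success_curve(preds, gts, thresholds):
    """Fraction of frames whose overlap rate exceeds each overlap threshold."""
    overlaps = [overlap_rate(p, g) for p, g in zip(preds, gts)]
    return [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]

def auc(curve):
    """Area under a curve sampled on evenly spaced thresholds (mean value)."""
    return sum(curve) / len(curve)

# Hypothetical three-frame sequence: a perfect hit, a small drift, a miss.
gts = [(10, 10, 20, 20), (12, 10, 20, 20), (14, 10, 20, 20)]
preds = [(10, 10, 20, 20), (17, 10, 20, 20), (44, 10, 20, 20)]

overlap_thresholds = [i / 20 for i in range(21)]  # 0, 0.05, ..., 1 (assumed)
print("precision at 20 px:", precision_curve(preds, gts, [20])[0])
print("success at 0.5 overlap:", success_curve(preds, gts, [0.5])[0])
print("AUC:", auc(success_curve(preds, gts, overlap_thresholds)))
```

Averaging `center_error` and `overlap_rate` over all frames of a sequence gives per-sequence pixel error and overlap rate figures of the kind reported in Table 1.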

C. THE VOT2016 DATA SET
The VOT2016 data set [13], used in the 2016 VOT (Visual-Object-Tracking) challenge, contains 60 challenging public image sequences covering toys, faces, vehicles, animals, and many other common target categories, all annotated with attributes such as camera_motion, empty, illum_change, motion_change, occlusion, and size_change. In addition, some of the image sequences in the VOT2016 data set are the same as those in the OTB data set. To demonstrate the tracking performance of our proposed tracker, we ran it on VOT2016 [13] and measured the tracking accuracy rate, A-R rank, AR plot, overlap rate, and pixel error of all trackers. Table 2 presents the tracking accuracy rates and A-R rank values for the video sequences of the VOT2016 data set [13]. Table 2 shows that our tracking accuracy rate ranks first for the video sequences of almost all attributes, the exception being size_change, with values 3.9%, 3.0%, 6.7%, 1.9% and 16.7% higher than CSR-DCF [10]. In particular, for video sequences with occlusion, our tracking accuracy rate is 16.7% higher than that of CSR-DCF [10], indicating that the time regularization information of samples is helpful in improving the tracking performance on occluded targets. For targets with size_change, our tracker ranks second only to GFSDCF [48] in tracking accuracy rate and is 1.6% higher than CSR-DCF [10]. In general, our tracker performs favorably against the state-of-the-art trackers on the VOT2016 data set [13] in tracking accuracy rate, which is 5.7% higher than that of CSR-DCF [10]. It can further be observed from Table 2 that the A-R rank of our tracker exceeds that of the tied second-place trackers, CSR-DCF [10] and GFSDCF [46], by 3.7%.
Table 3 shows the average overlap rates and pixel errors of some trackers on all VOT2016 [13] sequences. It can be seen that our proposed tracker ranks second in average overlap rate and third in pixel error, with 50.2% and 43.19 pixels, respectively. Figure 5 shows the overlap rates and pixel errors of 11 trackers on some VOT2016 [13] sequences, which indicates that our proposed tracker stands out in both overlap rate and pixel error.
Figure 6 shows the AR plots of all trackers used in this paper for mean, camera_motion, empty, illum_change, motion_change, occlusion, and size_change. They suggest that our tracking accuracy rate ranks first for the video sequences of almost all attributes in VOT2016 [13], the exception being size_change, followed by GFSDCF [46] and CSR-DCF [11], which is consistent with the results in Table 2. The abscissa values in Fig. 6 also indicate that the tracking robustness values of our tracker for all attributes are not as good as those of other trackers, although they are all higher than 0.8. Most of the other trackers thus gain robustness at the expense of tracking accuracy, and only the tracker proposed in this paper maintains a balance between tracking accuracy and robustness, tracking the targets both accurately and robustly. Figure 7 shows the object tracking results of some trackers on some challenging VOT2016 [13] frames. The tracking results in frames with camera_motion, illum_change, occlusion, and size_change displayed in Fig. 7 suggest that our proposed tracker outperforms some state-of-the-art trackers, such as CSR-DCF [10], STRCF [9], AutoTrack [41], and STRCF_Deep [9].
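The two axes of an AR plot can be illustrated with a short sketch. The text does not spell out how they are computed, so the following assumes the convention commonly used for VOT-style AR plots: accuracy is the mean overlap over the frames on which the tracker is evaluated, and the robustness axis shows a reliability value exp(-S·F/N) for F failures in N frames. The function names, the window length `s`, and the sample numbers are hypothetical.

```python
import math

def accuracy(overlaps):
    """Mean overlap over the evaluated frames (ordinate of the AR plot)."""
    return sum(overlaps) / len(overlaps)

def reliability(num_failures, num_frames, s=100):
    """Assumed robustness value exp(-S * F / N) (abscissa of the AR plot).

    A value of 1.0 means no failures; more failures per frame push it to 0.
    The default window length s is an assumption, not taken from the paper.
    """
    return math.exp(-s * num_failures / num_frames)

# Hypothetical sequence: 1000 frames, 2 failures, a few sampled overlaps.
sample_overlaps = [0.62, 0.55, 0.48, 0.51]
print("accuracy:", accuracy(sample_overlaps))
print("reliability:", reliability(num_failures=2, num_frames=1000))
```

Under this assumed form, a tracker that never fails sits at a reliability of 1.0 regardless of its overlap, which is why accuracy and robustness have to be read together.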

D. THE TC128 DATA SET
The TC128 data set [52], which collects 78 new visually challenging videos on the basis of the OTB50 data set, contains 128 color sequences with ground truth and challenge factor annotations, such as IV, SV, OCC, DEF, MB, FM, and IPR. The targets to be tracked in these 128 sequences are diverse, including pedestrians, basketballs, ships, cars, cups, animals, toys, fish, kites, and airplanes. Figure 8 shows the precision plots of each tracker for video sequences with different challenge factor annotations under different location error thresholds. It can be seen from Fig. 8 that for IV, OCC, and OV video sequences, the tracking precision of our proposed tracker is the best, with margins of 0.8%, 0.4% and 4.6% to CSR-DCF [10]; our proposed tracker ranks second in BC, MB, and SV video sequences and third in video sequences with FM, IPR, and OPR; for the targets in sequences with DEF and LR, although its tracking precision falls to some extent, it still ranks fourth and is better than some state-of-the-art trackers. On average, the tracking precision of our tracker ranks first. Figure 9 shows the success plots of each tracker. Fig. 9 suggests that for BC, MB, DEF, IV, IPR, LR, OCC, and OV video sequences, the AUC scores of our tracker all rank first; for video sequences with FM, OPR, and SV, our tracker ranks second only to GFSDCF [46] in AUC score and is 0.04, 0.087 and 0.098 higher than CSR-DCF [10], respectively. On the whole, the tracking success rate of our proposed tracker on TC128 ranks second with an average AUC score of 0.785, with a margin of 0.013 to the second GFSDCF [46] and of 0.096 to the eighth-ranked CSR-DCF [10]. Figure 10 shows the overlap rates and pixel errors of some state-of-the-art trackers on some TC128 video sequences. It can be seen from Fig. 10 that our proposed tracker has higher overlap rates and lower pixel errors.
Table 4 shows the average overlap rates and pixel errors on all TC128 sequences. Our proposed tracker ranks first in both average overlap rate and pixel error, with 64.0% and 20.82 pixels, respectively. Figure 11 shows the tracking results of some trackers on some TC128 frames. The results suggest that for the targets in sequences with BC, MB, DEF, IV, IPR, LR, and OCC, our proposed tracker tracks the targets more accurately and outperforms some state-of-the-art trackers, such as CSR-DCF [10], STRCF_Deep [9], and GFSDCF [46].

E. THE UAV123 DATA SET
The UAV123 data set [2], captured by unmanned aerial vehicles (UAV) at low altitude, contains 123 challenging videos with ground truth and 12 kinds of challenge factor annotations: IV, SV, Partial Occlusion (POC), Full Occlusion (FOC), Out-of-View (OV), FM, Camera Motion (CM), Similar Object (SOB), Aspect Ratio Change (ARC), Viewpoint Change (VC), BC, and LR. The videos are mostly shot from above looking down; some are shot in real scenes, while others are constructed in virtual environments. Since each video contains a relatively large number of frames, the data set is often used to evaluate the long-term tracking performance of an object tracker. The main targets are pedestrians, ships, airplanes, and cars, and the data set also contains many small targets. These factors place high performance demands on the tracker under test. Figure 12 shows the precision plots of each tracker. It can be seen from Fig. 12 that for SV, SOB, POC, LR, IV, FOC, FM, CM, BC, ARC, VC, and OV video sequences, our tracker ranks second only to GFSDCF [46] and CSR-DCF [10]; for OV video sequences, although our tracker ranks only fourth, it is still better than most state-of-the-art trackers. On average, the tracking precision of our tracker ranks second with 84.1%. Figure 13 shows the success plots of each tracker. It can be seen from Fig. 13 that the AUC score of our tracker ranks first for BC video sequences, with a margin of 0.04 to CSR-DCF [10]; for targets with LR and FOC annotations, our tracker ranks second; for video sequences with SOB, SV, OV, IV, FM, and CM annotations, it ranks third; and although its success plots for targets with POC, ARC, and VC annotations are poorer, its worst ranking is fifth, which is still better than trackers such as CSR-DCF [10] and STRCF [9].
On the whole, the tracking success rate of our tracker on UAV123 ranks fourth with an average AUC score of 0.644, a margin of 0.048 to CSR-DCF [11]. Figure 14 shows the overlap rates and pixel errors of some state-of-the-art trackers on some UAV123 video sequences. It can be seen from Fig. 14 that our proposed tracker has higher overlap rates and lower pixel errors. Table 5 shows the average overlap rates and pixel errors on all UAV123 sequences: the average overlap rate of our proposed tracker ranks fourth, and its pixel error ranks second. The results in Fig. 14 and Table 5 suggest that our tracker performs well on the UAV123 data set. Figure 15 shows the tracking results of some trackers on some challenging UAV123 frames. It can be observed from Fig. 15 that for the targets in sequences with BC, LR, FOC, etc., our proposed tracker successfully and accurately tracks the targets, which demonstrates its object tracking ability.

V. EXPERIMENTAL ANALYSIS
Experimental results on the OTB100 [12], VOT2016 [13], TC128 [52], and UAV123 [2] data sets suggest that the proposed correlation filters tracker with spatial and channel reliabilities and time regularization can effectively alleviate the boundary effects by making full use of the spatial, channel, and temporal information of samples. Especially for targets with significant appearance variations, the tracking performance of our proposed tracker is better than that of some state-of-the-art trackers, such as GFSDCF [46], CSR-DCF [10], MCCT [23], and C-COT [25].
Moreover, the experimental results demonstrate that the tracking precision, success rate, overlap rate, and pixel error of our proposed tracker are better than those of CSR-DCF [10] in most video sequences in the OTB100 [12], VOT2016 [13], TC128 [52], and UAV123 [2] data sets, suggesting that the time regularization information of samples is effective in alleviating the boundary effects and improving the tracking performance of correlation filters. It can further be inferred that combining the time regularization information with the channel and spatial reliabilities of samples is important for training correlation filters with stronger target tracking ability.

VI. CONCLUSION
In this paper, we proposed a multi-information fusion correlation filters tracker that differs from common correlation filter trackers: for the first time, the time regularization information of samples is introduced into correlation filters with sample spatial and channel reliabilities, and the filter is trained to alleviate the boundary effects and improve object tracking ability and robustness. In addition, using the alternating direction method of multipliers (ADMM) to solve the objective function of the proposed tracker reduces its time complexity. Extensive experiments on the OTB100, VOT2016, TC128, and UAV123 data sets further demonstrated that our proposed tracker with HOG and CN features performs favorably against some state-of-the-art trackers, such as STRCF, CSR-DCF, MCCT, and AutoTrack, in terms of tracking precision, success rate, tracking accuracy, A-R rank, overlap rate, and pixel error. In particular, our proposed tracker performs better on targets with more significant appearance variations. Finally, we suggest that future studies introduce deep features into the proposed tracker to further improve its object tracking performance.