Vehicle Tracking on Satellite Video Based on Historical Model

Vehicle tracking on satellite videos poses a challenge for existing object tracking algorithms because the objects have few features, are often occluded, and are surrounded by similar-looking objects. To improve tracking performance, a historical-model-based tracker intended for satellite videos is proposed in this study. It updates the tracker by using the historical model of each frame in the video, which contains plenty of object information and background information, so as to improve tracking ability on few-feature objects. Furthermore, a historical model evaluation scheme is designed to obtain reliable historical models, which ensures that the tracker is sensitive to the object in the current frame, thus avoiding the impact caused by changes in object appearance and background. In addition, to solve the tracker drift caused by object occlusion and the appearance of similar objects, an antidrift tracker correction scheme is proposed. In comparative experiments conducted on the satellite video dataset SatSOT, our tracker achieves excellent performance. Moreover, sensitivity analysis, varying criteria comparative experiments, and ablation experiments demonstrate that the proposed schemes are effective in improving the precision and success rate of the tracker.

the computer. Due to the diversity of moving objects and video equipment, it is necessary to adapt the object tracking algorithm to different data sources. Examples include ordinary video [1]-[3], unmanned aerial vehicle (UAV) video [4], [5], thermal infrared video [6], synthetic aperture radar video [7], [8], and satellite video [9]-[21], an emerging type of space-based video data in recent years. Satellite video offers a wide shooting range, high resolution, and the capability of continuously monitoring target objects on land.
Object tracking on satellite videos has a wide range of applications in national defense, environmental protection, disaster prevention, and traffic monitoring [12], [22]. Our study focuses on vehicle tracking, which is vital for traffic monitoring and military reconnaissance. Besides, owing to the wide shooting range of satellite videos, long-distance tracking is achievable, which is conducive to analyzing driving motives and trajectories.
There are some issues when the vehicles are tracked on satellite videos. The first one is a small number of features and textures due to the small size of the object. Then, the vehicle is often occluded by trees, bridges, and other obstacles. Besides, similar objects often appear around the tracked object. Thus, it is a challenge to apply the existing object tracking algorithms to satellite videos.
According to the principle followed by the algorithm, the existing object tracking algorithms can be divided into two categories: generative models [23]-[26] and discriminative models [27]-[33]. Generative models focus on the characteristics of the object itself and track the object by iteratively searching for the most similar region frame by frame. Some classical tracking algorithms, such as MeanShift [23], CAMShift [24], particle filter [25], and optical flow [26], are classed as generative models. In contrast, discriminative models focus on the difference between the object and the background and take background information into consideration, e.g., MOSSE [27], CSK [28], KCF [29], and STC [30]. With the widespread application of deep learning, object tracking algorithms combined with deep learning have been gradually developed. Some algorithms extract deep convolutional features, e.g., C-COT [34] and ECO [35], both of which use the VGGNet [44], whereas others use end-to-end object tracking methods, e.g., CFNet [45], SiamFC [46], and MDNet [47]. Because the generative algorithm models only the object itself and lacks background information, it is unreliable for the few-feature objects on satellite videos. By contrast, the discriminative algorithm introduces background information, which gives it an advantage over the generative algorithm. In particular, the correlation filter (CF), which is one of the discriminative algorithms, is widely used in object tracking on satellite videos due to its high speed and high precision. Although algorithms based on deep learning are equally effective, they are disadvantaged by heavy computational workload and slow speed. In some cases, the few and weak features of the objects are difficult for neural networks to learn.
In general, CF-based algorithms are superior to generative algorithms in accuracy and superior to deep-learning-based algorithms in speed, which gives them an advantage on satellite videos with larger image sizes and fewer features. For this reason, the CF is applied in this study.
For most of the current CF algorithms, the focus is on ridge regression. By learning the features of the region of interest (ROI), they can classify the object and background to locate the object. When the filter is updated, most algorithms use a single ROI in the current frame and ignore the historical information of the previous frames, which wastes information. Nevertheless, historical information contains a great deal of the object information and background information required for modeling the object. In particular, background information can be fed into the CF as negative samples, which can improve the tracking performance for few-feature objects. For this reason, a historical-model-based tracker for satellite videos (HMTS) is proposed in this study, as illustrated in the flow diagram (see Fig. 1). Historical models (HMs) are applied to update the tracker. However, not all HMs are suitable due to changes in object appearance and background during the tracking process. Hence, a scheme is used to find the HMs that are best suited to the current frame. In addition, since the vehicle is often occluded by obstacles or surrounded by similar objects, a strategy is adopted to detect the vehicle state and prevent the drift of the tracker. The peak and kurtosis (PK) of the response map of the CF [50] are employed to monitor the vehicle state. Additionally, a Kalman filter (KF) is adopted to correct the tracking result to avoid drift. To sum up, the contributions of this study are as follows:
1) We design a novel tracker, HMTS, to improve the tracking performance on small-size objects, thus addressing the tracking failure caused by a lack of features. It improves the acquisition of object information by utilizing historical information, ensures the reliability of tracking, and reduces the waste of historical information.
2) We propose an HM evaluation scheme to avoid the impact caused by changes in object appearance and background. The scheme relies on cross-correlation functions to measure the similarity between the response map and the ideal regression targets. HMs are scored by this scheme to identify the best-suited ones for the current frame.
3) We construct a tracker correction scheme to prevent the tracker drift caused by object occlusion and the appearance of similar objects. The scheme uses KF to estimate the motion of the object and PK to evaluate the response map. Based on PK, the trajectory of the tracker is corrected by the prediction function of KF.
The rest of this article is organized as follows. In Section II, the relevant work is summarized. Section III presents the design of the proposed tracker HMTS. In Section IV, the experiments are detailed, including configurations, experimental design, experimental analysis, and experimental results. Finally, Section V concludes this article.

A. CF-Based Object Tracking
The emergence of CF marks a key milestone in the development of object tracking algorithms. By using the Fourier transform to convert the originally complex matrix operations into calculations in the frequency domain, it accelerates the calculation process. At the same time, CF increases the use of background information, which improves the accuracy of tracking. In the earliest CF algorithm, MOSSE [27], the least squares method is used to distinguish between the background and the object. Then, CSK [28] and KCF [29] adopt a kernel CF and introduce the circulant matrix, which increases the number of samples and thus improves the accuracy of training. Following KCF, scholars have successively improved the KCF framework to develop many CF-based algorithms. There are three main approaches to improving the CF. The first one is to apply a suitable feature extraction algorithm [34]-[37]. In [36], color names (CNs) are used, whereas in [34], deep convolutional features are used. The second one is to adapt to changes in object scale by introducing a scale-adaptive mechanism [38], [39]. The mechanism is effective in increasing the success rate of the tracker. The last one is to improve the structure of the filter [40]-[43], e.g., BACF [41] adds a spatial regularization term, and AutoTrack [43] adds a temporal regularization term.

B. Object Tracking on Satellite Video
The existing object tracking algorithms have been widely used for the tracking of objects on satellite videos. The traditional generative model is applied in [9] and [10], the CF algorithm is used in [11]-[17], and deep learning is adopted in [18]-[21]. At present, the traditional generative model is rarely adopted for satellite videos, mainly because of its low accuracy and slow speed. By contrast, the discriminative model has been more commonly used. A major solution is to improve CF, which can be achieved from three perspectives. The first one is to change the feature extraction algorithm, the second one is to use a motion model, and the last one is to introduce a tracker state monitoring scheme. To change the feature extraction algorithm, Shao et al. [12] combined the LK optical flow method and the histogram of oriented gradient (HOG) feature extraction algorithm to extract features, whereas Wu et al. [15] combined Hu CF and median filter to extract features. Due to the rotation invariance of the Hu invariant moment, this algorithm performs well in tracking rotating objects. In [17], a Gabor filter is employed to extract features, which leads to excellent performance for objects with textures. In terms of motion models, KF is frequently used to predict the position of the moving object [14]-[16], which improves the robustness of the algorithm. As for the tracker state monitoring scheme, Wang et al. [17] proposed the tracking status monitoring indicators (TCMIs) based on the Bayesian framework. If the object is occluded, TCMI can be relied on to guide the tracker, with updating terminated to prevent model drift. In addition, Xuan et al. [16] used the peak value of the response map to monitor tracker states and guide the updating of the tracker.

C. Benchmark Dataset Based on Satellite Video
In addition to improvements in algorithms, benchmark datasets based on satellite video have also been proposed to make the verification of algorithms more reliable. Yin et al. [48] constructed a large-scale satellite video dataset with a wide range of annotations. At the same time, a benchmark was proposed to evaluate the performance of algorithms, e.g., multiobject tracking algorithms and single-object tracking algorithms on satellite videos. Zhao et al. [49] put forward another densely annotated satellite video dataset, which is intended for evaluating single-object tracking algorithms.

III. PROPOSED METHOD
This section explains how the HM, the cross-correlation function, PK, and KF are applied to CF-based object tracking. First, a tracker HMTS based on the historical model is proposed to track the object. Then, an HM evaluation scheme is introduced to validate each HM, so as to choose the HMs that are sensitive to the object in the current frame. Finally, a scheme is introduced to correct the tracker by using KF and PK.

A. Overall Architecture
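In our tracker, kernel CF (KCF) [29] is taken as the baseline. Suppose there is a vectorized feature map $X$, $X \in \mathbb{R}^{D}$. KCF solves the following ridge regression problem:

$$\min_{w} \sum_{i=1}^{D} \left( w^{T} X(i) - y(i) \right)^{2} + \lambda \lVert w \rVert_{2}^{2} \tag{1}$$

where $w$ represents a filter; $X(i)$ refers to the $i$th circular shift of $X$; $y(i)$ indicates the regression target for $X(i)$; $\lambda \lVert w \rVert_{2}^{2}$ denotes a shrinkage penalty, which controls overfitting; $\lambda$ is referred to as the penalty parameter; and the superscript $T$ refers to transposition.
By mapping the linear inputs $X(i)$ to nonlinear space $\phi(X(i))$, the filter $w$ can be expressed as a linear combination of $\phi(X(i))$, i.e., $w = \sum_{j=1}^{D} \alpha(j)\,\phi(X(j))$. In nonlinear space, as opposed to linear space, the feature map can be simply split into the object and background. Additionally, the kernel trick can be used to express $w^{T}\phi(X(i))$ as

$$w^{T}\phi(X(i)) = \sum_{j=1}^{D} \alpha(j)\, K\big(\phi(X(j)), \phi(X(i))\big) = \mathbf{K}_{XX}^{i}\, \alpha \tag{2}$$

where $K(\cdot)$ refers to the kernel function (e.g., radial basis function) used to compute the dot-product of nonlinear space; $\mathbf{K}_{XX}$ represents the kernel matrix with elements $\mathbf{K}_{XX}(i, j) = K(\phi(X(j)), \phi(X(i)))$; the $i$th row of $\mathbf{K}_{XX}$ is referred to as $\mathbf{K}_{XX}^{i}$; and $\alpha$ is the vector of coefficients $\alpha(i)$ $(i = 1, \cdots, D)$. Similarly, $\lVert w \rVert_{2}^{2}$ can be expressed as

$$\lVert w \rVert_{2}^{2} = \alpha^{T} \mathbf{K}_{XX}\, \alpha \tag{3}$$

Thus, (1) can be reformulated as

$$\min_{\alpha} \sum_{i=1}^{D} \left( \mathbf{K}_{XX}^{i}\, \alpha - y(i) \right)^{2} + \lambda\, \alpha^{T} \mathbf{K}_{XX}\, \alpha \tag{4}$$

The solution of (4) is expressed as

$$\alpha = \left( \mathbf{K}_{XX} + \lambda I \right)^{-1} y \tag{5}$$

where $y$ is defined as $y = [y(1), \cdots, y(D)]^{T}$ and $I$ refers to a $D \times D$ identity matrix. The filter $w$ is replaced by $\alpha$. $\mathbf{K}_{XX} + \lambda I$ can be reformulated as the circulant matrix $C(K^{XX} + \lambda\delta)$ with $\delta = [1, 0, \cdots, 0]$, where $K^{XX}$ refers to the first row of $\mathbf{K}_{XX}$. By using the discrete Fourier transform (DFT), (5) can be expressed as

$$\hat{\alpha} = \frac{\hat{y}}{\hat{K}^{XX} + \lambda} \tag{6}$$

where the hat denotes the DFT of a vector and the division is elementwise.
When the KCF filter is used, the filter $\hat{\alpha}_{T-1}$ is computed by (6) from $z_{T-1}$, which refers to a target-centered patch of the last frame $T-1$, i.e., a patch whose target is located in the center. Suppose that a new patch $x_{T}$ is sampled in the current frame $T$, and the location of the target is detected by

$$R_{T} = F^{-1}\left( \hat{K}^{\bar{z}_{T} x_{T}} \odot \hat{\bar{\alpha}}_{T} \right) \tag{7}$$

where $F^{-1}$ denotes the IDFT operator, $\odot$ denotes the elementwise product, and $\hat{K}^{\bar{z}_{T} x_{T}}$ is computed by $x_{T}$ and the sample model $\bar{z}_{T}$. The position of the target is determined by the position of the maximum value of $R_{T}$. Typically, the classical KCF does not use the filter $\hat{\alpha}$ derived from (6) directly when an object is detected. Instead, linear interpolation is performed to update the filter $\hat{\bar{\alpha}}_{T}$ as follows:

$$\hat{\bar{\alpha}}_{T} = (1 - \beta)\, \hat{\bar{\alpha}}_{T-1} + \beta\, \hat{\alpha}_{T-1} \tag{8}$$

where $\beta$ represents the coefficient of interpolation. Similarly, the sample model $\bar{z}_{T}$ is also derived from linear interpolation

$$\bar{z}_{T} = (1 - \gamma)\, \bar{z}_{T-1} + \gamma\, z_{T-1} \tag{9}$$

where $\gamma$ is also referred to as the coefficient of interpolation. By using (8) and (9), the tracker can be updated at a fixed learning rate.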
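To make the frequency-domain training and detection steps above concrete, the following is a minimal sketch rather than the tracker's actual implementation. It assumes a one-dimensional signal instead of a two-dimensional feature map, a linear kernel instead of the radial basis function, and a naive DFT written with the standard library; the function names `dft`, `kernel_correlation_hat`, `train_filter`, and `detect` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) DFT; with inverse=True it computes the IDFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def kernel_correlation_hat(model, patch):
    """DFT of the linear-kernel correlation: element tau is sum_m model[m] * patch[m + tau]."""
    return [m.conjugate() * p for m, p in zip(dft(model), dft(patch))]


def train_filter(x, y, lam=1e-4):
    """Closed-form ridge-regression filter: alpha_hat = y_hat / (k_hat + lambda), elementwise."""
    k_hat = kernel_correlation_hat(x, x)
    return [yk / (kk + lam) for yk, kk in zip(dft(y), k_hat)]


def detect(alpha_hat, model, patch):
    """Response map (IDFT of k_hat * alpha_hat) and the index of its maximum."""
    k_hat = kernel_correlation_hat(model, patch)
    spectrum = [k * a for k, a in zip(k_hat, alpha_hat)]
    response = [v.real for v in dft(spectrum, inverse=True)]
    peak = max(range(len(response)), key=response.__getitem__)
    return peak, response


n = 16
x = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05, 0.6,
     0.4, 0.95, 0.15, 0.5, 0.35, 0.75, 0.25, 0.65]
# Gaussian regression targets centred on shift 0 (circular distance, assumed sigma = 1.5).
y = [math.exp(-min(i, n - i) ** 2 / (2 * 1.5 ** 2)) for i in range(n)]
alpha_hat = train_filter(x, y)
shifted = x[-3:] + x[:-3]  # the same signal moved 3 samples to the right
peak, response = detect(alpha_hat, x, shifted)
print("estimated shift:", peak)
```

In this toy setting the index of the response maximum recovers the displacement of the signal, which is the role the maximum of the response map plays when the target is located.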
Updating the filter with a fixed proportion lets the filter retain part of the historical information and limits the interference from new information. However, the fixed learning rate also causes the tracker to treat all the historical information indiscriminately, which wastes historical information. Meanwhile, since the initial value of the tracker is obtained from the first frame, the information contained in the first frame is exponentially attenuated as the tracker is updated [see (8) and (9)]. With a low learning rate, $\hat{\alpha}_{1}$ and $z_{1}$ may account for a large proportion of the filter and sample model, respectively, in the early stage of tracking, which has a significant impact on the tracker. Therefore, the tracker mainly recognizes an object that resembles its appearance in the first frame. Hence, when the features of the object or the background change, the tracking may fail.
To make full use of historical information, our tracker retains the filter and the target-centered patch for each frame as HMs $\{\hat{\alpha}_{i}, z_{i}, s_{i}\}_{i=1}^{T-1}$, where $s_{i}$ represents the score of each HM. The approach to score learning is detailed in Section III-B. Suppose that a new patch $x_{T}$ is sampled in the current frame $T$; then the filter $\hat{\bar{\alpha}}_{T}$ is calculated by (10), and the sample model $\bar{z}_{T}$ is calculated by (11), where linear interpolation is still performed to update our filter and sample model at the learning rates $\beta$ and $\gamma$, respectively. At each frame, $\hat{\dot{\alpha}}_{T-1}$ and $\dot{z}_{T-1}$ are calculated. They both consist of the HMs. The scores $s_{i}$ $(i = 1, \cdots, T-1)$ ensure that an $\hat{\alpha}_{i}$ that is more sensitive to $x_{T}$ accounts for a larger proportion in $\hat{\dot{\alpha}}_{T-1}$. In the same way, the scores ensure that a $z_{i}$ that is more similar to $x_{T}$ accounts for a larger proportion in $\dot{z}_{T-1}$. In these operations, $\hat{\dot{\alpha}}_{T-1}$ and $\dot{z}_{T-1}$ classify the object and background in $x_{T}$ with high precision. In the meantime, the HMs are fully used, with the historical information treated differently. In addition, when the features of the object or background change, the scores can be used to reduce the impact of the initial value by adjusting the proportion of the initial value dynamically, because the scores change in each frame. After the filter $\hat{\bar{\alpha}}_{T}$ and the sample model $\bar{z}_{T}$ are obtained, the response of $x_{T}$ can be calculated by (12). The position of the maximum value of $R_{T}$ is that of the object.
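The score-weighted update described above can be illustrated with a short sketch. The exact combination rule is not reproduced here, so the code assumes one plausible form: the HMs are averaged with weights proportional to their scores, and the result is blended into the running model by linear interpolation at a fixed learning rate. The names `weighted_model` and `interpolate` are hypothetical, and each model is represented as a plain list of numbers.

```python
def weighted_model(models, scores):
    """Score-weighted average of per-frame models (assumed normalisation by the score sum)."""
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one historical model needs a nonzero score")
    dim = len(models[0])
    return [sum(s * m[d] for s, m in zip(scores, models)) / total
            for d in range(dim)]


def interpolate(previous, new, rate):
    """Linear interpolation: (1 - rate) * previous + rate * new."""
    return [(1 - rate) * p + rate * n for p, n in zip(previous, new)]


# Three toy historical models; the first one has been scored 0 and is ignored.
models = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 3.0, 1.0]
blended = weighted_model(models, scores)
updated = interpolate([0.0, 0.0], blended, rate=0.02)
print(blended, updated)
```

Because the weights depend on the scores, a model scored 0 contributes nothing, whereas a fixed learning rate alone would keep every past frame in the mix with an exponentially decaying weight.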

B. HM Evaluation Scheme
In Section III-A, it is proposed to update the tracker by using HMs. In order to improve the capability of the tracker to distinguish between the object and the background, it is necessary to improve the sensitivity of the selected filters to the object. Meanwhile, the selected sample models must be similar to the object. In this section, an HM evaluation scheme will be introduced to obtain the appropriate filters and sample models by scoring the HM.
The cross-correlation function is applied to measure the correlation of two signals in signal analysis. The discrete cross-correlation function is expressed as follows:

$$S(\tau) = \sum_{n=-\infty}^{+\infty} \varphi_{1}(n)\, \varphi_{2}(n + \tau) \tag{13}$$

where $\varphi_{1}$ and $\varphi_{2}$ represent signals. $S(\tau)$ is referred to as the similarity of two signals when one of them is applied to a $\tau$-step discrete circular shift. The higher the value of $S(\tau)$, the higher the similarity of the two signals. The maximum value of $S(\tau)$ $(\tau \in (-\infty, +\infty))$ can be taken to represent global similarity. Meanwhile, the DFT is used to define the discrete cross-correlation function as

$$S = F^{-1}\left( \hat{\varphi}_{1}^{H} \odot \hat{\varphi}_{2} \right) \tag{14}$$

where $S$ represents the vector of $S(\tau)$ $(\tau \in (-\infty, +\infty))$ and the superscript $H$ refers to conjugate transposition.
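As a small illustration of the global similarity described above, the sketch below evaluates the cross-correlation of two finite signals under circular shifts by direct summation and takes its maximum. It assumes real-valued signals of equal length and no normalization; the names `circular_cross_correlation` and `global_similarity` are hypothetical.

```python
def circular_cross_correlation(phi1, phi2):
    """S(tau) = sum_m phi1[m] * phi2[(m + tau) mod n] for tau = 0, ..., n - 1."""
    n = len(phi1)
    if len(phi2) != n:
        raise ValueError("signals must have the same length")
    return [sum(phi1[m] * phi2[(m + tau) % n] for m in range(n))
            for tau in range(n)]


def global_similarity(phi1, phi2):
    """Maximum of the cross-correlation over all circular shifts."""
    return max(circular_cross_correlation(phi1, phi2))


# The second signal is the first one shifted by two samples.
s = circular_cross_correlation([0, 1, 0, 0], [0, 0, 0, 1])
print(s, global_similarity([0, 1, 0, 0], [0, 0, 0, 1]))
```

The direct sum costs O(n^2) operations; the DFT form gives the same vector in O(n log n) with a fast transform, which matters for full-size response maps.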
By taking the regression targets and the response as signals, the global similarity of them can be calculated. In theory, the better the performance of the tracker, the higher the global similarity between the response and the regression targets. According to this theory, an HM evaluation scheme is designed to score the HMs.
In frame $T$, after position detection is completed, a new target-centered patch $z_{T}$ can be obtained and a new filter $\hat{\alpha}_{T}$ can be calculated by (6). HMs are updated to $\{\hat{\alpha}_{i}, z_{i}, s_{i}\}_{i=1}^{T}$. In order to obtain $s_{i}$ $(i = 1, \cdots, T)$, each HM is first used as a tracker on its own. Subsequently, the responses $R_{i}$ on $z_{T}$ are obtained by (15). Then, the discrete cross-correlation function is applied to $R_{i}$ $(i = 1, \cdots, T)$ and the regression targets $y$ to obtain $S_{i}$ by (16). The score $s_{i}$ $(i = 1, \cdots, T)$ for the $i$th HM in frame $T$ is given by (17), where $\max(\cdot)$ denotes the maximum of the vector; $\mathrm{Rank}(\max(S_{i}))$ indicates the index of $\max(S_{i})$ in the set $\{\max(S_{i})\}_{i=1}^{T}$, which is ranked in descending order; and $\theta$ refers to the threshold.
By using this scheme, the HM, which performs better in distinguishing between the object and background in z T , is given a higher score. Meanwhile, since the HM with low global similarity is irrelevant to confirming the object location, the scores of this kind of HMs are set to 0. All scores are used to update the tracker [see (10) and (11)].
Under this scheme, the historical information can be applied flexibly. Besides, the filter and the sample model are made more robust for object tracking.
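The ranking-and-thresholding idea of the evaluation scheme can be sketched as follows. The exact scoring rule of (17) is not reproduced here, so this is only an illustration under two stated assumptions: the threshold is read as the percentage of top-ranked HMs that keep a nonzero score, and a kept HM receives its global similarity as its score. The function name `score_models` and the parameter `theta_percent` are hypothetical.

```python
import math


def score_models(similarities, theta_percent=70):
    """Keep the global similarity as score for top-ranked HMs; set the rest to 0.

    similarities: one global similarity value per historical model.
    theta_percent: assumed meaning of the threshold (share of HMs kept, in percent).
    """
    n = len(similarities)
    keep = math.ceil(n * theta_percent / 100)
    # Indices sorted by similarity, best first (rank 1 is the most similar HM).
    order = sorted(range(n), key=lambda i: similarities[i], reverse=True)
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= keep:
            scores[idx] = similarities[idx]
    return scores


half = score_models([0.9, 0.2, 0.7, 0.4], theta_percent=50)
most = score_models([0.9, 0.2, 0.7, 0.4], theta_percent=70)
print(half, most)
```

Whatever the precise rule, the effect matches the text: HMs with low global similarity get a score of 0 and drop out of the next update.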

C. Antidrift Tracker Correction Scheme
It is a challenge to prevent tracker drift for object tracking. In general, tracker drift occurs when object occlusion happens and similar objects appear. In these cases, a significant deviation can arise between the maximum position of the response map and the ground truth of the object. Besides, the response map fluctuates significantly or shows multiple peaks. Therefore, it is necessary to monitor the tracking state and correct trajectory in the process of tracking. In this section, it will be explained how the KF and the PK of the response map can be applied to correct the tracker.
Assume that the ground truth of the vehicle is represented by $\{p_{t}\}_{t=1}^{T}$. The motion of the vehicle can be expressed as the dynamic equation $p_{t} = f_{t}(p_{t-1}, u_{t})$. In the process of tracking, the position detected by our tracker can be expressed as $\{d_{t}\}_{t=1}^{T}$. The relationship between the detected position and the ground truth is expressed as the measurement equation $d_{t} = h_{t}(p_{t}, v_{t})$, where $v_{t}$ and $u_{t}$ are noise terms that are independent and identically distributed. If $u_{t}$ and $v_{t}$ are Gaussian while the dynamic equation and measurement equation are linear, KF is applicable to estimate the motion of the vehicle [51].
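The dynamic equation and measurement equation of KF are expressed as

$$p_{t} = A\, p_{t-1} + u_{t}, \qquad d_{t} = B\, p_{t} + v_{t}$$

where $A$ denotes the state transition matrix and $B$ denotes the measurement matrix. The aim of KF is two-fold. One is correction and the other is prediction. Correction means correcting the measured position, which in our case is the detected position $d_{t}$, to an estimate of the ground truth $\hat{p}_{t}$. Prediction means using $\hat{p}_{t}$ to predict the ground truth $\bar{p}_{t+1}$ in the next frame. Herein, the hat denotes the estimated value, and the bar denotes the predicted value. The way of correction is defined as

$$G_{t} = \bar{C}_{t} B^{T} \left( B \bar{C}_{t} B^{T} + V_{t} \right)^{-1}, \qquad \hat{p}_{t} = \bar{p}_{t} + G_{t} \left( d_{t} - B \bar{p}_{t} \right), \qquad \hat{C}_{t} = \left( I - G_{t} B \right) \bar{C}_{t}$$

The way of prediction is defined as

$$\bar{p}_{t+1} = A\, \hat{p}_{t}, \qquad \bar{C}_{t+1} = A\, \hat{C}_{t} A^{T} + U_{t+1}$$

where $G_{t}$ denotes the Kalman gain; $\hat{C}_{t}$ indicates the covariance matrix of the error between ground truth and estimation $(p_{t} - \hat{p}_{t})$; $\bar{C}_{t}$ represents the covariance matrix of the error between ground truth and prediction $(p_{t} - \bar{p}_{t})$; $V_{t}$ refers to the covariance matrix of $v_{t}$; $U_{t}$ stands for the covariance matrix of $u_{t}$; and $I$ denotes an identity matrix.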
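The correction and prediction steps can be sketched for a single image coordinate. The example assumes a constant-velocity model with state [position, velocity], so that A = [[1, 1], [0, 1]] and B = [1, 0]; the noise variances and the class name `ConstantVelocityKF` are hypothetical choices for illustration, and the 2 x 2 matrix products are written out by hand to keep the code free of dependencies.

```python
class ConstantVelocityKF:
    """Kalman filter for state [position, velocity], A = [[1, 1], [0, 1]], B = [1, 0]."""

    def __init__(self, position, process_var=1e-2, measurement_var=0.1):
        self.state = [position, 0.0]          # predicted state before the first correction
        self.cov = [[1.0, 0.0], [0.0, 1.0]]   # predicted error covariance
        self.u = process_var                  # assumed process noise variance (U = u * I)
        self.v = measurement_var              # assumed measurement noise variance V

    def correct(self, detected):
        """Correction: turn the detected position into an estimate."""
        c = self.cov
        s = c[0][0] + self.v                  # B C B^T + V
        g = [c[0][0] / s, c[1][0] / s]        # Kalman gain G = C B^T / s
        innovation = detected - self.state[0]  # d - B p
        self.state = [self.state[0] + g[0] * innovation,
                      self.state[1] + g[1] * innovation]
        # (I - G B) C written out for the 2 x 2 case.
        self.cov = [[(1 - g[0]) * c[0][0], (1 - g[0]) * c[0][1]],
                    [c[1][0] - g[1] * c[0][0], c[1][1] - g[1] * c[0][1]]]
        return self.state[0]

    def predict(self):
        """Prediction: propagate the current state one frame ahead."""
        p, c = self.state, self.cov
        self.state = [p[0] + p[1], p[1]]      # A p
        # A C A^T + U written out for the 2 x 2 case.
        self.cov = [[c[0][0] + c[0][1] + c[1][0] + c[1][1] + self.u, c[0][1] + c[1][1]],
                    [c[1][0] + c[1][1], c[1][1] + self.u]]
        return self.state[0]


kf = ConstantVelocityKF(position=0.0)
for t in range(20):              # object moving one pixel per frame
    kf.correct(float(t))
    next_position = kf.predict()
# During occlusion, correction is skipped and predictions are chained.
occluded = [kf.predict() for _ in range(3)]
print(next_position, occluded)
```

Chaining `predict` without `correct` mirrors the occlusion case, in which the predicted position stands in for the unreliable detection.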
In the current frame, the detected position $d_{t}$ is inputted into the KF to obtain a predicted position $\bar{p}_{t+1}$ for the next frame. If the tracker is found to have drifted in the next frame, the detected position $d_{t+1}$ will be replaced by the predicted position $\bar{p}_{t+1}$. To measure the reliability of the tracker, the high-confidence criterion proposed by Han et al. is adopted [50]. It relies on the PK of the response map to detect whether the tracker has drifted. The two measurement thresholds are defined as

$$P_{tr} = \delta_{1} \cdot \frac{1}{t-1} \sum_{i=1}^{t-1} \max(R_{i}), \qquad K_{tr} = \delta_{2} \cdot \frac{1}{t-1} \sum_{i=1}^{t-1} BK_{i}$$

where $P_{tr}$ and $K_{tr}$ represent the peak threshold and kurtosis threshold derived from the historical average values with certain ratios $\delta_{1}$ and $\delta_{2}$, respectively; $\max(R_{i})$ refers to the maximum of the response, which denotes the peak value; and $BK_{i}$ denotes the kurtosis value. Ideally, the response map resembles a Gaussian distribution, with a single peak and smooth surroundings. When drift occurs, the response map exhibits multiple peaks and fluctuates significantly, with the PK in decline (see Fig. 2). Therefore, if both the peak and the kurtosis of the response map at the current frame are above their thresholds, the response map at the current frame is considered reliable and no drift has occurred. If either indicator falls below its threshold, the position detected by the tracker is unreliable and it is necessary to replace it with the position predicted by KF. Meanwhile, the correction of KF and the update of the tracker are suspended until the tracker returns to normal.
Fig. 2 shows the effect of the antidrift tracker correction scheme. When PK decreases, the tracker with the scheme ceases to be updated and KF predicts the position of the object. Since the incorrect information is not accepted by the tracker, the tracking returns to normal when the object is no longer obscured. On the contrary, the tracker without the scheme keeps updating and cannot find the position of the object. When the object appears again, the tracker becomes ineffective.
This scheme is effective in avoiding tracker drift through the combination of KF and PK. Under the strategy of updating termination, the tracker and KF avoid receiving false information, which ensures the reliability of the tracker and the accuracy of KF. When the tracker stops working, the prediction function of KF can be used to predict the trajectory reasonably and ensure the continuity of the trajectory. The full algorithm is detailed in Algorithm I.
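The reliability check based on PK can be sketched as below. The sketch assumes that the response map has been flattened to a list, that kurtosis is the fourth standardized moment (the definition used in [50] may differ), and that both ratios are 0.5; the names `peak_and_kurtosis` and `is_reliable` are hypothetical.

```python
def peak_and_kurtosis(response):
    """Peak value and kurtosis (fourth standardized moment) of a flattened response map."""
    n = len(response)
    mean = sum(response) / n
    var = sum((r - mean) ** 2 for r in response) / n
    if var == 0:
        return max(response), 0.0  # flat map: no distinguishable peak
    fourth = sum((r - mean) ** 4 for r in response) / n
    return max(response), fourth / var ** 2


def is_reliable(response, history, delta1=0.5, delta2=0.5):
    """True when both peak and kurtosis reach their thresholds.

    history: (peak, kurtosis) pairs from earlier frames; the thresholds are the
    historical averages scaled by the assumed ratios delta1 and delta2.
    """
    peak, kurt = peak_and_kurtosis(response)
    peak_threshold = delta1 * sum(p for p, _ in history) / len(history)
    kurt_threshold = delta2 * sum(k for _, k in history) / len(history)
    return peak >= peak_threshold and kurt >= kurt_threshold


single_peak = [0.0, 0.0, 0.1, 1.0, 0.1, 0.0, 0.0, 0.0]     # clean, Gaussian-like response
multi_peak = [0.3, 0.0, 0.35, 0.0, 0.3, 0.0, 0.32, 0.0]    # low, fluctuating response
history = [peak_and_kurtosis(single_peak)]
print(is_reliable(single_peak, history), is_reliable(multi_peak, history))
```

When `is_reliable` returns false, a tracker following the scheme would use the KF prediction for the current frame and skip both the KF correction and the model update.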

IV. EXPERIMENTAL RESULTS AND ANALYSIS
A. Experimental Setups
1) SatSOT Dataset: As a densely annotated satellite video single-object tracking benchmark dataset, the SatSOT dataset [49] covers four categories of objects, including vehicle, train, plane, and ship. Herein, the videos of vehicle are used, involving 65 objects, 19 948 frames, and 9 challenges. The object is annotated by a bounding box and recorded with upper-left coordinate, the width, and height of the bounding box. Table I lists the description of nine challenging attributes given by the article [49].
2) Evaluation Metric: The evaluation metric used in the experiments is one-pass evaluation (OPE) [1], [2], with two evaluation indicators involved to perform quantitative evaluation, i.e., precision and success rate. By plotting the precision plot and the success plot, the performance of algorithms can be visualized.
Precision is an indicator used to evaluate the error between the detected position of the tracker and the ground truth. The location error of the tracker can be obtained by calculating the Euclidean distance between the center point of the tracking box and the center point of the ground truth. Typically, there is a threshold of location error (e.g., 20), and the percentage of frames whose location error is smaller than this threshold is referred to as the tracker's precision. In the meantime, precision plot is created. The horizontal axis of the precision plot is the location error threshold, and the vertical axis is the precision. Success rate is an indicator used to evaluate the overlap between the tracking box and the ground truth. Suppose that the tracking box is R t and the ground truth is R 0 , then the overlap of tracker is expressed as S = |R t ∩ R 0 |/|R t ∪ R 0 | , where |R t ∩ R 0 | refers to the number of pixels at the intersection of R t and R 0 , and |R t ∪ R 0 | refers to the number of pixels at the union of R t and R 0 . The success rate is expressed as the percentage of frames whose overlap exceeds the overlap threshold. Also, the success plot is created. The horizontal axis of the success plot represents the overlap threshold, and the vertical axis indicates the success rate.
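The two indicators can be computed as in the following sketch. It assumes boxes given as (x, y, width, height) with (x, y) the upper-left corner, and it uses box areas in place of pixel counts; the function names are hypothetical.

```python
import math


def center_error(box, gt):
    """Euclidean distance between the centers of the tracking box and the ground truth."""
    x, y, w, h = box
    gx, gy, gw, gh = gt
    return math.hypot((x + w / 2) - (gx + gw / 2), (y + h / 2) - (gy + gh / 2))


def overlap(box, gt):
    """Intersection over union of two (x, y, width, height) boxes."""
    x, y, w, h = box
    gx, gy, gw, gh = gt
    iw = max(0.0, min(x + w, gx + gw) - max(x, gx))
    ih = max(0.0, min(y + h, gy + gh) - max(y, gy))
    inter = iw * ih
    union = w * h + gw * gh - inter
    return inter / union if union > 0 else 0.0


def precision(boxes, gts, threshold=5.0):
    """Share of frames whose location error is smaller than the threshold."""
    return sum(center_error(b, g) < threshold for b, g in zip(boxes, gts)) / len(gts)


def success_rate(boxes, gts, overlap_threshold=0.5):
    """Share of frames whose overlap exceeds the overlap threshold."""
    return sum(overlap(b, g) > overlap_threshold for b, g in zip(boxes, gts)) / len(gts)


def average_overlap(boxes, gts):
    """Mean overlap over all frames."""
    return sum(overlap(b, g) for b, g in zip(boxes, gts)) / len(gts)


gts = [(10, 10, 4, 4)] * 3
boxes = [(10, 10, 4, 4), (12, 10, 4, 4), (30, 30, 4, 4)]  # exact, slightly off, lost
print(precision(boxes, gts), success_rate(boxes, gts), average_overlap(boxes, gts))
```

Sweeping `threshold` and `overlap_threshold` over a range of values yields the points of the precision plot and the success plot, respectively.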
Based on this evaluation metric, the location error threshold is set to 5 pixels to sort the algorithms on precision. As for success rate, the average overlap score (AOE) is introduced as the basis for sorting, which is an average of overlap. Hence, all algorithms are evaluated from two perspectives.
3) Implementation Details: Our experiments are performed on a PC with 2.40-GHz CPU, and our tracker is implemented in MATLAB 2018a. The sample patch is 3.5 times the size of the object's ground truth. Besides, the standard deviation of the Gaussian surface, which is used to fit the regression targets, is 0.125. By comparison, the standard deviation of the kernel function is 0.6. The penalty parameter λ is 1e-4, whereas the learning rates β and γ are both 0.02.
The features extracted in the tracker are the grayscale feature and CN [52]. Additionally, principal component analysis is conducted to reduce the feature dimensionality and maintain the computational speed of the tracker. The features of vehicles on satellite videos are weak and the objects are tiny. Texture features, e.g., HOG and local binary patterns [53], are normally not suitable for representing such vehicles. This is mainly because these features must be extracted from cells of a minimum size, e.g., 4 pixels, whereas the object is too tiny to be divided into cells, so the extracted features lose information and become meaningless. As stated by Shao et al. [12], the texture feature is ineffective for object tracking on satellite videos. As a result, simple features, such as grayscale and CN, are much more reliable.

B. Sensitivity Analysis and Varying Criteria Comparison
In this section, we first conduct a sensitivity analysis to determine the most effective usage of the HM evaluation scheme. Then, we conduct varying criteria comparative experiments to compare PK with other high-confidence criteria.
In the sensitivity analysis, we vary the start frame ϑ of the HM evaluation scheme and test the sensitivity of ϑ. Besides, we test the sensitivity of θ in (17). When ϑ is tested, θ and the other parameters in the tracker are fixed. When θ is tested, ϑ is set to the optimal value obtained from the previous test, and the other parameters in the tracker are also fixed. Table II shows the OPE results of the two tests. The tracker works best when ϑ = 10 and θ = 70.
In the varying criteria comparative experiments, we compare the PK with other high-confidence criteria, including the average peak-to-correlation energy (APCE) [54], the peak, and the kurtosis. We designed four trackers, and Table III lists the OPE results of them. HMTS_APCE uses APCE as the high-confidence criterion. HMTS_PEAK and HMTS_KURTOSIS employ the peak and the kurtosis of the response map as the criterion individually. They are compared with our tracker HMTS, which employs both the peak and the kurtosis of the response map as the criterion. From the OPE results, it can be found that high-confidence criterion can indeed improve the precision and robustness, and the PK is more suitable for our application.
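For reference, APCE is commonly computed as the squared difference between the maximum and minimum of the response divided by the mean squared deviation of the response from its minimum. The sketch below follows that common definition on a flattened response map; it is an illustration under that assumption, not the exact code used for HMTS_APCE, and the function name `apce` is hypothetical.

```python
def apce(response):
    """Average peak-to-correlation energy of a flattened response map."""
    f_max, f_min = max(response), min(response)
    energy = sum((v - f_min) ** 2 for v in response) / len(response)
    return (f_max - f_min) ** 2 / energy if energy > 0 else 0.0


single_peak = [0.0, 0.0, 0.1, 1.0, 0.1, 0.0, 0.0, 0.0]     # sharp single peak
multi_peak = [0.3, 0.0, 0.35, 0.0, 0.3, 0.0, 0.32, 0.0]    # several weak peaks
print(apce(single_peak), apce(multi_peak))
```

A sharp single peak yields a high APCE, while a fluctuating multi-peak response yields a low one, so APCE condenses into one number what PK measures with two indicators.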

C. Ablation Experiments
As mentioned earlier, our tracker treats KCF as the baseline, with the HM used to improve the update strategy of KCF. At the same time, KF is introduced for motion estimation, and the PK of the response map is used to detect whether the tracker drifts. To verify whether the HM, KF, and PK are effective in improving tracker performance, ablation experiments were conducted.
We design four trackers. Table IV lists the composition of them. Among these trackers, KCF_CN uses only CF, HMTS_NKFPK does not rely on KF and PK to correct the trajectory, and HMTS_NHM does not require HM for updating. They are compared with our tracker HMTS. The OPE results of these four groups of trackers are shown in columns 5 and 6 of Table IV.
From the performance of HMTS_NHM, it can be found out that the introduction of KF and PK improves the precision and success rate of the algorithm, which is because KF and PK terminate the tracker updating and predict the trajectory of the object when the tracker drifts. From the performance of HMTS_NKFPK, it can be discovered that using HM plays no role in improving the precision and success rate, which is mainly because the baseline tracker tends to drift when the object is occluded and similar objects appear. Since the antidrift scheme is not used, the tracker does not stop updating. At the same time, the HM evaluation scheme scores HMs based on the drifting frame, and these HMs render new filter unreliable. Consequently, the tracker is less likely to return to normal, which reduces the performance of HMTS_NKFPK without KF and PK control.
Our tracker HMTS combines HM, KF, and PK. Also, the HM evaluation scheme keeps scoring HMs based on the correct sample. In this way, the updated filter will become more reliable. Hence, our tracker can take full advantage of HM.
In summary, the introduction of HM, KF, and PK is verified as effective. Our tracker shows a significant improvement over the baseline on the precision and success rate.
1) Overall Results: The overall OPE results are shown in Fig. 3. Our tracker HMTS ranks top on both indicators. The precision of HMTS is 0.7253, and the AOE of HMTS is 0.4344. Compared with the classical CF-based tracker KCF, our tracker HMTS improves the precision and AOE by 0.05 and 0.0331, respectively.
Meanwhile, as shown in the ranking list, CF-based trackers, e.g., KCF, MCCT, ECO-HC, STAPLE, and CFME, outperform trackers based on end-to-end networks, e.g., SiamFC, and transformer-based trackers, e.g., STARK. That is to say, CF-based trackers are more suitable for vehicle tracking on satellite videos.
2) Attributes-Based Results: Overall, HMTS has excellent performance on most of the challenges. Since the antidrift tracker correction scheme is effective in handling object occlusion, HMTS performs well on the occlusion-related challenges (see Tables V and VI). When these challenges arise, the antidrift tracker correction scheme plays a dominant role in preventing drift. Once these challenges have passed, the HM supports the tracker in returning to normal rapidly and precisely.
Meanwhile, a tiny object (TO) is an object of small size. It carries few features of its own, so it is better to use background information as negative samples. The HMs allow HMTS to collect rich background information, which is favorable for tracking tiny objects. Hence, our tracker has an excellent performance on TO (see Tables V and VI).

E. Qualitative Comparison
In order to better demonstrate the performance of each tracker, qualitative experiments are carried out, with the tracking box of each tracker displayed on each video frame. Partial performance is shown in Fig. 4.
1) Qualitative Analysis: From videos car_01 and car_24, it can be seen that HMTS is effective in solving the occlusion issue, while most trackers drift when the object meets an obstacle. In video car_01, the object is occluded at frame 173, and only CFME and HMTS overcome this challenge. The reason is that CFME and HMTS use a motion estimation scheme, whereas the other trackers do not perceive the disappearance of the object and collect false information while the object is occluded. Meanwhile, the occlusion lasts a long time, and the object moves a long distance. Hence, when the object reappears, it is no longer in the ROI of these trackers, and it is difficult for them to detect the object after collecting so much false information. On the contrary, the antidrift tracker correction scheme makes HMTS stop collecting information and predict the object's position while the object is occluded. Even with long occlusions, the object remains in our ROI. This scheme not only makes HMTS reliable but also leads to a continuous trajectory.
Meanwhile, from video car_08 and car_26, it can be found out that the performance of our tracker is superior in case of SOB, TO, IV, etc. Although the feature of object itself is insufficient to identify it, HM is beneficial for HMTS to collect the background information. As a result, HMTS can ensure accurate tracking.
2) Inferior Cases: Although HMTS is verified as the optimal tracker, there are some inferior cases within the tracking process (see Fig. 5).
HMTS does not use a scale-adaptive mechanism, so the tracking box cannot be adjusted as the object size changes, as shown in Fig. 5(a). This results in a lower success rate. Besides, the prediction function of KF is limited. If the velocity and orientation of the object change during occlusion, the KF prediction is likely to be unreliable and may result in drift, as illustrated in Fig. 5(b).

V. CONCLUSION
In this article, a tracker (HMTS) is proposed for vehicle tracking on satellite videos. In addition, an HM evaluation scheme is proposed for the evaluation of HMs by using cross-correlation function. Besides, to prevent tracker drift, an antidrift tracker correction scheme is proposed. Our tracker is compared with 16 state-of-the-art trackers. As demonstrated by the quantitative and qualitative experiments, HMTS produces an excellent performance. Additionally, sensitivity analysis, varying criteria comparative experiments, and ablation experiments reveal that HM, KF, and PK are effective in improving the accuracy and robustness of the algorithm.