ML-Based Forward Symbol Timing Offset Estimation for Burst Signals

To ensure the demodulation performance of burst signals, a forward symbol timing offset estimation algorithm is usually used for symbol timing recovery. To address the shortcomings of common forward symbol timing estimation algorithms, this article proposes a new algorithm that combines maximum likelihood estimation with trigonometric polynomial interpolation; it is suitable for data with four samples per symbol and can resist large frequency offset. To make it more suitable for engineering practice, it is then optimized into an improved data-aided (DA) forward symbol timing offset estimation algorithm with several characteristics: a moderate frequency offset capture range, insensitivity to the shaping coefficient, relatively low complexity, excellent estimation performance, a flexible algorithm structure and a small amount of sample data required for calculation. Simulation results show that, within the frequency offset capture range, the performance of the improved algorithm approaches the modified Cramér-Rao bound (MCRB) under the conditions of low signal-to-noise ratio (SNR) and small shaping coefficient.


I. INTRODUCTION
Burst signals are widely used in communication fields such as satellite communication and shortwave communication because of their high transmission rate, strong anti-interception ability and high identification difficulty. The common phase-locked timing offset estimation structure suffers from the "hang-up effect" [1], which makes the loop lock-in time uncertain. Therefore, in burst communication, although the phase-locked algorithm has very high timing estimation accuracy after lock-in [2], the receiver generally still uses a forward symbol timing estimation algorithm, which has slightly worse estimation accuracy but faster estimation speed, for timing offset recovery. Current forward timing offset estimation algorithms can be roughly divided into two categories [3]: data-aided (DA) algorithms and non-data-aided (NDA) algorithms.
The associate editor coordinating the review of this manuscript and approving it for publication was Wu-Shiung Feng.

Among them, NDA algorithms have received widespread attention because of their simple structure and because no auxiliary information needs to be inserted into the transmitted data to assist timing offset estimation. Common NDA algorithms include the very influential square algorithm [4], which is suitable for data with four samples per symbol, and a variety of nonlinear algorithms obtained by further extending the square algorithm [5], including the absolute-value nonlinearity (AVN), fourth-law nonlinearity (FLN), square-law nonlinearity (SLN) and logarithmic nonlinearity (LOGN) algorithms. More specifically, SLN and FLN perform relatively poorly, whereas AVN and LOGN have comparatively high estimation accuracy and are relatively stable; however, LOGN needs to know the signal-to-noise ratio (SNR) of the data samples, which severely limits its practical use. Therefore, AVN is the more practical of these algorithms. Common NDA algorithms suitable for data with two samples per symbol include the Lee [6], Wang et al. [7] and Zhu et al. [8] algorithms. Among them, Lee is a biased estimator, and Wang is an optimized derivation that improves Lee into an unbiased estimator. Although Zhu performs slightly better at low SNR, the received signal needs further processing before calculation and the structure of Zhu is more complicated than that of the other two algorithms. Therefore, Wang is the more practical of these algorithms. However, all of the above NDA algorithms have the following defects: (i) with low SNR and a small shaping coefficient, the estimation performance is always very poor; (ii) a large number of data samples is required to obtain high-precision timing offset estimates. Although common DA algorithms have better estimation performance than NDA algorithms, they receive little attention because of their complex structure, their lack of resistance to frequency offset interference and the need to insert auxiliary information into the transmitted data to assist timing estimation. However, in burst communication, the sender generally inserts a pilot sequence of known structure into the data so that the receiver can identify the burst signal by burst detection.
Therefore, a DA algorithm can use this pilot sequence as auxiliary information for timing offset estimation without consuming additional bandwidth. Common DA algorithms are based on searching for the maximum of the likelihood function [9], but the computational complexity of a search that yields high-precision estimates is very high. Although a simplified DA timing algorithm with lower computational complexity already exists [10] (hereafter abbreviated as SML), it cannot resist the influence of frequency offset, which places excessively high requirements on the frequency offset pre-estimation capability of the receiver and greatly restricts its use in engineering practice.
Therefore, to address the shortcomings of the common forward symbol timing estimation algorithms described above, this article proposes a new forward symbol timing estimation algorithm that combines maximum likelihood estimation with trigonometric polynomial interpolation. Furthermore, to make it more suitable for engineering practice, it is improved into a new form, which is the final algorithm of this article (the "IML$_3$" described below).
The simulation results show that the final algorithm has the following performance advantages:
1) Appropriate frequency offset adaptability. High-performance symbol timing estimation can be performed within the frequency offset capture range.
2) Relatively low algorithm complexity. With only a small loss of timing offset estimation accuracy, the likelihood function and the trigonometric polynomial interpolation function are simplified in multiple stages, which significantly reduces the computational complexity of the algorithm.
3) Excellent estimation performance. With low SNR and a small shaping coefficient, the performance is close to the modified Cramér-Rao bound (MCRB) [11]. Even though self-noise gradually moves the performance curve away from the MCRB at high SNR, the estimation accuracy is already high in that regime. The algorithm is therefore well matched to the current trend toward high-speed, low-power transmission of burst signals.
4) Flexible algorithm structure. The algorithm complexity is proportional to the frequency offset capture range. In practice, one can flexibly choose between a structure with lower computational complexity and a structure with a broader frequency offset capture range.
5) Joint burst detection and symbol timing estimation. The likelihood function can be regarded as an improved burst detection algorithm, and the timing estimate is obtained by trigonometric polynomial interpolation of its values. In other words, the algorithm completes burst detection and symbol timing estimation in the same estimation process.
6) Small data sample size required for calculation. A pilot sequence of only thirty-two symbols, or even fewer, is sufficient to obtain the performance advantages listed above.

II. ALGORITHM MODEL
A. MAXIMUM LIKELIHOOD ESTIMATION
Suppose the baseband signal received by the receiver is given by (1), where $\{c_i\}$ is an independent and identically distributed data symbol sequence; $v$, $\theta$ and $\tau$ are the frequency offset, phase offset and symbol timing offset to be estimated, respectively; $T$ is the symbol period; $n_c(t)$ is additive white Gaussian noise; and $h(t)$ is a baseband shaping function that satisfies Nyquist's first criterion, such as the "Better Than" shaping function, which has excellent anti-intersymbol-interference characteristics [12]. In practice, however, the square root raised cosine rolloff function is still the most commonly used. The joint log-likelihood function of the frequency offset, phase offset and timing offset can be expressed as (2), where $N$ is the total length of the pilot sequence. For the purpose of maximum likelihood estimation, (3) and (2) have the same unique solution. Therefore, when solving the maximum likelihood estimation, (2) can be rewritten as (3) without affecting the estimation result.
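The signal model described above can be made concrete with a small simulation. The sketch below is illustrative only. It assumes the standard form $r(t) = e^{j(2\pi v t + \theta)} \sum_i c_i\, h(t - iT - \tau) + n_c(t)$, uses a raised-cosine pulse for $h(t)$ (that is, the overall response after matched filtering, which is an assumption of this sketch), and expresses $v$ in cycles per unit time with $T = 1$ by default. The names `raised_cosine`, `received_samples` and `QPSK` are hypothetical.

```python
import cmath
import math
import random

def raised_cosine(t, T=1.0, alpha=0.15):
    """Raised-cosine pulse h(t); it satisfies Nyquist's first criterion."""
    x = t / T
    sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * alpha * x) ** 2
    if abs(denom) < 1e-12:
        # Removable singularity at |t| = T / (2 * alpha).
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * alpha * x) / denom

def received_samples(symbols, v=0.0, theta=0.0, tau=0.0, P=4, T=1.0,
                     alpha=0.15, noise_std=0.0, seed=0):
    """Sample the assumed model at P samples per symbol (t = n*T/P)."""
    rng = random.Random(seed)
    out = []
    for n in range(len(symbols) * P):
        t = n * T / P
        # Pulse-shaped symbol stream delayed by the timing offset tau.
        s = sum(c * raised_cosine(t - i * T - tau, T, alpha)
                for i, c in enumerate(symbols))
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        # Apply frequency offset v and phase offset theta, then add noise.
        out.append(cmath.exp(1j * (2.0 * math.pi * v * t + theta)) * s + noise)
    return out

# Unit-energy QPSK constellation used for the pilot symbols.
QPSK = [cmath.exp(1j * math.pi * (2 * q + 1) / 4) for q in range(4)]

pilot = [QPSK[i % 4] for i in range(8)]
r = received_samples(pilot, v=0.01, theta=0.3, tau=0.1875)
```

With no offsets and no noise, every fourth sample reproduces the transmitted symbol, which is a quick way to check that the pulse satisfies the Nyquist condition.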
In a nutshell, with the shorthand $\alpha$, the likelihood function can be expressed in a form indexed by $n$, where $1 \le n \le N-1$. However, because the structure of this formula is too complex and difficult to solve, it needs to be simplified. Ignoring the influence of the negligible interference terms and constant terms on the likelihood function, the formula can be simplified to an expression in which $\mathrm{Re}(x)$ denotes the real part of $x$. Eliminating the influence of the frequency offset on the search for the maximum of the likelihood function then yields (5). Since (5) is only a simple accumulation of the conjugate differential correlation results, it has weak anti-interference ability, so if it is used for timing recovery, the anti-noise performance of the algorithm will also be poor. Therefore, to enhance the channel adaptability of (5), the authors optimize the algorithm by "multi-level overlay smoothing". The improved maximum likelihood function, given in (6) [13] and hereafter abbreviated as IML$_1$, is in fact an improved burst detection algorithm based on the principle of conjugate differential correlation that can resist large frequency offset.
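To illustrate why conjugate differential correlation is insensitive to frequency offset, the following sketch computes a generic metric of this type. It is an assumed form chosen for illustration, not the expression (6) of this article: the pilot modulation is removed from symbol-rate samples, the result is correlated with a delayed copy of itself for every lag $n$, and the magnitudes are summed over all lags. The names `diff_corr_metric` and `rotate` are hypothetical.

```python
import cmath
import math

def diff_corr_metric(r_sym, pilot):
    """Sum over lags n = 1..N-1 of |sum_k a_k * conj(a_{k-n})|,
    where a_k = conj(c_k) * r_k removes the pilot modulation."""
    N = len(pilot)
    a = [c.conjugate() * r for c, r in zip(pilot, r_sym)]
    total = 0.0
    for n in range(1, N):
        # A frequency offset only rotates psi by exp(j*2*pi*n*v*T),
        # so taking the magnitude removes its effect.
        psi = sum(a[k] * a[k - n].conjugate() for k in range(n, N))
        total += abs(psi)
    return total

def rotate(seq, vT, theta=0.3):
    """Apply a normalized frequency offset vT and a phase offset theta."""
    return [c * cmath.exp(1j * (2.0 * math.pi * vT * k + theta))
            for k, c in enumerate(seq)]

pilot = [cmath.exp(1j * math.pi * (2 * (k % 4) + 1) / 4) for k in range(8)]
m_no_offset = diff_corr_metric(rotate(pilot, 0.0), pilot)
m_offset = diff_corr_metric(rotate(pilot, 0.2), pilot)
```

For a noiseless, perfectly timed pilot of $N = 8$ unit-modulus symbols, both values equal $\sum_{n=1}^{7}(8-n) = 28$, regardless of the frequency offset. The price is that the number of complex products grows on the order of $N^2$.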
In theory, finding the maximum of the log-likelihood function in (6) yields accurate timing offset information. Because (6) is insensitive to frequency offset, it can estimate the timing offset with high performance for signals with large frequency offset. However, the frequency offset range of practical data is usually not very broad, so many of the complex calculations that (6) performs to resist large frequency offset are often wasted. To simplify the calculation, one strategy is to convert the correlation calculations into discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT) calculations [14], but this method is only suitable for orthogonal frequency division multiplexing (OFDM) signals.
Another strategy is to use the idea of packet delay conjugate differential correlation to simplify (5) into a general algorithm with a moderate frequency offset capture range, relatively low complexity and, within the frequency offset capture range, an acceptable loss of accuracy compared with (6). The idea of the improvement is the same as that of rewriting (2) as (3): the maximum likelihood estimation only needs to keep the same unique solution as (3). Dividing the pilot sequence into $N/L$ groups, so that each group contains $L$ data samples, (3) can be rewritten as (7), where $1 \le n \le N/L - 1$ and $\alpha$ is consistent with that in (3). This formula is also complicated and difficult to solve, so it needs to be simplified, and the simplification idea is consistent with (4). Assuming that $u_{k,m} = c^{*}_{k+mL}\, r((k+mL)T + \tau)$, the likelihood function can be expressed as (8). It can be seen that when $L = 1$, (8) has the same structure as (4). Otherwise, when $L\tilde{v}$ is in the proper range, $\sum_{k=0}^{L-1} u_{k,m}$ is insensitive to the frequency offset $\tilde{v}$ and the influence of the frequency offset can be temporarily ignored, so (8) can be simplified to (9). Clearly, the frequency offset capture range of (9) is inversely proportional to $L$: the smaller $L$ is, the less sensitive (9) is to frequency offset.
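The grouping idea can be sketched in the same style. The code below is a simplified illustration under assumed conventions, not the expressions (7) to (11) of this article: the products $u_{k,m}$ are summed coherently inside each group of $L$ symbols, and the differential correlation is then taken between groups. The names `grouped_metric` and `rotate` are hypothetical.

```python
import cmath
import math

def grouped_metric(r_sym, pilot, L):
    """Differential correlation between coherent group sums of length L."""
    M = len(pilot) // L
    # u[m] = sum_{k=0}^{L-1} conj(c_{k+mL}) * r_{k+mL}
    u = [sum(pilot[k + m * L].conjugate() * r_sym[k + m * L] for k in range(L))
         for m in range(M)]
    total = 0.0
    for n in range(1, M):
        psi = sum(u[m] * u[m - n].conjugate() for m in range(n, M))
        total += abs(psi)
    return total

def rotate(seq, vT, theta=0.3):
    """Apply a normalized frequency offset vT and a phase offset theta."""
    return [c * cmath.exp(1j * (2.0 * math.pi * vT * k + theta))
            for k, c in enumerate(seq)]

pilot = [cmath.exp(1j * math.pi * (2 * (k % 4) + 1) / 4) for k in range(8)]
for L in (1, 2, 4):
    for vT in (0.0, 0.05, 0.25):
        print(L, vT, round(grouped_metric(rotate(pilot, vT), pilot, L), 3))
```

With $L = 1$ the metric does not depend on the offset at all. With larger $L$ fewer products are needed, but the coherent sum inside each group shrinks as the offset grows and vanishes completely at $\tilde{v}T = 1/L$, which is one way to see why the capture range scales inversely with $L$.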
When $L$ and the frequency offset $v$ are both small, as shown in Fig. 2, define

$$\gamma = \frac{1-\cos(2\pi L\tilde{v}T)}{\left(1-\cos(2\pi\tilde{v}T)\right)L^{2}}$$

($\gamma$ is symmetric about $\tilde{v} = 0$, and in this article $v$ is normalized to the signal bandwidth). Then $\gamma$ can be approximated as a constant and therefore ignored. Eliminating the influence of the frequency offset on the likelihood function then yields (10). To expand the frequency offset capture range of the algorithm and enhance its channel adaptability, "multi-level overlay smoothing" is applied as in (6). Then (10) can be extended to (11) (hereafter abbreviated as IML$_2$), which is in fact an improved burst detection algorithm based on the principle of packet delay conjugate differential correlation [15].

VOLUME 8, 2020

(1)), the solution of (11) has the characteristic that $\psi = |\psi| e^{j2\pi n v T}$, as shown in (12). $\eta_0$ increases with the SNR and the pilot sequence length, and in the general case $\eta_0 \ge 1/2$. Even if $\psi$ is outside this range, the property becomes distorted gradually rather than deteriorating rapidly. Moreover, the frequency offset $v$ changes slowly for a particular signal: within the observation interval $N$, $v$ can be regarded as a fixed value and $n$ also has only

$$\frac{|\psi_{\mathrm{Re}}| + |\psi_{\mathrm{Im}}|}{|\psi|} = |\cos(2\pi n v T)| + |\sin(2\pi n v T)| \tag{12}$$

where $\psi_{\mathrm{Re}}$ is the real part of $\psi$, $\psi_{\mathrm{Im}}$ is the imaginary part of $\psi$, and the minimum value of (12) is 1.
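The behaviour of the factor $\gamma = \dfrac{1-\cos(2\pi L\tilde{v}T)}{(1-\cos(2\pi\tilde{v}T))L^{2}}$ can be checked numerically. A useful identity, which follows from the geometric series, is $\gamma = \left|\sum_{k=0}^{L-1} e^{j2\pi k\tilde{v}T}\right|^{2} / L^{2}$, so $\gamma$ is the normalized power of the coherent sum inside one group. The sketch below uses the hypothetical function name `gamma` and takes the product $\tilde{v}T$ directly as its argument.

```python
import math

def gamma(vT, L):
    """Normalized power of the in-group coherent sum for offset vT."""
    num = 1.0 - math.cos(2.0 * math.pi * L * vT)
    den = (1.0 - math.cos(2.0 * math.pi * vT)) * L * L
    if den < 1e-15:
        # Limit vT -> 0: the ratio tends to 1.
        return 1.0
    return num / den

for L in (2, 4):
    print(L, [round(gamma(vT, L), 4) for vT in (0.0, 0.01, 0.02, 0.05, 1.0 / L)])
```

Near $\tilde{v}T = 0$ the factor stays close to 1, which supports treating it as a constant for small offsets. It falls to zero at $\tilde{v}T = 1/L$ and decays faster for larger $L$.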
As shown in Fig. 3, the result of (11) has function values similar to the "sinc" function, which is why it is very suitable as a burst detection algorithm. Therefore, for (11) within the range $(\tau - \eta_1 T/2,\ \tau + \eta_1 T/2)$ ($\eta_0 \le \eta_1 \le 2$), the reduced correlation between the signal and the pilot sequence causes the burst detection peak value to decay sharply to a very small value. The self-noise introduced by feeding the values produced by the above replacement operation into the subsequent trigonometric polynomial interpolation [16] for timing offset estimation may counteract the self-noise caused by the simplification of the likelihood function and of the trigonometric polynomial. This feature is also reflected in the performance simulations later in this article. Consequently, (11) can be simplified, essentially without losing timing offset estimation accuracy, into the form hereafter abbreviated as IML$_3$. To sum up, the function values are still similar to the "sinc" function and, as shown in Table 1, the complexity of the algorithm is reduced again.
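Relation (12) bounds the error made when a complex modulus is replaced by the sum of the absolute values of its real and imaginary parts. Assuming that this is the replacement referred to above (an interpretation of (12), not a statement taken from the IML$_3$ formula itself), the sketch below checks the bound numerically. The function name `abs_sum` is hypothetical.

```python
import cmath
import math

def abs_sum(z):
    """Low-cost stand-in for |z|: no square root and no multiplication."""
    return abs(z.real) + abs(z.imag)

# Ratio (|Re| + |Im|) / |z| for unit-modulus z at one-degree steps.
ratios = [abs_sum(cmath.exp(1j * 2.0 * math.pi * d / 360.0)) for d in range(360)]
lowest, highest = min(ratios), max(ratios)
```

The ratio stays between 1 and $\sqrt{2}$. Because it depends only on the angle $2\pi n v T$, it acts as a bounded scale factor on each term rather than as an unbounded error.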

B. TRIGONOMETRIC POLYNOMIAL INTERPOLATION
As mentioned in the previous section, when the likelihood function reaches its maximum, the maximum burst detection peak value and an accurate estimate of $\tau$ are obtained. However, by observing the likelihood function derived from a signal with four samples per symbol, the maximum detection peak of the signal cannot be obtained directly. Moreover, if the maximum of the likelihood function is found by point-by-point search, the computational complexity is extremely high. Therefore, to reduce the computational complexity, an efficient approach is to use interpolation to determine the maximum of the likelihood function. There are various interpolation methods, such as Farrow interpolation, Newton interpolation and trigonometric polynomial interpolation. Trigonometric polynomial interpolation is suitable for the new algorithm because it has been shown to produce a simple implementation structure, reduce computational delay and, for practical signals, improve the interpolation performance.
Although the structure of an algorithm that uses four samples for trigonometric polynomial interpolation to estimate the timing offset is relatively simple, its estimation performance can still be improved. However, for a signal with four samples per symbol, interpolation with sixteen samples has considerably higher computational complexity than interpolation with four samples, and the reduced correlation between the more distant symbols and the pilot sequence lowers the reliability of the interpolation. Therefore, the algorithm in this article uses eight samples for timing offset estimation. Compared with the common interpolation method based on four samples, this significantly improves the precision of timing offset estimation and the channel adaptability at the cost of only a small increase in computational complexity. In the interpolation formula, $0 \le \mu < 1$, $T$ is the symbol period, $N_0$ is the number of interpolation points, $P$ is the oversampling factor of the signal, the function evaluated at $\mu T/P$ is the value of the likelihood function (in engineering practice, the burst detection peak value corresponding to each sample), $W_{N_0}^{kn} = e^{-j2\pi kn/N_0}$, and the coefficient $c_k$ is the $N_0$-point DFT, with $k = -\frac{N_0}{2}+1, \cdots, \frac{N_0}{2}$. According to the spectral characteristics of the "sinc" function, the influence of the direct current component and of harmonics above the second order on the result can be ignored, so (14) can be rewritten in a simplified form. As the above analysis shows, the algorithm in this article uses eight samples to assist interpolation for data with four samples per symbol after weighing algorithm performance against complexity, i.e. $N_0 = 8$, $P = 4$.
Therefore, the final timing offset estimation formula can be obtained. When the result of IML$_1$ is used as the interpolation sample for symbol timing estimation, the estimation process is abbreviated as IMLT$_1$. Similarly, the estimation processes corresponding to IML$_2$ and IML$_3$ are abbreviated as IMLT$_2$ and IMLT$_3$, respectively.
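The interpolation step can be sketched as follows. This is an illustration under stated assumptions rather than the closed-form estimator of this article: it computes the $N_0$-point DFT coefficients $c_k$ of the detection-metric samples, keeps only the first and second harmonics (dropping the direct current component and higher harmonics, as described above), and locates the peak of the resulting trigonometric polynomial on a fine grid. The grid search stands in for the closed-form expression, and the names `dft_coeffs` and `peak_position` are hypothetical.

```python
import cmath
import math

def dft_coeffs(samples):
    """c_k = (1/N0) * sum_n s_n * exp(-j*2*pi*k*n/N0), k = -N0/2+1 .. N0/2."""
    N0 = len(samples)
    return {k: sum(s * cmath.exp(-2j * math.pi * k * n / N0)
                   for n, s in enumerate(samples)) / N0
            for k in range(-(N0 // 2) + 1, N0 // 2 + 1)}

def peak_position(samples, harmonics=(1, 2), grid=4000):
    """Peak location (in sample units, 0 <= x < N0) of the truncated
    trigonometric interpolant built from the selected harmonics."""
    N0 = len(samples)
    c = dft_coeffs(samples)

    def interp(x):
        # For real samples c_{-k} = conj(c_k), so each harmonic pair
        # contributes 2 * Re(c_k * exp(j*2*pi*k*x/N0)).
        return sum(2.0 * (c[k] * cmath.exp(2j * math.pi * k * x / N0)).real
                   for k in harmonics)

    best = max(range(grid), key=lambda g: interp(g * N0 / grid))
    return best * N0 / grid

# Synthetic metric with a known peak at x0 = 3.3 samples (N0 = 8).
x0 = 3.3
samples = [1.0 + math.cos(2.0 * math.pi * (n - x0) / 8.0)
           + 0.5 * math.cos(4.0 * math.pi * (n - x0) / 8.0) for n in range(8)]
x_hat = peak_position(samples)
mu_hat = x_hat / 4.0   # convert samples to symbol periods, assuming P = 4
```

Eight metric values are enough to place the peak between sample instants, which is the point of interpolating instead of searching the likelihood function point by point.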

III. SIMULATION EXPERIMENTS
Whether the frequency offset is positive or negative does not affect the performance of the algorithms mentioned above. Therefore, the following simulation results are obtained using data with frequency offset $\tilde{v} \ge 0$.
Simulation 1: Performance comparison of IMLT$_1$, IMLT$_2$ and IMLT$_3$. The "simulation environment" of Simulation 1 consists of 10,000 Monte Carlo runs using data with four samples per symbol, quadrature phase shift keying (QPSK) modulation, multiple frequency offsets, arbitrary phase offset, a rolloff coefficient of 0.15, a timing offset of $0.1875T$ and a pilot sequence with a total length of sixty-four symbols divided into thirty-two groups. As shown in Fig. 4(a) and Fig. 4(b), although IMLT$_2$ and IMLT$_3$ are moderately sensitive to frequency offset compared with IMLT$_1$, their computational complexity is relatively lower (as shown in Table 1), and within the frequency offset capture range their estimation accuracy is high enough to meet practical requirements. IMLT$_3$ is a further simplification of IMLT$_2$: it has a symbol timing estimation accuracy similar to that of IMLT$_2$ while significantly reducing the computational complexity. Moreover, within the frequency offset capture range and with a small shaping coefficient and low SNR, both IMLT$_2$ and IMLT$_3$ achieve performance close to the MCRB. Although the multiple simplifications in the derivation of the likelihood function and in the trigonometric polynomial interpolation have led to self-noise, this effect only gradually becomes significant at high SNR. When the SNR is high, even though part of the timing offset estimation performance is lost, the estimation accuracy is still high. Therefore, for the receiver, IMLT$_3$ can be used to efficiently replace IMLT$_1$ and IMLT$_2$ in most cases.
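For reference, the MCRB used as the benchmark can be evaluated with a few lines of code. The sketch below assumes the form commonly quoted in the synchronization literature for timing estimation from $L_0$ known symbols, $\mathrm{MCRB}(\tau)/T^2 = 1/(8\pi^2 \xi L_0 E_s/N_0)$ with $\xi = 1/12 + \alpha^2(1/4 - 2/\pi^2)$ for a root raised cosine pulse with rolloff $\alpha$. The formula, the interpretation of SNR as $E_s/N_0$ and the function name `mcrb_timing` are assumptions of this sketch and are not taken from the expressions used in this article.

```python
import math

def mcrb_timing(snr_db, n_symbols, rolloff):
    """Normalized MCRB(tau) / T^2 under the assumed closed form."""
    # Pulse-dependent factor for a root raised cosine pulse (assumed).
    xi = 1.0 / 12.0 + rolloff ** 2 * (0.25 - 2.0 / math.pi ** 2)
    es_n0 = 10.0 ** (snr_db / 10.0)
    return 1.0 / (8.0 * math.pi ** 2 * xi * n_symbols * es_n0)

# Bound for a 64-symbol pilot with rolloff 0.15, as in Simulation 1.
for snr_db in (0, 5, 10, 15, 20):
    print(snr_db, mcrb_timing(snr_db, n_symbols=64, rolloff=0.15))
```

Under this form the bound falls by a factor of ten for every 10 dB of SNR and halves when the pilot length doubles, while a smaller rolloff gives a slightly smaller $\xi$ and therefore a slightly higher bound.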
Simulation 2: Performance comparison of IMLT$_3$ under different timing offsets. Fig. 5 shows a performance simulation that uses multiple sets of data with no frequency offset, various timing offsets and other conditions consistent with the "simulation environment" of Simulation 1.
Clearly, for a fixed SNR, IMLT$_3$ gives essentially the same high-performance timing offset estimates for different timing offsets, and it achieves performance close to the MCRB when the SNR is low. As mentioned above, although the influence of self-noise becomes more pronounced as the SNR increases, the estimation accuracy still improves slowly with increasing SNR. Therefore, regardless of the SNR, IMLT$_3$ is a high-performance timing offset estimation algorithm that can satisfy engineering practice.
Simulation 3: Verification of the anti-frequency-offset performance of IMLT$_3$. The simulation conditions are consistent with the "simulation environment" of Simulation 1, except that the length of the pilot sequence and the number of groups vary.
In Fig. 6(a) and Fig. 6(b), the number of groups is sixteen in both cases, but the total length of the pilot sequence is thirty-two symbols and sixty-four symbols, respectively. Although both have the same number of groups, each group contains two and four symbols, respectively, so the anti-frequency-offset performance of the former is almost twice that of the latter. The former achieves high-performance timing offset estimation when the normalized frequency offset is less than about 0.12, whereas the latter does so only when the normalized frequency offset is less than about 0.06. It can also be seen that, although both are little affected by frequency offset within their capture ranges, their estimation accuracy is gradually degraded once the normalized frequency offset exceeds the capture range.
Comparing Fig. 6(a) and Fig. 6(c), each group contains only two symbols in both cases, but because the latter has a longer observation length and more series (number of groups) when using "multi-level overlay smoothing", its anti-frequency-offset performance is slightly better than that of the former. In short, "multi-level overlay smoothing" can enhance the anti-frequency-offset performance and channel adaptability of the algorithm, and the more series there are, the more obvious the improvement. Overall, however, the frequency offset sensitivity of IMLT$_3$ is still proportional to $L$ (the number of samples in each group). The fewer elements in each group, the stronger the ability to resist frequency offset and the higher the algorithm complexity. Given the constraint of the optimal receiver bandwidth, the performance of IMLT$_3$ already meets the requirements of engineering practice for timing offset estimation accuracy in most cases.
Simulation 4: Performance analysis and comparison of various forward timing offset estimation algorithms. The following results are obtained from simulations using multiple sets of data whose conditions are consistent with the "simulation environment" of Simulation 1, except that AVN and Wang use 256 samples for calculation and the rolloff coefficient varies. The only difference between IMLT$_3$, IMLT$_f$ and IMLT$_s$ in Fig. 7 lies in the trigonometric polynomial interpolation: IMLT$_3$ uses eight samples to assist interpolation, IMLT$_f$ uses only four samples and IMLT$_s$ uses sixteen samples.
The performance comparison of the various algorithms is shown in Fig. 7. When there is no frequency offset interference, SML performs better than IMLT$_f$, mainly because the self-noise of IMLT$_f$ is larger than that of SML. Because IMLT$_3$ uses eight samples to assist interpolation and thereby reduces the influence of self-noise, it has slightly better timing offset estimation accuracy than SML. When there is a slightly larger frequency offset, the inability of SML to resist frequency offset makes it unable to estimate the timing offset. IMLT$_3$, however, can still deliver high-performance timing offset estimates within the frequency offset capture range, and because its self-noise is small, it still has a much higher estimation accuracy than IMLT$_f$. In contrast, although IMLT$_s$ uses sixteen samples to assist interpolation, its performance is worse than that of IMLT$_3$ because of the reduced interpolation reliability.
Compared with IMLT$_3$, the two NDA algorithms, AVN and Wang, have relatively poor estimation performance. Even when the samples of 256 symbols are used for estimation, the performance of AVN and Wang is still far from the MCRB with low SNR and a small shaping coefficient. Moreover, their estimation performance deteriorates sharply as the shaping coefficient decreases, which makes them poorly suited to current high-speed, low-power communication requirements. In contrast, IMLT$_3$ is insensitive to the shaping coefficient, so within the frequency offset capture range its performance can be close to the MCRB with low SNR and a small shaping coefficient.

IV. CONCLUSION
This article proposes an improved DA forward symbol timing offset estimation algorithm for data with four samples per symbol, which has been simplified and improved in multiple stages to make it more suitable for engineering practice. The new algorithm combines maximum likelihood estimation with trigonometric polynomial interpolation to achieve high-performance timing offset estimation and overcomes some shortcomings of common forward symbol timing offset estimation algorithms. When there is no frequency offset interference, it achieves performance close to the MCRB with low SNR and a small shaping coefficient. Moreover, when the frequency offset is within the capture range, the performance of the new algorithm shows only an acceptable loss compared with the case of no frequency offset. These characteristics make the new algorithm well suited to current burst communication systems with high-speed, low-power transmission.
HENG WANG received the bachelor's degree in computer science and technology from Zhengzhou University, in 2018, where he is currently pursuing the master's degree with the School of Information Engineering. His research interests include satellite communication networking and communication signal analysis and processing.
HUA JIANG is currently a Master's Supervisor and a Doctoral Supervisor with the School of Information Engineering, Zhengzhou University. He is also a Senior Member of the Chinese Institute of Electronics and the Leader of the Outstanding Teaching Team at universities in Henan, and enjoys the special government allowance of the State Council. His research interests include communication signal processing, signal detection, and electronic countermeasure research.
KEXIAN GONG received the Ph.D. degree from the Department of Signal and Information Processing, PLA Information Engineering University. He is currently a Master's Supervisor and a Doctoral Supervisor with the School of Information Engineering, Zhengzhou University. His research interests include wireless communication signal analysis and processing, channel coding, target monitoring, and electronic countermeasures.