Algorithm and Performance Analysis for Frame Detection Based on Matched Filtering

Frame arrival detection is the first crucial task for a digital communication receiver. In this paper, we propose a new frame arrival detection method based on the concept of matched filtering. The proposed training sequence consists of three repeated subsequences, each including a few repeated segments and exhibiting high sparsity in frequency domain. The proposed detection method includes the following two stages. The first matched filtering stage employs the subsequence as the filter coefficients, which matches the frequency sparsity of the received subsequence and could greatly improve the output signal to noise ratio. The second stage adopts delayed autocorrelation on the filtered signal to detect the presence of training sequence. It is demonstrated that the proposed method could outperform the conventional methods in terms of both missed detection and false alarm probability. We derive the theoretical analysis for both missed detection and false alarm performance. We further extend our performance analysis and numerical evaluation to the system with carrier frequency offset (CFO). The results illustrate accuracy of the analytical results and robustness of the proposed method against practical levels of CFO.


I. INTRODUCTION
Frame arrival detection, sometimes referred to as coarse time synchronization, means detecting the arrival of burst data frame and determining the approximate frame starting position [2]. This is the first crucial task for a digital communication receiver. There are two key evaluation metrics of frame arrival detection: the probability of missing a training sequence (missed detection probability) and falsely detecting a training sequence when none is there (false alarm probability).
There have been a number of studies reported on time synchronization for frequency selective channels in the past few years. Schmidl and Cox [3] presented a classic synchronization algorithm using two repeated training subsequences based on the delayed autocorrelation criterion, referred to as the 'S&C' method in the following. Using a sliding window with half the length of the training sequence, the timing metric function attains its maximum at the correct estimation point. The S&C method works well at moderate and high SNR, but at low SNR the missed detection and false alarm performance may not be balanced satisfactorily. Moreover, the timing metric of the S&C method exhibits a plateau, which causes some uncertainty in determining the exact frame starting point. The structure of the training symbol was further modified to obtain a sharper peak in [4]-[6]. For example, two methods have been presented in [4] to reduce the uncertainty due to the timing metric plateau. Minn et al. [5] proposed an improved algorithm for the S&C method to eliminate the influence of the cyclic prefix (CP) by introducing different sign patterns to the identical parts. Park et al. [6] introduced a new training symbol structure with conjugate symmetry. However, there is a large sub-peak on both sides of the correct timing sample in [6], which degrades the performance of time synchronization, especially in a low-SNR multipath channel. Ch Kishore and Reddy [7] introduced weighting of the time-domain preamble in the timing metric to yield a sharp peak at the correct symbol boundary. Ren et al. [8] proposed a new timing metric without notable sidelobes by exploiting pseudo noise (PN) weighting.
The associate editor coordinating the review of this manuscript and approving it for publication was Rongbo Zhu.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
The authors in [9] proposed the product of the autocorrelation and cross-correlation metrics to improve the fine timing accuracy. Ruan et al. [10] proved that using more preambles can improve timing estimation. The authors in [11] not only used the delayed autocorrelation function but also took advantage of differential normalization. The time-domain symmetry of Zadoff-Chu sequences has been exploited in [12] to provide higher correlation gain. In [13], the preamble sequences are designed with simultaneous consideration of high spectral compactness, robust initial synchronization and a low peak-to-average power ratio (PAPR) waveform. Another class of timing recovery methods exploits the inherent correlation between the CP and the last part of the symbol in orthogonal frequency division multiplexing (OFDM) systems [18], [19]. For example, Van de Beek et al. proposed a joint time-frequency synchronization algorithm that uses the CP to estimate the time and frequency deviation by calculating a maximum likelihood function [18]. The start of a frame can also be blindly estimated by exploiting the characteristics of the CP [20]-[23].
On the other hand, it is demonstrated in [14] that sufficient statistics for detection of a periodic preamble do not exist, and conventional methods may not be optimal. Therefore, a new method is presented in [14] based on fourth-order statistics to improve detection performance. The work in [15] then improved the detection performance by exploiting fourth-order differential normalization functions. The generalization to even higher order statistics for coarse timing estimation is further developed in [16]. Zhen et al. [17] further exploited fourth-order statistics in frame timing estimation for the IEEE 802.11p standard.
Almost all of the above works have not specifically considered the challenging situation with low signal-to-noise ratio (SNR). Note that low operating SNR is a direct result of high penetration loss due to the extended coverage of certain deployments (for example, the basement of a building). The reliable frame arrival detection at low SNR condition is the main focus of this work, and a two-stage frame arrival detection method is proposed. The proposed training sequence consists of three repeated subsequences, each including a few repeated segments and exhibiting high sparsity in frequency domain. In the first matched filtering stage, the subsequence serves as the filter coefficients with high frequency selectivity, which matches the frequency sparsity of the received subsequence and could greatly enhance the output SNR. Then, the delayed autocorrelation can be calculated on the filtered signal to detect the presence of training signal in the second stage. It is demonstrated that the proposed method could outperform the conventional methods in terms of both missed detection probability and false alarm probability. We provide the theoretical analysis in terms of both missed detection and false alarm performance, which well predict the simulation results. Through theoretical analysis and simulation results, we also show that the performance of the proposed method is quite robust to the existence of carrier frequency offset (CFO) between transceivers. Moreover, for very large CFOs, a multibranch CFO compensation procedure is further proposed to maintain the detection performance.
The remainder of this paper is organized as follows. In section II, the considered system is introduced and S&C method proposed by Schmidl and Cox is briefly described. The proposed two-stage method is developed in Section III. Theoretical performance analysis is derived in Section IV. The proposed method under the effect of CFO is investigated in Section V. Simulation results are provided in Section VI. Section VII concludes the paper. Preliminary results of this work have been presented in [1].

II. SYSTEM MODEL AND S&C FRAME ARRIVAL DETECTION METHOD
Consider a packet-switched radio communication system with a random access protocol. This essentially means that the receiver has no a priori knowledge of the packet arrival time. Frame arrival detection (timing synchronization) should be completed shortly after the start of the reception of a packet. Usually, the data packet is preceded by a known training sequence (the so-called preamble). The training sequence and the corresponding frame arrival detection method should be carefully designed to provide reliable system performance.
A well-known frame arrival detection method is the so-called S&C method. S&C method employs a training sequence consisting of two identical halves in time domain to achieve frame synchronization. The training sequence is appended before the data transmission with CP and would remain identical after passing through propagation channel in the absence of noise. It applies correlation between the first half and the second half of the received training signal and hence it is robust against channel distortions. When the correlation window position is correct, the phases of all product pairs in the correlation can be aligned and the correlation attains its maximum.
Specifically, suppose that the half length of the training sequence is N (excluding CP) and the receiver is equipped with M antennas. Denote the nth received baseband sample at the mth antenna by $r_m(n)$. Then, the sum of the product pairs can be expressed as in [3] by (1), where d is the starting point in a sliding window of 2N samples. This window slides along in time at the receiver to detect the arrival of a frame. The received energies of the first and second halves in the sliding window are given by (2) and (3), respectively, and the timing metric function is defined by (4), which represents the magnitude of the correlation coefficient between the first and second halves in the sliding window. The metric in (4) is referred to as the delayed autocorrelation.
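The expressions referred to as (1)-(4) can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the authors' exact typesetting: it follows the standard S&C definitions in [3], assumes that the correlation and the energies are summed over the M antennas, and assumes the squared-magnitude normalization of [3]. The symbols $P(d)$, $R_1(d)$, $R_2(d)$ and $\Gamma(d)$ are labels chosen here for illustration.

```latex
\begin{align}
P(d)      &= \sum_{m=1}^{M}\sum_{n=0}^{N-1} r_m^{*}(d+n)\, r_m(d+n+N), \\
R_1(d)    &= \sum_{m=1}^{M}\sum_{n=0}^{N-1} \lvert r_m(d+n)\rvert^{2}, \\
R_2(d)    &= \sum_{m=1}^{M}\sum_{n=0}^{N-1} \lvert r_m(d+n+N)\rvert^{2}, \\
\Gamma(d) &= \frac{\lvert P(d)\rvert^{2}}{R_1(d)\, R_2(d)} .
\end{align}
```

Under this form, the Cauchy-Schwarz inequality gives $0 \le \Gamma(d) \le 1$, with equality when the two halves of the window are proportional to each other, which is why the metric peaks when the window covers the two identical halves of the training sequence.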
In the presence of frame arrival, the timing metric will reach a plateau when there is no inter-block interference within the sliding window. In frequency selective fading channel, the length of timing metric plateau equals CP length minus the channel length. In practice, since the receiver has no idea whether there is an ongoing transmission, the sliding timing metric can be compared with a predefined threshold. The timing metric larger than the threshold indicates a frame arrival. Otherwise, no frame arrival is detected.

III. PROPOSED TWO-STAGE FRAME ARRIVAL DETECTION
In general, there are two issues to consider when evaluating the performance of detection problem. First, the probability of missing a training sequence and not detecting the signal, i.e., the missed detection probability. Second, the probability of falsely detecting a training sequence when none is there, i.e., the false alarm probability. Though S&C method is effective and widely used for frame arrival detection, it may not provide reliable detection performance under low SNR condition. In this section, we present the proposed two-stage detection method, which could outperform S&C method in terms of both missed detection probability and false alarm probability, especially under low SNR condition. As shown in Fig. 1, the proposed training sequence consists of CP and three repeated subsequences. Each subsequence is of length W and includes K small repeated segments. Then, the length of each segment can be expressed as P = W /K . The proposed frame arrival detection method is composed of two stages. In the first matched filtering stage, the subsequence serves as the filter coefficients to perform filtering on the received signal. Then, the delayed autocorrelation is calculated on the filtered signal to detect the presence of the training signal.
The motivation of the first matched filtering stage can be described as follows. Due to its repeated structure, the frequency components of the subsequence are mainly distributed around the P dominant interleaved frequencies. It is expected that, in the frequency domain, the subsequence exhibits high sparsity when K is relatively large. When the subsequence serves as the matched filter, the frequency response of the filter system in the first stage also exhibits high frequency selectivity. This high frequency selectivity of the filter system matches the frequency sparsity of the training subsequences and suppresses the white noise in the null space of the training signal. Hence, the SNR is substantially enhanced after the matched filtering in the first stage. The basic idea of the first matched filter stage is depicted in Fig. 2.
Moreover, without the effect of noise, owing to the CP appended before the first subsequence, the three subsequences still remain identical after passing through the frequency selective channel. Similarly, the first subsequence could be regarded as the additional ''CP'' in the proposed matched filtering stage, which guarantees that the last two subsequences could maintain the equality after the linear convolution with the matched filter. This then enables the typical delayed autocorrelation algorithm in the second stage, where the presence of frame can be detected by sliding the window with length of two subsequences along in time.
Mathematically, let us denote the subsequence by $\mathbf{s} = [s_0, s_1, \cdots, s_{W-1}]^T$. The subsequence can be generated as $\mathbf{s} = \mathbf{F}^H \mathbf{c}$, where $\mathbf{F}$ denotes the unitary $W \times W$ DFT matrix and $\mathbf{c} \in \mathbb{C}^{W \times 1}$ is the corresponding frequency domain pilot vector. Here, $\mathbf{c}$ has only P equi-amplitude non-zero pilot tones, which take cyclically equi-spaced positions (with a spacing of K tones). We consider that the transmitter has one antenna while the receiver has M antennas. The propagation channel between the transmitter and the mth receive antenna is modeled as in (5), where L is the channel length. Here, $h_{m,l}$ denotes the lth tap of the channel at the mth receive antenna. In this paper, we assume the channel length is no greater than the length of each segment, i.e., $L \le P$. Let $\mathbf{S}_C \in \mathbb{C}^{W \times W}$ and $\mathbf{S}_L \in \mathbb{C}^{W \times (2W-1)}$ stand for the circulant and linear convolution matrices with subsequence $\mathbf{s}$, respectively. The received signal corresponding to the three subsequences at the mth antenna after CP removal can be expressed as in (6), where $\mathbf{n}_m$ denotes the corresponding additive white Gaussian noise (AWGN) given in (7). Performing matched filtering with the training subsequence on the received training signal, we obtain the filtered signal corresponding to the last two subsequences after removing the first W samples as in (8) and (9), with $\mathbf{r}_{1,m} \in \mathbb{C}^{W \times 1}$ and $\mathbf{r}_{2,m} \in \mathbb{C}^{W \times 1}$. From (8) and (9), it is evident that $\mathbf{r}_{1,m}$ and $\mathbf{r}_{2,m}$ remain identical except for the difference of noise. Here, the role of $\mathbf{r}_{1,m}$ and $\mathbf{r}_{2,m}$ is similar to that of the two identical halves employed in the S&C method. Hence, the timing metric based on delayed autocorrelation can be directly applied to the filtered signal to detect the presence of a frame.
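The construction $\mathbf{s} = \mathbf{F}^H \mathbf{c}$ can be illustrated with a minimal sketch. The tone positions (0, K, 2K, ...) and the quadratic-phase pilot values below are assumptions made only for this example; the text specifies just that the P non-zero tones have equal amplitude and are spaced K bins apart. The sketch checks that such a pilot vector yields a time-domain subsequence made of K identical segments of length P = W/K.

```python
import cmath
import math

def make_subsequence(W, K):
    """Build s = F^H c with P = W // K equal-amplitude tones spaced K bins apart."""
    assert W % K == 0
    P = W // K
    c = [0j] * W
    for i in range(P):
        # Hypothetical unit-modulus pilot values; any equal-amplitude choice would do.
        c[i * K] = cmath.exp(1j * math.pi * i * i / P)
    # Unitary inverse DFT: s_n = (1/sqrt(W)) * sum_k c_k * exp(+j*2*pi*k*n/W).
    s = [sum(c[k] * cmath.exp(2j * math.pi * k * n / W) for k in range(W)) / math.sqrt(W)
         for n in range(W)]
    return c, s

W, K = 32, 4
P = W // K
c, s = make_subsequence(W, K)
# Largest difference between a sample and the sample one segment later.
max_dev = max(abs(s[n] - s[n + P]) for n in range(W - P))
print("non-zero tones:", sum(1 for v in c if abs(v) > 0))
print("segment repetition error:", max_dev)
```

The repetition error is at the level of floating-point rounding, which reflects the general fact that a spectrum occupying only every Kth bin corresponds to a time signal with period W/K.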
Specifically, for the mth receive antenna, denote the nth received sample before and after matched filtering by $r_m(n)$ and $r_m^{(\mathrm{MF})}(n)$, respectively, where $r_m^{(\mathrm{MF})}(n)$ is obtained from $r_m(n)$ by the matched filtering of the first stage. Let d stand for the starting point in a sliding window of 2W samples. Then, similar to the S&C method [3], the timing metric in the second stage of the proposed method is computed on the filtered samples $r_m^{(\mathrm{MF})}(n)$ within this window. In practice, the window of length 2W is slid along in time, and a frame arrival is declared if the metric exceeds a preset threshold.
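The two stages can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: the matched filter is written as a sliding correlation with the conjugated subsequence, the timing metric uses the squared-magnitude normalization of the S&C method, and the segment values, CP length and toy frame layout are hypothetical. In a noiseless ideal channel the metric at the true start of the three subsequences should equal 1.

```python
import cmath
import math

def matched_filter(r, s):
    """Stage 1 (assumed form): y(n) = sum_k conj(s_k) * r(n + k)."""
    W = len(s)
    return [sum(s[k].conjugate() * r[n + k] for k in range(W))
            for n in range(len(r) - W + 1)]

def timing_metric(y_per_antenna, d, W):
    """Stage 2: delayed autocorrelation over a 2W window starting at d."""
    corr, e1, e2 = 0j, 0.0, 0.0
    for y in y_per_antenna:              # sum over receive antennas
        for n in range(d, d + W):
            corr += y[n].conjugate() * y[n + W]
            e1 += abs(y[n]) ** 2
            e2 += abs(y[n + W]) ** 2
    return abs(corr) ** 2 / (e1 * e2) if e1 > 0 and e2 > 0 else 0.0

# Toy example: W = 16, K = 4, one antenna, ideal channel, no noise.
W, K = 16, 4
P = W // K
segment = [cmath.exp(1j * math.pi * n * n / P) for n in range(P)]  # hypothetical values
s = segment * K                       # subsequence built from K repeated segments
cp = s[-P:]                           # CP of one segment length (assumption)
silence = [0j] * 8
r = silence + cp + s * 3 + silence    # the three subsequences start at index 8 + P
start = len(silence) + len(cp)

y = matched_filter(r, s)
metric_at_start = timing_metric([y], start, W)
print("metric at frame start:", round(metric_at_start, 6))
```

A detector would slide d over the received stream and compare the metric with the threshold η; with noise present the metric at the frame start falls below 1, which is what the missed detection analysis quantifies.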
In summary, the whole procedure of the proposed method is illustrated in Fig. 3.

Remark:
Here we should emphasize that a number of works [7]-[9] have directly absorbed the cross-correlation between the received signal and the preamble into the designed timing metric. However, the main purpose of exploiting cross-correlation in these works lies in reducing the uncertainty due to the timing metric plateau of the S&C method. They are not motivated by the concept of linear filtering and are completely different from our idea. To the best of our knowledge, the idea of matched filtering with high frequency selectivity to greatly enhance the output SNR has not been reported among the existing frame detection works of the past decades.

IV. THEORETICAL PERFORMANCE ANALYSIS
In this section, we provide theoretical performance analysis for the proposed two-stage detection method in terms of both missed detection and false alarm probability. A general frequency selective Rayleigh fading channel is assumed. The corresponding analysis for the S&C method is also provided. Here we should note that, to the best of our knowledge, the missed detection probability of the S&C method under a frequency selective Rayleigh fading channel has not been reported before.
Denote the preset threshold of the timing metric by η. The SNR is defined as $\rho = \sigma_s^2/\sigma_n^2$, where $\sigma_s^2$ and $\sigma_n^2$ denote the signal and noise power, respectively. Then, the missed detection probability of the proposed two-stage method can be given by the integral in (11), where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$ denotes the cumulative distribution function (CDF) of the standard normal distribution, while $y(Z)$ and $V(Z)$ depend on the system parameters as given in (12) and (13). Proof: See Appendix A. Note that both $y(Z)$ and $V(Z)$ depend on $Z$, and the integral of (11) may not be obtained directly in closed form. Nevertheless, in the following we develop an effective closed-form approximation for (11).
(1−η). The missed detection probability of (11) can be approximated in closed form as in (14). Proof: See Appendix B. To gain more insight, we further consider the asymptotic case with sufficiently large W and ρ. Specifically, we consider $\rho \gg L$. Then, (14) can be transformed into the asymptotic version (15). Note that some higher order infinitesimals have been ignored in (15) for sufficiently large ρ and W. The following observations can be made from (15). First, the diversity gain of the detection is ML, which implies that the missed detection probability drops more quickly with increasing SNR when ML is larger. Second, as expected, a higher threshold increases the magnitude of $\frac{\sqrt{\eta}}{1-\sqrt{\eta}}$ and thus degrades the detection performance. Third, to some extent, the matched filtering with parameter K equivalently improves the SNR by a factor of K. This clearly demonstrates that the proposed two-stage method benefits from the repeated structure of the training subsequence.

2) FALSE ALARM PROBABILITY
Lemma 3: The false alarm probability of the proposed two-stage detection method can be expressed as Proof: See Appendix C. It can be seen from (16) that when P false is plotted in the logarithmic scale, the false alarm probability curve linearly decreases with the increase of the threshold η, the number of antennas M and the segment length W /K of the training subsequence. Moreover, from comparison between (15) and (16), we observe that, increasing η, on one hand, decreases the false alarm probability P false , on the other hand, would increase missed detection probability P miss . Hence, as expected, there is a tradeoff between missed detection and false alarm probability to determine the threshold η. Similar observation can be made for the parameter K .

B. S&C METHOD
In this subsection, we provide the theoretical expressions for missed detection and false alarm probability of S&C method under a Rayleigh fading channel. Following the similar steps in Appendix A and B, we obtain the following Lemma 4. The detailed proof is omitted due to the space limitation.
Lemma 4: Given a preset frame arrival detection threshold η and SNR condition of $\rho = \sigma_s^2/\sigma_n^2$, the missed detection probability of the S&C method can be expressed as in (17), where (1−η). With sufficiently large W and ρ, we obtain the asymptotic missed detection probability (18) by ignoring the higher order infinitesimals. It is seen that the S&C method also achieves a diversity gain of ML. The comparison between (15) and (18) indicates that, given the same threshold η, the proposed two-stage method can achieve a much lower missed detection probability than the S&C method. Moreover, following similar steps to those in Appendix C, we obtain the following Lemma 5. The detailed proof is omitted due to space limitations.
Lemma 5: The false alarm probability of the S&C method can be expressed as in (19). Hence, there also exists a tradeoff between missed detection and false alarm probability in determining the threshold η of the S&C method.

C. PERFORMANCE COMPARISON BETWEEN S&C AND THE PROPOSED METHOD
It is evident that there exists a tradeoff in determining the detection threshold η in both the S&C and the proposed method. Given a required false alarm probability $P_{\text{false}}$, according to the asymptotic results, the thresholds of the S&C and the proposed method can be expressed as in (20). Here, for fairness, the length of the training sequence in the S&C method is set as 2N = 3W, such that the total training sequences of the S&C and the proposed method have the same length. Then, according to (15) and (18), the asymptotic missed detection probability of the proposed method is lower than that of the S&C method, provided that the condition in (21) is satisfied. Substituting (20), we can rewrite (21) into (22). This indicates that, given the same required false alarm probability $P_{\text{false}}$, the proposed method could outperform the S&C method in terms of asymptotic missed detection performance, as long as $P_{\text{false}}$ is above the threshold on the right hand side of (22). Let us consider the parameters adopted in later simulations as an example. In the case of M = 2, W = 128 and K = 8, the right hand side of (22) equals $3.3 \times 10^{-18} \approx 0$, which is far below the required false alarm probability in practice. Hence, we may conclude that the proposed method outperforms the S&C method in terms of asymptotic detection performance.

V. PROPOSED TWO-STAGE FRAME ARRIVAL DETECTION FOR SYSTEMS WITH CFO
It is well known that, in addition to time synchronization, frequency synchronization is another important issue, which has also attracted a lot of attention recently [24]-[27]. In practice, CFO naturally appears between the transceivers due to the frequency mismatch of the local oscillators. CFO estimation and compensation should be performed after coarse time synchronization, i.e., after the frame has been correctly detected. In other words, frame detection should be carried out in the frequency asynchronous situation [15]-[17]. In this section, we show that the proposed frame arrival detection scheme works in the frequency asynchronous situation and exhibits some robustness to the CFO between the transceivers.
Denote f as the CFO in Hz between the transceivers and $T_s$ as the sampling interval. Then, the normalized CFO can be defined by $\phi = f T_s$. We define the $X \times X$ diagonal matrix $\mathbf{E}_X(\phi) = \mathrm{diag}(1, e^{j2\pi\phi}, \cdots, e^{j2\pi(X-1)\phi})$, which represents the phase shift arising from CFO. As usual, we assume that the RF channel selection filter is designed not to affect the desired received signal in the presence of CFO. Then, in the presence of CFO, we can rewrite the received signal of (6) into (23). Denote $\mathbf{S}_{L,2W}$ as the $2W \times (3W-1)$ linear convolution matrix with subsequence $\mathbf{s}$, which is formed in a similar way to $\mathbf{S}_L$. Denote $\bar{\mathbf{S}}_C$ as a submatrix of $\mathbf{S}_C$ obtained by removing its first row vector. Then, the filtered signal in (8) after matched filtering corresponding to the last two subsequences can be re-expressed as in (24). There holds the equation (25) [28], where $\mathbf{S}_{L,2W}(\phi)$ is the $2W \times (3W-1)$ linear convolution matrix with sequence $\mathrm{diag}(e^{j2\pi(W-1)\phi}, e^{j2\pi(W-2)\phi}, \cdots, 1)\,\mathbf{s} = e^{j2\pi(W-1)\phi}\,\mathbf{E}_W(-\phi)\,\mathbf{s}$. According to (25), we can rewrite (24) into (26), where $\mathbf{S}_C(\phi)$ is the $W \times W$ circulant convolution matrix with sequence $e^{j2\pi(W-1)\phi}\,\mathbf{E}_W(-\phi)\,\mathbf{s}$. It is observed that, in the absence of noise, the first and second halves of $\mathbf{y}_m^{(\mathrm{MF})}$ remain identical except for a phase shift introduced by the CFO. Hence, the timing metric based on delayed autocorrelation can also be applied to detect the presence of a frame.
Lemma 6: Given the CFO of φ, the missed detection probability of the proposed two-stage method can be expressed as in (27), where $y(Z)$ and $V(Z)$ exactly equal (12) and (13), respectively. Moreover, the factor $\lambda(\phi) = \frac{\sin^2(\pi W \phi)}{W^2 \sin^2(\pi \phi)}$ represents the mismatch effect between the matched filter and the received training sequence due to the existence of CFO.
Proof: See Appendix D. Then, following similar steps to those in Appendix B, we can obtain an approximate closed-form expression for (27) as in (28), where (1−η). It is interesting to observe that (14) and (28) are exactly in the same form. The only difference is that the SNR ρ in (14) has been replaced by ρλ(φ) in (28). This theoretically indicates that, regarding the missed detection performance of the proposed method, the CFO results in an SNR degradation by a factor of λ(φ). In particular, we have λ(φ) = 1 with φ = 0, and then (28) turns into (14). Hence, the analytical expression in (28) is a direct extension of (14) with the consideration of CFO.
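The SNR degradation caused by the mismatch factor can be evaluated numerically. The sketch below assumes the Dirichlet-kernel form $\lambda(\phi) = \sin^2(\pi W\phi) / (W^2 \sin^2(\pi\phi))$, chosen because it satisfies the two properties stated in the text, λ(0) = 1 and λ(1/W) = 0; the function names are hypothetical.

```python
import math

def mismatch_factor(phi, W):
    """Assumed form: sin^2(pi*W*phi) / (W^2 * sin^2(pi*phi)), with limit 1 at phi = 0."""
    den = W * math.sin(math.pi * phi)
    if abs(den) < 1e-15:
        return 1.0
    return (math.sin(math.pi * W * phi) / den) ** 2

def snr_loss_db(phi, W):
    """SNR degradation in dB, i.e. -10*log10(lambda(phi))."""
    return -10.0 * math.log10(mismatch_factor(phi, W))

W = 128
for x in (0.2, 0.3):
    phi = x / W
    print(f"phi = {x}/W: lambda = {mismatch_factor(phi, W):.4f}, "
          f"loss = {snr_loss_db(phi, W):.2f} dB")
```

With W = 128, the computed losses for φ = 0.2/W and φ = 0.3/W are close to the 0.58 dB and 1.32 dB quoted in the simulation section, which supports the assumed form of λ(φ).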
Remark: Here we note that a very large CFO may appear in some rare scenarios. In particular, when φ = 1/W, we have λ(φ) = 0 and the expected signal will be completely filtered away by the matched filter, which would lead to 100% missed detection. Nevertheless, we find that this issue can be tackled with the aid of multi-branch CFO compensation. Specifically, let us consider a set of a few pre-determined trial values for CFO compensation, $\{\hat{\phi}_1, \hat{\phi}_2, \cdots, \hat{\phi}_T\}$, in ascending order. As illustrated in Fig. 4, let the received signal pass through T branches in parallel. The ith branch first performs CFO compensation with the trial value $\hat{\phi}_i$, then carries out the proposed two-stage method and outputs its timing metric $Q_i(d)$. The final metric takes the maximum value of the T output metrics, i.e., $Q(d) = \max_{1 \le i \le T} Q_i(d)$. A frame is supposed to be detected if the metric $Q(d)$ exceeds a preset threshold. It is expected that, as long as the CFO is within the range from $\hat{\phi}_1$ to $\hat{\phi}_T$, the energy of the pilot signal will be preserved by the multiple matched filters and thus the detection performance will be maintained.
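The multi-branch procedure can be sketched as follows for a single antenna. This is a minimal illustration under assumptions: compensation is modeled as multiplying sample n by $e^{-j2\pi\hat{\phi}_i n}$, the matched filter is a sliding correlation with the conjugated subsequence, and the toy signal is noiseless with a CFO of exactly 1/W. All function names and signal values are hypothetical.

```python
import cmath
import math

def matched_filter(r, s):
    """Sliding correlation with the conjugated subsequence (assumed filter form)."""
    W = len(s)
    return [sum(s[k].conjugate() * r[n + k] for k in range(W))
            for n in range(len(r) - W + 1)]

def timing_metric(y, d, W):
    """Delayed autocorrelation metric over a 2W window starting at d."""
    corr = sum(y[n].conjugate() * y[n + W] for n in range(d, d + W))
    e1 = sum(abs(y[n]) ** 2 for n in range(d, d + W))
    e2 = sum(abs(y[n + W]) ** 2 for n in range(d, d + W))
    return abs(corr) ** 2 / (e1 * e2) if e1 > 0 and e2 > 0 else 0.0

def compensate(r, phi_trial):
    """Rotate sample n by exp(-j*2*pi*phi_trial*n)."""
    return [v * cmath.exp(-2j * math.pi * phi_trial * n) for n, v in enumerate(r)]

def multibranch_metric(r, s, d, trials):
    """Q(d) = max over branches of the per-branch timing metric Q_i(d)."""
    W = len(s)
    return max(timing_metric(matched_filter(compensate(r, t), s), d, W) for t in trials)

W, K = 16, 4
P = W // K
segment = [cmath.exp(1j * math.pi * n * n / P) for n in range(P)]  # hypothetical values
s = segment * K
clean = s[-P:] + s * 3                    # CP followed by three subsequences
phi = 1.0 / W                             # CFO at which lambda(phi) = 0
r = [v * cmath.exp(2j * math.pi * phi * n) for n, v in enumerate(clean)]
d = P                                     # true start of the first subsequence

trials = [-1.0 / W, -0.5 / W, 0.0, 0.5 / W, 1.0 / W]
y_uncompensated = matched_filter(r, s)
energy_uncompensated = sum(abs(v) ** 2 for v in y_uncompensated[d:d + 2 * W])
q_multi = multibranch_metric(r, s, d, trials)
print("filtered energy without compensation:", energy_uncompensated)
print("multi-branch metric:", round(q_multi, 6))
```

Without compensation the filtered energy in the window collapses to rounding-error level, whereas the branch whose trial value matches the CFO restores a metric of 1 in this noiseless example.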

VI. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed two-stage frame arrival detection method through numerical simulations and verify the derived theoretical analysis. Unless otherwise stated, we consider that the subsequence is of length W = 128 and is composed of K = 8 repeated segments in the proposed two-stage method. We also include the results of S&C method for comparison. For fairness, the length of training sequence in S&C method is set as 2N = 3W , such that the total training sequences of S&C and the proposed method keep the same length. Moreover, a frequency selective Rayleigh fading channel is considered. The channel length L = 5 is assumed unless otherwise stated. For conciseness, missed detection and false alarm probability are referred to as 'MDP' and 'FAP', respectively.
In the first example, we evaluate the missed detection performance of the proposed method in Fig. 5 as a function of SNR from −12 dB to 0 dB. Different values of the number of antennas M are considered, while the detection threshold η is chosen corresponding to the associated FAP. We also include the performance of S&C for comparison. The following observations can be made: 1) The corresponding closed-form analytical results from (14) are provided by the dashed curves. It is observed that the numerical curves are well predicted by the analytical results, which demonstrates the effectiveness of the theoretical analysis for both S&C and the proposed method. 2) As expected, the MDP can be decreased by increasing the number of antennas M or raising the associated FAP (equivalently, lowering the threshold). In particular, the curves with M = 2 drop more quickly than those with M = 1 on the right hand side of the x-axis. This coincides with our analysis that the detection has a diversity gain of ML and thus a larger M increases the slope of the missed detection curve. 3) Under the same level of associated FAP, the proposed method can substantially outperform the S&C method. For example, for the case with FAP around $3 \times 10^{-6}$, a performance gap of more than 3 dB is observable between S&C and the proposed method.
Fig. 6 shows the influence of different training lengths on the FAP of S&C and the proposed method at SNR = −4 dB. We consider M = 2 in this example. The thresholds are taken as η = 0.026 and η = 0.15 in S&C and the proposed method, respectively, such that the two methods exhibit similar FAP. The corresponding analytical FAP results from (16) and (19) are included as the dashed curves for comparison. We see that the analytical results basically match the numerical curves.
It is observed that in y-axis with logarithmic scale, the FAP curves have a linear relationship with the training subsequence length W , which has been predicted by the previous theoretical analysis. Moreover, the corresponding MDP curves of S&C and the proposed method are also included. It is evident that as compared to S&C method, the proposed method can achieve much lower MDP with the similar FAP performance. This once again demonstrates the superiority of the proposed method.
In Fig. 7, we evaluate the MDP performance of the proposed method as the training length increases. The analytical and asymptotic results computed from (14) and (15) are both included. We assume M = 1 and the threshold is fixed  as η = 0.3. For the case of SNR = 0 dB, we can observe a certain deviation between the simulation and the asymptotic results. Nevertheless, when SNR increases to 3 dB, the simulation result can closely approach the asymptotic one as the training length increases.
We display the performance of the proposed method in Fig. 8 in terms of both MDP and FAP under SNR = −4 dB. We assume M = 2 in this example. The x-axis and y-axis represent FAP and MDP, respectively. Each curve is obtained by varying the threshold η. In other words, each curve shows the evolution of FAP and MDP as the threshold changes. For comparison, we also show the results of six existing methods: the S&C method [3], the differential factor normalized autocorrelation method [11] (labelled as 'Autocorr-Diff-Norm'), the fourth-order statistics based method [14] (labelled as '4th-Order'), the fourth-order statistics plus differential normalization method [15] (labelled as '4th-Order-Diff-Norm'), the fourth-order statistics method for IEEE 802.11p [17] (labelled as '4th-Order-Diff-Norm-11p') and the pseudo-noise sequence aided method [8] (labelled as 'PN-Weighting'). Here, all the existing works have been directly extended to the multi-antenna case. Note that, in order to achieve an acceptable balance between missed detection and false alarm performance, the methods may employ different threshold configurations. The results clearly demonstrate the tradeoff between missed detection and false alarm performance. By varying the threshold, we cannot simultaneously decrease both missed detection and false alarm probability. Note that the closer the curve is to the lower left corner, the better the performance is. It is evident that the proposed two-stage method can achieve much better performance than all the existing methods. In particular, we see that with a proper threshold, both MDP and FAP of the proposed method can be lower than $10^{-4}$. Given an FAP of $10^{-3}$, the proposed method can decrease the MDP by almost two orders of magnitude or more as compared to the existing methods. This clearly demonstrates the superiority of the proposed method.
As mentioned before, the first length-W subsequence in the proposed method acts as an additional CP. In Fig. 9, we shorten the length of first subsequence and evaluate its impact on the MDP performance of the proposed method. The length of the first subsequence is denoted by W p , while the last two subsequences keep length of W . We assume M = 2 and η = 0.2 in this example. Note that W p = W is required to guarantee that the last two subsequences could exactly maintain equality after the matched filtering stage. It is expected that when W p < W , this exact equality will be destroyed and the performance should be compromised. The simulation results coincide with the expectation that shortening W p indeed raises MDP of the proposed method. However, it is interesting to see that the performance degradation is quite limited. Even W p = 0, i.e., without the first subsequence, the performance degradation is only around 1 dB. This means that, the proposed method can decrease the training overhead by a factor of 1/3 without significant sacrifice of performance. Next, we evaluate the effect of CFO on the proposed detection method in Fig. 10. We depict the missed detection probability of the proposed method with CFO of φ = 0.2/W and φ = 0.3/W . We consider M = 4, η = 0.1 in this figure and the FAP is given by 3 × 10 −6 . The corresponding performance without CFO is also included as the benchmark. The analytical expression from (28) is included for comparison. The simulation results closely match the analytical results and indicate that the CFO would indeed degrade the missed detection performance of the proposed method. Specifically, the degradation in dB can be expressed as −10 log 10 (λ(φ)), which equal 0.58 dB and 1.32 dB for φ = 0.2/W and φ = 0.3/W , respectively. 
Nevertheless, for practical systems, e.g., a sampling rate of 20 MHz at a carrier frequency of 2.4 GHz, the normalized CFOs of $\phi = 0.2/W$ and $\phi = 0.3/W$ correspond to actual CFOs of 31.25 kHz and 46.9 kHz, respectively, whereas a moderate 10 ppm oscillator would lead to a maximum CFO of only 24 kHz. Hence, practical CFO levels cause little performance degradation of the proposed method. For further comparison, this figure also includes the result of the S&C method, with an FAP of the same order as that of the proposed method. It is well known that the S&C method is immune to CFO. Nevertheless, even with a large CFO, the proposed method still outperforms S&C in this example.
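The CFO figures quoted above can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in this section: $W = 128$, the value implied by $0.2/W$ mapping to 31.25 kHz at 20 MHz, and $\lambda(\phi) = \left(\frac{\sin(\pi W\phi)}{W\sin(\pi\phi)}\right)^2$, i.e., the squared magnitude of the factor appearing in (48). The helper names are illustrative.

```python
import math

W = 128        # ASSUMED subsequence length, inferred from 0.2/W <-> 31.25 kHz at 20 MHz
FS_HZ = 20e6   # sampling rate quoted in the text
FC_HZ = 2.4e9  # carrier frequency quoted in the text

def cfo_hz(phi, fs_hz=FS_HZ):
    """Physical CFO in Hz for a CFO normalized to the sampling rate."""
    return phi * fs_hz

def oscillator_cfo_hz(ppm, fc_hz=FC_HZ):
    """Worst-case CFO in Hz caused by an oscillator tolerance given in ppm."""
    return ppm * 1e-6 * fc_hz

def cfo_loss_db(phi, w=W):
    """Loss -10*log10(lambda(phi)) in dB, ASSUMING
    lambda(phi) = (sin(pi*w*phi) / (w*sin(pi*phi)))**2."""
    if phi == 0:
        return 0.0
    gain = math.sin(math.pi * w * phi) / (w * math.sin(math.pi * phi))
    return -10.0 * math.log10(gain * gain)

for c in (0.2, 0.3):
    phi = c / W
    print(f"phi = {c}/W: {cfo_hz(phi) / 1e3:.2f} kHz, loss {cfo_loss_db(phi):.2f} dB")
print(f"10 ppm at 2.4 GHz: {oscillator_cfo_hz(10) / 1e3:.1f} kHz")
```

Under these assumptions the computed losses agree with the quoted 0.58 dB and 1.32 dB to within rounding, which supports the assumed form of $\lambda(\phi)$ but does not replace its definition given with (28).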
In Fig. 11, we evaluate the scenario with large CFOs under $M = 4$. We consider the normalized CFOs of $\phi = 0.5/W$, $\phi = 0.75/W$ and $\phi = 1/W$, which correspond to actual CFOs of 78.13 kHz, 117.18 kHz and 156.25 kHz at the 20 MHz sampling rate. We consider five-branch CFO compensation, i.e., $T = 5$, with the preset CFO compensation values $\{\phi_1, \phi_2, \cdots, \phi_T\} = \{-\frac{1}{W}, -\frac{0.5}{W}, 0, \frac{0.5}{W}, \frac{1}{W}\}$. The threshold is set to $\eta = 0.1$ without and $\eta = 0.11$ with multi-branch CFO compensation, such that the FAP is maintained at the same level of $3 \times 10^{-6}$. It is evident that, without multi-branch CFO compensation, the proposed method fails completely at large CFOs. In comparison, with the multi-branch procedure, the proposed method works properly, and the performance degradation due to large CFO is within around 0.5 dB.
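One possible reading of the multi-branch procedure is sketched below. Several points are assumptions rather than details taken from this section: each branch derotates the received samples by its preset CFO before matched filtering, the branch outputs are combined by taking the largest metric, the metric is a simplified single-antenna version of the two-stage detector, and the training segment is a random equal-amplitude sequence with $P = K = 8$. The function names are hypothetical.

```python
import numpy as np

def timing_metric(r, s):
    """Simplified two-stage metric evaluated at the start of r.

    Stage 1: matched filtering, i.e. correlation of r with the subsequence s.
    Stage 2: normalized delayed autocorrelation of two length-W output halves.
    """
    W = len(s)
    y = np.convolve(r, np.conj(s[::-1]), mode="valid")  # y[d] = sum_i r[d+i] * conj(s[i])
    r1, r2 = y[:W], y[W:2 * W]
    g = np.vdot(r1, r2)                                  # sum of conj(r1) * r2
    return abs(g) ** 2 / (np.vdot(r1, r1).real * np.vdot(r2, r2).real)

def multibranch_metric(r, s, presets):
    """Largest metric over branches, each derotating r by one preset CFO (ASSUMED rule)."""
    n = np.arange(len(r))
    return max(timing_metric(r * np.exp(-2j * np.pi * phi * n), s) for phi in presets)

# Illustrative training sequence: three subsequences, each K repeats of a length-P segment.
rng = np.random.default_rng(1)
P, K = 8, 8
W = P * K
segment = np.exp(2j * np.pi * rng.random(P))
s = np.tile(segment, K)
frame = np.tile(s, 3)

# Received samples with a large CFO of 1/W and mild additive noise.
n = np.arange(len(frame))
noise = 0.1 * (rng.standard_normal(len(frame)) + 1j * rng.standard_normal(len(frame)))
r = frame * np.exp(2j * np.pi * (1.0 / W) * n) + noise

presets = [-1 / W, -0.5 / W, 0.0, 0.5 / W, 1 / W]
m_single = timing_metric(r, s)                 # no CFO compensation
m_multi = multibranch_metric(r, s, presets)    # five-branch compensation
```

In this toy setting the uncompensated matched filter nulls the signal component, because a CFO of $1/W$ shifts the pilot tones onto the zeros of the filter response, whereas the branch matching the CFO restores it. The cost is $T$ parallel copies of the detector.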
We then evaluate the computational complexity of the proposed method in terms of the number of real multiplications, taking a single receive antenna as an example. Due to the special repeated structure in the proposed method, the matched filtering stage can be implemented by two cascaded finite impulse response (FIR) filters: the coefficients of one FIR filter correspond to the length-$P$ segment, while the coefficients of the other FIR filter all equal one. Thus, for one time instant $d$, the first stage requires $4W/K$ real multiplications. The complexity of the delayed autocorrelation stage in the proposed method exactly equals that of the S&C method, namely 7 real multiplications per time instant. Hence, the total complexity of the proposed method for one time instant is $4W/K + 7$. With the multi-branch procedure, the complexity becomes $4TW/K + 7T$. For $K = 8$ and $T = 5$, as adopted in the simulations, this gives $2.5W + 35$. For the same training sequence length, the complexities of S&C [3], 4th-Order [14], 4th-Order-Diff-Norm [15] and 4th-Order-Diff-Norm-11p [17] are $7$, $10W - 40$, $6.5W - 26$ and $7.5W - 1$, respectively [17]. The complexity comparison is summarized in Table 1. It is evident that, although the proposed method has a higher computational burden than the S&C method, it is quite computationally efficient compared to the other three competitors. Considering the superior performance demonstrated above, the proposed method offers a better way to trade complexity for performance.
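The two-filter decomposition can be illustrated numerically. The sketch below assumes that the subsequence consists of $K$ repetitions of a length-$P$ segment (so that $W = KP$) and that the all-ones filter has $K$ unit taps spaced $P$ samples apart. Under that reading only the first filter needs multiplications: $P = W/K$ complex ones, i.e., $4W/K$ real ones, per output sample. The function names are illustrative.

```python
import numpy as np

def direct_matched_filter(r, segment, K):
    """Correlate r with the full length-W subsequence (K repeats of the segment)."""
    s = np.tile(segment, K)
    return np.convolve(r, np.conj(s[::-1]))

def cascaded_matched_filter(r, segment, K):
    """Same output using two cascaded FIR filters."""
    P = len(segment)
    # FIR 1: matched to the length-P segment (P complex multiplications per output).
    stage1 = np.convolve(r, np.conj(segment[::-1]))
    # FIR 2: K unit taps spaced P samples apart (additions only; ASSUMED structure).
    comb = np.zeros((K - 1) * P + 1)
    comb[::P] = 1.0
    return np.convolve(stage1, comb)

def real_mults_proposed(W, K, T=1):
    """Real multiplications per time instant: 4*T*W/K + 7*T."""
    return 4 * T * W / K + 7 * T

# Numerical check on random data (sizes chosen only for illustration).
rng = np.random.default_rng(0)
segment = rng.standard_normal(4) + 1j * rng.standard_normal(4)
r = rng.standard_normal(50) + 1j * rng.standard_normal(50)
direct = direct_matched_filter(r, segment, K=3)
cascaded = cascaded_matched_filter(r, segment, K=3)
same_output = np.allclose(direct, cascaded)
```

With $K = 8$ and $T = 5$, `real_mults_proposed(W, 8, 5)` reduces to $2.5W + 35$, matching the expression given above.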

VII. CONCLUSION
In this paper, we have proposed a two-stage frame arrival detection method. The training sequence is designed to be sparse in the frequency domain. Matched filtering is employed in the first stage to enhance the SNR, and delayed autocorrelation is adopted in the second stage to detect the presence of a frame. Performance analysis is derived in terms of both missed detection and false alarm probability, and is further extended to systems with CFO. Numerical simulation results verify the accuracy of the theoretical analyses, the superiority of the proposed method over the benchmark S&C method, and the robustness of the proposed method against practical levels of CFO.

APPENDIXES
APPENDIX A PROOF OF LEMMA 1
Define where $r_{1,m}$ and $r_{2,m}$ have been defined in (9). Denote $a_m = S_C S_C h_m$ for short. Then, $r_1$ and $r_2$ can be expressed as where the second order noise term n m has been ignored. Bearing in mind that $n_m^{(1)}$ and $n_m^{(2)}$ have some overlapping noise elements, we can express them as n  (2) m . As the imaginary part is negligible, we can approximate (29) as We further obtain the numerator of (10) as where the higher order noise terms have been ignored.
On the other hand, $R_1(d)$ and $R_2(d)$ in the denominator of (10) can be expressed as We further obtain where the higher order noise terms have been ignored. Given a detection threshold $\eta$, the missed detection probability can be expressed as $P_{\text{miss}} = P\left(\frac{|G(d)|^2}{R_1(d) R_2(d)} < \eta \,\middle|\, \text{frame arrival present}\right)$. Combining both (31) and (32), after some tedious manipulations, the event |G (d) j .
Provided that s is generated from equi-amplitude interleaved pilot tones from frequency domain and $L \le P$,  [3], that is, of (37) can be obtained as Note that due to the special repeated structure of s, the matrix $S_L S_L^H$ approximately forms a sparse Toeplitz matrix with $2W\sigma_s^2$ on the main diagonal, $\frac{K-|i|}{K} 2W\sigma_s^2$ on the $(iP)$th diagonal, $i = \pm 1, \pm 2, \cdots, \pm(K-1)$, and zeros elsewhere. Then, after some tedious manipulations, we obtain the approximation: Likewise, we further obtain Then, according to (39) and (40), given one channel realization, the variance of the Gaussian variable of (37) can be approximately expressed as Hence, given the channel realization, the missed detection probability can be expressed as where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$ denotes the CDF of the standard normal distribution. We consider the typical frequency selective Rayleigh fading channels, where $|h_{i,j}|^2$ obeys an exponential distribution with mean $1/L$, such that the channel gain between the transmitter and each receive antenna is normalized. Then, according to the additivity of the Gamma distribution, we know $Z$ obeys a Gamma distribution, i.e., $Z \sim \Gamma(ML, L)$, where the $\sim$ operator means "is distributed" and $\Gamma$ represents the Gamma distribution. Therefore, following the above discussions, the missed detection probability of the proposed two-stage frame arrival detection method can be expressed as (11). This completes the proof.
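The additivity argument can be checked by simulation. The sketch below assumes that $Z$ is the sum of $|h_{i,j}|^2$ over all $M$ receive antennas and $L$ channel taps (its definition is not restated here) and that $\Gamma(ML, L)$ is parameterized by shape $ML$ and rate $L$, which gives mean $M$ and variance $M/L$. The values $M = 2$ and $L = 4$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, trials = 2, 4, 200_000

# Rayleigh-fading taps: complex Gaussian with E|h|^2 = 1/L, so |h|^2 is exponential with mean 1/L.
h = np.sqrt(1.0 / (2 * L)) * (
    rng.standard_normal((trials, M, L)) + 1j * rng.standard_normal((trials, M, L))
)

# ASSUMED definition: Z sums the tap powers over all antennas and taps.
Z = (np.abs(h) ** 2).sum(axis=(1, 2))

# Reference samples drawn directly from Gamma(shape = M*L, rate = L), i.e. scale = 1/L.
Z_ref = rng.gamma(shape=M * L, scale=1.0 / L, size=trials)

print(f"mean: simulated {Z.mean():.3f}, Gamma {Z_ref.mean():.3f}, theory {M}")
print(f"var:  simulated {Z.var():.3f}, Gamma {Z_ref.var():.3f}, theory {M / L}")
```

Agreement of the first two moments is only a sanity check; it does not by itself establish the full distribution.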

APPENDIX B PROOF OF LEMMA 2
First, it is seen that $V(Z)$ is a cubic function of $Z$ with two zero points: is a quadratic function of $Z$ with two zero points: K ρ(1−η) and a pole point of $\frac{M\eta}{(1-\eta)K\rho}$. Since the pole point is in the range from 0 to $Z_0/(K\rho)$, that is, $0 < \frac{M\eta}{(1-\eta)K\rho} < Z_0/(K\rho)$, it is expected that the magnitude of $\frac{y(Z)}{\sqrt{V(Z)}}$ can be quickly increased to a large value. Moreover, since the CDF function $\Phi(x)$ quickly approaches 1 when $x > 3$, we can make the following approximation: Second, note that with a relatively large $Z$, $\frac{y}{\sqrt{\mathrm{Var}}}$ can be approximated as where $\alpha = K\sqrt{\frac{3W\rho}{2(5K^2+1)}} \approx \sqrt{\frac{3W\rho}{10}}$. Then, when $Z \ge Z_0/(K\rho)$, we make the approximation: where $\Gamma(\cdot)$ denotes the Gamma function. From [29], we know $\int_0^u x^n e^{-\mu x}\,dx = \mu^{-n-1}\gamma(n+1, \mu u)$, $u > 0$, $\mathrm{Re}\,\mu > 0$, $n \ge 0$. Then, the missed detection probability of the proposed method can be finally approximated in closed form as in (14). This completes the proof.
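Two ingredients of this proof are easy to verify numerically: the standard normal CDF is already within about 0.14% of 1 at $x = 3$, and the table integral from [29] matches direct numerical integration. The sketch below uses only the Python standard library. The finite-sum form of the lower incomplete gamma function holds for integer $n \ge 0$, and the test values $n = 3$, $\mu = 2$, $u = 1.5$ are arbitrary.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution, expressed through erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lower_incomplete_gamma(n, x):
    """gamma(n + 1, x) for integer n >= 0, using the finite-sum closed form
    n! * (1 - exp(-x) * sum_{k=0}^{n} x^k / k!)."""
    partial = sum(x ** k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) * (1.0 - math.exp(-x) * partial)

def integral_numeric(n, mu, u, steps=20_000):
    """Midpoint-rule estimate of the integral of x^n * exp(-mu * x) over [0, u]."""
    h = u / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** n * math.exp(-mu * x)
    return total * h

n, mu, u = 3, 2.0, 1.5
closed_form = mu ** (-n - 1) * lower_incomplete_gamma(n, mu * u)
numeric = integral_numeric(n, mu, u)

print(f"Phi(3) = {std_normal_cdf(3.0):.5f}")
print(f"closed form = {closed_form:.6f}, numeric = {numeric:.6f}")
```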

APPENDIX C PROOF OF LEMMA 3
We consider the case in the absence of frame arrival. The two halves of delayed autocorrelation in this situation can be expressed as r 1 = [(S L n 2 ) T , · · · , (S L n 2 ) T , · · · , (S L n where $S_L^H S_L$ has been partitioned into $A \in \mathbb{C}^{W \times (W-1)}$, $B \in \mathbb{C}^{(W-1) \times W}$, $C_1 \in \mathbb{C}^{(W-1) \times (W-1)}$ and $C_2 \in \mathbb{C}^{W \times W}$ in accordance with the dimensions of $w_k^{(1)}$, $w_k^{(3)}$ and $w_k^{(3)}$. Here, $w_k^{(2)H} C_1 w_k^{(2)}$ can be approximated by its expectation as a constant, $w_k^{(2)H} C_1 w_k^{(2)} \approx 2\sigma_n^2 \mathrm{Tr}(C_1) = 0$. According to the central limit theorem, the other three terms can be approximated as complex Gaussian variables that are independent of each other. The false alarm probability of the proposed method can be then expressed as This completes the proof.

$\in \mathbb{C}^{W \times 1}$. That is, $r_{1,m} = e^{j2\pi\phi} E_W(\phi) S_C(\phi) S_C h_m + S_L n_m^{(1)}$ and $r_{2,m} = e^{j2\pi(W+1)\phi} E_W(\phi) S_C(\phi) S_C h_m + S_L n_m^{(2)}$. Denote $a_m(\phi) = S_C(\phi) S_C h_m$ for short. There holds It is worth noting that $Fs \approx e^{-j\pi(W-1)\phi} \frac{\sin(\pi W\phi)}{W\sin(\pi\phi)} Fs$. (48) The approximation in (48) is due to the following facts: 1) The inter-carrier interference (ICI) matrix (φ) is approximately a banded matrix, with its most significant elements around the main diagonal and the remaining elements close to zero.
Then, following steps similar to those in Appendix A, the missed detection probability in the presence of CFO can be given by where $y(Z)$ and $V(Z)$ are exactly the same as in (12) and (13), respectively. We can then directly rewrite (49) into (27) by variable substitution. This completes the proof.