ELM-based Frame Synchronization in Nonlinear Distortion Scenario Using Superimposed Training

The demand for high spectrum efficiency places stricter requirements on frame synchronization (FS) in wireless communication systems. Meanwhile, the large number of nonlinear devices or blocks inevitably causes nonlinear distortion. To avoid occupying bandwidth resources and to overcome nonlinear distortion, an extreme learning machine (ELM)-based network is introduced into superimposed training-based FS with nonlinear distortion. First, a preprocessing procedure is used to coarsely capture the features of the synchronization metric (SM). Then, based on these rough features, an ELM network is constructed to estimate the offset of the frame boundary. Analysis and experimental results show that, compared with existing methods, the proposed method reduces the error probability of FS and the bit error rate (BER) of symbol detection (SD). In addition, this improvement is robust against parameter variations.


I. INTRODUCTION
Due to the limited bandwidth resources, wireless communication systems have pursued high spectrum efficiency over the past few decades [1]. For example, the spectrum efficiency of the fifth generation (5G) wireless communication system is many times higher than that of the fourth generation (4G) system [2], [3]. In wireless communication systems, frame synchronization (FS) is a fundamental and essential task for guaranteeing overall system performance [4], and it usually consumes substantial bandwidth resources to overcome the synchronization challenge [5]. Thus, during the FS phase, the conflict between high bandwidth consumption and the requirement of high spectrum efficiency needs to be resolved. Meanwhile, a wireless communication system contains a large number of nonlinear devices or blocks, e.g., the high power amplifier (HPA) and the digital-to-analog converter (DAC), which inevitably cause nonlinear distortion [6], [7]. Because they give limited consideration to nonlinear distortion, the classical methods (e.g., correlation-based FS [8]) and recent solutions (e.g., compressed sensing-based FS in [9]) are usually difficult to apply in nonlinear distortion scenarios [10]. Therefore, FS faces challenges from both spectrum efficiency and nonlinear distortion.
To cope with nonlinear distortion, machine learning (ML), and in particular deep learning (DL), has shown prominent ability [11], [12]. In recent years, DL has been applied in wireless communication, e.g., signal detection [13], precoding [14], channel state information (CSI) feedback [15], channel estimation [16], [17], and mobile Internet of Things (IoT) [18]. However, these DL-based approaches suffer from weaknesses such as long training time and complex parameter tuning [10], [19]. In contrast to DL-based schemes, the extreme learning machine (ELM), a single-hidden-layer feed-forward neural network, learns quickly, generates its input weights and hidden biases randomly, requires no gradient back-propagation, and has good generalization performance [10], [20], [21]. Using ELM to deal with nonlinear distortion is therefore a promising option.
To save bandwidth resources and thus improve spectrum efficiency, FS using a superimposed training sequence is an attractive scheme. Without occupying any extra bandwidth, this scheme superimposes the training sequence on the data symbols, so more symbols are transmitted than in the non-superimposed mode [22] within the same transmission interval. Superimposed training-based FS has been investigated in past years, e.g., [22]-[25]. These promising schemes motivate further exploration, especially for scenarios with nonlinear distortion.
Inspired by the advantages of ELM networks and superimposed training, we investigate an ELM-based FS using superimposed training, which addresses the challenges of spectrum efficiency and nonlinear distortion during the FS phase. In our work, the ELM network provides the ability to cope with nonlinear distortion, and superimposed training provides high spectrum efficiency. Combining the two further improves the FS performance in nonlinear distortion scenarios, e.g., the error probability of FS. To the best of our knowledge, few works on ELM-based FS consider nonlinear distortion, and fewer still consider superimposed training.

A. RELATED WORKS
We respectively present the related works of DL-based FS and ELM-based FS as follows.
The DL-based FS has been investigated in [26]-[29]. In [26], an artificial neural network (ANN)-based synchronization method was proposed. For end-to-end communication systems, [27] and [28] investigated FS based on neural networks (NNs). In [27], a deep neural network (DNN) was employed in an auto-encoder to accomplish FS, and in [28] a convolutional neural network (CNN) was developed to compensate for impairments introduced by timing offset and sampling timing error. In [29], a CNN-based FS method was proposed that converts the one-dimensional (1D) correlator output into a two-dimensional (2D) matrix to find the frame offset. As [26]-[29] show, DL provides effective approaches for FS. Nevertheless, these DL-based FS methods are still challenged by issues such as long training time, complex parameter tuning, and large memory requirements.
Relative to DL networks, the ELM network has many advantages [10], [20], [21]: gradient back-propagation is avoided, the output weights are obtained by solving a least squares (LS) problem, and the feed-forward network uses only a single hidden layer. In [10], an ELM-based time-division FS was proposed with consideration of nonlinear distortion. In [30], the ELM network was employed to compensate for the residual time offset in time-division mode. Although the error probability of FS in these methods is lower than that of the conventional method, the training sequence for FS still occupies bandwidth resources, which reduces the system's spectrum efficiency. To avoid occupying bandwidth resources, an ELM-based FS method using superimposed training is proposed in this paper to reduce the error probability of FS.

B. CONTRIBUTIONS
To overcome the challenges of spectrum efficiency and nonlinear distortion during the FS phase, the ELM-based FS using superimposed training is investigated in this paper. The main contributions of this paper are summarized as follows.
• First, an ELM-based FS method using superimposed training is proposed. In contrast to the ELM-based time-division FS scheme in [9], the proposed FS method avoids occupying bandwidth resources and also achieves a smaller error probability of FS at the same energy cost.
• Second, superimposed training-based FS is investigated in nonlinear distortion scenarios. Our investigation remedies the deficiency of existing superimposed training-based FS, which cannot work well in nonlinear distortion scenarios, and is suitable for practical application.
• Third, extensive experiments are given to verify the effectiveness of the proposed method. Compared with the classical correlation method in [8] and the time-division method in [10], both the FS's error probability and the symbol detection (SD)'s bit error rate (BER) are reduced with the same energy consumption. In addition, the proposed FS is robust against the impact of parameter variations.
The remainder of this paper is structured as follows. Section II describes the system model. Section III presents the ELM-based FS method using superimposed training, followed by the experimental results and analysis in Section IV. Finally, Section V concludes our work.
Notations: Boldface upper-case and lower-case letters denote matrices and vectors, respectively. $(\cdot)^{T}$, $(\cdot)^{H}$, and $(\cdot)^{\dagger}$ denote the transpose, conjugate transpose, and matrix pseudoinverse, respectively. $\|\cdot\|_{2}$ is the Frobenius norm. $|x|$ denotes the absolute value of $x$, and $|\mathbf{x}|$ denotes the entry-wise absolute value of the vector $\mathbf{x}$.

II. SYSTEM MODEL
At the receiver, the received M × 1 complex-valued signal vector, denoted as $\mathbf{y}$, can be expressed as
$$\mathbf{y} = \mathbf{H}\mathbf{x}_{\mathrm{ext}} + \mathbf{n}, \tag{1}$$
where $\mathbf{n} \in \mathbb{C}^{M\times 1}$ represents the complex additive white Gaussian noise (AWGN) vector whose entries have zero mean and variance $\sigma^{2}$.
$\mathbf{x}_{\mathrm{ext}} \in \mathbb{C}^{(2N-L+1)\times 1}$ denotes the extended vector of the transmitted signal with nonlinear distortion and is given by (2), where N and L are the size of the search window and the number of multi-paths, respectively. The unknown frame boundary offset is denoted as τ, whose range is 0 ≤ τ ≤ N − L + 1. In (2), $\tilde{\mathbf{x}} = [\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_N]^{T}$ denotes the distorted transmitted signal and can be expressed as
$$\tilde{\mathbf{x}} = f_{\mathrm{dis}}(\mathbf{x}), \tag{3}$$
where $f_{\mathrm{dis}}(\cdot)$ represents the influence of nonlinear distortion.
$\mathbf{x} = [x_1, x_2, \cdots, x_N]^{T}$ is the superimposed transmitting signal without nonlinear distortion and is given by (4), where ρ ∈ [0, 1] represents the power proportional coefficient (PPC) and E denotes the transmitted power of the superimposed transmitting signal. $\mathbf{s} \in \mathbb{C}^{N\times 1}$ is the training sequence and $\mathbf{c} \in \mathbb{C}^{N\times 1}$ is the modulated data symbol vector. The complex matrix $\mathbf{H} \in \mathbb{C}^{M\times(2N-L+1)}$ in (1) is an M × (2N − L + 1) cyclic matrix defined in (5), where $\mathbf{h} = [h_1, h_2, \cdots, h_L]^{T}$ denotes the finite channel impulse response (CIR) vector with a memory of L samples, and $h_l$ represents the complex-valued CIR of the l-th path, l = 1, 2, · · · , L. With the received signal $\mathbf{y}$ given in (1), we employ an ELM network to implement FS using superimposed training while reducing the influence of nonlinear distortion.
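The signal model of this section can be illustrated with a minimal NumPy sketch. Several details are assumptions made only for illustration, since they are not spelled out in the surrounding text: the power split `sqrt(rho*E)*s + sqrt((1-rho)*E)*c` as the form of (4), zero padding around the frame as the form of (2), and a full linear convolution with the CIR as the action of H (its output length, (2N − L + 1) + L − 1 = 2N, agrees with M = 2N used later in the simulations). The function names are hypothetical.

```python
import numpy as np

def superimpose(s, c, rho, E):
    """Superimpose training s on data c (assumed split: rho*E to training)."""
    return np.sqrt(rho * E) * s + np.sqrt((1.0 - rho) * E) * c

def received_signal(x_dist, h, tau, sigma2, rng):
    """Return y = H x_ext + n for a distorted frame x_dist and offset tau."""
    N, L = len(x_dist), len(h)
    # Assumed extension: the frame starts tau samples into a zero vector
    # of length 2N - L + 1 (valid for 0 <= tau <= N - L + 1).
    x_ext = np.zeros(2 * N - L + 1, dtype=complex)
    x_ext[tau:tau + N] = x_dist
    # Full linear convolution with the CIR; output length is 2N.
    y = np.convolve(h, x_ext)
    # Complex AWGN with variance sigma2 per entry.
    noise = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(y.size) + 1j * rng.standard_normal(y.size)
    )
    return y + noise
```

The nonlinear distortion f_dis(·) is applied to the superimposed signal before calling `received_signal`; it is left outside this sketch on purpose.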

III. ELM-BASED FS
The performance of FS is seriously degraded by the influence of nonlinear distortion. To conquer this difficulty, the ELM network is introduced into FS using superimposed training due to its prominent ability to cope with nonlinear distortion [10]. In the following subsections, we first present the preprocessing of FS in Section III-A. Then, the ELM-based FS method is given in Section III-B.

A. PRE-PROCESSING FOR FS
In wireless systems, accurate FS is usually difficult to obtain, especially in nonlinear distortion scenarios. In particular, the error probability of DL-based timing synchronization is far higher than that of matched filtering in [26]. Similar behavior is also observed in ELM-based FS experiments. Thus, a pre-processing step is employed to coarsely capture the features of the SM. According to [8], using the cross-correlation based method, the SM vector $\mathbf{g} \in \mathbb{R}^{N\times 1}$ can be expressed as
$$\mathbf{g} = \left|\mathbf{S}^{H}\mathbf{y}\right|^{2}. \tag{6}$$
Here, the M × N complex matrix $\mathbf{S}$ is given by (7), where $s_i$, i = 1, 2, · · · , N, represents the i-th entry of the training sequence $\mathbf{s}$. It should be mentioned that, besides the cross-correlation based SM in (6), other SMs can also be applied in our method with similar processing. To standardize the training of the ELM network, the $\mathbf{g}$ given in (6) is normalized according to (8). With the normalized SM, an ELM network is utilized to overcome nonlinear distortion and improve SMs for superimposed training-based FS, which is described in the following subsection.
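A minimal sketch of this pre-processing step is given below. It assumes that column k of S holds the training sequence delayed by k samples (one plausible reading of (7)) and that the normalization in (8) divides by the Euclidean norm of g; both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def shift_matrix(s, M):
    """Build an M x N matrix whose k-th column is s delayed by k samples."""
    N = len(s)
    assert M >= 2 * N - 1, "every shifted copy of s must fit in M rows"
    S = np.zeros((M, N), dtype=complex)
    for k in range(N):
        S[k:k + N, k] = s
    return S

def sync_metric(y, S):
    """Cross-correlation SM g = |S^H y|^2, scaled to unit Euclidean norm."""
    g = np.abs(S.conj().T @ y) ** 2
    return g / np.linalg.norm(g)
```

With a noise-free, distortion-free frame placed at offset τ, the largest entry of the metric sits at index τ, which is the feature the ELM network refines when distortion and interference blur the peak.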

B. ELM-BASED FS
The ELM-based network is employed to improve SMs and decrease the influence of nonlinear distortion. The ELM-based FS includes offline and online procedures, which are elaborated in TABLE 1 and TABLE 2, respectively. The offline training procedure is described as follows.

1) OFFLINE TRAINING SPECIFICATION
The offline procedure is elaborated in TABLE 1. In the following, we first describe the data collection for ELM-net training.
• Data Collection for ELM-Net Training: For ELM-net training, $N_t$ samples of input signals and offset labels, denoted by $\{(\mathbf{g}_i, \mathbf{T}_i)\}$, i = 1, 2, · · · , $N_t$, are collected to form a training set. The $N_t$ collected $\mathbf{g}_i$ form the input matrix $\mathbf{g} \in \mathbb{R}^{N\times N_t}$ given in (9). Similarly, the $N_t$ offset label vectors $\mathbf{T}_i$ are used to construct the target output matrix $\mathbf{T} \in \mathbb{R}^{N\times N_t}$ given in (10), where each label $\mathbf{T}_i$ is encoded in one-hot mode according to (11) and $\tau_i$ is the i-th sample's frame boundary offset. In this paper, the input weights $\mathbf{W} \in \mathbb{R}^{\tilde{N}\times N}$ and hidden layer biases $\mathbf{b} \in \mathbb{R}^{\tilde{N}\times 1}$ of the ELM network (with $\tilde{N}$ hidden neurons) are randomly chosen, following the standard ELM procedure [21]. Although the initial values of $\mathbf{W}$ and $\mathbf{b}$ have a certain impact on the FS performance, this paper mainly investigates the FS method itself; admittedly, well-chosen initial values could further improve the FS's error probability and SD's BER. Then, the input weights $\mathbf{W}$ and hidden layer biases $\mathbf{b}$ are saved in storage for later use in the offline and online procedures.
• Network Training: As shown in TABLE 1, during the training procedure, the training data set $\{(\mathbf{g}, \mathbf{T})\}$, the input weights $\mathbf{W}$, and the hidden layer biases $\mathbf{b}$ are loaded from storage. Then, with $\{(\mathbf{g}, \mathbf{T})\}$, $\mathbf{W}$, and $\mathbf{b}$, the output matrix of the hidden layer $\mathbf{H} \in \mathbb{R}^{\tilde{N}\times N_t}$ is given by (12), where σ(·) represents the activation function, e.g., sigmoid [31], hyperbolic tangent [32], or rectified linear units (ReLU) [33]. The sigmoid is used in this ELM network [21]. The objective of the ELM network is to find suitable output weights $\boldsymbol{\Upsilon} \in \mathbb{R}^{N\times\tilde{N}}$ that approximate the target output matrix $\mathbf{T}$, which can be expressed as [20]
$$\mathbf{T} = \boldsymbol{\Upsilon}\mathbf{H}, \tag{13}$$
where $\boldsymbol{\Upsilon} = [\boldsymbol{\Upsilon}_1, \cdots, \boldsymbol{\Upsilon}_k, \cdots, \boldsymbol{\Upsilon}_{\tilde{N}}]$, and $\boldsymbol{\Upsilon}_k$ denotes the output weight vector connecting the k-th hidden neuron and the output neurons. The LS solution of (13) with the minimum norm of output weights is given by [20]
$$\boldsymbol{\Upsilon} = \mathbf{T}\mathbf{H}^{\dagger}. \tag{14}$$
The output weight matrix $\boldsymbol{\Upsilon}$ learned from offline training of the ELM network is saved in storage for online running.
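The two offline steps above (data collection and network training) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the uniform range used for the random weights and biases, the 0-based position of the one-hot entry, and all function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot_labels(taus, N):
    """Target matrix T (N x N_t): column i has a single 1 at row tau_i."""
    taus = np.asarray(taus, dtype=int)
    T = np.zeros((N, taus.size))
    T[taus, np.arange(taus.size)] = 1.0
    return T

def train_elm(G, T, n_hidden, rng):
    """Random hidden layer plus least-squares output weights.

    G is the N x N_t matrix of normalized metric vectors (one per column).
    """
    N = G.shape[0]
    # Assumed initialization: uniform in [-1, 1]; the text only says "random".
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, N))
    b = rng.uniform(-1.0, 1.0, size=(n_hidden, 1))
    H = sigmoid(W @ G + b)            # hidden outputs, n_hidden x N_t
    Upsilon = T @ np.linalg.pinv(H)   # minimum-norm LS solution of T = Upsilon H
    return W, b, Upsilon
```

No gradient back-propagation appears anywhere: the only learned quantity is `Upsilon`, obtained in one step from the pseudoinverse, which is what keeps ELM training short compared with DL networks.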

2) ONLINE DEPLOYMENT
With the learned ELM network parameters, the online running procedure, shown in TABLE 2, is carried out as follows.
With the input metric vector $\mathbf{q}$, which can be obtained by the preprocessing procedure in (6)-(8), the learned output weights $\boldsymbol{\Upsilon}$, the random input weights $\mathbf{W}$, and the hidden layer biases $\mathbf{b}$, the ELM network output $\mathbf{O} \in \mathbb{R}^{N\times 1}$ is given by (15), where $\mathbf{O} = [o_1, o_2, \cdots, o_N]^{T}$. The estimate of the frame boundary offset is then obtained from (16) as the location of the entry of $\mathbf{O}$ with the maximum squared absolute value. With the estimate from (16), FS is completed by acquiring the frame's starting point τ. After FS, SD is performed, and the detected symbol $\mathbf{c}$ is given by (17), where $\mathbf{x}_{\mathrm{est}}$ denotes the estimate of the superimposed transmitting signal in the nonlinear distortion scenario and is obtained by (18). FS and SD can thus be achieved according to (6)-(18), which improves the SMs and addresses multi-path interference and nonlinear distortion.

TABLE 1. Offline training procedure of the ELM-based FS.
Training set {(g_i, T_i), i = 1, 2, · · · , N_t} generated by SMs and one-hot labels; these two correspond to the input and desired output of the ELM, respectively.
step 1 : Randomly choose the input weights $\mathbf{W} \in \mathbb{R}^{\tilde{N}\times N}$ and hidden layer biases $\mathbf{b} \in \mathbb{R}^{\tilde{N}\times 1}$, and set the hidden neuron number $\tilde{N}$.
step 2 : With $\mathbf{g}$, $\mathbf{W}$, and $\mathbf{b}$, calculate the output matrix of the hidden layer $\mathbf{H} \in \mathbb{R}^{\tilde{N}\times N_t}$ through the activation function σ(·) according to (12).
step 3 : Construct the target output matrix $\mathbf{T}$ using the offset label vectors $\mathbf{T}_i$ according to (10), and compute the output weights $\boldsymbol{\Upsilon} \in \mathbb{R}^{N\times\tilde{N}}$ according to (14).

TABLE 2. Online running procedure of the ELM-based FS.
step 1 : According to (6) to (8), perform the preprocessing to obtain the metric vector $\mathbf{q}$ as the input of the trained ELM-net.
step 2 : With the input metric vector $\mathbf{q}$, the learned output weights $\boldsymbol{\Upsilon}$, the randomly chosen input weights $\mathbf{W}$, and the hidden layer biases $\mathbf{b}$, obtain the ELM network output $\mathbf{O}$ according to (15).
step 3 : Find the location of the maximum squared absolute value among the elements of $\mathbf{O}$ (i.e., the estimate of the frame boundary offset τ) according to (16).
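The online procedure can be sketched as a single forward pass followed by a peak search. The sketch assumes a sigmoid hidden layer, as stated for the offline phase, and returns a 0-based index as the offset estimate; the function name and the index convention are assumptions for illustration.

```python
import numpy as np

def elm_estimate_offset(q, W, b, Upsilon):
    """Forward pass of the trained ELM, then pick the strongest output.

    q: normalized metric vector (length N)
    W, b: random input weights (n_hidden x N) and biases (n_hidden x 1)
    Upsilon: learned output weights (N x n_hidden)
    """
    hidden = 1.0 / (1.0 + np.exp(-(W @ q.reshape(-1, 1) + b)))
    O = (Upsilon @ hidden).ravel()
    # Location of the maximum squared absolute value among the outputs.
    return int(np.argmax(np.abs(O) ** 2))
```

Because W and b are fixed after the offline phase, the online cost is two matrix-vector products and one search over N outputs.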

IV. EXPERIMENTAL ANALYSIS
In this section, numerical results of the proposed ELM-based FS using superimposed training are given. First, the basic parameters and definitions involved in the simulations are given in Section IV-A. Then, in Section IV-B, the FS's error probability and SD's BER of the proposed scheme with nonlinear distortion are shown to verify the effectiveness of the proposed ELM-based FS. The robustness of the improvement against different parameters is discussed in Section IV-C. Finally, the computational time complexity is analyzed in Section IV-D.

A. PARAMETER SETTING
In the simulations, the basic parameters are set as N = 512, M = 2N = 1024, $\tilde{N}$ = 10N = 5120 [20], [34], L = 8, and $N_t = 10^{5}$. The Zadoff-Chu sequence [35] is employed as the training sequence $\mathbf{s}$. For the time-division method in [10], $N_s$ = 16 is considered as the length of the training sequence. With reference to [22] and considering the overall performance of FS's error probability and SD's BER, ρ = 0.3 is adopted in this paper. The modulated data symbol $\mathbf{c}$ is formed from quadrature phase-shift keying (QPSK) symbols. For the channel model, a multi-path Rayleigh fading channel with an exponentially decaying power coefficient η = 0.2 is considered, where, except for the first path, each of the following L − 1 paths is set to zero with a probability of 0.5, to keep the same setting as [9] and [10]. For a fair comparison, we assume that the superimposed FS and the time-division FS consume the same energy for transmitting symbols. The definitions involved are listed as follows. The signal-to-noise ratio (SNR) in decibels (dB) is defined as [36]
$$\mathrm{SNR} = 10\log_{10}\left(\frac{E}{\sigma^{2}}\right).$$
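The following sketch shows one way such a simulation setup could be generated. The Zadoff-Chu root index `u`, the power profile `exp(-eta * l)` with unit total power, and the function names are assumptions for illustration; the text specifies only the sequence family, η = 0.2, and the zeroing probability 0.5.

```python
import numpy as np

def zadoff_chu(N, u=1):
    """Zadoff-Chu sequence of length N with root u (u coprime with N)."""
    n = np.arange(N)
    cf = N % 2  # 0 for even N, 1 for odd N
    return np.exp(-1j * np.pi * u * n * (n + cf) / N)

def noise_variance(E, snr_db):
    """Invert SNR = 10*log10(E / sigma^2) to get sigma^2."""
    return E / (10.0 ** (snr_db / 10.0))

def sparse_rayleigh_channel(L, eta, rng):
    """L-tap Rayleigh channel with exponentially decaying tap powers.

    Every tap after the first is set to zero with probability 0.5.
    """
    power = np.exp(-eta * np.arange(L))
    power /= power.sum()  # assumed normalization to unit total power
    h = np.sqrt(power / 2.0) * (
        rng.standard_normal(L) + 1j * rng.standard_normal(L)
    )
    keep = rng.random(L) >= 0.5
    keep[0] = True  # the first path is always present
    return h * keep
```

For example, `zadoff_chu(512)` gives a constant-modulus training sequence for N = 512, and `noise_variance(E, snr_db)` sets the AWGN level for each SNR point.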
For nonlinear distortion, the HPA effect is taken into account in these simulations. The nonlinear amplitude A(x) and phase Φ(x) are obtained according to [37], with $\alpha_a$ = 1.96, $\beta_a$ = 0.99, $\alpha_\phi$ = 2.53, and $\beta_\phi$ = 2.82 used in the experiments. To measure the distortion intensity, the error vector magnitude (EVM) [38] is used in this paper. It is computed from $\tilde{x}_n$, the n-th distorted symbol through the HPA when the HPA works in the saturated region, and $R_n$, the desired linear output of the HPA for the same input without amplification distortion. In this paper, the EVM is set as EVM = 35% except in the robustness analysis against EVMs. For simplicity, "Prop", "TD_Corr", "TD_ELM", and "Sup_Corr" are used to denote the proposed ELM-based superimposed FS, the correlation-based time-division FS in [8], the ELM-based time-division FS in [10], and the correlation-based superimposed FS method in [22], respectively.
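The parameter names and values above match the widely used Saleh amplifier model, so the sketch below assumes that this is the model behind [37]: A(r) = α_a r / (1 + β_a r²) and Φ(r) = α_φ r² / (1 + β_φ r²), with r the input magnitude. The EVM is written in its common root-mean-square form; both formulas and the function names are assumptions for illustration, since the exact expressions are not reproduced in the text.

```python
import numpy as np

def saleh_hpa(x, a_a=1.96, b_a=0.99, a_p=2.53, b_p=2.82):
    """Apply AM/AM and AM/PM distortion (assumed Saleh form) to samples x."""
    r = np.abs(x)
    amp = a_a * r / (1.0 + b_a * r ** 2)         # AM/AM conversion
    phase = a_p * r ** 2 / (1.0 + b_p * r ** 2)  # AM/PM conversion, radians
    return amp * np.exp(1j * (np.angle(x) + phase))

def evm_percent(x_dist, ref):
    """RMS error vector magnitude, in percent, of x_dist relative to ref."""
    err = np.sum(np.abs(x_dist - ref) ** 2)
    return 100.0 * np.sqrt(err / np.sum(np.abs(ref) ** 2))
```

In such a setup, a target EVM (for example 35%) would typically be reached by adjusting the input drive level of the amplifier; how the reference `ref` is scaled is left to the caller here.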

B. FS AND SD PERFORMANCE
To validate the effectiveness of the proposed ELM-based FS using superimposed training, the error probability of FS and the BER of SD under different SNRs are illustrated in Fig. 1 and Fig. 2, respectively. The error probability of FS is presented in Fig. 1. It can be observed that "Prop" reaches the smallest error probability among the compared methods. This shows that "Prop" obtains the best FS error probability and thus works well in nonlinear distortion scenarios. In addition, Fig. 1 offers some insights into FS with nonlinear distortion. First, the error probability of "Sup_Corr" is smaller than that of "TD_Corr" with the same energy consumption, which shows the superiority of superimposed FS over time-division FS. Second, for relatively high SNR (e.g., SNR > 6 dB), "TD_ELM" has a smaller error probability of FS than "Sup_Corr" and "TD_Corr". That is, the ELM network effectively suppresses the nonlinear distortion for the time-division FS in [8], and the ELM-based time-division FS can even achieve a lower error probability than the superimposed FS in [22]. Although the ELM-based time-division FS is effective in suppressing nonlinear distortion, it is effective only in a relatively high SNR region, and it does not exploit the merits of superimposed FS. "Prop" exploits the merits of both superimposed training and the ELM network, and thus obtains the smallest error probability of FS over the whole given SNR region. Therefore, the combination of superimposed training and the ELM network in "Prop" is effective in reducing the error probability of FS in nonlinear distortion scenarios.
Since the training sequence s is superimposed on the modulated data symbol c, it needs to be verified whether the superimposed interference (from the superimposed training) degrades the detection performance of the data symbols. In this paper, the BER of SD is used to measure the detection performance and is plotted in Fig. 2. From Fig. 2, "Prop" achieves the smallest BER for almost all given SNRs. Thus, for the same energy consumption, the superimposed interference does not degrade the BER performance. On the contrary, the BER of "Prop" benefits from the proposed FS, especially in the relatively high SNR region (e.g., SNR > 10 dB). That is, with the simple interference cancellation given in (17), "Prop" achieves the best BER performance among all given FS methods. In particular, the advantages of superimposed training and the ELM network can be seen separately in Fig. 2. Without superimposition, "TD_ELM" obtains a smaller BER than "TD_Corr", which reflects the effectiveness of the ELM network in dealing with nonlinear distortion. We can also observe that the BER of "Sup_Corr" is smaller than that of "TD_Corr"; that is, the superimposed training used in "Sup_Corr" improves the BER performance relative to "TD_Corr". Thus, by combining superimposed training and the ELM network, the BER performance is improved.
Overall, compared with "TD_Corr", "TD_ELM", and "Sup_Corr", "Prop" improves both the FS's error probability and the SD's BER in nonlinear distortion scenarios. In particular, compared with "TD_Corr" and "TD_ELM", "Prop" can transmit more data symbols and thus further improves the spectrum efficiency. In addition, under different modulation conditions (e.g., BPSK and 16QAM), the proposed method still improves the FS's error probability and SD's BER, consistent with Fig. 1 and Fig. 2.

C. ANALYSIS OF PARAMETER IMPACT
In this subsection, the robustness of the proposed scheme against parameter variation is analysed. The impact of EVM is discussed first, followed by the number of multi-paths (i.e., L), the transmitted frame length N, and the PPC ρ. It is worth noting that, apart from the parameter under study (i.e., EVM, L, N, or ρ), the other basic parameters remain the same as those given in Section IV-A during the simulations.

1) IMPACT OF EVM
EVM is usually used to measure the distortion intensity. To analyze the robustness of the proposed method against different distortion intensities, Fig. 3 plots the curves of FS's error probability and SD's BER with different EVMs (i.e., EVM = 35%, EVM = 40%, EVM = 45%, and EVM = 50%). From Fig. 3, compared with "TD_Corr", "TD_ELM", and "Sup_Corr", the "Prop" method achieves the smallest error probability for each given EVM. That is, relative to the existing methods, the proposed FS scheme still improves the FS's error probability under varying EVMs. As the EVM increases, the FS's error probabilities of all curves in Fig. 3 (i.e., "TD_Corr", "TD_ELM", "Sup_Corr", and "Prop") increase due to the rise of distortion intensity. However, the FS's error probability of "Prop" remains smaller than those of "TD_Corr", "TD_ELM", and "Sup_Corr", especially in the high SNR region. This confirms that the proposed scheme can improve the FS's error probability under varying EVMs.
For SD, Fig. 3 shows that the BER of "Prop" is smaller than those of "TD_Corr" and "Sup_Corr" in almost all given SNR regions. In particular, in a relatively high SNR region, e.g., SNR ≥ 12 dB, the BER of "Prop" is smaller than those of "TD_Corr", "Sup_Corr", and "TD_ELM". Thus, for different EVMs, "Prop" achieves similar or better BER performance.
As a result, "Prop" is robust against the impact of EVM in improving the FS's error probability and SD's BER.

2) IMPACT OF L
The FS's error probability and SD's BER are usually affected by the number of multi-paths (i.e., L). To demonstrate the robustness of the proposed FS scheme against the impact of L, the error probability of FS and the BER of SD are given in Fig. 4, where L = 4, L = 6, L = 8, and L = 10 are considered. From Fig. 4, relative to "TD_Corr", "TD_ELM", and "Sup_Corr", "Prop" achieves the minimum error probability of FS for each given L. This shows that "Prop" improves on the FS's error probability of the existing methods as L varies. In addition, as L increases, the FS's error probabilities of "Prop", "TD_Corr", "TD_ELM", and "Sup_Corr" rise because the multi-path interference grows. Even so, "Prop" still copes with the nonlinear distortion and multi-path interference under different values of L, and thus obtains the smallest FS's error probability. Overall, against the impact of L, "Prop" robustly reduces the FS's error probability.
From the curves of SD's BER in Fig. 4, "Prop" achieves the smallest BER for relatively large L (e.g., L ≥ 8). This shows that "Prop" has better BER performance than the existing methods when L ≥ 8. For the case where 4 dB ≤ SNR ≤ 12 dB and L = 4, the BER of "Prop" is slightly higher than that of "TD_ELM". This is because "TD_ELM" consumes additional bandwidth resources and thereby avoids the superimposed interference that affects "Prop". Moreover, "TD_ELM" also employs the ELM network and thus effectively suppresses the nonlinear distortion as well. Nevertheless, compared with "TD_ELM" for the case where 4 dB ≤ SNR ≤ 8 dB, "Prop" has only a slightly higher BER while transmitting more data symbols to obtain higher spectrum efficiency. Besides, the BER of "Prop" is still smaller than that of "TD_ELM" when SNR ≥ 12 dB.
To sum up, against the impact of L, "Prop" effectively reduces the FS's error probability and SD's BER, especially for relatively large L and relatively high SNR.

3) IMPACT OF N
Usually, the FS's error probability and SD's BER performance are influenced by the frame-length, i.e., N . To validate the robustness against the impact of N , the error probability of FS and BER of SD are illustrated in Fig. 5 with different values of N (i.e., N = 256, N = 512, N = 768, and N = 1024).
From Fig. 5, for the cases where N = 512, N = 768, and N = 1024, "Prop" obtains a smaller FS's error probability than "TD_Corr", "TD_ELM", and "Sup_Corr". This shows that "Prop" can reduce the error probability of FS as N varies. When N = 256 and SNR ≤ 8 dB, "Prop" still obtains the minimum error probability of FS, which reflects its robustness against the impact of N. However, for SNR ≥ 10 dB, this no longer holds. As can be seen in Fig. 5(a), the FS's error probability of "Prop" is then higher than that of "TD_ELM". This is because "TD_ELM" can also suppress the nonlinear distortion, as "Prop" does, while the relatively short N makes it difficult for "Prop" to combat the superimposed interference from the data symbol c. Even so, compared with "TD_ELM", "Prop" saves bandwidth resources and significantly reduces the FS's error probability in a relatively low SNR region (e.g., SNR ≤ 8 dB). In particular, relative to "TD_Corr" and "Sup_Corr", "Prop" clearly reduces the error probability of FS for all given SNRs. In addition, as N increases, the FS's error probability of "Prop" decreases. This shows that a longer N effectively suppresses the superimposed interference of "Prop", owing to the longer superimposed training sequence s.
Compared with "TD_ELM", "TD_Corr", and "Sup_Corr", "Prop" has a similar or smaller BER, which demonstrates that "Prop" obtains similar or better SD's BER performance for different values of N. Relative to "TD_Corr" and "Sup_Corr", "Prop" reduces the BER for each given N, especially in the relatively high SNR region (e.g., SNR ≥ 10 dB). Meanwhile, for the case where N = 256 and 0 dB ≤ SNR ≤ 10 dB, "Prop" obtains BER performance similar to that of "TD_ELM". When SNR ≥ 10 dB, the BER of "Prop" is slightly higher than that of "TD_ELM". The reasons are as follows. On the one hand, the nonlinear distortion is also suppressed by the ELM network in "TD_ELM". On the other hand, "TD_ELM" consumes additional bandwidth to avoid superimposed interference. Moreover, in this case the FS's error probability of "TD_ELM" is smaller than that of "Prop", and the larger FS error of "Prop" also degrades its SD's BER performance. Even so, "Prop" saves bandwidth resources and obtains BER performance similar to that of "TD_ELM" when N = 256. In particular, for relatively large N (e.g., N ≥ 512), "Prop" achieves a lower BER than "TD_ELM" when SNR ≥ 12 dB.
On the whole, as N varies, "Prop" can improve the performance of FS's error probability and SD's BER, and

D. COMPLEXITY ANALYSIS
The training time and online running time of the ELM-based time-division FS method in [10] (i.e., "TD_ELM") and the proposed ELM-based superimposed FS (i.e., "Prop") are listed in TABLE 3 to compare the computational time complexity.
For a fair comparison, $10^{5}$ experiments are conducted for each of "Prop" and "TD_ELM" on the same server with an Intel Xeon(R) E5-2620 CPU (2.1 GHz × 16) using Matlab. During the experiments, only the running time is considered (i.e., the time to generate the data sets and pass them through the channel is not included). From TABLE 3, in the training phase, "TD_ELM" consumes about 19.6 minutes, while "Prop" costs about 20.2 minutes. In the online running stage, "TD_ELM" consumes about 11.6 minutes, and "Prop" costs about 11.7 minutes. Thus, the average training and online running times of "Prop" per experiment are slightly longer than those of "TD_ELM", while "Prop" improves the spectrum efficiency with similar or better FS's error probability and SD's BER relative to the existing methods.

V. CONCLUSION
In this work, we integrated superimposed training-based FS with an ELM network to obtain an ELM-based FS scheme using superimposed training for nonlinear distortion scenarios. First, a preprocessing procedure is employed to coarsely capture the features of the SM. Then, an ELM network is introduced to overcome the impact of nonlinear distortion and to estimate the frame boundary offset. Compared with existing methods, the proposed method reduces the error probability of FS and the BER of SD, and these improvements are robust against parameter variation. In future work, we will investigate generalization methods for ELM-based FS to reduce the mismatch between simulated data sets and data sets from real scenarios.