Permutation Polynomial Interleaved DFT-s-OFDM

We propose a frequency-domain interleaver for discrete Fourier transform spread orthogonal frequency division multiplexing (DFT-s-OFDM) based on a linear or quadratic permutation polynomial (LPP/QPP). Interleaving the Fourier coefficients (i.e., the DFT precoder output) implies that the modulation symbols are transmitted over both the time and frequency domains, which is beneficial over time-frequency selective channels. Although the single-carrier property is lost due to the interleaving, the peak-to-average-power ratio (PAPR) can be improved. The results show that a QPP can suppress the error floor of the bit/block error rate (BER/BLER), which occurs on channels with large Doppler spread, and simultaneously reduce the PAPR. An LPP primarily decreases the PAPR, especially for BPSK, where the gain is several dB. We derive criteria for analytically determining the QPPs and the LPPs.


I. INTRODUCTION
RELIABLE communication over channels with high mobility is an important feature of cellular systems and an active area of research [1], [2]. In particular, waveforms which perform well in such scenarios, e.g., for high-speed trains, satellites [3], etc., are of interest for 5G systems because communications at velocities up to 350 km/h should be supported, and in some cases even as high as 500 km/h [4]. In addition, higher frequency bands are introduced in 5G compared to 4G systems, which further exacerbates the Doppler effect. Orthogonal frequency division multiplexing (OFDM) and discrete Fourier transform spread OFDM (DFT-s-OFDM) are used in both 4G Long Term Evolution (LTE) and 5G New Radio (NR), where DFT-s-OFDM is confined to uplink transmission. It has been proposed to introduce DFT-s-OFDM in the downlink for sub-THz communications [5], [6], [7]. OFDM multiplexes modulation symbols in the frequency domain by dividing the frequency spectrum into subcarriers. DFT-s-OFDM, on the other hand, is a single-carrier waveform which multiplexes modulation symbols in the time domain. Since a modulation symbol is transmitted over the whole frequency spectrum, performance may be better than for OFDM on a frequency selective channel. For OFDM, a modulation symbol is transmitted over the whole OFDM symbol duration and performance may be better than for DFT-s-OFDM on a time selective channel.
Performance improvements for DFT-s-OFDM have been achieved by filtering the Fourier coefficients, i.e., the DFT precoder output. One direction is unitary frequency domain filtering, i.e., the filter only consists of phase rotations and is applied after the DFT precoder [8], [9]. The filter implies that the single-carrier property of DFT-s-OFDM is lost and the net effect is that a modulation symbol is transmitted over both the time and frequency domains. The benefit of these waveforms is that they have been shown to outperform OFDM and DFT-s-OFDM in bit/block error rate (BER/BLER) on a fading channel with large Doppler spread. Another direction is non-unitary frequency domain filtering, which is used for frequency domain spectrum shaping (FDSS), primarily to reduce the peak-to-average-power ratio (PAPR) [10], [11], [12], [13]. With FDSS, the Fourier coefficients are multiplied by a window function, which in turn smooths the amplitude of the signal. The cost of FDSS is typically a reduced spectral efficiency and a larger signal-to-noise ratio (SNR) required to maintain a given error rate. Furthermore, both non-unitary [14] and unitary FDSS [8] were suggested to generate chirp-based waveforms from DFT-s-OFDM, while other methods to produce chirp-based waveforms have also been considered [15]. These chirp-based waveforms can also improve performance on time-frequency selective channels.
Another way of manipulating the Fourier coefficients was proposed in [16], utilizing frequency domain interleaving, i.e., the DFT precoded symbols are interleaved. Thus, the single-carrier property is lost and there is a potential performance gain over DFT-s-OFDM on a time-frequency selective channel. The benefit of this method is its simplicity: no filter is needed, and since DFT-s-OFDM is already implemented in existing 4G/5G systems, introducing interleaving after the DFT precoder could be a rather straightforward add-on. The existing parts of the transmit/receive chain could be kept and there would be no significant increase of the implementation complexity. On the other hand, the full potential of frequency domain interleaving is unclear, since prior work used random interleaving and no insight was provided on how to design the interleaver. Thus, it is an open issue how a particular interleaver affects either the BER/BLER or the PAPR. In [17], a block interleaver was utilized to interleave modulation symbols from different OFDM symbols, which is different from [16].
A sequence of randomly permuted integers could be used to interleave the Fourier coefficients [16]; however, its performance impact is unpredictable. In an average sense, especially if the permutation sequence is long, random permutations may lead to improved BER/BLER. However, as will be shown herein, some permutation sequences do not improve the BER/BLER but improve the PAPR, and vice versa. Hence, within the set of permutation sequences, there are both 'good' ones and 'bad' ones, depending on the desired performance measure, e.g., BER/BLER or PAPR. Therefore, the performance of random interleaving can generally not be predicted or guaranteed. This leads us to consider interleavers based on permutation polynomials, which by their algebraic construction could be determined to result in desirable signal properties. A second issue with random interleaving is its implementation complexity. The interleaving sequence from a permutation polynomial could be computed in real time since it is represented by a closed-form expression. A pseudo-random interleaver, however, does not necessarily have any simple analytical structure, and the permutation sequence may need to be stored in memory in the transmitter. A corresponding deinterleaver sequence would then be stored in the receiver. An interleaver based on a permutation polynomial, on the other hand, can be fully and succinctly characterized by its polynomial coefficients, which requires less memory.
In this work, we propose the novel use of permutation polynomials as frequency domain interleavers for DFT-s-OFDM. A main benefit of this technique is that it is practically applicable and can be introduced into existing LTE and NR systems. The results show that interleaving can improve the BER/BLER on channels with large Doppler spread and can reduce the PAPR. The main contributions are summarized as follows:
• Selection of quadratic permutation polynomial (QPP): We derive a criterion for determining QPPs which improve the BER/BLER compared to DFT-s-OFDM. Prior methods for filtered DFT-s-OFDM may enhance either BER/BLER or PAPR, i.e., they are not improved simultaneously. In this work, we demonstrate that it is possible to enhance BER/BLER and PAPR simultaneously and that a QPP can perform better than random interleaving.
• Selection of linear permutation polynomial (LPP): We derive a criterion for determining LPPs which improve the PAPR, especially for BPSK, compared to DFT-s-OFDM.
• Signal properties: We express frequency domain interleaved DFT-s-OFDM as a waveform with orthogonal basis functions. In contrast to those obtained by random interleaving, the basis functions are characterized by being sparse, i.e., they contain many zero samples, and the sparsity is determined by the properties of the permutation polynomial. Moreover, we show that the property of ideal periodic autocorrelation is invariant under any permutation.
• Implementation aspects: We show that frequency domain interleaving can equivalently be represented as a linearly precoded DFT-s-OFDM signal. Furthermore, by decomposing the QPP, we determine several ways of generating the permutations to provide implementations with reduced complexity.
The rest of the paper is organized as follows. Section II gives the system model and analyzes properties of the interleaved DFT-s-OFDM signal. Based on these insights, Section III focuses on constructing permutation polynomials for reducing BER/BLER and PAPR, respectively. Implementation aspects are discussed in Section IV and the conclusions are drawn in Section V.
Notation: Throughout the paper, vectors are denoted by bold lowercase letters and matrices are denoted by bold uppercase letters, I is the identity matrix, and $(\cdot)^{T}$ and $(\cdot)^{\dagger}$ indicate the transpose and Hermitian transpose operators, respectively. The modulo-M operator is denoted by (mod M), gcd(x, y) gives the greatest common divisor of x and y, and arg(z) gives the angle of z. The conjugate operator is denoted by $(\cdot)^{*}$, and $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ indicate the floor and ceiling operators, respectively. $\delta[k]$ is the Kronecker delta function satisfying $\delta[0] = 1$ and $\delta[k] = 0$ for $k \neq 0$, $u(x)$ is the unit step function and $j = \sqrt{-1}$.

A. SIGNAL DEFINITION
We consider interleavers based on permutation polynomials

$$\pi[k] = \sum_{i} f_i k^i \pmod{M} \tag{1}$$

for k = 0, 1, . . . , M − 1, with integer coefficients $f_i$ such that $\pi[k]$ is a permutation of {0, 1, . . . , M − 1} [18], [19]. Let x[m], m = 0, 1, . . . , M − 1, be the modulation symbols which could be, e.g., data carried by pulse amplitude modulated (PAM) or quadrature amplitude modulated (QAM) symbols, or consist of predetermined reference symbols or be symbols of a synchronization sequence. The modulation symbols are DFT precoded as

$$X[k] = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} x[m]\, e^{-j\frac{2\pi}{M} mk}. \tag{2}$$

We define the interleaver using the permutation polynomial $\pi[k]$, which is applied to X[k] such that

$$\tilde{X}[k] = X[\pi[k]]. \tag{3}$$

With (3), the lowpass equivalent time-discrete signal for n = 0, 1, . . . , N − 1 becomes

$$s[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{M-1} \tilde{X}[k]\, e^{j\frac{2\pi}{N} nk} \tag{4}$$

$$\phantom{s[n]} = \sum_{m=0}^{M-1} x[m]\, g[m, n], \tag{5}$$

where the basis function for symbol m is defined as

$$g[m, n] = \frac{1}{\sqrt{MN}} \sum_{k=0}^{M-1} e^{-j\frac{2\pi}{M} m\pi[k]}\, e^{j\frac{2\pi}{N} nk} \tag{6}$$

and N (N ≥ M) is the number of subcarriers. A cyclic prefix (CP) of length $N_{CP}$ is prepended to the signal for $n = -N_{CP}, -N_{CP} + 1, \ldots, -1$. The allocated M subcarriers are assumed to be located contiguously, which is in accordance with the uplink frequency resource allocation in LTE/NR. The term N/M can be regarded as an oversampling factor. In the following analysis in Section II and for the evaluations of BER/BLER in Sections III-A and III-B, we will consider N = M and return to the case of N > M in Section III-C, where we consider the PAPR issue.
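The signal generation described in this subsection can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes that the interleaved coefficient X[π[k]] is placed on subcarrier k, that the DFT and IDFT are scaled by 1/√M and 1/√N, and that the polynomial coefficients are passed as a list [f0, f1, f2, ...]. The function names are hypothetical.

```python
import cmath
import math


def perm_poly(coeffs, M):
    """pi[k] = sum_i coeffs[i] * k**i (mod M), with coeffs = [f0, f1, f2, ...]."""
    return [sum(f * pow(k, i, M) for i, f in enumerate(coeffs)) % M
            for k in range(M)]


def is_permutation(pi):
    """True if pi visits every index 0..M-1 exactly once."""
    return sorted(pi) == list(range(len(pi)))


def interleaved_dft_s_ofdm(x, pi, N):
    """DFT precode x, interleave the coefficients, apply an N-point IDFT."""
    M = len(x)
    X = [sum(x[m] * cmath.exp(-2j * math.pi * m * k / M) for m in range(M))
         / math.sqrt(M) for k in range(M)]
    # Assumed convention: coefficient X[pi[k]] is sent on subcarrier k.
    X_int = [X[pi[k]] for k in range(M)]
    return [sum(X_int[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(M))
            / math.sqrt(N) for n in range(N)]


def basis_function(m, pi, N):
    """g[m, n] for n = 0..N-1 under the same assumed convention and scaling."""
    M = len(pi)
    return [sum(cmath.exp(-2j * math.pi * m * pi[k] / M)
                * cmath.exp(2j * math.pi * n * k / N) for k in range(M))
            / math.sqrt(M * N) for n in range(N)]
```

For example, `perm_poly([0, 1, 2], 16)` evaluates 2k² + k modulo 16, and summing `x[m] * basis_function(m, pi, N)[n]` over m reproduces the output of `interleaved_dft_s_ofdm`.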

B. CHANNEL MODEL
We assume a time-discrete channel with relative channel tap power $P_l$ and sample delay $\tau_l$ for $l = 0, 1, \ldots, L - 1$. A time-variant channel based on Clarke's two-dimensional isotropic scattering Rayleigh fading model [20] is used, where P is the number of propagation paths per channel tap, $f_D$ is the maximum Doppler frequency, and $\theta_p$ and $\phi_p$ are the angle of arrival and initial phase of the pth propagation path, respectively. Both $\theta_p$ and $\phi_p$ are random variables uniformly distributed over $[-\pi, \pi)$ for all p, and they are mutually independent.
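As an illustration of the channel model, the following sketch generates one time-varying tap as a sum of P sinusoids with uniformly distributed angles of arrival and initial phases. The sum-of-sinusoids form, the sampling interval `t_s` and the function name are assumptions made for this example; the exact expression used in the paper may differ in scaling or time indexing.

```python
import cmath
import math
import random


def clarke_tap(num_samples, tap_power, f_d, t_s, num_paths=32, rng=None):
    """One channel tap h[n], n = 0..num_samples-1, as a sum of sinusoids.

    tap_power : relative tap power P_l
    f_d       : maximum Doppler frequency in Hz
    t_s       : sampling interval in seconds (assumed parameter)
    """
    rng = rng or random.Random()
    # Angle of arrival and initial phase per path, uniform on [-pi, pi].
    theta = [rng.uniform(-math.pi, math.pi) for _ in range(num_paths)]
    phi = [rng.uniform(-math.pi, math.pi) for _ in range(num_paths)]
    scale = math.sqrt(tap_power / num_paths)
    return [scale * sum(cmath.exp(1j * (2 * math.pi * f_d * n * t_s * math.cos(th) + ph))
                        for th, ph in zip(theta, phi))
            for n in range(num_samples)]
```

With `f_d = 0` the tap is constant over time, and averaged over many realizations the tap power approaches `tap_power`.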

C. RECEIVED SIGNAL
where (5) and (6) were inserted in (9). By defining N vn (12) where (11) is the time-frequency channel transfer function and (12) is the Doppler-frequency channel transfer function, the received signal at subcarrier c = 0, 1, . . . , M − 1 is  (15), it can be observed that the effect of the channel is a scaling and phase rotation factor (D[0, c]) and the sum which comprises inter-carrier interference (ICI) that may lead to an error floor.

D. LINEAR PERMUTATION POLYNOMIAL
An LPP is given by $\pi[k] = f_1 k + f_0 \pmod{M}$, where $\gcd(f_1, M) = 1$ and $f_0$ is any integer. An interleaver based on an LPP results in a permuted DFT-s-OFDM signal, as stated in the following property, which is proven in Appendix A-A.
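Property 1: With an LPP interleaver and N = M, s[n] is a time-interleaved and phase modulated DFT-s-OFDM signal,

$$s[n] = e^{-j\frac{2\pi}{M} f_0 f_1^{-1} n}\; x\!\left[f_1^{-1} n \ (\mathrm{mod}\ M)\right], \tag{16}$$

where $f_1^{-1} f_1 \equiv 1 \pmod{M}$. From the proof of Property 1, the basis function can be identified as

$$g[m, n] = e^{-j\frac{2\pi}{M} f_0 m}\; \delta\!\left[(n - f_1 m) \ (\mathrm{mod}\ M)\right]. \tag{17}$$

Since (16) and (17) describe time division multiplexing (TDM) of the modulation symbols, it is not expected that an LPP will improve the BER/BLER compared to DFT-s-OFDM.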
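The TDM interpretation of Property 1 can be checked numerically for N = M. The sketch below assumes that X[π[k]] is placed on subcarrier k with unitary DFT scaling, and it assumes the permuted and phase-rotated form s[n] = exp(−j2π f0 q n/M) x[qn mod M] with q the inverse of f1 modulo M, which matches the row structure of the precoder described in Section IV-C. Function names are hypothetical.

```python
import cmath
import math


def lpp_interleaved_signal(x, f1, f0):
    """Interleaved DFT-s-OFDM with pi[k] = f1*k + f0 (mod M) and N = M."""
    M = len(x)
    pi = [(f1 * k + f0) % M for k in range(M)]
    X = [sum(x[m] * cmath.exp(-2j * math.pi * m * k / M) for m in range(M))
         / math.sqrt(M) for k in range(M)]
    return [sum(X[pi[k]] * cmath.exp(2j * math.pi * n * k / M) for k in range(M))
            / math.sqrt(M) for n in range(M)]


def lpp_tdm_signal(x, f1, f0):
    """Assumed closed form: permuted, phase-rotated copy of the symbols."""
    M = len(x)
    q = pow(f1, -1, M)  # modular inverse of f1 (requires Python 3.8+)
    return [cmath.exp(-2j * math.pi * f0 * q * n / M) * x[(q * n) % M]
            for n in range(M)]
```

Both functions return the same samples up to rounding error whenever gcd(f1, M) = 1, so each time sample carries exactly one modulation symbol.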

E. QUADRATIC PERMUTATION POLYNOMIAL
A QPP is given by $\pi[k] = f_2 k^2 + f_1 k + f_0 \pmod{M}$. Conditions on the $f_2$ and $f_1$ coefficients can be found in, e.g., [18], and $f_0$ is any integer. Let $M = p_0^{l_0} p_1^{l_1} \cdots p_r^{l_r}$ be the prime factorization of M, where $p_i$, i = 0, 1, . . . , r, are prime numbers and the multiplicities $l_i > 0$ are integers. Then, the conditions on the QPP coefficients can be succinctly stated as [21]: for every prime factor $p_i$, (i) if $p_i = 2$ and $l_i = 1$, then $f_1 + f_2$ is odd; (ii) if $p_i = 2$ and $l_i \geq 2$, then $f_1$ is odd and $f_2$ is even; (iii) if $p_i > 2$, then $f_1 \not\equiv 0 \pmod{p_i}$ and $f_2 \equiv 0 \pmod{p_i}$. QPP interleavers have been designed particularly for turbo codes, cf. [22], [23]. Such QPPs have been selected to facilitate implementations with parallel contention-free decoding and to achieve low error probability. However, in that context, the interleaver is internal and part of the error correcting code, and does not serve the purpose of a channel interleaver. Herein, the main objective is channel interleaving over time-frequency selective channels. Thus, the previous QPP interleavers are not directly relevant and new design criteria will be needed, which we address in Section III. With a QPP interleaver, the signal (4) with N = M becomes

$$s[n] = \frac{1}{M} \sum_{m=0}^{M-1} x[m]\, e^{-j\frac{2\pi}{M} m f_0} \sum_{k=0}^{M-1} e^{-j\frac{2\pi}{M}\left(m f_2 k^2 + (m f_1 - n) k\right)}, \tag{18}$$

where the inner sum is a generalized quadratic Gauss sum. Certain QPPs, i.e., irreducible QPPs, produce permutations which cannot equivalently be obtained from an LPP. It has been shown that a QPP is irreducible if and only if $M > \gcd(M, 2f_2)$ [22]. Moreover, two distinct QPPs could produce the same permutation. The number of irreducible QPPs which provide unique permutations depends on M and can be computed by given formulas [21], [24]. In [22], it was shown that if M is divisible by 8, there exists an irreducible QPP. More generally, according to [21], when the prime factorization of M is such that $p_0 = 2$, $l_0 = 0$, 1 or 2, and $l_i = 1$, i = 1, 2, . . . , r, then there exist no irreducible QPPs. Thus, such M should be excluded for the construction of QPPs.
In LTE and NR, resources are allocated in multiples of resource blocks (RBs). For the data channel using DFT-s-OFDM, the number of RBs is constrained to be $N_{RB} = 2^{k_0} 3^{k_1} 5^{k_2}$, where $k_0$, $k_1$ and $k_2$ are non-negative integers [25]. A resource block contains $N_{sc}^{RB} = 12$ subcarriers, i.e., $M = N_{RB} N_{sc}^{RB}$. Using the aforementioned conditions on M, it can be shown that the only resource allocations for which there would not exist an irreducible QPP are when $N_{RB} = 1$ or $N_{RB} = 5$, i.e., when M = 12 or M = 60.
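The statements about irreducible QPPs can be explored by brute force for small M. The sketch below does not rely on the coefficient conditions from the literature; it simply tests whether f2·k² + f1·k + f0 (mod M) is a permutation and applies the irreducibility test M > gcd(M, 2·f2) quoted above. The function names are hypothetical.

```python
from math import gcd


def qpp(f2, f1, f0, M):
    """Values of f2*k^2 + f1*k + f0 (mod M) for k = 0..M-1."""
    return [(f2 * k * k + f1 * k + f0) % M for k in range(M)]


def is_permutation(seq):
    """True if seq contains every index 0..M-1 exactly once."""
    return sorted(seq) == list(range(len(seq)))


def is_irreducible(f2, M):
    """Irreducibility test quoted from [22]: M > gcd(M, 2*f2)."""
    return M > gcd(M, 2 * f2)


def count_irreducible_qpps(M):
    """Number of (f2, f1) pairs giving an irreducible QPP (f0 fixed to 0)."""
    count = 0
    for f2 in range(1, M):
        if not is_irreducible(f2, M):
            continue
        for f1 in range(M):
            if is_permutation(qpp(f2, f1, 0, M)):
                count += 1
    return count
```

Running `count_irreducible_qpps` for M = 12 and M = 60 finds no irreducible QPP, whereas M = 24 yields some, in line with the resource allocation discussion above. The polynomial 2k² + k is a permutation and irreducible for M = 128, while 64k² + k is reducible.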

F. SIGNAL PROPERTIES OF INTERLEAVED DFT-S-OFDM
For the case of N = M, it is straightforward to verify that the basis function for DFT-s-OFDM is obtained with $\pi[k] = k$, for which $g[m, n] = \delta[n - m]$, i.e., x[m] is transmitted on time sample m, which results in TDM of the symbols. The basis function is sparse since only 1 out of M samples is non-zero, i.e., carries a modulation symbol, and M − 1 samples are zeros. By using the geometric sum identity

$$\sum_{k=0}^{M-1} e^{j\frac{2\pi}{M} nk} = M\, \delta[n \ (\mathrm{mod}\ M)]$$

for any integer n, it can be shown from (6) that g[0, n] = δ[n] for any interleaver, i.e., it contains 1 non-zero sample. However, our evaluations of (6) with random permutations have shown that the basis functions for m > 0 are typically such that g[m, n] ≠ 0. On the other hand, if a permutation polynomial is used, the basis functions could still contain many zeros. A sufficient but not necessary condition for zero-valued samples is given by the following property, which is proven in Appendix A-B.
is a permutation for , it is clear that the polynomial coefficients will determine the zero-valued samples. Fig. 1 shows examples of basis functions obtained from a QPP, where it can be observed that the number of non-zero elements differ among the basis functions and that the power is constant on the non-zero samples for a given basis function. Since a basis function can have multiple distinct peaks, the modulation symbols are not transmitted by TDM.
The basis functions can be further characterized by the periodic crosscorrelation function (PCCF), according to the following property which is proven in Appendix A-C.

Property 3: The PCCF of basis functions is
Using (6), (19) and (22), it can be shown that and Thus, according to (23) the basis functions are orthogonal, which simplifies a receiver and basic channel equalization methods could be applied. Furthermore, according to (24) a basis function has an ideal periodic autocorrelation function (PACF), i.e., it is orthogonal to any time delayed version of itself.
If {x[m]} is a pre-defined sequence of symbols, the signal may be used for channel estimation, synchronization etc. In such cases the PACF of the signal becomes important, which is defined by and the PACF for two modulation symbol sequences is The relation between the PACF of the signal and the basis functions (i.e., implicitly the interleaver) is given by the following property, which is proven in Appendix A-D. Property 4: The PACF of the signal is By utilizing Property 3 with p = m+n, we have for any m that g * [n, d] = ρ g m g m+n [d], which could be inserted in to (27) to give the relation between the correlation functions of the signal, the modulation sequence and the basis functions. If , then by using (6) and (19) in (27) Hence, the ideal PACF property of {x[m]} is invariant under frequency domain interleaving, which simplifies the construction of a reference symbol sequence and any suitable interleaver could be chosen.
Frequency domain interleaving does not change the power of the signal, since the summed power of the basis functions, taken either over the time samples or over the basis functions, is constant according to the following property, which is proven in Appendix A-E. Thus, to select a permutation polynomial which reduces the PAPR, it suffices to select one that reduces the peak power of the signal.

Property 5:
The power of the basis functions fulfills the following: It should be pointed out that Properties 2-5 were proven without the assumption that the interleaver is based on a permutation polynomial. Hence, they apply to any type of permutation π[k], including random interleaving. In the following, particular aspects of signal design using LPPs and QPPs will be considered.
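The orthogonality, ideal autocorrelation and constant-power statements of this subsection can be checked numerically for an arbitrary (here random) permutation with N = M. The sketch assumes a 1/M scaling of the basis functions and defines the periodic correlation as the sum over n of a[n + d]·b*[n]; the lag convention in the paper may differ, and the function names are hypothetical.

```python
import cmath
import math
import random


def basis_matrix(pi):
    """Rows are the basis functions g[m, .] for N = M (assumed 1/M scaling)."""
    M = len(pi)
    return [[sum(cmath.exp(2j * math.pi * (n * k - m * pi[k]) / M)
                 for k in range(M)) / M for n in range(M)] for m in range(M)]


def periodic_crosscorr(a, b, d):
    """Periodic cross-correlation of sequences a and b at lag d."""
    M = len(a)
    return sum(a[(n + d) % M] * b[n].conjugate() for n in range(M))


# Example: a random interleaver of length 16.
pi = list(range(16))
random.Random(0).shuffle(pi)
G = basis_matrix(pi)
```

With this `G`, `periodic_crosscorr(G[m], G[p], 0)` is close to 1 for m = p and close to 0 otherwise, and `periodic_crosscorr(G[m], G[m], d)` is close to 0 for every non-zero lag d.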

III. SELECTION OF PERMUTATION POLYNOMIALS
A. QPP FOR REDUCING BER/BLER
The basis function (6) for DFT-s-OFDM with oversampling factor N/M becomes a sinc-like function (cf. (37) with $f_1 = 1$ and $f_0 = 0$), i.e., its power $|g[m, n]|^2$ has a single distinct peak which is located at sample n = mN/M, if N/M is an integer. We anticipate that better performance may be achieved on a time-varying channel for basis functions having multiple peaks, since the transmission of the modulation symbol then experiences diversity. For the case N = M, according to Property 2, the basis functions can have different numbers of non-zero elements and these can be regarded as the distinct peaks, cf. Fig. 1. In order to obtain basis functions having multiple distinct peaks, we therefore determine permutation polynomials that maximize the number of non-zero elements in the basis functions when N = M. Utilizing the unit step function u(x), the novel criterion we propose is to select the QPP that maximizes the number of non-zero elements over all M time samples and M basis functions, which is denoted by V and defined in (30). The inner sum in (30) is the number of non-zero samples of a basis function, i.e., the number of distinct peaks of (6) when N = M. In Appendix A-J, an expression is derived for a QPP which can be used to compute (30). Since a basis function has from 1 to M non-zero values, it is clear that the value range is $M \leq V \leq M^2$. In particular, the lower limit applies to LPPs, according to the following property which is proven in Appendix A-F.
Moreover, an upper bound of (30) can be found, according to the following property which is proven in Appendix A-G.
Property 7: If π [k] is a QPP and M is even, then V ≤ (M − 1)M/2 + 1. In order to determine the QPPs which maximize the value V, the following sufficient condition for zero values of a basis function is useful, which is proven in Appendix A-H.
becomes an integer for as many time samples n as possible. That is facilitated by minimizing the denominator of v. Moreover, the term $m f_1$ in (32)
We have also evaluated random interleaving (i.e., the set of values {0, 1, . . . , M − 1} is randomly permuted) and found that typically $V/M^2 \approx 1$, and that V does not show large variations among random interleaving sequences. Condition (30) is thus applicable for selecting permutation polynomials but is not suitable for selection among random interleaving sequences. It also means that the basis functions with a random interleaver are typically dense. However, the difference between QPPs and random interleaving relates not only to V but also to the distribution of power among the samples. Since the basis functions are dense for random interleaving, the power is much lower per sample than for basis functions of a QPP interleaver. Moreover, as shown in Fig. 1, for a QPP the power is constant for the samples where the basis function is not zero. We have not observed constant power for basis functions obtained from random interleaving. Thus, the potential gain of the larger value V of random interleaving may become limited by the non-uniform power distribution among the samples of the basis function.
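The selection metric V can be evaluated numerically by counting the non-zero samples of all basis functions for N = M. The sketch below is an assumption-laden illustration: it uses a 1/M scaling of the basis functions, replaces the unit step function by a numerical threshold `tol`, and uses a hypothetical function name.

```python
import cmath
import math


def count_nonzero_samples(pi, tol=1e-9):
    """V: number of (m, n) pairs with |g[m, n]| > tol, for N = M."""
    M = len(pi)
    V = 0
    for m in range(M):
        for n in range(M):
            g = sum(cmath.exp(2j * math.pi * (n * k - m * pi[k]) / M)
                    for k in range(M)) / M
            if abs(g) > tol:
                V += 1
    return V


M = 16
lpp = [(3 * k + 5) % M for k in range(M)]      # example LPP
qpp = [(2 * k * k + k) % M for k in range(M)]  # example irreducible QPP
V_lpp = count_nonzero_samples(lpp)
V_qpp = count_nonzero_samples(qpp)
```

For the LPP, `V_lpp` equals M, the lower limit of the range, whereas the QPP gives a larger value that stays within the bound of Property 7. The direct evaluation costs on the order of M³ operations, so it is only practical for small M.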

B. SIMULATION RESULTS FOR BER/BLER
Considering the time-varying channel and the non-linear basis functions, it does not appear analytically tractable to determine expressions of the BER/BLER in closed form, and we therefore use simulations with the parameters of Table 2 to compare performance with DFT-s-OFDM. A CP with $N_{CP} \geq \tau_{L-1}$ is attached to s[n] and minimum mean square error (MMSE) frequency domain equalization (FDE) is used (see Appendix B). Ideal channel estimation is assumed, thus a higher BER/BLER is expected in practice. Note that in [16], advanced iterative receivers were used, which may bring further gains. High-speed scenarios for 5G have been defined up to 500 km/h [4] and we also evaluate 120 km/h and 350 km/h. We use a subcarrier spacing of $f_{SCS}$ = 15 kHz and a carrier frequency of $f_c$ = 6 GHz, which at 500 km/h corresponds to a maximum Doppler shift of 2.78 kHz, i.e., 19% of the subcarrier spacing. For BLER evaluation, the 3GPP NR polar code and rate matching are used [26] with a decoding algorithm according to [27]. In these simulations, the coded bits are mapped to modulation symbols which are transmitted in 1 DFT-s-OFDM symbol and the same permutation polynomial is used for all transmitted OFDM symbols. Fig. 3 gives a block diagram of the transmitter and receiver.
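The Doppler figures quoted above follow from the usual relation f_D = v·f_c/c. The short sketch below reproduces the arithmetic; the function name and the default value for the speed of light are choices made for this example.

```python
def max_doppler_hz(speed_kmh, carrier_hz, c=299_792_458.0):
    """Maximum Doppler shift f_D = v * f_c / c, with v converted to m/s."""
    return speed_kmh / 3.6 * carrier_hz / c


f_d = max_doppler_hz(500.0, 6e9)  # 500 km/h at a 6 GHz carrier
ratio = f_d / 15e3                # relative to a 15 kHz subcarrier spacing
```

Here `f_d` is about 2.78 kHz and `ratio` is roughly 0.185, which rounds to the 19% stated in the text.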
An issue at higher velocities is the existence of an error floor, which arises because the orthogonality among the subcarriers is not maintained, cf. (15). Fig. 4 shows the uncoded BER as a function of SNR for QPSK. It confirms that polynomial selection according to (30) works, since QPPs with a large V offer lower BER and suppress the error floor. A properly chosen QPP can result in better BER than for DFT-s-OFDM, which has the same BER as that of the reducible QPP f[k] = 64k² + k. The BER of random interleaving is also shown, i.e., a random interleaving sequence is generated for each transmission attempt and the BER is an average over the interleaver sequences. Thus, the result can be interpreted as the expected BER for any random interleaving method. It can be concluded that the best QPP gives a BER comparable to random interleaving, despite having a value $V < M^2$. The selection of polynomial according to (30) is independent of modulation format, and Fig. 5 and Fig. 6 contain the BER for DFT-s-OFDM with the best QPP (i.e., f[k] = 2k² + k), which shows that the error floor can be suppressed significantly for the higher velocities, by at least an order of magnitude for all the modulation formats. It can be seen that the relative gain of the QPP interleaver increases with the velocity and, for BPSK and QPSK, the BER decreases with the velocity when the QPP is applied. For 16-QAM, the diversity gain of the QPP does not fully compensate for the loss due to the ICI, in comparison to the 0 km/h case.
We apply the same condition (30) for determining a QPP for reducing the BLER, i.e., a QPP that maximizes the value V. The desired BLER depends on the application and typically ranges from $10^{-1}$ for mobile broadband (MBB) data to $10^{-5}$ for ultra reliable low-latency communication (URLLC) [28]. A channel code will be able to capture time-frequency diversity effects by itself. Nevertheless, Fig. 7, which contains the BLER with the same QPP as in Fig. 6, shows that there are substantial gains from QPP interleaving also with channel coding, especially for BPSK and for low BLER. It should be noted that this polar code is specified for QPSK in the 3GPP NR system and we apply it herein also for BPSK and 16-QAM. As a further performance reference, OFDM is included, which is shown to provide larger BLER than the DFT-s-OFDM based schemes. Moreover, results for the chirp-based waveform CCDT [8] are included, which is shown to also be better than DFT-s-OFDM.

[Figure caption: DFT-s-OFDM with and without QPP (f[k] = 2k² + k), on a Vehicular A channel with 0 km/h and 500 km/h velocity for M = 128.]

C. LPP FOR REDUCING PAPR
The power dynamics of the baseband signal can be characterized by the peak-to-mean envelope power ratio (PMEPR) [29]. The PMEPR should ideally be evaluated on the time-continuous signal but, as an approximation, the oversampled signal can be used, which gives the definition in (33). If we consider an LPP and oversampling by a factor N/M, the signal can be expressed for n = 0, 1, . . . , N − 1 in terms of the Dirichlet kernel, as given by (36) and (37). Thus, (36) and (37) show that the signal can be expressed by a set of basis functions, g[m, n], which are non-linear in the LPP coefficients. According to Property 5, the interleaver does not change the average transmitted power of the signal, thus minimization of the PMEPR reduces to minimizing the peak value of $|s[n]|^2$. As opposed to (16), where N = M and the PMEPR does not depend on the interleaver, the LPP coefficients in the basis functions (37) could affect the PMEPR.
where the phase term being dependent on the time sample n in (36), which is common to all basis functions m, has been removed. Thus, the phase difference between basis functions for two consecutive modulation symbols m 0 and To reduce the peak power on samples n 0 < n < n 1 , different phase values should be used for symbol m 0 and m 1 , such that their basis functions do not add coherently. Therefore, f 0 and f 1 should be chosen to rotate the modulation symbol constellation such that for any consecutive symbols x[m 0 ] and x[m 1 ] taken from the modulation constellation, it holds that We propose that this could be achieved by rotating the constellation an angle according to (40) as, where t = 1 for QPSK and 16-QAM, t = 2 for BPSK and q is an integer. Consider first the case f 1 = 1 where q is chosen such that f 0 is a positive integer. From (42) we can solve for f 0 , and it follows that The resulting phase rotations of (43) or (44) may not be the optimal ones since, firstly, they are derived based on the assumed phase angle of (42) which may not be optimal and, secondly, because the desired phase angle may not be achieved perfectly since f 0 and f 1 have to be integer coefficients of an LPP. The results in Fig. 9, where the 1-percentile of the complementary cumulative distribution function (CCDF) of the PMEPR, which is obtained from simulations, is plotted as function of f 0 , confirm that (43) and (44)   The lowest PMEPR for BPSK is obtained with f 0 = 14, which corresponds to the effective rotation angle 27π/64, i.e., slightly less than π/2. Notably, phase rotation angles in the vicinity of π/2 have been identified to minimize the PMEPR for BPSK in [30].
For the case $f_1 \neq 1$, it follows from (36) and (37) that $|g[m_0, n]|^2$ has its main peak at sample $n_0 = (N/M) f_1 m_0 \pmod{N}$ and, for $m_1$, the main peak is at sample $n_1 = (N/M) f_1 m_1 \pmod{N}$. Therefore, it may be that the main power contribution on samples $(N/M) f_1 m_0 < n < (N/M) f_1 m_0 + N/M$ does not come from a modulation symbol consecutive to $m_0$, i.e., if $(N/M) f_1 m_1 \pmod{N} \neq (N/M) f_1 m_0 + N/M$. Therefore, $\arg(\theta(m_0)\theta^{*}(m_0 + 1))$ can depend on the symbol index $m_0$, which makes it difficult to analytically derive a proper value of $f_0$. However, as shown in Fig. 10, there is a periodicity in the PMEPR and some values of $f_0$ are better than others, but there is no significant gain compared to when $f_1 = 1$ as in Fig. 9. Fig. 11 contains the CCDF of the PMEPR for an LPP according to (43) and for DFT-s-OFDM (i.e., f[k] = k). The PMEPR reduction is significant for BPSK, while being moderate for QPSK and 16-QAM.
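The PMEPR evaluation for an LPP-interleaved signal can be sketched as follows. The example assumes the usual definition of the PMEPR as the peak sample power divided by the mean sample power of the oversampled signal, places X[π[k]] on subcarrier k, and uses hypothetical function names; the LPP coefficients in the example are arbitrary and not the ones selected in the paper.

```python
import cmath
import math
import random


def lpp_signal(x, f1, f0, N):
    """LPP-interleaved DFT-s-OFDM signal, oversampled by the factor N/M."""
    M = len(x)
    X = [sum(x[m] * cmath.exp(-2j * math.pi * m * k / M) for m in range(M))
         / math.sqrt(M) for k in range(M)]
    return [sum(X[(f1 * k + f0) % M] * cmath.exp(2j * math.pi * n * k / N)
                for k in range(M)) / math.sqrt(N) for n in range(N)]


def pmepr_db(s):
    """Peak sample power over mean sample power, in dB (assumed definition)."""
    power = [abs(v) ** 2 for v in s]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


rng = random.Random(0)
bpsk = [rng.choice([-1.0, 1.0]) for _ in range(16)]
critically_sampled = pmepr_db(lpp_signal(bpsk, 3, 5, 16))  # N = M
oversampled = pmepr_db(lpp_signal(bpsk, 3, 5, 64))         # N = 4M
```

For N = M the LPP signal has constant modulus, so `critically_sampled` is 0 dB up to rounding, which illustrates why oversampling is needed before the LPP coefficients can influence the PMEPR. A CCDF as in the figures would be obtained by repeating the oversampled evaluation over many random symbol vectors.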

D. RELATION BETWEEN PMEPR AND BER/BLER
A basis function with a QPP interleaver may not be a sinc-like function and can have multiple peaks (in contrast to the distinct single peak of the LPP basis function (37) shown in Fig. 8), and it may also not have a symmetric shape. Therefore, it is not straightforward to analytically determine the best QPP coefficients in order to reduce the PMEPR, as could be done for an LPP. Fig. 12 contains simulation results of the CCDF of the PMEPR for QPSK using the QPPs of Table 1 and for random interleaving, which is shown to produce the largest PMEPR. Comparing with Fig. 4, it can be concluded that the QPPs which offer the lowest PMEPR give the highest BER.
The results in Fig. 4 and Fig. 12 suggest that either the PMEPR or the BER/BLER is improved. To investigate this further, we set M = 32, which makes it feasible to perform a complete exhaustive search. That is, we generate all LPPs and all irreducible QPPs and for each one of them, signals are produced and simulations are made to determine the PMEPR at the 1-percentile of the CCDF. Then we select the polynomial which minimizes the PMEPR and the uncoded BER is evaluated by simulations for this polynomial to determine the required SNR to achieve 10 −2 BER. These results are contained in Table 3 and Table 5, which show that LPP reduces the PMEPR, especially for BPSK, but not the BER. However, we note that for BPSK, there exist QPPs (e.g., f [k] = 8k 2 + 7k + 4) which simultaneously reduce the PMEPR and the BER, compared to DFT-s-OFDM. The SNR gain of random interleaving is not significant compared to QPP, which also has much lower PMEPR. Table 4 contains the required SNR to achieve 10 −3 BLER, assuming the best QPP from Table 1 which minimizes the BLER and the LPP is selected to minimize the PMEPR according to (43). It can be seen that the QPP improves the BLER more for the higher code rate and the gains are larger for BPSK and 16-QAM. The LPP improves the PMEPR more for BPSK than for QPSK and 16-QAM. Random interleaving has no significant gain over a QPP interleaver.

E. LPP WITH FDSS
The PMEPR reduction methods of LPP and FDSS could be performed jointly, such that the Fourier coefficients of the transmitted signal are additionally multiplied by the FDSS window F[k]. For Fig. 13, a root raised cosine (RRC) window with roll-off factor 0.2 has been used, and the results show that the gains of the LPP method are maintained and are particularly substantial for BPSK modulation.

IV. TRANSCEIVER ASPECTS
A frequency domain interleaver could be introduced for DFT-s-OFDM with small impact. Nevertheless, a few options for the implementation are discussed in this section.

A. INVERSE PERMUTATION POLYNOMIALS
The receiver performs the inverse operations of the transmitter, i.e., an N-point DFT, FDE, deinterleaving and an M-point inverse DFT (IDFT). Interleavers based on permutation polynomials allow for simple deinterleaving, which is applied to the received and equalized Fourier coefficients by means of the inverse permutation polynomial $\pi^{-1}[k]$. The degree of $\pi^{-1}[k]$ may not be the same as for $\pi[k]$. The inverse permutation polynomial, and in particular minimum degree inverse permutation polynomials, could be determined by the algorithms in [18], [19], [22]. However, as will be shown, the inverse permutation polynomial $\pi^{-1}[k]$ always gives the same value $V_{\pi^{-1}}$ of (30) as the value $V_{\pi}$ of the associated permutation polynomial, which is captured by the following property proven in Appendix A-I. Thus, it is not expected that there would be any significant difference in BER/BLER between them.
Property 9: For a permutation polynomial $\pi[k]$ and its associated inverse permutation polynomial $\pi^{-1}[k]$, $V_{\pi^{-1}} = V_{\pi}$.
A large degree of the permutation polynomial may increase the implementation complexity, which is more critical for the mobile terminal than for a base station. Thus, it would be possible to utilize whichever of $\pi[k]$ and $\pi^{-1}[k]$ has the smallest degree in the mobile terminal.
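Deinterleaving can be illustrated with a generic table-based inverse, together with the closed-form inverse that exists for an LPP. The sketch assumes that the interleaver places X[π[k]] at position k; the function names are hypothetical, and for QPPs the inverse polynomial would instead be found with the algorithms cited above.

```python
def inverse_permutation(pi):
    """Table inverse: inv[pi[k]] = k for every k."""
    inv = [0] * len(pi)
    for k, v in enumerate(pi):
        inv[v] = k
    return inv


def interleave(X, pi):
    """Assumed convention: output position k carries X[pi[k]]."""
    return [X[pi[k]] for k in range(len(pi))]


def deinterleave(Y, pi):
    """Undo interleave(): put Y[k] back at position pi[k]."""
    X = [None] * len(pi)
    for k, v in enumerate(pi):
        X[v] = Y[k]
    return X


def lpp_inverse_coeffs(f1, f0, M):
    """Coefficients (g1, g0) of the inverse LPP g1*k + g0 (mod M)."""
    q = pow(f1, -1, M)  # modular inverse of f1 (requires Python 3.8+)
    return q, (-q * f0) % M
```

For example, with M = 16, f1 = 3 and f0 = 5, `lpp_inverse_coeffs` returns (11, 9), i.e., the inverse of 3k + 5 is again a first-degree polynomial, 11k + 9 (mod 16).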

B. COMPLEXITY REDUCTION OF QPP INTERLEAVER
Since M may be large, computing the values of (1) involves squaring and modulo operations on large integers. This could be avoided by decomposing the polynomial. In [31], it was shown that a QPP interleaver can be expressed as where Q (Q < M) depends on the prime factorization of M and of $f_2$. Hence, the number of squaring operations could be reduced from M to Q. Alternatively, complexity reductions can be made as follows.
and it is straightforward to verify that if then, the permutation values can be computed by a linear relation Thus, the number of permutation values which need to be computed by π [k] can be reduced. One example fulfilling (49) is when kd = M. Furthermore, by letting d = 1 and recursively utilizing (48), we obtain for k > 0 Moreover, by using the identity for the sum of natural numbers an equivalent representation is Hence, by utilizing either (51) or (54), squaring operations can be avoided in order to determine the permutation (1).
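To illustrate how squaring can be avoided with d = 1, the sketch below uses the standard second-difference recursion for a quadratic, assuming $\pi[k] = f_2 k^2 + f_1 k + f_0 \bmod M$. It is offered in the spirit of the recursion described above and is not claimed to coincide term by term with (51) or (54); the function name `qpp_recursive` is hypothetical.

```python
def qpp_recursive(f2, f1, f0, M):
    """Compute pi[k] = f2*k^2 + f1*k + f0 (mod M), k = 0..M-1, using additions only.

    Uses pi[k+1] - pi[k] = f2*(2k + 1) + f1, whose increment in k is the
    constant 2*f2, so no squaring or multiplication by k is needed in the loop.
    """
    pi = [f0 % M]
    g = (f1 + f2) % M        # first difference pi[1] - pi[0]
    step = (2 * f2) % M      # constant second difference
    for _ in range(1, M):
        pi.append((pi[-1] + g) % M)
        g = (g + step) % M
    return pi

M = 128
table = qpp_recursive(8, 63, 0, M)
direct = [(8 * k * k + 63 * k) % M for k in range(M)]
```

Since both operands of each addition are already below M, every modulo reduction in the loop could be replaced by a compare-and-subtract in a hardware-oriented implementation.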

C. ALTERNATIVE SIGNAL GENERATION
An alternative representation of frequency-interleaved DFT-s-OFDM is precoded DFT-s-OFDM, which is derived as follows. Define the matrices: Hence, from (56) it can be observed that frequency-domain interleaving can alternatively be viewed as DFT-s-OFDM with a precoder G applied before the DFT precoder. Since P is an orthogonal matrix, it follows that the inverse precoder, e.g., to be used in the receiver, could be defined as $G^{-1} = G^{\dagger}$. When P is derived from an LPP, the precoder G has a simple structure. Namely, if $f_0 = 0$, then G is a permutation matrix. If $f_0 \neq 0$ and N = M, then G is a phase-modulated permutation matrix. This could be realized as follows. Suppose y[n] is the output from the precoder and that $q = f_1^{-1}$ is the inverse of $f_1$ modulo M, which always exists since $\gcd(f_1, M) = 1$. Then, with $\mathbf{y} = [y[n]]$ and $\mathbf{y} = G\mathbf{x}$, it will hold that $y[n] = e^{-j\frac{2\pi}{M} f_0 q n}\, x[qn \bmod M]$. Hence, the (n + 1)th row of G will contain $e^{-j\frac{2\pi}{M} f_0 q n}$ in column $(qn \bmod M) + 1$, and zeros otherwise. Generally, from the definition of $G = [g_{kp}]$ for k = 0, 1, . . . , M − 1 and p = 0, 1, . . . , M − 1, it can be deduced that, where the second step follows from (6). It is straightforward to verify that $g_{00} = 1$, $g_{0p} = 0, \forall p \neq 0$ and $g_{k0} = 0, \forall k \neq 0$ for an LPP with $f_0 = 0$, i.e., G is a permutation matrix. For a QPP, determining G involves evaluation of the generalized quadratic Gauss sum in (61), for which there are closed-form expressions only in some cases.
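The phase-modulated permutation structure for an LPP can be checked numerically. The sketch below assumes N = M, a unitary M-point DFT with kernel $e^{-j2\pi kp/M}$, and the interleaving convention $Z[k] = X[\pi[k]]$ with $\pi[k] = f_1 k + f_0 \bmod M$; the helper `dft` and the test vector are hypothetical. It compares the IDFT of the interleaved coefficients with the rotated and permuted input $e^{-j\frac{2\pi}{M} f_0 q n}\, x[qn \bmod M]$.

```python
import cmath

def dft(x, sign):
    """Unitary M-point DFT (sign = -1) or IDFT (sign = +1), direct O(M^2) evaluation."""
    M = len(x)
    return [sum(x[p] * cmath.exp(sign * 2j * cmath.pi * k * p / M) for p in range(M))
            / M ** 0.5 for k in range(M)]

M, f1, f0 = 16, 5, 3
q = pow(f1, -1, M)                   # modular inverse of f1 (Python 3.8+)

# Arbitrary deterministic complex test vector standing in for modulation symbols.
x = [complex((3 * p) % 7 - 3, (5 * p) % 4 - 1.5) for p in range(M)]

X = dft(x, -1)                                   # DFT precoder output
Z = [X[(f1 * k + f0) % M] for k in range(M)]     # LPP interleaving
y = dft(Z, +1)                                   # equivalent precoder output y = G x

# Closed form: phase rotation combined with a permutation of the input.
y_closed = [cmath.exp(-2j * cmath.pi * f0 * q * n / M) * x[(q * n) % M]
            for n in range(M)]

max_err = max(abs(a - b) for a, b in zip(y, y_closed))
```

Under these assumptions, the LPP precoder could thus be applied directly in the time domain as an index permutation followed by a per-sample phase rotation, without forming G explicitly.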

D. INVOLUTORY PERMUTATION POLYNOMIALS
A permutation matrix is involutory if $P^2 = I$, i.e., it is its own inverse. This means that the inverse permutation polynomial is the same as the permutation polynomial, so the same interleaver can be used for transmission and reception, which reduces the implementation complexity. This requires that the self-inverse property holds and, inserting a QPP into (62), the following congruence equations determine the necessary conditions.
For example, it can be verified that $f[k] = 8k^2 + 63k$ is irreducible and fulfills (62) for M = 128. The results contained in Fig. 14 show that in most cases, a QPP which is its own inverse has the same maximum value V as any other QPP for a given M. It is also possible to construct LPPs which have the self-inverse property, and the conditions are given by (63)-(67) with $f_2 = 0$. For example, for even M such an LPP is $f_1 = 1$ and $f_0 = M/2$. It has been shown that P commutes with the DFT matrix [32]
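The self-inverse examples above can be checked by brute force. The following minimal sketch assumes that the self-inverse property means $\pi[\pi[k]] = k$ for all k, with $\pi[k] = f_2 k^2 + f_1 k + f_0 \bmod M$; the function name `is_involution` is hypothetical.

```python
def is_involution(f2, f1, f0, M):
    """True if pi[pi[k]] == k for every k, with pi[k] = f2*k^2 + f1*k + f0 (mod M)."""
    pi = [(f2 * k * k + f1 * k + f0) % M for k in range(M)]
    return all(pi[pi[k]] == k for k in range(M))

M = 128
qpp_self_inverse = is_involution(8, 63, 0, M)        # f[k] = 8k^2 + 63k
lpp_self_inverse = is_involution(0, 1, M // 2, M)    # f1 = 1, f0 = M/2
other_qpp = is_involution(8, 7, 4, 32)               # a QPP that is not self-inverse

print(qpp_self_inverse, lpp_self_inverse, other_qpp)
```

The third call shows that the property is not automatic: $f[k] = 8k^2 + 7k + 4$ is a permutation for M = 32 but maps 1 to 19 and 19 to 17.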

V. CONCLUSION
Frequency-domain interleaving could be straightforwardly introduced in existing 4G/5G systems utilizing DFT-s-OFDM, without significant changes to the transmit/receive chain implementation. The interleaver also does not impact the spectrum of the transmitted signal, e.g., the out-of-band emissions. The proposed novel criteria for constructing permutation polynomials result in improved BER/BLER on channels with high Doppler spread and can produce lower PAPR than DFT-s-OFDM. Notably, in some cases, e.g., BPSK with a QPP, both the BER/BLER and the PAPR can be improved simultaneously. Random interleaving provides much worse PAPR and has no obvious advantage in terms of performance or implementation complexity. With a QPP interleaver the basis functions have different shapes, e.g., different numbers of peaks, which suggests that modulation symbols carried on different basis functions may experience unequal reliability. We leave for further study the construction of permutation polynomials considering this aspect, as well as the use of cubic permutation polynomials. Moreover, the case of discontiguously located subcarriers, evaluation with advanced receivers and channel estimation could be considered.
where the substitution p − m = t has been utilized.

E. PROOF OF PROPERTY 5
For the first relation, where the variable substitution s − k = t has been utilized.

APPENDIX B
We assume that $N_{\mathrm{CP}} \geq \tau_{L-1}$ and the received signal in the frequency domain, after CP removal, can then be expressed in matrix form as where η is additive white Gaussian noise (AWGN) and H is the channel convolution matrix, which for a time-varying channel does not become circulant [33] and