Is FFT Fast Enough for Beyond-5G Communications?

In this paper, we study the impact of computational complexity on the throughput limits of the fast Fourier transform (FFT) algorithm for orthogonal frequency division multiplexing (OFDM) waveforms. Based on the spectro-computational complexity (SC) analysis, we verify that the complexity of an $N$-point FFT grows faster than the number of bits in the OFDM symbol. Thus, we show that FFT nullifies the OFDM throughput on $N$ unless the $N$-point discrete Fourier transform (DFT) problem verifies as $\Omega(N)$, which remains a "fascinating" open question in theoretical computer science. Also, because FFT demands $N$ to be a power of two $2^i$ ($i>0$), the spectrum widening leads to an exponential complexity on $i$, i.e., $O(2^i i)$. To overcome these limitations, we consider the alternative frequency-time transform formulation of vector OFDM (V-OFDM), in which an $N$-point FFT is replaced by $N/L$ ($L>0$) smaller $L$-point FFTs to mitigate the cyclic prefix overhead of OFDM. Building on that, we replace FFT by the straightforward DFT algorithm to release the V-OFDM parameters from growing as powers of two and to benefit from flexible numerology (e.g., $L=3$, $N=156$). Besides, by setting $L$ to $\Theta(1)$, the resulting solution can run linearly on $N$ (rather than exponentially on $i$) while sustaining a non-null throughput as $N$ grows.


I. INTRODUCTION
THE fast Fourier transform (FFT) algorithm [1] is among the top-ten most relevant algorithms of the 20th century [2]. FFT outperforms the $O(N^2)$ straightforward discrete Fourier transform (DFT) algorithm by performing an $N$-point frequency-time transform in $O(N\log_2 N)$ time complexity¹. Particularly for signal communication processing, FFT revolutionized the design of an $N$-subcarrier OFDM signal by replacing a bank of $N$ synchronized analog oscillators by a single digital chip that requires a single oscillator. Ever since, FFT has been employed as the frequency/time transform algorithm by several multicarrier and single-carrier waveforms [3].

¹ As in computational complexity theory, by "time" or "time complexity" we mean "number of computational instructions" unless otherwise stated. The term is interchangeable with wall-clock runtime, provided the wall-clock time taken by each instruction on a particular computational apparatus is known.

A. MOTIVATION AND PROBLEM STATEMENT
In recent discussions [4], [5], scholars have doubted the ability of FFT to modulate signals in the future sixth generation (6G) of wireless networks. They point out that 6G waveforms are expected to raise data rates to the order of terabits per second (Tbit/s) to improve the mobile broadband service of 5G. This envisions signals operating in the so-called terahertz (THz) frequency band of the electromagnetic spectrum, i.e., 0.1-10 $\times 10^{12}$ Hz [6]. To alleviate the power consumption implied by the FFT complexity under wide signals, Rappaport et al. [5] suggest giving up the "perfect fidelity" of the DFT computation in favor of (slightly) more error-prone approximation algorithms in portable devices [4]. In other words, the throughput gains envisioned by extremely wideband services of future wireless networks can lead the computational complexity of FFT to prohibitive levels in some practical scenarios. This results in a natural trade-off between throughput and complexity that might concern the design of beyond-5G wireless networks.
In this work, we consider the throughput-complexity trade-off of FFT in the context of OFDM-based waveforms and reason about the throughput limit of a DFT algorithm considering its computational complexity as the number of points grows. In summary, we pose the following questions: can the FFT complexity impose a bottleneck that nullifies the OFDM throughput as $N$ grows? Besides, what lower bound on the asymptotic complexity of the DFT problem is required to sustain a non-null throughput in OFDM?

B. CONTRIBUTIONS
We study the impact of computational complexity on the throughput limits of different DFT algorithms (such as FFT) in the context of OFDM-based waveforms. The spectro-computational (SC) analysis [7], [8], [9] is employed to calculate the SC throughput of different DFT algorithms. The SC throughput $SC(N) = B(N)/T(N)$ of a signal processing algorithm is the ratio between the $B(N)$ bits modulated into an $N$-subcarrier symbol and the computational complexity (time) $T(N)$ spent to modulate them. In the SC analysis, a signal processing algorithm is asymptotically scalable if its throughput does not nullify as the spectrum grows, i.e., $\lim_{N\to\infty} SC(N) > 0$. Our contributions can be classified into two categories. First, we report novel asymptotic limits relating complexity and throughput of FFT in the context of OFDM signals. Prior works have considered the impact of asymptotic complexity on aspects other than throughput, such as DFT silicon area [10], [11] or information loss of computation [12]. Although complexity and throughput have been widely recognized as key performance indicators for future wireless networks [6], to the best of our knowledge, a formal answer to our question is still lacking in the literature. In summary, we demonstrate the following novel asymptotic laws for FFT in OFDM:
• The throughput of FFT nullifies on $N$ in OFDM. Besides, considering that FFT imposes the number of points to grow as a power of two $N = 2^i$ ($i > 0$), the spectrum widening causes FFT to run exponentially on $i$, i.e., $O(2^i i)$;
• No exact DFT algorithm scales throughput on $N$ (given a constellation size $M$) unless the asymptotic complexity lower bound of the DFT problem verifies as $\Omega(N)$. Currently, this DFT lower bound remains an open "fascinating" question in the field of computational complexity [13];
• We formalize what we refer to as the sampling-complexity (or the Nyquist-Fourier) trade-off. This trade-off accounts for the fact that the DFT complexity increases as the Nyquist interval decreases, causing the $N$-point DFT computation to become a bottleneck for the sampling task. Considering OFDM symbols of fixed duration, this trade-off cannot be solved since it demands a lower bound of $\Omega(1)$ for the DFT problem.
In our second set of contributions, we consider alternative forms of frequency-time transform computation under which the resulting complexity $T(N)$ meets the fundamental criterion of the SC analysis, $\lim_{N\to\infty} SC(N) > 0$, i.e., complexity does not nullify throughput as $N$ grows. We disclose how to meet this criterion for vector OFDM (V-OFDM) [14], a variant of OFDM that replaces an $N$-point FFT by $N/L$ ($L > 0$) smaller FFTs to mitigate the cyclic prefix overhead of OFDM. Our contribution results from the fact that other V-OFDM-based works, e.g., [15], [16], [17], [18], focus on aspects other than the throughput-complexity trade-off for the DFT problem. In this sense, we report the following contributions:
• We present the SC analysis of the frequency-time transform problem in V-OFDM. In this context, we replace FFT by DFT to relax the power of two constraint on $N$ and to provide V-OFDM with flexible numerology (e.g., $L = 3$, $N = 156$). Besides, we apply the parameterized complexity technique [19] to the DFT algorithm, getting what we refer to as the parameterized DFT (PDFT) algorithm. By setting $L = \Theta(1)$, PDFT can run linearly on $N$ rather than exponentially on $i$ while sustaining a non-null throughput as $N$ grows;
• We identify the most efficient setup of V-OFDM to mitigate the sampling-complexity trade-off. By setting $L = 2$, PDFT becomes multiplierless, requiring only $O(N)$ complex sums. Although this does not solve the sampling-complexity trade-off, the most expensive computational instruction of DFT is eliminated and the additions can be performed in parallel.
The remainder of this work is organized as follows. In Section II, we present a joint throughput-complexity analysis of the DFT problem and the FFT algorithm. We also enunciate the sampling-complexity (Nyquist-Fourier) trade-off, based on which we calculate the minimum asymptotic complexity required for a DFT algorithm to meet the sampling interval of digital-to-analog/analog-to-digital (DAC/ADC) converters in OFDM-based waveforms. In Section III, we present the PDFT algorithm. In Section IV, we present a performance comparison between FFT and the PDFT algorithm and validate our theoretical results. In Section V, we summarize the work.

II. SPECTRO-COMPUTATIONAL ASYMPTOTIC ANALYSIS
In this section, we study the joint capacity-complexity asymptotic limit of the DFT problem by means of the SC analysis (Subsection II-A). Then, in Subsection II-B, we specialize the analysis to the FFT algorithm to answer whether it is sufficiently fast to process signals of increasing throughput. Finally, in Subsection II-C, we relate the DFT complexity to the Nyquist sampling interval and introduce what we refer to as the sampling-complexity (Nyquist-Fourier) trade-off. The notation and symbols used throughout the paper are summarized in Table 1.

A. CAPACITY-COMPLEXITY LIMITS OF THE DFT COMPUTATION
The IDFT at an OFDM transmitter consists in computing the complex discrete time samples $Y_t$, $t = 0, 1, \cdots, N-1$, of a symbol given the complex samples $X_k$ that modulate the baseband frequencies $k = 0, 1, \cdots, N-1$. According to Fourier analysis, this relationship is given by
$$Y_t = \frac{1}{N}\sum_{k=0}^{N-1} X_k e^{j2\pi kt/N}, \quad t = 0, 1, \cdots, N-1.$$
At the receiver, a DFT algorithm takes the signal back from the time to the frequency domain by performing
$$X_k = \sum_{t=0}^{N-1} Y_t e^{-j2\pi kt/N}, \quad k = 0, 1, \cdots, N-1.$$
Since in each transform both $k$ and $t$ vary from $0$ to $N-1$, it is easy to see that the resulting asymptotic complexity $T_{DFT}(N)$ is $O(N^2)$. The FFT algorithm improves this complexity to $O(N\log_2 N)$ under the constraint $N = 2^i$, for some $i > 0$. For this reason, the number of FFT points (hence, channel width) at least doubles across novel wireless network standards targeting faster data rates, e.g., IEEE 802.11ax [20]. For more details about the theory of DFT and FFT, please refer to [21].
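The two transforms above can be sketched directly in code. The following Python snippet is a minimal illustration, not the implementation evaluated in this paper: the function names `dft` and `idft`, the multiplication counter and the $1/N$ scaling of the inverse transform are assumptions made for the example.

```python
import cmath


def dft(y):
    """Direct DFT: X_k = sum_t Y_t * exp(-j*2*pi*k*t/N). Returns (X, multiplications)."""
    n = len(y)
    mults = 0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += y[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            mults += 1  # one complex multiplication per (k, t) pair
        out.append(acc)
    return out, mults


def idft(x):
    """Direct inverse DFT; the 1/N scaling is a convention assumed for this sketch."""
    n = len(x)
    out = []
    for t in range(n):
        acc = sum(x[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
        out.append(acc / n)
    return out


# Six BPSK-mapped subcarriers: the direct algorithm does not need N to be a power of two.
symbol = [1, -1, 1, 1, -1, 1]
samples = idft(symbol)            # transmitter side
recovered, mults = dft(samples)   # receiver side
print(mults)                      # N^2 = 36 complex multiplications
print([round(v.real) for v in recovered])
```

The nested loops over `k` and `t` make the $O(N^2)$ cost explicit: the counter always ends at $N^2$.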
The SC analysis proposed in [7], [8] defines the SC throughput $SC(N)$ (bits/time) of an $N$-subcarrier signal processing algorithm as the ratio between the amount of useful transmission bits $B(N)$ carried by the symbol and the overall computational complexity $T(N)$ required to build the symbol. For a constellation diagram of size $M = 2^p$ (for some $p > 0$), each subcarrier modulates $\log_2 M$ bits. Thus, in OFDM, the DFT is performed on a symbol that carries a total of $B(N) = N\log_2 M$ useful bits. As usual in the analysis of algorithms, the complexity accounts for the most recurrent and expensive computational instruction. Thus, without loss of generality, let $T_{DFT}(N)$ now denote the asymptotic number of complex multiplications performed by a given DFT algorithm. Let us also denote by $T_{MULT}(d)$ the computational complexity of a single complex multiplication between two $d$-bit complex numbers. For OFDM, $d = \log_2 M$, so the SC throughput of a DFT algorithm in bits per computational time is
$$SC(N) = \frac{N\log_2 M}{T_{DFT}(N)\,T_{MULT}(\log_2 M)}. \tag{1}$$
We assume that the channel SNR does not grow arbitrarily on $N$, meaning that the number of points in the constellation diagram is bounded by a constant, i.e., $M = \Theta(1)$. Hence, $N$ is the unique variable of our asymptotic analysis, i.e., $N \to \infty$². Thus, there exist constants $d > 0$ and $c > 0$ such that the SC throughput in (1) rewrites as
$$SC(N) = \frac{dN}{c\,T_{DFT}(N)}.$$
Now, proceeding with the asymptotic analysis on $N$ and assuming the implied limit exists, all constants can be neglected and the following asymptotic SC throughput results:
$$\lim_{N\to\infty} SC(N) = \lim_{N\to\infty} \frac{N}{T_{DFT}(N)}.$$
As $N$ grows, both the number of bits of the OFDM symbol and the DFT complexity to assemble it grow accordingly. The condition for the scalability of a DFT algorithm as $N$ grows is given in Def. 1.

Definition 1. A DFT algorithm is asymptotically scalable if its SC throughput does not nullify as $N$ grows, i.e., $\lim_{N\to\infty} SC(N) > 0$.

Lemma 1. Under $M = \Theta(1)$, a DFT algorithm sustains a non-null SC throughput as $N$ grows only if $T_{DFT}(N) = \Theta(N)$.

Proof. Let $M$ be the size of the largest constellation diagram at which the bit error rate becomes negligible. Assuming the channel SNR does not grow arbitrarily, $M$ is bounded by a constant (i.e., $M = \Theta(1)$), and so is the number of bits $d = \log_2 M$ per subcarrier. Thus, the computational complexity to multiply two $d$-bit complex constellation points is bounded accordingly, resulting in a constant $c$. Therefore, the complexity $T_{DFT}(N)$ required to process $Nd$ bits while ensuring that the throughput of the DFT algorithm does not nullify as $N$ grows (i.e., remains greater than or equal to a non-null constant $k$) is given by
$$\lim_{N\to\infty} \frac{dN}{c\,T_{DFT}(N)} \ge k > 0,$$
$$T_{DFT}(N) \le \frac{dN}{ck} = O(N). \tag{5}$$
Considering the $O(N)$ upper bound of Ineq. 5 along with the fact that no DFT algorithm of $O(N)$ storage complexity can run in fewer than $\Omega(N)$ steps [13] (assuming that at least $N$ computational instructions are needed to read the input), a non-null throughput DFT algorithm must run in $\Theta(N)$ time complexity.

² Note that this technicality of the asymptotic analysis does not mean the signal bandwidth is unlimited. Instead, it will enable us to verify whether the SC throughput nullifies on $N$ through a benefit-cost ratio analysis.
If one relaxes the assumption $M = \Theta(1)$ by considering that $M$ can grow faster than or as fast as $N$ (i.e., channels of unbounded SNR), the required upper bound on the complexity $T_{DFT}(N)$ can be calculated from Eq. 1 by considering either $M = \Omega(N)$ or $M = \Theta(N)$, respectively. In this case, the overall asymptotic complexity (and so the algorithm throughput) also depends on the multiplication algorithm, whose complexity depends on the number of bits per subcarrier $d = \log_2 M$. Considering, as an example, $N = \Theta(M)$ and the $O(d^{\log_2 3})$ complexity of the Karatsuba multiplication algorithm [22], the DFT complexity upper bound would be nearly $O(N/\log_2^{0.585} N)$.
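To make the benefit-cost ratio of this subsection concrete, the sketch below evaluates $SC(N) = B(N)/T(N)$ for three complexity models under $M = \Theta(1)$. It is only an illustration: the function name `sc_throughput`, the unit cost per multiplication and the chosen values of $N$ are assumptions of the example, and the result is expressed in bits per multiplication rather than bits per second.

```python
import math


def sc_throughput(n, complexity, bits_per_subcarrier=1, mult_cost=1.0):
    """SC(N) = B(N) / T(N), with B(N) = N*d bits and T(N) = mult_cost * complexity(N)."""
    return n * bits_per_subcarrier / (mult_cost * complexity(n))


# Number of complex multiplications assumed for each algorithm (constants dropped).
models = {
    "direct DFT, N^2": lambda n: n * n,
    "FFT, N log2 N": lambda n: n * math.log2(n),
    "linear, N": lambda n: n,
}

for i in (6, 9, 12, 20):
    n = 2 ** i
    row = {name: sc_throughput(n, model) for name, model in models.items()}
    print(n, row)
```

Only the linear model keeps the ratio constant as $N$ grows; for the other two models the ratio shrinks toward zero, which is the nullification discussed above.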

B. SPECTRO-COMPUTATIONAL ANALYSIS OF THE FFT ALGORITHM
The FFT algorithm [1] outperforms the $O(N^2)$ straightforward DFT algorithm by running in $O(N\log_2 N)$ time complexity. FFT performs $O(N)$ computational instructions to decrease an $N$-point DFT problem into two $N/2$-point DFTs per iteration (or recursive call). This is possible by noting that the frequency samples $X_k$ and $X_{k+N/2}$ ($k = 0, 1, \cdots, N/2-1$) can be computed from the same two $N/2$-point DFTs:
$$E_k = \sum_{m=0}^{N/2-1} Y_{2m}\, e^{-j2\pi km/(N/2)}, \tag{6}$$
$$O_k = \sum_{m=0}^{N/2-1} Y_{2m+1}\, e^{-j2\pi km/(N/2)}. \tag{7}$$
In other words, $E_k$ (Eq. 6) and $O_k$ (Eq. 7) are the $N/2$-point DFTs taken from the even-indexed and odd-indexed time samples of the $N$-point input array, respectively. Based on them, the Danielson-Lanczos lemma shows that
$$X_k = E_k + e^{-j2\pi k/N} O_k, \qquad X_{k+N/2} = E_k - e^{-j2\pi k/N} O_k.$$
This way, $N/2$ iterations are necessary to compute $X_k$ and $X_{k+N/2}$, yielding a total of $O(N)$ computations. Each of these iterations needs to solve both the $N/2$-point DFTs $E_k$ and $O_k$. Denoting by $T_{DFT}(N)$ the complexity of an $N$-point FFT and applying the Danielson-Lanczos lemma recursively, the overall complexity is given by the recurrence relation
$$T_{DFT}(N) = 2\,T_{DFT}(N/2) + O(N) = O(N\log_2 N).$$
Corollary 1 (Asymptotic Null FFT Throughput). The spectro-computational throughput of the FFT algorithm does nullify as $N$ grows.
Proof. From Lemma 1, with $T_{DFT}(N) = N\log_2 N$ and the constants $d$ and $c$ neglected, the FFT throughput follows as
$$\lim_{N\to\infty} SC(N) = \lim_{N\to\infty} \frac{N}{N\log_2 N} = \lim_{N\to\infty} \frac{1}{\log_2 N} = 0. \tag{11}$$
If the SNR can get arbitrarily large such that the constellation size $M$ grows on $N$, then $d = \Omega(\log_2 N)$. In this case, the complexity $c$ to multiply two $d$-bit numbers grows at least linearly on $d$. Thus, since the fastest multiplication algorithm implies $c = \Theta(d)$, the asymptotic throughput of FFT is given by Eq. 11 at best. Therefore, the FFT throughput nullifies as $N$ grows.
Fig. 1 illustrates the asymptotic growth of the FFT throughput for different subcarrier signal mappers assuming a total of $T_{DFT}(N) = N\log_2 N$ complex multiplications. Without loss of generality for the asymptotic analysis, it is assumed that each complex multiplication takes a constant $c_t$ of 1 picosecond, yielding a total runtime of $N\log_2 N/10^{12}$ seconds. Note that hardware improvements (e.g., pipelined FFT hardware) translate into a lower $c_t$, i.e., faster execution of a computational instruction. However, the overall number of instructions remains $N\log_2 N$, meaning that better hardware can improve wall-clock runtime but cannot decrease the complexity of an algorithm. Hence, in fast pipelined FFT hardware the complexity penalizes performance indicators other than wall-clock runtime, such as portability, manufacturing cost and power consumption. Moreover, for all constellations, widening the symbol spectrum by increasing the number of subcarriers causes the FFT throughput to decrease rather than increase. This happens because complexity grows faster than the number of modulated bits in FFT. To overcome this bottleneck, the processing capability of the FFT hardware should scale on $N$. We believe that the SC analysis of FFT (as illustrated in Fig. 1) formally endorses the issues conjectured by prior works about the infeasibility of FFT for some scenarios of future wireless networks. In this sense, the FFT throughput nullification implied by complexity reflects the prohibitive power consumption FFT may experience under "massive channel bandwidths".
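The Danielson-Lanczos recursion described in this subsection can be sketched as follows. This is a textbook radix-2 decimation-in-time formulation written for illustration, not the FFT implementation assessed later in the paper; the names `fft` and `direct_dft` and the counting of one multiplication per twiddle factor are assumptions of the example.

```python
import cmath


def fft(y):
    """Recursive radix-2 FFT; len(y) must be a power of two. Returns (X, multiplications)."""
    n = len(y)
    if n == 1:
        return list(y), 0
    even, m_even = fft(y[0::2])  # E_k: N/2-point DFT of the even-indexed samples
    odd, m_odd = fft(y[1::2])    # O_k: N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    mults = m_even + m_odd
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        mults += 1               # one twiddle multiplication serves X_k and X_{k+N/2}
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out, mults


def direct_dft(y):
    """O(N^2) reference used only to check the recursive result."""
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


y = [complex(t % 3, -t) for t in range(16)]
X, mults = fft(y)
max_error = max(abs(a - b) for a, b in zip(X, direct_dft(y)))
print(mults)              # (N/2) * log2(N) = 32 twiddle multiplications for N = 16
print(max_error < 1e-9)   # the recursion agrees with the direct DFT
```

The counter obeys $T(N) = 2T(N/2) + N/2$, which is an instance of the recurrence above and grows as $\Theta(N\log_2 N)$.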

C. THE SAMPLING-COMPLEXITY NYQUIST-FOURIER TRADE-OFF
DFT algorithms face two particular issues in the context of multicarrier waveforms such as OFDM. The first comes from a mismatch between the unit of processing of DFT algorithms and that of the other algorithms along the processing block diagram. Although blocks such as "signal mapping" and "cyclic prefix insertion" process a total of $N$ signal samples, they can process them on a sample-by-sample basis. Thus, the processing of a particular sample does not depend on the value of other samples in those blocks.
By contrast, DFT algorithms cannot start running before all $N$ samples are loaded in the input. Hence, the unit of processing of DFT algorithms is $N$ times larger than that of their preceding and succeeding processing blocks. As $N$ grows, this mismatch turns the DFT algorithm into a bottleneck along the OFDM block diagram. This problem has been described by the digital radio design literature as a runtime deadline to be met by signal processing algorithms [23], [24], [25], [26], [27]. By formalizing the problem as an asymptotic trade-off between sampling and computational overhead, we can calculate the asymptotic complexity required to meet the sampling interval.
Second, DFT algorithms are responsible for feeding the DAC in a classic OFDM transmitter. To avoid signal aliasing at the receiver, the transmitter must sample the time-domain signals produced by the IDFT algorithm within a specific time interval. This interval is calculated from the Nyquist sampling theorem, which states that the largest time interval between two equally spaced (time-domain) samples of a signal band-limited to $W$ Hz must be $T_{NYQ} = 1/(2W)$ seconds. In the case of complex IQ modulators, where the real and imaginary dimensions of the signal are independently and simultaneously sampled by two parallel samplers,
$$T_{NYQ} = \frac{1}{W} \text{ seconds}.$$
In IQ systems, at least $W$ samples must feed the DAC every second (which is known as the Nyquist sampling rate); otherwise, the signal can suffer from aliasing, thereby preventing its correct identification at the receiver. For an inter-subcarrier space of $\Delta f$ Hz, the width of an $N$-subcarrier OFDM signal is $W_{OFDM} = N\Delta f$, so a complex time-domain OFDM sample must feed the DAC every
$$T_{NYQ} = \frac{1}{N\Delta f} \text{ seconds}. \tag{12}$$
Based on Eq. 12, we relate the asymptotic complexity of DFT algorithms to the Nyquist interval. As a result, we introduce the sampling-complexity (Nyquist-Fourier) trade-off in Def. 2.
Definition 2 (The Sampling-Complexity Nyquist-Fourier Trade-Off). In OFDM radios with $\Delta f$ Hz of inter-subcarrier space, the $N$-point DFT computational complexity $T_{DFT}(N)$ increases as the Nyquist period $1/(N\Delta f)$ decreases to improve symbol throughput.
The sequence of discrete time samples output by the IDFT algorithm corresponds to the time-domain version of the OFDM symbol, which lasts $T_{SYM} = 1/\Delta f$ seconds. In the design of a real-time OFDM radio, the entire digital signal processing must take no more than $T_{SYM}$; otherwise, the system either suffers from sample losses or misses the real-time communication capability [23], [24], [25], [26], [27]. We capture this condition in terms of asymptotic complexity in Lemma 2.

Lemma 2. For OFDM symbols whose duration does not grow on $N$, a DFT algorithm meets the Nyquist interval as $N$ grows only if $T_{DFT}(N) = O(1)$.

Proof. Considering that a DFT algorithm is the asymptotically most complex procedure of the basic OFDM waveform, its complexity must satisfy
$$T_{DFT}(N) \le T_{SYM} = \frac{1}{\Delta f}.$$
Assuming that the throughput improvement is achieved by enlarging $N$ and that the symbol duration does not grow on $N$ (e.g., IEEE 802.11ac [28]), it follows that
$$T_{DFT}(N) = O(1).$$
Note that one can relax the complexity bound predicted by Lemma 2 if the digital baseband processing capabilities of the radio can grow arbitrarily on the number of subcarriers. However, with the end of the so-called "Moore's law" [29], higher processing capability translates into higher manufacturing cost, power consumption and hardware area, casting doubt on the feasibility of portable multicarrier terahertz radios.
Corollary 2 follows from Lemma 2.
Corollary 2 (Unfeasible Nyquist-Constrained DFT). Given that the minimum possible lower bound complexity of the DFT problem is $\Omega(N)$ [13] and the Nyquist interval imposes an upper bound of $O(1)$ (Lemma 2), no DFT algorithm can meet the Nyquist interval as $N$ grows.
To face the result of Corollary 2, one may relax the Nyquist constraint, which results in compressive sensing systems [30]. However, high-accuracy signal frequency prediction in such systems has been proved to be an NP-hard problem [31], which leads to much more complex systems because only exponential-time algorithms are known for that class of problems.
Note that the sampling-complexity trade-off is not restricted to multicarrier waveforms such as OFDM and its variants; it also applies to single-carrier signals that rely on DFT to mitigate the peak-to-average power ratio of uplink transmissions in wireless cellular networks [32]. Of course, the trade-off is more critical for waveforms designed for broadband traffic services that target a wider spectrum.
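The trade-off of this subsection can be illustrated numerically. The sketch below compares the fixed symbol duration $1/\Delta f$ with the runtime of $N\log_2 N$ multiplications as $N$ doubles. The subcarrier spacing of 312.5 kHz, the 1 ps cost per multiplication and all function names are assumptions chosen for the example, not parameters prescribed by the analysis.

```python
import math


def nyquist_interval(n, delta_f):
    """Nyquist interval of an N-subcarrier, IQ-sampled OFDM signal: 1/(N*delta_f) seconds."""
    return 1.0 / (n * delta_f)


def symbol_duration(delta_f):
    """OFDM symbol duration 1/delta_f seconds; it does not depend on N."""
    return 1.0 / delta_f


def fft_runtime(n, seconds_per_mult=1e-12):
    """Runtime of N*log2(N) multiplications at an assumed constant cost per multiplication."""
    return n * math.log2(n) * seconds_per_mult


def largest_feasible_exponent(delta_f, seconds_per_mult=1e-12, max_i=40):
    """Largest i such that an FFT of N = 2^i points still fits within one symbol duration."""
    best = None
    for i in range(1, max_i + 1):
        if fft_runtime(2 ** i, seconds_per_mult) <= symbol_duration(delta_f):
            best = i
    return best


DELTA_F = 312.5e3  # Hz, illustrative subcarrier spacing (assumption)
for i in (6, 12, 17, 18):
    n = 2 ** i
    print(n, nyquist_interval(n, DELTA_F), fft_runtime(n), symbol_duration(DELTA_F))

print(largest_feasible_exponent(DELTA_F))  # 17 under the assumptions above
```

Under these assumptions the deadline is met up to $N = 2^{17}$ and missed from $N = 2^{18}$ onwards. A faster multiplier only shifts this threshold; it does not remove it, because the budget stays constant while the instruction count keeps growing.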

III. PUSHING THE CAPACITY-COMPLEXITY LIMITS OF DFT
In this section, we consider methods to overcome the throughput bottleneck faced by $N$-point DFT algorithms such as FFT (Section II) and discuss a solution to mitigate the sampling-complexity trade-off described in Subsection II-C.

A. PARAMETERIZED COMPLEXITY
To mitigate the Nyquist-Fourier trade-off in practice, we apply an algorithm design technique inspired by parameterized complexity [19]. Parameterized complexity was originally proposed to enable the polynomial-time solution of multi-parameter NP-complete problems. The idea consists in bounding one or more parameters of the problem such that the complexity of the solution becomes a polynomial function of the non-bounded parameters. For a comprehensive study of parameterized complexity, please refer to [19].
We consider an alternative parameterized formulation of the frequency-time transform problem in order to achieve faster-than-FFT computations. In a typical OFDM transmitter, the IDFT operation associates $N$ input frequency samples $X_k$ ($k = 0, \cdots, N-1$) to $N$ respective baseband frequencies $k$ Hz at the time instant $t$ by the complex multiplication $X_k e^{j2\pi kt/N}$. The direct IDFT algorithm repeats these $N$ multiplications to compute $N$ time samples, which yields a total of $O(N^2)$ operations. To cut this complexity, we parameterize the number $g \le N$ of frequency samples associated to a given baseband frequency.
We identify that the waveform resulting from the parameterized DFT computation we have just described is not new. Indeed, it exactly matches vector OFDM (V-OFDM), a waveform originally proposed to reduce the cyclic prefix overhead of OFDM [14]. Prior works have investigated V-OFDM with respect to different aspects. Cheng et al. [17] study the BER performance in Rayleigh channels and Li et al. [18] identify setups in which the V-OFDM BER performs similarly to or better than OFDM for different low-complexity receivers. More recently, V-OFDM has been merged with other signal processing techniques such as index modulation [16] and MIMO [15].
Our work builds on those prior works to present novel results for V-OFDM. In particular, we rely on V-OFDM as an alternative to avoid the throughput nullification faced by FFT in OFDM. Also, we exploit the vector block (VB) structure to relax the power of two constraint of FFT without giving up a fast asymptotic complexity. By releasing all V-OFDM parameters from growing as powers of two, more flexible numerologies can be enabled (e.g., $n = 3$, $N = 156$). In Subsections III-B and III-C, we review the V-OFDM signal and discuss how to relax the $N = 2^i$ constraint of V-OFDM while keeping a complexity that does not nullify throughput on $N$, respectively.

B. THE VECTOR OFDM SIGNAL
The V-OFDM transmitter arranges the $N$-sample complex frequency domain symbol $\{X_i\}_{i=0}^{N-1}$ into $L$ complex vector blocks (VBs) $\mathbf{x}_l$ ($l = 0, 1, \cdots, L-1$) having $M = N/L$ samples each. Denoting by $[\cdot]^T$ the transpose of the matrix $[\cdot]$, the samples of $\{X_i\}_{i=0}^{N-1}$ within the $l$-th VB $\mathbf{x}_l$ are given by
$$\mathbf{x}_l = [X_{lM}, X_{lM+1}, \cdots, X_{lM+M-1}]^T.$$
The sequence of complex frequency domain samples is
$$[\mathbf{x}_0^T, \mathbf{x}_1^T, \cdots, \mathbf{x}_{L-1}^T].$$
The $q$-th time domain VB ($q = 0, 1, \cdots, L-1$) is denoted as
$$\mathbf{y}_q = [Y_{qM}, Y_{qM+1}, \cdots, Y_{qM+M-1}]^T.$$
The V-OFDM literature [15], [16], [17], [18] performs $M$ inverse $L$-point FFTs to calculate the time domain VBs. Since this contrasts with the single $N$-point FFT of OFDM, we refer to it as the Parameterized FFT (PFFT). The $m$-th sample ($m = 0, 1, \cdots, M-1$) within the $q$-th time domain VB is therefore
$$Y_{qM+m} = \frac{1}{\sqrt{L}}\sum_{l=0}^{L-1} X_{lM+m}\, e^{j2\pi ql/L}.$$
The time domain transmitting sequence is
$$[\mathbf{y}_0^T, \mathbf{y}_1^T, \cdots, \mathbf{y}_{L-1}^T].$$
Both the normalized inverse DFT and DFT signals are respectively summarized as follows:
$$\mathbf{y}_q = \frac{1}{\sqrt{L}}\sum_{l=0}^{L-1} \mathbf{x}_l\, e^{j2\pi ql/L}, \quad q = 0, 1, \cdots, L-1, \tag{21}$$
$$\mathbf{x}_l = \frac{1}{\sqrt{L}}\sum_{q=0}^{L-1} \mathbf{y}_q\, e^{-j2\pi ql/L}, \quad l = 0, 1, \cdots, L-1. \tag{22}$$
After the inverse DFT, the signal follows the same steps as in the classic OFDM waveform for transmission. At the receiver, the signal undergoes the reverse steps, except for the detection processing, whose complexity can grow exponentially on the VB size (e.g., maximum likelihood estimation). While the conditions for low-complexity detection have been discussed by the V-OFDM literature, e.g., [15], [30], in this work we focus on the complexity of the IDFT/DFT problem in V-OFDM regardless of the chosen detection heuristic. In what follows, we adopt the notation $LM$ (instead of $ng$) that is usual across the V-OFDM literature.

Algorithm 1: The parameterized (inverse) DFT (PDFT) algorithm for the vector OFDM waveform.

By relying on the DFT algorithm and the parameterization technique, PDFT relaxes the $N = 2^i$ constraint of FFT (thereby enabling a wider range of numerologies) and can run linearly on $N$ rather than exponentially on $i$. If $N = ML$ is set to grow as a power of two $2^i$, setting $L$ to $\Theta(1)$ leads both FFT and PDFT to run in $O(M) = O(2^i/L)$ time complexity. However, if that constraint is relaxed, PDFT can provide V-OFDM with flexible numerology while running linearly on $N$ rather than exponentially on $i$. The flexible numerology of PDFT makes V-OFDM a competitive waveform for spectrum allocation in fragmented frequency bands. Besides, the reduced complexity is a step towards the enhancement of current broadband-driven services such as the enhanced mobile broadband service of 5G [33] and the very high throughput service of IEEE 802.11ac [28].
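A parameterized transform of this kind can be sketched in a few lines. The code below is an illustrative reading of the V-OFDM transform as $M = N/L$ independent direct $L$-point (I)DFTs; it is not a transcription of Algorithm 1, and the function names, the block layout `X[l*M + m]` and the $1/\sqrt{L}$ scaling are assumptions of the example.

```python
import cmath
import math


def _pdft(samples, L, sign):
    """M = N/L independent direct L-point transforms. Returns (output, multiplications)."""
    N = len(samples)
    if L <= 0 or N % L != 0:
        raise ValueError("L must be a positive divisor of N")
    M = N // L
    scale = 1.0 / math.sqrt(L)  # normalization, not counted as a transform multiplication
    out = [0j] * N
    mults = 0
    for m in range(M):          # one independent L-point transform per position m
        for q in range(L):
            acc = 0j
            for l in range(L):
                acc += samples[l * M + m] * cmath.exp(sign * 2j * cmath.pi * q * l / L)
                mults += 1
            out[q * M + m] = acc * scale
    return out, mults


def pdft_inverse(X, L):
    """Frequency-domain vector blocks to time-domain vector blocks."""
    return _pdft(X, L, +1)


def pdft_forward(Y, L):
    """Time-domain vector blocks back to frequency-domain vector blocks."""
    return _pdft(Y, L, -1)


N, L = 156, 3                   # a numerology that is not a power of two
X = [complex((-1) ** i, 0) for i in range(N)]
Y, mults = pdft_inverse(X, L)
X_back, _ = pdft_forward(Y, L)
round_trip_error = max(abs(a - b) for a, b in zip(X, X_back))
print(mults)                    # N * L = 468, versus N^2 = 24336 for a direct N-point DFT
print(round_trip_error < 1e-9)
```

The counter equals $N L$, so keeping $L = \Theta(1)$ makes the number of multiplications linear in $N$ for any $N$ divisible by $L$.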

D. MULTIPLIERLESS PARAMETERIZED DFT AND THE SAMPLING-COMPLEXITY TRADE-OFF
We identify that the specific case $L = 2$ can have notable implications for the lower bound complexity of the frequency-time transform problem in V-OFDM. As explained in Subsection II-C, a lower bound complexity of $\Omega(1)$ is required if the frequency-time transform computation of an $N$-point signal is constrained by the Nyquist sampling theorem. This is a typical requisite of real-time implementations of physical layer standards such as 5G [24] and Wi-Fi [23], [25], [26], [27]. Next, we explain how the $L = 2$ case of V-OFDM relates to the sampling-complexity trade-off.
By setting $L$ to 2, the $N$-subcarrier V-OFDM symbol is vectorized into only two VBs, leading to $N/2$ 2-point DFTs. Since these 2-point DFTs are completely independent from each other, they can be computed in parallel. Each 2-point transform takes $O(1)$ time complexity regardless of the value of $N$; therefore, the entire solution requires $N$ complex additions (two per 2-point transform). Indeed, both the indexes $l$ and $q$ that iterate across the frequency and time VBs (lines 10 and 8 in Algorithm 1, respectively) vary from 0 to 1, causing the complex exponential to simplify to either $1$ or $-1$. Up to the normalization constant, the two resulting time domain VBs are $\mathbf{y}_0 = \mathbf{x}_0 + \mathbf{x}_1$ and $\mathbf{y}_1 = \mathbf{x}_0 - \mathbf{x}_1$. Therefore, from the perspective of an analysis that considers complex multiplications as the asymptotically dominant instruction of the DFT problem, the $L = 2$ case satisfies the $\Omega(1)$ lower bound requisite of the sampling-complexity trade-off. By contrast, if all instructions are considered, the solution does not meet that requisite. However, the $O(N)$ complex additions are easier to implement in practice and can remove the DFT bottleneck by being performed in parallel. Note also that the case $L = 1$ of V-OFDM dispenses with the DFT computation at the transmitter but requires an extra $N$-point IDFT at the receiver. In turn, the case $L = 2$ is multiplierless at both the transmitter and the receiver.
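The multiplierless case can be sketched as a pair of element-wise sums and differences. The snippet below is a minimal illustration under two assumptions: the function name `pdft_l2` is hypothetical, and the normalization constant is left out so that no multiplication appears at all.

```python
def pdft_l2(X):
    """L = 2 parameterized transform: N/2 independent 2-point butterflies, no multiplications.

    X holds the two vector blocks back to back: x_0 = X[:M] and x_1 = X[M:], with M = N/2.
    The output holds y_0 = x_0 + x_1 followed by y_1 = x_0 - x_1 (normalization omitted).
    """
    n = len(X)
    if n % 2 != 0:
        raise ValueError("N must be even when L = 2")
    m = n // 2
    sums = [X[i] + X[m + i] for i in range(m)]
    diffs = [X[i] - X[m + i] for i in range(m)]
    return sums + diffs


X = [1, -1, 1, 1]              # four BPSK-mapped subcarriers
Y = pdft_l2(X)
print(Y)                       # [2, 0, 0, -2]
print(pdft_l2(Y))              # [2, -2, 2, 2], i.e., 2 * X
```

Because the same butterfly serves as its own inverse up to a factor of 2, the receiver side is equally free of multiplications, and each of the $N/2$ butterflies can be evaluated independently of the others.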

IV. EVALUATION
In this section, we present simulation results to compare the FFT and PDFT algorithms and to validate our theoretical analysis. Note that FFT remains the recommended choice even for the most recent variations of V-OFDM [15], [16]. Moreover, the FFT performance remains a reference for the frequency-time transform problem in current and upcoming wireless network physical layer standards [5], [33], [20]. In Subsection IV-A, we describe the methodology of the simulations. In Subsection IV-B, we discuss the performance of both algorithms under a power of two number of points, as required by the FFT algorithm. In Subsection IV-C, we discuss the performance of the PDFT algorithm under a non-power of two number of points.

A. TOOLS AND METHODOLOGY
We compare our proposed PDFT algorithm for V-OFDM against the FFT algorithm employed by both the OFDM and V-OFDM state of the art. We implement the PDFT algorithm in C++ and refer to the FFT implementation of [34] to assess the FFT algorithm. It is important to remark that the runtime performance of our chosen FFT implementation can be outperformed by highly optimized FFT libraries available in the literature, e.g., [35]. However, these libraries impose several preliminary runs of distinct DFT algorithms to pick the one that performs best for the considered platform and value of $N$. Hence, the chosen algorithm may vary across distinct values of $N$ and the assessed runtime is highly dependent on several hardware optimizations that vary across the chosen platform. By contrast, our focus in this work is on the asymptotic complexity improvement rather than on hardware optimizations, which can be handled in future work.
We vary the number of points, which is equivalent to the number of subcarriers $N$, for both algorithms. In one simulation, we vary $N$ as powers of two considering a relatively small number of subcarriers, as in today's FFT-based waveforms. In the other simulation, we consider non-power of two $N$ and a minimum of $10^5$ subcarriers. In the latter, we also vary the number of VBs of PDFT, as well as the number of points per VB. For each algorithm, we assess the runtime $T_{DFT}(N)$ (seconds) and the throughput $SC(N)$ (megabits per second) according to Def. 1. We also report the complexity of the compared algorithms. Note that the complexity captures the total number of calculations performed by the algorithms and holds irrespective of the way they are implemented (such as a pipelined ASIC). Unless otherwise stated, the throughput of each algorithm was measured considering that each subcarrier is BPSK-modulated.
We sampled the wall-clock runtime $T_{DFT}(N)$ of each algorithm with the standard C++ timespec library [36] under the profile CLOCK_MONOTONIC on a 1.8 GHz Intel i7-4500U processor with 8 GB of memory. We repeated each experiment as many times as needed to achieve a mean with relative error below 5% at a confidence level of 95%. Each sample of $T_{DFT}(N)$ was forwarded to the Akaroa-2 tool [37] for statistical treatment. Akaroa-2 determined the minimum number of samples required to reach the transient-free steady-state mean estimation for $T_{DFT}(N)$. In each execution, we assigned our central processing unit (CPU) process the largest real-time priority and employed the isolcpus Linux kernel directive to allocate one physical CPU core exclusively to each process. We generate the input points for the algorithms with the standard C++ 64-bit version of the Mersenne Twister (MT) 19937 pseudo-random number generator [38] set to the seed 1973272912 [39]. In Tables 2 and 3 of Appendix A, we report the statistics and results discussed in Subsection IV-B and Subsection IV-C, respectively.
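The measurement loop described above can be approximated by the following sketch. It is not the C++ and Akaroa-2 setup used in the experiments: the stopping rule relies on a normal approximation (`z = 1.96`) for the 95% confidence level, Python's `random` module uses the 32-bit Mersenne Twister rather than the 64-bit variant, and the function names, the sample caps and the stand-in workload are assumptions of the example.

```python
import random
import statistics
import time


def steady_mean(sample, rel_err=0.05, z=1.96, min_samples=30, max_samples=500):
    """Draw samples until the confidence half-width is within rel_err of the mean.

    Stops early once the target precision is reached, or at max_samples otherwise.
    Returns (mean, number_of_samples).
    """
    values = []
    while len(values) < max_samples:
        values.append(sample())
        if len(values) >= min_samples:
            mean = statistics.mean(values)
            half_width = z * statistics.stdev(values) / len(values) ** 0.5
            if mean > 0 and half_width / mean <= rel_err:
                break
    return statistics.mean(values), len(values)


rng = random.Random(1973272912)  # seeded Mersenne Twister generator
points = [complex(rng.random(), rng.random()) for _ in range(256)]


def timed_workload():
    """Time one run of a stand-in workload with a monotonic clock (nanoseconds)."""
    start = time.monotonic_ns()
    total = 0j
    for p in points:
        total += p * p           # placeholder for the transform under test
    return time.monotonic_ns() - start


mean_ns, runs = steady_mean(timed_workload)
print(f"mean runtime: {mean_ns:.0f} ns over {runs} runs")
```

A monotonic clock is used so that system clock adjustments cannot produce negative or distorted intervals; the sample cap only keeps the example short and is not part of the methodology.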

B. POWER OF TWO DFTS
In this subsection, we evaluate the performance of the FFT and PDFT algorithms under a power of two number of points, as required by the FFT algorithm. In Fig. 3, we plot the runtime of the FFT algorithm (employed by OFDM and V-OFDM) and of the multiplierless PDFT algorithm we propose for V-OFDM set to two $N/2$-subcarrier vector blocks. In Fig. 4, we plot the total number of arithmetic instructions predicted by the theoretical complexity analysis. The overall numbers of arithmetic instructions performed by the FFT algorithm and the PDFT algorithm are at least $5N\log_2 N$ [35] and $N$ (Subsection III-D), respectively. The statistics of the runtime are reported in Table 2. We report the throughput considering BPSK modulation, in which one bit modulates one subcarrier. Thus, one can reproduce Fig. 5 and Fig. 6 for another modulation just by multiplying the BPSK-based throughput by the number of bits per subcarrier of that modulation, e.g., 6 in the case of 64-QAM.
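The scaling rule stated at the end of the paragraph amounts to a single multiplication. A minimal Python sketch follows, in which the function name and the sample throughput value are illustrative assumptions rather than measured results:

```python
import math


def scale_throughput(bpsk_throughput, constellation_size):
    """Scale a BPSK-based throughput to a constellation with the given size."""
    bits_per_subcarrier = math.log2(constellation_size)  # 1 for BPSK, 6 for 64-QAM
    return bpsk_throughput * bits_per_subcarrier


# Hypothetical 10 Mbit/s BPSK figure rescaled to 64-QAM.
print(scale_throughput(10.0, 64))
```

The rule holds because the transform runtime depends on the number of points, not on how many bits each point carries.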
As one can observe in Fig. 3 and Fig. 4, the exponential nature of the FFT complexity becomes clear after $N = 2^{12} = 4096$ points. Because the FFT algorithm demands $N$ to grow as a power of two $2^i$ (for some $i > 0$), the number of DFT points must at least double in novel standards that adopt more subcarriers to improve throughput. Consequently, the complexity of the FFT algorithm grows accordingly. We highlight the performance of FFT for the largest number of points of different wireless communication standards. In the IEEE 802.11a [40], IEEE 802.11ac [28] and 5G [33] physical layer standards, the maximum numbers of FFT points are 64, 512 and 4096, respectively. Considering the $5N\log_2 N$ arithmetic instructions of the Cooley-Tukey algorithm [35], no fewer than 1920, 23040 and 245760 arithmetic instructions must be performed by FFT in those standards, respectively. In our simulation, these complexities caused the FFT runtime to grow by roughly one order of magnitude from one standard to the next, namely 3.58 µs, 33.97 µs and 363.8 µs, respectively, as reported in Fig. 3.
FIGURE 7: Runtime of the FFT and the proposed PDFT algorithms for a number of points $N = 1 \cdot 10^5, 2 \cdot 10^5, \cdots, 6 \cdot 10^5$. For FFT, only the powers of two $2^{17} = 131072$, $2^{18} = 262144$ and $2^{19} = 524288$ are considered.
The wall-clock runtime of FFT can be improved if FFT is implemented on dedicated hardware such as application-specific integrated circuits (ASICs). However, as shown in Fig. 4, the overall number of arithmetic instructions remains exponential irrespective of the implementation technology. Thus, the FFT complexity represents a serious concern for other relevant performance indicators of future networks, such as manufacturing cost, area (device portability) and power consumption.
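The instruction counts quoted for the three standards follow directly from the $5N\log_2 N$ bound with $N = 2^i$, i.e., $5 \cdot 2^i \cdot i$. A short Python check, in which the function name is a hypothetical helper:

```python
def fft_instructions(n):
    """Lower bound of 5 * N * log2(N) arithmetic instructions for N = 2**i."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two greater than one")
    i = n.bit_length() - 1  # exponent i such that n == 2**i
    return 5 * n * i


for standard, n in [("IEEE 802.11a", 64), ("IEEE 802.11ac", 512), ("5G", 4096)]:
    # The second figure is the number of instructions per BPSK bit, 5 * i.
    print(standard, n, fft_instructions(n), fft_instructions(n) // n)
```

The loop prints 1920, 23040 and 245760 instructions, and the per-bit cost rises from 30 to 45 to 60 instructions, which is the growth that no implementation technology removes.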
By contrast, the proposed PDFT algorithm performed about two orders of magnitude better than FFT in all scenarios, even under the power of two constraint of FFT. Also, the FFT throughput nullifies on $N$. In the simulation, this behavior can be observed by noting that the FFT throughput reaches its maximum value for $N = 2^6$ but achieves nearly the same value for $N = 2^2$ and $N = 2^{18}$ (Fig. 5). In turn, the PDFT algorithm keeps nearly the same throughput after $N = 2^7$ (Fig. 6). According to our theoretical analyses, this stems from the fact that both the PDFT complexity and the number of processed bits grow linearly on $N$. Therefore, the PDFT throughput tends to a non-null constant as $N$ gets arbitrarily large.
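The two asymptotic behaviors can be made explicit with a short worked limit. The sketch assumes BPSK (so the symbol carries $B(N) = N$ bits), a constant time $\tau$ per arithmetic instruction, at least $5N\log_2 N$ instructions for FFT and $cN$ instructions for PDFT with $L = \Theta(1)$. The symbols $B(N)$, $\tau$ and $c$ are introduced here only for illustration.

\[
SC_{FFT}(N) = \frac{B(N)}{T_{FFT}(N)} \le \frac{N}{5\tau N \log_2 N} = \frac{1}{5\tau \log_2 N} \xrightarrow[N \to \infty]{} 0,
\qquad
SC_{PDFT}(N) = \frac{B(N)}{T_{PDFT}(N)} = \frac{N}{c\,\tau N} = \frac{1}{c\,\tau} > 0 .
\]

For $N = 2^i$, the FFT bound reads $1/(5\tau i)$, so every doubling of the number of subcarriers lowers the ceiling on the throughput, whereas the PDFT ratio does not depend on $N$.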

C. NON POWER OF TWO DFTS
In this subsection, we evaluate the performance of the PDFT algorithm under a non power of two number of points $N$. We vary $N$ through $1 \cdot 10^5, 2 \cdot 10^5, \cdots, 6 \cdot 10^5$. In Figs. 7 and 8, we plot the runtime and throughput performance of the proposed PDFT algorithm, respectively. We vary the number of vector blocks $L = 2, 3, 4, 5$ and plot the performance of the FFT algorithm by setting $N$ to the existing powers of two in the interval $[1 \cdot 10^5, 6 \cdot 10^5]$, namely $2^{17} = 131072$, $2^{18} = 262144$ and $2^{19} = 524288$. PDFT requires the length $N/L$ of each vector block to be an integer. This requirement is met by all chosen values of $N$ and $L$ except for $L = 3$. In this case, we decrease $N$ by $N \bmod 3$ to ensure $N/L$ is an integer ($x \bmod y$ returns the remainder of the division of $x$ by $y$). Thus, for $L = 3$, the values $N = 10^5$, $2 \cdot 10^5$, $4 \cdot 10^5$ and $5 \cdot 10^5$ are decreased by 1, 2, 1 and 2, respectively. The runtime and throughput of the FFT and PDFT algorithms are taken from Table 2 and Table 3, respectively. Both tables have the same structure of columns, as we explained in Subsection IV-B.
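The adjustment of $N$ described above can be reproduced in a few lines. In this minimal Python sketch, the function name is a hypothetical helper:

```python
def adjust_points(n, num_blocks):
    """Decrease n by n mod num_blocks so that n / num_blocks is an integer."""
    return n - n % num_blocks


for k in range(1, 7):
    n = k * 10**5
    # One adjusted value of N per number of vector blocks L = 2, 3, 4, 5.
    print(n, [adjust_points(n, blocks) for blocks in (2, 3, 4, 5)])
```

Only the $L = 3$ column changes: $10^5$ and $4 \cdot 10^5$ lose one point, $2 \cdot 10^5$ and $5 \cdot 10^5$ lose two, and $3 \cdot 10^5$ and $6 \cdot 10^5$ are already multiples of three.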
As one can see in Fig. 7, the runtime performance of PDFT improves for lower values of $L$. The best performance is achieved for $L = 2$, in which case PDFT becomes multiplierless and performs $N/2$ 2-point transforms. Although the PDFT performance worsens for larger $L$, its complexity remains linear on $N$ for all evaluated setups. This happens because PDFT exploits the parameterization technique to perform $M = N/L$ independent $L$-point DFTs. By setting $L$ to $\Theta(1)$, each independent DFT takes $L^2 = \Theta(1)$ time complexity, yielding a total complexity of $(N/L) \cdot \Theta(1) = O(N)$.
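The structure described in this paragraph, $M = N/L$ independent $L$-point DFTs computed with the straightforward algorithm, can be sketched as follows. This is an illustrative Python sketch rather than the implementation evaluated in the paper. The data layout is an assumption (the $k$-th entry of each of the $L$ vector blocks forms the input of the $k$-th $L$-point transform), and the function names are hypothetical.

```python
import cmath


def small_dft(block):
    """Straightforward L-point DFT of one block, using L * L multiply-adds."""
    size = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * cmath.pi * f * t / size) for t in range(size))
        for f in range(size)
    ]


def parameterized_dft(points, num_blocks):
    """Apply N/L independent L-point DFTs across L vector blocks of length N/L."""
    n = len(points)
    if num_blocks < 1 or n % num_blocks:
        raise ValueError("N must be a multiple of L")
    block_len = n // num_blocks  # M = N/L
    out = [0j] * n
    for k in range(block_len):
        # Assumed layout: the k-th entry of every vector block feeds one transform.
        column = [points[k + q * block_len] for q in range(num_blocks)]
        if num_blocks == 2:
            # 2-point case: one sum and one difference, no multiplications.
            transformed = [column[0] + column[1], column[0] - column[1]]
        else:
            transformed = small_dft(column)
        for q in range(num_blocks):
            out[k + q * block_len] = transformed[q]
    return out


print(parameterized_dft([1, 2, 3, 4], 2))
```

Each of the $N/L$ transforms costs $L^2$ multiply-adds, so the total is $N \cdot L$, which is linear on $N$ for any fixed $L$. The transforms share no data, so they can also run in parallel.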
The lowest complexity of PDFT (achieved with $L = 2$) translates into the highest throughput among all algorithms, about two orders of magnitude above the others, as one can see in Fig. 8, where throughput is plotted considering one bit per point (i.e., BPSK modulation). Moreover, PDFT sustains a non-null throughput for all values of $L$, whereas the FFT throughput nullifies as $N$ grows.
The throughput nullification happens because the complexity grows asymptotically faster than the number of modulated bits as $N$ grows. In the case of PDFT, the throughput remains constant as $N$ grows, even though the complexity grows too. Besides, because PDFT relies on the straightforward DFT algorithm rather than FFT, the number of points can grow in unit steps rather than by doubling. Considering the range of the experiment $[1 \cdot 10^5, 6 \cdot 10^5]$, for example, there exist 250001, 166667, 125001 and 100001 possible choices of $N$ for PDFT under $L = 2$, $L = 3$, $L = 4$ and $L = 5$, respectively. By contrast, there are only three choices of $N$ for the FFT algorithm in the same range, namely $2^{17} = 131072$, $2^{18} = 262144$ and $2^{19} = 524288$. This can provide standardization bodies with more setup choices for future multicarrier wireless communication standards.
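The numbers of admissible values of $N$ quoted above can be verified by counting the multiples of $L$ and the powers of two in the closed interval $[10^5, 6 \cdot 10^5]$. In this short Python check, the function names are hypothetical helpers:

```python
def count_multiples(low, high, step):
    """Number of multiples of step in the closed interval [low, high]."""
    return high // step - (low - 1) // step


def powers_of_two(low, high):
    """Powers of two in the closed interval [low, high]."""
    values, p = [], 1
    while p <= high:
        if p >= low:
            values.append(p)
        p *= 2
    return values


LOW, HIGH = 10**5, 6 * 10**5
print({blocks: count_multiples(LOW, HIGH, blocks) for blocks in (2, 3, 4, 5)})
print(powers_of_two(LOW, HIGH))
```

The first line reports 250001, 166667, 125001 and 100001 choices for $L = 2, 3, 4, 5$, and the second lists the three powers of two available to FFT.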

V. CONCLUSION AND FUTURE WORK
In this work, we demonstrated that the fast Fourier transform (FFT) algorithm can be too complex for the post-5G generation of broadband waveforms. The constraint that the number of points $N$ must grow as a power of two $2^i$ (for some $i > 0$), along with the unprecedented growth in the number of subcarriers, causes FFT to run in the exponential complexity $O(2^i \cdot i)$. Also, because this complexity grows faster than the number of modulated bits, the FFT throughput nullifies as $N$ grows. We generalized this result to show that the throughput of any DFT algorithm nullifies on $N$ unless the lower bound complexity of the DFT problem verifies as $\Omega(N)$, which is an open conjecture in computer science.
To overcome the scalability limitations of FFT, we considered the alternative frequency-time transform formulation of vector OFDM (V-OFDM) [14], a waveform that replaces an $N$-point FFT by $N/L$ ($L > 0$) smaller FFTs to mitigate the cyclic prefix overhead of OFDM. Building on that, we replaced FFT by DFT to relax the power of two constraint on $N$ and to provide V-OFDM with flexible numerology (e.g., $L = 3$, $N = 156$). Besides, by parameterizing $L$ to $\Theta(1)$, we identified that the resulting DFT-based solution (which we refer to as parameterized DFT, PDFT) can run linearly on $N$ rather than exponentially on $i$.
We also formulated what we refer to as the sampling-complexity (Nyquist-Fourier) trade-off, which stems from the fact that the $N$-point DFT algorithm operates on a batch of $N$ samples but its associated sampler operates on a sample-by-sample basis. As $N$ grows, the Nyquist inter-sample time interval demanded by the sampler decreases but the DFT complexity to compute all samples increases. We demonstrated that the asymptotic solution of the trade-off would require $\Theta(1)$ DFT algorithms. Since the complexity of DFT algorithms grows linearly on $N$ at best, i.e., $\Omega(N)$, no DFT algorithm can meet the Nyquist deadline as $N$ grows. However, we identified that the trade-off can be countered in practice if V-OFDM is set to two $N/2$-subcarrier vector blocks (i.e., $L = 2$). In that case, the transform simplifies to $N/2$ complex sums that can be performed in parallel both at the transmitter and at the receiver. Thus, the $N$-point DFT becomes multiplierless and each sample that feeds the DAC/ADC comes from only two, rather than $N$, other samples. We believe these results turn V-OFDM into a competitive candidate waveform for future broadband wireless networks.
In future work, the PDFT-based V-OFDM implementation can be coupled to an analog terahertz radio, and the optimal parameterization for the PDFT complexity can be identified under different channel propagation conditions. The joint throughput-complexity asymptotic limit of detection algorithms can be investigated as well. In this sense, one may consider enhancing the analytic model employed in this work to capture the natural trade-off between complexity and bit error rate in algorithms such as signal detection and error correction codes. Also, the impact of sampling on the SC analysis of DFT algorithms can be investigated under other conditions not considered in this work, such as variable symbol duration and sub-Nyquist samplers [30], [31].

APPENDIX A SIMULATION RESULTS
In Tables 2 and 3, we report the statistics of each simulation. Both tables report the number of points, the algorithm, the runtime in µs, the throughput, the half-width of the runtime's confidence interval and the runtime's variance, respectively. No experiment demanded more than 70000 repetitions and, on average, about 500 samples were discarded due to the transient stage.