Frequency-Domain Signal Processing for Spectrally-Enhanced CP-OFDM Waveforms in 5G New Radio

Orthogonal frequency-division multiplexing (OFDM) has been selected as the basis for the fifth-generation new radio (5G-NR) waveform developments. However, effective signal processing tools are needed for enhancing the OFDM spectrum in various advanced transmission scenarios. In earlier work, we have shown that fast-convolution (FC) processing is a very flexible and efficient tool for filtered-OFDM signal generation and receiver-side subband filtering, e.g., for the mixed-numerology scenarios of 5G-NR. FC filtering approximates linear convolution through efficient fast Fourier transform (FFT)-based circular convolutions using partly overlapping processing blocks. However, with the continuous overlap-and-save and overlap-and-add processing models with fixed block size and fixed overlap, the FC-processing blocks cannot be aligned with all OFDM symbols of a transmission frame. Furthermore, the 5G-NR numerology does not allow the use of transform lengths shorter than 128, because this would lead to non-integer cyclic prefix (CP) lengths. In this article, we present new FC-processing schemes that remove these limitations. These schemes are based on dynamically adjusting the overlap periods and extrapolating the CP samples, which makes it possible to align the FC blocks with each OFDM symbol, even in the case of variable CP lengths. This reduces complexity and latency, e.g., in mini-slot transmissions. As an example, it allows the use of 16-point transforms for a 12-subcarrier-wide subband allocation, greatly reducing the implementation complexity. On the receiver side, the proposed scheme makes it possible to effectively combine cascaded inverse and forward FFT units in FC-filtered OFDM processing. Transform decomposition is used to simplify these computations. A very extensive set of numerical results is also provided, in terms of radio-link performance and associated processing complexity.


I. INTRODUCTION
ORTHOGONAL frequency-division multiplexing (OFDM) is the dominant multicarrier modulation scheme, and it is extensively deployed in modern radio access systems. OFDM offers three main advantages. First, it allocates spectral resources to different users flexibly and efficiently through the division of subcarriers. Second, the cyclic prefix (CP) enables simple and robust channel equalization. Third, multi-antenna schemes are simple to combine with the core physical-layer processing [2]. The main drawback is the limited spectrum localization. This becomes critical in challenging new spectrum-use scenarios like asynchronous multiple access. It also matters in mixed-numerology cases, which use adjustable symbol and CP lengths, subcarrier spacings (SCSs), and frame structures depending on the service requirements [3]-[7].
Initial studies on filtered OFDM were based on time-domain filtering [8]-[10], and later also polyphase filter-bank-based solutions have been presented [11]-[14]. Fast-convolution-based filtered OFDM (FC-F-OFDM, see Fig. 1) has been presented in [15]-[18]. In [18] in particular, the flexibility, good performance, and low computational complexity of FC-F-OFDM were clearly demonstrated in the 5G-NR context. These schemes typically apply filtering in a continuous manner over a frame of CP-OFDM (or zero-prefix OFDM) symbols.
The main problem with the conventional time-domain filtering-based schemes is their high complexity. In [19], it was shown that the complexity of time-domain filtering-based solutions in the mixed-numerology case is 70 times the complexity of CP-OFDM processing. Classical (e.g., polyphase) filter-bank models can reduce this complexity. However, these solutions have somewhat reduced flexibility in adjusting the subband center frequencies and bandwidths. FC is a block-wise processing scheme with a fixed block length. Therefore, the positions of the useful parts of the OFDM symbols vary within a frame of transmitted OFDM symbols. With the usual continuous FC-processing model as described in [15], [18], the CP lengths and useful symbol durations must correspond to an integer number of samples at the lower sampling rate used for transmitter OFDM processing in each subband. For narrow-band allocations, this limits the choice of transform lengths and significantly increases the computational complexity. With 3GPP long-term evolution (LTE) and 5G-NR numerologies, the shortest possible transform length is 128. A length-16 transform would be sufficient when a subband contains only one physical-layer resource block (PRB) of 12 subcarriers. This restriction applies to both time-domain filtering and FC-based solutions with a continuous processing model. On the receiver side, the proposed scheme makes it possible to effectively combine cascaded inverse and forward FFT units in FC-filtered OFDM processing. Transform decomposition is used to simplify these computations, leading to significantly reduced implementation complexity in various transmission scenarios.
Building on our early work in [1], this article proposes symbol-synchronized discontinuous FC processing aimed at increased flexibility and reduced complexity of FC-F-OFDM. It is shown that the proposed processing supports a more flexible parametrization of the FC engine. This results in reduced complexity and latency with narrow subband allocations and in mini-slot transmission. One important use case is spectrally well-contained narrow-band internet-of-things (NB-IoT) transmission with a one-PRB allocation. Another is ultra-reliable low-latency communications (URLLC), where short transmission bursts, so-called mini-slots, are generated to reduce the radio-link latency. However, the proposed schemes can be used for wide-band allocations as well.
The proposed scheme reduces the complexity in three cases, assuming that only one numerology is transmitted: (i) short transmissions (e.g., mini-slots), (ii) multiplexing of multiple relatively narrow subbands (e.g., in a gateway for massive machine-type communications (MTC)), and (iii) user equipment (UE) side transmitter (TX) processing. Moreover, in parallelized hardware implementations, each OFDM symbol can be generated and filtered independently of the others. This also minimizes the TX signal processing latency.
In this article, we develop and describe discontinuous symbol-synchronized FC-based processing techniques. The main contributions of the article can be listed as follows:
▸ Mathematical models for discontinuous symbol-synchronized FC-based TX and receiver (RX) processing are described. Both overlap-and-add (OLA) and overlap-and-save (OLS) variants are discussed.
▸ Extrapolating TX FC processing is suggested for reducing the required inverse fast Fourier transform (IFFT) size for OFDM modulation. It does so by relaxing the CP-length-related constraints on the IFFT size.
▸ A model for simplifying the IFFT computations in RX processing is proposed. The computational savings are achieved by effectively combining the computations of the IFFT of the FC module and the FFT of subband OFDM processing through transform decomposition.
▸ Extensive numerical results are provided, verifying the validity of the proposed models and illustrating the benefits of the proposed FC-based filtered-OFDM processing.
▸ Complexity evaluations are given, quantifying the savings achieved using the proposed techniques.
▸ Discontinuous FC processing is proposed as an additional useful element in the toolbox for frequency-domain waveform processing, and it can be expected to find applications also in other areas of digital signal processing.
The remainder of this paper is organized as follows. Section II first briefly reviews the continuous FC-based filtered-OFDM processing. Then, the proposed discontinuous FC-processing models are described in Section III for TX and in Section IV for RX, together with implementation alternatives that reduce complexity and latency. Section V analyzes the computational complexity of the considered alternative FC schemes. In Section VI, the performance of the discontinuous processing is analyzed in terms of uncoded bit error rate (BER) in different interference/multiplexing scenarios and channel conditions. Numerical results for the complexity of the alternative FC schemes are also provided there. Finally, the conclusions are drawn in Section VII.

II. CONTINUOUS FAST-CONVOLUTION PROCESSING
Fig. 2 shows the block diagram for the basic continuous, symbol-nonsynchronized OLA-based FC-F-OFDM TX processing for subband $m$. First, the CP-OFDM signal for subband $m$ is generated using the smallest IFFT length that is equal to or larger than $N_{\mathrm{act},m}$ and supports an integer-length CP. Let us denote this transform (IFFT) size by $L_{\mathrm{OFDM},m}$. Then, a low-rate CP of length $L_{\mathrm{CP},m,k}$ is added to each of the $K_{\mathrm{OFDM},m}$ OFDM symbols for $k = 0, 1, \ldots, K_{\mathrm{OFDM},m}-1$, and the signal is converted to serial format. All of these operations are equivalent to basic CP-OFDM TX processing.
The actual FC processing per subband starts by partitioning the time-domain input sample stream into FC blocks, as illustrated in Fig. 2. The exact number of FC processing blocks depends on the input sequence length, the overlap factor, and the FFT length $L_m$. Next, we take the $L_m$-point FFT of each processing block and apply the FFT-shift operation, which essentially places the DC carrier in the middle of each vector. Then, a frequency-domain window is applied to implement the designed filter response. After frequency-domain windowing, the given subband is placed at the allocated FFT bins, with transition-band values possibly exceeding the […] The $N$-point IFFT is common to all subbands. It converts all the low-rate frequency-division-multiplexed subband signals to the time domain per FC block. It also provides the sampling-rate conversion by the factor $I_m = N/L_m$. Next, OLA processing concatenates the high-rate FC blocks to construct the filtered time-domain representation of the transmitted signal. Alternatively, FC processing can be realized using the OLS scheme. In that case, the zero padding in block partitioning is replaced by straightforward segmentation into overlapping blocks. Likewise, the OLA after the last transform is replaced by discarding the overlapping output segments. A more detailed description of the FC filtering process can be found, e.g., in [18], [20].
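The processing chain described above can be sketched in Python as follows. This is a minimal, unoptimized illustration, not the authors' implementation, and it rests on several assumptions:
- the overlap factor is 0.5, with equal leading and tailing overlap of $L_m/4$ samples;
- naive $O(L^2)$ DFTs stand in for FFTs;
- the function names are hypothetical;
- the block phase rotation $e^{j2\pi c r N_S/N}$ is derived for this particular block-indexing convention.

With all-ones weights (no filtering), every $I_m$-th high-rate output sample reproduces the input sample modulated to the subband center. This is a convenient check of the bin mapping and of the phase continuity between blocks.

```python
import cmath
import math


def dft(x):
    """Naive O(L^2) DFT: X[q] = sum_n x[n] exp(-j 2 pi q n / L)."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * q * n / L) for n in range(L))
            for q in range(L)]


def idft(X):
    """Naive inverse DFT including the 1/N scaling."""
    N = len(X)
    return [sum(X[q] * cmath.exp(2j * math.pi * q * n / N) for q in range(N)) / N
            for n in range(N)]


def fc_synthesis_ola(x, L, N, c, weights):
    """Continuous OLA-based FC synthesis (interpolating) filter for one subband.

    x       -- low-rate input samples
    L, N    -- short FFT size and long IFFT size (N a multiple of L)
    c       -- center bin of the subband on the N-point grid
    weights -- L frequency-domain window values in FFT-shifted order
    Assumes overlap factor 0.5 with L/4 leading and L/4 tailing overlap.
    """
    I = N // L                    # sampling-rate conversion factor
    L_S, L_L = L // 2, L // 4     # new input samples per block, leading overlap
    N_S = N // 2                  # output hop size
    R = math.ceil(len(x) / L_S)   # number of FC blocks
    y = [0j] * ((R - 1) * N_S + N)
    for r in range(R):
        # Block partitioning: L_S new samples surrounded by zero padding.
        block = [0j] * L
        for j in range(L_S):
            if r * L_S + j < len(x):
                block[L_L + j] = x[r * L_S + j]
        X = dft(block)
        Xs = X[L // 2:] + X[:L // 2]                      # FFT shift: DC at index L/2
        Y = [0j] * N
        for p in range(L):
            Y[(c + p - L // 2) % N] = weights[p] * Xs[p]  # windowing and bin mapping
        theta = cmath.exp(2j * math.pi * c * r * N_S / N)  # phase continuity
        z = idft(Y)
        for i in range(N):
            y[r * N_S + i] += I * theta * z[i]            # gain I keeps amplitude; OLA
    return y
```

For example, with `L = 8`, `N = 32`, and `c = 5`, the sample `y[4 * n + 8]` equals `x[n]` rotated by the carrier phase at that output index. A real design would use FFTs and a tapered `weights` vector with transition-band values.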
Fig. 3 illustrates the basic continuous FC-processing flow of the FC-F-OFDM transmitter for $L_{\mathrm{OFDM},m} = L_m$. The assumed overlap between processing blocks is 50 % (the overlap factor is $\lambda_m = 0.5$). Fig. 3 shows that the FC processing is continuous: each FC processing block collects $L_{\mathrm{OFDM},m}/2 = L_m/2$ new samples from the input sample stream. Also, the overlap factor is constant over all FC processing blocks.

III. SYMBOL-SYNCHRONIZED DISCONTINUOUS FC-BASED FILTERED-OFDM TX PROCESSING
Fig. 4 illustrates the proposed discontinuous TX FC-F-OFDM processing flow for a mini-slot of two OFDM symbols ($K_{\mathrm{OFDM},m} = 2$). In discontinuous processing, two FC processing blocks are synchronized to each OFDM symbol. The first FC block contains the first half of the OFDM symbol, and the second FC block contains the second half. In addition, the first FC processing block contains the low-rate CP samples. This reduces the overlap at the beginning of the first FC block, that is, the overlap factor becomes $\lambda_{m,0} = 0.5 - L_{\mathrm{CP},m,k}/L_m$. Here, $L_{\mathrm{CP},m,k}$ is the CP length of the $k$th symbol on subband $m$. In practice, this reduction is relatively small and causes only a minor increase in the related distortion effects. Discontinuous processing uses only four FC processing blocks, instead of five in the continuous processing model (see Fig. 3), resulting in reduced complexity.
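The block counts and the reduced first-block overlap factor can be checked with a short Python sketch. The function names are hypothetical. The continuous count assumes that every block contributes $L_m/2$ new input samples and that a final partial block still costs a full block. The discontinuous count is simply two blocks per OFDM symbol.

```python
import math
from fractions import Fraction


def fc_block_counts(L, cp_lengths):
    """(continuous, discontinuous) FC block counts for one mini-slot at overlap 0.5.

    L          -- short FC transform size (equal to the low-rate OFDM IFFT size)
    cp_lengths -- low-rate CP length of each OFDM symbol
    """
    total = sum(L + cp for cp in cp_lengths)
    continuous = math.ceil(Fraction(total, L // 2))  # L/2 new samples per block
    discontinuous = 2 * len(cp_lengths)              # two blocks per OFDM symbol
    return continuous, discontinuous


def first_block_overlap(L, cp):
    """Overlap factor of the first block of a symbol that also carries the CP."""
    return Fraction(1, 2) - Fraction(cp, L)
```

For two symbols with `L = 16` and one-sample CPs, the sketch gives five continuous blocks versus four discontinuous ones. The first-block overlap factor is 7/16 instead of 1/2.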
This scheme is particularly beneficial in cases like Fig. 4, where the FC block length equals the OFDM symbol duration (a common assumption in earlier studies of FC-F-OFDM). There, the two halves of the basic OFDM symbol are processed in two consecutive FC blocks. The FC blocks are then synchronized to the OFDM symbols, and the CP is also processed within the first FC block. Such discontinuous FC-based TX filter processing reduces the computational complexity through a dynamically adjustable overlap of consecutive CP-OFDM symbols. Fig. 5 shows a detailed example of the sample-level interpolation and extrapolation process. The CP part is included in the leading overlap section of the first FC block of each OFDM symbol. The CP length is then fine-tuned at the high rate in the OLA processing of consecutive OFDM symbols. The time resolution in adjusting the CP length equals the sampling interval at the high rate (as in traditional CP-OFDM).
In a generic setting, the discontinuous FC-F-OFDM process can be formulated as follows. First, the CP-OFDM symbols are generated at the minimum feasible sampling rate for each subband. If needed, the CP length is truncated to the largest integer number of low-rate samples that does not exceed the CP length of the transmitted signal. FC-based filtering (or interpolation) is then applied to each CP-OFDM symbol individually to generate filtered symbols at the high (output) sampling rate. The CP-OFDM signal for a transmission slot is constructed from the generated individual symbols using the OLA principle. When the individual symbols are combined, their spacing in the time direction is adjusted to the precise CP duration, with the precision of the output sampling interval.
In its basic form, the proposed scheme suits scenarios where all subband signals to be transmitted have equal overall symbol durations and the symbols are synchronized. Notably, different durations (e.g., different CP lengths) are allowed for different CP-OFDM symbol intervals within a transmission slot.

A. CP-OFDM Processing with Variable CP Lengths
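Let $L_{\mathrm{OFDM},m}$ be the OFDM symbol length on the $m$th subband and $L_{\mathrm{CP},m,k}$ the CP length of its $k$th symbol, for $m = 0, 1, \ldots, M-1$, where $M$ is the number of subbands. The CP-OFDM TX processing, illustrated in the left-hand block of Fig. 2, is formally expressed as

$$\mathbf{x}_{\mathrm{OFDM},m,k} = \mathbf{W}^{-1}_{L_{\mathrm{OFDM},m}}\,\mathbf{d}_{m,k}$$

and

$$\mathbf{x}_{\mathrm{CP\text{-}OFDM},m,k} = \mathbf{C}_{\mathrm{CP},m,k}\,\mathbf{x}_{\mathrm{OFDM},m,k}.$$

Here, $\mathbf{d}_{m,k} \in \mathbb{C}^{L_{\mathrm{OFDM},m}\times 1}$ is the vector containing the incoming quadrature amplitude modulation (QAM) symbols on the $N_{\mathrm{act},m}$ active subcarriers. $\mathbf{W}^{-1}_{L_{\mathrm{OFDM},m}} \in \mathbb{C}^{L_{\mathrm{OFDM},m}\times L_{\mathrm{OFDM},m}}$ is the inverse discrete Fourier transform (IDFT) matrix. $\mathbf{C}_{\mathrm{CP},m,k} \in \mathbb{Z}^{(L_{\mathrm{OFDM},m}+L_{\mathrm{CP},m,k})\times L_{\mathrm{OFDM},m}}$ is the CP insertion matrix, which copies the last $L_{\mathrm{CP},m,k}$ samples of the symbol to its beginning. In general, the CP length could differ for each symbol. For 5G-NR and LTE, two CP lengths are used for the normal CP configuration [21], [22]: the first symbol of each half subframe (0.5 ms) has a longer CP than the others.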
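The two CP-OFDM equations above can be illustrated with a small Python sketch. It assumes that the QAM symbols are already mapped onto an $L_{\mathrm{OFDM},m}$-length vector, with zeros on inactive subcarriers, and it uses a naive IDFT in place of the IDFT matrix. The helper names are hypothetical. The `cp_lengths` pattern assumes a longer CP on every 7th symbol, following the normal-CP numerology discussed in this section.

```python
import cmath
import math


def idft(X):
    """Naive L-point inverse DFT (stands in for the IDFT matrix W^-1)."""
    L = len(X)
    return [sum(X[q] * cmath.exp(2j * math.pi * q * n / L) for q in range(L)) / L
            for n in range(L)]


def cp_ofdm_symbol(d, cp_len):
    """x_OFDM = W^-1 d; CP insertion then copies the last cp_len samples to the front."""
    x = idft(d)
    return x[len(x) - cp_len:] + x


def cp_lengths(num_symbols, long_cp, short_cp, period=7):
    """Normal-CP pattern: the first symbol of every `period` symbols has the longer CP."""
    return [long_cp if k % period == 0 else short_cp for k in range(num_symbols)]


def cp_ofdm_frame(symbols, cps):
    """Serialize CP-OFDM symbols that have per-symbol CP lengths."""
    frame = []
    for d, cp in zip(symbols, cps):
        frame.extend(cp_ofdm_symbol(d, cp))
    return frame
```

With `L = 16` and 12 active subcarriers, `cp_ofdm_symbol(d, 1)` returns 17 samples whose first sample repeats the last one.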

B. FC-based Synthesis Filter Bank with Overlap-and-Add or Overlap-and-Save Processing
FC-based filtering carries out the processing in overlapping blocks. In the synthesis filter bank (SFB) case, the input block length of the $m$th subband is $L_m$ and the output block length is $N$. The overlap between the input blocks is determined by the number of overlapping input samples $L_{\mathrm{O},m}$. The number of non-overlapping input samples is $L_{\mathrm{S},m} = L_m - L_{\mathrm{O},m}$, and the overlap factor is expressed using these values as

$$\lambda_m = \frac{L_{\mathrm{O},m}}{L_m} = 1 - \frac{L_{\mathrm{S},m}}{L_m}.$$

The overlapping input samples can be further divided into leading and tailing parts as follows:

$$L_{\mathrm{O},m} = L_{\mathrm{L},m} + L_{\mathrm{T},m}.$$

The corresponding numbers of overlapping and non-overlapping output samples are $N_{\mathrm{O}} = \lambda_m N$ and $N_{\mathrm{S}} = (1-\lambda_m)N$, respectively. Similarly, the overlapping output samples are divided into leading and tailing parts, $N_{\mathrm{L}}$ and $N_{\mathrm{T}}$, respectively. The FC processing increases the sampling rate by the factor $I_m = N/L_m$. This results in the OFDM symbol and CP durations $N_{\mathrm{OFDM},m} = I_m L_{\mathrm{OFDM},m}$ and $N_{\mathrm{CP},m,k} = I_m L_{\mathrm{CP},m,k}$, respectively. Here, $N_{\mathrm{OFDM},m}$ and $N_{\mathrm{CP},m,k}$ have integer values. It is convenient, but not necessary, that $L_{\mathrm{OFDM},m}$ and $L_{\mathrm{CP},m,k}$ have integer values as well.
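As a worked example of these relations, the following Python sketch computes the derived block parameters with exact rational arithmetic. The function name is hypothetical. The example values $L_m = 16$, $N = 1024$, $L_{\mathrm{O},m} = 8$, and $L_{\mathrm{L},m} = 4$ are illustrative choices that give an overlap factor of 0.5 with equal leading and tailing overlap.

```python
from fractions import Fraction


def fc_parameters(L, N, L_O, L_L):
    """Derived FC synthesis-bank block parameters for one subband."""
    I = Fraction(N, L)        # sampling-rate conversion factor I_m = N / L_m
    lam = Fraction(L_O, L)    # overlap factor lambda_m = L_O,m / L_m
    L_S = L - L_O             # non-overlapping input samples
    L_T = L_O - L_L           # tailing overlap, L_O,m = L_L,m + L_T,m
    N_O = lam * N             # overlapping output samples
    N_S = (1 - lam) * N       # non-overlapping output samples
    return {"I": I, "lambda": lam, "L_S": L_S, "L_T": L_T, "N_O": N_O, "N_S": N_S}
```

With these values, $I_m = 64$, so a 16-point low-rate OFDM symbol maps to $N_{\mathrm{OFDM},m} = 1024$ high-rate samples.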
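Fig. 5. […] and high-rate OFDM symbol duration of 64 samples. CP length is one sample at low rate and six samples at high rate (corresponding to one and a half low-rate samples).
In continuous FC SFB, the filtering of the $m$th CP-OFDM subband signal for the generation of the high-rate waveform can be represented as

$$\mathbf{y}_m = \mathbf{F}_m \begin{bmatrix} \mathbf{0} \\ \mathbf{x}_{\mathrm{CP\text{-}OFDM},m} \\ \mathbf{0} \end{bmatrix}.$$

Here, $\mathbf{F}_m$ is the block diagonal transform matrix whose diagonal blocks are the per-block transforms $\mathbf{F}_{m,r}(\lambda_{m,r})$, $r = 0, 1, \ldots$, with $\lambda_{m,r} = \lambda_m$ in continuous processing. Consecutive blocks overlap by $L_{\mathrm{O},m}$ columns and $N_{\mathrm{O}}$ rows. The vector $\mathbf{x}_{\mathrm{CP\text{-}OFDM},m}$ is formed by concatenating the $\mathbf{x}_{\mathrm{CP\text{-}OFDM},m,k}$ for $k = 0, 1, \ldots, K_{\mathrm{OFDM},m}-1$. The zero-padding before and after the CP-OFDM symbols is typically selected to be $L_m - L_{\mathrm{S},m}$ samples. The overall high-rate waveform to be transmitted is then obtained by combining all the subband waveforms as follows:

$$\mathbf{y}_{\mathrm{FC\text{-}F\text{-}OFDM}} = \sum_{m=0}^{M-1} \mathbf{y}_m.$$

The multirate version of the FC SFB can be represented in two ways. With OLA block processing, the $\mathbf{F}_{m,r}(\lambda_{m,r})$'s are decomposed as

$$\mathbf{F}^{(\mathrm{OLA})}_{m,r}(\lambda_{m,r}) = \theta_{m,r}\,\mathbf{W}^{-1}_{N}\,\mathbf{M}_m(c_m)\,\mathbf{D}_m\,\mathbf{P}_{L_m}(L_m/2)\,\mathbf{W}_{L_m}\,\mathbf{A}_{m,r}.$$

With OLS block processing, they are decomposed as

$$\mathbf{F}^{(\mathrm{OLS})}_{m,r}(\lambda_{m,r}) = \theta_{m,r}\,\mathbf{S}_{m,r}\,\mathbf{W}^{-1}_{N}\,\mathbf{M}_m(c_m)\,\mathbf{D}_m\,\mathbf{P}_{L_m}(L_m/2)\,\mathbf{W}_{L_m}.$$

Here, $\mathbf{W}_{L_m} \in \mathbb{C}^{L_m\times L_m}$ and $\mathbf{W}^{-1}_{N} \in \mathbb{C}^{N\times N}$ are the discrete Fourier transform (DFT) and IDFT matrices, respectively. The DFT shift matrix $\mathbf{P}_{L_m}(L_m/2) \in \mathbb{N}^{L_m\times L_m}$ is a circulant permutation matrix expressed as

$$\mathbf{P}_{L_m}(L_m/2) = \begin{bmatrix} \mathbf{0} & \mathbf{I}_{L_m/2} \\ \mathbf{I}_{L_m/2} & \mathbf{0} \end{bmatrix}.$$

$\mathbf{D}_m \in \mathbb{R}^{L_m\times L_m}$ is a diagonal matrix whose diagonal elements are the weights of subband $m$. The frequency-domain mapping matrix $\mathbf{M}_m(c_m) \in \mathbb{N}^{N\times L_m}$ maps the $L_m$ frequency-domain bins of the input signal to the $N$ frequency-domain bins of the output signal as follows:

$$[\mathbf{M}_m(c_m)]_{p,q} = \begin{cases} 1, & p = (q + c_m - L_m/2) \bmod N, \\ 0, & \text{otherwise}. \end{cases}$$

Here, $c_m$ is the center bin of subband $m$. The phase rotation needed to maintain phase continuity between consecutive overlapping processing blocks is given as

$$\theta_{m,r} = e^{\,j2\pi c_m r N_{\mathrm{S}}/N}.$$

For further details, see [18]. For OLA processing, the time-domain analysis window matrix $\mathbf{A}_{m,r} \in \mathbb{N}^{L_m\times L_m}$ is a diagonal weighting matrix given by

$$\mathbf{A}_{m,r} = \operatorname{diag}\big(\mathbf{0}_{L_{\mathrm{L},m}},\ \mathbf{1}_{L_{\mathrm{S},m}},\ \mathbf{0}_{L_{\mathrm{T},m}}\big).$$

For OLS processing, the time-domain synthesis window matrix $\mathbf{S}_{m,r} \in \mathbb{N}^{N\times N}$ is given by

$$\mathbf{S}_{m,r} = \operatorname{diag}\big(\mathbf{0}_{N_{\mathrm{L}}},\ \mathbf{1}_{N_{\mathrm{S}}},\ \mathbf{0}_{N_{\mathrm{T}}}\big).$$

C. Symbol-Synchronized TX FC Processing for One CP-OFDM Symbol
Fig. 6 illustrates the OLA-based FC processing of one CP-OFDM symbol in two processing blocks, corresponding to an overlap factor of $\lambda_m = 0.5$.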
Here, it is assumed that the OFDM modulation IFFT lengths and the FC-processing short transform (FFT) lengths are the same, that is, $L_{\mathrm{OFDM},m} = L_m$ for $m = 0, 1, \ldots, M-1$.
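In symbol-synchronized processing, the incoming symbol is first zero padded at the beginning and the end by $L_{\mathrm{L},m,k} = L_{\mathrm{L},m} - L_{\mathrm{CP},m,k}$ and $L_{\mathrm{T},m}$ zeros, respectively. This forms a zero-padded symbol $\tilde{\mathbf{x}}_{m,k}$ of length $3L_m/2$ as follows:

$$\tilde{\mathbf{x}}_{m,k} = \begin{bmatrix} \mathbf{0}_{L_{\mathrm{L},m,k}} \\ \mathbf{x}_{\mathrm{CP\text{-}OFDM},m,k} \\ \mathbf{0}_{L_{\mathrm{T},m}} \end{bmatrix}.$$

The $\mathbf{F}_{m,r}(\lambda_{m,r})$'s for $r = 0, 1$ then essentially process (filter and possibly interpolate) two overlapping segments of length $L_m$ from the zero-padded symbol. Let us denote these segments by $\mathbf{x}^{(r)}_{m,k} \in \mathbb{C}^{L_m\times 1}$ for $r = 0, 1$. The samples belonging to these processing segments are given by

$$[\mathbf{x}^{(r)}_{m,k}]_i = [\tilde{\mathbf{x}}_{m,k}]_{i + rL_m/2}, \quad i = 0, 1, \ldots, L_m-1.$$

Because the first processing block includes the CP, its effective overlap factor is reduced to $\lambda_{m,0} = 0.5 - L_{\mathrm{CP},m,k}/L_m$. Therefore, the first time-domain analysis window has to be redefined as well:

$$\mathbf{A}_{m,0} = \operatorname{diag}\big(\mathbf{0}_{L_{\mathrm{L},m,k}},\ \mathbf{1}_{L_{\mathrm{CP},m,k}+L_{\mathrm{S},m}},\ \mathbf{0}_{L_{\mathrm{T},m}}\big).$$

Let $\mathbf{y}^{(r)}_{m,k} \in \mathbb{C}^{N\times 1}$ for $r = 0, 1$ denote the product of the processing blocks and the transform matrices, that is, $\mathbf{y}^{(r)}_{m,k} = \mathbf{F}_{m,r}(\lambda_{m,r})\,\mathbf{x}^{(r)}_{m,k}$. Here, an additional phase rotation is included to compensate for the truncation of the CP length to an integer number of samples on the low-rate side. Finally, the filtered high-rate subband waveform of length $3N/2$ for the $k$th symbol on the $m$th subband is obtained by combining these filtered blocks as

$$\mathbf{y}_{m,k} = \begin{bmatrix} \mathbf{y}^{(0)}_{m,k} \\ \mathbf{0}_{N/2} \end{bmatrix} + \begin{bmatrix} \mathbf{0}_{N/2} \\ \mathbf{y}^{(1)}_{m,k} \end{bmatrix}.$$

As an alternative to the OLA scheme, the above processing can also follow the OLS approach, as illustrated in Fig. 7. The basic difference is that the time-domain windowing is now realized after the convolution, and only one of the filtered blocks is nonzero in the overlapping regions. Abrupt truncation of the output waveform at the edges of the filtered symbol can be avoided by smoothly tapering two window edges, as shown by the dashed line in Fig. 7. These are the rising edge of the first time-domain synthesis window $\mathbf{S}_{m,0}$ and the falling edge of the second time-domain synthesis window $\mathbf{S}_{m,1}$. Discontinuities in the output waveform give rise to high spectral leakage. Therefore, the OLA scheme is generally preferable on the TX side.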
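The zero padding, segmentation, and two-block OLA described above can be traced with the following Python sketch. The function names are hypothetical. The sketch assumes an overlap factor of 0.5 with $L_{\mathrm{L},m} = L_{\mathrm{T},m} = L_m/4$ and replaces the FC transforms by an identity, so only the indexing is exercised. With the redefined first analysis window passing the CP, the OLA of the two windowed segments rebuilds the zero-padded symbol exactly.

```python
def zero_pad_symbol(x_cp, cp_len, L):
    """Zero-padded symbol of length 3L/2 (overlap 0.5, L_L = L_T = L/4)."""
    L_L = L_T = L // 4
    lead = L_L - cp_len                   # L_L,m,k = L_L,m - L_CP,m,k
    if lead < 0:
        raise ValueError("CP longer than the leading overlap")
    return [0j] * lead + list(x_cp) + [0j] * L_T


def segments(x_pad, L):
    """The two overlapping length-L processing segments (r = 0, 1), hop L/2."""
    return [x_pad[r * L // 2:r * L // 2 + L] for r in (0, 1)]


def analysis_windows(L, cp_len):
    """Diagonals of A_m,0 (which also passes the CP) and A_m,1."""
    L_L = L_T = L // 4
    L_S = L // 2
    a0 = [0] * (L_L - cp_len) + [1] * (cp_len + L_S) + [0] * L_T
    a1 = [0] * L_L + [1] * L_S + [0] * L_T
    return a0, a1


def combine_blocks(y0, y1):
    """OLA of two length-N blocks into one length-3N/2 filtered symbol."""
    N = len(y0)
    out = list(y0) + [0j] * (N // 2)
    for i, v in enumerate(y1):
        out[N // 2 + i] += v
    return out
```

In a real transmitter, each windowed segment would pass through the per-block FC transform before `combine_blocks`, and the output blocks would have the high-rate length $N$.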

D. Symbol-Synchronized TX FC Processing for Multiple Symbols
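For multiple OFDM symbols, the high-rate filtered symbols are combined with symbol-wise OLA processing as follows:

$$\mathbf{y}_m = \sum_{k=0}^{K_{\mathrm{OFDM},m}-1} \mathbf{T}_{m,k}\,\mathbf{y}_{m,k}.$$

Here, $n_{m,k}$ is the starting index in $\mathbf{y}_m$ of the $k$th filtered block $\mathbf{y}_{m,k}$, which has length $3N_{\mathrm{OFDM},m}/2$, as illustrated in Fig. 8. The matrix $\mathbf{T}_{m,k}$ has $3N_{\mathrm{OFDM},m}/2$ columns and is zero except for an identity block starting at row $n_{m,k}$. It aligns the filtered symbols to their desired time-domain locations in the high-rate output sequence.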
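The following Python sketch shows one way to realize the symbol-wise OLA. The surviving text does not give an explicit expression for $n_{m,k}$, so this is an assumption. Each filtered block's useful part starts at the same offset within the block, so consecutive blocks are placed $N_{\mathrm{OFDM},m} + N_{\mathrm{CP},m,k}$ samples apart. This meets the high-rate CP duration with one-sample precision even when the low-rate CP was truncated. The function names are hypothetical.

```python
def block_starts(N_OFDM, N_CP):
    """Start index n_k of each length-3N/2 filtered symbol (first block at 0).

    N_CP lists the high-rate CP length of every symbol. Symbol k is placed
    N_OFDM + N_CP[k] samples after symbol k-1 (assumed alignment rule).
    """
    starts = [0]
    for cp in N_CP[1:]:
        starts.append(starts[-1] + N_OFDM + cp)
    return starts


def symbolwise_ola(blocks, starts):
    """Overlap-add the filtered symbols at their start indices."""
    y = [0j] * (starts[-1] + len(blocks[-1]))
    for block, n0 in zip(blocks, starts):
        for i, v in enumerate(block):
            y[n0 + i] += v
    return y
```

For $N = 64$ with high-rate CPs of 6, 5, and 5 samples, the blocks start at 0, 69, and 138. The 96-sample blocks then overlap by 27 samples.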

E. CP Extrapolation by TX FC Processing
Suppose that $L_{\mathrm{OFDM},m}$ is chosen such that the CP length on the low-rate side is not an integer, that is, $L_{\mathrm{OFDM},m} < 128$ for 5G-NR and LTE numerologies. In this case, the CP length can be rounded down to the next smaller integer as given by

$$L_{\mathrm{CP},m,k} = \left\lfloor \frac{N_{\mathrm{CP},m,k}}{I_m} \right\rfloor.$$

The FC-based filtering with the accompanying symbol-wise overlap-and-add processing given by (16) then inherently extrapolates $N_{\mathrm{CP},m,k} - (N/L_m)\,L_{\mathrm{CP},m,k}$ samples at the high-rate side. These samples correspond to the fractional part of the low-rate CP.
Consider, as an example, the 10 MHz 5G-NR or LTE case with 15 kHz SCS and normal CP length. The output sampling rate is $f_{\mathrm{s,out}} = 15.36$ MHz, and the useful OFDM symbol duration is $N_{\mathrm{OFDM},m} = 1024$ high-rate samples. The CP length is $N_{\mathrm{CP},m,k} = 80$ high-rate samples for the first symbol of each slot of 7 symbols (for $k \bmod 7 = 0$) and $N_{\mathrm{CP},m,k} = 72$ samples for the others (for $k \bmod 7 \neq 0$). In the continuous processing model, the smallest possible OFDM IFFT length is then $L_{\mathrm{OFDM},m} = 128$, corresponding to an input sampling rate of $f_{\mathrm{s,in}} = 1.92$ MHz. The CP lengths are $L_{\mathrm{CP},m,k} = 10$ and $L_{\mathrm{CP},m,k} = 9$ low-rate samples for the first and other symbols, respectively. With the discontinuous FC processing model, however, narrow subband allocations of 12, 24, or 48 subcarriers (SCs) allow the OFDM IFFT length $L_{\mathrm{OFDM},m}$ to be reduced to 16, 32, or 64, respectively. At the low-rate side, these transform lengths correspond to CP lengths $L_{\mathrm{CP},m,k}$ of 1.25, 2.5, and 5.0 samples for the first symbol, or 1.125, 2.25, and 4.5 samples for the others. The same transform (IFFT) lengths are used for the subband OFDM signal generation. One important case is NB-IoT, which uses a 180 kHz transmission bandwidth corresponding to a single PRB (12 SCs). A traditional NB-IoT implementation needs an IFFT length of 128 and a sampling rate of 1.92 MHz. Discontinuous processing instead generates the signal with an IFFT of size 16 for OFDM symbol generation and FC processing at a sampling rate of 240 kHz.
Fig. 9. Discontinuous OLS processing for FC-F-OFDM receiver with overlap factor of $\lambda = 0.5$. FC processing blocks are synchronized to the OFDM symbols. Four FC-processing blocks are needed for two OFDM symbols.
[…] in order to process two CP-OFDM symbols. In this case, the CPs are discarded after the FC filtering as part of normal RX CP-OFDM reception. With continuous processing, the FC-processing chain must wait for a varying number of samples of the second OFDM symbol before it can process the third FC-processing block. Only then does it obtain the final output samples of the first OFDM symbol. In discontinuous processing, as seen in Fig. 9, the RX waits only for the samples of the first OFDM symbol and the desired number of overlapping samples from the beginning of the second symbol. It can then process the second FC-processing block, which provides the last filtered output samples of the first OFDM symbol.
For maximal timing-adjustment flexibility, the number of samples collected from the following CP-OFDM symbol corresponds to the number of overlapping samples at the end of the FC processing block. In addition, in discontinuous processing, the first two FC processing blocks can be processed independently of the two following FC processing blocks, because they represent different OFDM symbols. In continuous processing, the two OFDM symbols are linked through common samples in the third FC processing block.
Because the content (FC block contains first or second half of OFDM symbol) and processing of even and odd FC processing blocks remain constant over the whole RX signal, we can process even and odd blocks separately from each other. This allows for an implementation where even and odd FC processing blocks are processed in parallel FC processing chains, allowing to minimize the latency of the RX implementation.
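In the RX case, the transform matrices for FC processing with the OLA and OLS schemes are obtained by reversing the TX processing chain. Each high-rate block is transformed by an $N$-point DFT, the bins of subband $m$ are selected and weighted, and an $L_m$-point IDFT returns the block to the low-rate time domain. Correspondingly, the OLA scheme applies the time-domain analysis window to the high-rate input block, whereas the OLS scheme applies the time-domain synthesis window to the low-rate output block. The RX-side discontinuous FC processing starts by segmenting the received high-rate waveform $\hat{\mathbf{y}}_{\mathrm{FC\text{-}F\text{-}OFDM}}$ into blocks $\hat{\mathbf{y}}_{m,k}$ of length $3N/2$ as

$$[\hat{\mathbf{y}}_{m,k}]_i = [\hat{\mathbf{y}}_{\mathrm{FC\text{-}F\text{-}OFDM}}]_{n_{m,k}+i}, \quad i = 0, 1, \ldots, 3N/2-1.$$

For simplicity, the received waveform is assumed to have the length given by (16d). It is also assumed to be synchronized so that the leading zero-padding extension of the first filtered symbol precedes the first data symbol. The high-rate symbols $\hat{\mathbf{y}}_{m,k}$ are further divided into overlapping blocks of length $N$ as

$$[\hat{\mathbf{y}}^{(r)}_{m,k}]_i = [\hat{\mathbf{y}}_{m,k}]_{i + rN/2}, \quad i = 0, 1, \ldots, N-1,\ r = 0, 1.$$

These blocks are processed by the RX transform matrices. The resulting low-rate OFDM symbol $\hat{\mathbf{x}}_{m,k}$ is converted back to the frequency domain as part of the OFDM demodulation process as follows:

$$\hat{\mathbf{d}}_{m,k} = \mathbf{W}_{L_{\mathrm{OFDM},m}}\,\hat{\mathbf{x}}_{m,k}.$$

Alternatively, the concatenation and the removal of the extensions can be combined into a single matrix operation applied to the filtered low-rate blocks.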
This latter form is beneficial when finding the simplified implementations for the discontinuous RX processing, as described in next section.
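The RX-side segmentation can be sketched in Python as follows, assuming that the receiver knows the start index $n_{m,k}$ of each symbol's $3N/2$-sample segment. In the example, these indices are spaced by $N_{\mathrm{OFDM},m} + N_{\mathrm{CP},m,k}$, e.g., 0, 69, and 138 for $N = 64$ with high-rate CPs of 5 samples. The function name is hypothetical.

```python
def rx_segments(y_rx, starts, N):
    """Split a received high-rate waveform into per-symbol FC blocks.

    y_rx   -- received high-rate samples
    starts -- start index n_k of each symbol's 3N/2-sample segment
    N      -- long FC transform size
    Returns, per symbol, the two overlapping length-N blocks (r = 0, 1).
    """
    blocks = []
    for n0 in starts:
        seg = y_rx[n0:n0 + 3 * N // 2]
        if len(seg) < 3 * N // 2:
            raise ValueError("received waveform too short for the symbol at %d" % n0)
        blocks.append((seg[:N], seg[N // 2:N // 2 + N]))
    return blocks
```

Because each symbol's pair of blocks depends only on its own segment, the even ($r = 0$) and odd ($r = 1$) blocks can be handed to separate processing chains, as discussed above.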

V. IMPLEMENTATION COMPLEXITY
The FC processing complexity can be divided into two parts. The high-rate-side complexity $C_{\mathrm{high}}$ corresponds to the long FC-processing transform. The low-rate (or subband-wise) complexity $C_{\mathrm{low},m}$ corresponds to the short FC-processing transform, the frequency-domain windowing, and the OFDM (de)modulation transform. Let us denote the complexities of the FC RX processing long transform (FFT) and short transform (IFFT), in terms of the number of real multiplications, by $\mu(N)$ and $\mu(L_m)$, respectively. The OFDM transform (FFT) complexity is denoted by $\mu(L_{\mathrm{OFDM},m})$.
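The number of real multiplications per received data symbol can now be evaluated as

$$C = \frac{C_{\mathrm{high}} + \sum_{m=0}^{M-1} C_{\mathrm{low},m}}{\sum_{m=0}^{M-1} N_{\mathrm{act},m}}, \quad (21a)$$

where the high-rate complexity and the per-subband low-rate complexities are defined, respectively, as

$$C_{\mathrm{high}} = B\,\mu(N) \quad (21b)$$

and

$$C_{\mathrm{low},m} = B\beta\,\mu(L_m) + 6BN_{\mathrm{tb},m} + \mu(L_{\mathrm{OFDM},m}). \quad (21c)$$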
Here, $N_{\mathrm{tb},m}$ is the number of transition-band bins per transition band, $B = 2$ is the number of FC blocks per OFDM symbol, and $\beta \le 1$ is an implementation-related factor described in the next subsection. The factor 6 in (21c) arises because each subband needs two transition bands, and one complex non-trivial transition-band bin generally requires three real multiplications. For 5G-NR and LTE numerologies, the minimum allocation size is one PRB, corresponding to 12 subcarriers. In this case, the minimum usable FFT size for continuous processing is $L_{\mathrm{OFDM},m} = L_m = 128$, whereas discontinuous processing can use an FFT of size $L_{\mathrm{OFDM},m} = L_m = 16$. Assume that an efficient implementation (using the split-radix algorithm) requires $\mu(L) = L\log_2(L) - 3L + 4$ real multiplications for a transform of size $L$. Then each transform of size 128 requires $\mu(128) = 516$ real multiplications, and each transform of size 16 requires $\mu(16) = 20$. These complexities follow the scheme that uses three real multiplications per complex multiplication, as described in [23].
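The transform cost model and the per-symbol low-rate complexity can be evaluated with a short Python sketch. The split-radix cost formula is taken from the text. The low-rate expression follows the form $B\beta\mu(L_m) + 6BN_{\mathrm{tb},m} + \mu(L_{\mathrm{OFDM},m})$ and should be read as an assumption. The value $N_{\mathrm{tb},m} = 2$ in the example is purely illustrative, and the function names are hypothetical.

```python
def mu(L):
    """Split-radix FFT cost in real multiplications: L*log2(L) - 3L + 4 (L a power of two)."""
    k = L.bit_length() - 1
    if L < 2 or (1 << k) != L:
        raise ValueError("L must be a power of two >= 2")
    return L * k - 3 * L + 4


def low_rate_complexity(L, L_OFDM, N_tb, B=2, beta=1.0):
    """Assumed per-symbol low-rate cost: B*beta*mu(L) + 6*B*N_tb + mu(L_OFDM)."""
    return B * beta * mu(L) + 6 * B * N_tb + mu(L_OFDM)
```

For example, `low_rate_complexity(16, 16, N_tb=2)` gives 84 real multiplications for the direct form ($\beta = 1$) and 64 for the simplified form ($\beta = 1/2$).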

A. Simplified Implementation
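Fig. 11 shows the basic idea for joint processing of the IFFTs of the FC-based filter and the FFT of the OFDM receiver. The symbol-synchronized discontinuous RX processing scheme makes this structure possible.
Here, it is assumed that the length of the subband-wise OFDM symbol equals the size of the IFFT used in discontinuous RX FC processing, $L \equiv L_{\mathrm{OFDM},m} = L_m$. $L$ is selected such that it contains all active SCs per subband and the transition-band bins used in the frequency-domain windowing. This windowing is performed in the "Block-wise FFT, subband FFT bin selection and weighting" block.
For simplicity, we denote the time-domain synthesis window matrix of subband $m$ by $\boldsymbol{\Gamma}$. The processing of the $k$th symbol from the frequency-domain FC output blocks $\mathbf{Y}_{m,2k}$ and $\mathbf{Y}_{m,2k+1}$ can now be written as

$$\hat{\mathbf{d}}_{m,k} = \mathbf{W}_L\big(\mathbf{P}(L/4)\,\boldsymbol{\Gamma}\,\mathbf{W}_L^{-1}\mathbf{Y}_{m,2k} + \mathbf{P}(-L/4)\,\boldsymbol{\Gamma}\,\mathbf{W}_L^{-1}\mathbf{Y}_{m,2k+1}\big). \quad (23)$$

Here, $\mathbf{P}(L/4)$ and $\mathbf{P}(-L/4)$ are given by (20g) and (20h), respectively. Alternatively, (23) can be represented as

$$\hat{\mathbf{d}}_{m,k} = \mathbf{W}_L\hat{\boldsymbol{\Gamma}}_0\mathbf{W}_L^{-1}\boldsymbol{\Phi}(L/4)\,\mathbf{Y}_{m,2k} + \mathbf{W}_L\hat{\boldsymbol{\Gamma}}_1\mathbf{W}_L^{-1}\boldsymbol{\Phi}(-L/4)\,\mathbf{Y}_{m,2k+1}. \quad (24)$$

Here, $\boldsymbol{\Phi}(s) = \mathbf{W}_L\mathbf{P}(s)\mathbf{W}_L^{-1}$ is the diagonal phase-rotation matrix corresponding to a circular time shift of $s$ samples. $\hat{\boldsymbol{\Gamma}}_0 = \mathbf{P}(L/4)\,\boldsymbol{\Gamma}\,\mathbf{P}^{\mathsf{T}}(L/4)$ is the time-domain synthesis window matrix circularly shifted left and up by $L/4$ samples, and $\hat{\boldsymbol{\Gamma}}_1 = \mathbf{P}(L/2)\,\hat{\boldsymbol{\Gamma}}_0\,\mathbf{P}^{\mathsf{T}}(L/2)$. In other words, the above formulation relies on rotating $\mathbf{Y}_{m,2k+1}$ and the corresponding window by $L/2$ samples. In this case, $\mathbf{W}_L\hat{\boldsymbol{\Gamma}}_0\mathbf{W}_L^{-1}$ and $\mathbf{W}_L\hat{\boldsymbol{\Gamma}}_1\mathbf{W}_L^{-1}$ form a lowpass-highpass filter pair, which essentially means that $\hat{\boldsymbol{\Gamma}}_0 + \hat{\boldsymbol{\Gamma}}_1 = \mathbf{I}$. Therefore, (24) can be rewritten as

$$\hat{\mathbf{d}}_{m,k} = \boldsymbol{\Phi}(-L/4)\,\mathbf{Y}_{m,2k+1} + \mathbf{W}_L\hat{\boldsymbol{\Gamma}}_0\mathbf{W}_L^{-1}\big(\boldsymbol{\Phi}(L/4)\,\mathbf{Y}_{m,2k} - \boldsymbol{\Phi}(-L/4)\,\mathbf{Y}_{m,2k+1}\big). \quad (25)$$

The low-rate complexity of (25) per OFDM symbol is one IFFT and one FFT of length $L$, as depicted in Fig. 12. The direct approach illustrated in Fig. 11 requires two IFFTs and one FFT of length $L$. The simplified processing therefore saves 33.3 % of the real multiplications. In (21), $\beta = 1/2$ gives the complexity of the simplified processing, whereas $\beta = 1$ gives the complexity of the direct approach. The half-band filtering $\mathbf{W}_L\hat{\boldsymbol{\Gamma}}_0\mathbf{W}_L^{-1}$ in (25) can be further decomposed into smaller transforms to achieve some implementation savings; however, these decompositions are beyond the scope of this paper.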
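The equivalence between the direct RX form and the simplified single-IFFT form can be checked numerically with the following Python sketch. The sketch rests on three assumptions:
- a rectangular synthesis window that keeps the central $L/2$ samples of each block, so that the shifted windows satisfy $\hat{\boldsymbol{\Gamma}}_0 + \hat{\boldsymbol{\Gamma}}_1 = \mathbf{I}$;
- naive DFTs in place of FFTs;
- hypothetical function names.

The phase factors $e^{\pm j\pi q/2}$ used for the $\pm L/4$ shifts are powers of $j$, so they add no real multiplications. This is consistent with the stated saving of one transform out of three.

```python
import cmath
import math


def dft(x):
    """Naive L-point DFT: X[q] = sum_n x[n] exp(-j 2 pi q n / L)."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * q * n / L) for n in range(L))
            for q in range(L)]


def idft(X):
    """Naive L-point inverse DFT with 1/L scaling."""
    L = len(X)
    return [sum(X[q] * cmath.exp(2j * math.pi * q * n / L) for q in range(L)) / L
            for n in range(L)]


def phase_shift(X, s):
    """Phi(s) X: frequency-domain equivalent of a circular left shift by s samples."""
    L = len(X)
    return [X[q] * cmath.exp(2j * math.pi * q * s / L) for q in range(L)]


def direct_rx(Y0, Y1):
    """Direct form: two IFFTs, keep the central halves, concatenate, one FFT."""
    L = len(Y0)
    a, b = idft(Y0), idft(Y1)
    return dft(a[L // 4:3 * L // 4] + b[L // 4:3 * L // 4])


def simplified_rx(Y0, Y1):
    """Simplified form with one IFFT and one FFT.

    d = Phi(-L/4) Y1 + W G0 W^-1 (Phi(L/4) Y0 - Phi(-L/4) Y1),
    where G0 keeps the first L/2 time samples (rectangular window assumed).
    """
    L = len(Y0)
    U0 = phase_shift(Y0, L // 4)
    U1 = phase_shift(Y1, -(L // 4))
    u = idft([p - q for p, q in zip(U0, U1)])
    u = [v if n < L // 2 else 0j for n, v in enumerate(u)]  # apply G0
    return [p + q for p, q in zip(U1, dft(u))]
```

For any pair of length-16 frequency-domain blocks, both functions return the same demodulated subcarrier vector up to floating-point rounding.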

VI. NUMERICAL RESULTS
In this section, we analyze the performance of the discontinuous FC processing in terms of uncoded BER in different interference and channel conditions. We also compare the complexity of continuous and discontinuous FC processing. Here, we assume an overlap factor of $\lambda = 0.5$. Compared with an overlap of $\lambda = 0.5$, continuous FC processing with an overlap of $\lambda = 0.25$ marginally degrades the spectral containment and error vector magnitude (EVM) performance. In return, it has a somewhat lower implementation complexity.

A. Bit-Error Rate Performance in Narrow-band Allocations
Figs. 13-17 compare the simulated uncoded BER performance of different CP-OFDM configurations, with or without subband filtering. The scenario is a 10 MHz 5G-NR uplink with a high-rate IFFT size of $N = 1024$ and an SCS of 15 kHz. Table I shows details of the considered scenarios and filtering configurations. The channel models used are additive white Gaussian noise (AWGN) and tapped-delay line (TDL)-C, one of the channel models considered in the 5G-NR development [24]. Two values of the root-mean-squared (RMS) channel delay spread, 300 ns and 1000 ns, are used for the TDL-C channel. Two subband configurations are considered: a single PRB or four PRBs of 12 subcarriers. In both cases, four deactivated subcarriers are used as guard bands. Focusing on the asynchronous uplink scenario, different instances of the channel model are always used for the three adjacent subbands included in the simulations. Perfect power control is assumed, so the three adjacent subbands are always received at the same power level and with constant SNR for all channel instances. Fig. 13(a) shows the BER simulation results for the AWGN channel in the synchronous scenario with a single PRB per subband. In this case, the TX-filtered continuous and discontinuous approaches and the RX-filtering-only scheme perform practically the same. Compared with basic synchronous OFDM, the filtered schemes show a minor performance loss. In the asynchronous scenario, depicted in Fig. 13(b), the performance of the TX-filtered schemes remains close to the theoretical one. In contrast, the BER floors of the RX-filtering-only and basic OFDM schemes are about 0.5 % and 3 %, respectively. Fig. 14(a) shows the BER simulation results for the TDL-C 300 ns channel in the synchronous scenario with a single PRB per subband. In this case, the maximum delay spread of the channel (about 2.6 µs) is well below the CP length (about 4.7 µs).
Compared with basic synchronous OFDM, the schemes with filtering at both ends show minor performance degradation, while the degradation of the RX-filtering-only scheme is more visible. In the asynchronous case, the benefits of subband-filtered OFDM are clearly visible, as illustrated in Fig. 14(b). Fig. 15(a) compares the performance for the TDL-C 300 ns channel in the synchronous scenario with four PRBs per subband. The performance differences between the schemes are now smaller because, on average, the wider subbands suffer less from interference between the subbands. The same trend appears in the asynchronous results in Fig. 15(b), where the degradation of the RX-filtering-only and basic OFDM schemes is less pronounced. Fig. 16(a) shows the BER simulation results for the TDL-C 1000 ns channel with a single PRB per subband. In this case, the maximum delay spread of the channel (about 8.7 µs) exceeds the CP duration (about 4.7 µs), resulting in higher error floors in all configurations. The same conclusions can be drawn as above, except that the TX-filtered schemes now perform better than basic OFDM. The degradation of the basic OFDM scheme is due to the inter-carrier interference (ICI) induced by the increased frequency dispersion of the channel. For TX-filtered OFDM, the better spectral containment also provides better protection against the ICI [25], [26]. In the asynchronous scenario, illustrated in Fig. 16(b), the TX-filtered schemes perform approximately as in the synchronous scenario. The basic OFDM and RX-filtering-only schemes, however, have considerably higher error floors than in the synchronous cases.
The BER simulation results for the TDL-C 1000 ns channel with four PRBs per subband are shown in Figs. 17(a) and (b) for the synchronous and asynchronous scenarios, respectively. As seen from these figures, the performance improvement of the FC-filtered waveforms remains consistent with the other results.

B. Bit-Error Rate Performance in Wide-band Allocations
Figs. 18 and 19 compare the simulated uncoded BER performance of continuous and discontinuous subband filtering in a 10 MHz 5G-NR uplink scenario with a high-rate IFFT size of 1024 and an SCS of 15 kHz. A single-subband configuration is considered, with 52 active PRBs of 12 subcarriers each. In this case, eight non-active subcarriers are used as transition bands. Table II shows the details of the considered scenarios and filtering configurations. The channel models used are AWGN and TDL-C, and again two values of the RMS channel delay spread, 300 ns and 1000 ns, are used for the TDL-C channel. Fig. 18(a) shows the uncoded BER for QPSK, 16-QAM, and 64-QAM modulations. As seen from this figure, both continuous and discontinuous processing reach the theoretical BER performance in the AWGN channel. As illustrated in Fig. 18(b), the power spectral density (PSD) in the out-of-band region is about 5 dB lower for continuous processing than for discontinuous processing. The average passband EVM values for continuous and discontinuous processing are −63.8 dB and −63.4 dB, respectively.
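Reference curves for the theoretical uncoded BER of QPSK, 16-QAM, and 64-QAM in AWGN can be approximated with the standard Gray-coded square M-QAM expression BER ≈ (4/log2 M)(1 − 1/√M) Q(√(3 log2 M · Eb/N0 / (M − 1))), which is exact for QPSK. The Python sketch below is only an assumption about how such reference curves may be computed; the exact expressions used for the figures are not given in this section, and the function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qam_ber_awgn(m: int, ebn0_db: float) -> float:
    """Approximate uncoded BER of Gray-coded square M-QAM in AWGN.

    Nearest-neighbour approximation; exact for QPSK (m = 4).
    """
    k = math.log2(m)                      # bits per symbol
    ebn0 = 10.0 ** (ebn0_db / 10.0)       # Eb/N0 in linear scale
    arg = math.sqrt(3.0 * k * ebn0 / (m - 1))
    return (4.0 / k) * (1.0 - 1.0 / math.sqrt(m)) * q_function(arg)


if __name__ == "__main__":
    for m, name in ((4, "QPSK"), (16, "16-QAM"), (64, "64-QAM")):
        row = ", ".join(f"{qam_ber_awgn(m, snr):.2e}" for snr in (0, 5, 10))
        print(f"{name}: BER at Eb/N0 = 0, 5, 10 dB -> {row}")
```

A simulated curve that overlays these values over the SNR range of interest indicates that the filtering adds no noticeable inband distortion in AWGN.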
The uncoded BER performance in the TDL-C 300 ns and 1000 ns channels is shown in Figs. 19(a) and 19(b), respectively. As seen from Fig. 19(a), all the unfiltered and filtered schemes perform approximately the same in the TDL-C channel with 300 ns delay spread. However, for the channel model whose maximum delay spread exceeds the CP duration, as illustrated in Fig. 19(b), the pulse shaping provided by the filtering gives slightly improved performance over the plain CP-OFDM waveform.

C. Implementation Complexity
The short FC-transform length is equal to the IFFT length in OFDM generation, and it is selected as the smallest feasible power-of-two value. Notably, the smallest short transform length in continuous processing is 128, while in discontinuous processing a length of 16 can be used for a single PRB (12 active subcarriers) … pronounced and significant also for higher slot lengths. In these cases, the complexity of discontinuous processing is lower than or similar to that of continuous processing with 25 % overlap.
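The transform-length limits above can be illustrated with a small search. The Python sketch below assumes normal-CP lengths of 80 and 72 samples at a 1024-point IFFT with 15 kHz SCS (first symbol of each half-subframe and the remaining symbols, respectively). It treats continuous processing as requiring integer CP lengths after scaling to an L-point IFFT, whereas discontinuous processing, with extrapolated CP samples, is assumed to require only that L covers the active subcarriers. These simplified feasibility rules and the function names are hypothetical, not the paper's exact design procedure; the resulting sampling rate is L times the SCS.

```python
from fractions import Fraction

SCS_HZ = 15_000
N_REF = 1024
# Assumed normal-CP lengths (samples) at N_REF: longer CP on the first
# symbol of each half-subframe, shorter CP on the other symbols.
CP_AT_N_REF = (80, 72)


def scaled_cp_lengths(L: int) -> list:
    """CP lengths, as exact fractions, when scaled to an L-point IFFT."""
    return [Fraction(cp * L, N_REF) for cp in CP_AT_N_REF]


def smallest_pow2(n_active: int, need_integer_cp: bool) -> int:
    """Smallest power-of-two length covering n_active subcarriers.

    If need_integer_cp is True, every scaled CP length must also be an integer.
    """
    L = 1
    while L < n_active or (
        need_integer_cp and any(c.denominator != 1 for c in scaled_cp_lengths(L))
    ):
        L *= 2
    return L


if __name__ == "__main__":
    n_act = 12  # single PRB
    for label, need_cp in (("continuous", True), ("discontinuous", False)):
        L = smallest_pow2(n_act, need_cp)
        print(f"{label}: L = {L}, sampling rate = {L * SCS_HZ / 1e3:.0f} kHz")
```

Under these assumptions, 64 fails because the 72-sample CP would scale to 4.5 samples, so continuous processing stops at 128 (1.92 MHz), while the discontinuous case reaches 16 (240 kHz).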
3) Recall that with 50 % overlap, the imperfections of FC processing can be ignored, while the use of 25 % overlap degrades the performance with high modulation and coding schemes (MCSs).
4) The use of a non-power-of-two short transform length in building the CP-OFDM symbols and FC-processing blocks may give a significant complexity reduction in both continuous and discontinuous FC processing, especially in the channel filter use case. This is shown for the fullband transmission case with a transform length of 768 instead of 1024 (channel filtering example). This transform length can be efficiently implemented by three FFTs of length 256 and some additional twiddle factors.
5) Discontinuous processing allows a single-PRB subband signal (e.g., for NB-IoT) to be generated using an FFT size of 16 for the OFDM symbol generation and FC processing at a sampling rate of 240 kHz, while prior-art implementations require a sampling rate of 1.92 MHz with an FFT size of 128.
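Item 4) mentions implementing the 768-point transform with three 256-point FFTs and additional twiddle factors. A minimal NumPy sketch of one such decomposition, a single radix-3 decimation-in-time step, is given below; the function name is hypothetical, and the decomposition used in the reported complexity figures may differ.

```python
import numpy as np


def dft768(x: np.ndarray) -> np.ndarray:
    """768-point DFT from three 256-point FFTs plus twiddle factors.

    Uses X[k] = sum_p W^(p*k) * X_p[k mod 256], with W = exp(-2j*pi/768)
    and X_p the 256-point FFT of the decimated sequence x[3m + p].
    """
    x = np.asarray(x, dtype=complex)
    n, radix = 768, 3
    m = n // radix  # 256
    if x.shape != (n,):
        raise ValueError("expected a length-768 vector")
    # Three 256-point FFTs of the decimated subsequences x[3m + p].
    sub_ffts = [np.fft.fft(x[p::radix]) for p in range(radix)]
    k = np.arange(n)
    out = np.zeros(n, dtype=complex)
    for p in range(radix):
        twiddle = np.exp(-2j * np.pi * p * k / n)  # W^(p*k)
        out += twiddle * sub_ffts[p][k % m]        # radix-3 combination
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(768) + 1j * rng.standard_normal(768)
    print("matches np.fft.fft:", np.allclose(dft768(x), np.fft.fft(x)))
```

For clarity, the combination step is written as full-length vector operations; an optimized version would exploit the 256-periodicity of each X_p and apply 256 size-3 butterflies with the corresponding twiddle factors.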
The complexity evaluations for time-domain filtered OFDM and windowed overlap-and-add (WOLA) schemes and their relative complexity with respect to continuous FC processing are given in [18], [20].

VII. CONCLUSIONS
In this article, a discontinuous symbol-synchronized fast-convolution (FC) processing technique was proposed, with particular emphasis on physical-layer processing in 5G-NR and beyond mobile radio networks. The proposed processing approach was shown to offer various benefits over the basic continuous FC scheme, specifically in terms of reduced complexity and latency as well as increased parametrization flexibility. The additional inband distortion effects stemming from the proposed scheme were found to have only a very minor impact on the link-level performance. The benefits are particularly important in specific application scenarios, such as the transmission of single or multiple narrow subbands, or mini-slot transmission, which is a core element of the ultra-reliable low-latency communication service of fifth-generation new radio (5G-NR).
Generally, discontinuous FC processing can be regarded as an additional useful element in the toolbox of frequency-domain waveform processing algorithms, and it can be useful for various other signal processing applications as well.
His current research interests include enhanced orthogonal frequency-division multiplexing waveforms and advanced multicarrier schemes.
Toni Levanen received the M.Sc. and D.Sc. degrees from Tampere University of Technology (TUT), Finland, in 2007 and 2014, respectively. He is currently with the Department of Electrical Engineering, Tampere University.
In addition to his contributions to academic research, he has worked in industry on a wide variety of development and research projects. His current research interests include physical layer design for 5G NR and beyond.
Kari Pajukoski received his B.S.E.E. degree from the Oulu University of Applied Sciences in 1992. He is a Fellow with Nokia Bell Labs. He has broad experience in cellular standardization, link and system simulation, and algorithm development for products. He has more than 100 issued US patents, of which more than 50 have been declared “standards essential patents”. He is the author or coauthor of more than 300 standards contributions and 30 publications, including conference proceedings, journal contributions, and book chapters.
Arto Palin has long industrial experience in wireless technologies, covering cellular networks, broadcast systems, and local area communications. He holds an M.Sc. (Tech.) degree from the former Tampere University of Technology and is currently working as a Technical Leader at Nokia Mobile Networks, Finland, in the area of 5G SoC architectures.