OTFS vs. OFDM in the Presence of Sparsity: A Fair Comparison

Many recent works in the literature declare that Orthogonal Time-Frequency-Space (OTFS) modulation is a promising candidate technology for high-mobility communication scenarios. However, a truly fair comparison with its direct competitor, the widely used Orthogonal Frequency-Division Multiplexing (OFDM) modulation, has not yet been provided. In this paper, we present such a fair comparison between the two digital modulation formats in terms of achievable communication rate. In this context, we explicitly address the problem of channel estimation by considering, for each modulation, a pilot scheme and the associated channel estimation algorithm specifically adapted to sparse channels in the Doppler-delay domain, targeting the optimization of the pilot overhead to maximize the overall achievable rate. In our achievable rate analysis we also consider the presence of a guard interval or cyclic prefix. The results are supported by numerical simulations, for different time-frequency selective channels including multiple scattering components and under imperfect channel state information resulting from the considered pilot schemes. This work does not claim to establish definitively which modulation format is best, since such a choice depends on many other features that are outside the scope of this work (e.g., legacy, intellectual property, ease and know-how for implementation, and many other criteria). Nevertheless, we provide the foundations to properly compare multi-carrier communication systems in terms of their information-theoretic achievable rate potential, within meaningful and sensible assumptions on the channel models and on the receiver complexity (both in terms of channel estimation and in terms of soft-output symbol detection).


I. INTRODUCTION
In any communication scenario, the channel state information (CSI), i.e., the knowledge of the communication channel, is required at the receiver in order to perform coherent detection [1]. The most common approach to acquire CSI is through the transmission of known pilot symbols [1]. Generally, these pilots are arranged within the block of information symbols following a chosen fixed pattern known to both transmitter and receiver (see, e.g., [2]-[4]).¹ It is also well known that, subject to the meaningful and widely used assumption of block fading (i.e., the propagation channel remains constant over blocks of consecutive time-domain symbols, and may change independently from block to block), pilot-aided schemes are indeed nearly information-theoretically optimal in terms of the capacity scaling in the high spectral efficiency / high signal-to-noise ratio (SNR) regime (see, e.g., [5]-[8]). On the other hand, both the pilot pattern and the channel estimation algorithm should be optimized for the particular communication scenario. Since the channel is noisy, the CSI is inevitably affected by an estimation error, whose magnitude depends on the channel SNR and on the number of pilots per block. The estimated channel coefficients are then used to perform coherent detection at the receiver side. Taking as the most relevant performance metric the achievable communication rate, i.e., the amount of useful information sent in a block of symbols, there is a tension between the number of pilot symbols per block dedicated to CSI estimation and the number of information-bearing symbols. The optimization of this tradeoff is generally not trivial and depends on the modulation format and on the channel propagation characteristics.

This work has been supported by MIUR under the PRIN Liquid Edge contract.
Lorenzo Gaudio was with the Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy (e-mail: lorenzo.gaudio@studenti.unipr.it). Giulio Colavolpe is with the Department of Engineering and Architecture, University of Parma, 43124 Parma, Italy (e-mail: giulio.colavolpe@unipr.it) and CNIT Research Unit, I-43124 Parma, Italy. Giuseppe Caire is with the Department of Electrical Engineering and Computer Science, Technical University of Berlin, 10623 Berlin, Germany (e-mail: caire@tu-berlin.de). A preliminary version of this paper, titled "On achievable rate of OFDM and OTFS in the presence of sparsity," was presented in part at the IEEE International Conference on Communications 2021, Montreal, Canada.
In this work, we compare two digital modulation formats designed to handle time-frequency selective channels, namely orthogonal time frequency space (OTFS) [9] and orthogonal frequency division multiplexing (OFDM), in terms of pragmatic capacity, i.e., the achievable rate of the channel induced by the signal constellation and the detector soft-output [10], [11].² In general, a soft-output detector at the receiver side produces an estimate of the posterior probability of the transmitted symbols given the received signal block (pilots and data). This estimated posterior probability (e.g., in the form of log-likelihood ratios) is then passed to a decoder, which treats the sequence of soft-output symbols as the output of a virtual channel. The pragmatic capacity is the capacity of such a virtual channel, with discrete input represented by the modulation symbols, and soft-output generated by the detector. Hence, the pragmatic capacity is representative of the achievable rate under the assumption of separate detection and decoding, i.e., without "turbo" reprocessing of the decoder output (see, e.g., [13], [14] and references therein). In practice, iterative "turbo" detection is very hard to implement, since the detector is often implemented in hardware (e.g., in an integrated circuit) while the decoder is implemented in software, possibly even in a different location (as, for example, in the so-called 7.2 split between hardware and software, enabling cloud-based processing of the signals from remote radio heads [15]). For this reason, we believe that the pragmatic capacity for separated detection and decoding is a very meaningful performance metric to compare different modulation formats and the associated pilot schemes and soft-output detectors.

¹ In some communication scenarios the transmitter could send an entire block of pilot symbols, i.e., without any useful information, followed by blocks with no pilots. Thus, the symbol detection is based on the channel estimation made from the first block. In this work we do not consider such a case, and focus instead on a per-block channel estimation, with its associated benefits and losses.
² Indeed, the analysis can be extended to the many multicarrier modulation formats proposed so far in the literature (e.g., see [12]). However, we consider here OFDM only, since it is by far the most widely used scheme in modern standards.
In order to make a fair comparison between the two modulation formats, we take into account the pilot overhead resulting from the optimization of the above-mentioned tradeoff. Since the loss associated with the presence of pilots cannot be neglected, the achievable communication rate inevitably deviates from the upper limit of additive white Gaussian noise (AWGN) capacity for the same constellation format. Moreover, we also consider the presence of a guard interval (GI) or cyclic prefix (CP), which additionally reduces the achievable rate. One GI for each block is used in OTFS to avoid inter-block interference (IBI) [16], while a CP is used for every OFDM symbol to avoid inter-symbol interference (ISI) and to make symbols orthogonal in the time-frequency domain. Here a first significant difference between the two modulation formats appears. In fact, in order to accommodate typical channel delay spreads, the CP length in several communication standards based on OFDM may be a large fraction (up to 25%) of the symbol time (see, e.g., [17] and references therein), leading to a remarkable loss in terms of capacity. On the other hand, OTFS does not need a per-symbol separation, but this comes at the cost of a non-negligible increase in signal processing complexity [18].³ The time-frequency selective channels of interest mainly target outdoor scenarios with high mobility, since OTFS has been specifically proposed for these cases, where the Doppler spreads are significant and where there are few reflectors (or groups of reflectors with similar properties), and thus a small number of multipath components [9]. Since the properties of the communication channel represented in the Doppler-delay domain depend on the physical geometry of the environment, the scattering components are sparse in the Doppler-delay plane.⁴ In this context, exploiting the channel sparsity plays a fundamental role, since estimation algorithms built on this concept exhibit very good tradeoffs between pilot overhead, complexity, and estimation error.
The idea of exploiting the channel sparsity in multi-carrier systems is well established and shared by other estimation techniques in the current literature, which generally applies concepts from compressed sensing (CS) (e.g., see [19]- [26]), for both OFDM and OTFS modulations. However, it should be noticed that CS is meaningful when the measurements (i.e., the pilot symbols) are obtained in the dual domain with respect to the domain in which the channel is sparse. Thus, in our case, the channel is sparse in the Doppler-delay domain, for which the dual domain (related by a two-dimensional Fourier transform) is the time-frequency domain. In the considered pilot scheme for OTFS, the pilots are directly placed in the sparsity domain (Doppler-delay). Hence, direct estimation via the maximum likelihood (ML) approach is both efficient and computationally feasible. In contrast, the OFDM modulation format places the pilot symbols in the time-frequency domain. Therefore, in this case it makes sense to consider a CS-based channel estimation approach.
Based on the aforementioned discussion, the proposed channel estimation algorithm for OTFS takes into account a pilot scheme similar to the one proposed in [4] (adopted in some other works, see, e.g., [27]), which considers a high-energy center pilot (or a cluster of pilots [19], to counteract possible destructive non-linear amplification effects over the single pilot) surrounded by zeros in the transmitted two-dimensional (Doppler-delay domain) block of symbols. This configuration of pilots and information symbols is a natural consequence of the input-output relation of OTFS, where the channel causes a cyclic shift of the transmitted symbol by a quantity proportional to the delay and Doppler shifts associated with each channel path. The estimation algorithm for OTFS is based on the ML approach of [18], with the introduction of a low-complexity pilot-based mechanism to detect channel multipath components and achieve a first coarse estimation of the relevant parameters. On the other hand, for OFDM we adopt a well-known CS-based estimation algorithm based on the least-absolute shrinkage and selection operator (LASSO) [28] (i.e., $\ell_1$-norm regularized least squares minimization). In this case, the pilots placed in the time-frequency domain define the sensing matrix, and the proposed estimation algorithm makes use of the soft-thresholding iterative algorithm [29], optimized to work efficiently in our system setup [30]-[32]. The pilot configuration for OFDM is discussed in Sec. II.
The paper is organized as follows. In Sec. II and in Sec. III we present the input-output relation, the used pilot scheme, and the proposed channel estimation algorithm for OFDM and OTFS modulation, respectively. Sec. IV discusses the numerical results, while Sec. V concludes the paper.

II. OFDM MODULATION AND THE CS ALGORITHM
We consider OFDM with per-symbol CP transmitted over a time-frequency selective channel, assuming perfect symbol orthogonality and absence of inter-carrier interference (ICI). The channel impulse response (CIR) in the time-frequency domain is given by [33]
$$H(t, f) = \sum_{p=0}^{P-1} h_p \, e^{j2\pi\nu_p t} \, e^{-j2\pi\tau_p f}, \tag{1}$$
where $P$ is the number of multipath scattering components and $h_p$, $\nu_p$, and $\tau_p$ are the complex channel gain including the pathloss, the Doppler shift, and the delay associated with the $p$th scattering component, respectively. Note that the maximum channel delay and Doppler shift are assumed to satisfy
$$\nu_{\max} < \Delta f, \qquad \tau_{\max} < T, \tag{2}$$
where $T$ is the symbol time and $\Delta f$ is the subcarrier spacing. By discretizing the time axis at steps $nT$, for $n = 0, \ldots, N-1$, and the frequency axis at steps $m\Delta f$, for $m = 0, \ldots, M-1$, we obtain the discrete time-frequency channel matrix
$$\mathbf{H} = \sum_{p=0}^{P-1} h_p \, \mathbf{a}(\nu_p) \, \mathbf{b}^{\mathsf{H}}(\tau_p), \tag{3}$$
where
$$\mathbf{a}(\nu_p) = \left[1, e^{j2\pi\nu_p T}, \ldots, e^{j2\pi\nu_p (N-1)T}\right]^{\mathsf{T}}, \tag{4}$$
$$\mathbf{b}(\tau_p) = \left[1, e^{j2\pi\tau_p \Delta f}, \ldots, e^{j2\pi\tau_p (M-1)\Delta f}\right]^{\mathsf{T}}, \tag{5}$$
and $(\cdot)^{\mathsf{T}}$ and $(\cdot)^{\mathsf{H}}$ denote the transpose and the conjugate transpose (Hermitian) operation, respectively. By gathering the information symbols $\{x_{n,m}\}$ in an $N \times M$ matrix $\mathbf{X}$, the expression of the received samples after transmission over the channel in (3) is
$$\mathbf{Y} = \mathbf{H} \odot \mathbf{X} + \mathbf{Z}, \tag{6}$$
where $\odot$ denotes element-wise multiplication and $\mathbf{Z}$ is the AWGN with zero mean and covariance matrix $\sigma^2 \mathbf{I}_{NM}$. Information symbols may belong to any suitable complex modulation alphabet.
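As a minimal numerical sketch of the discrete channel matrix in (3) and the received samples in (6), the following NumPy fragment builds the channel as a sum of rank-1 terms and applies it element-wise to a block of QPSK symbols. The block size, subcarrier spacing, path parameters, and noise variance are hypothetical values chosen only for illustration, and T = 1/∆f is assumed (i.e., the CP duration is ignored); the sign convention of the delay phase follows the usual Fourier relation and is an assumption as well.

```python
import numpy as np

def steering(x, step, n):
    """Return [1, e^{j2*pi*x*step}, ..., e^{j2*pi*x*(n-1)*step}]."""
    return np.exp(2j * np.pi * x * step * np.arange(n))

def tf_channel(h, nu, tau, N, M, T, delta_f):
    """Discrete time-frequency channel: sum over paths of h_p a(nu_p) b^H(tau_p)."""
    H = np.zeros((N, M), dtype=complex)
    for h_p, nu_p, tau_p in zip(h, nu, tau):
        a = steering(nu_p, T, N)          # Doppler phase ramp over the N time steps
        b = steering(tau_p, delta_f, M)   # delay phase ramp over the M subcarriers
        H += h_p * np.outer(a, b.conj())  # rank-1 contribution of path p
    return H

# Hypothetical system and channel parameters (illustration only).
rng = np.random.default_rng(0)
N, M = 8, 16
delta_f = 15e3                   # subcarrier spacing [Hz]
T = 1.0 / delta_f                # symbol time [s], CP ignored
h = np.array([1.0, 0.5j])        # complex path gains
nu = np.array([300.0, -1200.0])  # Doppler shifts [Hz]
tau = np.array([1e-6, 4e-6])     # delays [s]

H = tf_channel(h, nu, tau, N, M, T, delta_f)

# Unit-energy QPSK symbols and complex AWGN with variance sigma2 per sample.
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
X = rng.choice(qpsk, size=(N, M))
sigma2 = 0.01
Z = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, M))
                           + 1j * rng.standard_normal((N, M)))
Y = H * X + Z                    # element-wise (ICI-free) input-output relation
```

Each entry `H[n, m]` equals the sum over paths of `h_p * exp(j2π ν_p nT) * exp(-j2π τ_p m∆f)`, which is the sampled version of the continuous channel.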
At this point, in order to help understand the rationale behind the adopted channel estimation algorithm, let us represent the channel in a different form. Suppose we define a Doppler-delay grid $\Gamma$, with some grid steps (both in the Doppler and delay domain) and total dimension $G$, given by the product between the Doppler and delay axis dimensions. For each grid point $\gamma_i \in \Gamma$, for $i = 0, \ldots, G-1$, the corresponding rank-1 channel component can be expressed, similarly to (3), as
$$\bar{\mathbf{H}}(\gamma_i) = \mathbf{a}(\nu(\gamma_i)) \, \mathbf{b}^{\mathsf{H}}(\tau(\gamma_i)), \tag{7}$$
in which $\tau(\gamma_i)$ and $\nu(\gamma_i)$ are two fixed values of delay and Doppler depending on the discretization point $\gamma_i$. By stacking the $N \times M$ matrices $\bar{\mathbf{H}}(\gamma_i)$ into column vectors for all points $\gamma_i \in \Gamma$ ($\mathrm{vec}(\cdot)$ operator) and concatenating the obtained vectors, we create a generally overcomplete dictionary matrix $\mathbf{D}$ of dimension $NM \times G$. We also define a $G \times 1$ vector $\mathbf{h}_{\mathrm{sp}}$ (read: "h-sparse"), representing the channel gains corresponding to each discrete Doppler and delay component $\gamma_i \in \Gamma$. Thus, the vectorized channel matrix $\bar{\mathbf{H}}$, which is an approximation of the true channel matrix $\mathbf{H}$ in (3), takes on the form $\mathrm{vec}(\bar{\mathbf{H}}) = \mathbf{D}\mathbf{h}_{\mathrm{sp}}$.
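Since the true channel contains only a small number $P \ll G$ of Doppler-delay components, the grid coefficient vector $\mathbf{h}_{\mathrm{sp}}$ is sparse, in the form
$$\mathbf{h}_{\mathrm{sp}} = \left[0, \ldots, 0, \bar{h}_0, 0, \ldots, 0, \bar{h}_{P-1}, 0, \ldots, 0\right]^{\mathsf{T}},$$
where the positions of the approximated (to the nearest grid step) channel coefficients $\bar{h}_i$ select the columns of $\mathbf{D}$ with the pair of channel parameters $(\tau(\gamma_i), \nu(\gamma_i))$, to overall represent the triplet $(\tau(\gamma_i), \nu(\gamma_i), \bar{h}_p)$ emulating the true channel parameters $(\tau_p, \nu_p, h_p)$. Thus, the approximated received samples expression in vectorized form can be written as
$$\bar{\mathbf{y}} = \mathbf{x} \odot \mathbf{D}\mathbf{h}_{\mathrm{sp}} + \mathbf{z},$$
where $\mathbf{x} = \mathrm{vec}(\mathbf{X})$ and $\mathbf{z} = \mathrm{vec}(\mathbf{Z})$. Moreover, by defining a selection matrix $\mathbf{S}$, of dimension $|\mathcal{P}| \times NM$, to choose $|\mathcal{P}|$ symbols (pilots) among the total $NM$ ($\mathcal{P}$ is the set of pilots and $|\cdot|$ indicates its cardinality), the transmitted vector of pilots $\mathbf{x}_{\mathrm{pl,OFDM}}$ (read: "x-pilot") takes the form
$$\mathbf{x}_{\mathrm{pl,OFDM}} = \mathbf{S}\mathbf{x}. \tag{12}$$
The estimation of the channel coefficients can be carried out by solving the problem known as LASSO [28], i.e.,
$$\hat{\mathbf{h}}_{\mathrm{sp}} = \arg\min_{\mathbf{h}} \left\| \mathbf{y}_{\mathrm{pl}} - \mathbf{x}_{\mathrm{pl,OFDM}} \odot \mathbf{S}\mathbf{D}\mathbf{h} \right\|_2^2 + \lambda \left\| \mathbf{h} \right\|_1, \tag{14}$$
where $\mathbf{S}\mathbf{D}$, under this configuration, takes the role of the sensing matrix of the CS configuration, and $\lambda$ is the LASSO regularizer (see details in Appendix A). Notice that $\mathbf{y}_{\mathrm{pl}}$, the vector of received samples at the pilot positions, is obtained through the transmission over the actual channel, and thus differs from $\bar{\mathbf{y}}_{\mathrm{pl}}$, obtained through the process of approximation of the channel matrix described before. The incurred approximation error should be kept small by appropriately choosing the grid $\Gamma$ with sufficiently fine discretization. In any case, the residual approximation error between $\mathbf{y}_{\mathrm{pl}}$ and $\bar{\mathbf{y}}_{\mathrm{pl}}$ is automatically included in the minimization of the overall quadratic error term $\| \mathbf{y}_{\mathrm{pl}} - \mathbf{x}_{\mathrm{pl,OFDM}} \odot \mathbf{S}\mathbf{D}\mathbf{h} \|_2^2$ in (14). The LASSO minimization problem has been extensively studied in the literature. It can be solved using many different algorithms [29], [30], [32], and it was also adopted for the specific case of channel estimation [20].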
As a final outcome, the estimated channel matrix $\hat{\mathbf{H}}$ resulting from the minimization of (14) and subsequently passed to the symbol detector is given by
$$\mathrm{vec}(\hat{\mathbf{H}}) = \mathbf{D}\hat{\mathbf{h}}_{\mathrm{sp}}. \tag{15}$$
For completeness and for the sake of reproducibility of our results, the details of the used LASSO solver, together with an analysis of its complexity, are described in Appendix A.
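To make the LASSO formulation concrete, the sketch below builds a Doppler-delay dictionary and solves a problem of the form (14) with a plain iterative soft-thresholding algorithm (ISTA). It is only an illustrative stand-in, not the optimized solver of Appendix A: the grid, block size, pilot count, unit-valued pilots, noise level, and regularizer `lam` are all hypothetical choices, and T = 1/∆f is assumed.

```python
import numpy as np

def dictionary(nu_grid, tau_grid, N, M, T, delta_f):
    """Columns are vec(a(nu) b^H(tau)) for every grid point; shape (N*M, G)."""
    n = np.arange(N)[:, None]
    m = np.arange(M)[None, :]
    cols = []
    for nu in nu_grid:
        for tau in tau_grid:
            atom = np.exp(2j * np.pi * nu * n * T) * np.exp(-2j * np.pi * tau * m * delta_f)
            cols.append(atom.reshape(-1, order="F"))  # column-stacking vec(.)
    return np.stack(cols, axis=1)

def soft_threshold(v, t):
    """Complex soft-thresholding: shrink the magnitude by t, keep the phase."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)

def lasso_objective(A, y, h, lam):
    return np.linalg.norm(y - A @ h) ** 2 + lam * np.sum(np.abs(h))

def ista(A, y, lam, n_iter=500):
    """Minimize ||y - A h||_2^2 + lam * ||h||_1 by proximal gradient steps."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the quadratic term
    h = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = 2.0 * A.conj().T @ (A @ h - y)
        h = soft_threshold(h - grad / L, lam / L)
    return h

# Hypothetical toy setup: on-grid paths, 24 random pilots out of 64 symbols.
rng = np.random.default_rng(0)
N, M = 8, 8
delta_f = 15e3
T = 1.0 / delta_f
nu_grid = np.arange(N) / (N * T)          # Doppler grid
tau_grid = np.arange(M) / (M * delta_f)   # delay grid
D = dictionary(nu_grid, tau_grid, N, M, T, delta_f)
G = D.shape[1]

h_true = np.zeros(G, dtype=complex)
h_true[[5, 42]] = [1.0, 0.6j]             # P = 2 active grid points

pilot_idx = rng.choice(N * M, size=24, replace=False)   # rows kept by S
x_pl = np.ones(24, dtype=complex)                       # unit pilots
noise = 0.01 * (rng.standard_normal(24) + 1j * rng.standard_normal(24))
A_pl = x_pl[:, None] * D[pilot_idx]       # x_pl (element-wise) S D
y_pl = A_pl @ h_true + noise

lam = 0.5
h_hat = ista(A_pl, y_pl, lam)
H_hat = (D @ h_hat).reshape(N, M, order="F")   # estimated channel matrix
```

With a step size equal to the inverse Lipschitz constant, each ISTA iteration cannot increase the objective, which is why the fixed step is used here instead of a line search.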

A. Pilot Scheme
The optimization of a deterministic sensing matrix for CS problems such as LASSO is still one of the most studied open problems in CS theory. In fact, the typical performance guarantees of CS require properties such as the restricted isometry property (RIP) [20], [34], for which explicit constructions are not available and even checking the property for a given randomly generated matrix is exponentially complex [35]. On the other hand, ensembles of randomly generated matrices satisfy these properties with high probability [20]. Hence, here we resort to a pseudo-random pilot placement on the 2-dimensional time-frequency grid. We have verified by simulation that such random placement achieves, with high probability, better performance than the regular "lattice" placements (e.g., equally spaced combs of subcarriers) usually specified in wireless standards [3]. An example of a random pilot scheme is depicted in Fig. 1. Generally, almost every configuration of a fixed number of pilots randomly placed within the 2-dimensional grid provides similar performance in terms of channel estimation. If pilots are not placed randomly but follow some periodic pattern, the algorithm for solving the LASSO produces far inferior results. This behavior is caused by the periodic sampling of a random Fourier matrix (i.e., H or Dh sp ). This is the reason why commonly used pilot schemes (see, e.g., [3] and references therein), generally structured or periodic, are not suitable for the CS-based estimation of OFDM systems (assuming that the OFDM channel is represented by a Fourier matrix). In fact, it is well known that the deterministic construction of sensing matrices with a small RIP constant is a very hard problem (see [36] and references therein). In contrast, the RIP constant, and therefore the CS reconstruction guarantees, of random sensing matrices have been extensively analyzed in the theoretical CS literature.
In particular, it is well-known that "random DFT" matrices (i.e., matrices obtained by a random selection of the rows of a unitary DFT matrix) have very good properties, analyzed for example in the classical papers [37] and [38]. While there is no result saying that a pseudo-random selection of the pilot symbols position is the best possible (in fact, by definition, there must be a deterministic choice that performs better than average, by the usual random coding argument), an explicit algorithmic construction that performs better than random selection has not yet been found. This is a well-known open problem in compressed sensing. Since the focus of this paper is not to exhibit the best possible scheme, and random selection has theoretical guarantees of good performance with high probability, we have resorted to random selection. Moreover, with CS, the number of measurements, i.e., pilot symbols, is much less than the dimension of the target signal that we want to estimate. In this regime, standard LS-type channel estimation would fail since the underlying regression problem is under-determined. This is precisely the regime where novel compressed sensing schemes as the LASSO scheme used in our paper (which is pretty much the state of the art for noisy measurements as in our case) allow good estimation with a much reduced number of pilot symbols.
We aim at maximizing the overall achievable rate under random pilot placement. Hence, we can optimize the number of pilots per block to seek the optimal tradeoff between CSI estimation quality and pilot overhead (see (40) in the following and numerical results in Sec. IV).
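The contrast between random and periodic pilot placement can be illustrated with the mutual coherence of the resulting sensing matrix, which is a simple proxy and not the RIP constant discussed above. The sketch below assumes an on-grid 2-D Fourier dictionary, a hypothetical 8 × 8 block, and a comb that places pilots on every second subcarrier; all names and sizes are illustrative assumptions.

```python
import numpy as np

def dft_dictionary(N, M):
    """2-D Fourier dictionary; row i = n + N*m (vectorized grid), column = (k, l)."""
    n = np.tile(np.arange(N), M)      # time index of each vectorized entry
    m = np.repeat(np.arange(M), N)    # subcarrier index of each vectorized entry
    k = np.repeat(np.arange(N), M)    # Doppler index of each column
    l = np.tile(np.arange(M), N)      # delay index of each column
    return np.exp(2j * np.pi * (np.outer(n, k) / N - np.outer(m, l) / M))

def random_pilot_indices(N, M, num_pilots, seed=0):
    """Pseudo-random pilot positions, reproducible at transmitter and receiver."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(N * M, size=num_pilots, replace=False))

def comb_pilot_indices(N, M, spacing):
    """Regular lattice: every `spacing`-th subcarrier, on all time symbols."""
    return np.array([n + N * m for m in range(0, M, spacing) for n in range(N)])

def mutual_coherence(A):
    """Largest normalized inner product between two distinct columns of A."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(An.conj().T @ An)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

N, M = 8, 8
D = dft_dictionary(N, M)

idx_comb = comb_pilot_indices(N, M, spacing=2)            # 32 pilots
idx_rand = random_pilot_indices(N, M, len(idx_comb))      # same pilot budget

mu_comb = mutual_coherence(D[idx_comb])   # sensing matrix S D, comb placement
mu_rand = mutual_coherence(D[idx_rand])   # sensing matrix S D, random placement
```

With the comb, delay columns that differ by M/2 grid steps coincide on all pilot positions (aliasing), so the coherence reaches 1 and those components cannot be told apart; a random subset of the same size avoids this exact ambiguity.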

B. Received Samples Expression - Real and Approximated Channel Conditions
Without entering into the details of the complete input-output derivation of a CP-OFDM system, which can be found, e.g., in [39], we only provide the received sample expression. By considering real and approximated channel conditions, the received samples at time instant $n$ and subcarrier $m$ are respectively given by (16) and (17), in which the ICI-free approximation follows from the assumption $\nu_{\max}/\Delta f \ll 1$, and the last equality follows by using the orthogonality property. Note that (17) is equivalent to (6). For numerical simulations we generally consider the approximated channel model (17), in which the assumption on the absence of ICI holds. However, we also show results for OFDM under real channel conditions, i.e., considering (16), to illustrate the performance degradation when the ICI-free assumption is not well satisfied.

III. OTFS MODULATION AND THE PROPOSED ESTIMATION ALGORITHM

A. OTFS Input-Output Relation
In this section we concisely review the derivation of the input-output relation of OTFS modulation (see, e.g., [18], [40]) and cast it in the notation of this paper.
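After transmission over the channel defined in (1), the continuous received signal without noise is
$$r(t) = \sum_{p=0}^{P-1} h_p \, s(t - \tau_p) \, e^{j2\pi\nu_p t},$$
where $s(t)$ denotes the transmitted signal, and the output of the receiver filter-bank adopting a generic receive shaping pulse $g_{\mathrm{rx}}(t)$ is given in (21). By sampling at $t = nT$ and $f = m\Delta f$, we obtain (23), having defined the cross ambiguity function
$$C_{u,v}(\tau, \nu) \triangleq \int_{-\infty}^{\infty} u(s) \, v^{*}(s - \tau) \, e^{-j2\pi\nu s} \, ds$$
as in [41], set $h'_p = h_p e^{j2\pi\nu_p\tau_p}$, and imposed the term $e^{-j2\pi m n' \Delta f T} = 1$, $\forall n', m$, which is always true under the assumption $T = 1/\Delta f$. Since $X[n, m]$ is generated via the inverse symplectic finite Fourier transform (ISFFT), the received signal in the Doppler-delay domain is obtained by the application of the symplectic finite Fourier transform (SFFT)
$$y[k, l] = \sum_{k'=0}^{N-1} \sum_{l'=0}^{M-1} x[k', l'] \, g_{k,k'}[l, l'], \tag{24}$$
where the ISI coefficient of the Doppler-delay pair $[k', l']$ seen by sample $[k, l]$ is given by
$$g_{k,k'}[l, l'] = \sum_{p=0}^{P-1} h'_p \, \Psi^{p}_{k,k'}[l, l'], \tag{25}$$
with $\Psi^{p}_{k,k'}[l, l']$ defined as in (26). Using rectangular shaping pulses and after a suitable approximation of the cross ambiguity function, as illustrated in the derivation in [18] and omitted here for the sake of brevity, the simplified version of $\Psi^{p}_{k,k'}[l, l']$ is given by (27).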

B. Pilot Scheme
In this section we describe the pilot scheme inspired by [4]. Using (24) expressed in matrix form, the OTFS input-output relation is given by
$$\mathbf{y} = \left( \sum_{p=0}^{P-1} h'_p \, \boldsymbol{\Psi}_p \right) \mathbf{x} + \mathbf{z}, \tag{28}$$
where $\mathbf{z}$ denotes the AWGN with zero mean and covariance matrix $\sigma^2 \mathbf{I}_{NM}$. Note that $\boldsymbol{\Psi}_p$ implicitly takes into account a Doppler-delay pair $(\tau_p, \nu_p)$, i.e., $\boldsymbol{\Psi}_p \triangleq \boldsymbol{\Psi}_p(\tau_p, \nu_p)$. Without entering into mathematical details, the effect of the channel on symbols arranged in blocks is described in the following.
Consider a block composed of all-zero symbols except for one non-zero symbol, with enough energy to be well distinguishable and positioned anywhere within the block (the position of the symbol does not influence the result, since the channel shift effect is circular within the block), transmitted over the time-frequency selective channel in (3). At the receiver, most of the energy concentrates at a few positions on the 2-dimensional block (one per multipath component), with some diffusion to the surrounding positions according to the Dirichlet kernel functions appearing in the OTFS channel matrix expression in (27) (see [18] and references therein for more details). Examples of blocks of transmitted symbols and received samples in the case of a single multipath component are depicted in Fig. 2 (a) and Fig. 2 (b), respectively. Since the channel is composed of P components and is linear, in general the resulting received signal is formed by the superposition of the effects of P multipath components, i.e., a single transmit symbol will be shifted to P different positions, each of which has some surrounding diffusion as qualitatively shown in Fig. 2. Intuitively, the estimation of the pairs $(\tau_p, \nu_p)$ follows by searching for the peaks of the magnitude of the received samples grid (as suggested in [4]). This intuitive estimation procedure is, however, only able to provide the integer parts of the Doppler and delay shifts, associated with the Doppler-delay grid point collecting the maximum energy. The fractional parts are linked to the dissipation of the energy around the peak points and must be treated separately, as we will do via Algorithm 1 and Algorithm 2 in the following.
The approximation of the channel behavior to integer Doppler and delay shifts, as was done in [4], yields accurate results only under the unrealistic condition of integer Doppler and delay shifts. Thus, based on our previously proposed ML estimator in [18], we extend the idea of [4], providing a reliable estimation algorithm for general sparse channels with non-integer shifts.
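The intuitive peak search described above can be sketched as follows. The function looks for the P strongest local maxima of the received magnitude on the (circular) Doppler-delay grid and converts their positions into integer shifts relative to the pilot position. The toy received block, the pilot position, and the convention of reporting Doppler shifts in a centered range are hypothetical choices for illustration; the fractional parts of the shifts are not handled here.

```python
import numpy as np

def local_maxima_mask(A):
    """True where A is strictly larger than its 8 neighbours (circular grid)."""
    mask = np.ones(A.shape, dtype=bool)
    for dk in (-1, 0, 1):
        for dl in (-1, 0, 1):
            if dk == 0 and dl == 0:
                continue
            mask &= A > np.roll(A, (dk, dl), axis=(0, 1))
    return mask

def coarse_shifts(Y_dd, pilot_pos, P):
    """Integer (Doppler, delay) shifts of the P strongest peaks w.r.t. the pilot."""
    A = np.abs(Y_dd)
    N, M = A.shape
    peaks = np.argwhere(local_maxima_mask(A))
    order = np.argsort(A[peaks[:, 0], peaks[:, 1]])[::-1][:P]  # strongest first
    k0, l0 = pilot_pos
    shifts = []
    for k, l in peaks[order]:
        dk = (k - k0 + N // 2) % N - N // 2   # Doppler shift, centered range
        dl = (l - l0) % M                     # delay shift, non-negative
        shifts.append((int(dk), int(dl)))
    return shifts

# Toy received block: two paths with some leakage next to each peak.
Y_dd = np.zeros((8, 8), dtype=complex)
Y_dd[5, 6] = 3.0      # strongest path
Y_dd[5, 7] = 0.5      # leakage around the first peak
Y_dd[2, 4] = 2.0j     # second path
Y_dd[3, 4] = 0.4      # leakage around the second peak

shifts = coarse_shifts(Y_dd, pilot_pos=(4, 4), P=2)
# shifts == [(1, 2), (-2, 0)]
```

The strict inequality in `local_maxima_mask` keeps flat all-zero regions from being reported as peaks, and the leakage samples are discarded because each has a stronger neighbour.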
A block of dimension $N \times M$ of transmitted symbols contains both information-bearing symbols and pilot symbols. The arrangement of pilot symbols consists of a rectangular region placed in the block containing two types of symbols (see Fig. 3):
• Zero Pilots: Symbols set to zero that surround the peak pilot (the hollow zone). The Dirichlet kernel functions appearing in the OTFS input-output relation (27) are rapidly decreasing functions; hence, perfect orthogonality between information symbols and pilots cannot be achieved but, at least, the Doppler-delay ISI can be reduced.
• Peak Pilot: A pilot symbol with high energy, collecting the energy of the whole pilot field, is placed at the grid center. Its shifts in the Doppler-delay grid are used to provide the initial coarse estimation of the Doppler-delay pairs, which is fast and simple.
Given this pilot arrangement, the number of pilot symbols has to be optimized to match the optimal performance-overhead tradeoff, while keeping the total block energy constant.
Note that in OFDM, symbols are independent, i.e., ISI- and ICI-free by construction of the modulation, and the pilot vector $\mathbf{x}_{\mathrm{pl,OFDM}}$, defined in (12) through a selection matrix $\mathbf{S}$, is a subset of the symbols $\mathbf{x}$. In contrast, in OTFS, since the channel, as depicted in Fig. 2, acts per block and not per symbol as in OFDM, the processing at the receiver cannot be based on a subset of samples but must take into account the entire block. As a result, the vector of pilots $\mathbf{x}_{\mathrm{pl,OTFS}}$ has dimension $NM \times 1$ and is composed of all zero entries (the positions of unknown data symbols are set to zero) but one, i.e., the peak pilot (Fig. 3).

C. Channel Estimation
The proposed channel estimation scheme is based on the ML approach of [18],⁵ providing a parameter estimation of Doppler, delay, and complex channel gain associated with each multipath component. In the following, we provide a concise description of the scheme, since it plays a central role within the channel estimation algorithm for OTFS.

⁵ In [18] we proposed the ML method to estimate the Doppler shift and delay of the main path assuming LoS in the backscattered wave for a joint radar and communication application with OTFS modulation format. Since in [18] the estimation of the radar parameters is performed at the transmitter side, all modulation symbols in the block are known, as they are generated by the transmitter itself. Therefore, they can all be treated as pilot symbols. Here we use the same ML approach, but applied to the specific pilot pattern considered in this paper.
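The objective is to estimate the set of parameters $\theta = \{h'_p, \tau_p, \nu_p\} \in \mathcal{T}^P$, with $\mathcal{T} = \mathbb{C} \times \mathbb{R} \times \mathbb{R}$. By defining the ML function as
$$l(\mathbf{y} \,|\, \theta, \mathbf{x}_{\mathrm{pl,OTFS}}) = \left\| \mathbf{y} - \sum_{p=0}^{P-1} h'_p \, \boldsymbol{\Psi}_p \, \mathbf{x}_{\mathrm{pl,OTFS}} \right\|_2^2, \tag{29}$$
the ML solution becomes
$$\hat{\theta} = \arg\min_{\theta \in \mathcal{T}^P} l(\mathbf{y} \,|\, \theta, \mathbf{x}_{\mathrm{pl,OTFS}}). \tag{30}$$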
For a fixed set of $\{\tau_p, \nu_p\}$, the ML estimator of $\{h'_p\}$ is given by solving the following set of equations
$$\mathbf{x}_{\mathrm{pl,OTFS}}^{\mathsf{H}} \boldsymbol{\Psi}_p^{\mathsf{H}} \left( \sum_{q=0}^{P-1} h'_q \, \boldsymbol{\Psi}_q \right) \mathbf{x}_{\mathrm{pl,OTFS}} = \mathbf{x}_{\mathrm{pl,OTFS}}^{\mathsf{H}} \boldsymbol{\Psi}_p^{\mathsf{H}} \mathbf{y}, \qquad p = 0, \ldots, P-1. \tag{31}$$
By expanding (29), solving the system of equations in (31) to find the complex channel gains $h'_p$ for every multipath component, and substituting these results in (29), after some long but relatively simple algebra (not given explicitly for the sake of brevity), we find that the minimization w.r.t. $\theta$ reduces to maximizing a function expressed through $S(\tau_p, \nu_p, \phi_p)$ and $I_p(\{h'_q\}_{q \neq p}, \theta)$ ($S_p$ and $I_p$ in shorthand notation), which denote the useful signal and the interference term for the multipath component $p$, respectively.

Fig. 3: Within the block of dimension $N \times M$, the centered high-energy pilot is clearly distinguishable, surrounded first by zero pilots (the hollow zone) and then by information symbols (here, for convenience, with unit energy).
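For fixed Doppler-delay pairs, the set of equations in (31) is a P × P linear system (the normal equations of the least-squares fit in (29)). The sketch below solves it with NumPy. The matrices in `Psi_list` are random placeholders standing in for the true OTFS channel matrices of (27), which are not reproduced here, and the block length, pilot position, pilot amplitude, and gains are hypothetical.

```python
import numpy as np

def ml_gains(Psi_list, x_pl, y):
    """Solve x^H Psi_p^H (sum_q h_q Psi_q) x = x^H Psi_p^H y for all p."""
    # Column p of B is Psi_p x_pl; with a single non-zero pilot this is just
    # a scaled column of Psi_p.
    B = np.stack([Psi @ x_pl for Psi in Psi_list], axis=1)   # shape (NM, P)
    gram = B.conj().T @ B    # entry [p, q] = x^H Psi_p^H Psi_q x
    rhs = B.conj().T @ y     # entry [p]    = x^H Psi_p^H y
    return np.linalg.solve(gram, rhs)

def ml_cost(Psi_list, x_pl, y, h):
    """Quadratic cost of (29) for given gains and channel matrices."""
    model = sum(h_p * (Psi @ x_pl) for h_p, Psi in zip(h, Psi_list))
    return np.linalg.norm(y - model) ** 2

# Hypothetical toy example with placeholder channel matrices.
rng = np.random.default_rng(1)
NM, P = 16, 2
Psi_list = [rng.standard_normal((NM, NM)) + 1j * rng.standard_normal((NM, NM))
            for _ in range(P)]
x_pl = np.zeros(NM, dtype=complex)
x_pl[5] = 4.0                              # single high-energy peak pilot
h_true = np.array([1.0 - 0.5j, 0.3j])
y = sum(h_p * (Psi @ x_pl) for h_p, Psi in zip(h_true, Psi_list))  # noiseless

h_hat = ml_gains(Psi_list, x_pl, y)
residual = ml_cost(Psi_list, x_pl, y, h_hat)
```

In the noiseless case the recovered gains coincide with the true ones and the residual cost is numerically zero; with noise, the same system returns the least-squares gains for the candidate Doppler-delay pairs under test.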
The algorithm to obtain the estimation of Doppler, delay, and complex channel coefficient of each multipath component is described in the following.

Algorithm 1: Multipath Parameters Estimation
Result: The set $(\hat{h}_p, \hat{\tau}_p, \hat{\nu}_p)$, for $p = 0, \ldots, P-1$.

After the definition of the ML approach, Algorithm 2 describes the actual CSI estimation, i.e., the estimation of the unknown parameters of each multipath component. Note that, as for OFDM, if the number of multipath components P is not available at the receiver, we instead select all local maxima or peaks (of the groups of estimates) whose magnitude is above a certain threshold (to be defined).
Coarse Estimation: By analyzing on-grid Doppler and delay shifts, get the first coarse estimation of the pairs $(\hat{\tau}_p, \hat{\nu}_p)$ through the shifts of the peak pilot w.r.t. the Doppler-delay grid, by selecting the first P local maxima;
Iteration: Let i = 0, 1, 2, . . . be the iteration number;
for i = 1, 2, . . . do
  • Grid step and interval refinement: Refine the granularity of the step around the estimated values within a refinement interval. The finer the step size and the larger the interval, the greater the computational complexity;
  • Use the ML approach described in Alg. 1 to get a finer estimation of the unknown parameters;
  • Select the first P local maxima.
end

The iterative process refines the estimation through iterations while keeping the computational cost limited and speeding up the minimization of the estimation error. As indicated in [18], a small number of iterations, e.g., fewer than 5, is enough to reach convergence of the algorithm. The definition of the refinement interval and grid step is not mandatory and depends on the hardware capabilities of the system. Note that other iterative schemes based on coarse and refined estimation over discretized Doppler-delay-angle grids can be found in the literature (see, e.g., [26] and references therein). However, our scheme relies on a rigorous derivation of OTFS with rectangular pulses and fractional Doppler and delay shifts, and thus differs significantly from the approach proposed in [26], where OTFS is seen as a pre- and post-processing for an inner OFDM modulation, for which CS algorithms can be applied (due to the orthogonality of received samples, as discussed in Section II).
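The coarse-to-fine idea of the iterative procedure can be illustrated, for a single multipath component, by a generic grid search whose interval shrinks around the current best estimate at each iteration. This is only a schematic sketch: `objective` is a hypothetical placeholder for the ML metric to be maximized, and the number of grid points, the shrink factor, and the number of iterations are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

def refine(objective, tau0, nu0, tau_half, nu_half,
           points=9, n_iter=4, shrink=0.25):
    """Coarse-to-fine maximization of objective(tau, nu) around (tau0, nu0).

    tau_half, nu_half: half-widths of the initial refinement interval.
    points: grid points per axis; shrink: interval reduction per iteration.
    """
    tau_hat, nu_hat = tau0, nu0
    for _ in range(n_iter):
        taus = np.linspace(tau_hat - tau_half, tau_hat + tau_half, points)
        nus = np.linspace(nu_hat - nu_half, nu_hat + nu_half, points)
        vals = np.array([[objective(t, v) for v in nus] for t in taus])
        i, j = np.unravel_index(np.argmax(vals), vals.shape)
        tau_hat, nu_hat = taus[i], nus[j]   # best grid point so far
        tau_half *= shrink                  # finer step, narrower interval
        nu_half *= shrink
    return tau_hat, nu_hat

# Hypothetical smooth metric peaking at fractional shifts (0.3, -0.7),
# expressed in units of the delay and Doppler grid steps.
def toy_metric(tau, nu):
    return -((tau - 0.3) ** 2 + (nu + 0.7) ** 2)

# Start from the integer (coarse) estimate (0, -1) and search +/- half a step.
tau_hat, nu_hat = refine(toy_metric, tau0=0.0, nu0=-1.0,
                         tau_half=0.5, nu_half=0.5)
```

With 9 points per axis and a shrink factor of 0.25, the new interval half-width equals one grid step of the previous iteration, so the true maximum stays inside the search window while the resolution improves by a factor of four per iteration; the cost per iteration is fixed at `points**2` metric evaluations.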
Remark 1: Note that, by considering the pilot scheme of Fig. 3 (and also the definition of x pl,OTFS ), the multiplication between the channel matrix Ψ p and the pilot vector x pl,OTFS is just a column selection of matrix Ψ p , which significantly simplifies all the equations involved. We did not explicitly take advantage of this aspect, by keeping the treatment as general as possible (one can think to modify the pilot pattern configuration of Fig. 3), but note that the reduction in terms of computational complexity could be remarkable.

IV. COMPARISON IN TERMS OF PRAGMATIC CAPACITY
Before proceeding to the numerical results, we introduce the simulation setting in terms of the pilot schemes and their overhead, further details on the channel estimation algorithms, the performance metric adopted for the comparison, and the soft-output symbol detection algorithms for OFDM and OTFS. All these details are listed in the following with the aim of providing a clear definition of the comparison of these two modulation formats.
• In general, we denote by D and by P the set of data and pilot symbols, respectively, in a N M frame of length N M for both OFDM and OTFS. An important aspect in the system optimization is the number of pilots |P| (|·| denoting the cardinality of a set). Consider first the OFDM modulation. It is well known in the CS literature that the minimum number of pilots (or measurements, from CS literature) to recover a sparse signal is given by the logarithmic scaling factor [42] |P| ≥ P log G P , where P and G are the number of non-zero components and the dimension of the vector to be estimated, respectively. Thus, given a multipath channel with P paths and a sensing matrix of dimension |P| × G (defined in (7)), the (asymptotically) minimum number of pilot symbols necessary to solve the minimization problem in (14) is given by (36). As a consequence, the required |P| slowly increases with G, i.e., with the resolution of the Doppler-delay grid (see also the Appendix A). Moreover, (36) provides only the scaling law of the number of measurements (up to some constant factor larger than 1) and the actual optimal system performance might be achieved for a number of pilots larger than the bound in (36). Note that this is not a precise quantitative analysis but it just gives a qualitative idea or intuition on how large the number of pilots per block should be, knowing that the aforementioned CS-based conditions are always given up to constant factors that depend on the specific problem, SNR, shape of the sensing matrix, and other variables. In other words, the analytical optimization of the number of pilots is an intractable problem. Thus, we based our comparison on a brute-force search over a suitable set of overhead values, identifying the best tradeoff between number of pilots and channel estimation performance for the given scenarios. 
For OTFS modulation, the pilot scheme presented in Section III-B and its estimation algorithm are independent of the block dimension, and depend only on the maximum Doppler shift and delay of the channel. Moreover, if the dimension of the block increases, one can set more pilots to zero to raise the power of the peak pilot, allowing the detection of low power scattering components when the number of paths P is not known a-priori, using the threshold-based approach explained in the following.
However, limits on the block dimension come, in the first place, from important restrictions on the computational complexity of OTFS detection [18], and then from the assumption that the channel parameters are time-invariant, which breaks down if the block becomes too large. Thus, realistic block sizes have to be considered for both OFDM and OTFS.
• For both OFDM and OTFS, a "peak selection" has to be performed during the channel estimation process. In a genie-aided scenario where the number of propagation paths P is known a-priori, this results in the search of the P local maxima of the objective function. However, this information may not be available at the receiver and, in such a case, the algorithms should select all local maxima above a certain threshold (to be defined). By considering scattering components with decreasing power, which is generally the case, once the pathloss brings the power below the threshold, the corresponding component is neglected in the construction of the (estimated) channel matrix. Clearly, this results in a less accurate CSI but, at the same time, the contribution of low- or very-low-energy paths has a minor impact. In our simulation results we consider the genie-aided case where P is known, while the threshold-based scheme is a conceptually straightforward extension.
• In the numerical results, we assess the performance of both schemes in terms of pragmatic capacity, i.e., the mutual information of the virtual channel having at its input the constellation symbols and at its output the detector soft-outputs [10], [11]. The pragmatic capacity is representative of the achievable rate under the assumption of separate detection and decoding, i.e., without "turbo" reprocessing of the decoder output. By considering a sequence of NM symbols {x_k} belonging to any constellation C, let {x̂_k} be the noisy estimates of the transmitted symbols.
The pragmatic capacity is simply defined as the symbol-by-symbol mutual information (see [10], [11], [18] for more details), which can be easily computed by Monte Carlo simulations as

$$\mathrm{PC} = \frac{1}{NM} \sum_{k \in \mathcal{D}} \mathbb{E}\left[ \log_2 \frac{P(x_k \mid \hat{x}_k)}{P(x_k)} \right], \tag{37}$$

where P(x_k | x̂_k) is the a-posteriori probability mass function of the symbols x_k ∈ C given the detector soft-output x̂_k, P(x_k) is the a-priori probability of x_k, and D is the set of information symbols, i.e., excluding the pilots. Note that, since the numerator sums |D| ≤ NM terms, while the denominator is the block size NM, the pilot overhead emerges naturally. An indication of the minimum length of the sequence needed to obtain reliable pragmatic capacity results is given in [43].
• For OFDM, we adopt the linear minimum mean square error (LMMSE) detector, whose soft-output under non-perfect CSI, i.e., employing the estimated channel matrix Ĥ, is given by

$$\hat{\mathbf{x}} = \hat{\mathbf{H}}^{\mathsf{H}} \left( \hat{\mathbf{H}} \hat{\mathbf{H}}^{\mathsf{H}} + \sigma_w^2 \mathbf{I} \right)^{-1} \mathbf{y}, \tag{38}$$

where y is the received vector and σ_w² is the noise variance (for unit-energy symbols). Note that for OFDM, whose (estimated) channel matrix is diagonal, the LMMSE detector reduces to the symbol-by-symbol detection given by

$$\hat{x}_k = \frac{\hat{h}_k^{*}\, y_k}{|\hat{h}_k|^2 + \sigma_w^2}, \tag{39}$$

with ĥ_k the k-th diagonal entry of Ĥ, which significantly reduces the computational complexity of the detector. On the other hand, the same detector cannot be adopted for OTFS modulation, because of the costly inversion of a non-diagonal matrix in (38). Hence, different solutions have been proposed in the literature [18], [44]-[46]. However, some of these approaches rely on non-realistic model or channel assumptions (e.g., Doppler shifts and delays that are integer multiples of the symbol grid) and therefore their performance degrades significantly when applied to realistic channel conditions [18]. For this reason, in place of the very-high-complexity block-based LMMSE detector (38) for OTFS, we consider the low-complexity message-passing (MP) soft-output algorithm proposed in [18], which achieves linear complexity per block (i.e., constant complexity per symbol, comparable with the symbol-by-symbol LMMSE detector for OFDM).
As a reference benchmark, the mutual information for the considered input constellation transmitted over an AWGN channel (thus in the absence of fading), referred to as the AWGN (symmetric) capacity C^sym_AWGN [47], will also be computed. At high SNR, where the estimation error is expected to be small, the gap between the pragmatic capacity curves and this benchmark is only due to the overhead of pilot symbols within the transmitted block. Asymptotically, the achievable rate loss R_x is given by

$$R_x = \frac{|\mathcal{P}|}{NM}, \tag{40}$$

and the rate simply becomes the AWGN capacity multiplied by the fraction of data symbols per block, i.e., |D|/NM = 1 − |P|/NM. Moreover, as already said, in order to make a fair comparison, the overhead introduced by a CP for OFDM and by a generic GI (between blocks) for OTFS must be taken into account. While a GI interposed between two OTFS blocks to avoid IBI introduces a small-to-negligible overhead (especially when the dimension of the block increases), the CP overhead of OFDM stays constant over the entire block (whatever its dimension), introducing a considerable loss in terms of pragmatic capacity. For instance, by considering a CP of length T/4, with T the symbol time, the loss is

$$1 - \frac{T}{T + T/4} = \frac{1}{5} = 20\%. \tag{41}$$

This means that, with a modulation constellation of size |C|, OFDM saturates at 0.8 · log2 |C| bits/symbol. The overall loss takes into account both the pilot overhead and the presence of a CP and/or GI (also of length T/4, for consistency).
• As anticipated at the beginning of this paper (Section II), in order to restrict ourselves to the classical low-complexity symbol-by-symbol minimum mean square error (MMSE) detection for OFDM, we have neglected the ICI. As already seen in (16), the ICI depends on the ratio between the subcarrier spacing ∆f and the maximum Doppler shift introduced by the channel. In order to have negligible ICI, the necessary condition is ∆f ≫ ν_max or, equivalently, ν_max/∆f ≪ 1. Since ∆f = B/M, with B the total bandwidth, the condition may not be satisfied when the number of subcarriers M becomes too large, even for moderate Doppler. In this paper we insist on neglecting ICI and consider the range of system parameters for which this assumption is indeed virtually exact. Furthermore, we notice that, while OFDM incurs this additional limitation, OTFS remains insensitive to the Doppler shift.
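The Monte Carlo evaluation of the pragmatic capacity described in the list above can be illustrated with a minimal sketch. It is a toy setting and not the simulator used for the figures: it assumes QPSK with equiprobable symbols, a diagonal (OFDM-like) channel with unit-modulus gains known perfectly at the receiver, and per-symbol LMMSE soft-outputs. The function name, the arguments `pilot_frac` and `cp_frac`, and the channel model are hypothetical choices made only for illustration.

```python
import numpy as np

# Unit-energy QPSK constellation (assumed equiprobable).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def pragmatic_capacity_qpsk(snr_db, n_sym=20000, pilot_frac=0.0,
                            cp_frac=0.0, seed=0):
    """Monte Carlo pragmatic capacity (bits/symbol) in a toy setting:
    diagonal channel, perfect CSI, per-symbol LMMSE soft-outputs."""
    rng = np.random.default_rng(seed)
    sigma2 = 10.0 ** (-snr_db / 10.0)            # noise variance, Es = 1
    idx = rng.integers(0, 4, size=n_sym)         # transmitted symbol indices
    x = QPSK[idx]
    h = np.exp(2j * np.pi * rng.random(n_sym))   # hypothetical channel gains
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_sym)
                               + 1j * rng.standard_normal(n_sym))
    y = h * x + w

    # Symbol-by-symbol LMMSE soft-output for a diagonal channel.
    gain2 = np.abs(h) ** 2
    x_hat = np.conj(h) * y / (gain2 + sigma2)

    # Equivalent model x_hat = g * x + noise of variance v.
    g = gain2 / (gain2 + sigma2)
    v = gain2 * sigma2 / (gain2 + sigma2) ** 2

    # Log a-posteriori probabilities over the constellation (log-sum-exp).
    logp = -np.abs(x_hat[:, None] - g[:, None] * QPSK[None, :]) ** 2 / v[:, None]
    m = logp.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
    log2_app = (logp[np.arange(n_sym), idx] - log_norm) / np.log(2)

    # Symbol-wise mutual information: log2|C| + E[log2 P(x_k | x_hat_k)].
    mi = 2.0 + log2_app.mean()
    # Only the data fraction of the block carries information; the CP
    # further scales the rate by the useful fraction of the symbol time.
    return (1.0 - pilot_frac) * (1.0 - cp_frac) * mi

for snr in (0, 10, 20):
    print(snr, round(pragmatic_capacity_qpsk(snr, pilot_frac=0.03125, cp_frac=0.2), 3))
```

With `cp_frac=0.2` (a CP of length T/4) the curve saturates close to 0.8 · 2 = 1.6 bits/symbol at high SNR, before the pilot overhead is applied.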

A. Simulation Results
We present results in terms of pragmatic capacity vs. SNR for OTFS and OFDM with quadrature phase-shift keying (QPSK) modulated symbols, for a time-frequency multipath channel with P components affected by AWGN, under the pilot and CSI estimation schemes of Section II for OFDM and Section III for OTFS, respectively. As said, as a reference benchmark we plot the mutual information C^sym_AWGN for a QPSK input constellation transmitted over an AWGN channel (thus in the absence of fading) and without any pilot overhead. The system parameters are listed in Table I.
Fig. 4 shows the performance of OFDM for different numbers of pilot symbols. For the case P = 1, it is easy to see that the performance changes only slightly across pilot overheads, whose percentage is indicated in the legend. On the other hand, as suggested by (36), if the number of non-zero components to be estimated increases, i.e., with P = 4, the channel estimation algorithm needs more pilots to work efficiently. Given these results, from now on we consider a pilot overhead of 3.125%, which achieves, in our setup, the best tradeoff between estimation accuracy and achievable pragmatic capacity (for any number of scattering components).
Fig. 5 shows the performance of OTFS with different detection algorithms. The MP soft-output detection approach of [18] almost achieves the AWGN capacity under non-perfect CSI for a low number of scattering components, i.e., P = 1, together with a remarkable reduction of the computational complexity [18]. However, in line with the results of [18], the detector performance slightly decreases with increasing multipath. Note that the small loss w.r.t. C^sym_AWGN under non-perfect CSI is an indicator of the performance of the channel estimation algorithm, which turns out to be very accurate (otherwise the curve would have deviated from the benchmark).
In light of these results, we see that there is no reason to adopt the LMMSE detector for OTFS, which results in high complexity and worse performance (see [18] for a more detailed analysis). Hence, from now on, for the comparison with OFDM modulation, we consider the MP approach of [18].
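To give a feeling for the scaling law (36) invoked above, the following sketch evaluates P log(G/P) for P = 1 and P = 4 on grids of increasing resolution. It is only indicative: the natural logarithm is an assumption (the law holds up to a constant factor anyway), N = 50 is borrowed from the Fig. 7 setting, and M = 64 and the refinement factors ρ are hypothetical values.

```python
import math

def min_pilots_scaling(num_paths, grid_size):
    """Scaling law P * log(G / P) for the number of measurements.
    Natural log is an assumption; constant factors are omitted."""
    if not 0 < num_paths <= grid_size:
        raise ValueError("need 0 < num_paths <= grid_size")
    return num_paths * math.log(grid_size / num_paths)

N, M = 50, 64                 # M is a hypothetical value
for rho in (1, 2, 4):         # hypothetical grid refinement factors
    G = N * M * rho ** 2      # dimension of the vector to be estimated
    for P in (1, 4):
        bound = min_pilots_scaling(P, G)
        print(f"rho={rho} G={G:6d} P={P} -> about {bound:5.1f} pilots "
              f"({100 * bound / (N * M):.2f}% of the block)")
```

The printed values grow only logarithmically with G but roughly linearly with P, which is consistent with the observation that P = 4 needs visibly more pilots than P = 1.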
In Fig. 6, we plot the pragmatic capacity vs. SNR for OFDM and OTFS under the configurations mentioned above. First of all, the performance decreases only slightly when the number of multipath components increases. Moreover, the per-symbol CP of OFDM deteriorates the pragmatic capacity, while a per-block GI for OTFS introduces a negligible loss.
In Fig. 7, we plot the pragmatic capacity of OFDM and OTFS for a fixed value of SNR, i.e., 18 dB, while changing the ratio between the maximum Doppler shift and the subcarrier spacing, i.e., ν_max/∆f, for different numbers of subcarriers M (N = 50 for all cases). In this case, in particular for OFDM, the received samples are obtained by considering a real channel that includes the ICI, i.e., (16), while the channel estimation works under the hypothesis of an ideal interference-free channel. Considering OFDM first, the performance intuitively degrades when the ICI becomes significant. Note that the estimation performance of the LASSO solver is independent of the number of subcarriers M and, for this reason, whatever the choice of ν_max and M, the performance of OFDM depends only on their ratio. Fig. 7 shows that the PC performance starts decreasing significantly once ν_max/∆f exceeds approximately 0.15. Almost the same behavior is observed for different numbers of subcarriers M (not reported here for the sake of space), except for the percentage of pilot loss due to the different block dimensions, supporting what was stated above. However, as pointed out in Table II, while the performance is almost constant, the maximum supportable Doppler (or velocity), which is inversely proportional to M, is not. For these reasons, as expected, OFDM is not independent of the block dimension, and the system has to be designed properly to operate in the range where the ICI is negligible.
Also in Fig. 7 we report the performance of OTFS in the same conditions, from which it is evident that OTFS is insensitive to the Doppler effects. This means also that the simulation results of OTFS depicted in Fig. 6 are valid for any Doppler spread.
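The dependence of the OFDM operating range on M can be made concrete with a short calculation. The sketch below converts a velocity into the ratio ν_max/∆f and, conversely, finds the largest velocity that keeps the ratio below a given limit. The carrier frequency and bandwidth are hypothetical values (the actual ones are those of Table I and Table II), and the limit 0.15 is taken from the discussion of Fig. 7.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def doppler_ratio(velocity_mps, carrier_hz, bandwidth_hz, num_subcarriers):
    """Ratio nu_max / delta_f, with nu_max = v * fc / c and delta_f = B / M."""
    nu_max = velocity_mps * carrier_hz / C_LIGHT
    delta_f = bandwidth_hz / num_subcarriers
    return nu_max / delta_f

def max_velocity(ratio_limit, carrier_hz, bandwidth_hz, num_subcarriers):
    """Largest velocity (m/s) for which nu_max / delta_f <= ratio_limit."""
    delta_f = bandwidth_hz / num_subcarriers
    return ratio_limit * delta_f * C_LIGHT / carrier_hz

FC, B = 5.89e9, 10e6      # hypothetical carrier frequency and bandwidth
for M in (64, 128, 256):  # hypothetical numbers of subcarriers
    v = max_velocity(0.15, FC, B, M)
    print(f"M={M:4d}  delta_f={B / M / 1e3:7.2f} kHz  v_max={3.6 * v:8.1f} km/h")
```

Doubling M halves the subcarrier spacing and therefore halves the maximum supportable velocity, which is the inverse proportionality mentioned in the text.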

V. CONCLUSIONS
In this paper we carried out a fair comparison between the OFDM and OTFS modulation formats in terms of maximum achievable rate for practical separate detection and decoding, quantified by the pragmatic capacity measured at the soft-detector output. We considered two pilot schemes and channel estimation algorithms, each one specifically suited for the given modulation scheme. Both pilot and CSI estimation schemes are able to achieve very good performance (near genie-aided) under time-varying communication channels in the sparsity regime of a small number of multipath components. This conclusion is fully supported by numerical results, where the simulation curves achieve the theoretical benchmark under non-perfect CSI, proving the quality of the proposed approaches.
OTFS achieves a better communication rate mainly because of the presence of a per-block Guard Interval rather than a per-symbol Cyclic Prefix as in OFDM. This of course comes at the cost of a more complex channel estimation scheme, working on large block-wise operations.
In terms of soft-output data detection, the use of our Message-Passing soft-output detector, previously proposed in [18], yields constant per-symbol complexity for OTFS, which is the same scaling law as that of symbol-by-symbol MMSE detection for OFDM. We do not claim, however, that the complexity of the two detectors is identical: the actual complexities differ by an implementation-dependent constant.
Finally, we can observe that OTFS is indeed insensitive to the magnitude of the Doppler shifts, while the performance of OFDM degrades significantly even under small-to-moderate Doppler values if the number of subcarriers increases. Therefore, OTFS is effectively a good candidate for high-mobility systems in rural environments (e.g., high speed trains [48]) or aerial environments (e.g., UAVs [49]), where Doppler shifts may be large, and the propagation channel contains typically the line-of-sight and a few other reflection components (e.g., ground reflection, hills, large buildings), and it is therefore sparse in the Doppler-delay domain.

APPENDIX A THE LASSO SOLVER
For the sake of completeness and reproducibility of the results, in this Appendix we give the details of the algorithm used to solve the LASSO minimization problem in (14).
0) Initialization: By defining the known support matrix A ≜ X_pl SD, in which X_pl is a matrix of dimension |P| × G composed of G equal vectors x_pl, i.e., X_pl ≜ [x_pl, . . . , x_pl], let Λ ≜ A^H A and initialize the step size as in [29], as a function of the Frobenius norm ‖·‖_F and of the trace(·) operation (the sum of the main-diagonal elements of a matrix). The threshold t is set as t = λ, where λ is the LASSO regularizer appearing in (14). The vector of estimated values ĥ is initialized to all zeros.
1) Iterations i = 1, 2, . . .
a) Soft Thresholding: With ψ_st(·, t) denoting the soft-thresholding operator with threshold t (see [29], [50]), compute the updated estimate ĥ.
b) Nesterov's Acceleration Factor (Optional) [29]: Introduce a tuning coefficient α_i ∈ [0, 1], which can be fixed or variable across iterations, and compute the accelerated estimate, with α_i defined, e.g., in [29]-[31] and reviewed for completeness in Appendix B.
c) Shrink: Remove the entries of y and β, the columns of A, and the entries of ĥ corresponding to the zero entries of ĥ.
2) Restoring: Restore the estimated vector ĥ to its full dimension (this operation is necessary after the shrinking of the vectors during the iterations).
We used the maximum number of iterations as stopping criterion. Note that the shrinking operation is allowed because a zero entry of the vector ĥ at iteration i cannot assume a value ≠ 0 at any later iteration i′ > i [32]. From a complexity point of view, the first iterations are the most costly, while the algorithm can run for more than 10^6 iterations keeping the complexity almost constant and the computational time linear (when the number of iterations is large enough, i.e., far from the costly initial ones).
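The structure of the iterations above (soft thresholding followed by an optional Nesterov step) can be illustrated with a generic textbook ISTA/FISTA solver. This is a sketch under stated assumptions and not the exact algorithm of this Appendix: the step size is set to the inverse of the largest eigenvalue of A^H A, the threshold is the regularizer scaled by the step size, the Shrink and Restoring steps are omitted, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Complex soft thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - t / np.maximum(mag, 1e-300), 0.0)

def lasso_fista(A, y, lam, n_iter=500, accelerate=True):
    """Minimize 0.5 * ||y - A h||_2^2 + lam * ||h||_1 by ISTA/FISTA.
    Assumption: step size 1/L, with L the largest eigenvalue of A^H A."""
    L = np.linalg.norm(A, 2) ** 2       # squared spectral norm of A
    mu = 1.0 / L
    h = np.zeros(A.shape[1], dtype=complex)
    z = h.copy()                        # extrapolated point
    for i in range(1, n_iter + 1):
        grad = A.conj().T @ (A @ z - y)                   # gradient of the LS term
        h_new = soft_threshold(z - mu * grad, mu * lam)   # proximal step
        # Index-based acceleration factor; 0 recovers plain ISTA.
        alpha = (i - 1) / (i + 2) if accelerate else 0.0
        z = h_new + alpha * (h_new - h)
        h = h_new
    return h

# Toy usage: recover a 3-sparse vector from 40 random measurements.
rng = np.random.default_rng(1)
m, n = 40, 100
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
h_true = np.zeros(n, dtype=complex)
h_true[[5, 37, 80]] = np.exp(2j * np.pi * rng.random(3))
h_est = lasso_fista(A, A @ h_true, lam=0.02)
print(np.sort(np.argsort(np.abs(h_est))[-3:]))   # indices of the 3 largest entries
```

A production solver would add the shrinking of zero entries described in step c), which reduces the cost of the matrix products once the support has stabilized.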

A. Complexity of the LASSO Solver and Step Size Refinement
The sensing matrix D is composed of G columns of length NM. While the dimensions N and M depend on the system settings and can be controlled or tuned to some extent, the dimension G reflects the estimation precision, or granularity, of the search grid Γ. Hence, the larger the dimension G, the more reliable the result. Two examples:
• If Γ coincides with the Doppler-delay grid (delays and Doppler shifts are integer multiples of the grid step), G = NM and D is an NM × NM matrix. The blockwise operations adopted by any LASSO solver are feasible in this framework.
• If the step size for both the Doppler and the delay axes is a fraction 1/ρ of the Doppler-delay grid step, G = O(NM · ρ^2) and D is approximately an NM × NM·ρ^2 matrix. Clearly, increasing the granularity of the grid quickly increases the complexity.
In order to overcome the complexity induced by a search grid with fine granularity, it is possible to refine the step size in successive phases, rather than directly defining a small fractional step for the entire grid. The proposed refinement scheme is illustrated in Algorithm 3. During the Peak Selection step, if the number of multipath components P is not available at the receiver, select instead all local maxima or peaks (of the groups of estimates) whose magnitude is above a certain threshold (to be defined).
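The growth of the sensing matrix with the grid granularity can be quantified with a back-of-the-envelope calculation. The sketch below assumes a dense matrix stored in double-precision complex format (16 bytes per entry) and takes G = NM · ρ^2 exactly; N = 50 follows the Fig. 7 setting, while M = 64 and the values of ρ are hypothetical.

```python
def sensing_matrix_size(N, M, rho=1, bytes_per_entry=16):
    """Rows, columns and memory (bytes) of a dense NM x (NM * rho^2) matrix.
    Assumes complex128 storage, i.e., 16 bytes per entry."""
    rows = N * M
    cols = N * M * rho ** 2
    return rows, cols, rows * cols * bytes_per_entry

for rho in (1, 2, 4, 8):
    rows, cols, nbytes = sensing_matrix_size(50, 64, rho)
    print(f"rho={rho}: D is {rows} x {cols}, about {nbytes / 1e9:.2f} GB")
```

Already at ρ = 4 a dense D for this block size occupies a few gigabytes, which motivates refining the grid only around the detected peaks.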

Algorithm 3: Refinement of the Granularity
Result: Fine estimation ĥ for the LASSO problem (14).
Coarse Estimation: With any LASSO solver, get a first coarse estimation ĥ such that the search grid Γ coincides with the Doppler-delay grid (i.e., delays and Doppler shifts are integer multiples of the grid step). In this case G = NM and D is an NM × NM matrix;
for Iteration i = 1, 2, . . . do
    • Peak Selection: Select the first P local maxima of ĥ;
    • Step Refinement Around Maxima: Build a new sensing matrix based on an extension of the matrix D such that the step size around the peaks is decreased (i.e., the granularity and the precision are increased);
    • Finer Estimation: With any LASSO solver, get a finer estimation ĥ.
end

Note that during the Coarse Estimation step, i.e., when D is an NM × NM matrix, it is possible to adopt the approach proposed in [51] to solve the LASSO minimization in (14). The algorithm of [51], which exploits the hierarchical structure of the vector h, is able to provide a first coarse and reliable estimation while optimizing the computational complexity. However, if h takes off-grid values, the approach of [51] becomes inappropriate, as confirmed by the presented simulation results. For this reason, after the Coarse Estimation, i.e., within the Iteration step, another LASSO solver must be chosen to obtain the best performance in terms of channel estimation.
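The coarse-to-fine idea of Algorithm 3 can be illustrated on a deliberately simplified problem: the estimation of a single off-grid normalized frequency. In this sketch the LASSO solver is replaced by a plain correlation peak search (an assumption made to keep the example short), only one path is considered, and wrap-around at the edges of the frequency axis is not handled. All names are hypothetical.

```python
import numpy as np

def correlate(y, freqs):
    """Magnitude of the normalized correlation between y and each candidate tone."""
    n = np.arange(len(y))
    atoms = np.exp(2j * np.pi * np.outer(n, freqs))   # one column per candidate
    return np.abs(atoms.conj().T @ y) / len(y)

def refine_frequency(y, rho=4, n_refinements=3):
    """Coarse on-grid search followed by local grid refinements around the peak."""
    step = 1.0 / len(y)
    grid = np.arange(len(y)) * step                   # coarse grid: on-grid values
    est = grid[np.argmax(correlate(y, grid))]
    for _ in range(n_refinements):
        new_step = step / rho                         # finer step around the peak
        grid = est + np.arange(-rho, rho + 1) * new_step
        est = grid[np.argmax(correlate(y, grid))]
        step = new_step
    return est

n_samples, f_true = 64, 0.2037                        # off-grid frequency
y = np.exp(2j * np.pi * f_true * np.arange(n_samples))
f_hat = refine_frequency(y)
print(f_true, f_hat, abs(f_hat - f_true))
```

Each refinement evaluates only 2ρ + 1 candidates instead of rebuilding the full grid with the finer step, which is the source of the complexity saving.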

APPENDIX B NESTEROV'S ACCELERATION FACTOR
Nesterov's acceleration factor governs the dependency between two successive estimates, while remarkably reducing the convergence time of the algorithm [30]. The use of the optimization coefficient α_i is not mandatory. Following the pioneering work [31], which inspired many other works, e.g., [29], α_i can be recursively defined as

$$\alpha_i = \frac{\xi_{i-1} - 1}{\xi_i}, \qquad \xi_i = \frac{1 + \sqrt{1 + 4\xi_{i-1}^2}}{2},$$

with ξ_0 = 1. Another choice, simply based on the iteration index i, is α_i = (i − 1)/(i + 2) [30], [31]. Both solutions describe a curve growing from an initial value ("far" from 1) up to 1. The associated plots, with their similar behaviors, can be seen in Fig. 8. The most conservative choice is α_i = 1, for which the dependency on the previous estimated values is maximized, while the least conservative choice is α_i = 0, which completely forgets the previous estimated values. Intuitively, a small value of α_i is preferable for the first noisy iterations, while a value of α_i close to one should be chosen when the reliability of the estimation increases.
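The two choices of α_i can be compared numerically. The sketch below assumes that the recursive definition is the standard one used in accelerated proximal gradient methods, i.e., ξ_i = (1 + sqrt(1 + 4 ξ_{i−1}^2))/2 with ξ_0 = 1 and α_i = (ξ_{i−1} − 1)/ξ_i; the function names are hypothetical.

```python
import math

def alpha_recursive(n_iter):
    """Acceleration factors from the xi recursion, starting at xi_0 = 1."""
    xi_prev = 1.0
    alphas = []
    for _ in range(n_iter):
        xi = (1.0 + math.sqrt(1.0 + 4.0 * xi_prev ** 2)) / 2.0
        alphas.append((xi_prev - 1.0) / xi)
        xi_prev = xi
    return alphas

def alpha_index(n_iter):
    """Acceleration factors alpha_i = (i - 1) / (i + 2), i = 1, ..., n_iter."""
    return [(i - 1) / (i + 2) for i in range(1, n_iter + 1)]

rec, idx = alpha_recursive(100), alpha_index(100)
for i in (1, 2, 5, 20, 100):
    print(f"i={i:3d}  recursive={rec[i - 1]:.4f}  index-based={idx[i - 1]:.4f}")
```

Both sequences start at zero and increase monotonically toward one, in agreement with the qualitative behavior described for Fig. 8.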