Non-Iterative Downlink Training Sequence Design Based on Sum Rate Maximization in FDD Massive MIMO Systems

This paper considers the problem of downlink (DL) training sequence design with limited coherence time for frequency division duplex (FDD) massive MIMO systems in a general scenario of single-stage precoding and distinct spatial correlations between users. To this end, a computationally feasible solution for designing the DL training sequences is proposed using the principle of linear superposition of sequences constructed from the users’ channel covariance matrices. Based on the non-iterative superposition training structure and the $P$-degrees of freedom ($P$-DoF) channel model, a novel closed-form solution for the optimum training sequence length that maximizes the DL achievable sum rate is provided for the eigenbeamforming (BF) precoder. Additionally, a simplified analysis that characterizes the sum rate performance of the BF and regularized zero forcing (RZF) precoders in closed-form is developed based on the method of random matrix theory and the $P$-DoF channel model. The results show that the superposition training sequences achieve almost the same rate performances as state-of-the-art training sequence designs. The analysis of the complexity results demonstrates that more than four orders-of-magnitude reduction in the computational complexity is achieved using the superposition training design, which signifies the feasibility of this approach for practical implementations compared with state-of-the-art iterative algorithms for DL training designs.
Importantly, the results indicate that the analytical solution for the optimum training sequence length with the $P$-DoF channel model can be effectively used with high accuracy to predict the sum rate performance in the more realistic one ring (OR) channel model, and thus, near optimal solutions can be readily obtained without resorting to computationally intensive optimization techniques.


I. INTRODUCTION
Next-generation cellular systems must maximize spectral efficiency to satisfy the rapidly increasing demand for wireless data services [1], while reducing both cost and energy consumption [2], [3]. Massive multiple-input multiple-output (massive MIMO), proposed in [4], is one of the most promising technologies for achieving this goal. In particular, massive MIMO transmission has several advantages: (a) it allows the use of linear precoding schemes with low-complexity signal processing; (b) it achieves a uniform quality of service across the entire cell; (c) it provides immunity against fading; and (d) it reduces the base station (BS) energy consumption [5]-[10].
The associate editor coordinating the review of this manuscript and approving it for publication was Jie Tang.
To achieve the full potential of massive MIMO, sufficiently accurate and timely estimates of the channel state information (CSI) are required at the BS [11], [12]. Early research on massive MIMO systems focused on time division duplex (TDD) operation, where the required CSI is obtained by sending a superposition of orthogonal sequences over a length of $T_p$ symbols in the uplink (UL) direction during each coherence interval [5], [6], [8], [13], [14]. The authors in [15], [16] found that the optimum number of UL training symbols is proportional to the number of user terminals (UTs) $K$ and independent of the number of BS antennas $N$, which can be made as large as required. Under the UL and downlink (DL) channel reciprocity that holds in TDD systems, the UL CSI estimates are used for designing the DL precoder without the requirement for DL CSI estimation. However, in FDD systems, the DL and UL channels occupy different frequency bands [17], and hence, estimation of the DL CSI using UL training sequences is not possible. As such, the analysis framework for the optimum training design developed in [15], [16] cannot be used to predict the performance of an FDD massive MIMO system. To obtain the CSI in FDD systems, the UTs would need to estimate the DL channels of each of the $N$ BS antennas and send the quantized channel estimates back to the BS to design the precoder [18]. This is generally deemed infeasible for FDD massive MIMO systems with large $N$, since the overhead for the DL CSI estimation is proportional to the number of BS antennas $N$ [11], [12], [19]. As such, the available coherence time would be largely occupied by the channel training, leaving insufficient time for transmitting useful data to the UTs [19], [20].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To address the challenge of FDD operation in massive MIMO systems, several studies have investigated the design of DL training sequences using different channel models and design criteria; see, e.g., [21]-[28]. In particular, the research in [21]-[23] explores the joint use of the spatial and temporal channel correlations, in which the training sequences are designed to minimize the mean squared error (MSE) of the channel estimate in a scenario where all users exhibit a common spatial correlation. However, in practice, users could exhibit distinct spatial correlation patterns; therefore, the optimization framework of the DL training sequences developed in [21]-[23] does not hold in the general scenario with heterogeneous user channels. Another line of research has focused on the design of training sequences for FDD massive MIMO systems by utilising a two-stage precoding technique, termed joint spatial division and multiplexing (JSDM) [24], [29]. Specifically, the research in [24], [29] exploits correlations in the spatial domain, where the users within each group exhibit the same spatial correlation, and a linear superposition of each group correlation matrix is used to perform the first of two stages of precoding, thus forming a beam for each group. As such, the training sequence length in the DL can be scaled linearly with the number of user groups, which can be less than $N$, resulting in a feasible pilot overhead requirement for FDD operation. While the two-stage precoding technique helps to constrain the training sequence length and maximize the sum rate, sophisticated scheduling and clustering algorithms for the user groups, and for the users inside each group, are essential, which constrains the applicability of the approach.
Furthermore, the research in [24], [29] does not address the challenge of designing the DL training sequence in single-stage precoding with $K$ distinct covariance matrices, and thus, cannot predict the optimum training length that maximizes the sum rate performance in this preferred scenario. The research in [25]-[28] considered a general scenario of single-stage precoding with distinct spatial correlations, where the training sequences in [25], [26] and [27], [28] are designed using different iterative algorithms as solutions to a sum conditional mutual information (SCMI) maximization criterion and a sum mean square error (SMSE) minimization criterion, respectively. While advanced iterative algorithms have been developed in the aforementioned research, they provide no closed-form solution for the optimum DL training sequence that maximizes the sum rate with limited coherence time. Furthermore, a limited coherence time interval implies that the CSI must be re-estimated more frequently, and thus, iterative solutions for the DL training sequence design may be infeasible.

A. CONTRIBUTIONS AND PAPER FINDINGS
This paper addresses the challenge of DL channel estimation in an FDD massive MIMO communication system with single-stage precoding and limited coherence time using a non-iterative approach for the DL training sequence design. To this end, the principle of linear superposition is proposed, in which the DL training sequences are constructed from the eigenvectors of the $K$ distinct correlation matrices. This allows a feasible solution for DL channel estimation to be achieved with reduced design complexity, thereby avoiding existing training sequence designs that require computationally demanding iterative algorithms. Based on the superposition training approach and the $P$-degrees of freedom ($P$-DoF) channel model, a new analytical closed-form solution for the optimum training sequence length that maximizes the DL achievable sum rate in an FDD massive MIMO system is provided for the eigenbeamforming (BF) precoder. In addition, asymptotic random matrix theory along with the $P$-DoF channel model is adopted in this paper to provide a straightforward analysis of the sum rate for the BF and regularized zero forcing (RZF) precoders. Comparisons between the sum rates of the superposition training design and the state-of-the-art sequences designed based on iterative algorithms [25], [28] are conducted based on the $P$-DoF and the one ring (OR) [30] channel models. Furthermore, the computational complexity of the superposition sequence design is analyzed and compared with the state-of-the-art iterative algorithms.
We found that the diversity of spatial correlations between multiple users significantly improves the sum rate performance in comparison to uncorrelated channels with identical covariance matrices. The numerical results demonstrate that the superposition training sequences achieve almost the same rate performances as state-of-the-art training designs while reducing the computational complexity. Importantly, the results show that the pilot length that is optimized for the BF precoder is also sufficient to predict the rate performance of the BF and RZF precoders in the more realistic OR channel model. Overall, the proposed design paradigm allows a pragmatic DL training design for an FDD massive MIMO system to be achieved with a significant computational complexity reduction.

B. PAPER ORGANIZATION AND NOTATION
The paper is organized as follows. In Section II, the system model is introduced. In Section III, we explain the channel estimation process based on the DL training sequence together with the problem formulation. In Section IV, the SINR analyses of the BF and RZF precoders based on random matrix theory are developed, which are then used in Section V with the $P$-DoF channel model to provide a closed-form solution of the optimum training sequence length for the BF precoder and an explicit mathematical analysis of the DL sum rates for the BF and RZF precoders. In Section VI, numerical results are provided in order to characterize the system performance and validate the analyses. Finally, the paper is concluded in Section VII.
Notation: In the present paper, an uppercase boldface symbol stands for a matrix whereas a lowercase boldface symbol stands for a vector. $\mathcal{CN}(\mathbf{0}, \mathbf{R})$ denotes the circularly symmetric complex Gaussian (CSCG) probability distribution with mean $\mathbf{0}$ and covariance matrix $\mathbf{R}$. The term $\mathbb{E}[\cdot]$ refers to the expectation operator. $\mathbf{I}_N$ denotes the $N \times N$ identity matrix. The operators trace, transpose, Hermitian transpose, inverse and absolute value are denoted by $\mathrm{tr}(\cdot)$, $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^{-1}$, and $|\cdot|$, respectively. $[\mathbf{A}]_{:,j:m}$ denotes a submatrix containing columns $j$ through $m$ of matrix $\mathbf{A}$. We use $[\mathbf{a}]_k$ and $[\mathbf{A}]_{k,l}$ to denote the $k$th element of vector $\mathbf{a}$ and the element in the $k$th row and $l$th column of matrix $\mathbf{A}$, respectively.

II. SYSTEM MODEL
This paper considers a single-cell DL mobile wireless communications system in which the BS is equipped with a uniform linear array (ULA) of $N$ antennas, employs single-stage precoding, and serves $K$ single-antenna user terminals (UTs) with $N \gg K$. Non-line-of-sight (NLOS) Rayleigh fading channels over a single frequency band are considered, with an overall coherence time denoted by $T_c \in \mathbb{Z}^+$ and measured in symbols per transmission block.
As depicted in Fig. 1, the available coherence time $T_c$ is divided into the training duration $T_p$, the feedback time $T_f$ and the data transmission duration $T_d$. As the purpose of the present paper is to concentrate on the DL training sequence design, minimizing the training sequence length over a limited coherence time, the UL feedback time $T_f$ and the associated error rate are assumed to be zero, as considered in [15], [22]-[24], [28], [29], [31].
The DL received signal during the data-phase at the $k$-th UT may be written as in (1) [5], [6], where $\mathbf{h}_k \in \mathbb{C}^{N}$ is the complex channel vector between the BS and the $k$-th UT, and the additive noise $n_k$ is modeled as a zero-mean, unit-variance CSCG random variable. The DL transmit vector $\mathbf{s} \in \mathbb{C}^{N}$ is given in (2), where $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_K] \in \mathbb{C}^{N \times K}$ is the precoding matrix at the BS and $\mathbf{x} = [x_1, \ldots, x_K]^T \in \mathbb{C}^{K}$ is a zero-mean CSCG vector of data symbols satisfying $\mathbb{E}[\mathbf{x}\mathbf{x}^H] = \mathbf{I}_K$. When the matrix power normalization technique is used [6], [32], the normalization constant $\xi$ can be written as in (3), which ensures that $\mathbb{E}[\|\mathbf{s}\|^2] = K$ and that the average per-user BS transmit power is $\rho_d$ during the data-phase. To this end, a DL achievable sum rate, $C$, for the massive MIMO system under consideration can be expressed as in (4), where the associated signal-to-interference-plus-noise ratio (SINR) at the $k$-th UT is given by (5) [5], [6]. The term $1/\rho_d$ is the inverse of the per-user signal-to-noise ratio (SNR) during the data-phase.
The sum rate lower bound in (4) depends on the channel statistics, the channel estimates and the linear precoding technique used at the BS. We specifically consider two common types of linear precoders, the eigenbeamforming (BF) or maximum ratio transmitter (MRT) precoder and the regularized zero forcing (RZF) precoder, as defined in (6) [6], where $\zeta$ is the regularization parameter, which is taken to be the inverse of the per-user SNR, $1/\rho_d$ [6]. The term $\hat{\mathbf{H}}$ is the estimate of the DL channels $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_K]^H \in \mathbb{C}^{K \times N}$, where each channel $\mathbf{h}_k$, $k = 1, \ldots, K$, is modeled as a zero-mean, independent CSCG random vector. The following section explains how $\hat{\mathbf{H}}$, which is required at the BS for precoding, is estimated. The $k$-th user's correlation matrix is $\mathbf{R}_k = \mathbb{E}[\mathbf{h}_k \mathbf{h}_k^H]$. The covariance matrix $\mathbf{R}_k$ is considered to be locally stationary, varying more slowly than the instantaneous channel, which changes with every coherence interval [22], [24], [29], [33], [34], and thus, may be accurately estimated in either the FDD or TDD schemes considered [35]-[40].
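To make the roles of the two precoders and of the power normalization concrete, the following is a minimal NumPy sketch rather than the paper's implementation. Several choices are assumptions made only for illustration: the RZF scaling $(\hat{\mathbf{H}}^H\hat{\mathbf{H}} + N\zeta\mathbf{I}_N)^{-1}\hat{\mathbf{H}}^H$, a per-realization normalization in place of the expectation in (3), an instantaneous SINR in place of the bound in (5), and the pre-log factor $(1 - T_p/T_c)$. The function names are hypothetical.

```python
import numpy as np

def bf_precoder(H_hat):
    """Eigenbeamforming / MRT: W = H_hat^H, shape (N, K)."""
    return H_hat.conj().T

def rzf_precoder(H_hat, zeta):
    """RZF under an ASSUMED scaling: W = (H^H H + N*zeta*I)^-1 H^H."""
    K, N = H_hat.shape
    G = H_hat.conj().T @ H_hat + N * zeta * np.eye(N)
    return np.linalg.solve(G, H_hat.conj().T)

def normalize(W):
    """Scale W so that tr(W W^H) = K (per realization, an assumption)."""
    K = W.shape[1]
    xi = K / np.real(np.trace(W @ W.conj().T))
    return np.sqrt(xi) * W

def sum_rate(H, W, rho_d, T_p, T_c):
    """(1 - T_p/T_c) * sum_k log2(1 + SINR_k) for one channel realization."""
    G = np.abs(H @ W) ** 2                 # G[k, j] = |h_k^H w_j|^2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    sinr = signal / (interference + 1.0 / rho_d)
    return (1.0 - T_p / T_c) * np.sum(np.log2(1.0 + sinr))

# Toy usage with an i.i.d. Rayleigh channel and perfect CSI (illustrative values).
rng = np.random.default_rng(0)
K, N = 4, 32
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
W_bf = normalize(bf_precoder(H))
W_rzf = normalize(rzf_precoder(H, zeta=0.1))
rate_bf = sum_rate(H, W_bf, rho_d=10.0, T_p=8, T_c=100)
rate_rzf = sum_rate(H, W_rzf, rho_d=10.0, T_p=8, T_c=100)
```

In the system considered here the precoders would be built from the estimate $\hat{\mathbf{H}}$ rather than from $\mathbf{H}$; perfect CSI is used above only to keep the sketch short.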

III. CHANNEL ESTIMATION AND PROBLEM FORMULATION
This section addresses the problem of channel estimation using DL training sequences in an FDD massive MIMO communication system.

A. CHANNEL ESTIMATION USING DOWNLINK TRAINING SEQUENCES
To estimate the DL channel, the BS transmits predetermined pilot sequences of length $T_p$ during the training-phase. The received training signal at the $k$-th user, $\mathbf{y}_k \in \mathbb{C}^{T_p}$, is a noisy linear observation of $\mathbf{h}_k$ through the spatio-temporal common pilot matrix $\mathbf{S}_p \in \mathbb{C}^{N \times T_p}$, which is normalized as $\mathrm{tr}(\mathbf{S}_p^H \mathbf{S}_p) = T_p$ so that the average transmitted power during the training-phase is equal to $\rho_p$. The receiver noise $\mathbf{n}_k \in \mathbb{C}^{T_p}$ follows the CSCG distribution $\mathcal{CN}(\mathbf{0}, \mathbf{I}_{T_p})$. Since the channel vector $\mathbf{h}_k$ follows a CSCG distribution with known statistics at the BS, linear filters that exploit the channel statistics to optimize channel estimation performance can be used. To this end, an optimized channel estimation performance in the DL with $T_p < N$ is achieved by utilizing Bayesian estimation, i.e., employing a minimum mean square error (MMSE) filter, which makes use of the channel and noise statistics. Accordingly, the $k$-th user's covariance matrix with MMSE channel estimation, $\boldsymbol{\Phi}_k$, is expressed for all $k$ as in (8) [41].
In this paper, the structure of a common pilot matrix $\mathbf{S}_p$ is designed by jointly considering all the $K$ distinct channel covariance matrices, where the effective eigenvectors of each of the matrices are combined using the principle of linear superposition. In particular, a unique pilot sequence for the training-phase is developed based on the eigenvalue decomposition (EVD) of the channel covariance matrices in (9), $\mathbf{R}_k = \mathbf{U}_k \boldsymbol{\Lambda}_k \mathbf{U}_k^H$, where $\mathbf{U}_k = [\mathbf{u}_{k,1}, \ldots, \mathbf{u}_{k,N}] \in \mathbb{C}^{N \times N}$ is a unitary matrix of the eigenvectors and $\boldsymbol{\Lambda}_k \in \mathbb{R}^{N \times N}$ is a diagonal matrix of the eigenvalues of $\mathbf{R}_k$ arranged in descending order $\lambda_{k,1} \geq \lambda_{k,2} \geq \cdots \geq \lambda_{k,N}$. Specifically, the pilot matrix $\mathbf{S}_p \in \mathbb{C}^{N \times T_p}$ is constructed from the superposition of the first $T_p$ eigenvectors of each $\mathbf{R}_k$, corresponding to the largest eigenvalues, as expressed in (10).
The pilot matrix in (10) is normalized by the Frobenius norm to satisfy the power constraint $\mathrm{tr}(\mathbf{S}_p^H \mathbf{S}_p) = T_p$. In principle, increasing $T_p$ allows more pilot signal energy to be received, but it comes at the cost of reduced spectral efficiency due to a shorter data transmission phase. Consequently, for $T_p < N$, the energy in the channel that is related to the last eigenvector columns $[\mathbf{U}_k]_{:,T_p+1:N}$ is not used in precoding.
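The superposition construction and the resulting estimate covariance can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the training model is assumed to be $\mathbf{y}_k = \sqrt{\rho_p}\,\mathbf{S}_p^H\mathbf{h}_k + \mathbf{n}_k$, for which the standard LMMSE estimate covariance is $\rho_p \mathbf{R}_k\mathbf{S}_p(\rho_p\mathbf{S}_p^H\mathbf{R}_k\mathbf{S}_p + \mathbf{I}_{T_p})^{-1}\mathbf{S}_p^H\mathbf{R}_k$; the exact form of (8) may differ. The function names and the DFT-based test covariances are hypothetical.

```python
import numpy as np

def superposition_pilot(R_list, T_p):
    """Sum the T_p dominant eigenvectors of every R_k, then scale so tr(S^H S) = T_p."""
    S = 0
    for R in R_list:
        eigval, eigvec = np.linalg.eigh(R)            # eigenvalues in ascending order
        order = np.argsort(eigval)[::-1][:T_p]        # indices of the T_p largest
        S = S + eigvec[:, order]
    return S * np.sqrt(T_p) / np.linalg.norm(S, "fro")

def mmse_estimate_cov(R, S, rho_p):
    """LMMSE estimate covariance for the ASSUMED model y = sqrt(rho_p) S^H h + n."""
    T_p = S.shape[1]
    A = rho_p * S.conj().T @ R @ S + np.eye(T_p)
    return rho_p * R @ S @ np.linalg.solve(A, S.conj().T @ R)

# Toy usage: two users whose covariances occupy disjoint DFT columns.
N, P, T_p = 8, 2, 2
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                # unitary DFT matrix
R1 = (N / P) * F[:, :P] @ F[:, :P].conj().T
R2 = (N / P) * F[:, P:2 * P] @ F[:, P:2 * P].conj().T
S_p = superposition_pilot([R1, R2], T_p)
Phi_1 = mmse_estimate_cov(R1, S_p, rho_p=10.0)
```

Because the pilot matrix is built from one EVD per user and a single normalization, no iteration over the pilot entries is needed, which is the source of the complexity saving discussed in the text.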

B. FORMULATION OF THE OPTIMIZATION PROBLEM
Maximizing the downlink sum rate over the training-phase duration in the massive MIMO system under consideration equates to the optimization problem defined in (11).
Though the problem setup may seem straightforward, solving (11) for arbitrary correlation matrices $\mathbf{R}_k$ and finite values of $N$ is computationally demanding since the expectations in (5) need to be evaluated for different choices of $T_p$ using extensive Monte Carlo simulations. A computationally feasible solution to finding an optimum training sequence length $T_p^*$ may be obtained by invoking asymptotic random matrix theory (RMT) methods [6], [15], [32], [42].
To date, the optimum pilot length that maximizes the achievable sum rate of FDD massive MIMO systems has been obtained by exhaustively searching over all candidate lengths $1 \leq T_p \leq \min(N, T_c)$. However, as the BS array is scaled up and the coherence time increases, such an exhaustive search becomes time and resource consuming, and hence, practically infeasible. To the best of our knowledge, the problem of finding a closed-form solution for the optimum training sequence length $T_p^*$ that maximizes the sum rate in DL single-stage precoding of an FDD massive MIMO system has not been solved, owing to the technical challenge of deriving such a solution.
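The exhaustive search described above can be sketched in a few lines. The helper `exhaustive_pilot_length` and the stand-in objective `toy_rate` are hypothetical and only illustrate the search loop; in practice each call to the rate function would hide a Monte Carlo evaluation of the expectations in (5), which is what makes the search expensive.

```python
import math

def exhaustive_pilot_length(rate_fn, N, T_c):
    """Return (T_p*, best rate) by evaluating rate_fn for 1 <= T_p <= min(N, T_c)."""
    best_T, best_rate = None, float("-inf")
    for T_p in range(1, min(N, T_c) + 1):
        rate = rate_fn(T_p)
        if rate > best_rate:
            best_T, best_rate = T_p, rate
    return best_T, best_rate

# Hypothetical stand-in objective: pre-log loss times a rate that grows with T_p.
T_c, N = 50, 16
toy_rate = lambda T_p: (1.0 - T_p / T_c) * math.log2(1.0 + T_p)
best_T, best_rate = exhaustive_pilot_length(toy_rate, N, T_c)
```

The loop costs $\min(N, T_c)$ evaluations of the objective, so its run time is dominated by how expensive a single sum rate evaluation is.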

IV. ACHIEVABLE SUM RATE ANALYSIS
In this section, we provide the expressions that accurately approximate the $\mathrm{SINR}_k$ and the downlink sum rate for the massive MIMO system under consideration based on the asymptotic random matrix theory approach in [6]. In particular, an asymptotically tight approximation, denoted $\overline{\mathrm{SINR}}_k$, of the $\mathrm{SINR}_k$ defined in (5) is obtained as indicated in (12) when $N$ and $K$ grow without bound while the ratio $K/N > 0$ is kept constant.
Consistent with the previous literature on large system analysis [6], [15], [20], [32], [43], our numerical results show that these approximations are also accurate for practical, finite values of $N$ and $K$. This allows us to replace the $\mathrm{SINR}_k$ term in (5) for the BF and RZF precoders with the approximations given in this section, so that the optimization problem may be rewritten as in (13).
The following propositions for the BF and RZF precoders are modified and repurposed versions of Theorem 4 and Theorem 6 in [6], respectively. Particularly, the precoders under consideration are designed based on an imperfect channel estimation in the downlink of a single-cell FDD massive MIMO system. As such, the MMSE channel estimation is obtained based on the downlink training sequence that is given in (10). Further details on the asymptotic random matrix theory methods can be found in [6], [42].
Proposition 1: Let $\mathrm{SINR}^{\mathrm{BF}}_k$ denote the $\mathrm{SINR}_k$ for BF precoding as given in (5). An asymptotically tight approximation of $\mathrm{SINR}^{\mathrm{BF}}_k$ for the regime where $N$ and $K$ go to infinity with a given ratio is given in (14), which also involves the precoder normalization term, and $\boldsymbol{\Phi}_k$ is the $k$-th user's covariance matrix with MMSE channel estimation, which is provided in (8).
The expression (14) is generally valid for any channel correlation model and training sequence type. Unlike the expression for BF precoding, the SINR approximation for the RZF precoder is given in terms of several auxiliary variables. These variables arise from the asymptotic random matrix theory analysis and, in general, need to be solved numerically before the SINR approximation is obtained. A simplified analysis in which the performance of the RZF precoder can be characterized in closed-form is provided in Section V based on the $P$-DoF channel model.
Proposition 2: Let $\mathrm{SINR}^{\mathrm{RZF}}_k$ denote the $\mathrm{SINR}_k$ for RZF precoding as given in (5). An asymptotically tight approximation of $\mathrm{SINR}^{\mathrm{RZF}}_k$ for a general correlation model is given in (15), where the term $\bar{\xi} \in \mathbb{R}$ is obtained later in (23). The parameter $\delta_k \in \mathbb{R}$ is the unique solution to a fixed-point equation that arises from the asymptotic random matrix theory analysis and needs to be solved numerically. An analytical closed-form solution for $\delta_k$ is provided in subsection V-C based on the $P$-DoF channel model. Recalling the covariance matrix of the MMSE channel estimate $\boldsymbol{\Phi}_k$ given in (8) and defining the recursion on the integer $t = 1, 2, \ldots$ in (16), with an initial value $\delta_k^{(0)} = 1/\zeta$ for all $k$, where $\zeta > 0$ is the regularization parameter in (6), the variable $\delta_k \in \mathbb{R}$ is found numerically by the standard fixed-point algorithm as the limit in (17). After the solution of the fixed-point equations in (16) and (17) is numerically obtained, it is substituted into the expression that defines the random matrix $\mathbf{T} \in \mathbb{C}^{N \times N}$. The auxiliary matrix $\bar{\mathbf{T}} \in \mathbb{C}^{N \times N}$ is then defined in terms of $\mathbf{J} \in \mathbb{C}^{K \times K}$ and $\bar{\mathbf{v}} \in \mathbb{C}^{K}$, which are obtained from the expressions given in (21) and (22). The parameter $\bar{\xi} \in \mathbb{R}$ in (15) corresponds to the precoder normalization and is obtained by substituting the matrices $\mathbf{T}$ and $\bar{\mathbf{T}}$ into (23). The auxiliary variable $\mu_{k,i} \in \mathbb{R}$ in (15) is obtained using the dominated convergence and the continuous mapping theorems as developed in [6], modified here for the massive MIMO system under consideration, and is provided by the expressions given in (24), (25) and (26), where $\mathbf{v} \in \mathbb{C}^{K}$ is defined in the accompanying expression.
The SINR approximation given in (15) is generally valid for any channel correlation model and training sequence type. Determining the sum rate for the RZF precoder in a form that is useful for the optimization problem defined in (13) is still challenging. In order to solve this and gain further insight into the problem, we utilize the analytical $P$-DoF channel model considered in [6] in the next section. This supports the formulation of both $\mathrm{SINR}^{\mathrm{BF}}_k$ and $\mathrm{SINR}^{\mathrm{RZF}}_k$, leading to the successful computation of their respective sum rates with very low computational complexity.
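The fixed-point recursion for $\delta_k$ described in Proposition 2 can be sketched as below. Since (16) and (17) are not reproduced in this text, the sketch assumes the deterministic-equivalent form commonly used in [6], namely $\delta_k^{(t)} = \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Phi}_k\mathbf{T}^{(t-1)}\big)$ with $\mathbf{T}^{(t-1)} = \big(\frac{1}{N}\sum_j \boldsymbol{\Phi}_j/(1+\delta_j^{(t-1)}) + \zeta\mathbf{I}_N\big)^{-1}$; the paper's exact scaling may differ. The function name, tolerance and test values are hypothetical.

```python
import numpy as np

def solve_delta(Phi_list, zeta, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration for delta_k under the ASSUMED recursion stated above."""
    N = Phi_list[0].shape[0]
    delta = np.full(len(Phi_list), 1.0 / zeta)        # initial value 1/zeta for all k
    for _ in range(max_iter):
        M = sum(Phi / (1.0 + d) for Phi, d in zip(Phi_list, delta)) / N
        T = np.linalg.inv(M + zeta * np.eye(N))
        new = np.array([np.real(np.trace(Phi @ T)) / N for Phi in Phi_list])
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    return delta

# Toy check: identical covariances Phi_k = I_N, for which the fixed point
# reduces to the scalar equation delta = 1 / ((K/N) / (1 + delta) + zeta).
N, K, zeta = 16, 4, 0.1
delta = solve_delta([np.eye(N) for _ in range(K)], zeta)
residual = delta[0] - 1.0 / ((K / N) / (1.0 + delta[0]) + zeta)
```

Each iteration requires one $N \times N$ matrix inversion, which is why a closed-form $\delta_k$ under the $P$-DoF model is attractive.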

V. CLOSED-FORM SUM RATE ANALYSIS AND TRAINING SEQUENCE OPTIMIZATION BASED ON THE P-DoF CHANNEL MODEL
In practice, field measurements have shown that the MIMO channel coefficients are spatially correlated in outdoor [44] and indoor propagation environments [45], [46]. The correlation depends on the number of degrees of freedom offered by the physical channel, which can be much smaller than the number of BS antennas $N$. Therefore, the channel correlation can be modelled as a $P$-dimensional subspace, where $P$ is the number of angular spatial directions or bins. These angular bins correspond to the number of significant multipaths in the angular domain. In this section, an analytical physical channel model [6], [10], [47], [48] is considered, where the angular domain is separated into a finite number $P$ of directions, i.e., $P$-DoF. The $P$-DoF channel model allows the SINR, and hence the achievable sum rate, to be expressed in closed-form for both the BF and RZF precoders. It also allows the optimum training sequence length to be derived analytically for the BF precoder; for the RZF precoder, the optimum length still requires numerical computation.
From [6], [10], [47], [48], the analytical $P$-DoF channel model for the system under consideration is written in (28) as $\mathbf{h}_k = \sqrt{N/P}\,\mathbf{A}_k \mathbf{b}_k$, where $P/N \in (0, 1]$, the elements of $\mathbf{b}_k \in \mathbb{C}^{P}$ are i.i.d. $\mathcal{CN}(0, 1)$ and $\mathbf{A}_k \in \mathbb{C}^{N \times P}$ is constructed from $P \leq N$ columns of an arbitrary $N \times N$ unitary matrix so that $\mathbf{A}_k^H \mathbf{A}_k = \mathbf{I}_P$. Clearly, $\mathbf{R}_k$ has rank $P$ and the channel is stochastically rank deficient if $P < N$. In general, the ratio $P/N$ controls the degrees of freedom of the channel and represents the extent of correlation or amount of scattering in the channel, and thus, models the radio environment [6], [11], [47]-[50]. The smaller $P/N$ is, the more concentrated the channel energy is in the non-zero eigenmodes and the more correlated the channel is. The normalization factor in (28) guarantees that the total average power of the channel is $\mathbb{E}[\mathrm{tr}(\mathbf{H}\mathbf{H}^H)] = \sum_{k=1}^{K}\mathrm{tr}(\mathbf{R}_k) = NK$, as desired (see Section II). The covariance matrices of the users for the channel defined in (28) are $\mathbf{R}_k = \mathbb{E}[\mathbf{h}_k\mathbf{h}_k^H] = \frac{N}{P}\mathbf{A}_k\mathbf{A}_k^H$, where the expectation is taken over $\mathbf{b}_k$.
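A short NumPy sketch can help check the properties stated for the $P$-DoF model: rank $P$, $P$ equal non-zero eigenvalues $N/P$, and $\mathrm{tr}(\mathbf{R}_k) = N$. Taking $\mathbf{A}_k$ as the first $P$ columns of the unitary DFT matrix is an assumption made for illustration, since any $P$ columns of a unitary matrix would do, and the function names are hypothetical.

```python
import numpy as np

def pdof_covariance(A):
    """R = (N/P) A A^H for an N x P matrix A with orthonormal columns."""
    N, P = A.shape
    return (N / P) * A @ A.conj().T

def pdof_channel(A, rng):
    """Draw one channel h = sqrt(N/P) A b with b ~ CN(0, I_P)."""
    N, P = A.shape
    b = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)
    return np.sqrt(N / P) * A @ b

# Toy usage: A taken as P columns of the unitary DFT matrix (illustrative choice).
N, P = 16, 4
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
A = F[:, :P]
R = pdof_covariance(A)
h = pdof_channel(A, np.random.default_rng(0))
eigenvalues = np.linalg.eigvalsh(R)                   # ascending order
```

Every draw of `h` lies in the $P$-dimensional column space of `A`, which is the sense in which the channel is rank deficient when $P < N$.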

A. CHANNEL COVARIANCE MATRICES AND ANGLE OF ARRIVAL (AoA) SUPPORTS
In this subsection, we present two special cases of the channel covariance matrices, namely, non-overlapping and overlapping angle of arrival (AoA) supports.

1) NON-OVERLAPPING AoA SUPPORTS
While the channels of different users associated with the BS are random, they may lie in mutually orthogonal subspaces. This behaviour of the channel is known as a non-overlapping AoA support [24], [29], [51]-[53], in which the AoAs of the desired and interfering users are disjoint. In practice, the scenario of non-overlapping signal subspaces between the desired and the interfering users can be realized when the ULA at the BS has a large number of antenna elements and when the users are well separated in the angular domain [51]. In the multiuser scenario with non-overlapping AoA supports, the covariance matrices of the users are all statistically different and independent of each other.
Recalling the EVD of $\mathbf{R}_k$ given in (9), non-overlapping AoA supports [51] correspond to channel covariance matrices that are asymptotically orthogonal, i.e., the signal subspaces of different users are mutually orthogonal. Under this special condition, the pilot sequences transmitted for different users do not interfere. The unitary symmetric structure of the discrete Fourier transform (DFT) matrix can be used to obtain perfectly non-overlapping AoA supports with the $P$-DoF channel model (28). For a ULA with a large number of antenna elements, we expect to have similar performances for the channel model defined in (28) and that discussed in Section VII [29], due to their asymptotic equivalence. For the particular case of the DFT based channel model, the pilot matrix in (10) is constructed based on the principle of linear superposition of subsets formed from the columns of the DFT matrix. To this end, the $\mathrm{SINR}_k$ expression for BF precoding, defined by (14), can be further simplified for the particular system under consideration. Specifically, for the $P$-DoF channel model and based on non-overlapping AoA supports, it is straightforward to obtain simplified expressions for $\mathrm{tr}(\boldsymbol{\Phi}_k)$ and $\mathrm{tr}(\mathbf{R}_k\boldsymbol{\Phi}_k)$ in (14), as given in (30) and (31), where $\lambda_{k,n} \geq 0$, $\forall k = 1, \ldots, K$ and $\forall n = 1, \ldots, T_p$, are the ordered eigenvalues of the channel covariance matrices $\mathbf{R}_k$, namely, $\lambda_{k,1} = \lambda_{k,2} = \cdots = \lambda_{k,P} = N/P$ and $\lambda_{k,P+1} = \cdots = \lambda_{k,N} = 0$ for each user. Substituting $\mathbf{R}_k$ of the $P$-DoF channel model into (8) yields, after some algebraic manipulations, the channel estimate covariance matrix given in (32), where $\mathbf{U}_{k,m} \in \mathbb{C}^{N \times m}$ denotes a matrix constructed from $m$ eigenvectors of $\mathbf{R}_k$.
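The DFT-based construction of non-overlapping supports can be illustrated as follows. The sketch assigns each user a disjoint block of $P$ columns of the unitary DFT matrix, which is one possible choice and an assumption here; the helper `dft_supports` is hypothetical. With disjoint blocks, the product of two different users' covariance matrices vanishes, which is the orthogonality property the text relies on.

```python
import numpy as np

def dft_supports(N, P, K, overlapping=False):
    """Build K P-DoF covariances from DFT columns (disjoint blocks unless overlapping)."""
    if not overlapping and K * P > N:
        raise ValueError("disjoint supports need K * P <= N")
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)            # unitary DFT matrix
    covs = []
    for k in range(K):
        start = 0 if overlapping else k * P
        A = F[:, start:start + P]
        covs.append((N / P) * A @ A.conj().T)
    return covs

# Toy usage: three users, four angular bins each, sixteen antennas.
disjoint = dft_supports(N=16, P=4, K=3)
shared = dft_supports(N=16, P=4, K=3, overlapping=True)
cross_term = np.linalg.norm(disjoint[0] @ disjoint[1])   # expected to be ~0
```

The fully overlapping case, in which every user shares the same block of columns and hence the same covariance matrix, is included for comparison with the next subsection.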

2) OVERLAPPING AoA SUPPORTS
When the users exhibit completely overlapping AoA supports, the covariance matrices of the users are all statistically the same and of the form $\mathbf{R}_k = \mathbf{R}$, $\forall k = 1, \ldots, K$. This implies that the EVD of $\mathbf{R}$ is given by (9), where the pilot matrix $\mathbf{S}_p \in \mathbb{C}^{N \times T_p}$ in (10) is constructed from the first $T_p$ eigenvectors of $\mathbf{U}_k = \mathbf{U}$, $\forall k = 1, \ldots, K$, corresponding to the largest eigenvalues of $\mathbf{R}$. Specifically, for equal channel covariance matrices, the channel estimates of all the users are statistically equal. Straightforward algebra provides a simplified expression for the covariance matrix of the MMSE channel estimate with overlapping AoA supports, which is obtained in a manner similar to the non-overlapping AoA supports discussed earlier.
Since the channel has $P \leq N$ degrees of freedom and the energy consumed by the training-phase is $\rho_p T_p$, choosing $T_p > P$ while keeping $\rho_p$ constant leads to the same channel estimation performance as $T_p = P$ but unnecessarily consumes more energy in the training-phase. Therefore, $T_p \leq P$ is always assumed in the following. Although the transmitted power per user during the training-phase decreases by a factor of $K$ because the channels of $K$ users are estimated, the numerical results in Section VI show that the achievable sum rate is maximized when the users have orthogonal covariance matrices. This implies that the channel is divided into $K$ interference-free single-user channels, and suggests that the achievable sum rate benefits from the diversity of spatial channel correlations across multiple users during the data-phase. Hence, the more distinct the spatial correlations exhibited by the users, the lower the residual interference and the higher the achievable data rate.
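The claim that $T_p > P$ brings no estimation gain can be checked numerically for the common-covariance case. The sketch below is an illustration under assumptions, not the paper's derivation: it takes the eigenvectors of $\mathbf{R}$ as DFT columns, uses the pilot $\mathbf{S}_p = [\mathbf{U}]_{:,1:T_p}$ (which already satisfies $\mathrm{tr}(\mathbf{S}_p^H\mathbf{S}_p) = T_p$), and assumes the training model $\mathbf{y} = \sqrt{\rho_p}\,\mathbf{S}_p^H\mathbf{h} + \mathbf{n}$. The function name is hypothetical.

```python
import numpy as np

def estimate_cov_trace(N, P, T_p, rho_p):
    """tr(Phi) for a P-DoF covariance when the pilot holds the first T_p eigenvectors."""
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)            # columns act as eigenvectors of R
    R = (N / P) * F[:, :P] @ F[:, :P].conj().T        # eigenvalues N/P (P times), then 0
    S = F[:, :T_p]                                    # tr(S^H S) = T_p by construction
    A = rho_p * S.conj().T @ R @ S + np.eye(T_p)
    Phi = rho_p * R @ S @ np.linalg.solve(A, S.conj().T @ R)
    return np.real(np.trace(Phi))

# Toy usage: compare T_p below, at, and above the channel's degrees of freedom.
N, P, rho_p = 16, 4, 10.0
trace_short = estimate_cov_trace(N, P, P - 1, rho_p)
trace_full = estimate_cov_trace(N, P, P, rho_p)
trace_long = estimate_cov_trace(N, P, P + 3, rho_p)
closed_form = P * rho_p * (N / P) ** 2 / (rho_p * N / P + 1.0)
```

Under these assumptions `trace_long` equals `trace_full`, because the extra pilot columns point into the null space of $\mathbf{R}$, while `trace_short` is smaller since one eigenmode is left unestimated.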
While the $P$-DoF channel model introduced in this section may seem highly idealistic, the numerical results in Section VI show that the optimal training sequence length selection is also very similar under more realistic channel models, such as the OR model [30], which justifies its use. The use of an analytical $P$-DoF channel model thus provides a straightforward system design methodology.

B. TRAINING SEQUENCE LENGTH OPTIMIZATION AND CLOSED-FORM SUM RATE ANALYSIS FOR BF PRECODING IN FDD SYSTEMS
In this subsection, we show how the optimum DL training sequence length is obtained analytically in a closed-form for the BF precoder at high SNR in the P-DoF channel.
We start by combining (32) with (30)-(31). Straightforward algebra provides asymptotically tight approximations for the SINR of the BF precoder with overlapping and non-overlapping ($\overline{\mathrm{SINR}}^{\mathrm{BF}}_{\perp}$) AoA supports, as given in (33) and (34), respectively.
Remark 1: Comparing (33) and (34) to Eq. (28) in [6], which provides an SINR approximation for the BF precoder in TDD massive MIMO when length-$K$ uplink pilot sequences are transmitted, shows that the effect of DL channel estimation on the SINR is to replace the DoF of the channel $P$ by the training sequence length $T_p \leq P$. Clearly, the choice $T_p = P$ maximizes (34) and yields the same SINR as with UL channel estimation. With this choice, however, the loss in achievable rate due to training, $(1 - T_p/T_c)$, dominates the increase in SINR when $P$ becomes sufficiently large. This intuitive statement is formalized for the high SNR region in Propositions 3 & 4 below. An interference-free scenario is obtained in the SINR expression (34) with the non-overlapping AoA supports due to the independent correlation matrices between users. In contrast, the interference is maximized in the SINR expression (33) with the fully overlapping AoA supports due to the condition of common correlation for all users, $\mathbf{R}_k = \mathbf{R}$ and $\boldsymbol{\Phi}_k = \boldsymbol{\Phi}$, $\forall k = 1, \ldots, K$. Both SINR terms (33) and (34) are equal when the number of users $K = 1$ or $\rho_d = P/N$. The necessary condition for the SINR with the non-overlapping AoA supports (34) to be higher than the SINR with the overlapping AoA supports (33) is that $\rho_d > P/N$ and $K > 1$. In practice, when $\rho_d > 0$ dB, $\overline{\mathrm{SINR}}^{\mathrm{BF}}_{\perp}$ exceeds its overlapping counterpart.
Substituting first (33) into (13) and then (34) provides a fast numerical optimization method for the BF precoder. Importantly, at high SNR, i.e., when $\rho_d = \rho_p = \rho \to \infty$, analytical solutions emerge. In this special case, the average achievable sum rates with overlapping and non-overlapping ($\bar{C}^{\mathrm{BF},\infty}_{\perp}$) AoA supports further simplify to the expressions in (35) and (36), respectively, which lead to Propositions 3 and 4.
Proposition 3: For $K$ users, the $P$-DoF channel model with overlapping AoA supports and channel coherence time $T_c$, the downlink training sequence length $T_p^*$ that maximizes the average achievable sum rate, in a massive MIMO system using BF precoding, at high SNR and with uniform power allocation $\rho_d = \rho_p$, is given by (37), with the real-valued parameter $\tau$ defined in (38). The Lambert $W$-function $W(\cdot)$ is defined in [54] and $e$ is Euler's number. In (37), $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ denote the ceiling and floor functions, respectively, which accommodate the necessary integer value of $T_p$ given that $\tau \in \mathbb{R}$.
Proof: A proof of Proposition 3 is presented in Appendix A.
Proposition 4: For K users, a P-DoF channel model with non-overlapping AoA supports and channel coherence time T c , the downlink training sequence length T * p that maximizes the average achievable sum rate in a massive MIMO system using BF precoding, at high SNR and with uniform power allocation, is given by (39), with τ defined in (40). The proof of Proposition 4 is similar to that of Proposition 3 and is therefore omitted for brevity. Clearly, Proposition 4 is the same as Proposition 3 with K = 1. From Propositions 3 & 4, the optimal DL training sequence lengths for the BF precoder in the P-DoF channel at high SNR are characterized as follows. T * p equals the DoF of the channel when P is less than τ and saturates at, or below, τ when the DoF exceeds (38) or (40) for the overlapping and non-overlapping AoA supports, respectively. The numerical results in the next section confirm that the same behavior is also observed at moderate SNRs.
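The saturation behaviour described above, together with the ceiling/floor step of Proposition 3, can be illustrated with a short sketch. It is not the closed form in (37)-(40): the value of τ is taken as an input because its Lambert-W expression is not reproduced here, the function names are hypothetical, and `toy_sinr` is an illustrative stand-in for the SINR approximations (33)-(34), chosen only because it grows with T p as Remark 1 suggests.

```python
import math

def toy_sinr(T_p):
    # Hypothetical stand-in for (33)/(34): SINR grows linearly with T_p.
    return float(T_p)

def sum_rate(T_p, T_c, K, sinr_fn):
    # Prelog-weighted sum rate for K users that all see the same SINR.
    return (1.0 - T_p / T_c) * K * math.log2(1.0 + sinr_fn(T_p))

def pick_training_length(tau, P, T_c, K, sinr_fn):
    # Below the threshold the training length equals the channel DoF.
    if P <= tau:
        return P
    # Above it, T_p saturates: compare the two integers bracketing tau.
    lo = max(1, math.floor(tau))
    hi = min(P, math.ceil(tau))
    return max((lo, hi), key=lambda t: sum_rate(t, T_c, K, sinr_fn))
```

With an assumed τ of 23.2, T c = 100 and K = 10, calling `pick_training_length(23.2, 10, 100, 10, toy_sinr)` returns the DoF P = 10, whereas any P above τ returns one of the two integers adjacent to τ.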
The achievable sum rate with DL channel estimation and BF precoding can also be upper bounded using Propositions 3 & 4. Specifically, given K and T c , the achievable sum rates for BF precoding with overlapping and non-overlapping AoA supports and ρ d = ρ p are upper bounded by (41) and (42), respectively. Since τ does not depend on P or N , the rate saturates at a constant level below (41), (42) when the DoF exceeds (38), (40), i.e. P > τ . This also means that the asymptotic sum rate of the P-DoF channel with BF precoding, as a function of N , is independent of the ratio P/N ∈ (0, 1], although the more rank deficient the correlation matrix is, the more BS antennas are needed to approach (41), (42). This behavior is in contrast to BF precoding with UL training, where the SINR, and hence the sum rate, grows without bound as a function of P when K and T c are fixed. Importantly, the numerical results in Section VI show that the optimum pilot sequence length obtained from Proposition 4 for the BF precoder based on the P-DoF channel model is close to that obtained with the practical OR channel model. This observation means that near optimal solutions can be readily found without resorting to computationally intensive optimization techniques.

C. CLOSED-FORM SUM RATE ANALYSIS FOR THE RZF PRECODER IN FDD SYSTEMS
This subsection provides an analytical closed-form expression for the achievable sum rate when the BS employs RZF precoding and the P-DoF channel model, as defined in (28), is used. Though the derivation is protracted, it is algebraically straightforward; the details are therefore omitted for brevity. The following propositions summarize the key results.

Proposition 5: An asymptotically tight approximation for the SINR for RZF precoding based on an analytical P-DoF channel model with overlapping AoA supports is given in terms of auxiliary variables δ, ξ and μ ∈ R by (43),
where δ is a closed-form solution to the fixed-point equation in (16)-(17) when R k = R and k = ∀k = 1, . . . , K and is given by (44), where Z = Pζ 1 + 1/ρ p with training power scaled by a factor of K (i.e. ρ p K ). Parameters ξ and μ are simplified versions of (23) and (24) with overlapping AoA supports, which are obtained from the expressions given in (45) and (46), with Q = K + Z + δZ .

Proposition 6: An asymptotically tight approximation for the SINR for RZF precoding based on the P-DoF channel model with non-overlapping AoA supports is given in terms of auxiliary variables δ, ξ and μ ∈ R by (47).
Parameter δ is a closed-form solution to the fixed-point equations (16)-(17) when k is of the form (32) and is given by (48), where Z = Pζ 1 + P K N ρ p , while the variables ξ and μ are simplified forms of (23) and (24), respectively, given by (49) and (50), where we have denoted Q = 1 + Z + δZ for notational convenience. From Propositions 5 & 6, an approximation for the achievable rate for RZF precoding can be readily calculated using (13) when the relevant system parameters P, N , K , ζ , and ρ d are known. Due to the complexity of the SINR expression, obtaining an analytical solution for the optimum training sequence length, as in the case of BF precoding, remains very challenging. Nonetheless, the SINR expressions for the RZF precoder in (43) and (47) are now given in simplified closed form, which makes numerical evaluation of the achievable sum rate straightforward. As such, the optimization problem in (13) is feasible even for a brute-force search. The numerical experiments also show that the BF and RZF precoders have very similar optimal training sequence lengths and, thus, T * p can also be reliably chosen for the RZF precoder using Proposition 4.
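The brute-force search mentioned above can be sketched as follows. Because the exact form of (13) and the closed-form SINR expressions (43) and (47) are not reproduced in this text, the objective below is the prelog-weighted sum rate quoted in Section VI, and `sinr_fn(k, T_p)` is a hypothetical callback that a reader would replace with their own evaluation of the per-user SINR. The function name and the search range 1 ≤ T p ≤ min(P, T c − 1) are likewise assumptions.

```python
import math

def brute_force_training_length(sinr_fn, K, T_c, P):
    """Exhaustively search the integer training length maximising the sum rate.

    sinr_fn(k, T_p) is a user-supplied (hypothetical) per-user SINR model.
    """
    best_T_p, best_rate = None, float("-inf")
    # T_p cannot exceed the channel DoF and must leave room for data symbols.
    for T_p in range(1, min(P, T_c - 1) + 1):
        prelog = 1.0 - T_p / T_c
        rate = prelog * sum(math.log2(1.0 + sinr_fn(k, T_p)) for k in range(K))
        if rate > best_rate:
            best_T_p, best_rate = T_p, rate
    return best_T_p, best_rate
```

The cost of this search is at most min(P, T c − 1) evaluations of the SINR model, which is why a closed-form SINR makes it practical.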
In the following section, numerical results from analysis and simulation are presented for the BF and RZF precoders when the P-DoF and OR channel models are used. The salient system parameters explored are P, N , T c , SNR and K . For the P-DoF channel model, strong and weak channel correlation are indicated by the ratios P/N = 0.1 and P/N = 1, respectively. The parameterisation of the OR channel model follows [25], [55] Table 1.

VI. NUMERICAL RESULTS AND DISCUSSION
This section presents several simulation and theoretical results that characterize the system performance of the BF and RZF precoders in correlated and uncorrelated channels. The impact on the achievable sum rate of increasing the number of BS antennas while keeping the coherence time fixed is investigated. A comparison between the sum rate with UL channel estimation, as used in a conventional TDD massive MIMO system, and with DL channel estimation in an FDD system is also provided. Results that characterize the achievable sum rate performance of the proposed superposition sequence design are presented and compared with the state-of-the-art sequence designs. Furthermore, the computational complexity of the proposed superposition training design is analysed and compared with that of the state-of-the-art sequence designs. Fig. 2 and Fig. 3 show plots of the achievable sum rate and the optimum training sequence length T * p , respectively, versus the number of BS antennas N , comparing precoder performance in correlated and uncorrelated channels when the P-DoF channel model is used with T c = 100 symbols, SNR = 10 dB and K = 10 users. Curves for the BF and RZF precoders are plotted for three computational methods as follows: numerical (BF & RZF) based on equations (13), (34), (33), (47), (43); analytical (BF only) based on equations (36), (35) using the pilot sequence lengths chosen according to Propositions 4 & 3, respectively; and simulated (BF & RZF) based on equation (11). These computational methods allow the theoretical and simulated performances to be cross-validated, and they show excellent agreement throughout.
In Fig. 2, the achievable sum rate of the BF precoder for the uncorrelated channel (i.e. P/N = 1) increases steeply with N before saturating at about 14 bit/s/Hz for N > 30. In contrast, the sum rate for the BF precoder in the correlated channel with P/N = 0.1 increases more gradually before saturating at ∼35 bit/s/Hz for N > 200. The saturation of the sum rate for the BF precoder, regardless of the level of correlation in the channel, follows the behavior predicted by (42) & (41) for correlated and uncorrelated channels, respectively. For the RZF precoder in the uncorrelated channel, the sum rate again increases rapidly, reaching a peak value of ∼41 bit/s/Hz at N = 29. For N > 29, the sum rate decreases slowly and monotonically, plateauing at ∼21 bit/s/Hz for N = 500. However, the sum rate for the RZF precoder in the correlated channel with P/N = 0.1 increases more rapidly before reaching a larger peak value of ∼75 bit/s/Hz at N = 150. For N > 150, the sum rate decreases slightly and monotonically, reaching ∼67 bit/s/Hz at N = 500. As the number of BS antennas N increases, the sum rate decreases due to the residual interference caused by imperfect channel estimation. In particular, as N gets large, the residual interference increases because the capability of the RZF precoder to cancel the interference decreases. The results show that the BF and RZF precoders achieve a significant improvement in the maximum achievable sum rate when the channel is strongly correlated. In Fig. 3, for the uncorrelated channel with P/N = 1, the optimum training sequence length T * p initially increases linearly with N for both the BF and RZF precoders. This region of the plots corresponds to P < τ in (39). Conversely, P > τ defines the region where T * p saturates.
For the BF precoder, T * p saturates at 34 symbols when N = 34, whereas for the RZF precoder T * p continues to increase gradually before saturating at 40 symbols when N = 400. For the correlated channel, where P/N = 0.1, a similar linear characteristic is observed for N up to 230 for the BF precoder and 200 for the RZF precoder. Beyond these regions, T * p saturates at 23 symbols for the BF precoder, while T * p continues to increase slightly for the RZF precoder. The results in Fig. 2 & Fig. 3 confirm that maximizing the sum rate leads to a feasible optimum training sequence length when DL channel estimation is used in an FDD massive MIMO system. Importantly, the results show that a feasible sum rate can be realized even with uncorrelated channels, i.e., P/N = 1. This is because the achievable sum rate is obtained by optimizing the training sequence length through the maximization of $(1 - T_p/T_c)\sum_{k=1}^{K}\log_2(1 + \mathrm{SINR}_k)$, instead of only minimizing the mean square error of the channel estimate, as typically considered in conventional analyses of FDD and TDD systems. Furthermore, the numerical, analytical and simulated results are in excellent agreement, which underpins the contributions of this research.

B. COMPARING SYSTEM PERFORMANCE OF THE PROPOSED TRAINING SEQUENCE DESIGN AND THE STATE-OF-THE-ART DESIGNS
Having demonstrated the feasibility of a DL training sequence based on superposition, we compare the achievable sum rate performance of the superposition sequence design with that of the SMSE/SCMI designs. We compare the system performances based on both the P-DoF and OR channel models. The results in Fig. 4 were obtained for the correlated P-DoF channel model with P/N = 0.1, whereas the other salient system parameters remain unchanged at T c = 100 symbols, SNR = 10 dB and K = 10 users. Note that the curves for the superposition sequence design correspond to those already presented in Fig. 2. Fig. 4 demonstrates that, for both types of precoders, all the training sequence designs exhibit essentially the same sum rate performance. For example, with the BF precoder the rates saturate at 35 bit/s/Hz, whereas with the RZF precoder the rates peak at ∼75 bit/s/Hz for N = 150. The dotted lines for SMSE and the dash-dotted lines for SCMI are indistinguishable from the solid line of the proposed superposition sequence design for both the BF and RZF precoders. Importantly, Fig. 4 also shows that the analytical closed-form solution for the optimum pilot sequence length derived in Proposition 4 for the DL BF precoder with a correlated channel can be reliably used for the DL RZF precoder (see green markers). Though not plotted here, this observation remains valid for the case of uncorrelated channels, i.e., P/N = 1, where the optimum pilot sequence length of the BF precoder in Proposition 3 can be effectively used to predict the rate performance of the RZF precoder for all the training sequence designs.

2) SYSTEM PERFORMANCE FOR THE OR CHANNEL MODEL
So far, the results presented are based on the analytical P-DoF channel model. Being analytical, this model facilitates the simulation, numerical calculation and theoretical analysis of the system performance, giving excellent agreement between the three types of results. It also allows the channel correlation factor to be straightforwardly set in computations. In order to validate the applicability of the P-DoF channel model, this section considers the application of an alternative channel model called the OR scattering channel model [30], which is a more practical channel model frequently encountered in the open literature on MIMO evaluation. The OR scattering channel model represents an environment where all scatterers are located on a ring around the UT and there is no local scattering around the BS.
The geometry of each user's channel covariance matrix R k in the OR model is determined by the angular spread ω, the angle of arrival θ k , and the normalized antenna spacing D in wavelengths. Specifically, the (m, n)th element of R k is given in Toeplitz form [25], [30] by (51), where the AoAs are distributed over the range [−60°, 60°], since a 120° sector is considered in this paper. The integration in (51) is computed numerically, whereas the DL instantaneous channel realization is given by $h_k = R_k^{1/2}\tilde{h}_k$, where the elements of $\tilde{h}_k \in \mathbb{C}^N$ are independent and identically distributed with zero mean and unit variance [25]. Due to the different scattering geometries associated with each user's position in a geographic area, the angular support of each user's channel appears random. The randomness in the user locations captures the fact that the angular supports of the users may partially overlap. In addition, when the BS antennas are closely spaced and the amount of scattering around the UT is limited, as indicated by both D and ω being small, some of the eigenvalues of R k are close to zero, making R k effectively rank deficient. In contrast to the P-DoF channel model, the non-zero eigenvalues in the OR channel model are not usually equal.
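As an illustration of the numerical integration and channel generation described above, the following sketch builds a one-ring covariance matrix and draws one channel realization. Equation (51) is not reproduced in this text, so the code assumes the commonly used one-ring form with AoAs uniformly distributed over [θ k − ω, θ k + ω] and array response exp(j2πD m sin α); the sign convention, the midpoint-rule integration, the grid size and all function names are assumptions.

```python
import numpy as np

def one_ring_covariance(N, theta_deg, omega_deg, D, n_grid=2000):
    """Toeplitz covariance under an assumed uniform-AoA one-ring model."""
    theta, omega = np.deg2rad(theta_deg), np.deg2rad(omega_deg)
    # Midpoint rule over the angular support [theta - omega, theta + omega].
    step = 2.0 * omega / n_grid
    alpha = theta - omega + (np.arange(n_grid) + 0.5) * step
    lags = np.arange(N)
    # c[d] approximates the average of exp(j 2 pi D d sin(alpha)) over alpha.
    c = np.exp(2j * np.pi * D * np.outer(lags, np.sin(alpha))).mean(axis=1)
    diff = lags[:, None] - lags[None, :]
    # Hermitian Toeplitz structure: R[m, n] depends only on m - n.
    return np.where(diff >= 0, c[np.abs(diff)], np.conj(c[np.abs(diff)]))

def draw_channel(R, rng):
    """One realization h = R^{1/2} h_tilde with i.i.d. unit-variance entries."""
    w, V = np.linalg.eigh(R)
    R_sqrt = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    N = R.shape[0]
    h_tilde = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    return R_sqrt @ h_tilde
```

For small ω and D (for example ω = 5° and D = 1/2), most eigenvalues of the resulting matrix are close to zero, which is the effective rank deficiency noted above.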
An approximation of the actual rank r k ∈ Z + for large but finite N , which captures the effective non-zero eigenvalues of the channel covariance matrix in (51), is given in [29] by (52), where β k = min{1, f (D, ω, θ k )} is the asymptotic normalized rank of the channel covariance matrix. While (52) can accurately predict the rank of the OR channel model, which is related to the number of non-zero eigenvalues of R k , this number may differ between users due to different θ k . Hence, the maximum number of effective non-zero eigenvalues across all of the users is selected, i.e. r = max k=1, ..., K {r k }, to ensure that all the relevant eigenvectors of each R k , corresponding to the largest eigenvalues over all the users, are accounted for. Once the rank of the OR channel model is obtained, the training sequence length can be reliably selected based on the results derived in Section V-B for the analytical P-DoF channel model. Fig. 5 and Fig. 6 plot the achievable sum rate versus the number of BS antennas N , comparing the proposed superposition training sequence design (10) with the SMSE/SCMI designs based on the OR scattering channel model for the BF and RZF precoders, respectively. Fig. 5 and Fig. 6 are obtained with T c = 100 symbols, SNR = 10 dB and K = 10 users. The solid lines depict numerical analysis based on random matrix theory, while the colored markers denote simulation. Fig. 5 shows that almost the same rate performance is obtained with all the training sequence designs for the BF precoder. In Fig. 6, a marginal loss in rate performance is observed with the proposed superposition training design in comparison with the state-of-the-art SMSE/SCMI designs.
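The selection r = max k {r k } can be sketched numerically as below. This is an empirical stand-in, not the analytical approximation (52), which is not reproduced here: each r k is estimated by counting eigenvalues above a relative threshold, and both the threshold value and the function names are assumptions.

```python
import numpy as np

def effective_rank(R, rel_tol=1e-6):
    """Count eigenvalues of a Hermitian matrix above rel_tol times the largest."""
    w = np.linalg.eigvalsh(R)
    return int(np.sum(w > rel_tol * w.max()))

def common_rank(covariances, rel_tol=1e-6):
    """Largest effective rank across users, so no user's eigenvectors are dropped."""
    return max(effective_rank(R, rel_tol) for R in covariances)
```

Taking the maximum rather than, say, the mean is conservative: it may oversize the training length for users with a narrower angular support, but it avoids truncating the eigen-directions of the user with the largest rank.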
As expected, the results in Fig. 6 show that the RZF precoder achieves a greater sum rate than the BF precoder in Fig. 5, in both the correlated and the uncorrelated channels. Training sequences based on the superposition design achieve a slightly lower sum rate than either the SMSE or SCMI designs when RZF precoding is used, which is attributed to the nonoptimal cancellation of interference in the superposition design. Also, the results in Fig. 5 and Fig. 6 demonstrate that large improvements in the sum rate are obtained for both the BF and RZF precoders when the channels are correlated, i.e. ω = 5°, D = 1/2. The results also confirm that excellent agreement between the numerical and simulated results is obtained with the more realistic OR model. Fig. 7 plots the achievable sum rate versus N for the BF and RZF precoders, showing the impact of applying the optimum pilot sequence length obtained from Proposition 4 for the BF precoder with the P-DoF channel model to the OR scattering channel model. The curves for the BF and RZF precoders in Fig. 7 are obtained numerically based on equations (13), (14), (15), which were developed using the method of RMT. The parameter values for the OR channel model are ω = 5° and D = 1/2. The other salient system parameters remain unchanged at T c = 100 symbols, SNR = 10 dB and K = 10. The solid lines depict the superposition training sequence design (10), and the dotted and dash-dotted lines represent the SMSE and SCMI designs, respectively, while the colored markers denote the optimum pilot sequence length obtained from Proposition 4 for the BF precoder using the P-DoF channel model. The result in Fig. 7 confirms that the optimum training sequence length for the BF precoder with the P-DoF channel model also provides close agreement in the more realistic OR channel model for both the BF and RZF precoders.
In particular, it is possible to apply Proposition 4 to the three training sequence designs in order to obviate the need to search for an optimum training sequence length for all training methods in the more realistic OR channel model. Though not plotted here, similar behaviour is observed when the channels are relatively uncorrelated, i.e., ω = 20° and D = 1, where the optimum pilot length obtained from Proposition 3 for the BF precoder based on the P-DoF channel model can also be used with high accuracy to predict the sum rate performance in the more realistic OR channel model for both the BF and RZF precoders. Overall, Fig. 7 demonstrates the effectiveness of using the P-DoF channel model to provide a practical system design approach, which accurately predicts the performance in more realistic channel models.

C. COMPARING SYSTEM PERFORMANCE FOR DOWNLINK FDD AND UPLINK TDD CHANNEL ESTIMATION
Having demonstrated the performance of DL training and channel estimation in an FDD multiuser massive MIMO system, it is pertinent to compare the achievable sum rate with that obtained for UL channel estimation as conventionally used in a TDD system, where perfect uplink-downlink channel reciprocity is assumed. In a TDD system, a superposition of orthogonal UL training sequences is transmitted by the users to the BS, and the BS estimates the UL channel using an MMSE channel estimator. For reciprocal channels, the number of pilot symbols required for UL channel estimation is proportional to the number of users K , which reflects the DoF on the UL. Fig. 8 plots the achievable sum rate versus the number of BS antennas N , comparing the performance of the proposed superposition training design in (10) for DL channel estimation in an FDD system with UL channel estimation in a TDD system, both in the OR channel model. In particular, Fig. 8 is obtained with T c = 100 symbols, SNR = 10 dB and K = 10 users, where the parameter values for the OR channel model are an angular spread of ω = 5° and a normalized antenna spacing of D = 1/2. These parameter values imply relatively strong correlation. The results in Fig. 8 show that, for the system parameters considered and over practical BS array sizes of N < 250 antennas, the DL and UL sum rate performances are comparable in the more realistic OR channel model with correlated channels for both the BF and RZF precoders. Specifically, Fig. 8 demonstrates that DL channel estimation with the proposed superposition training design is effective in strongly correlated channels.

D. COMPUTATIONAL COMPLEXITY ANALYSIS AND PERFORMANCE EVALUATION OF DIFFERENT TRAINING DESIGNS IN FDD
In this subsection, we present the computational complexity analysis of the superposition training sequence design and the SMSE/SCMI sequence designs. We compare the overall computational complexity, which is obtained by multiplying the number of iterations each algorithm needs to converge by the number of complex floating point operations (flops) involved per iteration. The superposition training design requires only one iteration. For a fair comparison, we use the same error tolerance of 0.001 for the SMSE and SCMI sequence designs. Table 2 summarizes the complexity analysis in flops per iteration [56] for the three training sequence designs. Parameters t d and t h represent the number of iterations required to optimize the step size in the SMSE algorithm. The variable X is given as $X = T_p^3 + T_p^2 + N^2(6T_p + 2) + 3NT_p(2T_p - 1) - 2$. Also, r denotes the maximum number of effective non-zero eigenvalues (i.e. the rank) of the channel covariance matrices across the users. The steps below explain how the complexity of the proposed superposition sequence design in Table 2 was derived. In particular, the analysis is obtained by counting the number of multiplications and additions [56] in each step of the superposition pilot scheme. Appendix B explains how the complexity of the state-of-the-art iterative algorithms in Table 2 was obtained.
• Obtaining the EVD of an N × N rank deficient matrix in (9) for K users needs $KN^2 r$ flops.
• Obtaining the term $\sum_{k=1}^{K} U_k$ in (10), where each U k is an N × T p matrix, requires (K − 1)(NT p ) flops.
• The scalar multiplication of an N × T p matrix needs NT p flops.
• Combining all the flops calculated above leads to the complexity of the superposition pilot scheme in Table 2.
Fig. 9 plots the computational complexity versus the number of BS antennas N , comparing the superposition training sequence design with the SMSE and SCMI training sequence designs. The results in Fig. 9 were obtained for the OR model with ω = 5°, D = 1/2 using the pilot sequence length for the BF precoder. The other salient system parameters remain unchanged at T c = 100 symbols, SNR = 10 dB and K = 10 users. The results in Fig. 9 indicate that the complexity of the superposition sequence design remains significantly lower than that of the sequence designs based on iterative algorithms. In particular, the results demonstrate that a reduction of more than four orders of magnitude in computational complexity is obtained using the proposed superposition approach. This signifies the feasibility of the superposition training sequence design for practical implementations compared with state-of-the-art iterative algorithms, and it is a significant outcome of the research. Table 3 shows the relative increase in complexity of SCMI and SMSE over superposition for representative antenna arrays of N = 50, 100 and 150 elements. The table highlights the considerable reduction in complexity achieved by superposition.
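The three steps counted above can be sketched in code, together with the resulting flop total. The sketch is built only from the bullet list: it assumes that each U k holds the T p dominant eigenvectors of R k and that the scalar factor enforces a total training energy of `power * T_p`; the exact scaling in (10) is not reproduced here, so that normalization and all function names are hypothetical. The helper `x_term` simply evaluates the variable X from the text; how X enters the entries of Table 2 is not shown.

```python
import numpy as np

def superposition_training(covariances, T_p, power=1.0):
    """One-shot design: sum the dominant eigenvectors per user, then scale."""
    N = covariances[0].shape[0]
    S = np.zeros((N, T_p), dtype=complex)
    for R in covariances:
        _, V = np.linalg.eigh(R)          # eigenvalues in ascending order
        S += V[:, ::-1][:, :T_p]          # T_p dominant eigenvectors (N x T_p)
    # Assumed normalization: squared Frobenius norm equals power * T_p.
    return np.sqrt(power * T_p) / np.linalg.norm(S, "fro") * S

def superposition_flops(N, K, T_p, r):
    """Total flops of the three steps listed in the text."""
    evd = K * N**2 * r                    # EVD of K rank-deficient N x N matrices
    summation = (K - 1) * N * T_p         # adding K matrices of size N x T_p
    scaling = N * T_p                     # scalar times an N x T_p matrix
    return evd + summation + scaling

def x_term(N, T_p):
    """Variable X as defined in the text."""
    return T_p**3 + T_p**2 + N**2 * (6 * T_p + 2) + 3 * N * T_p * (2 * T_p - 1) - 2
```

The total simplifies to $KN^2 r + KNT_p$, so for r and T p much smaller than N the EVD term dominates and no iteration count multiplies it.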

VII. CONCLUSIONS
In this paper, the principle of linear superposition of sequences constructed from the users' channel correlation matrices is proposed to provide a feasible solution for DL channel estimation in an FDD multiuser massive MIMO communication system without resorting to computationally intensive iterative algorithms. Based on the superposition training structure, we have provided a novel analytical closed-form solution for the optimum training sequence length T * p that maximizes the DL achievable sum rate $(1 - T_p/T_c)\sum_{k=1}^{K}\log_2(1 + \mathrm{SINR}_k)$, defined by (11), for the BF precoder. Additionally, an analytical approximation for the achievable sum rate of the BF and RZF precoders has been provided using asymptotic random matrix theory and the P-DoF channel model, which avoids extensive Monte Carlo simulations and allows an analytical solution to the optimization problem to be obtained with low complexity. The numerical results showed that these approximations are accurate for practical, finite system parameters N and K . Results characterizing the system performance of the BF and RZF precoders were presented for the analytical P-DoF and the OR channel models. The numerical results showed that the proposed training sequences offer sum rate performance comparable to the state-of-the-art sequences while reducing the computational complexity substantially. Furthermore, a comparison between correlated channels with K independent channel covariance matrices and uncorrelated channels with identical channel covariance matrices R k = I N ∀k = 1, . . . , K was also provided. The analysis of the results has shown that the diversity of spatial correlations between multiple users significantly enhances the achievable sum rate of an FDD massive MIMO system using DL channel estimation.
This paper also provided comparisons between the achievable sum rates with UL CSI estimation, as conventionally used in a TDD system, and with DL CSI estimation in an FDD system. The results showed that for practical BS array sizes of N < 250 antennas and limited coherence time, the sum rate of an FDD system using DL channel estimation is comparable to that of a TDD system in relatively strongly correlated channels. Our findings are supported by a rigorous mathematical analysis that agrees closely with our simulated results, which underpins the key contribution of this research. Importantly, using the analytical framework developed in this paper, we found that the T * p that is analytically optimized for the BF precoder is sufficient to predict the achievable sum rate performance of the RZF precoder, which remains near optimal. This observation also remains valid in the more realistic OR channel model. This result leads to large reductions in the computational complexity of the proposed approach. Overall, the proposed design paradigm opens up the possibility of FDD massive MIMO systems operating in a general scenario of single-stage precoding and distinct spatial correlations with limited coherence time. Future work will investigate the effects of other system parameters and configurations, including removing the assumption of an error-free and delay-free feedback channel, taking into account multi-cell multiuser operation, and considering a uniform planar array of antenna elements. The authors are also interested in applying the paper's findings to millimetre wave bands.

APPENDIX A PROOF OF PROPOSITION 3
To find the optimum training sequence length for BF precoding in a P-DoF channel at high SNR, as given in Proposition 3, we first relax the requirement for the training sequence length