Optimizing Pilot Spacing in MU-MIMO Systems Operating Over Aging Channels

In the uplink of multiuser multiple input multiple output (MU-MIMO) systems operating over aging channels, pilot spacing is crucial for acquiring channel state information and achieving high signal-to-interference-plus-noise ratio (SINR). Somewhat surprisingly, very few works examine the impact of pilot spacing on the correlation structure of subsequent channel estimates and the resulting quality of channel state information considering channel aging. In this paper, we consider a fast-fading environment characterized by its exponentially decaying autocorrelation function, and model pilot spacing as a sampling problem to capture the inherent trade-off between the quality of channel state information and the number of symbols available for information carrying data symbols. We first establish a quasi-closed form for the achievable deterministic equivalent SINR when the channel estimation algorithm utilizes multiple pilot signals. Next, we establish upper bounds on the achievable SINR and spectral efficiency, as a function of pilot spacing, which helps to find the optimum pilot spacing within a limited search space. Our key insight is that to maximize the achievable SINR and the spectral efficiency of MU-MIMO systems, proper pilot spacing must be applied to control the impact of the aging channel and to tune the trade-off between pilot and data symbols.


I. INTRODUCTION
In wireless communications, pilot symbol-assisted channel estimation and prediction are used to achieve reliable coherent reception, and thereby to provide a variety of high quality services in a spectrum efficient manner.In most practical systems, the transmitter and receiver nodes acquire and predict channel state information by employing predefined pilot sequences during the training phase, after which information symbols can be appropriately modulated and precoded at the transmitter and estimated at the receiver.Since the elapsed time between pilot transmissions and the transmit power level of pilot symbols have a large impact on the quality of channel estimation, a large number of papers investigated the optimal spacing and power control of pilot signals in both single and multiple antenna systems [1]- [12].
Specifically in the uplink of multiuser multiple input multiple output (MU-MIMO) systems, several papers proposed pilot-based channel estimation and receiver algorithms assuming that the complex vector channel undergoes block fading, meaning that the channel is constant between two subsequent channel estimation instances [13]- [16].In the block fading model, the evolution of the channel is memoryless in the sense that each channel realization is drawn independently of previous channel instances from some characteristic distribution.While the block fading model is useful for obtaining analytical expressions for the achievable signal-to-interference-plus-noise ratio (SINR) and capacity [15], [17], it fails to capture the correlation between subsequent channel realizations and the aging of the channel between estimation instances [6], [7], [11], [12].
Due to the importance of capturing the evolution of the wireless channel in time, several papers developed time-varying channel models, as an alternative to block fading models, whose states are advantageously estimated and predicted by means of suitably spaced pilot signals.In particular, a large number of related works assume that the wireless channel can be represented as an autoregressive (AR) process whose states are estimated and predicted using Kalman filters, which exploit the correlation between subsequent channel realizations [3], [4], [6], [10], [12].These papers assume that the coefficients of the related AR process are known, and the current and future states of the process (and thereby of the wireless channel) can be well estimated.Other important related works concentrate on estimating the coefficients of AR processes based on suitable pilot-based observations and measurements [18]- [20].In our recent work [12], it was shown that when an AR process is a good model of the wireless channel and the AR coefficients are well estimated, not only the channel estimation can exploit the memoryful property of the channel, but also a new MU-MIMO receiver can be designed, which minimizes the mean squared error (MSE) of the received data symbols by exploiting the correlation between subsequent channel states.It is important to realize that the above references build on discrete time AR models, in which the state transition matrix is an input of the model and can be estimated by some suitable system identification technique, such as the one proposed in [20].However, these papers do not ask the question of how often the channel state of a continuous time channel should be observed by suitably spaced pilot signals to realize a certain state transition matrix in the AR model of the channel.
Specifically, a key characteristic of a continuous time Rayleigh fading environment is that the autocorrelation func-arXiv:2204.00213v1[eess.SP] 1 Apr 2022 tion of the associated stochastic process is a zeroth-order Bessel function, which must be properly modelled [21], [22].This requirement is problematic when developing discrete-time AR models, since it is well-known that Rayleigh fading cannot be perfectly modelled with any finite order AR process (since the autocorrelation function of discrete time AR processes does not follow a Bessel function), although the statistics of AR process can approximate those of Rayleigh fading [23], [24].
Recognizing the importance of modeling fast fading, including Rayleigh fading, channels with proper autocorrelation function as a basis for pilot spacing optimization, papers [25], [26] use a continuous time process as a representation of the wireless channel, and address the problem of pilot spacing as a sampling problem.According to this approach, pilot placement can be considered as a sampling problem of the fading variations, and the quality of the channel estimate is determined by the density and accuracy of channel sampling [26].However, these papers consider single input single output (SISO) systems, do not deal with the problem of pilot and data power control, and are not applicable to MU-MIMO systems employing a minimum mean squared error (MMSE) receiver, which was proposed in, for example, [12].On the other hand, paper [6] analyzes the impact of channel aging on the performance of MIMO systems, without investigating the interplay between pilot spacing and the resulting state transition matrix of the AR model of the fast fading channel.The most important related works, their assumptions and key performance metrics are listed and compared with those of the current paper in Table I.
In this paper, we are interested in determining the average SINR in the uplink of MU-MIMO systems operating in fast fading as a function of pilot spacing, pilot/data power allocation, number of antennas and spatially multiplexed users.Specifically, we ask the following two important questions, which are not answered by previous works: • What is the average SINR in a closed or quasi-closed form in the uplink of MU-MIMO systems in fast fading in the presence of antenna correlation?How does the average SINR depend on pilot spacing and pilot/data power control?• What is the optimum pilot spacing and pilot/data power allocation as a function of the number of antennas and the Doppler frequency associated with the continuous time fast fading channel?In the light of the above discussion and questions, the main contributions of the present paper are as follows: • Proposition 1 derives the asymptotic deterministic equivalent SINR of any user in a MU-MIMO system in every data slot for fast fading channels that can be characterized by an associated AR process; • Theorem 1 and Proposition 2 establish an upper bound on the achievable SINR as a function of pilot spacing, which is instrumental for determining the optimum pilot spacing.
• Proposition 3, building on Proposition 2, provides and upper bound on the average achievable spectral efficiency, which is instrumental in limiting the search space for the optimal frame size as a function of the Doppler frequency.
In addition, we believe that the engineering insights drawn from the numerical studies are useful when designing pilot spacing, for example in the form of determining the number of reference signals in an uplink frame structure, for MU-MIMO systems.
Specifically, to answer the above questions, we proceed as follows.In the next section, we present our system model, which admits correlated wireless channels between any of the single-antenna mobile terminal and the receive antennas of the base station (BS).Next, Section III focuses on channel estimation, which is based on subsequent pilot-based measurements and an MMSE-interpolation for the channel states in between estimation instances.Section IV proposes an algorithm to determine the average SINR.Section V studies the impact of pilot spacing and power control on the achievable SINR and the spectral efficiency (SE) of all users in the system.That section investigates the impact of pilot spacing on the achievable SINR and establishes an upper bound on this SINR.We show that this upper bound is monotonically decreasing as the function of pilot spacing.This property is very useful, because it enables to limit the search space of the possible pilot spacings when looking for the optimum pilot spacing in Section VI.That section also considers the special case when the channel coefficients associated with the different receive antennas are uncorrelated and identically distributed.It turns out that in this special case a simplified SINR expression can be derived.Section VII presents numerical results and discusses engineering insights.Finally, Section VIII draws conclusions.

A. Uplink Pilot Signal Model
By extending the single antenna channel model of [25], each transmitting mobile station (MS) uses a single time slot to send F pilot symbols, followed by ∆ time slots, each of which containing F data symbols according to Figure 1.Each symbol is transmitted within a coherent time slot of duration T .Thus, the total frame duration is (1 + ∆)T , such that each frame consists of 1 pilot and ∆ data time slots, which we will index with i = 1 . . .∆. User-k transmits each of the F pilot symbols with transmit power P p,k , and each data symbol in slot-i with transmit power P k (i), k = 1 . . .K. To simplify notation, in the sequel we tag User-1, and will drop index k = 1 when referring to the tagged user.
Assuming that the coherence bandwidth accommodates at least F pilot symbols, this system allows to create F orthogonal pilot sequences.To facilitate spatial multiplexing and channel state information at the receiver (CSIR) acquisition at the BS, the MSs use orthogonal complex sequences, such as shifted Zadoff-Chu sequences of length τ p = F , which we denote as: ∆: number of data slots T : symbol duration F frequencies t ĥ(0) ĥ((∆+1)T ) h(t) ĥ(t) Figure 1.Pilot (P) and data (D) symbols in the time-frequency domains of the system in the (0, (∆ + 1)T ) interval.The solid line above the time-frequency resource grid represents the continuous time complex channel h(t), while the dashed line represents the MMSE channel estimate ĥ(t).Notice that in each time slot of length T all symbols are either pilot or data symbols.
whose elements satisfy |s i | 2 = 1.Under this assumption, the system can spatially multiplex K ≤ F MSs. Focusing on the received pilot signal from the tagged user at the BS, the received pilot signal takes the form of [12]: where h(t) ∈ C Nr×1 ∼ CN (0, C), that is, h(t) is a complex normal distributed column vector with mean vector 0 and covariance matrix C ∈ C Nr×Nr .Furthermore, α denotes large scale fading, P p denotes the pilot power of the tagged user, and N(t) ∈ C Nr×τp is the additive white Gaussian noise (AWGN) with element-wise variance σ 2 p .It will be convenient to introduce Ỹp (t) by stacking the columns of Y p (t) as: where vec is the column stacking vector operator, Ỹp (t), Ñ(t) ∈ C τpNr×1 and S s ⊗ I Nr ∈ C τpNr×Nr is such that S H S = τ p I Nr , where I Nr is the identity matrix of size N r .

B. Channel Model
In (2), the channel h(t) evolves continuously according to a multivariate complex stochastic process with stationary covariance matrix C. That is, for symbol duration T , the channel (h(t)) evolves according to the following AR process: where the transition matrix of the AR process is denoted by A. This AR model has been commonly used to approximate Rayleigh fading channels in e.g.[28].Equation ( 4) implies that the autocorrelation function of the channel process is: Consequently, the autocorrelation function of the fast fading channel (h(t)) is modelled as: where matrix Q describes the correlation decay, such that: Similarly, for user k, In each pilot slot, the BS utilizes MMSE channel estimation to obtain the channel estimate of each user, as it will be detailed in Section III.Without loss of generality, to simplify the notation, hereafter we assume that the time unit is T and iT = i.

C. Data Signal Model
When spatially multiplexing K MU-MIMO users, the received data signal at the BS at time t is [12]: where y(t) ∈ C Nr×1 ; and x k (t) denotes the transmitted data symbol of User-k at time t with transmit power P k .Furthermore, n d (t) ∼ CN 0, σ 2 d I Nr is the AWGN at the receiver.

III. CHANNEL ESTIMATION
In this section, we are interested in calculating the MMSE estimation of the channel in each slot i, based on received pilot signals, as a function of the frame size corresponding to pilot spacing (see ∆ in Figure 1).Note that estimating the channel at the receiver can be based on multiple received pilot signals both before and after the actual data slot i.While using pilot signals that are received before data slot i requires to store the samples of the received pilot, using pilot signals that arrive after data slot i necessarily induces some delay in estimating the transmitted data symbol.In the numerical section, we will refer to specific channel estimation strategies as, for example, "1 before, 1 after" or "2 before, 1 after" depending on the number of utilized pilot signals received prior to or following data slot i for CSIR acquisition.In the sequel we use the specific case of "2 before, 1 after" to illustrate the operation of the MMSE channel estimation scheme, that is when the receiver uses the pilot signals Ỹp (−∆ − 1), Ỹp (0), and Ỹp (∆ + 1) for CSIR acquisition.We are also interested in determining the distribution of the resulting channel estimation error, whose covariance matrix, denoted by Z(∆, i), will play an important role in subsequently determining the deterministic equivalent of the SINR.

A. MMSE Channel Estimation and Channel Estimation Error
As illustrated in Figure 1, in each data slot i, the BS utilizes the MMSE estimates of the channel obtained in the neighboring pilot slots, for example at (−∆−1), 0 and (∆+1), using the respective received pilot signals according to (3), that is Ỹp (−∆−1) , Ỹp (0) and Ỹp (∆+1) , using the following lemma.
Lemma 1.The MMSE channel estimator approximates the autoregressive fast fading channel in time slot i based on the received pilots at (−∆ − 1), 0 and (∆ + 1) as where Proof.The MMSE channel estimator aims at minimizing the MSE between the channel estimate ĥMMSE (∆, i) = H (∆, i) Ŷp (∆) and the channel h(i), that is The solution of this quadratic optimization problem is Let Using h(∆), we have Since h(∆) and Ñ(∆) are independent and Therefore, for F(∆) and a(∆, i), we have The MMSE estimate of the channel is then expressed as: Next, we are interested in deriving the distribution of the estimated channel and the channel estimation error, since these will be important for understanding the impact of pilot spacing on the achievable SINR and spectral efficiency of the MU-MIMO system.To this end, the following two corollaries of Lemma 1 and (20) will be important in the sequel.
An immediate consequence of Corollary 1 is the following corollary regarding the covariance of the channel estimation error, as a function of pilot spacing.
Corollary 2. The channel estimation error in slot i, ĥMMSE (∆, i)−h(∆, i), is complex normal distributed with zero mean vector and covariance matrix given by: In the following section we will calculate the SINR of the received data symbols.For simplicity of notation, we use ĥMMSE (∆, i) = ĥ(∆, i), and introduce b(∆, i) α P (i) ĥ(∆, i)

B. Summary
This section derived the MMSE channel estimator (Lemma 1) that uses the received pilot signals both before and after a given data slot i and depends on the frame size ∆ (pilot spacing).As important corollaries of the channel estimation scheme, we established the distribution of both the estimated channel (Corollary 1) and the associated channel estimation error in each data slot i (Corollary 2), as functions of both the employed pilot spacing and pilot power.These results serve as a starting point for deriving the achievable SINR and spectral efficiency.

A. Instantaneous SINR
We start with recalling an important lemma from [29], which calculates the instantaneous SINR in an AR fast fading environment when the BS uses the MMSE estimation of the fading channel, and employs the optimal linear receiver: where J(∆, i) ∈ C Nr×Nr is defined as where When using the above receiver, which minimizes the MSE of the received data symbols in the presence of channel estimation errors, the following result from [29] will be useful in the sequel: Lemma 2 (See [29], Lemma 3).Assume that the receiver employs MMSE symbol estimation, that is it employs the optimal linear receiver G (∆, i) given in (24).Then the instantaneous SINR of the estimated data symbols of the tagged user, γ(∆, i) is given as: where For the AR fading case considered in this paper, based on the definitions of b(∆, i), J(∆, i) and J(∆, i), the instantaneous SINR of the tagged user is then expressed as: B. Slot-by-Slot Deterministic Equivalent of the SINR as a Function of Pilot Spacing ∆ We can now prove the following important proposition that gives the asymptotic deterministic equivalent of the instantaneous SINR in data slot i, γ(∆, i), when the number of antennas N r approaches infinity.This asymptotic equivalent SINR gives a good approximation of averaging the instantaneous SINR of the tagged user [15], [29], [30].
Proposition 1.The asymptotic deterministic equivalent SINR of the tagged user in data slot i can be calculated as: where T(∆, i) is defined as: and δ m (∆, i) are the solutions of the following system of K equations for ∀m = 1, . . ., K.
The above system of K equations gives the deterministic equivalent of the SINR of the tagged user, and a different set of K equations must be used for each user.
Proof.The b k (∆, i) vectors are independent for k = 1 . . .K, and the covariance matrix of b k (∆, i) is Φ k (∆, i) (c.f. ( 23)).We can then express the expected value of the SINR of the tagged user as follows: The proposition is established by invoking Theorem 1 in [30], which is applicable in multiuser systems and gives the value of the deterministic equivalent of γ(∆, i) implicitly using a system of K equations and noticing that γ(∆, i) = δ 1 (∆, i), since δ 1 (∆, i) = tr Φ(∆, i)T(∆, i) according to (31).

C. Summary
This section established the instantaneous slot-by-slot SINR of a tagged user (γ(i)) of a MU-MIMO system operating over a fast fading channels modelled as AR processes, by applying our previous result obtained for discrete-time AR channels reported in [12].Next, we invoked Theorem 1 in [30], to establish the deterministic equivalent SINR for each slot, as a function of the frame size (pilot spacing) ∆, see Proposition 1.These results serve as a basis for formulating the pilot spacing optimization problem over the frame size and pilot power as optimization variables.

V. PILOT SPACING AND POWER CONTROL
In this section, we study the impact of pilot spacing and power control on the achievable SINR and the SE of all users in the system.The asymptotic SE of the i-th data symbol of user k is where γk (∆, i) denotes the average SINR of user k when sending the i-th data symbol, and when ∆ data symbols are sent between every pair of pilot symbols.Consequently, the average SE of user k over the (∆ + 1) slot long frame is which can be optimized over ∆.More importantly, the aggregate average SE of the MU-MIMO system for the K users can be expressed as: A. An Upper Bound of the Deterministic Equivalent SINR and the SE Let us assume that Q k = q k I Nr , that is the channel vector h k (t) consists of independent AR processes in the spatial domain, implying that: where q k is a scalar, q * k denotes complex conjugation, and let qk Re(q k ) < 0.
Note that the exponential approximation of the autocorrelation function of the fast fading process expressed in (36) is related to the Doppler frequency of Rayleigh fading through: where J 0 (.) is the zeroth order Bessel function [31].Based on the exponential approximation of this Rayleigh fading process in (36), the Doppler frequency of the approximate model is obtained from 2πf D i = Re(q * k i), i.e. f D = 2π/q k .To optimize (35), we first find an upper bound of SE k (∆, i) via an upper bound of γk (∆, i).To simplify the notation, the following discussion refers to the tagged user, and later we utilize that the same relations hold for all users.We introduce the following upper bound of γ(∆, i): where Z (u) (∆, i) and Φ (u) (∆, i) are given by with η being a constant, Σ Theorem 1.If q < 0 and with a e 2q then γ(∆, i) ≤ γ(u) (∆, i).
Proof.We prove the theorem based on the following inequal-ities γ(∆, i) which are proved in consecutive lemmas.
Lemma 3. Let A, B and C be positive definite matrices and D be any matrix, such that A B (i.e.B − A is a positive semidefinite matrix), then Lemma 4. For q < 0 and η satisfying (42), the following relation holds Proof.The proof is in Appendix A.
Having prepared with Lemma 3 and Lemma 4, we can prove the (a), (b) and (c) inequalities in (43) by Lemma 5 ((a) part) and Lemma 6 ((b) and (c) parts) as follows.
Lemma 5.The deterministic equivalent SINR of the tagged user satisfies Proof.The proof is in Appendix B. Lemma 6.When the conditions of Theorem 1 hold, we have Proof.When the conditions of Theorem 1 hold, Lemma 4 implies that Φ(∆, i) Using the first relation and the Lemma 3 gives (50), while using the second relation and Lemma 3 gives (51).

B. Useful Properties of the Upper Bounds on the Deterministic Equivalent SINR and Overall System Spectral Efficiency
Theorem 1 is useful, because it establishes an upper bound, denoted by γ(u) (∆, i), of the deterministic equivalent of the SINR, γ(∆, i).
To use the γ(u) (∆, i) upper bound for limiting the search space for an optimal γ(∆, i) in Section VI, we need the following properties of the upper bound.
Proof.The proof is in Appendix C.
Similarly, the SINR of user k satisfies the inequality γk (∆, i) ≤ γ(u) k (∆, i) where γ(u) k (∆, i) is defined in a similar way as γ(u) Since our most important performance measure is the overall SE, we are interested in establishing a corresponding upper bound on the overall SE of the system.To this end, we introduce the related upper bound on the SE of user k: and bound the aggregate average SE of the MU-MIMO system (c.f. ( 35)).Notice that the denominator in SE (u) k is ∆ while the denominator in SE k is ∆ + 1.This will be necessary for the monotonicity property in Proposition 3. Proposition 3.
Proof.The proof is in Appendix D.

C. Summary
This section first established an upper bound on the deterministic equivalent SINR in Theorem 1. Next, Proposition 2 and Proposition 3 have stated some useful properties of this upper bound and a corresponding upper bound on the overall system spectral efficiency.Specifically, Proposition 3 suggests that the upper bound on the spectral efficiency of the system is monotonically decreasing in ∆ and tends to zero as ∆ approaches infinity.As we will see in the next section, this property can be exploited to limit the search space for finding the optimal ∆.

VI. A HEURISTIC ALGORITHM TO FIND THE OPTIMUM PILOT POWER AND FRAME SIZE (PILOT SPACING)
A. A Heuristic Algorithm for Finding the Optimal ∆ In this section we build on the property of the system-wide spectral efficiency, as stated by Proposition 3, to develop a heuristic algorithm to find the optimal ∆.While we cannot prove a convexity or non-convexity property of SE(∆), we can utilize the fact that SE(∆) ≤ SE (u) (∆) as follows.As Algorithm 1 scans through the possible values of ∆, it checks Algorithm 1: Optimum frame size algorithm using an SE upper bound Input: Q, C, Σ, α 2 , P tot 8 Calculate E k (∆, i) using ( 11) Calculate Z k (∆, i) using (22) 10 Calculate Φ k (∆, i) using ( 23) Calculate β k (∆, i) using (25) 12 Calculate γk (∆, i) using ( 29) Calculate SE k (∆, i) using ( 33) is one less than the currently examined ∆ (Line 17).As it will be exemplified in Figure 7 in the numerical section, the key is to notice that the SE upper bound determines the search space of the possible ∆ values, where the associated SE can possibly exceed the currently found highest SE.Specifically, the search space can be limited to (Line 18): where SE (u) −1 denotes the inverse function of SE (u) (.) and SE ∆ SE(∆) as calculated in (35).

B. The Case of Independent and Identical Channel Coefficients
In the special case where the elements of the vector h(i) are independent stochastically identical stochastic processes, the covariance matrices become real multiples of the identity matrix e(i) In this special case, calculating the deterministic equivalent of the SINR by Proposition 1 simplifies to solving a set of scalar equations as stated in the following corollary.
Corollary 3. In this special case, the deterministic equivalent of the SINR in slot i, γ(i), can be obtained as the solution of the scalar equation Proof.Since the matrices Φ k (i) and Z k (i) are constant multiple of identity matrices, (31) can then be rewritten as for k = 1, . . ., K. Using γ(i) = δ 1 (i) and comparing (62) for different values of k we get Substituting the rightmost expression of (63) into (62) with k = 1 and rearranging gives the corollary.

VII. NUMERICAL RESULTS
In this section, we consider a single cell of a MU-MIMO (K = 2) system with N r = 10 and N r = 100 receive antennas, in which the wireless channel between the served MS and the BS is Rayleigh fading according to (37), which we approximate with (36).
The MU-MIMO case with greater number of users (K > 2) gives similar results albeit with somewhat lower SINR values from the point of view of the tagged user.The BS estimates the state of the wireless channel based on the properly (i.e.∆×T ) spaced the pilot signals using MMSE channel estimation and interpolation according to Lemma 1, and uses MMSE symbol estimation employing the optimal linear receiver G (iT ) in each slot as given in (24).Specifically, except for the results shown in Figure 8, in each time slot i = 1 . . .∆, the BS uses one pilot signal transmitted by the MS at the beginning of the frame at time instance i = 0 and one pilot sent at the beginning of the next frame at time instance i = ∆ + 1.We refer to these two pilot signals as sent "before" and "after" time slot i.In practice, the BS can store the received data symbols until it receives the pilot signal in slot i = ∆ + 1 before using an MMSE interpolation of the channel states between i = 0 and i = ∆ + 1.Furthermore, we will assume that the BS estimates perfectly the autocorrelation function of the channel, including the associated maximum Doppler frequency and, consequently, the characterizing zeroth order Bessel function.The most important system parameters are listed in Table III.Here we assume that the slot duration (T ) corresponds to a symbol duration in 5G orthogonal frequency division multiplexing (OFDM) systems using 122 MHz clock frequency, which can be used up to 20 GHz carrier frequencies [33].Note that the numerical results presented below are obtained by using the results on the deterministic equivalent of the SINR and the corresponding average spectral efficiency.
Figure 2 shows the achieved spectral efficiency averaged over the data slots i = 1 . . .∆, that is averaged over the data slots of a frame of size ∆ + 1.Short frames imply that Figure 3. Spectral efficiency for each data slot i = 1 . . .∆ when the frame size is kept fixed (∆ = 50).At high maximum Doppler frequency, the spectral efficiency is low at the "middle" slot, while at low maximum Doppler the spectral efficiency reaches its maximum at the middle slots.
the pilot overhead is relatively large, which results in poor spectral efficiency.On the other hand, too large frames (that is when ∆ is too large) make the channel estimation quality in the "middle" time slots poor, since for these time slots both available channel estimates ĥ(0) and ĥ(∆ + 1) convey little useful information, especially at high Doppler frequencies when the channel ages rapidly.Indeed, as seen in Figure 2, the frame size has a large impact on the achievable spectral efficiency, suggesting that the optimum frame size depends critically on the Doppler frequency.As we can see, the spectral efficiency as a function of the frame size is in general neither monotone nor concave, and is hence hard to optimize.
Figure 3 shows the spectral efficiency for each data slot i = 1 . . .∆ within a frame of size ∆ = 50.At lower Doppler frequencies, that is when the channel fades relatively slowly, the channel state information acquisition in the middle slots benefits from using the estimates at i = 0 and i = ∆ + 1, and making an MMSE interpolation of the channel coefficients as proposed in Lemma 1.However, at a high Doppler frequency, the channel state in the middle data slots are weakly correlated with the channel estimates ĥ(0) and ĥ(∆ + 1), which makes the MMSE channel estimation error in Corollary 1 large.This insight suggests that in such cases, the optimum frame size is much less than when the Doppler frequency is low.
The average spectral efficiency as a function of the pilot/data power ratio and the frame size is shown in Figure 4.This figure clearly shows that setting the proper frame size and tuning the pilot/data power ratio are both important to maximize the average spectral efficiency of the system.The optimal frame size and power configuration are different for different Doppler frequencies, which in turn emphasizes the importance of accurate Doppler frequency estimates.
The optimal frame size as a function of the maximum Doppler frequency is shown in Figure 5.The optimal frame size decreases rapidly, as the Doppler effect increases.As this figure shows, a much larger frame size is optimal when the number of antennas is high and the MS uses high pilot power to achieve a high pilot signal-to-noise ratio (SNR).Figure 6 shows the achieved spectral efficiency when the frame size is set optimally, as a function of the maximum Doppler frequency.At f D = 500 Hz, for example, when the optimal frame size is 8 (see also Figures 2 and 5), the achieved spectral efficiency when using N r = 10 antennas is a bit below 1 bps/Hz.We can see that setting the optimal frame size is indeed important, because it helps to make the achievable spectral efficiency quite robust with respect to even a significant increase in the Doppler frequency.
Figure 7 illustrates the upper bounds on spectral efficiency as a function of the frame size for different Doppler frequencies.Recall from Figure 2 that the spectral efficiency of the system is a non-concave function of the frame size.Therefore, limiting the possible frame sizes that can optimize spectral efficiency is useful, which can be achieved by the upper bounds shown in the figure.Since the upper bound is monotonically decreasing, finding a point of the spectral efficiency curve (see the curve marked with f D = 500 Hz and its upper bounding curve) with a negative derivative helps to find the range of possible frame sizes that maximize spectral efficiency.For f D = 500 Hz, as illustrated in the figure, larger frame sizes than ∆ = 41 would lead to a lower upper bound than the spectral efficiency achieved at ∆ = 7.Therefore, when searching for the optimal Figure 6.Optimal spectral efficiency as a function of the maximum Doppler frequency, that is the spectral efficiency when using the optimal frame size as shown in Figure 5. ∆, once we found that the spectral efficiency at ∆ = 8 is less than at ∆ = 7 (negative derivative), the search space is limited to (7,41).
Figure 8 compares the average spectral efficiency when the system uses different number of pilot signals to estimate the channel state for each data slot within the frame.Specifically, three schemes are compared: • 2 before, 1 after (2b, 1a): Three channel estimates using the pilot signals at the beginning of the current frame and the preceding frame and at the end of the current frame are used to interpolate the channel state at every data slot in the current frame.• 1 before, 1 after (1b, 1a): The two neighboring pilot signals (that is in the beginning and at the end of the current frame) are used.• 2 before (2b): The pilot signals at the beginning of the current and preceding frames are used.This scheme has an advantage over the previous schemes in that decoding the received data symbols is possible "on the fly" without having to await the upcoming pilot signal at the end of the current frame.
Notice that the "1b, 1a" scheme outperforms the "2b" scheme, because the channel estimation instances are closer to 1500 Hz, when using 1 or 2 pilot symbols preceding that time slot and 0 or 1 pilot symbols after that time slot for channel estimation.Three combinations of these channel estimation schemes are denoted as "2b, 1a", "1b, 1a" and "2b", where "b" refers to utilizing the pilot symbols sent before and "a" refers to utilizing the pilot symbol sent after time slot i. the data transmission instance in time.Furthermore, the "2b, 1a" scheme further improves the SE performance, although this improvement over the "1b, 1a" scheme is marginal.More importantly, we can observe that the optimal pilot spacing is similar in these three schemes, but depends heavily on the Doppler frequency.
Finally, Figure 9 examines the negative impact of Doppler frequency estimation errors when the Doppler frequency of the channel is under or overestimated.The figure shows the spectral efficiency as a function of the frame size for the cases when f D = 200 Hz and f D = 500 Hz.For both cases, the Doppler frequency is either correctly estimated or overestimated (to 5f D ) or underestimated (to 0.2f D ).On the one hand, this figure clearly illustrates the performance degradation in terms of average spectral efficiency when the receiver underestimates or overestimates the maximum Doppler frequency.On the other hand, when using the optimal frame size, the spectral efficiency performance of these schemes are rather similar in most cases.

VIII. CONCLUSIONS
This paper investigated the fundamental trade-off between using resources in the time domain for pilot signals and data signals in the uplink of MU-MIMO systems operating over fast fading wireless channels that age between subsequent pilot signals.While previous works indicated that when the autocorrelation coefficient between subsequent channel realization instances in discrete time is high, both the channel estimation and the MU-MIMO receiver can take advantage of the memoryful property of the channel in the time domain.However, previous works do not answer the question how often the channel should be observed and estimated such that the subsequent channel samples are sufficiently correlated while taking into account that pilot signals do not carry information bearing symbols and degrade the overall spectral efficiency.To find the optimal pilot spacing, we first established the deterministic equivalent of the achievable SINR and the associated overall spectral efficiency of the MU-MIMO system.We then used some useful properties of an upper bound of this spectral efficiency, which allowed us to limit the search space for the optimal pilot spacing (∆).The numerical results indicate that the optimal pilot spacing is sensitive to the Doppler frequency of the channel and that proper pilot spacing has a significant impact on the achievable spectral efficiency.

APPENDIX A PROOF OF LEMMA 4
Proof.Notice that and the eigenvalues of M 3 (∆, i) are: where 0 < a < 1.For all i ≥ 1 the smallest eigenvalue is λ 2 (i), which monotone increases with i.That is, When η < λ 2 (1) according to (42), we have Utilizing that the spectrum of a Kronecker product σ for ∀i ≥ 1, we further have which implies according to (44).The statement of the lemma comes from (69) using (45), M (u) (∆) = ηI 3 ⊗ C, and noting that where R(i) and ρ(∆, i) are defined in (36) and (41).

APPENDIX D PROOF OF PROPOSITION 3
Proof.From Theorem 1 and (52) the inequality follows.For monotonicity, notice that ρ k (∆ + 1, i) < ρ k (∆, i) and ρ k (∆ + 1, i + 1) < ρ k (∆, i).Since by Proposition 2 the upper bound of the SINR is increasing with ρ k we have From which it follows that SE that is SE k (∆) is decreasing in ∆.
We show that for any ε > 0, there is some M such that Due to qk < 0, we have ρ k (∆, i) < ρ k (1, 1), which implies for all ∆ and i.Let A log 1 + γ(u) k (1, 1) and N such that N ε − 2A > 0, and set Since qk < 0, we have ρ k (∆, i) < 3 max(e 2q k (∆+1+i) , e 2q k i , e 2q k (∆+1−i) ) = 3e 2q k min(∆+1+i,i,∆+1−i) , and it follows that for ∆ N ≤ i ≤ (N −1)∆ N ρ k (∆, i) < 3e 2q k ∆ N . (81) Notice that by equation (77) we can choose some large M , such that We can now show that when M = ∆, then SE (u) k (∆) < ε.To this end, we split up the sum in the numerator of (76), that is ∆ i=1 log(1 + γ(u) (∆, i)), into three terms, and bound the first and third terms using the general upper bound A, and the middle term by : SE where the last equation is due to the definition of in (80), which completes the proof.
tr D H AD ≤ tr D H BD (45) tr (AC) ≤ tr (BC) (46) tr CA −1 ≥ tr CB −1 .(47) Proof.A −1 B −1 is given in [32, p. 495, Corollary 7.7.4(a)].(45) follows from the fact that D H (B − A)D is a positive semidefinite matrix since B − A is a positive semidefinite matrix and for any x x H D H (B − A)Dx = y H (B − A)y ≥ 0 (48) where y Dx.Let C = D H D be the Cholesky decomposition of C then (46) and (47) follows from (45), by utilizing the cyclic property of the trace operator.

2 Figure 2 .
Figure 2. Spectral efficiency as a function of frame size (∆) with maximum Doppler frequency f D = 50, 500, 1500 Hz with Nr = 10 (lower three curves) and Nr = 100 (upper three curves).At higher maximum Doppler frequency, the optimum frame size is smaller than at low Doppler frequency.

Figure 4 .
Figure 4. Spectral efficiency as a function of the pilot/data power ratio and the frame size for maximum Doppler frequency f D = 500 Hz and f D = 1500 Hz when Nr = 10.In both cases, the spectral efficiency depends heavily on the employed pilot power and pilot spacing (frame size).

Figure 5 .
Figure 5. Optimal frame size as a function of the maximum Doppler frequency for different values of the employed pilot power (Pp = 50 mW and Pp = 125 mW) when the BS is equipped with Nr = 10 and Nr = 100 receive antennas.

Figure 7 .
Figure 7. Upper bounding the achievable spectral efficiency as a function of the frame size (∆) at f D = 500 Hz and f D = 1500 Hz.Note that the upper bound is monotonically decreasing, which helps to limit the search space for the optimum frame size.

Figure 8 .
Figure 8. Spectral efficiency in each time slot for f D = 500 Hz and f D = 1500 Hz, when using 1 or 2 pilot symbols preceding that time slot and 0 or 1 pilot symbols after that time slot for channel estimation.Three combinations of these channel estimation schemes are denoted as "2b, 1a", "1b, 1a" and "2b", where "b" refers to utilizing the pilot symbols sent before and "a" refers to utilizing the pilot symbol sent after time slot i.

Figure 9 .
Figure 9. Spectral efficiency as a function of the frame size ∆ when the receiver under or overestimates the actual Doppler frequency of the channel (f D = 200 Hz and f D = 500 Hz).Overestimating the actual Doppler frequency causes significant spectral efficiency degradation for most frame sizes.