Differential Downlink Transmission in Massive MU-MIMO Systems

In this paper, a differential downlink transmission scheme is proposed for a massive multiple-input multiple-output (MIMO) system without explicit channel estimation. In particular, we use a downlink precoding technique combined with a differential encoding scheme to reduce the overall system complexity. A novel precoder is proposed, which, with a large number of transmit antennas, can effectively precancel the multiple access interference (MAI) for each user, thus enhancing the system performance. The precoder is optimized by maximizing the worst-case signal-to-interference-plus-noise ratio (SINR) among the users, assuming that full power space profile (PSP) knowledge is available at the base station (BS). In addition, we provide two suboptimal solutions based on the matched and the orthogonality approach of the PSP to separate the data streams of multiple users. The decision feedback differential detection (DFDD) technique is employed to further improve the performance. The proposed schemes eliminate the MAI, enhance system performance, and achieve a simple low-complexity transmission scheme. Moreover, transmission overheads are significantly reduced using the proposed scheme, since it avoids explicit channel estimation at both ends. Monte Carlo simulation results demonstrate the effectiveness of the proposed schemes.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) technology helps in improving wireless multiple access and can be used to increase the spectral efficiency and improve the link reliability at low power operation [1], [2]. With multiple transmit antennas at the base station (BS), the system can spatially multiplex multiple data streams for multiple users at the same frequency and time. The spatial multiplexing property becomes more effective as the number of antennas becomes large, in which case the system is referred to as massive MIMO [3]. Such properties make the massive MIMO architecture an important part of many wireless communications standards, such as LTE and 5G networks.
Much of the research on MIMO downlink transmission designs assumes perfect channel state information (CSI) at the transmitter. The availability of CSI at both ends makes it possible for the system to eliminate the multiple access interference (MAI) between users. However, due to various reasons, such as pilot contamination from training sequence reuse in massive MIMO, perfect CSI estimation is unattainable [4]. In [5], the authors proposed a framework that uses the block diagonalisation method to cancel MAI between users. The proposed method provides a substantial gain in terms of spatial diversity with a low decoding complexity. However, for the decoding process, each user still needs to know the channel in order to decode the information signal coherently.
The associate editor coordinating the review of this manuscript and approving it for publication was Pietro Savazzi.
The authors in [6] proposed a downlink spreading scheme combined with differential detection (DD) to eliminate the need to estimate the CSI at the BS and users. The scheme provides both low-complexity transceivers and good performance. However, for a large number of users, the proposed scheme in [6] does not provide a comprehensive high-rate differential scheme in a downlink scenario due to the long length of the spreading code. In [7], the authors proposed a full-rate downlink algebraic transmission scheme combined with a differential space-time scheme. The proposed scheme provides a full-rate, full-diversity system and does not require any knowledge of the CSI to separate the data streams of multiple users. In this approach, however, the BS typically employs only a few antennas, and thus the corresponding improvement in spectral efficiency and system simplicity is still relatively modest.
In order to improve the spectral efficiency and to simplify the required signal processing, a massive MIMO downlink system is employed [8], [9], where the BS is equipped with a very large number of transmit antennas. In practice, the demodulation reference signals (DM-RS) are used to support channel estimation and data demodulation. In DM-RS, the estimation of the channel for coherent detection is often obtained by training and tracking, e.g., using reference signals (RS), or pilots. However, it is not always feasible to use training-based schemes with systems that have a large number of antennas. As the number of transmit antennas grows large, as in the case of massive MIMO, the channel estimation effort, system overheads, latency, and power consumption grow proportionately [10]. Discussion of DM-RS improvements is ongoing in 3GPP Release 15 standardization [11]; hence, it is natural to adopt differential modulation with massive MIMO to reduce the overhead and latency of DM-RS.
A well-established method to enhance DD is multiple symbol differential detection (MSDD). The authors in [12] point out a 3 dB performance improvement simply by demodulating the received symbols jointly as a block, instead of one at a time, using the MSDD detection technique. The authors of [13], [14] developed MSDD detection for the uplink MIMO system in ultra-wideband (UWB) systems. Essentially, the authors in [14] adopted decision feedback differential detection (DFDD) for a massive MIMO system, as this approach improves the performance of MSDD. However, the multiuser transmission scheme in [14] suffers from severe MAI without a proper precoding design scheme. Furthermore, prior research on MSDD and MAI cancellation has mainly been focused on uplink transmission, where cancellation was implemented at the BS receiver, and therefore complexity was not a significant concern [13], [14]. For downlink transmission, however, interference cancellation at end users increases receiver complexity, and for this reason, we account for interference cancellation at the BS instead of at the receivers. In particular, in order to have low-complexity receivers, it is assumed that the transmitted signals are precoded at the BS.
In this paper, we therefore propose a differential MIMO downlink transmission framework, in which a BS is equipped with a massive antenna array that precodes transmit signals to separate the data streams of multiple users. In particular, to achieve a low-complexity differential massive MIMO system, a novel downlink precoding design is proposed by employing knowledge of the power space profile (PSP) of users. It is assumed that the PSP for each user is estimated at the BS, since it can tolerate more complexity compared to receivers. Once the PSPs are estimated at the BS, the transmitter computes the precoder. More precisely, we provide an optimal solution for the precoder based on a max-min signal-to-interference-plus-noise ratio (SINR) problem formulation. The optimized precoder can effectively precancel the interference between users, thus enhancing overall system performance. In addition, we provide two suboptimal solutions suitable for the low-interference system based on the matched and the orthogonality approach of the PSP of each user. The proposed schemes facilitate precanceling MAI, enhance system performance, and provide simple transmitter and receiver schemes. Consequently, since the proposed scheme avoids channel estimation, the system overheads and latency will be reduced significantly.
The remainder of this paper is organized as follows. Section II introduces the system model of the differential massive MIMO system. Section III describes the downlink transmit precoding approach. Section IV presents differential detection for a massive MIMO system. In Section V, simulation results are shown. Finally, conclusions are drawn in Section VI.
Notations: Vectors are denoted by boldface lower case letters and matrices by boldface upper case letters. $\mathbf{I}_n$, $\mathbf{1}_n$, and $\mathbf{0}_{m \times n}$ denote an $n \times n$ identity matrix, an $n \times 1$ all-ones vector, and an $m \times n$ zero matrix, respectively. The operators $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, $\mathrm{trace}(\cdot)$, $\log(\cdot)$, $\log_2(\cdot)$, $|\cdot|$, $\|\cdot\|_F$, and $\mathrm{diag}(\cdot)$ denote transpose, complex conjugate, conjugate transpose, trace of a matrix, natural logarithm, logarithm to base 2, absolute value of a scalar, Frobenius norm of a matrix, and diagonal of a matrix, respectively. $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of a complex number, respectively. $\{\mathbf{x}_n\}$ denotes the set of all vectors indexed by $n$. $\mathbb{C}^{m \times n}$ and $\mathbb{R}^{m \times n}$ denote the set of all complex and the set of all real $m \times n$ matrices, respectively. $E\{\cdot\}$ denotes the expected value of a discrete random variable. $\mathrm{cov}(x, y)$ and $\mathrm{var}(x)$ denote the covariance between the random variables $x$ and $y$, and the variance of $x$, respectively.

II. SYSTEM MODEL
Consider a single-cell massive MIMO downlink broadcast channel. The BS has $n_t$ transmit antennas, which simultaneously transmit multiple streams to $K$ single-antenna users, as shown in Fig. 1. The number of transmit antennas is assumed to be very large ($n_t \gg 1$). We assume that all users are equipped with a single antenna for the decoding process, which is a realistic assumption for the massive MIMO system, where the large number of transmit antennas at the BS provides mutual orthogonality among the vector-valued channels to the users (so-called favorable propagation) [16]. For downlink massive MIMO transmission, multiple antennas at each user increase receiver complexity and overhead. Instead, we would like to have a simple, inexpensive, and power-efficient single-antenna receiver. Further, equivalent capacity can be achieved by serving $K$ single-antenna users instead of one user with $K$ antennas, thereby serving more users in the cell [16].

A. DIFFERENTIAL MASSIVE MIMO SYSTEM MODEL
For the $k$th user, $\mathbf{s}_k = [s_{k,1}, \ldots, s_{k,N}]$ is the information vector with elements drawn from an $M$-ary PSK constellation as
$$s_{k,\tau} \in \left\{ e^{j 2\pi i / M} \mid i = 0, 1, \ldots, M-1 \right\},$$
where $N$ denotes the block length of the coherence time intervals. In the context of the differential massive MIMO system, the sequence of symbols of the $k$th user, $s_{k,\tau}$, $1 \le \tau \le N$, is differentially encoded into the transmit symbol vector $\mathbf{b}_k \in \mathbb{C}^{1 \times (N+1)}$ via the rule
$$b_{k,\tau} = s_{k,\tau}\, b_{k,\tau-1}, \quad 1 \le \tau \le N.$$
The transmit information signal vector $\mathbf{b}_k$ comprises the initial reference symbol $b_{k,0} = 1$ and the following $N$ differentially encoded symbols.
We consider that no prior information about the channel is available at the BS. The channel vector between the BS and user $k$, $\mathbf{h}_k = [h_{k,1}, \ldots, h_{k,n_t}]^T \in \mathbb{C}^{n_t \times 1}$, models independent fast fading and slow fading PSP attenuation, where the PSP is denoted as $g_{k,m}$, $1 \le m \le n_t$; the PSP is specified in Section II-B. It is assumed that the channel coefficients remain constant over the block length and vary independently from one block to another. The coefficient $h_{k,m}$ can be written as
$$h_{k,m} = \sqrt{g_{k,m}}\, \tilde{h}_{k,m},$$
where $\tilde{h}_{k,m}$ is the fast-fading coefficient from the $k$th user to the $m$th transmit antenna of the BS, which is modeled as an independent (over $m$) and identically distributed (i.i.d.) complex Gaussian random variable with zero mean and unit variance, i.e., $\tilde{h}_{k,m} \sim \mathcal{CN}(0, 1)$. $g_{k,m}$ models the PSP attenuation between the $m$th antenna at the BS and user $k$, which is assumed to be independent over $m$, constant over many coherence time intervals $N$, and known a priori to the BS. We consider that the statistics of $\tilde{h}_{k,m}$ remain stationary for a sufficiently long transmission time. Then, we have
$$\mathbf{h}_k = \mathbf{G}_k^{1/2}\, \tilde{\mathbf{h}}_k,$$
where $\tilde{\mathbf{h}}_k = [\tilde{h}_{k,1}, \ldots, \tilde{h}_{k,n_t}]^T$, $\mathbf{G}_k = \mathrm{diag}(\mathbf{g}_k)$, and $\mathbf{g}_k = [g_{k,1}, \ldots, g_{k,n_t}]^T$. Therefore, the variance of $\{\mathbf{h}_k\}$ is determined by the user PSP, where the channel variance is equal to the power profile, i.e., $h_{k,m} \sim \mathcal{CN}(0, g_{k,m})$.
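The encoding rule and the channel model above can be illustrated with a minimal sketch. It assumes the standard differential PSK recursion $b_{k,\tau} = s_{k,\tau} b_{k,\tau-1}$ with $b_{k,0} = 1$ and the factorization $h_{k,m} = \sqrt{g_{k,m}}\,\tilde{h}_{k,m}$; the function names are hypothetical and chosen only for this illustration.

```python
import cmath
import random


def psk_symbols(indices, M):
    """Map integer indices 0..M-1 to unit-modulus M-ary PSK symbols."""
    return [cmath.exp(2j * cmath.pi * i / M) for i in indices]


def differential_encode(s):
    """Return b with b[0] = 1 and b[t] = s[t-1] * b[t-1] (length N + 1)."""
    b = [1 + 0j]
    for sym in s:
        b.append(sym * b[-1])
    return b


def differential_decode(b):
    """Recover s from consecutive symbols; valid because |b[t]| = 1."""
    return [b[t] * b[t - 1].conjugate() for t in range(1, len(b))]


def draw_channel(g, rng):
    """Draw h_m = sqrt(g_m) * h_tilde_m with h_tilde_m ~ CN(0, 1) (assumed model)."""
    h = []
    for g_m in g:
        h_tilde = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / 2 ** 0.5
        h.append(g_m ** 0.5 * h_tilde)
    return h
```

For example, `differential_encode(psk_symbols([0, 1, 2, 3], 4))` returns a length-5 sequence that starts with the reference symbol 1, and `differential_decode` inverts it without any channel knowledge.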
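We assume that the multiuser system adopts a linear transmission and reception strategy. The BS performs transmit beamforming and communicates simultaneously with all users. The instantaneous transmitted signal matrix $\mathbf{B} \in \mathbb{C}^{n_t \times (N+1)}$, which superimposes the precoded signals of all users, can then be expressed as
$$\mathbf{B} = \sum_{k=1}^{K} \sqrt{p_k}\, \mathbf{u}_k \mathbf{b}_k,$$
where $\mathbf{u}_k \in \mathbb{C}^{n_t \times 1}$ is the normalized differential transmit precoder (beamformer) of the $k$th user, with $\|\mathbf{u}_k\|^2 = 1$, and $p_k$ is the downlink average transmit power of the $k$th user. We consider a total power constraint at the BS, $\sum_{k=1}^{K} p_k \le P$. The received signal vector $\mathbf{y}_k \in \mathbb{C}^{1 \times (N+1)}$ for the $k$th user is given by
$$\mathbf{y}_k = \sqrt{p_k}\, \mathbf{h}_k^H \mathbf{u}_k \mathbf{b}_k + \sum_{q=1, q \neq k}^{K} \sqrt{p_q}\, \mathbf{h}_k^H \mathbf{u}_q \mathbf{b}_q + \mathbf{z}_k,$$
where the term $\sum_{q \neq k} \sqrt{p_q}\, \mathbf{h}_k^H \mathbf{u}_q \mathbf{b}_q$ is the MAI component against the $k$th user, and $\mathbf{z}_k \in \mathbb{C}^{1 \times (N+1)}$ is the noise vector modeled as zero-mean complex circularly symmetric Gaussian random variables, i.e., $\mathbf{z}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_{z_k}^2 \mathbf{I}_{N+1})$. Assuming that the transmitted information symbols $\mathbf{b}_k$ are uncorrelated, the average $\mathrm{SINR}_k$ at the $k$th user can be expressed as follows
$$\mathrm{SINR}_k = \frac{p_k\, |\mathbf{h}_k^H \mathbf{u}_k|^2}{\sum_{q=1, q \neq k}^{K} p_q\, |\mathbf{h}_k^H \mathbf{u}_q|^2 + \sigma_{z_k}^2}.$$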
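As a small numerical illustration of the SINR just described, the sketch below evaluates the ratio of desired power $p_k |\mathbf{h}_k^H \mathbf{u}_k|^2$ to MAI-plus-noise power for one channel realization. The data layout (lists of per-user vectors) and the function names are assumptions made for this example only.

```python
def inner(h, u):
    """Compute h^H u for two equal-length complex sequences."""
    return sum(h_m.conjugate() * u_m for h_m, u_m in zip(h, u))


def sinr(k, H, U, p, noise_var):
    """Instantaneous SINR of user k.

    H[k] is the channel vector of user k, U[q] the unit-norm precoder of
    user q, p[q] the power allocated to user q (hypothetical layout).
    """
    signal = p[k] * abs(inner(H[k], U[k])) ** 2
    interference = sum(
        p[q] * abs(inner(H[k], U[q])) ** 2 for q in range(len(U)) if q != k
    )
    return signal / (interference + noise_var)
```

With orthogonal precoders the interference term vanishes and the SINR reduces to the per-user SNR; with identical precoders every other user's power appears in the denominator.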

B. CO-LOCATED ANTENNA SYSTEM WITH A GEOMETRICAL MODEL
Now, we summarize the PSP model construction following the approach in [14]. As shown in Fig. 2, the users are randomly distributed in front of a large uniform antenna array at the BS. We assume that the BS has full knowledge of any user's location information. The location of the users is determined by the following parameters: $r_{k,m}$ is the distance between the antenna index $m$ and the $k$th user; $l_k$ is the direct orthogonal distance between the $k$th user and the array; and $l_a$ is the antenna spacing. Let $m_k$ denote the antenna element closest to the $k$th user according to the Euclidean distance.
From the geometry in Fig. 2, we have
$$r_{k,m} = \sqrt{l_k^2 + (m - m_k)^2 l_a^2} = l_a \sqrt{l_{k,r}^2 + (m - m_k)^2},$$
where $l_{k,r} \overset{\mathrm{def}}{=} l_k / l_a$ denotes the normalized relative distance of the $k$th user to the array. We assume that the average transmit power obeys the path loss model with path loss exponent $\gamma$. Hence, the path loss for the $k$th user at antenna $m$ satisfies
$$g_{k,m} \propto r_{k,m}^{-\gamma}.$$
Using exponential and logarithmic properties, we have
$$g_{k,m} \propto \exp\left( -\frac{\gamma}{2} \log\left( 1 + \frac{(m - m_k)^2}{l_{k,r}^2} \right) \right).$$
Since $\log(1 + x) \approx x$ for small $x$, we have
$$g_{k,m} \propto \exp\left( -\frac{(m - m_k)^2}{2 \zeta_k^2} \right), \tag{13}$$
where $\zeta_k^2 = l_{k,r}^2 / \gamma$. Therefore, the PSP is well approximated by a Gaussian function with mean $m_k$ and channel variance $\zeta_k^2$.

Remark 1: In practical systems, the power space profile (PSP), $g_{k,m}$, of each user (which includes the path loss exponent) varies very slowly with time compared to the fast fading coefficients $\tilde{h}_{k,m}$. In this context, for massive MIMO systems, it is reasonable to assume that the PSPs of the users of the system are known at the BS [17]-[19].
In the uplink, we assume that each user transmits $N$ i.i.d. symbols. For each symbol, the BS can calculate the PSP estimate $\hat{\mathbf{G}}_k$ for each user accurately by averaging the received uplink signal over different data slots indexed by $\tau$, as in [17], where $\mathbf{r}_{k,\tau}$ is the uplink received signal vector for the $k$th user at the antennas of the BS during the $\tau$th time slot. Hence, with the assumption of channel reciprocity, the PSP is calculated for each user during the uplink as in (14), and is assumed to be equivalent to the PSP in the downlink. Note that PSP estimation is less challenging than estimating the actual channel state information, since the PSP can remain constant over many coherence time intervals. However, the actual estimation process of $g_{k,m}$ is beyond the scope of this paper; thus, we assume the PSPs to be perfectly known in our system.
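To make the Gaussian approximation and the averaging idea concrete, the following sketch compares a peak-normalized path loss profile with its Gaussian approximation and then estimates the profile from simulated uplink samples. Several points are assumptions of this example rather than statements from the paper: the profile is normalized to a peak of one, $\zeta^2 = l_{k,r}^2 / \gamma$ is used to link the two forms, uplink symbols have unit modulus, every slot sees an independent fading draw, and the estimator (average of $|r_m|^2$ minus the noise power) is only one plausible form of the averaging step.

```python
import math
import random


def path_loss_psp(n_t, m_k, l_rel, gamma):
    """Profile (1 + (m - m_k)^2 / l_rel^2)^(-gamma / 2), peak normalized to 1."""
    return [(1 + (m - m_k) ** 2 / l_rel ** 2) ** (-gamma / 2) for m in range(1, n_t + 1)]


def gaussian_psp(n_t, m_k, zeta):
    """Gaussian approximation exp(-(m - m_k)^2 / (2 zeta^2)), peak normalized to 1."""
    return [math.exp(-((m - m_k) ** 2) / (2 * zeta ** 2)) for m in range(1, n_t + 1)]


def estimate_psp(g, noise_var, n_slots, rng):
    """Average |r_m|^2 over slots and subtract the noise power.

    Assumes unit-modulus uplink symbols and an independent fading draw
    per slot, so that r_m ~ CN(0, g_m + noise_var).
    """
    acc = [0.0] * len(g)
    for _ in range(n_slots):
        for m, g_m in enumerate(g):
            std = math.sqrt((g_m + noise_var) / 2)
            r = complex(rng.gauss(0, std), rng.gauss(0, std))
            acc[m] += abs(r) ** 2
    return [a / n_slots - noise_var for a in acc]
```

Near the closest antenna $m_k$ the two profiles are almost indistinguishable, which is where most of the power is concentrated; the approximation loosens in the tails where $(m - m_k)^2 / l_{k,r}^2$ is no longer small.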

III. DOWNLINK TRANSMIT PRECODING
In massive MIMO, transmit precoding is used to cancel inter-user interference. Conventional transmit precoding design requires channel knowledge at the transmitter. However, in massive MIMO, the number of transmit antennas is very large, i.e., $n_t \gg 1$. Hence, the estimation of all channel coefficients $h_{k,m}$ quickly becomes infeasible. Instead, differential transmit precoding schemes could be considered, which avoid the need for explicit channel estimation. After estimating the PSP $g_{k,m}$ at the BS, we use this knowledge to design the transmit precoder for each user $k$ to separate different users. Now, we present an asymptotic analysis of the SINR and the proposed precoder design strategies for the differential massive MIMO framework.

A. ASYMPTOTIC ANALYSIS OF SINR
As a consequence of employing a large number of antennas at the BS, $n_t \to \infty$ (as in our case of massive MIMO), the downlink channel vectors of independent users have a large degree of orthogonality, i.e., $\frac{1}{n_t} \mathbf{h}_k^H \mathbf{h}_q \to 0$ for $k \neq q$. The orthogonality between different users' channels is determined by the orthogonality between the small-scale fading vectors $\{\tilde{\mathbf{h}}_k\}$ and the orthogonality between the PSPs $\{\mathbf{g}_k\}$.
Theorem 1: From the law of large numbers and under the most favorable propagation conditions, where the column-vectors of the propagation vectors are asymptotically orthogonal, the expected value of $\mathrm{SINR}_k$ can be calculated when $n_t \to \infty$. Since $\mathbf{h}_k^H$ has a Gaussian distribution with zero mean and covariance $\mathbf{G}_k = \mathrm{diag}(\mathbf{g}_k)$, the desired signal $\sqrt{p_k}\, \mathbf{h}_k^H \mathbf{u}_k$ is also Gaussian distributed with zero mean and variance $p_k \sum_{m=1}^{n_t} g_{k,m} u_{k,m}^2$, since the sum of multiple Gaussian variables is also a Gaussian variable. Similarly, the interference component of $\mathrm{SINR}_k$ is also a Gaussian signal with variance $\sum_{q=1, q \neq k}^{K} p_q \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2$. The variance of the AWGN noise is $\sigma_{z_k}^2$. Therefore, the expected value of $\mathrm{SINR}_k$ as $n_t \to \infty$ is
$$E\{\mathrm{SINR}_k\} = \frac{p_k \sum_{m=1}^{n_t} g_{k,m} u_{k,m}^2}{\sum_{q=1, q \neq k}^{K} p_q \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2 + \sigma_{z_k}^2}. \tag{17}$$
Proof: See Appendix I.
VOLUME 7, 2019
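The closed-form expression of Theorem 1 depends only on the PSPs, the precoders, and the powers, so it can be evaluated without any channel realization. The sketch below is one way to compute it; the nested-list layout (`G[k][m]` for $g_{k,m}$, `U[q][m]` for $u_{q,m}$) and the function name are hypothetical.

```python
def expected_sinr(k, G, U, p, noise_var):
    """Evaluate the asymptotic SINR expression of user k.

    signal       = p_k * sum_m g_{k,m} |u_{k,m}|^2
    interference = sum_{q != k} p_q * sum_m g_{k,m} |u_{q,m}|^2
    """
    def weighted(g, u):
        return sum(g_m * abs(u_m) ** 2 for g_m, u_m in zip(g, u))

    signal = p[k] * weighted(G[k], U[k])
    interference = sum(
        p[q] * weighted(G[k], U[q]) for q in range(len(U)) if q != k
    )
    return signal / (interference + noise_var)
```

Note that the interference seen by user $k$ weights the other users' precoders with user $k$'s own profile $\mathbf{g}_k$, which is why precoders concentrated away from $m_k$ cause little MAI.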

B. SUBOPTIMAL PRECODERS

1) MATCHED PSP PRECODER
The first precoder design strategy is to match the beamformer vector to the PSP of the transmit antennas to separate different users, i.e., $u_{k,m}^2 \sim g_{k,m}$, which, together with the normalization $\|\mathbf{u}_k\|^2 = 1$, can be written as
$$u_{k,m} = \sqrt{\frac{g_{k,m}}{\sum_{m'=1}^{n_t} g_{k,m'}}},$$
where we assume the BS has full knowledge of the channel parameter $\zeta_k$ and of the antenna index $m_k$, which is the closest to user $k$ with maximum average power. For the power allocation in the matched PSP scheme, we allocate the downlink transmit power equally between users, i.e., $p_k = P/K$.
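A minimal sketch of the matched design follows, assuming real non-negative precoder entries with $u_{k,m}^2$ proportional to $g_{k,m}$ and unit norm; the helper names are illustrative only.

```python
import math


def matched_psp_precoder(g):
    """Unit-norm precoder whose squared entries are proportional to the PSP g."""
    total = sum(g)
    return [math.sqrt(g_m / total) for g_m in g]


def equal_power(P, K):
    """Equal split of the total power P among K users, p_k = P / K."""
    return [P / K] * K
```

For instance, `matched_psp_precoder([1, 4, 4])` gives entries 1/3, 2/3, and 2/3, whose squares sum to one and keep the 1:4:4 ratio of the profile.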

2) ORTHOGONAL PSP PRECODER
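In the orthogonal PSP precoder, the beamformer for each user has to be distinguished and identified from other users. In the orthogonal precoder scheme, each user is assigned a unique orthogonal PSP to enhance data separation between users. The orthogonal PSP for each user is then multiplied by its own power profile. The orthogonal precoder for each user can be constructed using the Gram-Schmidt process (GSP). Let $\mathbf{v}_k = [v_{k,1}, \ldots, v_{k,n_t}]^T$ represent the user's PSP vector. The elements of vector $\mathbf{v}_k$ are computed by matching their value to the power profile of the transmit antennas, i.e., $v_{k,m}^2 \sim g_{k,m}$. The Gram-Schmidt process takes a finite, linearly independent set $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_K\}$ and generates an orthogonal set $\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_K\}$, which spans the same $K$-dimensional subspace of $\mathbb{R}^{n_t \times 1}$ as $S$. We define the projection operator as [20]
$$\mathrm{proj}_{\tilde{\mathbf{v}}}(\mathbf{v}) = \frac{\langle \tilde{\mathbf{v}}, \mathbf{v} \rangle}{\langle \tilde{\mathbf{v}}, \tilde{\mathbf{v}} \rangle}\, \tilde{\mathbf{v}},$$
where $\langle \tilde{\mathbf{v}}, \mathbf{v} \rangle$ denotes the inner product of the vectors $\tilde{\mathbf{v}}$ and $\mathbf{v}$, i.e., $\langle \tilde{\mathbf{v}}, \mathbf{v} \rangle = \tilde{\mathbf{v}}^T \mathbf{v}$ for vectors in $\mathbb{R}^{n_t \times 1}$. The Gram-Schmidt process then works as follows:
$$\tilde{\mathbf{v}}_1 = \mathbf{v}_1, \qquad \tilde{\mathbf{v}}_k = \mathbf{v}_k - \sum_{j=1}^{k-1} \mathrm{proj}_{\tilde{\mathbf{v}}_j}(\mathbf{v}_k), \quad k = 2, \ldots, K.$$
Note that the Gram-Schmidt precoder for the first user is equal to the original PSP for the first user, i.e., $\tilde{\mathbf{v}}_1 = \mathbf{v}_1$; hence, the user separation works only for the received signal of the first user. To enhance the separation for the received signals of the other users, we multiply each element of the orthogonalized vector $\tilde{\mathbf{v}}_k$ by the corresponding element of its own original power profile $\mathbf{v}_k$ and then normalize the result, which yields
$$\mathbf{u}_k = \frac{\tilde{\mathbf{v}}_k \circ \mathbf{v}_k}{\|\tilde{\mathbf{v}}_k \circ \mathbf{v}_k\|},$$
where $\circ$ denotes the Hadamard product. For power allocation in the orthogonal PSP precoder, we allocate the downlink transmit power equally between users, i.e., $p_k = P/K$.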
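The orthogonal design can be sketched in a few lines under the reading given in the text: classical Gram-Schmidt on the real vectors $\mathbf{v}_k$ (with $v_{k,m}^2$ proportional to $g_{k,m}$), followed by an element-wise product with $\mathbf{v}_k$ and a normalization. Function names are hypothetical, and the sketch assumes the input vectors are linearly independent so that no division by zero occurs.

```python
import math


def dot(a, b):
    """Real inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return mutually orthogonal vectors (not normalized)."""
    ortho = []
    for v in vectors:
        w = list(v)
        for e in ortho:
            coeff = dot(e, v) / dot(e, e)  # projection coefficient of v on e
            w = [w_m - coeff * e_m for w_m, e_m in zip(w, e)]
        ortho.append(w)
    return ortho


def orthogonal_psp_precoders(psps):
    """Build one unit-norm precoder per user from the PSP vectors g_k."""
    v = [[math.sqrt(g_m) for g_m in g] for g in psps]
    v_tilde = gram_schmidt(v)
    precoders = []
    for vt, vk in zip(v_tilde, v):
        prod = [a * b for a, b in zip(vt, vk)]  # Hadamard product
        norm = math.sqrt(dot(prod, prod))
        precoders.append([x / norm for x in prod])
    return precoders
```

Because the first orthogonalized vector equals $\mathbf{v}_1$, the first user's precoder is simply its normalized PSP, while later users have the overlap with earlier users removed before the re-weighting step.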

C. OPTIMAL PSP PRECODERS
In this precoder, we consider the joint optimization of power and downlink precoder for the PSP among all users simultaneously using the max-min problem formulation. A max-min formulation guarantees a fair quality of service among all users.

1) SINR OPTIMAL PSP PRECODER
In the optimal PSP precoder, we maximize the worst-case SINR jointly among all users. Starting from (17), the corresponding optimization problem (22) maximises $\min_k E\{\mathrm{SINR}_k\}$ over the precoders $\{\mathbf{u}_k\}$ and the powers $\{p_k\}$, subject to the constraints on the transmit power and the precoders. Problem (22) can be recast as problem (23), which is expressed in terms of the auxiliary quantities $f_k$ and $c_k$. It can be seen that the cost function in (23a) is non-linear and non-convex over the optimization variables $p_k$ and $\mathbf{u}_k$. In the following, we provide optimal solutions for the design problems. The feasibility of problem (23) can be examined by solving it with the objective function replaced by constant values, i.e., finding a common domain which satisfies all problem constraints. Without loss of generality, we assume that our problem is feasible. Next, we solve our optimization problem optimally through recasting the non-convex constraints. Now, let us define a $K n_t \times 1$ vector $\mathbf{v}$ as in (26). In addition, let us define the variables $\mathbf{w}_k$ and $\bar{\mathbf{w}}_k$ of size $K n_t \times 1$ as in (27) and (28), where $\mathbf{0}_{m \times 1}$ denotes an $m \times 1$ vector whose elements are zero.
Next, the $\mathrm{SINR}_k$ optimization problem in (23) may be written in a more convenient form by using (26), (27), and (28), which yields problem (29). To convexify the cost function (29a), which comprises a product of fractional terms, we substitute the numerators and denominators of the fractions by the exponential variables $e^{\alpha_k}$ and $e^{\bar{\alpha}_k}$, respectively, as in (30a) and (30b) [21]. Then, by using the properties of the exponential and according to (30a) and (30b), the problem in (29) can be formalized as problem (32). It can be seen that the exponential parameters $e^{\alpha_k}$ and $e^{\bar{\alpha}_k}$ in (32d) and (32e) are constrained by the expressions on the right-hand sides of (30a) and (30b), respectively. The objective function in (32a) consists of an exponential function which is non-convex, and thus we can linearize it using the monotonicity property of the exponential function, which gives the objective in (33). Next, to deal with the non-convex constraint (32e), we linearize the exponential term $e^{\bar{\alpha}_k}$ using the first-order Taylor approximation as follows [22]
$$e^{\bar{\alpha}_k} \approx e^{\hat{\alpha}_k} \left( 1 + \bar{\alpha}_k - \hat{\alpha}_k \right), \quad \forall k, \tag{34}$$
where $\hat{\alpha}_k$ is the point where the linear approximation is made. Therefore, from (33) and (34), problem (32) can be reformulated as problem (35). Problem (35) is convex and can be solved iteratively using the CVX optimization software [23]. The value of $\hat{\alpha}_k$ is updated by the optimized value of $\bar{\alpha}_k$, $\forall k$, obtained in the previous iteration. The iterations continue until the error, $\sum_{k=1}^{K} |\bar{\alpha}_k - \hat{\alpha}_k|$, falls below a certain threshold. Algorithm 1 is provided to solve the above optimization problem.
Algorithm 1
6: Solve problem (35) using CVX and calculate $\mathbf{v}^{[i]}$, $\alpha^{[i]}$, $\bar{\alpha}^{[i]}$.
7: until convergence.
8: end while
9: Find $\mathbf{u}_k$ and $p_k$ of each user from $\mathbf{v}$ as in (25) and (26).
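The linearize-and-iterate idea can be seen in isolation with a toy example. The sketch below is not the paper's Algorithm 1: it assumes a single user whose interference-plus-noise level $c > 0$ is held fixed, so that the only remaining task is to find the smallest $\bar{\alpha}$ satisfying $c \le e^{\bar{\alpha}}$ through repeated first-order Taylor expansions. Function names and the scalar setting are assumptions made purely for illustration.

```python
import math


def linearized_exp(x, x0):
    """First-order Taylor expansion of exp(x) around x0 (never exceeds exp(x))."""
    return math.exp(x0) * (1.0 + x - x0)


def iterate_alpha_bar(c, alpha_hat=0.0, tol=1e-9, max_iter=100):
    """Toy successive approximation for fixed c = interference + noise > 0.

    Each pass picks the smallest alpha_bar with c <= linearized_exp(alpha_bar,
    alpha_hat), then moves the expansion point there, until the change
    |alpha_bar - alpha_hat| falls below tol.
    """
    for _ in range(max_iter):
        alpha_bar = alpha_hat + c * math.exp(-alpha_hat) - 1.0
        if abs(alpha_bar - alpha_hat) < tol:
            return alpha_bar
        alpha_hat = alpha_bar
    return alpha_hat
```

Because the tangent line lies below the exponential, any point that satisfies the linearized constraint also satisfies the original one, so each iterate stays feasible; in this scalar toy case the iterates settle at $\log c$, the tightest feasible value.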
Remark 2: There is an alternative approach for designing the transmit precoder based on maximizing the worst case of the signal-to-leakage-noise ratio (SLNR). The SLNR is defined as the ratio of the received signal power at the desired user to the received signal power at the other users (the leakage) [24].
The average $\mathrm{SLNR}_k$ at the $k$th user can be expressed as in (37). The proof of (37) is similar to the proof of (17) in Theorem 1.
The optimization solution for maximizing the worst-case SLNR of (37) (max-min SLNR) jointly among all users provides the same performance as the proposed optimal PSP SINR precoder (max-min SINR).

D. COMPUTATIONAL COMPLEXITY ANALYSIS FOR THE PSP PRECODERS
In this section, we quantify the computational complexity of the proposed PSP precoders for the optimal and the suboptimal solutions. The complexity is evaluated based on the size of the input data, the number of floating point operations (FLOPs), the type of the optimization problem, the number of required iterations, and the method used to find the solution.

1) COMPLEXITY OF SUBOPTIMAL SOLUTIONS
The notion of FLOPs is introduced. We use the total number of FLOPs to measure the computational complexity of matrix operations. We summarize the total FLOPs needed for some matrix operations below [25]:
• Multiplication of $m \times n$ and $n \times p$ complex matrices: $O(8mnp - 2mp)$.
• Inversion of an $m \times m$ real matrix using Gauss-Jordan elimination: $O(4m^3/3)$.
• Hadamard product for two $m \times m$ matrices: $O(m)$.
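The operation counts listed above can be wrapped in small helpers to compare precoder designs for given dimensions. The sketch below simply transcribes the three expressions as stated in the list; the function names are hypothetical.

```python
def complex_matmul_flops(m, n, p):
    """FLOPs for multiplying m x n and n x p complex matrices: 8mnp - 2mp."""
    return 8 * m * n * p - 2 * m * p


def gauss_jordan_inverse_flops(m):
    """FLOPs for inverting an m x m real matrix by Gauss-Jordan: 4m^3 / 3."""
    return 4 * m ** 3 / 3


def hadamard_flops(m):
    """FLOPs for the Hadamard product, using the count given in the list."""
    return m
```

For example, an inner product between two length-100 complex vectors corresponds to `complex_matmul_flops(1, 100, 1)`, i.e., 798 FLOPs under this convention.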
According to the aforementioned summary of FLOPs operations, the computational complexity of the suboptimal PSP precoder is given by (38).

2) COMPLEXITY OF OPTIMAL SOLUTION
Now, we calculate the complexity of optimizing the downlink PSP precoder, which is formulated as a linear programming (LP) problem in (35). The computational complexity of such LP problems has been studied in Chapter 6 of [26], where the complexity is calculated in terms of the number of optimization variables $n$, the number of constraints $m$, and the size of the input data $\dim(\mathbf{p})$, where $\mathbf{p}$ is the vector of input data. To apply the complexity evaluation steps given in Chapter 6 of [26], problem (35) is recast into its standard LP form. This can be achieved by replacing the min operator in the objective function by a new slack variable $\pi$ and $K$ scalar constraints (see (39d)). Therefore, (39) is an equivalent and standard LP recast of the original problem (35). Note that the constraint (35d) is linearized in a similar way as used for (35e), since the CVX solvers used, such as SDPT3 and SeDuMi, do not support the exponential function. Problem (39) contains $n = (n_t + 2)K + 1$ scalar variables and $m = (n_t + 3)K$ scalar constraints, and requires the input data vector $\mathbf{p} = [n, m, \mathbf{w}_1^T, \ldots, \mathbf{w}_K^T, \bar{\mathbf{w}}_1^T, \ldots, \bar{\mathbf{w}}_K^T, \hat{\alpha}_1, \ldots, \hat{\alpha}_K, \bar{\alpha}_1, \ldots, \bar{\alpha}_K]$. According to these problem parameters, the complexity of achieving a per-iteration solution within a given accuracy is given by (40) [26], where $O(1)$ is the complexity of a real operation. According to (40) and the aforementioned problem parameters, the per-iteration complexity asymptotically (as $n_t, K \to \infty$ and $n_t \gg K$) converges to (41). From (38) and (41), the optimal PSP precoder has higher computational complexity than the suboptimal PSP precoders, where the main parameters are the total number of users $K$ and the total number of transmit antennas $n_t$. We explore the comparison between (38) and (41) further in Section V.

IV. DIFFERENTIAL DETECTION FOR MASSIVE MIMO WITH DOWNLINK TRANSMISSION
In this section, the differential encoding and decoding process for the downlink transmission in a massive MIMO system is discussed. Here, we assume that neither the transmitter nor the receiver has prior knowledge of the CSI.

A. MULTIPLE SYMBOLS DIFFERENTIAL DETECTION
The simpler suboptimal method of implementing DD with massive MIMO is to encode the transmitted data differentially and to decode only the last two consecutive received symbols, e.g., $N = 2$, without any knowledge of the CSI. In contrast, the optimal method is to decode a block of $N$ consecutive information symbols jointly without any knowledge of the CSI by performing MSDD, e.g., $N \gg 2$, which results in a 3 dB performance improvement compared to DD [12], [13].
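The two-symbol case can be sketched as follows: each information symbol is decided from the product of a received sample with the conjugate of the previous one, so the unknown channel gain cancels in phase. This is the textbook form of conventional differential PSK detection, given here as an illustration; the function names are hypothetical and the received block is assumed to be a scalar sequence per user.

```python
import cmath


def psk_index(z, M):
    """Index of the M-PSK point whose phase is closest to the phase of z."""
    return round(cmath.phase(z) * M / (2 * cmath.pi)) % M


def dd_detect(y, M):
    """Two-symbol differential detection: decide s_t from y[t] * conj(y[t-1])."""
    return [psk_index(y[t] * y[t - 1].conjugate(), M) for t in range(1, len(y))]
```

As a usage example, encoding the indices `[1, 3, 0, 2, 2]` differentially for 4-PSK, scaling the block by an arbitrary complex gain, and passing it to `dd_detect` returns the same indices in the noise-free case.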
In MSDD for the downlink system, the differential transmissions are implemented in blocks, in which each user $k$ receives the sum of all the transmit waveforms of other users; then, the received signal blocks for each user must be detected independently. The measurements at the receiver are collected by spatial autocorrelation, and then we resort to the generalized likelihood ratio test (GLRT) optimization criterion, whereby the maximization of the likelihood function is performed not only over the unknown symbols but also over the unknown channels [27].
When using the $M$-ary PSK constellation, the MSDD detection problem for the estimate $\hat{\mathbf{b}}_k$ can be simplified to the maximization in (42) [13], where $\mathbf{Y}_k$ is the autocorrelation matrix of the received signal, comprised of the correlation coefficients $y_{k,l,\tau}$, $\tau = 1, \ldots, N$, $l = 0, \ldots, \tau - 1$, between the $l$th and the $\tau$th received differential signals. As we are interested in the information symbols $\mathbf{s}_k$, it can be seen that $\mathbf{s}_k$ is directly obtained as
$$\hat{s}_{k,\tau} = \hat{b}_{k,\tau}\, \hat{b}_{k,\tau-1}^{*}, \quad 1 \le \tau \le N. \tag{43}$$
In (42), the differential decoder uses one side of the complex-conjugate symmetry of the correlation coefficients, thus $y_{k,l,\tau} = y_{k,\tau,l}^{*}$. Further, the diagonal elements of $\mathbf{Y}_k$ can be neglected as they do not influence the decision metrics, i.e., $y_{k,l,l} = y_{k,\tau,\tau} = 0$.

B. DECISION FEEDBACK DIFFERENTIAL DETECTION
In order to improve the performance further, DFDD is adopted in this paper. This approach leads to better performance compared to MSDD. Different from [14], we construct the DFDD for the downlink transmission instead of the uplink. In DFDD, the decisions are made successively, adding all previous decisions in the decision of the current symbol. In this decoding algorithm, the decoder detects symbols one by one. After finding the best candidate for the first symbol, the effects of this symbol in all of the receiver equations are added and considered. Then, the second symbol is detected from the new set of equations. The effects of the first and second detected symbols are added and then considered to derive a new set of equations. The process continues until all symbols are detected. Of course, the order in which the symbols are detected will impact the end solution. The algorithm includes three steps, i.e., decision, process, and ordering.
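The successive-decision idea can be sketched with a generic decision-feedback differential detector in natural (unsorted) order. This is a textbook-style illustration and not the paper's exact decision and ordering equations: each new symbol is decided from the correlations between the current sample and all earlier samples, weighted by the earlier decisions. Function names are hypothetical.

```python
import cmath


def psk_quantize(z, M):
    """Return the M-PSK point closest in phase to z."""
    idx = round(cmath.phase(z) * M / (2 * cmath.pi)) % M
    return cmath.exp(2j * cmath.pi * idx / M)


def dfdd_detect(y, M):
    """Decision-feedback differential detection in natural order.

    y is a received block of length N + 1 whose first entry carries the
    reference symbol b_0 = 1. Returns (b_hat, s_hat).
    """
    b_hat = [1 + 0j]
    for t in range(1, len(y)):
        # Correlate y[t] with every earlier sample, re-aligned by past decisions.
        metric = sum(y[t] * y[l].conjugate() * b_hat[l] for l in range(t))
        b_hat.append(psk_quantize(metric, M))
    s_hat = [b_hat[t] * b_hat[t - 1].conjugate() for t in range(1, len(b_hat))]
    return b_hat, s_hat
```

Compared with two-symbol detection, the decision metric for the later symbols averages over many earlier samples, which is what reduces the noise on the implicit channel reference; a wrong early decision, however, propagates, which motivates the ordering step described next.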

1) DECISION PROCESS
From the description given above and starting with $b_{k,0} = 1$, the decision process means that the information symbols in (43) are detected one by one, where two operators are used that quantize the phase of a complex number $x \in \mathbb{C}$ to the $M$ phase values of $M$-ary PSK, and compute the quantization error, respectively. The operation in (47) takes as input a real number $x$ and gives as output a reduction into the interval $[\pi, 2\pi]$. The purpose of this step is to decide which transmitted symbol to detect at each stage of the decoding.

2) OPTIMUM DECISION ORDERING
It is well known from decision feedback equalization in MIMO systems, also known as BLAST [28], that sorting the decisions in an optimized order improves performance. The symbol with the lowest quantization error in (48) is the best in this step. The decision order can be achieved by reordering the columns and rows of the $\mathbf{Y}_k$ matrix. That is, we first denote the indices of the best transmitted symbols in the $\mathbf{Y}_k$ matrix by $(\hat{\tau}_0, \hat{\tau}_1, \ldots)$. Then, we denote the symbol transmitted in the $\hat{\tau}_i$th index by $b_{k,\hat{\tau}_i}$. Now, we start by setting the initial transmitted symbol to one, i.e., $b_{k,\hat{\tau}_0} = 1$. Then, the first decided symbol should be the $\hat{\tau}_1$th symbol, and the estimate for the $b_{k,\hat{\tau}_1}$ symbol is obtained from the decision process. Taking the previous decision into account, the symbol that is decided next and its value can be obtained successively in the same manner. This ordering scheme attempts to provide reliable decisions for the first decided symbols, which will impact the decisions for subsequent symbols, and thus improve performance. Further, it must be noted that the actual realizations of the channel vectors $\{\mathbf{h}_k\}$ are not needed to decode the information signals.

V. SIMULATION RESULTS AND DISCUSSION
In this section, the performance of the differential massive MIMO downlink transmission is examined. We assume the channel is modeled as quasi-static, where the block fading channel between the transmitter and receiver is constant (but unknown) during $N$ successive channel uses, i.e., the block length of the coherence time intervals. The fast fading coefficients for each user, $\tilde{\mathbf{h}}_k = [\tilde{h}_{k,1}, \cdots, \tilde{h}_{k,n_t}]^T$, are mutually independent and modeled as independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and unit variance, i.e., $\tilde{h}_{k,m} \sim \mathcal{CN}(0, 1)$. Throughout this section, we assume the following: an urban area cellular radio model for $\gamma$, one receive antenna per user, noise power $\sigma_{z_k}^2 = 0$ dB, a constellation size of 4-PSK, a transmission block length of $N = 200$, and the DFDD detection technique for differential detection. Table 1 shows the values of the PSP parameters to be used in (13) whenever needed throughout the simulation section. Note that using $\zeta$ without the subscript $k$ means that the values of $\zeta$ are equal for all users, i.e., $\zeta_1 = \cdots = \zeta_K = \zeta$. Monte Carlo simulation is used to evaluate the performance in terms of bit error rate (BER).
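A Monte Carlo experiment of this kind can be sketched for a single user as follows. The sketch is deliberately simplified and rests on several assumptions that differ from the setup above: it uses a peak-normalized Gaussian PSP, the matched PSP precoder, conventional two-symbol DD rather than DFDD, and counts symbol errors instead of bit errors. All names and parameter values are illustrative.

```python
import cmath
import math
import random


def cn(rng, var):
    """Draw one circularly symmetric complex Gaussian sample with variance var."""
    std = math.sqrt(var / 2)
    return complex(rng.gauss(0, std), rng.gauss(0, std))


def simulate_ser(n_t, m_k, zeta, power, noise_var, n_blocks, block_len, M, rng):
    """Symbol error rate of one user: Gaussian PSP, matched precoder, two-symbol DD."""
    g = [math.exp(-((m - m_k) ** 2) / (2 * zeta ** 2)) for m in range(1, n_t + 1)]
    norm = math.sqrt(sum(g))
    u = [math.sqrt(g_m) / norm for g_m in g]
    errors = 0
    for _ in range(n_blocks):
        # Effective scalar channel h^H u, constant over the block (block fading).
        c = sum((math.sqrt(g_m) * cn(rng, 1.0)).conjugate() * u_m for g_m, u_m in zip(g, u))
        b = 1 + 0j  # reference symbol b_0 = 1
        y_prev = math.sqrt(power) * c * b + cn(rng, noise_var)
        for _ in range(block_len):
            idx = rng.randrange(M)
            b *= cmath.exp(2j * cmath.pi * idx / M)  # differential encoding
            y = math.sqrt(power) * c * b + cn(rng, noise_var)
            est = round(cmath.phase(y * y_prev.conjugate()) * M / (2 * cmath.pi)) % M
            errors += est != idx
            y_prev = y
    return errors / (n_blocks * block_len)
```

Sweeping `power` for a fixed `noise_var` and plotting the returned error rate gives a curve of the same type as those discussed below; extending the sketch to several users would require adding the MAI term of the received signal model.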

A. SINGLE-USER SCENARIO
The BER performance curve is first simulated and plotted for only one user. We assume that the user's location is in front of the center of the antenna array, i.e., $m_1 = 50$. The BS has $n_t = 100$ transmit antennas. We examined this case using the three proposed precoders, i.e., the matched PSP precoder, the orthogonal PSP precoder, and the optimal PSP precoder. In addition, we compared them against the unity precoder (equal power allocation), where the precoder vector elements are all set to one, i.e., $\mathbf{u}_k = \mathbf{1}_{n_t}$, and then normalized. The channel parameter is set to $\zeta = 10$. When there is no interference, Fig. 3 shows that the proposed precoder schemes, i.e., the matched, orthogonal, and optimal precoders, outperform the one that does not perform any kind of optimization of the precoding vector, i.e., the unity precoder. Clearly, in the interference-free system, the performance of the optimal PSP precoder is slightly better than that of the other two precoders, but the difference is very small. It should be noted that in a coherent system, it is well known that the matched (to the channel) filter maximizes the SNR for the single-user case. This is valid for both conventional and massive MIMO systems. However, in a noncoherent system, the matched PSP precoder is matched only to the PSP and not to the channel itself. Therefore, the matched PSP precoder does not necessarily maximize the SNR. In the optimal PSP precoder design, the optimizer tends to allocate the power to the channels that have significant gains. In other words, as the PSP coefficients are positive, the optimized precoder (that maximizes the SNR and improves the BER) will have nonzero coefficients only at the positions corresponding to the largest coefficients of the PSP, and the rest are equal to zero.

B. MULTIPLE-USER SCENARIO
Fig. 4 shows the coefficients of the proposed precoders, i.e., the matched, orthogonal, and optimal PSP precoders, for $K = 3$ users and $n_t = 100$. The users are placed in front of the uniform array at equal distance $l_k$ from the BS but at different positions (angles) $m_1 = 20$, $m_2 = 50$, and $m_3 = 80$. Since $l_k$ is assumed equal for all users, $\zeta$ is the same for all users. The BS first uses (13) to generate the PSP, $\{g_k\}$, for each user and then uses them as the input to the three designed precoders. The generated PSP is shown for the three users (blue: user 1; red: user 2; black: user 3). We observe that with the matched PSP precoder, the precoder coefficients of the three users overlap significantly. With the orthogonal precoder, the overlap between the precoder coefficients is reduced by using the Gram-Schmidt process. With the optimal PSP precoder, the overlap between the precoder coefficients is minimized and the users are mostly separated. It is worth mentioning that if the following three conditions are satisfied, namely $n_t$ is very large, $\zeta^k$ is small, and $l_k$ is small, we have $g_k^T u_q \approx 0$ for $k \neq q$. The value of $\zeta^k$ is affected by the user's distance $l_k$ from the BS: the shorter the distance to the BS, the smaller the value of $\zeta^k$, which reduces the interference between users.
In Fig. 5, we compare the performance of the proposed PSP precoders in terms of BER. We assume $K = 3$, $m_1 = 20$, $m_2 = 50$, $m_3 = 80$, and $n_t = 100$. For any value of $\zeta$, the optimal PSP precoder outperforms the other precoders. The matched PSP precoder is not robust against interference at high BS power and thus has the worst performance. For $\zeta = 3$ for all users, the performance of the precoders is almost the same, because with such a small value of $\zeta$ the users' power profiles do not overlap and the users are well separated.
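The Gram-Schmidt step mentioned for the orthogonal precoder can be sketched as follows. This is only an illustration of the orthogonalization process applied to per-user profile vectors, not the paper's precoder design, whose exact construction is not reproduced here. The bell-shaped profile in `hypothetical_psp` and the value $\zeta = 10$ are assumptions chosen for the example, and the resulting vectors may contain negative entries.

```python
import math


def hypothetical_psp(n_t, center, zeta):
    """ASSUMPTION: bell-shaped stand-in for the PSP; (13) is not reproduced here."""
    return [math.exp(-((m - center) / zeta) ** 2) for m in range(1, n_t + 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def gram_schmidt(vectors):
    """Modified Gram-Schmidt: orthonormal vectors spanning the input vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)                      # component along an earlier vector
            w = [wi - c * bi for wi, bi in zip(w, b)]
        basis.append(normalize(w))
    return basis


def max_overlap(vectors):
    """Largest absolute inner product between any two distinct vectors."""
    return max(abs(dot(vectors[i], vectors[j]))
               for i in range(len(vectors))
               for j in range(i + 1, len(vectors)))


n_t, zeta = 100, 10                 # zeta = 10 is an assumed value for this example
positions = [20, 50, 80]            # m_1, m_2, m_3 from the text
psp = [hypothetical_psp(n_t, m_k, zeta) for m_k in positions]

matched = [normalize(g) for g in psp]      # matched: proportional to each PSP
orthogonal = gram_schmidt(psp)             # orthogonalized profiles

matched_overlap = max_overlap(matched)
orthogonal_overlap = max_overlap(orthogonal)
```

In this sketch the matched vectors keep a small but nonzero mutual overlap, while the orthogonalized vectors have an overlap at the level of floating-point rounding.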
Fig. 5 also shows that, in the presence of interference between users, power profile parameters such as $\zeta$ affect the performance of the matched, orthogonal, and optimal precoders. Note that when we increase the channel variance for all users from $\zeta = 3$ to $\zeta = 5$ and then to $\zeta = 10$, the power profiles of the users overlap significantly, causing a degradation in system performance. Hence, for large orthogonality between users' channels (a small value of the channel parameter $\zeta$), the performance of the matched precoder design is close to that of the optimal design. The larger the orthogonality, the closer the performance.
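The dependence of the inter-user overlap on $\zeta$ can be made tangible with a small numerical sketch. It computes the average interference term $\sum_m g_{k,m} u_{q,m}^2$ that the middle user would see from the other two users' matched precoders, for $\zeta = 3$, $5$, and $10$. The bell-shaped profile in `hypothetical_psp` is an assumption for illustration only and is not the PSP model of (13), so the numbers indicate the trend and not the values behind Fig. 5.

```python
import math


def hypothetical_psp(n_t, center, zeta):
    """ASSUMPTION: bell-shaped stand-in for the PSP; (13) is not reproduced here."""
    return [math.exp(-((m - center) / zeta) ** 2) for m in range(1, n_t + 1)]


def matched_precoder(g):
    """Unit-norm precoder proportional to the PSP vector."""
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def leakage(g_k, u_q):
    """Average interference power sum_m g_{k,m} u_{q,m}^2 from stream q at user k."""
    return sum(gm * um * um for gm, um in zip(g_k, u_q))


n_t = 100
positions = [20, 50, 80]   # m_1, m_2, m_3 from the text
victim = 1                 # index of the middle user (m_2 = 50)

total_leakage = {}
for zeta in (3, 5, 10):
    psp = [hypothetical_psp(n_t, m_k, zeta) for m_k in positions]
    precoders = [matched_precoder(g) for g in psp]
    total_leakage[zeta] = sum(
        leakage(psp[victim], precoders[q])
        for q in range(len(positions)) if q != victim
    )
```

Under the assumed profile the leakage grows by many orders of magnitude between $\zeta = 3$ and $\zeta = 10$, which matches the qualitative statement that wider power profiles degrade the matched precoder most.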
In Fig. 6, we investigate the impact of increasing the number of users on the system performance in terms of BER using the three proposed precoders. We consider $n_t = 100$, $\zeta = 5$, and $K = 2$, $K = 4$, and $K = 5$. For $K = 2$, the positions are set to [25 75]; for $K = 4$, the positions are set to [20 40 60 80]; and for $K = 5$, the positions are set to [20 35 50 65 80]. It is shown that differential massive MIMO systems with fewer users outperform those with a large number of users. However, using an optimal PSP precoder with an appropriate choice of $n_t$ and/or $\zeta$ can minimize the overlap between users and thereby reduce the loss of performance.
In Fig. 7, we examine the influence of increasing the number of transmit antennas, from $n_t = 100$ to $n_t = 200$, on the system performance using the optimal PSP precoder. Three users, $K = 3$, are placed in front of the uniform array at positions [20 50 80] and at different distances $l_k$ from the BS, which yields $\zeta = 5$, $10$, and $15$ for users 1, 2, and 3, respectively. From Fig. 7, it can be seen that differential massive MIMO systems with a higher number of transmit antennas outperform those with a lower number of antennas. Therefore, as $n_t \to \infty$, the degree of orthogonality between users becomes large, which can minimize the interference between users and improve the overall performance of the system. The larger the number of transmit antennas, the better the performance.
In Fig. 8 and Fig. 9, we show the computational complexity of the system. In Fig. 8, we first set the number of users to $K = 6$ and increase the number of transmit antennas $n_t$. Similarly, in Fig. 9, the number of transmit antennas is fixed at $n_t = 100$ while the number of users in the system increases gradually. From both figures, the computational complexity of the suboptimal PSP precoders is higher than that of the optimal PSP precoder. We also observe that varying the number of transmit antennas at the BS has a higher impact on the complexity than varying the number of users. Therefore, the optimal PSP precoder yields a low-complexity scheme while providing good performance.

FIGURE 8. Comparison of the computational complexity for suboptimal PSP precoders and optimal PSP precoder with K = 6 and = 0.5.

FIGURE 9. Comparison of the computational complexity for suboptimal PSP precoders and optimal PSP precoder with n_t = 100 and = 0.5.

VI. CONCLUSION
This paper proposed three precoding schemes, namely the matched, orthogonal, and optimal PSP precoders, for downlink transmission in massive MIMO systems with differential encoding and detection. With a large number of transmit antennas at the BS and full knowledge of the PSP, the proposed low-complexity downlink precoding techniques allow the MAI between users to be eliminated. In a multiuser scenario, the optimal PSP precoder can effectively separate the data streams of different users, thus enhancing the system performance. At the receiver, the DFDD technique is used to detect the differential information signals.
Simulations show that the proposed schemes are effective precoding techniques for a massive MIMO system in a scenario where the channel is unknown at both the transmitter and the receiver.
For future work, it is of interest to investigate the following: the estimation of $g_{k,m}$ in real wireless communication systems; the capacity of noncoherent massive MIMO systems compared with that of coherent massive MIMO systems; a Rician fading channel, which is more suitable for the case of small cells; and finally the case of correlated transmitted symbols.

APPENDIX I. PROOFS OF THEOREM 1
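The expected value of $\mathrm{SINR}_k$ at the $k$th user can be expressed as follows

$$E\{\mathrm{SINR}_k\} = E\left\{ \frac{p_k |h_k^H u_k|^2}{\sum_{q=1, q \neq k}^{K} p_q |h_k^H u_q|^2 + \sigma_{z_k}^2} \right\}. \tag{54}$$

To calculate the expected value of the norm $|h_k^H u_k|^2$, we first expand it as follows

$$|h_k^H u_k|^2 = \sum_{m=1}^{n_t} |\tilde{h}_{k,m}|^2 g_{k,m} u_{k,m}^2 + \sum_{I} \tilde{h}_{k,i}^{*} \tilde{h}_{k,j} \sqrt{g_{k,i} g_{k,j}}\, u_{k,i} u_{k,j}, \tag{55}$$

where $I = \{\{k,i\}_i \times \{k,j\}_j \mid \{k,i\} \neq \{k,j\}\}$. Since $E\{|\tilde{h}_{k,m}|^2\} = 1$ and $E\{\tilde{h}_{k,i}^{*} \tilde{h}_{k,j}\} = 0$ for $i \neq j$ are always true (note that the expectation is over the fast-fading randomness), we have

$$E\{|h_k^H u_k|^2\} = \sum_{m=1}^{n_t} g_{k,m} u_{k,m}^2. \tag{56}$$

By using the result of (56) in (54), we have

$$E\{\mathrm{SINR}_k\} = p_k \sum_{m=1}^{n_t} g_{k,m} u_{k,m}^2 \; E\left\{ \frac{1}{\sum_{q=1, q \neq k}^{K} p_q |h_k^H u_q|^2 + \sigma_{z_k}^2} \right\}. \tag{57}$$

Now, we consider the expectation on the right-hand side of (57). By using the Taylor series expansion, we can write this expectation as [29]

$$E\left\{ \frac{X_k}{Y_k} \right\} \approx \frac{E\{X_k\}}{E\{Y_k\}} - \frac{\mathrm{cov}(X_k, Y_k)}{(E\{Y_k\})^2} + \frac{\mathrm{var}(Y_k)\, E\{X_k\}}{(E\{Y_k\})^3}, \tag{58}$$

where $X_k = 1$ and $Y_k = \sum_{q=1, q \neq k}^{K} p_q |h_k^H u_q|^2 + \sigma_{z_k}^2$. Now, we calculate the values of $E\{Y_k\}$ and $\mathrm{var}(Y_k)$. Following a calculation similar to that used to obtain (55) and (56), we have $E\{|h_k^H u_q|^2\} = \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2$. Therefore

$$E\{Y_k\} = \sum_{q=1, q \neq k}^{K} p_q \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2 + \sigma_{z_k}^2. \tag{59}$$

On the other hand, we have

$$\mathrm{var}(Y_k) = \sum_{q=1, q \neq k}^{K} p_q^2\, E\left\{ \left| \sum_{I} \tilde{h}_{k,i}^{*} \tilde{h}_{k,j} \sqrt{g_{k,i} g_{k,j}}\, u_{q,i} u_{q,j} \right|^2 \right\} = \sum_{q=1, q \neq k}^{K} p_q^2 \sum_{I} g_{k,i} g_{k,j} u_{q,i}^2 u_{q,j}^2. \tag{60}$$

Based on (59), (60), and the orthogonality between $g_k$ and $u_q$ as $n_t \to \infty$ for $k \neq q$, the inequality $(E\{Y_k\})^2 \gg \mathrm{var}(Y_k)$ is true. Further, $\mathrm{cov}(X_k, Y_k) = 0$. Applying these results to the series expansion in (58), we get

$$E\left\{ \frac{1}{Y_k} \right\} \approx \frac{1}{\sum_{q=1, q \neq k}^{K} p_q \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2 + \sigma_{z_k}^2}. \tag{61}$$

Substituting (61) into (57), we obtain

$$E\{\mathrm{SINR}_k\} \approx \frac{p_k \sum_{m=1}^{n_t} g_{k,m} u_{k,m}^2}{\sum_{q=1, q \neq k}^{K} p_q \sum_{m=1}^{n_t} g_{k,m} u_{q,m}^2 + \sigma_{z_k}^2}. \tag{62}$$

This concludes the proof.

FIGURE 2. Geometric system model of user's location.

TABLE 1. Values of PSP parameters to be used in (13).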