Double-Sided Massive MIMO Transceivers for MmWave Communications

We propose practical transceiver structures for double-sided massive multiple-input multiple-output (MIMO) systems. Unlike in standard massive MIMO, both the transmit and receive sides are equipped with high-dimensional antenna arrays. We leverage the multi-layer filtering architecture and propose novel layered transceiver schemes with practical channel state information requirements to reduce the complexity of the double-sided massive MIMO system. We conduct a comprehensive simulation campaign to investigate the performance of the proposed transceivers under different channel propagation conditions and to identify the most suitable strategy. Our results show that the covariance matrix eigenfilter design at the outer transceiver layer, combined with maximum eigenmode transmission precoding and minimum mean square error combining at the inner transceiver layer, yields the best achievable sum rate performance across different propagation conditions and multi-user interference levels.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) is one of the key technologies of modern mobile communication systems [1]-[3]. It consists of employing a large number of antennas at the base station (BS) to provide a significant beamforming gain and to serve several users simultaneously. The canonical massive MIMO model [4] considers time division duplex (TDD) operation at sub-6 GHz frequencies, which allows for relatively simple channel state information (CSI) acquisition. The ever-increasing demand for system capacity, together with the need to operate in more general scenarios, calls for novel massive MIMO extensions. For example, research efforts are developing massive MIMO techniques for frequency division duplex (FDD) [5], cell-free systems [6], large intelligent surface aided MIMO [7], and millimeter-wave (mmWave) systems [8]-[10].
MmWave massive MIMO has attracted much interest due to the large available bandwidth and less strict regulation [8]. These features are crucial for novel application scenarios such as wireless backhauling [11]-[13] and vehicle-to-vehicle communications [14]. However, mmWave systems face many propagation challenges such as atmospheric attenuation, strong free-space loss, and material absorption [8]. Massive MIMO has been proposed to compensate for these issues with large beamforming gains. Most works, however, only consider users with a small number of antennas relative to the BS. Double-sided massive MIMO refers to the scenario wherein both the BS and the user equipment (UE) employ large antenna arrays. This extension is even better suited than standard massive MIMO to mmWave frequencies, since it offers a larger beamforming gain to offset the severe signal propagation losses. Implementing this double-sided scenario in classical BS-smartphone links may not be realistic due to the physical constraints of the latter. However, many application scenarios may strongly benefit from this technology, including MIMO heterogeneous networks with wireless backhauling [15], terahertz communication systems [16]-[18], and mmWave unmanned aerial vehicle communications [19].
Low-complexity transceivers for double-sided massive MIMO systems were first investigated in [20]. The authors evaluated the effect of spatial antenna correlation on system performance. To this end, they adopted the Kronecker correlation model and evaluated the system performance assuming linear transceiver schemes and perfect CSI. They found that the impact of antenna correlation on performance strongly depends on the transceiver architecture. Specifically, zero-forcing (ZF) precoding and maximum eigenmode reception (MER) proved robust against strong antenna correlation, provided that the number of served users is not as large as the number of BS antennas. Hybrid analog/digital (A/D) and fully-digital double-sided massive MIMO transceivers were investigated in [21], where partial ZF (PZF) and channel matching were proposed for both hybrid A/D and fully-digital strategies. However, [21] does not discuss whether the proposed transceiver architectures have practical CSI requirements. The transceiver strategies of [20] and [21] rely on perfect knowledge of the channel matrices of all users. Since these matrices are very large under the double-sided massive MIMO assumption, the overhead of channel estimation and feedback may become prohibitive.
A potential solution to the complexity of double-sided massive MIMO systems is multi-layer filtering [22]-[24]. In this method, the filter matrix is decomposed into a product of lower-dimensional filter matrices, wherein each matrix (layer) is designed to perform a single filtering task. The main motivation behind this idea is to enable efficient, low-complexity filtering in massive MIMO, which is challenging due to the large number of antennas. An attractive advantage of the multi-layer strategy is that, by decoupling the filter design problem across layers, one can formulate simple subproblems, which may be less computationally expensive than optimizing a single large-dimensional filter matrix. Another appealing feature is the successive dimensionality reduction: each layer is associated with an effective channel matrix whose dimensions are smaller than those of the original channel, so the channel training overhead for these layers is reduced [24]. The layered architecture also allows each layer to be designed under different CSI requirements. For example, in a two-layer approach, the first layer may depend on second-order channel statistics, while the second layer is based on instantaneous knowledge of the low-dimensional effective channel formed by the cascade of the first-layer filters and the physical channel.
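To make the dimensionality reduction concrete, the following NumPy sketch compresses a 64 x 64 channel into a 4 x 4 effective channel $\mathbf{W}_o^H\mathbf{H}\mathbf{F}_o$. The random channel and the random orthonormal outer filters are placeholders assumed only for illustration; they are not any of the outer-layer designs discussed in this paper.

```python
import numpy as np

def effective_channel(H, W_o, F_o):
    """Compress an (N_r x N_t) channel into an (M_r x M_t) effective channel W_o^H H F_o."""
    return W_o.conj().T @ H @ F_o

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
N_r, N_t, M_r, M_t = 64, 64, 4, 4               # illustrative sizes (assumed)
H = crandn(rng, N_r, N_t)                       # placeholder full channel
W_o, _ = np.linalg.qr(crandn(rng, N_r, M_r))    # placeholder outer combiner (orthonormal columns)
F_o, _ = np.linalg.qr(crandn(rng, N_t, M_t))    # placeholder outer precoder (orthonormal columns)

H_eff = effective_channel(H, W_o, F_o)
print(H.size, H_eff.size)   # 4096 16: coefficients to train for the full vs. the effective channel
```

The inner layer only needs the 16 entries of the effective channel instead of the 4096 entries of the physical channel, which is the source of the reduced training overhead.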
In [22], a two-layer joint spatial division and multiplexing (JSDM) filter is presented. The first layer consists of a pre-beamforming stage that groups UEs with similar covariance eigenspaces, while the second layer manages multi-user interference. In [23], we presented a two-layer equalizer scheme for a single-user multi-stream massive MIMO system. The first layer consists of a spatial ZF equalizer and the second layer is a low-dimensional minimum mean square error (MMSE) filter applied to the effective channel. We showed that this layered filtering approach is less complex than the standard MMSE equalizer, since it decouples the filtering operation into simpler operations. In [25], a novel Grassmannian product codebook scheme was proposed for limited-feedback FDD massive MIMO systems with two-layer precoding filters, and analytical asymptotic approximations of the achievable transmission rate were obtained for the imperfect CSI scenario. In [24], the two-layer idea is generalized to three layers: the first layer cancels inter-cell interference, the second layer increases the desired signal power, and the third layer mitigates intra-cell interference. The multi-layer framework of [24] thus generalizes JSDM to also suppress inter-cell interference. The multi-layer strategy was recently applied to a cloud radio access network using full-dimension MIMO in [26], and novel precoding schemes combined with the multi-layer strategy were presented in [27].
The main contributions of the present work are:
• We propose low-complexity multi-layer double-sided massive MIMO transceivers with practical CSI requirements;
• We provide a novel outer layer filter design method based on partial CSI knowledge, herein referred to as semi-orthogonal path selection;
• We conduct a comprehensive simulation-based study of several double-sided massive MIMO transceivers, including the proposed ones, together with benchmark simulations that highlight the advantages of the proposed methods;
• We discuss the applicability of the presented methods for different mmWave channel setups and indicate the propagation conditions under which multiple data stream transmission per UE is feasible.
We provide the signal, system and channel models as well as details on CSI acquisition in Section II. We introduce our transceiver schemes and discuss their computational complexity in Section III. We present our simulation results and discussions in Section IV and we conclude our paper in Section V.
Notation: Vectors and matrices are written as lowercase and uppercase boldface letters, respectively, e.g., $\mathbf{x}$ and $\mathbf{X}$. The $(i,j)$-th entry of $\mathbf{X}$ is written as $[\mathbf{X}]_{i,j}$. The transpose and the conjugate transpose (Hermitian) of $\mathbf{X}$ are represented by $\mathbf{X}^T$ and $\mathbf{X}^H$, respectively. The $N$-dimensional identity matrix is represented by $\mathbf{I}_N$ and the $(M \times N)$-dimensional null matrix by $\mathbf{0}_{M\times N}$. The imaginary unit is denoted by $\jmath = \sqrt{-1}$. The Euclidean norm, the Frobenius norm, the matrix trace, the determinant, and the statistical expectation are denoted by $\|\cdot\|_2$, $\|\cdot\|_F$, $\mathrm{Tr}(\cdot)$, $\det(\cdot)$, and $\mathbb{E}[\cdot]$, respectively. The $\mathrm{Diag}(\cdot)$ operator transforms an input vector into a diagonal matrix and $\mathrm{Blkdiag}(\cdot)$ forms a block-diagonal matrix from its matrix inputs. The operator $\mathrm{rank}(\cdot)$ denotes the rank of its matrix argument, $\mathrm{span}(\cdot)$ refers to the space spanned by the argument vectors, and $\#(\cdot)$ denotes the cardinality of its set argument. The uniform distribution from $a$ to $b$ is denoted $\mathcal{U}(a,b)$. The complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is written as $\mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. $\mathcal{O}(\cdot)$ stands for the Big-O complexity notation and $\overset{!}{=}$ denotes equality by construction.

II. SYSTEM MODEL
Let us consider the single-cell multi-user MIMO system depicted in Figure 1. Assuming downlink operation, a single base station equipped with $N_t$ antennas communicates with $U$ UEs, each having $N_r$ antennas. We assume the double-sided massive scenario, i.e., the BS and the UEs are equipped with a large number ($\geq 64$) of antennas. We consider multi-stream transmission: the BS sends $N_s$ data streams in parallel to each UE. To this end, the BS employs linear precoding filters $\mathbf{F}_u \in \mathbb{C}^{N_t \times N_s}$, $u \in \{1, \ldots, U\}$, to map the $N_s$ data streams intended for UE $u$ onto the $N_t$ BS antennas. Then, UE $u$ applies the combining filter $\mathbf{W}_u \in \mathbb{C}^{N_r \times N_s}$ to the signals received at its $N_r$ antennas to estimate its $N_s$ data streams.
Assuming narrow-band block fading, the input-output relationship of our system model can be written as
$$\hat{\mathbf{s}}_u = \mathbf{W}_u^H \mathbf{H}_u \sum_{j=1}^{U} \mathbf{F}_j \mathbf{s}_j + \mathbf{W}_u^H \mathbf{b}_u, \tag{1}$$
where $\mathbf{H}_u \in \mathbb{C}^{N_r \times N_t}$ denotes the downlink channel matrix between the BS and UE $u$, $\mathbf{s}_u \in \mathbb{C}^{N_s}$ the data symbols intended for UE $u$, and $\mathbf{b}_u \in \mathbb{C}^{N_r}$ the noise vector. We assume that $\mathbf{R}_{s,u} = \mathbb{E}[\mathbf{s}_u\mathbf{s}_u^H] = (1/N_s)\mathbf{I}_{N_s}$ and $\mathbf{b}_u \sim \mathcal{CN}(\mathbf{0}_{N_r\times 1}, \sigma_n^2\mathbf{I}_{N_r})$ for all $u \in \{1,\ldots,U\}$. The total transmit power of the BS is denoted by $P_t$ and the system signal-to-noise ratio is defined as $\mathrm{SNR} = P_t/\sigma_n^2$. The achievable sum rate could be improved by optimizing the power allocation; however, for simplicity of analysis, we assume equal power allocation among users. The precoding matrices are thus designed to satisfy the power constraint $\|\mathbf{F}_u\|_F^2 = P_t/U$.

A. CHANNEL MODEL
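We model double-sided massive MIMO channels using the narrow-band clustered channel model with $L$ paths [28]-[30]. The downlink channel matrix $\mathbf{H}_u \in \mathbb{C}^{N_r\times N_t}$ between the BS and UE $u$ can be expressed as
$$\mathbf{H}_u = \sum_{\ell=1}^{L} \alpha_{\ell,u}\,\mathbf{a}_{r,\ell,u}\,\mathbf{a}_{t,\ell,u}^H, \tag{2}$$
where $\alpha_{\ell,u}$ denotes the complex channel gain of path $\ell$, and $\mathbf{a}_{t,\ell,u} \in \mathbb{C}^{N_t}$ and $\mathbf{a}_{r,\ell,u} \in \mathbb{C}^{N_r}$ the transmit and receive array response vectors evaluated at the azimuth $\{\phi_\ell^{(t,u)}, \phi_\ell^{(r,u)}\}$ and elevation $\{\theta_\ell^{(t,u)}, \theta_\ell^{(r,u)}\}$ angle pairs, respectively. The departure and arrival angles are drawn from continuous distributions that depend on the environment. To simplify the analysis, we assume that all paths are statistically independent and that the number of paths $L$ is the same for all BS-UE links; this can be achieved by selecting the $L$ strongest paths of each link. We model the complex channel gains $\alpha_{\ell,u}$ as independent and identically distributed (i.i.d.) circularly symmetric Gaussian random variables with zero mean and variance $\sigma_\alpha^2$. At mmWave bands, the number of paths $L$ is typically much smaller than the numbers of antennas $N_t$ and $N_r$ at the BS and UE, respectively [8]. Using matrix notation, (2) can be rewritten as
$$\mathbf{H}_u = \mathbf{A}_{r,u}\,\mathrm{Diag}(\boldsymbol{\alpha}_u)\,\mathbf{A}_{t,u}^H, \tag{3}$$
where $\mathbf{A}_{t,u} = [\mathbf{a}_{t,1,u}, \ldots, \mathbf{a}_{t,L,u}] \in \mathbb{C}^{N_t\times L}$, $\mathbf{A}_{r,u} = [\mathbf{a}_{r,1,u}, \ldots, \mathbf{a}_{r,L,u}] \in \mathbb{C}^{N_r\times L}$, and $\boldsymbol{\alpha}_u = [\alpha_{1,u}, \ldots, \alpha_{L,u}]^T$. The rank of $\mathbf{H}_u$ depends on the angular distribution of the paths. For example, if the angles are drawn independently from a uniform distribution and $L \leq \min(N_t, N_r)$, then $\mathrm{rank}(\mathbf{H}_u) = L$ with probability 1.
VOLUME 4, 2016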
In our simulations, we consider uniform linear arrays (ULAs) at both the transmit and receive sides without loss of generality; any array geometry compatible with (2) is valid for this work. The considered ULAs consist of omni-directional antennas with inter-antenna spacing $d = \lambda/2$, where $\lambda$ denotes the carrier wavelength. Therefore, the array response vectors are written as
$$\mathbf{a}_x(\phi) = \big[1,\ e^{\jmath\pi\sin\phi},\ \ldots,\ e^{\jmath\pi(N_x-1)\sin\phi}\big]^T, \tag{4}$$
for $x \in \{t, r\}$ and $\phi \in (-\pi, \pi)$.
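The following NumPy sketch builds one channel realization from the matrix form $\mathbf{H}_u = \mathbf{A}_{r,u}\mathrm{Diag}(\boldsymbol{\alpha}_u)\mathbf{A}_{t,u}^H$ and checks the rank statement above. It assumes a half-wavelength ULA whose $n$-th element has phase $\pi n \sin\phi$ (our assumption for illustration), uniform azimuths, and $\sigma_\alpha^2 = 1$; the helper names are hypothetical.

```python
import numpy as np

def ula_response(n_ant, phi):
    """lambda/2-spaced ULA response [1, e^{j pi sin(phi)}, ...]^T (assumed convention)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(phi))

def clustered_channel(n_r, n_t, phi_r, phi_t, alpha):
    """Matrix form of the L-path model: H = A_r Diag(alpha) A_t^H."""
    A_r = np.stack([ula_response(n_r, p) for p in phi_r], axis=1)   # N_r x L
    A_t = np.stack([ula_response(n_t, p) for p in phi_t], axis=1)   # N_t x L
    return A_r @ np.diag(alpha) @ A_t.conj().T

rng = np.random.default_rng(1)
L, N_r, N_t, sigma2_alpha = 8, 64, 64, 1.0
phi_r = rng.uniform(-np.pi, np.pi, L)            # arrival azimuths
phi_t = rng.uniform(-np.pi, np.pi, L)            # departure azimuths
alpha = np.sqrt(sigma2_alpha / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
H = clustered_channel(N_r, N_t, phi_r, phi_t, alpha)
print(H.shape, np.linalg.matrix_rank(H))         # rank equals L with probability 1
```

Even though $\mathbf{H}_u$ is 64 x 64, it has only $L = 8$ nonzero singular values, which is why the number of paths limits the number of streams that can be multiplexed.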

B. LAYERED TRANSCEIVER ARCHITECTURE
We consider the layered filtering architecture proposed in [24] to tackle the large dimensionality of double-sided massive MIMO systems. This filtering scheme factorizes the filter matrix into outer and inner filter matrices. The former forms a low-dimensional effective MIMO channel, while the latter implements the precoding or combining operation. The precoding filter matrix $\mathbf{F}_u$ is thus decomposed into an outer factor $\mathbf{F}_{o,u} \in \mathbb{C}^{N_t\times M_t}$ and an inner factor $\mathbf{F}_{i,u} \in \mathbb{C}^{M_t\times N_s}$, i.e., $\mathbf{F}_u = \gamma_u\mathbf{F}_{o,u}\mathbf{F}_{i,u}$, and the combining filter is decomposed as $\mathbf{W}_u = \mathbf{W}_{o,u}\mathbf{W}_{i,u}$, with $\mathbf{W}_{o,u}\in\mathbb{C}^{N_r\times M_r}$ and $\mathbf{W}_{i,u}\in\mathbb{C}^{M_r\times N_s}$, where $M_t \leq N_t$ and $M_r \leq N_r$. We define the normalization factor $\gamma_u = \sqrt{P_t/U}\,/\,\|\mathbf{F}_{o,u}\mathbf{F}_{i,u}\|_F$, so that the power constraint $\|\mathbf{F}_u\|_F^2 = P_t/U$ is satisfied.

Regarding the hardware implementation, the multi-layer scheme can be implemented with both fully-digital and hybrid A/D strategies [22], [24]. In the former, the precoder $\mathbf{F}_u = \gamma_u\mathbf{F}_{o,u}\mathbf{F}_{i,u}$ and the combiner $\mathbf{W}_u = \mathbf{W}_{o,u}\mathbf{W}_{i,u}$ are implemented entirely in baseband. In the latter, the outer filters $\mathbf{F}_{o,u}$ and $\mathbf{W}_{o,u}$ are implemented in the analog domain and the inner filters $\gamma_u\mathbf{F}_{i,u}$ and $\mathbf{W}_{i,u}$ in baseband. In the hybrid A/D strategy, the outer filters are constrained by the analog hardware, for example, through an elementwise constant-modulus restriction [9], [28]. This constraint can be avoided by using two analog phase shifters per beamforming coefficient, as described in [31]. Such hardware constraints do not arise when the transceiver filters are implemented entirely in baseband, as in the fully-digital architecture. Nevertheless, the transceiver design should also account for other hardware-related constraints, such as total or per-antenna power constraints and the peak-to-average power ratio.
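A minimal sketch of the layered precoder $\mathbf{F}_u = \gamma_u\mathbf{F}_{o,u}\mathbf{F}_{i,u}$ follows, showing how the scalar $\gamma_u$ enforces the per-user power budget $P_t/U$ regardless of the outer and inner factors. The random factors and the parameter values are illustrative assumptions, and `layered_precoder` is a hypothetical helper.

```python
import numpy as np

def layered_precoder(F_o, F_i, P_t, U):
    """F_u = gamma_u F_o F_i, with gamma_u chosen so that ||F_u||_F^2 = P_t / U."""
    F_oi = F_o @ F_i
    gamma = np.sqrt(P_t / U) / np.linalg.norm(F_oi, "fro")
    return gamma * F_oi, gamma

rng = np.random.default_rng(2)
N_t, M_t, N_s, U, P_t = 64, 8, 2, 4, 1.0   # illustrative values (assumed)
F_o = rng.standard_normal((N_t, M_t)) + 1j * rng.standard_normal((N_t, M_t))
F_i = rng.standard_normal((M_t, N_s)) + 1j * rng.standard_normal((M_t, N_s))
F_u, gamma_u = layered_precoder(F_o, F_i, P_t, U)
print(np.linalg.norm(F_u, "fro") ** 2)       # equals P_t / U = 0.25 up to rounding
```

Because $\gamma_u$ is a scalar applied in baseband, the outer and inner factors can be designed without worrying about the power budget.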
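Let us define the effective channel matrices
$$\mathbf{H}_{\text{eff},u,j} = \mathbf{W}_{o,u}^H\mathbf{H}_u\mathbf{F}_{o,j} \in \mathbb{C}^{M_r\times M_t}, \qquad \mathbf{H}_{\text{eff},u} = \mathbf{H}_{\text{eff},u,u}, \tag{5}$$
for $u, j \in \{1,\ldots,U\}$. We also define the effective outer-layer noise vector $\tilde{\mathbf{b}}_u = \mathbf{W}_{o,u}^H\mathbf{b}_u \in \mathbb{C}^{M_r}$. For future convenience, let us rewrite (1) in terms of the effective channels and inner layer filters:
$$\mathbf{y}_u = \gamma_u\mathbf{H}_{\text{eff},u}\mathbf{F}_{i,u}\mathbf{s}_u + \sum_{j\neq u}\gamma_j\mathbf{H}_{\text{eff},u,j}\mathbf{F}_{i,j}\mathbf{s}_j + \tilde{\mathbf{b}}_u, \qquad \hat{\mathbf{s}}_u = \mathbf{W}_{i,u}^H\mathbf{y}_u. \tag{6}$$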
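As a sanity check of the layered rewriting, the NumPy sketch below computes the estimate of UE $u$ twice: once with the full-dimensional model $\hat{\mathbf{s}}_u = \mathbf{W}_u^H\mathbf{H}_u\sum_j\mathbf{F}_j\mathbf{s}_j + \mathbf{W}_u^H\mathbf{b}_u$, and once through the effective channels $\mathbf{H}_{\text{eff},u,j} = \mathbf{W}_{o,u}^H\mathbf{H}_u\mathbf{F}_{o,j}$ and inner filters. All matrices are random placeholders (an assumption); only the algebraic equivalence is being illustrated.

```python
import numpy as np

rng = np.random.default_rng(3)

def crandn(*shape):
    """Complex Gaussian placeholder array."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

U, N_r, N_t, M_r, M_t, N_s = 3, 16, 16, 4, 4, 2   # small illustrative sizes (assumed)
H = [crandn(N_r, N_t) for _ in range(U)]           # H_u
F_o = [crandn(N_t, M_t) for _ in range(U)]         # outer precoders F_o,u
F_i = [crandn(M_t, N_s) for _ in range(U)]         # inner precoders F_i,u
W_o = [crandn(N_r, M_r) for _ in range(U)]         # outer combiners W_o,u
W_i = [crandn(M_r, N_s) for _ in range(U)]         # inner combiners W_i,u
gamma = rng.uniform(0.5, 1.5, U)                   # normalization factors gamma_u
s = [crandn(N_s) for _ in range(U)]                # data symbols s_u
b = [crandn(N_r) for _ in range(U)]                # noise b_u

u = 0
# Full-dimensional model with F_j = gamma_j F_o,j F_i,j and W_u = W_o,u W_i,u
W_u = W_o[u] @ W_i[u]
x = sum(gamma[j] * F_o[j] @ F_i[j] @ s[j] for j in range(U))
s_hat_full = W_u.conj().T @ (H[u] @ x + b[u])

# Layered model: effective channels, then the inner combiner
H_eff = [W_o[u].conj().T @ H[u] @ F_o[j] for j in range(U)]
y_u = sum(gamma[j] * H_eff[j] @ F_i[j] @ s[j] for j in range(U)) + W_o[u].conj().T @ b[u]
s_hat_layered = W_i[u].conj().T @ y_u

print(np.allclose(s_hat_full, s_hat_layered))      # True
```

The inner layer therefore only ever sees $M_r$-dimensional signals and $M_r \times M_t$ channels, even though the physical link is $N_r \times N_t$.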

C. CHANNEL STATE INFORMATION ACQUISITION
We assume that our double-sided massive MIMO system operates in perfectly synchronized TDD mode. The CSI acquisition is divided into two stages. First, the CSI necessary to compute the outer layer filters is obtained. We consider the following acquisition scenarios for the outer layer CSI:
• Statistical CSI - The BS and the UE estimate $\mathbf{C}_{\text{ul},u} = \mathbb{E}[\mathbf{H}_u^H\mathbf{H}_u]$ and $\mathbf{C}_{\text{dl},u} = \mathbb{E}[\mathbf{H}_u\mathbf{H}_u^H]$, respectively, over several time slots. Subspace estimation [32] or compressive sensing-based approaches [33] can be used to estimate the statistical CSI;
• Partial CSI - Both the BS and the UE have perfect knowledge of the macroscopic channel parameters: the path powers $|\alpha_{\ell,u}|^2$ and the azimuth angles $\phi_\ell^{(t,u)}$ and $\phi_\ell^{(r,u)}$. Channel estimation methods that exploit the mmWave channel sparsity can be used to obtain the partial CSI [34], [35].
We emphasize that the outer layer filters depend only on macroscopic CSI (path powers and angular directions). The statistical CSI depends only on the path powers and on the angles (via the array response vectors); it does not depend on microscopic channel variations (the phase shifts of the individual multipath components), which are averaged out by the statistical expectation in $\mathbf{C}_{\text{ul},u}$ and $\mathbf{C}_{\text{dl},u}$.
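The claim that statistical CSI is purely macroscopic can be checked numerically. For i.i.d. zero-mean path gains and an unnormalized receive array with $\|\mathbf{a}_r\|_2^2 = N_r$, the model above gives $\mathbb{E}[\mathbf{H}^H\mathbf{H}] = \sigma_\alpha^2 N_r\,\mathbf{A}_t\mathbf{A}_t^H$, which contains only angles and path power. The sketch below, assuming fixed illustrative angles and the same ULA phase convention as before, compares this closed form with a sample average over $T$ slots in which only the gains change.

```python
import numpy as np

def ula(n, phi):
    """lambda/2 ULA response (assumed phase convention pi * n * sin(phi))."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(phi))

rng = np.random.default_rng(4)
N_r, N_t, sigma2_alpha, T = 32, 32, 1.0, 5000
phi_r = np.array([-0.9, -0.3, 0.2, 0.8])   # fixed macroscopic angles (assumed)
phi_t = np.array([-0.7, -0.1, 0.4, 1.1])
A_r = np.stack([ula(N_r, p) for p in phi_r], axis=1)
A_t = np.stack([ula(N_t, p) for p in phi_t], axis=1)
L = len(phi_t)

# Sample estimate of C_ul = E[H^H H]: only the microscopic gains alpha change across slots.
C_ul_hat = np.zeros((N_t, N_t), dtype=complex)
for _ in range(T):
    alpha = np.sqrt(sigma2_alpha / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
    H = A_r @ np.diag(alpha) @ A_t.conj().T
    C_ul_hat += H.conj().T @ H / T

# Closed form: the random phases average out, leaving angles and path power only.
C_ul = sigma2_alpha * N_r * A_t @ A_t.conj().T
rel_err = np.linalg.norm(C_ul_hat - C_ul) / np.linalg.norm(C_ul)
print(rel_err)   # small, and shrinking roughly like 1 / sqrt(T)
```

With only 100 slots, as in the simulations of Section IV, the estimation error is larger, but the dominant eigenvectors used by the outer layer are typically less sensitive than the full matrix.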
The second CSI acquisition stage consists of estimating the inner layer effective channels $\mathbf{H}_{\text{eff},u,j}$ (inner layer CSI). This task is inexpensive owing to the low dimensions of the effective channel matrices, and it can be performed efficiently by well-known MMSE estimators [4] and CSI feedback methods [25], [36], [37] without much overhead. Therefore, for simplicity of analysis, we assume that both the BS and the UEs have perfect knowledge of the effective channels. Assessing the impact of imperfect CSI on the proposed transceiver strategies is beyond the scope of this work. The inner layer filters depend on microscopic channel variations, which change quickly with movements on the order of the wavelength and cause microscopic (small-scale) fading.
The outer and inner layer filters are updated on different time scales. The macroscopic CSI required by the outer layer filters does not change significantly as long as the receiver stays within the 3-dB beamwidth of the transmit antenna array. If the transmitter and the receiver are at least several tens of meters apart, the receiver remains within the 3-dB beamwidth for some time, provided that it is not moving too fast. By contrast, the microscopic CSI changes much faster, on the order of the channel coherence time. Consequently, the inner layer filters are updated more often than the outer layer filters.
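To put numbers on the "several tens of meters" argument, the sketch below measures the broadside 3-dB beamwidth of a 64-element half-wavelength ULA directly from its array factor and compares it with the classical approximation $0.886\,\lambda/(N d) = 1.772/N$ rad. The distances and the small-angle footprint estimate are illustrative assumptions, not part of the system model.

```python
import numpy as np

def ula_hpbw(n_ant, half_range=0.05, grid=20001):
    """Numerically measure the 3-dB (half-power) beamwidth, in radians, of an
    n_ant-element lambda/2 ULA steered to broadside (phi = 0)."""
    phi = np.linspace(-half_range, half_range, grid)
    n = np.arange(n_ant)[:, None]
    gain = np.abs(np.exp(1j * np.pi * n * np.sin(phi)).sum(axis=0)) ** 2 / n_ant ** 2
    inside = phi[gain >= 0.5]        # directions within 3 dB of the peak
    return inside.max() - inside.min()

N = 64
bw = ula_hpbw(N)
print(round(np.degrees(bw), 2))      # 1.59 (degrees)
print(1.772 / N)                     # classical approximation, in radians
for dist in (20.0, 50.0, 100.0):     # BS-UE distances in meters (assumed)
    print(dist, round(dist * bw, 2)) # lateral extent of the 3-dB footprint, in meters
```

At 50 m, the 3-dB footprint of a 64-element array is roughly 1.4 m wide, so a pedestrian-speed UE stays inside it for on the order of a second, whereas the microscopic CSI changes over displacements of a few millimeters at mmWave frequencies.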

III. TRANSCEIVER SCHEMES
We present low-complexity outer and inner layer filtering methods for double-sided massive MIMO systems in this section. The filtering layers are designed to perform different tasks: the outer layer typically aims to provide an SNR gain, whereas the inner layer seeks to cancel multi-user interference [24]. We study three outer layer schemes, namely the covariance matrix eigenfilter (CME), PPS, and semi-orthogonal path selection (SPS). It is desirable to form full-rank effective channels $\mathbf{H}_{\text{eff},u}$ so that the proposed transceiver schemes support multi-stream transmission. Therefore, we consider the following assumptions:
A1 The rank of the channel matrices is lower bounded as $\mathrm{rank}(\mathbf{H}_u) \geq \max(M_r, M_t)$ for all $u$;
A2 The outer layer filters have full rank, i.e., $\mathrm{rank}(\mathbf{W}_{o,u}) = M_r$ and $\mathrm{rank}(\mathbf{F}_{o,u}) = M_t$.
As a consequence of A1 and A2, $\mathrm{rank}(\mathbf{H}_{\text{eff},u}) = \min(M_r, M_t)$. A1 is satisfied provided that the channel has enough degrees of freedom, which depends on the assumed channel properties. A2, in turn, can be enforced when designing the outer layer filters, as we show in the following.

1) Covariance Matrix Eigenfilter (CME)
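Assuming statistical CSI, let
$$\hat{\mathbf{C}}_{\text{dl},u} = \mathbf{Q}_{\text{dl},u}\boldsymbol{\Xi}_{\text{dl},u}\mathbf{Q}_{\text{dl},u}^H, \tag{7}$$
$$\hat{\mathbf{C}}_{\text{ul},u} = \mathbf{Q}_{\text{ul},u}\boldsymbol{\Xi}_{\text{ul},u}\mathbf{Q}_{\text{ul},u}^H \tag{8}$$
denote the eigendecompositions of the estimated channel covariance matrices, where $\mathbf{Q}_{\text{dl},u}\in\mathbb{C}^{N_r\times N_r}$ and $\mathbf{Q}_{\text{ul},u}\in\mathbb{C}^{N_t\times N_t}$ are the eigenvector matrices and $\boldsymbol{\Xi}_{\text{dl},u}\in\mathbb{C}^{N_r\times N_r}$ and $\boldsymbol{\Xi}_{\text{ul},u}\in\mathbb{C}^{N_t\times N_t}$ the corresponding eigenvalue matrices, with the eigenvalues sorted in non-increasing order. The outer layer filters $\mathbf{W}_{o,u}$ and $\mathbf{F}_{o,u}$ are given by the $M_r$ and $M_t$ dominant eigenvectors of $\hat{\mathbf{C}}_{\text{dl},u}$ and $\hat{\mathbf{C}}_{\text{ul},u}$, respectively. Define $\bar{\mathbf{Q}}_{\text{dl},u}$ and $\bar{\mathbf{Q}}_{\text{ul},u}$ as the truncated eigenvector matrices containing the first $M_r$ and $M_t$ columns of the corresponding matrices. Then, the eigenfilters are given by [24]
$$\mathbf{F}_{o,u} = \bar{\mathbf{Q}}_{\text{ul},u}\in\mathbb{C}^{N_t\times M_t}, \qquad \mathbf{W}_{o,u} = \bar{\mathbf{Q}}_{\text{dl},u}\in\mathbb{C}^{N_r\times M_r},$$
for all $u\in\{1,\ldots,U\}$. Since the truncated eigenvector matrices have orthonormal columns, A2 is satisfied. We hereafter refer to this filtering scheme as the covariance matrix eigenfilter (CME).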
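A minimal NumPy sketch of CME follows. It assumes three paths with fixed illustrative angles and powers, uses the closed-form covariances $\mathbb{E}[\mathbf{H}\mathbf{H}^H] = \mathbf{A}_r\mathrm{Diag}(N_t p_\ell)\mathbf{A}_r^H$ and $\mathbb{E}[\mathbf{H}^H\mathbf{H}] = \mathbf{A}_t\mathrm{Diag}(N_r p_\ell)\mathbf{A}_t^H$ (valid for independent zero-mean gains with powers $p_\ell$ and the unnormalized ULA convention assumed here) in place of estimates, keeps the dominant eigenvectors, and forms the effective channel. The helper `cme_filter` is hypothetical.

```python
import numpy as np

def cme_filter(C_hat, M):
    """Return the M dominant eigenvectors of a Hermitian covariance estimate."""
    eigval, eigvec = np.linalg.eigh(C_hat)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # non-increasing order
    return eigvec[:, order[:M]]

def ula(n, phi):
    return np.exp(1j * np.pi * np.arange(n) * np.sin(phi))

rng = np.random.default_rng(5)
N_r, N_t, M_r, M_t = 32, 32, 2, 2
phi_r, phi_t = np.array([-0.6, 0.1, 0.9]), np.array([-0.4, 0.3, 1.0])  # assumed angles
A_r = np.stack([ula(N_r, p) for p in phi_r], axis=1)
A_t = np.stack([ula(N_t, p) for p in phi_t], axis=1)
p = np.array([1.0, 0.5, 0.1])                  # path powers E|alpha_l|^2 (assumed)

C_dl = A_r @ np.diag(p * N_t) @ A_r.conj().T   # E[H H^H]
C_ul = A_t @ np.diag(p * N_r) @ A_t.conj().T   # E[H^H H]
W_o, F_o = cme_filter(C_dl, M_r), cme_filter(C_ul, M_t)

alpha = np.sqrt(p / 2) * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
H = A_r @ np.diag(alpha) @ A_t.conj().T        # one instantaneous realization
H_eff = W_o.conj().T @ H @ F_o                 # M_r x M_t effective channel
print(np.allclose(F_o.conj().T @ F_o, np.eye(M_t)), np.linalg.matrix_rank(H_eff))
```

The outer filters are computed once from the statistics, while `H_eff` is recomputed for every new gain realization.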
for all $\ell_t \in \mathcal{L}^{(t)}_D$ and $\ell_r \in \mathcal{L}^{(r)}_D$.

3) Semi-orthogonal Path Selection (SPS)
Although the PPS method is simple, it has a major drawback: it may select highly correlated paths, which yields rank-deficient effective channels. This is undesirable in a multi-stream communication scenario. As an alternative to PPS and CME, we propose a novel sub-optimal solution that selects the beamforming directions using a semi-orthogonal path selection (SPS) algorithm. The proposed solution can be seen as a customization of the semi-orthogonal user selection algorithm of [38] to the beamforming problem. SPS is presented in Algorithm 1 for a generic array manifold matrix $\mathbf{A}$ with columns $\mathbf{a}_\ell$, path gains $\alpha_\ell$, and a target number $M$ of selected paths. Partial CSI knowledge is sufficient here, since the array manifold matrix $\mathbf{A}$ can be built from the departure or arrival angles in the partial CSI, as in (4).
SPS seeks $M$ semi-orthogonal steering vectors with relatively strong power. Semi-orthogonality is enforced by Steps 2 and 4 of Algorithm 1: the components of the non-selected paths in $\Lambda_i$ are projected onto the orthogonal complement of $\mathrm{span}\{\mathbf{g}_{(1)},\ldots,\mathbf{g}_{(i-1)}\}$. Then, among these semi-orthogonal vectors, the path with the largest power, measured by $\|\mathbf{g}_\ell\|_2^2$, is selected in Step 3. Since SPS provides outer layer precoding and combining matrices formed by $M_t$ and $M_r$ columns of $\mathbf{A}_{t,u}$ and $\mathbf{A}_{r,u}$, respectively, it can be shown that A2 is satisfied.

Algorithm 1: Semi-orthogonal path selection (SPS)
Step 1: Initialization:
    $\mathcal{S} \leftarrow \emptyset$ (selected paths set)
    ...
    $i \leftarrow 1$
while $\#(\mathcal{S}) < M$ do
    Step 2: Form orthogonal projections:
    for each path $\ell \in \Lambda_i$ do
        $\mathbf{g}_\ell \leftarrow |\alpha_\ell|^2\mathbf{a}_\ell$
        if $i \geq 2$ then
            ...
    Step 3: Select $i$th path:
    $\pi(i) \leftarrow \arg\max_{\ell\in\Lambda_i}\|\mathbf{g}_\ell\|_2^2$
    ...
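Since parts of Algorithm 1 are not reproduced above, the following is only a sketch of the selection rule as described in the text: each remaining path is weighted as $\mathbf{g}_\ell = |\alpha_\ell|^2\mathbf{a}_\ell$, projected (Gram-Schmidt style, as in semi-orthogonal user selection) onto the orthogonal complement of the already selected projected vectors, and the path with the largest $\|\mathbf{g}_\ell\|_2^2$ wins. The toy scenario (two nearly co-located strong paths and a weaker, well-separated one) and the ULA phase convention are assumptions chosen to contrast SPS with picking the strongest paths.

```python
import numpy as np

def ula(n, phi):
    """lambda/2 ULA steering vector (assumed phase convention pi * n * sin(phi))."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(phi))

def sps(A, alpha, M):
    """Sketch of semi-orthogonal path selection: greedily pick M columns of A.
    Returns the selected path indices in selection order."""
    if M > A.shape[1]:
        raise ValueError("cannot select more paths than available")
    remaining = list(range(A.shape[1]))    # candidate set Lambda_i
    selected, basis = [], []               # S and the projected vectors g_(1), ..., g_(i-1)
    while len(selected) < M:
        best, best_g, best_pow = None, None, -1.0
        for l in remaining:
            g = np.abs(alpha[l]) ** 2 * A[:, l]
            for q in basis:                # remove components along already selected directions
                g = g - (q.conj() @ g) / (q.conj() @ q) * q
            power = np.linalg.norm(g) ** 2
            if power > best_pow:
                best, best_g, best_pow = l, g, power
        selected.append(best)
        basis.append(best_g)
        remaining.remove(best)
    return selected

N = 64
phis = np.array([0.1, 0.105, 1.0])           # paths 0 and 1 are nearly co-located
alpha = np.sqrt(np.array([1.0, 0.9, 0.5]))   # |alpha_l|^2 = 1.0, 0.9, 0.5
A = np.stack([ula(N, p) for p in phis], axis=1)

print(sps(A, alpha, 2))                               # [0, 2]: skips the correlated path 1
print(np.argsort(-np.abs(alpha))[:2].tolist())        # strongest-power choice: [0, 1]
F_o = A[:, sps(A, alpha, 2)]                          # outer filter from the selected columns
```

Picking the two strongest paths would place both beams in almost the same direction, whereas SPS trades a little path power for a well-conditioned outer filter.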

B. INNER LAYER FILTERING
The low-dimensional effective channels $\mathbf{H}_{\text{eff},u}$ can be formed once the outer layer filters have been selected. The design of the inner layer filters then becomes a classical multi-user MIMO transceiver design problem. For future convenience, let the compact singular value decomposition (SVD) of $\mathbf{H}_{\text{eff},u}$ be written as
$$\mathbf{H}_{\text{eff},u} = \begin{bmatrix}\mathbf{U}^s_u & \mathbf{U}^o_u\end{bmatrix}\mathrm{Blkdiag}\big(\boldsymbol{\Sigma}^s_u, \boldsymbol{\Sigma}^o_u\big)\begin{bmatrix}\mathbf{V}^s_u & \mathbf{V}^o_u\end{bmatrix}^H,$$
where $\mathbf{U}^s_u\in\mathbb{C}^{M_r\times N_s}$ contains the first $N_s$ left singular vectors, $\mathbf{V}^s_u\in\mathbb{C}^{M_t\times N_s}$ the first $N_s$ right singular vectors, $\boldsymbol{\Sigma}^s_u = \mathrm{Diag}(\sigma_1,\ldots,\sigma_{N_s})$ the first $N_s$ singular values, and $\boldsymbol{\Sigma}^o_u = \mathrm{Diag}(\sigma_{N_s+1},\ldots,\sigma_{\min(M_r,M_t)})$ the remaining singular values. Note that the truncated singular vector matrices are semi-unitary, i.e., $\mathbf{U}^{sH}_u\mathbf{U}^s_u = \mathbf{V}^{sH}_u\mathbf{V}^s_u = \mathbf{I}_{N_s}$. Regarding CSI, we make the following assumptions:
• The BS and the UEs have perfect knowledge of the corresponding $\mathbf{H}_{\text{eff},u}$ for all inner layer transceiver strategies. This is a practical assumption, since $M_t \leq N_t$ and $M_r \leq N_r$, which allows the development of efficient CSI feedback methods [25], [36], [37];
• MET-BD, BD-MER, and MET-MMSE additionally require perfect knowledge of the interfering effective channel matrices $\mathbf{H}_{\text{eff},u,j}$ for all $j \neq u$ at the user side.

1) MET-MER: Maximum Eigenmode Transmission (MET) and Maximum Eigenmode Reception (MER)
The MET-MER transceiver scheme selects the inner precoding matrix $\mathbf{F}_{i,u}$ as the first $N_s$ right singular vectors of $\mathbf{H}_{\text{eff},u}$ and the inner combining matrix $\mathbf{W}_{i,u}$ as the first $N_s$ left singular vectors of $\mathbf{H}_{\text{eff},u}$ [39]:
$$\mathbf{F}_{i,u} = \mathbf{V}^s_u, \qquad \mathbf{W}_{i,u} = \mathbf{U}^s_u. \tag{9}$$
The MET-MER transceiver seeks to maximize the SNR at the UE while disregarding multi-user interference. The BS can transmit up to $\min(M_r, M_t)$ data streams per user simultaneously, i.e., $N_s \leq \min(M_r, M_t)$.
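The MET-MER filters amount to one small SVD per UE. The sketch below, using a random placeholder effective channel (an assumption), shows that with $\mathbf{F}_{i,u} = \mathbf{V}^s_u$ and $\mathbf{W}_{i,u} = \mathbf{U}^s_u$ the intended link decouples into $N_s$ parallel eigenmodes with gains $\boldsymbol{\Sigma}^s_u$. Interference from other users is ignored, exactly as in the scheme itself.

```python
import numpy as np

def met_mer(H_eff, N_s):
    """MET-MER inner filters: dominant right/left singular vectors of H_eff."""
    U_svd, sigma, Vh = np.linalg.svd(H_eff)   # singular values in non-increasing order
    F_i = Vh.conj().T[:, :N_s]                # V^s_u
    W_i = U_svd[:, :N_s]                      # U^s_u
    return F_i, W_i, sigma[:N_s]

rng = np.random.default_rng(6)
M_r, M_t, N_s = 4, 4, 2
H_eff = rng.standard_normal((M_r, M_t)) + 1j * rng.standard_normal((M_r, M_t))
F_i, W_i, sigma_s = met_mer(H_eff, N_s)
# W_i^H H_eff F_i equals Diag(sigma_1, ..., sigma_Ns): no inter-stream interference.
print(np.allclose(W_i.conj().T @ H_eff @ F_i, np.diag(sigma_s)))   # True
```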

2) MET-BD: Maximum Eigenmode Transmission (MET) and Block Diagonalization (BD) Reception
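In this scheme, the combiner at UE $u$ satisfies the BD condition to cancel multi-user interference [30]:
$$\mathbf{W}_{i,u}^H\bar{\mathbf{H}}_{\text{eff},u} \overset{!}{=} \mathbf{0}_{N_s\times(U-1)N_s}, \qquad \bar{\mathbf{H}}_{\text{eff},u} = \big[\mathbf{H}_{\text{eff},u,j}\mathbf{F}_{i,j}\big]_{j\neq u} \in \mathbb{C}^{M_r\times(U-1)N_s},$$
where $\mathbf{H}_{\text{eff},u,j}$ is defined in (5), $\bar{\mathbf{H}}_{\text{eff},u}$ horizontally stacks the matrices $\mathbf{H}_{\text{eff},u,j}\mathbf{F}_{i,j}$, and $\mathbf{F}_{i,j} = \mathbf{V}^s_j$ for all $j\in\{1,\ldots,U\}\setminus\{u\}$. The BD combiner requires $UN_s \leq M_r$ in order to cancel the multi-user interference while still allowing the transmission of $N_s$ data streams per user. If this condition is satisfied, then $(U-1)N_s < M_r$ and $\bar{\mathbf{H}}_{\text{eff},u}$ generically has full column rank, so that the null space of $\bar{\mathbf{H}}_{\text{eff},u}^H$ has dimension $M_r-(U-1)N_s \geq N_s$. Consequently, the interfering users can be canceled by projecting $\mathbf{W}_{i,u}$ onto the null space of $\bar{\mathbf{H}}_{\text{eff},u}^H$. We project the MER combiner (9) onto this null space to maximize the intended signal power while canceling interference. Let the SVD of $\bar{\mathbf{H}}_{\text{eff},u}$ be
$$\bar{\mathbf{H}}_{\text{eff},u} = \begin{bmatrix}\bar{\mathbf{U}}^s_u & \bar{\mathbf{U}}^o_u\end{bmatrix}\bar{\boldsymbol{\Sigma}}_u\bar{\mathbf{V}}_u^H,$$
where $\bar{\mathbf{U}}^o_u\in\mathbb{C}^{M_r\times(M_r-(U-1)N_s)}$ contains the last $M_r-(U-1)N_s$ left singular vectors of $\bar{\mathbf{H}}_{\text{eff},u}$, which span the null space of $\bar{\mathbf{H}}_{\text{eff},u}^H$. The null-space projection matrix is defined as $\bar{\mathbf{P}}_u = \bar{\mathbf{U}}^o_u\bar{\mathbf{U}}^{oH}_u\in\mathbb{C}^{M_r\times M_r}$. The MET-BD transceiver filters are thus given by
$$\mathbf{F}_{i,u} = \mathbf{V}^s_u, \qquad \mathbf{W}_{i,u} = \bar{\mathbf{P}}_u\mathbf{U}^s_u.$$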
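The receive-side null-space projection can be sketched in a few lines of NumPy. The effective channels $\mathbf{H}_{\text{eff},k,j}$ below are random placeholders (an assumption), and each UE's MET precoder is taken from its own effective channel. The check confirms that the projected MER combiner nulls the stacked interference while keeping $N_s$ useful streams when $UN_s \leq M_r$.

```python
import numpy as np

rng = np.random.default_rng(7)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def null_space_projector(B, tol=1e-10):
    """Orthogonal projector onto the null space of B^H (the left null space of B)."""
    Ub, sb, _ = np.linalg.svd(B)            # full SVD: Ub is square
    rank = int(np.sum(sb > tol * sb[0]))
    U_o = Ub[:, rank:]                      # last (M_r - rank) left singular vectors
    return U_o @ U_o.conj().T

def svd_s(H, N_s):
    """First N_s left and right singular vectors (U^s, V^s)."""
    Uh, _, Vh = np.linalg.svd(H)
    return Uh[:, :N_s], Vh.conj().T[:, :N_s]

U, M_r, M_t, N_s = 3, 6, 6, 2               # U * N_s <= M_r, so BD is feasible
# H_eff[k][j] plays the role of H_eff,k,j (random placeholders, assumed)
H_eff = [[crandn(M_r, M_t) for j in range(U)] for k in range(U)]
F_i = [svd_s(H_eff[j][j], N_s)[1] for j in range(U)]   # MET precoders V^s_j

u = 0
H_bar = np.hstack([H_eff[u][j] @ F_i[j] for j in range(U) if j != u])  # M_r x (U-1)N_s
P_bar = null_space_projector(H_bar)
W_i = P_bar @ svd_s(H_eff[u][u], N_s)[0]                # projected MER combiner
print(np.abs(W_i.conj().T @ H_bar).max())               # numerically zero: interference nulled
print(np.linalg.matrix_rank(W_i.conj().T @ H_eff[u][u] @ F_i[u]))  # N_s streams survive
```

With $U = 3$, $N_s = 2$, and $M_r = 6$, the interference occupies 4 dimensions and the projector keeps the remaining 2, exactly $N_s$; one more user would make BD infeasible.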

3) MET-MMSE: Maximum Eigenmode Transmission (MET) and Minimum Mean Square Error (MMSE) Reception
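We also consider interference-aware MMSE combining [40] to balance multi-user interference suppression against intended signal power maximization. The MMSE inner layer filter is obtained from
$$\mathbf{W}_{i,u} = \arg\min_{\mathbf{W}}\ \mathbb{E}\big[\|\mathbf{s}_u - \mathbf{W}^H\mathbf{y}_u\|_2^2\big], \tag{10}$$
where $\mathbf{y}_u$ is the received signal at UE $u$ defined in (6) and the expectation is taken with respect to the transmitted symbols and the additive noise. By solving (10) and setting the MET precoders $\mathbf{F}_{i,u} = \mathbf{V}^s_u$ for all $u\in\{1,\ldots,U\}$, the MMSE combiner reads as [40]
$$\mathbf{W}_{i,u} = \bigg(\sum_{j=1}^{U}\frac{\gamma_j^2}{N_s}\mathbf{H}_{\text{eff},u,j}\mathbf{V}^s_j\mathbf{V}^{sH}_j\mathbf{H}_{\text{eff},u,j}^H + \sigma_n^2\mathbf{W}_{o,u}^H\mathbf{W}_{o,u}\bigg)^{-1}\frac{\gamma_u}{N_s}\mathbf{H}_{\text{eff},u}\mathbf{V}^s_u.$$
Unlike the BD combiner, the MMSE combiner does not require $UN_s \leq M_r$.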
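The MMSE combiner is the solution $\mathbf{R}_y^{-1}\mathbf{R}_{ys}$ of a small linear system built from second-order statistics of $\mathbf{y}_u$. The sketch below assumes random placeholder effective channels, $\gamma_j = 1$, $\sigma_n^2 = 0.1$, and an overloaded setup with $UN_s > M_r$, where BD would be infeasible. It then confirms that the MMSE combiner attains a lower MSE than the interference-unaware MER combiner.

```python
import numpy as np

rng = np.random.default_rng(8)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def dominant_right(H, N_s):
    """First N_s right singular vectors (MET precoder V^s)."""
    return np.linalg.svd(H)[2].conj().T[:, :N_s]

U, N_r, M_r, M_t, N_s, sigma2_n, u = 6, 16, 4, 4, 1, 0.1, 0   # U * N_s > M_r: BD infeasible
W_o = np.linalg.qr(crandn(N_r, M_r))[0]         # placeholder outer combiner of UE u
H_eff = [crandn(M_r, M_t) for _ in range(U)]    # H_eff[j] plays the role of H_eff,u,j
# MET precoders come from each UE's own effective channel; for j != u, independent placeholders.
V_s = [dominant_right(H_eff[u] if j == u else crandn(M_r, M_t), N_s) for j in range(U)]
gamma = np.ones(U)

# Second-order statistics of y_u, with R_s = I / N_s
R_y = sum(gamma[j] ** 2 / N_s * H_eff[j] @ V_s[j] @ V_s[j].conj().T @ H_eff[j].conj().T
          for j in range(U)) + sigma2_n * W_o.conj().T @ W_o
R_ys = gamma[u] / N_s * H_eff[u] @ V_s[u]       # E[y_u s_u^H]
W_mmse = np.linalg.solve(R_y, R_ys)             # R_y^{-1} R_ys

def mse(W):
    """E||s_u - W^H y_u||^2 = tr(R_s) - 2 Re tr(W^H R_ys) + tr(W^H R_y W), with tr(R_s) = 1."""
    return (1.0 - 2.0 * np.real(np.trace(W.conj().T @ R_ys))
            + np.real(np.trace(W.conj().T @ R_y @ W)))

W_mer = np.linalg.svd(H_eff[u])[0][:, :N_s]     # interference-unaware MER combiner U^s_u
print(mse(W_mmse), mse(W_mer))                  # the MMSE combiner attains the smaller MSE
```

The noise term $\sigma_n^2\mathbf{W}_{o,u}^H\mathbf{W}_{o,u}$ keeps $\mathbf{R}_y$ invertible even when the interference alone would fill all $M_r$ dimensions.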

4) BD-MER: Block Diagonalization (BD) Transmission and Maximum Eigenmode Reception (MER)
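With this strategy, the block diagonalization condition is formulated at the transmitter side [30]:
$$\tilde{\mathbf{H}}_{\text{eff},u}\mathbf{F}_{i,u} \overset{!}{=} \mathbf{0}_{(U-1)N_s\times N_s}, \qquad \tilde{\mathbf{H}}_{\text{eff},u} = \big[\mathbf{W}_{i,j}^H\mathbf{H}_{\text{eff},j,u}\big]_{j\neq u} \in \mathbb{C}^{(U-1)N_s\times M_t},$$
where $\tilde{\mathbf{H}}_{\text{eff},u}$ vertically stacks the matrices $\mathbf{W}_{i,j}^H\mathbf{H}_{\text{eff},j,u}$ and $\mathbf{W}_{i,j} = \mathbf{U}^s_j$ for all $j\in\{1,\ldots,U\}\setminus\{u\}$. The BD precoder is able to mitigate multi-user interference at the BS while transmitting $N_s$ data streams per user when $UN_s \leq M_t$. In this case, $\tilde{\mathbf{H}}_{\text{eff},u}$ generically has full row rank, its null space has dimension $M_t-(U-1)N_s \geq N_s$, and the precoding filter is chosen within this null space. We project the MET precoder (9) onto the null space of the multi-user interference matrix $\tilde{\mathbf{H}}_{\text{eff},u}$ to maximize the power of the intended UE while mitigating interference at the non-intended UEs. Let the SVD of $\tilde{\mathbf{H}}_{\text{eff},u}$ be
$$\tilde{\mathbf{H}}_{\text{eff},u} = \tilde{\mathbf{U}}_u\tilde{\boldsymbol{\Sigma}}_u\begin{bmatrix}\tilde{\mathbf{V}}^s_u & \tilde{\mathbf{V}}^o_u\end{bmatrix}^H,$$
where $\tilde{\mathbf{V}}^o_u\in\mathbb{C}^{M_t\times(M_t-(U-1)N_s)}$ contains the last $M_t-(U-1)N_s$ right singular vectors of $\tilde{\mathbf{H}}_{\text{eff},u}$. The null-space projection matrix is $\tilde{\mathbf{P}}_u = \tilde{\mathbf{V}}^o_u\tilde{\mathbf{V}}^{oH}_u\in\mathbb{C}^{M_t\times M_t}$. Therefore, the BD-MER transceiver filters are given by
$$\mathbf{F}_{i,u} = \tilde{\mathbf{P}}_u\mathbf{V}^s_u, \qquad \mathbf{W}_{i,u} = \mathbf{U}^s_u.$$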
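The transmit-side counterpart is sketched below under the same placeholder assumptions (random effective channels, each UE using its MER combiner $\mathbf{U}^s_j$). The BS stacks the interference that UE $u$'s precoder would cause at every other UE, takes the trailing right singular vectors as a null-space basis, and projects the MET precoder onto it.

```python
import numpy as np

rng = np.random.default_rng(9)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

U, M_r, M_t, N_s = 3, 6, 6, 2                   # U * N_s <= M_t, so BD precoding is feasible
# H_eff[k][j] plays the role of H_eff,k,j (random placeholders, assumed)
H_eff = [[crandn(M_r, M_t) for j in range(U)] for k in range(U)]
svds = [np.linalg.svd(H_eff[k][k]) for k in range(U)]
W_i = [s[0][:, :N_s] for s in svds]             # MER combiners U^s_k
V_s = [s[2].conj().T[:, :N_s] for s in svds]    # MET precoders V^s_k

u = 0
# Interference that UE u's precoder would cause at every other UE j, stacked vertically.
H_tilde = np.vstack([W_i[j].conj().T @ H_eff[j][u] for j in range(U) if j != u])  # (U-1)N_s x M_t
_, _, Vh = np.linalg.svd(H_tilde)               # full SVD: Vh is M_t x M_t
V_o = Vh.conj().T[:, (U - 1) * N_s:]            # last M_t - (U-1)N_s right singular vectors
P_tilde = V_o @ V_o.conj().T                    # projector onto the null space of H_tilde
F_i_u = P_tilde @ V_s[u]                        # projected MET precoder

leak = max(np.abs(W_i[j].conj().T @ H_eff[j][u] @ F_i_u).max() for j in range(U) if j != u)
print(leak)                                     # numerically zero: no interference at other UEs
```

Here the BS needs the combiners and cross effective channels of the other UEs, which is the price paid for moving interference cancellation to the transmitter.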

C. COMPLEXITY ANALYSIS
In this section, we evaluate the complexity of the proposed transceiver strategies. The total complexity is divided into three parts: outer layer filter design, effective channel matrices computation and inner layer filter design.

a: Outer Layer Filter Design
• CME - The computational complexity of the eigendecompositions (7) and (8) is $\mathcal{O}(N_r^3)$ and $\mathcal{O}(N_t^3)$, respectively.

In our multi-layer approach, the outer layer filters are updated once the macroscopic CSI becomes outdated, whereas the inner layer filters are recalculated as the microscopic CSI changes. Fortunately, the macroscopic CSI evolves more slowly than the microscopic CSI, as discussed in Section II-C; therefore, the outer layer is updated only occasionally, whereas the inner layer is updated more often. If $M_t$ and $M_r$ are much smaller than $N_t$ and $N_r$, the proposed solution is less complex than the classical single-layer approach, which consists of applying the inner layer schemes directly to the $(N_r\times N_t)$-dimensional channel matrices $\mathbf{H}_u$. In that case, the complexity of each transceiver would be cubic in $N_t$ and $N_r$, instead of in $M_t$ and $M_r$ as in the proposed multi-layer approach. Moreover, the single-layer transceiver filters would have to be updated at the microscopic CSI time scale. The double-sided massive MIMO transceiver schemes proposed in [21] would face computational challenges similar to those of the single-layer approach, because they operate directly on the $(N_r\times N_t)$-dimensional channel matrices.
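A back-of-the-envelope comparison illustrates the cubic scaling argument. The sketch assumes the standard $\mathcal{O}(mn\min(m,n))$ order of an SVD as a proxy for the per-update cost of an SVD-based inner filter design and deliberately ignores constant factors; it is not a flop count of the proposed schemes.

```python
def svd_order(n_r, n_t):
    """Order-of-magnitude SVD cost proxy n_r * n_t * min(n_r, n_t); constants ignored (assumption)."""
    return n_r * n_t * min(n_r, n_t)

N_t = N_r = 64
for M in (4, 8, 16):
    single = svd_order(N_r, N_t)     # single-layer: filters on the full N_r x N_t channel
    layered = svd_order(M, M)        # multi-layer: inner filters on the M x M effective channel
    print(M, single // layered)      # 4 4096, then 8 512, then 16 64
```

The gain applies to every inner-layer update at the fast microscopic time scale, while the $\mathcal{O}(N^3)$ outer-layer cost is paid only when the macroscopic CSI changes.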

IV. SIMULATION RESULTS
In this section, we present and discuss a variety of numerical simulations conducted to investigate the proposed double-sided massive MIMO transceiver architectures. We are mostly interested in evaluating the spatial multiplexing capabilities of the proposed methods and in identifying the most suitable strategy for different channel propagation scenarios. Therefore, we consider the achievable sum rate as the figure of merit. In our simulations, we generate the arrival and departure angles in (3) as follows: the $L$ rays are grouped in clusters of 4 rays. For each cluster, the mean cluster angle $\bar{\phi}_c$ is drawn from $\mathcal{U}(0^\circ, 180^\circ)$, and the angle of each ray in the cluster is modeled as a Gaussian random variable with mean $\bar{\phi}_c$ and standard deviation $\sigma_c$ degrees. To achieve satisfactory spatial multiplexing, the channel has to offer sufficient degrees of freedom. MmWave channels, however, are characterized by a reduced number of scatterers [41], which may decrease the channel degrees of freedom. To account for these propagation differences in the spatial multiplexing performance, we study three scattering scenarios:
• Poor scattering - 2 clusters, $L = 8$ rays;
• Fair scattering - 8 clusters, $L = 32$ rays;
• Rich scattering - 16 clusters, $L = 64$ rays.
The "poor" scenario can be seen as the pessimistic setup, which can be realistic for indoor mmWave systems. The "rich" scenario is regarded as the optimistic case, which can be feasible for sub-6 GHz systems. The "fair" scenario is a compromise between the pessimistic and optimistic setups.

We present three groups of simulation results. In the first group, we examine the outer layer filtering strategies. In the second group, we compare the achievable sum rate performance of the proposed inner layer filtering methods. In the final group, we benchmark the proposed transceivers.
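The clustered angle generation described above can be sketched as follows, with $\sigma_c = 5^\circ$ as in the simulation setup. Drawing departure and arrival clusters independently is our assumption for illustration; the angles are in degrees and would be converted (e.g., with `np.deg2rad`) before being used in array response vectors.

```python
import numpy as np

SCENARIOS = {"poor": 2, "fair": 8, "rich": 16}   # number of clusters, 4 rays each

def cluster_angles(n_clusters, rays_per_cluster=4, sigma_c_deg=5.0, rng=None):
    """Draw L = n_clusters * rays_per_cluster ray angles in degrees: cluster means
    ~ U(0, 180) and rays ~ Gaussian around each mean with standard deviation sigma_c."""
    rng = np.random.default_rng() if rng is None else rng
    means = rng.uniform(0.0, 180.0, n_clusters)
    rays = means[:, None] + sigma_c_deg * rng.standard_normal((n_clusters, rays_per_cluster))
    return rays.ravel()

rng = np.random.default_rng(10)
for name, n_c in SCENARIOS.items():
    phi_t = cluster_angles(n_c, rng=rng)   # departure angles
    phi_r = cluster_angles(n_c, rng=rng)   # arrival angles, drawn independently (assumed)
    print(name, phi_t.size)                # poor 8, fair 32, rich 64
```

With only two clusters, most of the eight rays lie within a few degrees of each other, which is why the poor scenario offers few resolvable directions for spatial multiplexing.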
In all simulations, we consider the following parameter setup: $N_t = N_r = 64$ antennas, noise variance $\sigma_n^2 = 10^{-3}$, i.i.d. channel gain variance $\sigma_\alpha^2 = 1$, and Gaussian angular spread $\sigma_c = 5^\circ$. The downlink and uplink channel covariance matrices for statistical CSI (Section II-C) were estimated by averaging over 100 time slots. The presented results were averaged over 1000 independent experiments.

A. OUTER LAYER FILTERS
Let us first compare the spatial multiplexing performance of the outer layer filtering methods. Since this layer mainly targets SNR gain, we disregard multi-user interference by setting $U = 1$. Furthermore, we do not employ inner layer filtering; thus, $\mathbf{F}_u$ and $\mathbf{W}_u$ in (13) are given by the outer layer filters with $M_t = M_r = N_s$. To assess the impact of the number of multiplexed data streams $N_s$, we consider the ratio $N_s/L$; the transceiver operates at maximum spatial multiplexing when $N_s/L = 1$. We set SNR = 20 dB for the results presented in Figures 2-4.
In Figure 2, we evaluate the outer layer schemes in the poor scattering scenario. We observe that all methods perform roughly the same. At aggressive spatial multiplexing ($N_s/L \approx 1$), CME exhibits an advantage over the geometrical methods. Since there are only a few paths in this poor setup, it is expected that SPS and PPS do not differ much: with only 2 clusters, at least two paths will likely show some spatial correlation. Figure 3 reveals that PPS tends to perform worse as the number of paths increases. This is because the likelihood of the strongest paths being spatially correlated increases with $L$. Moreover, we observe that SPS outperforms PPS because it avoids selecting highly correlated paths, which degrade the achievable sum rate. However, when $N_s = L$, SPS behaves the same as PPS, because it ends up choosing all paths and cannot avoid correlation. In that case, the likelihood of selecting paths with similar angular directions increases significantly, the rank of the beamforming matrices decreases, and the achievable rate drops. This effect is more pronounced in the fair and rich scattering scenarios. In the fair scenario, SPS yields the best performance in the multiplexing range $N_s/L = 0.125$ to $0.625$. Finally, the simulation results for the rich scattering scenario shown in Figure 4 indicate a behavior similar to that observed in the fair scenario, the main difference being that PPS performs even worse. Overall, these results reveal that SPS yields the best performance when there is enough path diversity and the spatial multiplexing is not too aggressive, whereas CME exhibits good robustness to strong spatial multiplexing. Although SPS performs better than CME in many scenarios, it is more computationally complex, especially when $M_t$ and $M_r$ are large.
Furthermore, Figures 2-4 provide valuable information on how to select the transceiver parameters $M_r$ and $M_t$. Since $N_s = M_r = M_t$ in these experiments, we observe that, for SPS, $M_r/L = M_t/L$ can be set as large as 0.75, 0.625, and 0.375 in the poor, fair, and rich scattering environments, respectively. Larger ratios do not improve performance and may even deteriorate the achievable rate. A similar analysis can be carried out for CME and PPS. Note that we assumed $M_r = M_t$ for simplicity, since the analysis becomes convoluted when $M_r \neq M_t$.
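Converting the reported ratios into absolute outer-layer dimensions is simple arithmetic, sketched below with the path counts of the three scenarios.

```python
# Largest useful SPS ratio M/L reported above, paired with the number of paths L per scenario.
max_ratio = {"poor": (0.75, 8), "fair": (0.625, 32), "rich": (0.375, 64)}

for name, (ratio, L) in max_ratio.items():
    M_max = int(ratio * L)          # largest useful M_t = M_r (= N_s) for SPS
    print(name, L, M_max)           # poor 8 6, fair 32 20, rich 64 24
```

Although the usable ratio shrinks as scattering gets richer, the absolute dimension still grows from 6 to 20 to 24, so richer channels support larger effective channels and more multiplexed streams.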

B. INNER LAYER FILTERS
Recall that the inner filtering layer aims at tackling multi-user interference. Therefore, we conducted experiments to compare the interference robustness of the proposed inner layer schemes. We employed CME outer filtering, motivated by the insights obtained from the outer layer simulation results.
Let us begin the assessment of the inner layer filters by analyzing the achievable sum rate in the pessimistic (poor) propagation scenario. Figure 5 shows the transceiver performance for a non-congested setup with U = 4 UEs, N_s = 1 data stream per user and M_t = M_r = 4. Since U N_s = M_t = M_r, BD/MMSE filtering cancels out the multi-user interference, as expected. Also, all transceivers except MET-MER achieve the full degrees of freedom in the asymptotic SNR regime. What happens in a congested scenario? In Figure 6, we consider U = 32 UEs, N_s = 1 data stream per user and M_t = M_r = 4. This parameter setup gives U N_s > M_t = M_r, so the BD conditions are not satisfied and the BD-based transceivers cannot be applied. MET-MMSE works with this parameter setup; however, it cannot completely reject the multi-user interference. As a result, the transceiver becomes interference-limited at high SNR. Nonetheless, we observe a reasonable performance at low SNR, e.g., MET-MMSE yields a sum rate of 63 bit/s/Hz at 0 dB SNR. This is because the outer layer filtering already rejects some interference and the remainder is filtered by the inner layer. Figures 5 and 6 indicate that MET-MMSE and BD-MER yield the best performance in a non-congested scenario, while MET-MMSE and MET-MER are the preferred choices when the system becomes congested.
Figure 7 shows the achievable sum rate performance for the poor scattering scenario as a function of the number of UEs. We observe that the BD-based transceivers do not perform well in this scenario, as they can serve at most 4 UEs. Note that BD-MER and MET-BD are plotted only when the BD condition U N_s ≤ M_t = M_r is satisfied. MET-MMSE and MET-MER, on the other hand, are not limited by this constraint and provide satisfactory results even when the system is overloaded. At 20 dB SNR, the transceivers have already reached the rate saturation region, as seen in Figure 6 for the congested system.
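The interference-limited behavior of MET-MMSE can be reproduced qualitatively with the sketch below. It assumes i.i.d. Rayleigh M_r × M_t effective channels instead of the outer-filtered mmWave channels of our simulations, equal power per stream, unit noise variance and N_s = 1. MET precoding uses each user's dominant right singular vector, and the MMSE combiner is represented by its output SINR, p a^H R^{-1} a, where R is the interference-plus-noise covariance. The helper names are hypothetical, and the printed numbers are not meant to match Figures 5 and 6.

```python
import numpy as np

def crandn(rng, *shape):
    # i.i.d. circularly symmetric complex Gaussian entries with unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def met_mmse_sum_rate(channels, snr):
    """Sum rate (bit/s/Hz) with MET precoding (N_s = 1) and MMSE combining."""
    u_count = len(channels)
    # MET: dominant right singular vector of each user's own channel
    precoders = [np.linalg.svd(h)[2][0].conj() for h in channels]
    p = snr / u_count  # equal power per stream, unit noise variance
    rate = 0.0
    for u, h in enumerate(channels):
        # Interference-plus-noise covariance seen by user u
        r_in = np.eye(h.shape[0], dtype=complex)
        for j, f in enumerate(precoders):
            if j != u:
                v = h @ f
                r_in += p * np.outer(v, v.conj())
        a = h @ precoders[u]
        # SINR at the output of the MMSE combiner w = R_in^{-1} a
        sinr = p * np.real(a.conj() @ np.linalg.solve(r_in, a))
        rate += np.log2(1.0 + sinr)
    return float(rate)

rng = np.random.default_rng(1)
light = [crandn(rng, 4, 4) for _ in range(4)]       # U N_s = 4 = M_t = M_r
congested = [crandn(rng, 4, 4) for _ in range(32)]  # U N_s = 32 > M_t = M_r
for snr_db in (0, 20, 40, 60):
    snr = 10 ** (snr_db / 10)
    print(snr_db, round(met_mmse_sum_rate(light, snr), 2),
          round(met_mmse_sum_rate(congested, snr), 2))
```

In the 4-user case, each 4-dimensional receiver generically has enough dimensions to suppress the other 3 streams, so the sum rate keeps growing with SNR; with 32 users, the interference covariance is full rank and the sum rate saturates.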
Consequently, the curves in Figure 7 mainly compare how well the transceivers perform once the system becomes interference-limited. Figures 8 and 9 present the simulation results for the fair and rich scattering scenarios, respectively. As the environment offers more scatterers, the transceivers may operate with larger M_t and M_r and, consequently, more UEs can be served. Figures 8 and 9 reveal that BD-MER has performance peaks at 16 and 24 UEs, respectively, where it outperforms MET-MMSE for the given parameters. However, as U N_s approaches M_t and M_r, the performance of the BD-based transceivers deteriorates. In conclusion, spatial multiplexing in poor scattering scenarios should be carried out using either MET-MMSE or MET-MER, since there are not enough degrees of freedom for BD to cancel the interference. When the propagation medium offers more scattering diversity, as in the fair and rich scenarios, BD-MER becomes a reasonable choice as long as U N_s ≤ M_t. Even when this condition is not met, MET-MMSE still provides adequate results.
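The role of the BD condition U N_s ≤ M_t can be seen in a generic block-diagonalization sketch. It assumes that each user's N_s × M_t effective channel, after its receive combiner, is already known at the BS, and it places each user's precoder in the null space of the other users' stacked channels. This is textbook BD, not necessarily the exact BD-MER or MET-BD construction of Section III; the function name and test dimensions are hypothetical.

```python
import numpy as np

def bd_precoders(eff_channels, n_s, tol=1e-10):
    """Generic BD: user u's M_t x N_s precoder lies in the null space of the
    other users' stacked N_s x M_t effective channels. Returns None when the
    BD condition U N_s <= M_t does not hold."""
    u_count = len(eff_channels)
    m_t = eff_channels[0].shape[1]
    if u_count * n_s > m_t:
        return None  # not enough transmit dimensions to null all other users
    precoders = []
    for u in range(u_count):
        others = [g for j, g in enumerate(eff_channels) if j != u]
        if others:
            _, s, vh = np.linalg.svd(np.vstack(others))
            rank = int(np.sum(s > tol * s[0]))
            null_basis = vh[rank:].conj().T  # M_t x (M_t - rank), orthonormal columns
        else:
            null_basis = np.eye(m_t, dtype=complex)
        # Within the null space, transmit along the user's dominant directions
        _, _, vh_u = np.linalg.svd(eff_channels[u] @ null_basis)
        precoders.append(null_basis @ vh_u[:n_s].conj().T)
    return precoders

rng = np.random.default_rng(2)
m_t, n_s, users = 4, 1, 4
g = [rng.standard_normal((n_s, m_t)) + 1j * rng.standard_normal((n_s, m_t)) for _ in range(users)]
f = bd_precoders(g, n_s)
leak = max(np.abs(g[j] @ f[u]).max() for u in range(users) for j in range(users) if j != u)
print(f"max inter-user leakage: {leak:.1e}")
print(bd_precoders(g + [g[0]], n_s))  # U N_s = 5 > M_t
```

With four single-stream users and M_t = 4, each null space is one-dimensional and the inter-user leakage is at floating-point level; adding a fifth user violates U N_s ≤ M_t and the function returns None.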

C. BENCHMARKING
In this section, we benchmark the proposed transceivers against alternative schemes. The first benchmark methods are the 1-layer versions of our proposed methods. They operate on the full (N_r × N_t)-dimensional channel matrix and do not apply any outer-layer filter to form low-dimensional effective channels. The second benchmark method is the PZF solution proposed in [21]. We assume perfect CSI for the 1-layer and PZF benchmark methods. Figures 10-13 show the benchmark results for the MET-MER, MET-BD, MET-MMSE and BD-MER transceivers, respectively. We consider the poor scattering scenario with 2 UEs, 20 dB SNR and N_s = 1 and 2 data streams. Therefore, the BD condition is satisfied and the BD-based methods can be applied.
We observe that the 1-layer strategy outperforms the proposed 2-layer strategy in terms of achievable sum rate in all benchmark results. This is expected because the 2-layer solution is based on the concatenation of two filters, so any inaccuracy introduced by either the outer or the inner layer filter degrades the achievable performance relative to the 1-layer version. However, the benchmark results indicate that these losses are negligible when only one data stream is transmitted. Among the proposed transceiver schemes, BD-MER exhibits the largest loss relative to its 1-layer counterpart at N_s = 2 for the given parameters. PZF performs as well as our methods for N_s = 1 data stream. However, PZF outperforms the proposed methods when the number of data streams is increased to N_s = 2. Since PZF is a 1-layer method that does not rely on the concatenation of low-dimensional filters, its superior performance is expected in non-congested scenarios.
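The claim that concatenated filters cannot outperform the 1-layer solution can be checked numerically for single-user MET: by the Poincaré separation theorem, the singular values of W^H H F, with semi-unitary outer filters W and F, never exceed those of H. The sketch below uses hypothetical dimensions and an i.i.d. Rayleigh channel. Random outer filters stand in for an inaccurate outer layer, while filters aligned with the dominant singular subspaces of H recover the 1-layer rate whenever N_s ≤ min(M_r, M_t).

```python
import numpy as np

def met_rate(h, n_s, snr):
    # MET with equal power over the n_s dominant eigenmodes of h (unit noise variance)
    s = np.linalg.svd(h, compute_uv=False)[:n_s]
    return float(np.sum(np.log2(1.0 + (snr / n_s) * s**2)))

def semi_unitary(rng, n, m):
    # n x m matrix with orthonormal columns (random outer filter)
    z = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    return np.linalg.qr(z)[0]

rng = np.random.default_rng(3)
n_r, n_t, m_r, m_t, n_s, snr = 16, 64, 4, 4, 2, 100.0  # hypothetical sizes, 20 dB SNR
h = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

one_layer = met_rate(h, n_s, snr)
# Mismatched outer filters: a stand-in for an imperfect outer-layer design
w, f = semi_unitary(rng, n_r, m_r), semi_unitary(rng, n_t, m_t)
two_layer_mismatched = met_rate(w.conj().T @ h @ f, n_s, snr)
# Outer filters aligned with the dominant left/right singular subspaces of h
u, _, vh = np.linalg.svd(h)
two_layer_aligned = met_rate(u[:, :m_r].conj().T @ h @ vh[:m_t].conj().T, n_s, snr)
print(round(one_layer, 2), round(two_layer_mismatched, 2), round(two_layer_aligned, 2))
```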
In general, the benchmark methods achieve larger data throughput than the proposed methods for the given simulation parameters. However, they are more computationally complex, and their CSI acquisition is infeasible in practice due to the large dimensions of the associated CSI. Our methods, by contrast, have low computational complexity and practical CSI requirements, as discussed in Sections II-C and III-C.
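To make the CSI-dimension argument concrete, the sketch below counts the instantaneous complex channel coefficients to be acquired and the leading-order SVD cost for hypothetical dimensions (U = 4, N_r = 64, N_t = 256, M_r = M_t = 4), which are not the paper's simulation parameters. It ignores the statistical or partial CSI required by the outer layer, which we assume varies slowly compared to the effective channel.

```python
def csi_entries(users, rows, cols):
    # Complex coefficients needed for U per-user rows x cols channel matrices
    return users * rows * cols

def svd_cost_order(rows, cols):
    # Leading-order scaling of a dense SVD, constants omitted
    return rows * cols * min(rows, cols)

U, N_r, N_t, M_r, M_t = 4, 64, 256, 4, 4  # hypothetical dimensions

full = csi_entries(U, N_r, N_t)   # 1-layer and PZF: full instantaneous CSI
inner = csi_entries(U, M_r, M_t)  # 2-layer inner layer: effective M_r x M_t CSI
print(full, inner, full // inner)                             # 65536 64 1024
print(svd_cost_order(N_r, N_t) // svd_cost_order(M_r, M_t))   # 16384
```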

V. CONCLUSION
We presented novel and practical transceiver schemes based on multi-layer filtering for double-sided massive MIMO systems. For the outer filtering layer, we compared a statistical approach (CME) to geometrical schemes (SPS and PPS). Simulation results show that SPS provides substantial gains over the naive PPS. Furthermore, SPS achieves higher throughput than CME when spatial multiplexing is moderate, i.e., when the number of data streams is roughly half the number of channel paths. However, the statistical approach is robust to strong spatial multiplexing and can be less computationally complex than SPS. In practice, the choice between SPS and CME depends on whether statistical or partial CSI is available. Regarding the inner filtering layer, MET-MMSE was found to be the most robust to different channel scattering conditions and multi-user interference levels, especially at low SNR. BD-MER provides the largest throughput in some specific scenarios with a fair number of channel paths, which may not be realistic for mmWave channels. For future work, we intend to investigate the proposed transceivers in other application scenarios (multi-cell systems, vehicular communications, among others), to extend our methods to broadband and multi-carrier scenarios [42], and to evaluate the effect of imperfect CSI on system performance.