Performance Improvement for Multi-User Millimeter-Wave Massive MIMO Systems

In this paper, we study the problem of optimizing the performance of multi-user millimeter-wave (mmWave) communications in three steps. The first step uses a new pilot mapping to reduce inter-user interference and to enable more accurate channel estimation. In the second step, we design a hybrid receiver that, based on the accuracy of the channel state information, chooses between the minimum mean square error (MMSE) and the multi-user regularized zero-forcing beamforming (RZFBF) receivers to combine/precode the massive multiple-input multiple-output (MIMO) signal. In the third step, we propose to improve the beam direction with a slight change in the azimuth angle during uplink communications to increase the multi-user efficiency and reduce inter-user interference. Numerical results show the performance gain of the proposed solutions in terms of spectral efficiency by comparing the MMSE, RZFBF, and hybrid receivers.


I. INTRODUCTION
Millimeter-wave (mmWave) technology enables low-latency communications and multi-gigabit data rates that widely leverage the potential of the fifth-generation (5G) New Radio (NR) standard. The mmWave band allows packing massive multi-antenna arrays onto a small base station (BS) and dozens of antenna elements onto the user equipment (UE). This feature enables the use of genuinely massive multiple-input multiple-output (MIMO) technology for multi-user scenarios with very narrow beams that can achieve maximum directional gain [1], [2].
Fully digital beamforming is the state-of-the-art method that achieves the maximum spectral efficiency with massive MIMO systems. However, at mmWave frequencies, digital beamforming is prohibitively complex because of the hardware cost of a dedicated radio-frequency (RF) chain per antenna in tiny array panels. Instead, hybrid beamforming techniques with variable phase shifters for analog RF processing are integrated to provide high-dimensional phase control, whereas digital baseband processing can be performed in a reduced dimension. The objective of all hybrid beamforming techniques is to reduce the hardware and signal processing complexity while providing performance close to that achieved with fully digital beamforming [3]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Fang Yang.
Fully- and partially-connected structures are the leading designs for hybrid beamforming, which can significantly reduce the hardware cost and power consumption by employing a small number of RF chains. In [5], [8], a thorough analysis comparing fully digital beamforming to the fully- and partially-connected hybrid structures can be found. Many studies state that a fully-connected hybrid beamforming structure, with the correct signal processing strategy, can achieve a spectral efficiency close to that of fully digital beamforming [9]-[12]. However, the computational complexity required for a fully-connected structure to approach the spectral efficiency of fully digital beamforming is considerably high. Moreover, this kind of architecture is not widely used for commercial radios since a phase shifter per antenna at mmWave frequencies has a high manufacturing cost [8]. This is the main reason partially-connected hybrid structures are extensively used in commercial mmWave systems. With the correct signal processing techniques, partially-connected structures can achieve performance comparable to fully-connected structures [4]. However, technologies for performance improvement should focus on low-complexity designs to strike the right balance between hardware complexity, computational complexity, and spectral efficiency.
Although commercial mobile systems currently use the mmWave band, there is an opportunity to increase the spectral efficiency without resorting to computationally complex techniques. Standard solutions used to improve the performance of these systems include methods related to channel estimation and coding, enhanced receiver techniques, MIMO combining and precoding [13], beamforming designs [14], and new coherent and non-coherent detection techniques, among others [9].
In [15], a method for spectral efficiency improvement known as spatially sparse precoding is proposed, considering low hardware-complexity architectures and realistic channel models. The solution is based on the linear combination of beam steering vectors that can be applied with analog RF precoders and combined digitally at baseband. The constraint of this solution is that it can be prohibitively complex since the received signal needs to be linearly combined in the analog domain before any other processing task. The same strategy is used with the low-complexity minimum mean squared error (MMSE) receiver to find the optimal baseband combiner, separating the analog and digital domains. However, this work only considered single-user precoding and assumed perfect channel state information (CSI) at the receiver.
In [16], [17], a multi-user regularized zero-forcing beamforming (RZFBF) receiver is proposed. By making the signal components add destructively at the non-intended users, the RZFBF receiver eliminates the intra-group user interference. However, this solution is an approximation to the optimal ZF beamformer given the nature of the regularization parameter (which needs to be appropriately managed) and the power allocation of the multiple signals. Moreover, in the mmWave band, the challenge is to accurately estimate the CSI of the multiple UEs, which is required by the RZFBF technique. Without an accurate estimate of each user's channel, the intra-group user interference cannot be canceled with this technique [18].
To overcome the channel estimation challenge present in multi-user scenarios, the work in [19] uses beam training and orthogonal pilot allocation for multi-user multi-stream mmWave massive MIMO systems. This approach considers the allocation of several UEs, reducing the overhead of beam training while slightly sacrificing the achievable spectral efficiency, and concludes that with a new pilot mapping and a limited number of simultaneously served UEs, the performance can be increased.
In this paper, we propose a novel design framework to increase the spectral efficiency of multi-user mmWave systems by exploiting three low-complexity techniques. We consider the effect of limited mmWave scattering and large tightly packed arrays, imperfect CSI at the receiver, realistic mmWave channels, and partially-connected hybrid structures. The contributions of this work are summarized as follows:
• With independent streams over multiple channels that share the same frequency and time resources, we propose a new pilot mapping design to overcome inter-user interference without increasing the complexity of the CSI estimation task.
• To fully exploit the availability of multi-user CSI, the MMSE and the RZFBF methods are proposed in a hybrid receiver that, based on the CSI quality, switches between the two MIMO techniques to generate the combining and precoding matrices.
• Considering that beamforming is essential for mobile mmWave communications, and given the time required to select a beam, we propose to refine the beam direction during the uplink (UL) transmission to achieve higher beamforming gains.
Also, simulation results based on the proposed pilot mapping, hybrid receiver, and beam refinement show the dynamic behavior of the combinations of these three techniques and their impact on the performance in terms of spectral efficiency.
The remainder of this work is organized as follows. Section II presents the system description, which includes an overview of hybrid array structures, the system model, and the mmWave channel model. The new pilot mapping, the hybrid receiver design, and the proposed beam-steering improvement are outlined in Section III. In Section IV, the numerical results of the proposed solutions are presented and discussed. Finally, concluding remarks and future directions are given in Section V.
Notation: Scalars are denoted in lower case. Bold upper case and lower case denote matrices and vectors, respectively. For any general matrix or vector, $\mathbf{x}^T$ represents the transpose and $\mathbf{x}^*$ the conjugate transpose. We use $\mathrm{diag}(x_1, \dots, x_n)$ to represent a diagonal matrix whose diagonal elements are the corresponding components from matrix $\mathbf{X}$. $\mathbf{I}_N$ is an identity matrix of dimension $N$, whereas $\mathbf{1}_N$ is an all-ones vector of dimension $N$. $\|\cdot\|_F$ represents the Frobenius norm operator. $\mathbb{E}[\cdot]$ denotes the expected value. $\mathbb{C}$ denotes the set of complex numbers, and $\mathbb{N}$ denotes the set of natural numbers. Finally, a circularly symmetric complex Gaussian stochastic vector is written as $\mathbf{x} \sim \mathcal{CN}(\mu_x, \sigma_x^2)$ with mean $\mu_x$ and variance $\sigma_x^2$.

II. SYSTEM DESCRIPTION
MmWave communications allow us to transmit large amounts of data thanks to the high bandwidth availability. Still, given the much smaller wavelength at mmWave frequencies, the channel effects are more severe, resulting in a high probability of low coverage and signal blockage due to the delay spread, Doppler spread, and atmospheric effects, but primarily due to high path loss. However, at those frequencies, with massive arrays of antennas, the resultant antenna gain with beamforming transmission can reduce the severity of path loss and also reduce delay spread and the Doppler effect [1], [20]. As stated in [5], for massive MIMO systems, fully digital beamforming is the state-of-the-art technology that can provide the maximum beamforming gain. However, fully digital processing is hard to realize at mmWave frequencies with wide bandwidths and large antenna arrays since the baseband combining/precoding processing assumes the use of a dedicated RF chain per antenna, which involves high hardware cost and power consumption. Instead, a hybrid beamforming structure that consists of a reduced number of RF chains facilitates multi-stream transmission with spatial multiplexing, based on digital baseband processing, followed by high-dimensional phase shifter-based analog processing [4], [8], [21].
VOLUME 8, 2020
Hybrid beamforming structures reduce the hardware cost and power consumption of mmWave systems. They come primarily in two designs: fully-connected, in which each RF chain is connected to all antennas, and partially-connected, in which each RF chain is connected to a set of antenna elements (subarray). The fully-connected structure provides full beamforming gain per RF chain with U × M dedicated phase shifters, where U is the number of transceivers (RF chains), and M is the number of antennas at the BS. The partially-connected structure requires only M dedicated phase shifters and is therefore more energy efficient. In [5], a thorough analysis related to the performance of hybrid beamforming structures is provided. On average, the spectral efficiency of the fully-connected structure is 5 bits/s/Hz higher than that of the partially-connected design. However, many commercial mmWave 5G radios are built on a partially-connected design given that mmWave signal processing entails high power consumption due to the high sampling rate and high bit-resolution requirements [1], [3], [4], [22]-[24]. Thus, we chose the partially-connected structure to describe our system model.
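To make the hardware difference concrete, the following minimal sketch counts the phase shifters of both structures. The function name and the example values (U = 8 RF chains, 64 antennas per subarray) are illustrative assumptions, chosen only to resemble the array sizes discussed later in the paper.

```python
def phase_shifter_count(num_rf_chains: int, num_antennas: int, structure: str) -> int:
    """Count analog phase shifters in a hybrid array (illustrative helper).

    fully-connected:     every RF chain feeds every antenna -> U * M
    partially-connected: every antenna belongs to one subarray -> M
    """
    if structure == "fully":
        return num_rf_chains * num_antennas
    if structure == "partially":
        return num_antennas
    raise ValueError("structure must be 'fully' or 'partially'")


# Assumed example values, not taken from Table 1.
U = 8            # RF chains (transceivers)
N_r = 64         # antennas per subarray
M = U * N_r      # total antennas at the BS

print(phase_shifter_count(U, M, "fully"))      # 4096
print(phase_shifter_count(U, M, "partially"))  # 512
```

Under these assumed values, the partially-connected design needs a factor of U fewer phase shifters, which is the source of its lower cost and power consumption.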

A. SYSTEM MODEL
We consider a single-cell multi-user mmWave system for the analysis, where a BS communicates simultaneously with K randomly located UEs within the coverage area, as it is shown in Fig. 1.
The block diagram on the left-hand side of Fig. 1 shows a mmWave BS equipped with a partially-connected hybrid array structure with $U$ transceivers and $M = U N_r$ antennas, where each RF chain is connected to a subarray of $N_r$ antennas through analog RF precoders (phase shifters and gain control units). With this hardware architecture, the number of simultaneously served UEs that can be adequately spatially separated is limited by the number of transceivers at the BS, $K \le U$. The right-hand side of Fig. 1 shows the block diagram of the $K$ UEs. For the UEs, analog RF precoding is performed over the $N_t$ RF paths of each UE, whereas digital baseband precoding is performed over the UE-specific transceiver. This design makes it possible to apply a baseband precoder $\mathbf{f}^{\mathrm{BB}}_{k} \in \mathbb{C}^{N_t \times 1}$ using the UE RF chain, followed by an analog RF precoder $\mathbf{F}^{\mathrm{RF}}_{k} \in \mathbb{C}^{N_t \times N_t}$. Therefore, the transmitted signal from the $k$th UE is $\mathbf{F}^{\mathrm{RF}}_{k} \mathbf{f}^{\mathrm{BB}}_{k} \mathbf{s}_k$, where $\mathbf{s}_k \in \mathbb{C}^{1 \times N_s}$ is a symbol vector such that $\mathbb{E}[\mathbf{s}_k^{*} \mathbf{s}_k] = \frac{1}{N_s} \mathbf{I}_{N_s}$, and $N_s$ is the number of symbols that compose the data stream. The RF precoder is expressed as in (1), where $f^{\mathrm{par}}_{\mathrm{RF},N_t}$ is the RF precoder matrix element for the partially-connected hybrid array [8], [25], [26]. We assume that the BS first selects the RF beamforming vectors for each UE from a beam steering RF codebook; this process is described in Section III-C. In practical mmWave systems, the CSI is estimated via UL training with the pilots received at the BS; with this estimate, MIMO combining and precoding can be performed. In our system design, we consider the least squares (LS) technique to estimate the CSI and, with the estimated channel, combine the received signal in the digital domain [27], [28].
In the UL, the signal $\mathbf{Y}_u \in \mathbb{C}^{N_r \times N_s}$ received at the $u$th RF chain of the BS and transmitted by the $k$th UE, which is equipped with $N_t$ antennas, is given by (2), where $\mathbf{H}_{k,u}, \mathbf{H}_{j,u} \in \mathbb{C}^{N_r \times N_t}$ are the massive MIMO channels of the $k$th and $j$th UEs, respectively, $\mathbf{F}^{\mathrm{RF}}_{k} \mathbf{f}^{\mathrm{BB}}_{k} \mathbf{s}_k$ denotes the precoded symbols from the $k$th UE, and $\mathbf{F}^{\mathrm{RF}}_{j} \mathbf{f}^{\mathrm{BB}}_{j} \mathbf{s}_j$ denotes the precoded symbols from the $j$th UEs. It is assumed that all data symbols are independent, $\mathbb{E}[|s_k|^2] = \rho_k$ for $k = 1, 2, \dots, K$, and $\mathbb{E}[|s_j|^2] = \rho_j$ for $j = 1, 2, \dots, K$, $j \neq k$, where $\rho_k$ and $\rho_j$ represent the transmission power of the $k$th and $j$th UEs, respectively. Finally, $\mathbf{N}_u \in \mathbb{C}^{N_r \times N_s}$ is the noise at the $u$th RF chain of the BS, with i.i.d. $\mathcal{CN}(0, \sigma^2_{N_u})$ entries [15], [26].
After estimating the massive MIMO channel, the received signal can be combined in the analog and digital domains before any symbol detection and decoding are performed. The processed received symbol vector $\hat{\mathbf{s}}_k$ from the $k$th UE is given in (3), where $\mathbf{w}^{\mathrm{BB}}_{k} \in \mathbb{C}^{N_r \times 1}$ and $\mathbf{W}^{\mathrm{RF}}_{u} \in \mathbb{C}^{N_r \times N_r}$ are the baseband and RF combining matrices, respectively [15].
The signal-to-interference-plus-noise ratio (SINR) for the $k$th UE, measured at the $u$th RF chain, is expressed as in (4), where we assume equal power allocation for each stream [11], [25]. Assuming Gaussian symbols are transmitted over the massive MIMO channel, the achievable sum spectral efficiency of the system is written as in (5). According to [15], [26], the sum spectral efficiency of a partially-connected hybrid array depends on the correlation of the channels between the different subarrays, the accuracy of the mmWave channel estimation, and the SINR of the received signal.
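As a rough illustration of how per-UE SINR values translate into a sum spectral efficiency, the sketch below assumes the usual Shannon-type expression for Gaussian symbols, i.e., the sum of log2(1 + SINR_k) over the served UEs. This is an assumption about the general form only; the exact expressions (4) and (5) are not reproduced here, and the function name and input values are hypothetical.

```python
import math


def sum_spectral_efficiency(sinr_db_per_ue):
    """Sum of log2(1 + SINR_k) in bits/s/Hz, with SINR values given in dB.

    Assumes Gaussian signaling and one stream per UE (illustrative only).
    """
    total = 0.0
    for sinr_db in sinr_db_per_ue:
        sinr_linear = 10 ** (sinr_db / 10)   # dB -> linear scale
        total += math.log2(1 + sinr_linear)
    return total


# Hypothetical SINR values for four multiplexed UEs.
example_sinrs_db = [0.0, 5.0, 10.0, 15.0]
print(round(sum_spectral_efficiency(example_sinrs_db), 2))
```

A single UE at 0 dB SINR contributes exactly 1 bit/s/Hz in this model, which is a convenient sanity check.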
Since we considered a partially-connected array, we assumed the availability of only quantized angles for the RF phase shifters, where the analog precoding/combining vectors, $\mathbf{F}_{\mathrm{RF}} \in \mathcal{F}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}} \in \mathcal{W}_{\mathrm{RF}}$ respectively, can take only pre-established values from finite codebooks. In this work, we considered beam steering codebooks, where the response vector can be parameterized in azimuth and elevation angles. With this, $\mathcal{F}_{\mathrm{RF}}$ is the codebook of the feasible RF precoding vectors, whereas $\mathcal{W}_{\mathrm{RF}}$ denotes the codebook of RF combining vectors. Both codebooks are similarly defined and can be jointly optimized; however, to make the optimization problem tractable, we configured a predefined set of finite RF codebooks to run the simulations [29].

B. CHANNEL MODEL
Given the multi-path nature of mmWave systems, the channel is characterized by high path loss and high spatial correlation. As a result, some spatial directions are statistically more likely to contain signal components than others, which leads to limited spatial selectivity or scattering [2].
Using the clustered channel model and considering a uniform planar array (UPA), the narrowband channel matrix $\mathbf{H}_k$ from the $k$th UE, assuming the superposition of $N_r$ physical propagation paths, can be written as in (6), where $\alpha_{k,l}$ is the complex gain of the $l$th ray. Statistical MIMO channel models describe the $l$th paths as arriving in clusters where, for each cluster, the model derives the angle of departure (the angle between the transmitter and the scattering cluster) and the angle of arrival (the angle between the receiver and the scattering cluster). Thus, each path cluster $l$, $l = 1, \dots, L$, is described by the angle of arrival pair $(\theta^{r}_{k,l}, \phi^{r}_{k,l})$ and the angle of departure pair $(\theta^{t}_{k,l}, \phi^{t}_{k,l})$. The functions $\Lambda_r(\theta^{r}_{k,l}, \phi^{r}_{k,l})$ and $\Lambda_t(\theta^{t}_{k,l}, \phi^{t}_{k,l})$ denote the receive and transmit antenna element gains at the corresponding angles of arrival and departure. Finally, the array steering vectors $\mathbf{a}_r(\theta^{r}_{k,l}, \phi^{r}_{k,l})$ and $\mathbf{a}_t(\theta^{t}_{k,l}, \phi^{t}_{k,l})$ denote the array phase profiles as a function of the angular directions of arriving or departing plane waves, respectively. The azimuth angles of arrival and departure, $\phi^{r}_{k,l} \in [\phi^{r}_{\min}, \phi^{r}_{\max}]$ and $\phi^{t}_{k,l} \in [\phi^{t}_{\min}, \phi^{t}_{\max}]$ respectively, are randomly distributed and are defined over an azimuth sector, with a minimum and maximum angle of arrival and departure. Similarly, the elevation angles of arrival and departure, $\theta^{r}_{k,l} \in [\theta^{r}_{\min}, \theta^{r}_{\max}]$ and $\theta^{t}_{k,l} \in [\theta^{t}_{\min}, \theta^{t}_{\max}]$, are randomly distributed and are defined with a minimum and maximum angle of arrival and departure to cover an elevation sector [2], [15], [30].
According to [15], the steering vector response for the $N_r$-element UPA subarray in the $yz$-plane, with $Y$ and $Z$ antenna elements on the $y$ and $z$ axes respectively, is a function of the azimuth angle $\phi$ and the elevation angle $\theta$ and is given by (7), where $\kappa = 2\pi/\lambda$, $\lambda$ denotes the wavelength of operation, and $d$ is the separation distance between array elements. The index $0 \le y < Y$ corresponds to the $y$ position and $0 \le z < Z$ to the $z$ position of an antenna element, resulting in a subarray size of $N_r = YZ$. This steering vector response enables 3D beamforming, which is of interest in this work to simulate practical systems: since many materials can easily block mmWave signals, the scattering clusters can appear and disappear. Thus, it is essential to perform beam sweeping in both the azimuth and elevation angles to establish a beamforming link between the UEs and the BS.
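The steering vector can be sketched numerically as follows. The code assumes the commonly used UPA expression associated with [15], in which the element at position (y, z) has phase κd(y sin φ sin θ + z cos θ) and the vector is scaled by 1/√N_r; the exact form of (7) is not reproduced in this text, so this is an assumption. The function name and the half-wavelength spacing default are also illustrative.

```python
import cmath
import math


def upa_steering_vector(azimuth_rad, elevation_rad, num_y, num_z,
                        spacing_over_lambda=0.5):
    """Unit-norm steering vector of a num_y x num_z UPA in the yz-plane.

    Assumed element phase: kappa*d*(y*sin(phi)*sin(theta) + z*cos(theta)),
    with kappa*d = 2*pi*(d/lambda). Half-wavelength spacing by default.
    """
    kappa_d = 2 * math.pi * spacing_over_lambda
    scale = 1 / math.sqrt(num_y * num_z)
    vector = []
    for y in range(num_y):            # 0 <= y < Y
        for z in range(num_z):        # 0 <= z < Z
            phase = kappa_d * (
                y * math.sin(azimuth_rad) * math.sin(elevation_rad)
                + z * math.cos(elevation_rad)
            )
            vector.append(scale * cmath.exp(1j * phase))
    return vector


# 8 x 8 subarray (N_r = 64) steered to 15 degrees azimuth, 90 degrees elevation.
a = upa_steering_vector(math.radians(15), math.radians(90), 8, 8)
print(len(a))  # 64
```

Because every entry has magnitude 1/√N_r, the vector has unit norm regardless of the steering angles.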

III. STRATEGIES DESIGNED TO IMPROVE THE SPECTRAL EFFICIENCY FOR MULTI-USER mmWave SYSTEMS
In this section, we describe the techniques developed to achieve higher spectral efficiency in multi-user mmWave systems.

A. NEW PILOT MAPPING
Although it is well known that inter-user interference can be avoided with the use of orthogonal pilots, this assumption may not hold for mmWave systems. By considering the high path loss of the mmWave band and its effects on pilot orthogonality, we provide the following remark.
Remark 1: Pilot orthogonality among UEs holds if the received pilots meet the minimum signal-to-noise ratio (SNR) requirements.
For NR systems, the modulation scheme used for the demodulation reference signals (DMRS), commonly known as pilots, is quadrature phase-shift keying (QPSK), which requires a minimum SNR of 10 dB. However, in the mmWave band, the SNR available for channel estimation is often low because of the high path loss [26], [31]. So, to tackle this orthogonality disadvantage, we propose a new pilot mapping capable of mitigating the inter-user interference effects. Fig. 2 shows the new pilot strategy based on the DMRS Type 1 mapping described in the 3GPP Release 15 specifications [24]. Given that Type 1 double-symbol DMRS can provide up to eight orthogonal signals, we modify this structure and rearrange the pilots to suppress the inter-user interference.
The plot on the left-hand side of Fig. 2 corresponds to the NR physical resource block (PRB), which consists of the data to be transmitted scheduled in a group of 12 consecutive subcarriers in the frequency domain. In the time domain, the transmission is organized in slots, each consisting of 14 adjacent orthogonal frequency-division multiplexing (OFDM) symbols. Groups of PRBs are scheduled in the frequency domain, and those groups are assigned to a specific antenna port per UE [32] to be transmitted in a time slot. The right-hand side of the figure shows the new pilot mapping designed for simultaneous communication with up to eight UEs. With this design, pilots for the first four UEs are mapped in the third OFDM symbol of their respective PRBs, whereas the fourth symbol remains empty. The pilots of the remaining UEs are mapped in the fourth OFDM symbol, whereas the third symbol is empty. Pilots are mapped in odd- or even-numbered subcarriers to avoid inter-user interference; e.g., ports 1000 and 1001 use odd-numbered subcarriers in the frequency domain, whereas ports 1002 and 1003 use even-numbered subcarriers, and this design is repeated for the rest of the antenna ports.
Given that the PRBs are scheduled in the frequency domain, we can reduce the inter-user interference even further by sending pilots only in odd- or even-numbered PRBs. With the proposed pilot mapping, the UE in port 1000 schedules the pilot transmission only in even-numbered PRBs, whereas the UE in port 1001 schedules the pilot transmission only in odd-numbered PRBs. The same strategy is repeated for the remaining UEs. Finally, with the new pilot mapping, frequency-domain interpolation is needed because channel estimation must be completed for the entire bandwidth. Thus, fast Fourier transform (FFT) interpolation between subcarriers should be used to prevent CSI quality loss [33].
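The mapping rules described above can be summarized in a short sketch. It assumes zero-based subcarrier and PRB indices (so "odd-numbered" means index 1, 3, 5, ...), one-based OFDM symbol numbers, and that the port pattern of 1000 to 1003 repeats for ports 1004 to 1007; the function name and these indexing conventions are illustrative assumptions, not part of the 3GPP specification.

```python
def pilot_resources(port, num_prbs, subcarriers_per_prb=12):
    """Return the set of (ofdm_symbol, prb, subcarrier) pilot positions of a port.

    Rules taken from the text, with assumed zero-based PRB/subcarrier indices:
      - ports 1000-1003 use OFDM symbol 3, ports 1004-1007 use symbol 4
      - within each group of four, the first two ports use odd subcarriers,
        the last two use even subcarriers
      - even ports use even-numbered PRBs, odd ports use odd-numbered PRBs
    """
    index = port - 1000
    if not 0 <= index < 8:
        raise ValueError("port must be in 1000..1007")
    ofdm_symbol = 3 if index < 4 else 4
    uses_odd_subcarriers = (index % 4) < 2
    uses_even_prbs = (index % 2) == 0

    resources = set()
    for prb in range(num_prbs):
        if (prb % 2 == 0) != uses_even_prbs:
            continue  # this PRB carries no pilots for this port
        for subcarrier in range(subcarriers_per_prb):
            if (subcarrier % 2 == 1) == uses_odd_subcarriers:
                resources.add((ofdm_symbol, prb, subcarrier))
    return resources


# Check that no two ports share a pilot resource element over four PRBs.
ports = range(1000, 1008)
maps = {p: pilot_resources(p, num_prbs=4) for p in ports}
overlap = any(maps[p] & maps[q] for p in ports for q in ports if p < q)
print(overlap)  # False
```

Under these assumptions, each port occupies every other subcarrier of every other PRB, so the BS must interpolate the channel estimate across the gaps, as noted in the text.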

B. HYBRID RECEIVER DESIGN
Practical mmWave systems commonly use the linear MMSE receiver, a low-complexity implementation used to perform MIMO combining/precoding. This receiver is capable of combining the received signal of the intended UE, partially rejecting inter-user interference provoked by the simultaneous transmission of data streams from K terminals to the BS.
By defining the effective analog channel of the $k$th UE as in (8), the baseband MMSE combiner $\mathbf{w}^{\mathrm{mmse}}_{k} \in \mathbb{C}^{N_r \times 1}$ is given by (9), where $\hat{\mathbf{h}}^{\mathrm{eff}}_{k,u} \in \mathbb{C}^{N_r \times 1}$ is the estimated channel from the $k$th UE, obtained at the $u$th RF chain of the BS. The MMSE combining method only requires the CSI of the intended UE, so the CSI of the remaining multiplexed UEs does not contribute to the performance of this receiver [24], [34]. The combining technique capable of suppressing the inter-user interference is the multi-user RZFBF receiver, which is a close approximation of the optimal beamformer [16]. The problem of the optimal combiner $\mathbf{w}^{\mathrm{opt}}_{j} \in \mathbb{C}^{N_r \times 1}$ is stated in (10). We define $\mathbf{C} = [\hat{\mathbf{h}}^{\mathrm{eff}}_{1,1}, \dots, \hat{\mathbf{h}}^{\mathrm{eff}}_{k,u}, \dots, \hat{\mathbf{h}}^{\mathrm{eff}}_{K,U}]$ as the combined channel matrix, $\mathbf{C} \in \mathbb{C}^{N_r \times K}$. Additionally, by defining an identity matrix $\mathbf{E} = \mathbf{I}_K \in \mathbb{N}^{K \times K}$, we denote the $j$th column vector of $\mathbf{E}$, $\mathbf{e}_{*,j} = [1, 0, \dots, 0]^T \in \mathbb{N}^{K \times 1}$ for $j = 1$, as the $j$th zero-interference vector [35]. Then, the RZFBF receiver becomes a practical solution to (10), which is written as in (11). However, the RZFBF combiner is constrained by the accuracy of the CSI of all the multiplexed UEs. This constraint is more severe for mmWave systems, where the high path loss makes the channel estimation task difficult. The RZFBF technique achieves an optimal MIMO combination with perfect CSI of all the UEs, which is impractical for real systems. Still, the MMSE receiver is capable of mitigating the coherent interference with an accurate estimate of the intended UE's channel. Thus, we propose the use of a hybrid receiver, which is the combination of the MMSE and RZFBF techniques [17].
Using the pilots received from the different UEs, the hybrid receiver chooses between the MMSE and the RZFBF methods according to the Euclidean distance of the received combined pilots to their perfect constellation points, as shown in Fig. 3. The combiner that yields the shortest Euclidean distance is used to perform the combining/precoding operations [17]. Usually, it is not necessary to apply MIMO combining to the received pilots, since the BS already knows these reference signals. However, by applying MIMO combining to the received pilots, we can measure the accuracy of the CSI. The more accurate the CSI is, the smaller the Euclidean distance between the combined pilots and the pilots known by the BS.
The hybrid combiner is stated in (12), where $\mathbf{x}_k$ is the $1 \times N_p$ pilot vector known a priori by the BS, which represents the perfect constellation points, and $N_p$ denotes the number of pilot symbols, subject to $N_p \le N_s$. In this way, the hybrid receiver chooses the combining technique whose combined pilots have the shortest Euclidean distance to their perfect constellation points. However, whether one combining method outperforms the other depends on the accuracy of the CSI of the multiplexed UEs, which is challenging to obtain in high path loss scenarios [1], [28]. For this reason, we propose to reduce the path loss along with the multi-user interference with a beam-steering improvement technique, which is described in the next subsection.
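The selection rule can be illustrated with a minimal sketch. It assumes the pilots have already been combined with each candidate combiner, so only the distance comparison is shown; the function names, the QPSK pilot values, and the synthetic error offsets are hypothetical and serve only to exercise the rule.

```python
import math


def pilot_distance(combined_pilots, known_pilots):
    """Euclidean distance between combined pilots and their ideal constellation points."""
    return math.sqrt(
        sum(abs(r - x) ** 2 for r, x in zip(combined_pilots, known_pilots))
    )


def select_combiner(candidates, known_pilots):
    """Return the name of the candidate whose combined pilots are closest to the known ones.

    candidates: dict mapping a combiner name to its combined pilot sequence.
    """
    return min(
        candidates,
        key=lambda name: pilot_distance(candidates[name], known_pilots),
    )


# Known QPSK pilots (unit energy) and two synthetic combiner outputs.
s = 1 / math.sqrt(2)
known = [complex(s, s), complex(-s, s), complex(-s, -s), complex(s, -s)]
candidates = {
    "mmse": [x + 0.3 for x in known],    # larger residual error (assumed)
    "rzfbf": [x + 0.1j for x in known],  # smaller residual error (assumed)
}
print(select_combiner(candidates, known))  # rzfbf
```

In this toy case the RZFBF output lies closer to the ideal points, so it is selected; with less accurate multi-user CSI, the comparison could favor the MMSE output instead.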

C. BEAM-STEERING IMPROVEMENT
We propose to perform a BS beam refinement during the UL communication, after the synchronization signal block (SSB) beam sweeping process finishes, as illustrated in Fig. 4. With semi-static TDD, the SSB beam sweeping process (P-1) is performed in 5 ms; during this time, the BS tracks the positions of the multiple UEs within a localized burst set, where up to 64 SSBs can be transmitted in different beams with pre-set azimuth and elevation angles. The UE identifies the best SSB and reports this information to the BS to establish a fixed beamforming link. Usually, this process is repeated every 20 ms; this implies that a beam keeps the same azimuth and elevation angles during the remaining 15 ms of UL and downlink (DL) communications [36].
With the BS beam refinement, our objective is to analyze the performance of the hybrid receiver described in Section III-B with the spatially sparse precoding solution introduced in [36], which consists of maximizing the SINR in (4) with new analog and digital combiners, $\mathbf{W}^{\mathrm{new}}_{\mathrm{RF}}$ and $\mathbf{w}^{\mathrm{new}}_{\mathrm{BB}}$, respectively. The combining optimization problem is stated in (13), where $\mathcal{W}^{k}_{\mathrm{RF}}$ is a set of RF combining vectors limited to changing the beam direction for the $k$th UE. Because the received signal is linearly combined in the analog domain before any other process, the RF combiner $\mathbf{W}_{\mathrm{RF}}$ can be adjusted to a narrower steering azimuth angle with a new RF combining matrix $\mathbf{W}^{\mathrm{new}}_{\mathrm{RF}}$. In this way, the beam-steering vector changes the direction of the pre-set beam, which can reach more scattering clusters or stronger multi-paths of the received signal [15], [24]. This procedure can be performed in the UL after the beam sweeping process finishes. However, to keep this process tractable, we propose to change the beam direction established in the initial BS beam acquisition only in the azimuth angle, by ±2° and ±4°. For example, an established beam with a given elevation angle $\theta$ and an azimuth angle of 15° can refine the azimuth direction to angles of 11°, 13°, 17°, and 19°. In this way, during the UL communication, the BS adjusts the azimuth direction with the new combining matrices saved in the codebook $\mathcal{W}^{k}_{\mathrm{RF}}$ and fixes the combiner that shows the highest SINR. With this, a new RF precoder $\mathbf{F}^{\mathrm{new}}_{\mathrm{RF}}$ can be established to perform DL communication.
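The refinement search described above amounts to evaluating four candidate azimuth angles and keeping the one with the highest measured UL SINR. The sketch below assumes a caller-supplied measurement function; `refine_azimuth`, `measure_sinr_db`, and the toy SINR model are hypothetical names introduced for illustration, and the initial beam itself is not re-evaluated, following the description in the text.

```python
def refine_azimuth(initial_azimuth_deg, measure_sinr_db,
                   offsets_deg=(-4, -2, 2, 4)):
    """Pick the refined azimuth angle with the highest measured SINR.

    measure_sinr_db: callable mapping an azimuth angle (degrees) to the UL SINR
    (dB) observed with the corresponding RF combining matrix (assumed available).
    Returns (best_angle, candidate_angles).
    """
    candidates = [initial_azimuth_deg + offset for offset in offsets_deg]
    best = max(candidates, key=measure_sinr_db)
    return best, candidates


def toy_sinr_db(azimuth_deg):
    """Toy SINR model that peaks when the beam points at 12.5 degrees."""
    return 20.0 - (azimuth_deg - 12.5) ** 2


best, candidates = refine_azimuth(15, toy_sinr_db)
print(candidates)  # [11, 13, 17, 19]
print(best)        # 13
```

With an initial azimuth of 15°, the candidates match the example in the text, and the toy model selects 13° because it is the candidate closest to the assumed UE direction.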
In Fig. 4, after the SSB beam sweeping finishes, the BS sets the fifth beam to communicate with the UE, but the UE is positioned slightly to the left-hand side of the beam. With the beam-steering improvement, the azimuth direction of the fifth beam changes to that of the first of the four new beams obtained with the new RF precoding matrices. In this way, the new beam points more directly at the UE. With such a slight change in the azimuth direction, the spectral efficiency of a single UE changes only slightly. Still, by repeating this process for each one of the multiplexed UEs in the cell, the sum spectral efficiency can increase significantly. With the proposed beamforming refinement, the signal power from the desired UE increases, and at the same time, the interference to non-intended UEs decreases [16].

IV. RESULTS AND DISCUSSIONS
In this section, we present simulation results to show the performance achieved with the new pilot mapping plus the hybrid precoding algorithm described in Section III. We provide numerical results under imperfect CSI and a single cell scenario. The simulation parameters are given in Table 1, based on the NR specifications developed by the 3GPP Rel-15 standard [32], and the settings of the phased array antenna package for 28 GHz 5G radio access described in [23], [24], [37].
With the parameters of Table 1, we simulated the partially-connected hybrid array illustrated in Fig. 1. To simulate the antenna array, we used the 3GPP-mmWave channel model. The channel between the transmitter and receiver was simulated with the quasi-deterministic radio channel generator (QuaDRiGa), using the mmMAGIC Indoor model [38]. In Fig. 5, we present the simulated antenna array, which consists of 2 × 2 subarrays of antennas referred to as panels.
In Fig. 5, a panel is a uniform planar array (UPA) with 8×8 dual-polarized antennas. Each panel has two RF chains and can generate one beam per polarization using analog RF precoding [24]. Thus, we end up modeling eight mmWave channels, one per beam/polarization, that present high correlation, given that practical MIMO channels are spatially correlated due to the propagation environment and the geometry, separation, and polarization of the antennas and panels [34]. Fig. 6 plots the beamforming pattern generated by a transmitter with a 64-element subarray for a channel realization between the BS and the intended UE inside the cell.
FIGURE 4. After the SSB beam sweeping process (P-1) finishes, the beam-steering improvement process (P-2) is performed. With a codebook of $\mathbf{W}^{\mathrm{new}}_{\mathrm{RF}}$ combining matrices, the direction of the initial beam between the BS and the UE is changed in four azimuth angles of ±2° and ±4°, respectively. The beam-steering angle with the highest SINR, measured in the UL, is chosen and fixed to be used for the DL communication.
In Fig. 6, we simulate a single-cell scenario with eight UEs (Rx-positions), each randomly distributed to the east side of the BS (Tx-position), which is located at position (0, 0) (x, y coordinates in [m]). The BS, equipped with four dual-polarized panels, starts to synchronize with each UE using the SSB beam sweeping process [36]. With each panel followed by an RF precoder $\mathbf{F}_{\mathrm{RF}}$, the BS sends the SS burst signal in a given angular section of the cell (±15° in elevation and ±60° in azimuth). After the SSB transmission finishes, in case of line-of-sight (LoS), the BS establishes a beam pattern aimed at every UE in the cell. For non-line-of-sight (NLoS) scenarios, the beam pattern aims at the strongest multi-path between the intended UE and the BS.
For fairness, we set each UE to 0 dBm of transmission power, and the path loss between the BS and each UE is different for every channel realization. However, for our simulations, we varied the transmission power of the different UEs in the cell, together with the beamforming gain achieved during the SSB process, through a multivariate normal distribution. Thus, the signal power follows a distribution $\rho_k \sim \mathcal{CN}(\rho, \boldsymbol{\Sigma})$, where $\rho$ denotes the mean value of the UE transmission power $\rho_k$ plus the peak beam gain, and $\boldsymbol{\Sigma}$ is the covariance matrix, which corresponds to a variance of 0.25 to 4 dB around the mean value of the distribution. This way, the power of the received signal can change according to the channel gain as well as the beamforming gain.
According to Table 1, the peak beam gain is 24 dBi, and the maximum transmission power for a UE is limited to 23 dBm [39], so we choose to vary the transmission power from 0 to 40 dBm, which corresponds to the UE transmission power plus the beam gain, to run our simulations. The noise power at the BS was set based on the 3GPP NR specifications for the BS reference sensitivity level. The configured noise power $\sigma^2_{N_u}$ is given by
$$\sigma^2_{N_u}\,[\mathrm{dBm}] = -174\,\mathrm{dBm} + 10\log_{10} B + N_F + I_M, \tag{14}$$
where $B$ is the maximum transmission bandwidth, which is 400 MHz, $N_F$ is the BS noise figure, equal to 5.7 dB for mmWave radios, and $I_M$ is the implementation margin, equal to 2 dB. Thus, the resulting noise power at the BS was −106 dBm [40].
Once the beam pattern is established, the UL communication can start with the new pilot mapping described in Section III-A. Channel estimation is then performed at the BS. In Fig. 7, we compare the estimation performed with normal pilot mapping (state-of-the-art) to the estimation performed with the new pilot design through the mean squared error (MSE) metric.
As shown in Fig. 7, the accuracy of the CSI was improved by employing the new pilot design, given that the MSE achieved with this method is smaller than the MSE obtained with normal pilot mapping. With both methods, the MSE reduces when the power of the received signal increases; however, the difference between the two results remains the same. Now we present the simulation results to demonstrate the performance of the proposed solutions, comparing the spectral efficiency achieved with normal pilot mapping to that of the new pilot design.
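As a point of reference for the MSE metric used in Fig. 7, the following minimal sketch computes the mean squared error between a set of true channel coefficients and their estimates, expressed in dB. The function name `channel_mse_db` and the use of an unnormalized MSE are assumptions; the exact normalization used for Fig. 7 is not stated in this section.

```python
import math


def channel_mse_db(h_true, h_est):
    """Mean squared error between true and estimated channel coefficients, in dB.

    Assumption: the MSE is not normalized by the channel power.
    """
    if len(h_true) != len(h_est) or not h_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_est)) / len(h_true)
    if mse == 0.0:
        return float("-inf")  # perfect estimate
    return 10.0 * math.log10(mse)


# Hypothetical two-coefficient channel and a slightly perturbed estimate.
h_true = [1 + 0j, 0 + 1j]
h_est = [1.1 + 0j, 0 + 0.9j]
mse_db = channel_mse_db(h_true, h_est)  # close to -20 dB
```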

A. SINGLE-USER SPECTRAL EFFICIENCY
For every UE in Fig. 6, the achievable spectral efficiency is constrained by the path loss and the beamforming gain, which directly influence the power of the received signal ρ. With this assumption, Fig. 8 shows the single-user spectral efficiency achieved when the path loss is high, where we compared the proposed new pilot design to normal pilot mapping using the MMSE and the multi-user RZFBF receivers. Fig. 8 demonstrates that with the proposed pilot strategy, both receivers show higher spectral efficiency. When the power of the received signal ρ is 20 dBm, the spectral efficiency of the MMSE technique passes from 2.76 to 12.83 bits/s/Hz, and from 2.84 to 12.46 bits/s/Hz with RZFBF. The figure also shows that the MMSE receiver achieves higher efficiency than the RZFBF method with and without the new pilot mapping, especially when the power of the received signal is high (ρ > 20 dBm). Both methods are limited by the CSI accuracy, especially the RZFBF receiver due to the need for accurate CSI of every UE in the cell; this is not the case for the MMSE receiver, which only requires the CSI of the intended UE. Now, we present the results for low path loss in Fig. 9, which shows that the gain of the new pilot mapping is negligible when the received signal power is strong. In this case, the efficiency gain obtained with the RZFBF receiver is higher than that of the MMSE method.

FIGURE 9.
The figure compares the single-user spectral efficiency achieved with the MMSE and RZFBF receivers when a new pilot mapping is used, and the path loss of the received signal is small. In this case, the efficiency achieved with the new pilot strategy is almost the same as with the normal pilot mapping. Also, the RZFBF receiver shows a higher performance gain than the MMSE method.
For the majority of the UEs, the MMSE receiver performs better than the RZFBF method. Still, the latter performs better when the CSI of the multiple UEs is accurate. We can see that the performance of the RZFBF method is constrained by the quality of the CSI of the multiple UEs. In contrast, the MMSE receiver only requires the CSI of the intended UE, not the CSI of the other UEs inside the cell.

B. MULTI-USER SUM SPECTRAL EFFICIENCY
In this section, we present the sum spectral efficiency of the multi-user mmWave system. Fig. 10 shows the spectral efficiency achieved with the MMSE and RZFBF receivers plus the efficiency achieved with the hybrid receiver described in Section III-B. As in the case of single-user results, we compared the performance of the new pilot strategy to that of normal pilot mapping. Fig. 10 demonstrates that the MMSE receiver shows higher efficiency than the RZFBF and hybrid methods in the high signal power regime. In contrast, when the power of the received signal is weak, all the receivers show similar results. At high signal power, the efficiency of the MMSE receiver is around 15 bits/s/Hz higher than that of the RZFBF method, whereas the efficiency of the proposed hybrid receiver approaches that of the MMSE receiver, but remains smaller.
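To make the sum spectral efficiency metric concrete, the sketch below adds up the per-UE rates $\log_2(1 + \mathrm{SINR}_k)$. This is the standard Shannon-type expression and is used here as an assumption; the exact expression behind Fig. 10 (for example, any pilot-overhead factor) is not given in this section, and the function names are hypothetical.

```python
import math


def spectral_efficiency(sinr_db):
    """Per-UE spectral efficiency in bits/s/Hz for a given SINR in dB."""
    sinr_linear = 10.0 ** (sinr_db / 10.0)
    return math.log2(1.0 + sinr_linear)


def sum_spectral_efficiency(sinrs_db):
    """Sum spectral efficiency over all simultaneously served UEs."""
    return sum(spectral_efficiency(s) for s in sinrs_db)


# Hypothetical post-combining SINR values (dB) for the eight UEs of Fig. 6.
sinrs_db = [0.0] * 8
total = sum_spectral_efficiency(sinrs_db)  # 8 UEs at 0 dB give 1 bit/s/Hz each
```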
As in the case of single-user spectral efficiency, the new pilot strategy outperforms the normal pilot mapping when the signal power is small. With mmWave communications, we can assume weak signal power at the receiver. Thus, with the sum spectral efficiency, we demonstrated that the proposed pilot mapping could help to achieve higher spectral efficiency without the need for other techniques. Now, we present the spectral efficiency achieved with the beam refinement strategy described in Section III-C, with and without applying the new pilot mapping. Fig. 11 shows the results of the MMSE and hybrid receivers, where we evaluated the effect of the RF combining matrix change. Fig. 11 indicates that the MMSE receiver shows higher spectral efficiency with the new pilot and beam steering strategies in the low signal power regime. However, this is not the case for strong signal power, since the efficiency of the MMSE receiver then converges to the case of normal pilot mapping even with the beam refinement. On the other hand, the hybrid receiver can adjust better to the new combining matrix $W_{RF}^{new}$ and, in this case, it shows higher spectral efficiency than the MMSE receiver. For both receivers, the spectral efficiency at low signal power is higher with the new pilot and $W_{RF}^{new}$ strategies, whereas with strong signal power the efficiency converges to the case of normal pilot mapping.
Finally, Fig. 12 shows the multi-user sum spectral efficiency achieved with fully digital beamforming compared to the proposed scheme, which consists of a new pilot mapping and hybrid receiver design used with a partially-connected hybrid structure. For this comparison, we did not consider the proposed UL beamforming refinement since this technique can be used with both beamforming structures.
As can be seen in Fig. 12, in the low signal power regime, the proposed new pilot mapping and hybrid receiver strategies reduce the inter-user interference and achieve higher spectral efficiency than that achieved with fully digital beamforming. These results show that at low signal power, even with a fully digital beamforming design, we can expect high inter-user interference that cannot be decreased by adding more RF chains at the BS. If the signal received at the mmWave BS is weak, the inter-user interference can be high enough to reduce the achievable spectral efficiency, since the multi-user pilots cannot meet the minimum SNR necessary for orthogonality, even with fully digital beamforming. When the signal power is between 15 and 25 dB, both beamforming designs present similar results. However, in the high signal power regime (ρ > 25 dB), the digital beamforming spectral efficiency outperforms the proposed solutions with a partially-connected hybrid structure.

C. PERFORMANCE AND COMPLEXITY COMPARISON
In terms of performance, firstly, with the new pilot strategy, the MSE of the channel estimation reduces significantly: the difference between normal pilot mapping and the new pilot design is around 139 dB. Lower MSE results in a more accurate estimation of the multiple channels of the UEs in the mmWave system. However, lower MSE can also be achieved when the power of the received signal increases, given that with multi-user systems, the UL training is performed with orthogonal pilots (Remark 1). Thus, at high SNR levels, both methods of pilot mapping will produce similar results in terms of spectral efficiency, as can be seen in Fig. 10, where the efficiency of the new pilot design converges to that of the normal pilot mapping in the high signal power regime with the MMSE, RZFBF, and hybrid receivers.
Secondly, the MIMO receiver performance depends on many factors such as the power of the received signal, noise and interference levels, the scenario, size of the antenna array, and so on. However, the combining and precoding operations depend on the CSI quality since it is required to remove the effects of the channel on the received signal. The performance of the MMSE method is related uniquely to the accuracy of the CSI of the intended UE. In contrast, the performance of the RZFBF method depends on the quality of the CSI of all the UEs simultaneously multiplexed. In the low signal power regime, the performance of the MMSE receiver is similar to that of the RZFBF receiver. However, in the high signal power regime, the efficiency of the RZFBF receiver can be smaller than, the same as, or higher than that of the MMSE receiver. It all depends on the CSI quality of the multiple channels of the UEs, as can be seen in Fig. 8 and Fig. 9 for single-user spectral efficiency. If accurate CSI of the intended UE is available, but the accuracy of the CSI of the other UEs is low, the MMSE receiver will perform better than the RZFBF method. On the other hand, if accurate CSI of the intended UE and of all the other UEs is available, the RZFBF method will show better results. Thus, we propose to use a hybrid receiver, which chooses between the two combining techniques and applies the one that generates more accurate combined pilots. The CSI accuracy is measured through the Euclidean distance between the received pilot symbols and their perfect constellation points, as described in Section III-B.
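The selection rule of the hybrid receiver can be sketched as follows, assuming that the pilots have already been combined with each technique. The function names, the QPSK reference pilots, and the tie-breaking in favor of MMSE are assumptions for illustration; the combining expressions themselves are those of Section III-B and are not reproduced here.

```python
import math


def pilot_distance(combined_pilots, reference_pilots):
    """Euclidean distance between combined pilots and their ideal constellation points."""
    return math.sqrt(
        sum(abs(y - s) ** 2 for y, s in zip(combined_pilots, reference_pilots))
    )


def choose_receiver(pilots_mmse, pilots_rzfbf, reference_pilots):
    """Return (name, distance) of the combiner whose pilots are closest to the reference.

    Assumption: ties are resolved in favor of the MMSE combiner.
    """
    d_mmse = pilot_distance(pilots_mmse, reference_pilots)
    d_rzfbf = pilot_distance(pilots_rzfbf, reference_pilots)
    if d_mmse <= d_rzfbf:
        return "MMSE", d_mmse
    return "RZFBF", d_rzfbf


# Hypothetical QPSK pilots and the outputs of the two combiners.
reference = [complex(1, 1) / math.sqrt(2), complex(-1, 1) / math.sqrt(2)]
after_mmse = [s + 0.1 for s in reference]    # larger residual error
after_rzfbf = [s + 0.05j for s in reference]  # smaller residual error
name, distance = choose_receiver(after_mmse, after_rzfbf, reference)
```

With these hypothetical values the RZFBF output is closer to the reference constellation, so the RZFBF combiner would be applied.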
Thirdly, the performance of the beam refinement method, in terms of the multi-user sum spectral efficiency, is on average 40 bits/s/Hz (5 bits/s/Hz per UE) higher using the proposed hybrid receiver and new pilot design compared to the linear MMSE receiver with normal pilot mapping and codebook beamforming without beam refinement. The efficiency of the beam refinement method holds in both the low and high signal power regimes. This performance increase is due to the fact that this solution can potentially increase the SNR of the received signal while simultaneously reducing the interference to the multiple UEs in the network.
Finally, it is possible to approach the spectral efficiency of fully digital beamforming with a partially-connected beamforming structure when the techniques proposed in this study are employed, as illustrated in Fig. 12. However, this result is restricted to the low SNR regime, given that the spectral efficiency of fully digital beamforming is higher than that of the proposed solutions in the high SNR regime. With fully digital beamforming, we can expect a beamforming gain proportional to the number of antennas at the mmWave BS, whereas the gain achievable with a partially-connected structure is proportional to the number of antennas of the subarray. Still, the partially-connected structure is of particular interest in this study since it allows us to perform multi-stream beamforming at a considerably lower hardware complexity and cost.
In terms of complexity, the number of operations required for channel estimation, assuming linear interpolation, is the same for normal pilot mapping as for the new pilot design, $2KL_pM = 4KL_{new}M$. On the other hand, the operations required for the proposed hybrid receiver in (12) are given by the sum of the operations needed to perform MMSE and RZFBF combining. However, the RZFBF method requires fewer operations than the MMSE receiver since the cost of the matrix inversion scales with $K^3$ rather than $M^3$ [34]. So, the hybrid receiver does not significantly increase the operations required for linear MIMO receivers like MMSE. Last, the proposed UL beam refinement method requires only a matrix multiplication since we propose a codebook design of $W_{RF}^{new}$ combining matrices limited to four azimuth angles [8]. To conclude this section, the number of operations required to perform the proposed scheme is only slightly higher than that of the state-of-the-art methods, which keeps the scheme suitable for practical mmWave systems.
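The operation counts quoted above can be checked with a short sketch. The numerical values of $K$, $M$, $L_p$, and $L_{new}$ below are hypothetical and chosen only so that $L_p = 2L_{new}$, which is what the equality $2KL_pM = 4KL_{new}M$ implies; the helper names are also assumptions.

```python
def estimation_ops_normal(K, L_p, M):
    """Channel-estimation operations with normal pilot mapping: 2*K*L_p*M."""
    return 2 * K * L_p * M


def estimation_ops_new(K, L_new, M):
    """Channel-estimation operations with the new pilot design: 4*K*L_new*M."""
    return 4 * K * L_new * M


def inversion_ops(n):
    """Cubic scaling of an n-by-n matrix inversion (constant factors omitted)."""
    return n ** 3


# Hypothetical dimensions: 8 UEs, dimension M = 64, pilot counts with L_p = 2 * L_new.
K, M, L_p, L_new = 8, 64, 4, 2
same_cost = estimation_ops_normal(K, L_p, M) == estimation_ops_new(K, L_new, M)
rzfbf_inversion = inversion_ops(K)  # scales with K^3
mmse_inversion = inversion_ops(M)   # scales with M^3
```

For these values the two pilot mappings need the same number of estimation operations, and the $K^3$ inversion term is far smaller than the $M^3$ term, which is why adding the RZFBF branch to the hybrid receiver adds little to the MMSE cost.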

V. CONCLUSION AND FUTURE DIRECTIONS
In this paper, we developed a hybrid receiver and a new pilot design for multi-user mmWave systems. By considering the pilot mapping of realistic NR systems, we developed a low-complexity pilot mapping solution that helps in the estimation of the CSI of mmWave channels, avoiding the inter-user interference phenomenon. We formulated the problem of multi-user mmWave precoding as a hybrid receiver design in which the received pilots are combined with the MMSE and RZFBF techniques. Based on the shortest Euclidean distance between the received pilots and their perfect constellation points, the algorithm chooses between the MMSE and RZFBF receivers to perform MIMO combining and precoding. Given that the multi-user RZFBF approach requires fewer computational operations than the MMSE combiner, we showed that the hybrid algorithm could be applied to practical mmWave systems.
We showed that the new pilot strategy alone yields higher spectral efficiency. However, the gain of the proposed pilot mapping is limited to high path loss scenarios, given that with low path loss, the performance of the new pilot design converges to that of normal pilot mapping.
Numerical results showed that a hybrid receiver could adjust better to a change in the RF combining matrix, given that, with this change at the BS, there is a better chance to estimate the channels of the multiple UEs accurately. Accurate multi-channel estimation makes the RZFBF combining method perform better than the MMSE receiver since the latter depends on single-user CSI. Accurate CSI of the multiple UEs does not contribute to the MMSE technique the way it helps the RZFBF receiver. Additionally, when the path loss is high, channel estimation quality decreases, and with this, the RZFBF technique cannot perform as well as the MMSE receiver. However, when the path loss is small, the channel estimation quality improves, and if these conditions hold for every UE, then the RZFBF combining will perform better than the MMSE receiver.
Finally, we highlight future research directions concerning multi-user mmWave systems as follows.
• Fully-connected hybrid beamforming: According to [8], [10], the achievable spectral efficiency of a fully-connected hybrid beamforming structure could approach that obtained with fully digital beamforming with the appropriate signal processing design. However, hardware implementation and computational complexity are currently prohibitively high for the practical implementation of this kind of structure. The techniques proposed in this paper can be extended to fully-connected hybrid structures and could help to achieve the spectral efficiency obtained with fully digital beamforming, since the research goal is to reduce computational complexity.
• Non-Orthogonal Multiple Access (NOMA) mmWave communications: The applicability of NOMA to mmWave communications is studied for beyond 5G standards. Specifically, for dense multi-user scenarios, NOMA has been proven to perform close to or even better than multi-user MIMO at the cost of higher computational complexity, which has restricted the use of these systems in current NR systems [41]-[43]. Nevertheless, the potential of NOMA should be studied based on practical mmWave systems, like fully- and partially-connected hybrid beamforming structures, considering imperfect CSI, channel correlation, beamforming management, multi-cell scenarios, etc.
• mmWave vehicular communications: mmWave systems for high mobility scenarios face hard challenges like frequent hand-off due to small cell sizes and a high probability of signal blockage. The solutions proposed in this paper could increase the spectral efficiency of cellular vehicle-to-everything (C-V2X) communications. However, accurate vehicular data for simulations are required, and these studies should rely on the 3GPP release 16 specifications where the C-V2X communications standard is addressed [20], [44].
• Hybrid beamforming and deep neural networks (DNNs): DNNs have been proven to optimize beam management, especially for high mobility wireless communications [45]. Still, a DNN approach should be studied for fully- and partially-connected structures to reduce computational complexity and increase spectral efficiency. The goal is to find more applications of DNNs for 5G and beyond communications.