Hybrid Beamforming and Data Stream Allocation Algorithms for Power Minimization in Multi-User Massive MIMO-OFDM Systems

A hybrid beamforming structure has been suggested as one of the solutions to reduce implementation costs and energy consumption in millimeter-wave massive multiple-input, multiple-output (MIMO) systems. In this study, a resource allocation optimization problem was formulated to minimize the total downlink transmission power under a given quality-of-service (QoS) requirement for each user, such as bit/block error rates and data rates, and a solution to it was proposed. The proposed stream incremental algorithm dynamically adjusts the number of data streams for each user according to the channel state information. Precoding and combining schemes, which are needed to solve the formulated problem, are also proposed in this paper to be paired with the stream incremental algorithm. The proposed algorithms consider practical modulation and coding schemes for transmission in various data stream allocation and beamforming designs. They provide beamforming solutions for millimeter-wave massive MIMO systems that achieve performance comparable to that of a fully digital block-diagonalization (BD) algorithm at a lower implementation cost and that outperform modified existing hybrid beamforming algorithms. Simulation results demonstrate the efficacy of allocating different numbers of data streams to each user according to the channel state information. They also verify that the proposed method achieves a good trade-off between complexity and performance in comparison with the modified existing schemes and the fully digital solutions.

less cellular communication systems [1], [2]. Smaller cells are attractive for operation at millimeter-wave frequencies, where the path loss is significantly high. The shorter wavelengths associated with higher frequencies are suitable for massive multiple-input, multiple-output (MIMO) designs because of the decreased antenna spacing and the smaller size of the related electronics. Therefore, massive MIMO can provide a large beamforming gain to compensate for the high path loss of millimeter-wave frequency carriers [3]. A key challenge is that each antenna in a MIMO system generally requires a dedicated radio frequency (RF) chain [4]. Because of this, a hybrid beamforming structure has been suggested as one of the solutions to reduce the implementation cost and energy consumption. In the existing papers on resource allocation, most studies aim to maximize the total throughput subject to specific power constraints. At present, few reports consider the power control problems [5], [6], [7], [8]. However, in the

The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Khawaja.

in [32] study the single-carrier scenario and use the interference-plus-noise suppression criterion to design the digital beamformer; the authors of [37], [38], and [39] considered multi-carrier scenarios and used, respectively, the criteria of maximizing the signal-to-leakage-plus-noise ratio, weighted sum mean square error minimization, and minimum mean square error (MMSE) to design the digital beamformer. The papers mentioned above do not guarantee that all interference is eliminated to satisfy the constraints on bit error rates and block error rates (BERs/BLERs).
In addition, although researchers considered multi-carrier and multi-user scenarios in [40] and [41], their user equipment was set to a single antenna, so their methods are not feasible in our scenario. In [5], [6], [7], and [8], the authors considered the power control problems for a narrowband single-antenna millimeter-wave massive MIMO system. We investigated multi-carrier and multi-antenna millimeter-wave massive MIMO systems in which the data rates, block error rates, and practical modulation and coding schemes of each user are considered. Based on the above discussion, the methods of these papers could not be applied directly to solve the optimization problem of this study or be compared with the proposed schemes. To the best of our knowledge, this scenario setting along with the problem formulation has not been investigated so far.

In this paper, we investigated a resource allocation problem for multi-carrier (OFDM) and multi-user millimeter-wave systems and used a multi-antenna hybrid architecture at the transceiver to design the beamforming algorithm that pairs with the resource allocation solution for the formulated problem. The main contributions are summarized as follows.

• Common optimization problems in this field are either capacity problems, which maximize the total throughput subject to specific power constraints, or power control problems, which minimize the transmission power under a certain quality-of-service (QoS) for each user. For the millimeter-wave large antenna array systems under consideration, most papers discuss the capacity problems. Up to now, only a few studies have investigated the above power control problems.
However, in these few studies, a single-carrier transmission method was used, the number of user antennas was one, and the received SINR required for each user to meet the error rate and data rate constraints was not considered. We studied the power control problems for multi-user and multi-carrier systems and adopted the multi-antenna hybrid architecture at the transceiver. The constraints on the data rates and block error rates, with practical modulation and coding schemes for each user, were considered in minimizing the transmission power. To the best of our knowledge, the resource allocation problem along with the beamforming design for the above scenario has not yet been considered in the literature.

• The existing studies assumed that each user transmits the same number of data streams. can be adjusted to reduce computational complexity.

Simulation results also verify that the proposed algorithms can achieve a good trade-off between complexity and performance compared with these existing schemes.

The following notations are used throughout this paper. The remainder of this paper is organized as follows.

In Section II, we introduce the adopted hybrid beamforming structure, the channel model, and the problem formulation. In Section III, two beamforming algorithms are introduced for comparison, and we propose a beamforming algorithm as the main contribution to solve the formulated optimization problem. These include the modified phase extraction alternating minimization (PE-AltMin) hybrid precoding/combining, the multi-carrier extension of the fully digital block diagonalization precoding/combining, and the proposed two-stage hybrid precoding/combining. In addition, we discuss the proposed stream incremental algorithm based on the MCS index allocation/equivalent bit incremental loading algorithm. In Section IV, the computational complexity of the proposed hybrid precoding/combining schemes is analyzed. Simulation results are presented in Section V. Finally, Section VI presents the conclusion of this paper.

A downlink multi-user millimeter-wave MIMO-OFDM system, as shown in Fig. 1, was considered. A base station equipped with $N_t$ antennas and $N_{tRF}$ RF chains simultaneously transmits $N_s$ data streams to serve the selected users over $K$ subcarriers using OFDM. Each user equipment is equipped with $N_r$ antennas and $N_{rRF}$ RF chains to receive $n_{s,u}$ data streams, where $n_{s,u}$ is the number of data streams received by the $u$-th user. The transmitted data streams are allocated among all $U$ users, so that $n_{s,1} + n_{s,2} + \cdots + n_{s,U} = N_s$. The existing papers assumed that each user is allocated the same number of data streams, $n_{s,1} = n_{s,2} = \cdots = n_{s,U}$. However, allocating different numbers of data streams according to the channel state information of each user may further improve the performance. Therefore, we allowed the number of data streams allocated to each user to differ. The conditions $N_s \le N_{tRF} < N_t$ and $n_{s,u} \le N_{rRF} < N_r$ must be satisfied to support the multiplexing of the data streams.
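The dimension constraints of the system model can be checked mechanically before any beamformer is designed. The following is a minimal sketch; the function name and the example sizes are illustrative assumptions and do not come from the paper.

```python
def check_stream_allocation(n_s, N_s, N_t, N_tRF, N_r, N_rRF):
    """Return True if the per-user stream counts satisfy the system-model constraints."""
    if sum(n_s) != N_s:                # every transmitted stream is assigned to a user
        return False
    if not (N_s <= N_tRF < N_t):       # base-station side: streams <= RF chains < antennas
        return False
    return all(n <= N_rRF < N_r for n in n_s)   # user side, checked per user


# Hypothetical example: 5 streams shared unevenly among 4 users.
ok = check_stream_allocation([2, 1, 1, 1], N_s=5, N_t=64, N_tRF=8, N_r=4, N_rRF=2)
print(ok)
```

An uneven allocation such as `[2, 1, 1, 1]` passes as long as no user exceeds its number of RF chains.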

In the hybrid beamforming structure, the input data symbols are precoded by a low-dimensional digital precoder $F_{BB}[k] \in \mathbb{C}^{N_{tRF} \times N_s}$, $k = 1, \cdots, K$. After digital precoding by $F_{BB}[k]$, the signals are transformed from the frequency domain to the time domain by $N_{tRF}$ $K$-point inverse fast Fourier transforms (IFFTs), and cyclic prefixes (CPs) are added.

Next, an analog precoder $F_{RF} \in \mathbb{C}^{N_t \times N_{tRF}}$ was adopted to generate the final transmitted signal. The analog precoder $F_{RF}$ operates in the time domain and is shared across the entire bandwidth, while the digital precoder is applied on a per-subcarrier basis, which complicates the resulting broadband hybrid precoding design problem [10].

The final transmitted signal for the $u$-th user on the $k$-th subcarrier is written as

In particular, the analog precoder $F_{RF}$ and the analog combiners $W_{RFu}$, $u = 1, \cdots, U$, were implemented by phase shifters. We adopted the most common fully connected, single phase-shifter architecture [32] in this paper.
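The two-stage precoding described above can be illustrated on a single subcarrier. The sketch below assumes the usual frequency-domain equivalent, in which the transmitted vector is the analog precoder applied to the digitally precoded symbols, and it ignores the IFFT and cyclic prefix. The helper names, the toy dimensions, and the $1/\sqrt{N_t}$ scaling of the phase-shifter entries are assumptions made for illustration.

```python
import cmath
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows (complex entries allowed)."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def hybrid_transmit(F_RF, F_BB_k, s_k):
    """Apply the digital precoder F_BB[k], then the shared analog precoder F_RF."""
    return matmul(F_RF, matmul(F_BB_k, s_k))


# Toy sizes (hypothetical): N_t = 4 antennas, N_tRF = 2 RF chains, N_s = 2 streams.
N_t, N_tRF = 4, 2
# Phase-shifter entries: constant modulus 1/sqrt(N_t), arbitrary phases.
F_RF = [[cmath.exp(1j * 0.3 * i * (j + 1)) / math.sqrt(N_t) for j in range(N_tRF)]
        for i in range(N_t)]
F_BB_k = [[1.0, 0.0], [0.0, 1.0]]          # identity digital precoder, for illustration only
s_k = [[1 + 0j], [-1 + 0j]]                # one symbol per stream
x_k = hybrid_transmit(F_RF, F_BB_k, s_k)   # N_t x 1 transmitted vector
```

Only `F_BB_k` would change from one subcarrier to the next; `F_RF` stays fixed over the whole band, which is what couples the per-subcarrier designs.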

The millimeter-wave propagation can be well characterized

where $\alpha_{i,l} \sim \mathcal{CN}(0, N_t N_r P_l / N_{ray})$ is the complex gain of the $i$-th propagation path in the $l$-th cluster; $pl_u$ is the path loss for the $u$-th user; and $\phi^r_{i,l}$ and $\phi^t_{i,l}$ are the angles of arrival and departure for the $i$-th propagation path in the $l$-th cluster, respectively.

Further, $a_r(\cdot)$ and $a_t(\cdot)$ are the antenna array response vectors for the receiver and the transmitter, respectively. We considered the scenario of $N_{cl}$ clusters and $N_{ray}$ scatterers per cluster [44], in which the angles of arrival (departure) follow the Laplacian distribution with random mean cluster angles $\phi^r_{cl} \in [0, 2\pi]$ ($\phi^t_{cl} \in [0, 2\pi]$) and angular spreads of $\theta_{as}$ within each cluster. This paper adopted the uniform linear array configuration with $N$ antennas and antenna spacing $d$. Therefore, the antenna array response vector was expressed as

where $\lambda$ is the signal wavelength and $d$ is usually set to $\lambda/2$.
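For reference, a uniform linear array response can be generated as in the sketch below. It uses the textbook form with element $n$ equal to $e^{j 2\pi (d/\lambda) n \sin\phi} / \sqrt{N}$; the unit-norm scaling and the function name are assumptions, since the paper's own expression is not reproduced in this text.

```python
import cmath
import math


def ula_response(phi, n_antennas, spacing_over_wavelength=0.5):
    """Unit-norm ULA response vector for angle phi (radians).

    Element n is exp(j*2*pi*(d/lambda)*n*sin(phi)) / sqrt(N), n = 0..N-1.
    The default spacing corresponds to d = lambda/2.
    """
    scale = 1.0 / math.sqrt(n_antennas)
    step = 2.0 * math.pi * spacing_over_wavelength * math.sin(phi)
    return [scale * cmath.exp(1j * step * n) for n in range(n_antennas)]


a = ula_response(math.pi / 6, 8)   # 8 antennas, 30 degrees off broadside
```

With half-wavelength spacing, the phase advance between adjacent elements is $\pi \sin\phi$, so a 30 degree angle gives a quarter-cycle step per element.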

By applying the fast Fourier transform (FFT) to the time domain channel, the frequency domain channel coefficient between the base station and the $u$-th user on the $k$-th subcarrier can be expressed as

As motivated in Section I, the scenario under consideration is multi-user and multi-carrier, and we adopted the multi-antenna hybrid architecture at the transceiver. We considered the data rate, the block error rate, and the practical modulation and coding scheme of each user in minimizing the transmission power. Based on our observations, and owing to channel hardening, subcarrier allocation is no longer required [9]: with OFDM transmission, the subcarriers in a massive MIMO system have similar channel gains after beamforming. Therefore, the entire bandwidth is employed for each user. We assumed that the receiver can perfectly obtain the channel state information.

VOLUME 10, 2022

The optimization problem was formulated as

Assume that $f_k(r_{u,n,k})$ is the required receive SINR for a particular BLER, such as $10^{-1}$. Then, the required transmission power for the $n$-th data stream on the $k$-th subcarrier for the $u$-th user can be expressed as

The beamforming weight vectors for the $u$-th user on the $k$-th subcarrier are denoted as

The main challenge in this multi-user spectrum sharing

The authors of an earlier report [33] developed an alternating minimization (AltMin) algorithm, mainly for a single-user system. Extending this scheme to multi-user and multi-carrier systems to solve the formulated optimization problem under the constraints in this paper requires some adaptation. Because MCS index selection is used to satisfy the BERs/BLERs, the interference must be eliminated. A cascaded digital combiner is adopted for this purpose; that is, a cascade of additional block-diagonalization (BD) precoders, based on the effective channel idea, was used to cancel the residual interference [42]. For a multi-carrier system design, we arranged the fully digital precoders of all subcarriers into a larger $N_t \times K N_s$ combined fully digital precoder and then dealt with the optimization problem. After designing the hybrid beamformer through these steps, we defined an effective channel for the $u$-th user on the $k$-th subcarrier as

$$\hat{H}_u[k] = W_{BBu}^H[k]\, W_{RFu}^H\, H_u[k]\, F_{RF}\, F_{BB}[k],$$

where $W_{BBu}[k]$ is the $N_{rRF} \times n_{s,u}$ digital combiner for the $u$-th user on the $k$-th subcarrier, $W_{RFu}$ is the $N_r \times N_{rRF}$ analog combiner for the $u$-th user, $H_u[k]$ is the frequency domain channel for the $u$-th user on the $k$-th subcarrier, $F_{RF}$ is the $N_t \times N_{tRF}$ common analog precoder, and $F_{BB}[k]$ is the $N_{tRF} \times N_s$ combined digital precoder on the $k$-th subcarrier. After some derivations, the effective channel $\hat{H}_u[k]$ is used as the input of the block-diagonalization (BD) algorithm to obtain $F_{BD}[k]$ and $W_{BDu}[k]$ for the cascaded digital combiner; the inter-stream interference was thereby eliminated so that the optimization problem could be solved for the comparison in this paper. The processing steps are illustrated by the pseudo-code in Algorithm 1, labeled Modified PE-AltMin Algorithm.
Although this method has a low complexity, its performance depends on the initial input of the fully digital solutions.

The fully digital block-diagonalization (BD) algorithm was proposed earlier [45] for a single-carrier scenario. For a multi-carrier system, the following constraint was imposed herein on each subcarrier to eliminate inter-user interference:

$$H_i[k]\, F_j[k] = 0, \quad \forall\, i \neq j,$$

where $H_i[k]$ is the frequency domain channel for the $i$-th user on the $k$-th subcarrier, and $F_j[k]$ is the digital precoder for the $j$-th user on the $k$-th subcarrier. The processing scheme is illustrated by the pseudo-code in Algorithm 2, labeled Fully Digital BD Algorithm. This scheme serves as a baseline for comparison with the proposed hybrid beamforming described next.

Algorithm 1 Steps by Using PE-AltMin Algorithm
RF with a feasible phase and set i = 0.
2: repeat

Algorithm 2 Steps by Using Fully Digital BD Algorithm

The above fully digital BD method can achieve good performance in the traditional rich scattering Rayleigh fading channel and the millimeter-wave channel with limited scattering. adjusting the number of data streams to achieve better performance.

In a single-carrier scenario, the analog precoder design and the analog combiner design may be viewed as two independent stages [34], [43]. The analog precoder is assumed to be an optimal para-unitary matrix on a per-subcarrier basis [37]. Then, the analog combiner is designed to have the largest array gain for each user, regardless of the inter-user interference. The optimal analog combiner for the $u$-th user can be expressed as

Next, we can partition these two matrices $\bar{\Sigma}_u$ and $\bar{U}_u$ as

After the analog combiners are designed, we regard the product of the analog combiner and the frequency domain channel as an equivalent channel. Then, the analog precoder $F_{RF} = [F_{RF1}, F_{RF2}, \cdots, F_{RFU}]$ is designed to have the largest array gain, regardless of the inter-user interference. The derivation is similar to that of the analog combiner above, such that the analog precoder for the $u$-th user, $F_{RFu}$, is the first $n_{s,u}$ columns of the matrix $\tilde{V}_u$, expressed as $\tilde{V}_{u1}$. Then, the phases are extracted to satisfy the constant-modulus constraints. The entire process of the analog beamformer design is shown by the pseudo-code in Algorithm 3-1, denoted as Proposed Hybrid Beamforming Algorithm (Analog).
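The phase extraction step mentioned above can be sketched as follows: each entry of the unconstrained matrix keeps its phase and is given a common modulus. The $1/\sqrt{\text{rows}}$ scaling and the function name are assumptions for illustration; the input matrix here is arbitrary rather than an actual singular-vector block.

```python
import cmath
import math


def extract_phases(matrix):
    """Keep only the phase of each entry and rescale to modulus 1/sqrt(rows)."""
    scale = 1.0 / math.sqrt(len(matrix))
    return [[scale * cmath.exp(1j * cmath.phase(v)) for v in row] for row in matrix]


# Arbitrary 2 x 2 complex matrix standing in for the leading singular vectors.
V = [[0.8 + 0.1j, -0.2 + 0.5j],
     [0.1 - 0.6j, 0.7 + 0.0j]]
F_RF = extract_phases(V)
```

Every entry of `F_RF` then has the same magnitude, which is what a network of phase shifters can realize.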
After solving the problem in the analog beamformer design

the $u$-th user on the $k$-th subcarrier is expressed as

The proposed joint transmit-receive processing scheme is performed on $H_{eq}[k]$, instead of the baseband equivalent channel, as the input of Algorithm 2. Some inter-user interference is allowed in the transmitted signals; that is, the interference is not completely canceled at the transmitter. The residual interference depends on the nulling of the beam pattern $W_u[k]$. In other words, it is intended to be canceled at the combiner output.

Based on the above discussion, the next step is to determine the expression of $W_u[k]$. The SVD of the baseband equivalent channel is expressed as

Then, the receiving combiner for the $u$-th user on the $k$-th subcarrier may be conjectured as

is the first $n_{s,u}$ columns of the matrix $\check{U}_u[k]$, because the transmitter temporarily assumes this to be the best combiner structure; it is not necessarily the final combiner.

After the above processing, the digital beamformer is further subject to the following constraint to eliminate the inter-user interference.

In the absence of interference, the precoder and the combiner can be designed based on the SVD of this channel as

index allocation algorithm is derived in detail. This consideration is rarely investigated in hybrid beamforming systems for multi-user multi-carrier scenarios.

The issue of the switching levels for modulation and coding is addressed as follows. Performance curves of SNR versus BLER for 5G NR systems have been presented in the literature. In 3GPP 38.212 [46], the LDPC code replaces the turbo code used in the 4G LTE data channels, and the polar code replaces the convolutional code used in the 4G LTE control channels. Once the switching levels are obtained, the proposed allocation schemes can be applied, and a similar performance trend is observed. A CQI table for the LDPC code is adopted here as an example of the modulation and coding schemes. This table provides fifteen schemes, each corresponding to an MCS index. Five of these fifteen schemes were extracted and employed for simulation, and the related parameters are listed in Table 1. The target BLER is set to 10% by the 3GPP standard. The SNR values in this table provide the switching level for each MCS index that satisfies the 10% BLER.
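The role of the switching levels can be illustrated with a small lookup: given an SNR, pick the highest MCS index whose switching level is met. The numbers below are hypothetical placeholders and are not the values of Table 1; the function names are also illustrative.

```python
import bisect

# HYPOTHETICAL switching levels (dB) and spectral efficiencies (bits/s/Hz);
# the actual values come from Table 1, which is not reproduced here.
SWITCH_SNR_DB = [0.0, 5.0, 10.0, 15.0, 20.0]
SPECTRAL_EFF = [0.5, 1.0, 2.0, 3.0, 4.5]


def select_mcs(snr_db):
    """Return the highest 1-based MCS index whose switching level is met, or 0 if none."""
    return bisect.bisect_right(SWITCH_SNR_DB, snr_db)


def spectral_efficiency(mcs_index):
    """Map an MCS index to its spectral efficiency (0 for 'no transmission')."""
    return 0.0 if mcs_index == 0 else SPECTRAL_EFF[mcs_index - 1]


idx = select_mcs(12.3)
print(idx, spectral_efficiency(idx))
```

The power minimization problem uses the same table in the opposite direction: a chosen MCS index fixes the SNR that must be delivered, and hence the transmit power.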

First, $MCS_u$ denotes the MCS index allocation table, which indicates the modulation and coding schemes selected for the $u$-th user on the different data streams and subcarriers, and is initialized as

The proposed novel approach is described as follows. An MCS index corresponds to a transmission with equivalent loading bits, which allows typical bit loading algorithms to be performed. Therefore, the mapping to the spectral efficiency as the equivalent loading bits for the $n$-th data stream on the $k$-th subcarrier is expressed as

where $MCS_u(n, k)$ is the assigned MCS index for the $n$-th data stream on the $k$-th subcarrier for the $u$-th user, and $g_k(\cdot)$ is a mapping function that maps a specific MCS index to equivalent loading bits according to

Calculate $P_u(n, k)$, for all $n$, $k$.

10: Obtain spectral efficiency by (27), and calculate the increased spectral efficiency SE(n, k).
11: Update $SINR_u(n, k)$ and $R_u(n, k)$ to next MCS index, then update $P_u(n, k)$.
12: bits ← bits − SE(n, k).

If more available data streams need to be transmitted at the base station, the incremental algorithm is performed to allocate the additional data streams based on the criterion of minimizing the total transmission power. In other words, the allocation that yields the maximum power reduction is selected each time an additional data stream is allocated. This rule is repeated iteratively until all of the available data streams are assigned. For the received data streams of a user to be demodulated, $n_{s,u} \le N_{rRF}$ is required. In running the proposed algorithm, the candidate user set $C_1$ for data stream allocation is formed from the users satisfying

First, one version of the proposed schemes, called the full version, is performed by running the procedure through the block labeled (a) of Fig. 2. Each candidate user tries to allocate an additional data stream, and the required power associated with the beamformer is calculated. The case with the lowest power requirement is selected, and the stream is allocated accordingly. This process repeats until $N_{extra} = 0$.
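The incremental loading steps listed above (compute the power, obtain the spectral efficiency increase, move to the next MCS index, and decrement the remaining bits) follow the pattern of a greedy bit-loading loop. The sketch below is one possible reading, not the paper's algorithm: it assumes that the cell with the smallest incremental power is upgraded at each step, that each stream sees an interference-free scalar gain, and it uses a hypothetical MCS table.

```python
import math

# HYPOTHETICAL MCS table: required SNR (dB) and equivalent loading bits per index.
REQ_SNR_DB = [0.0, 5.0, 10.0, 15.0]
BITS = [1.0, 2.0, 4.0, 6.0]


def power_for(mcs_index, gain, noise=1.0):
    """Power needed to reach the switching level of mcs_index (0 means unused)."""
    if mcs_index == 0:
        return 0.0
    return 10.0 ** (REQ_SNR_DB[mcs_index - 1] / 10.0) * noise / gain


def incremental_loading(gains, target_bits):
    """Greedy loading over cells, one cell per (stream n, subcarrier k) pair.

    gains: dict mapping (n, k) -> effective gain of that cell.
    Returns (mcs, total_power), where mcs maps each cell to its MCS index.
    """
    mcs = {cell: 0 for cell in gains}
    bits = target_bits
    while bits > 0:
        best, best_cost = None, math.inf
        for cell, g in gains.items():
            idx = mcs[cell]
            if idx == len(BITS):              # already at the highest MCS
                continue
            cost = power_for(idx + 1, g) - power_for(idx, g)
            if cost < best_cost:              # assumed rule: least extra power wins
                best, best_cost = cell, cost
        if best is None:
            raise ValueError("target bits exceed what the MCS table can carry")
        prev_bits = BITS[mcs[best] - 1] if mcs[best] else 0.0
        mcs[best] += 1
        bits -= BITS[mcs[best] - 1] - prev_bits   # remaining bits after the upgrade
    total = sum(power_for(i, gains[c]) for c, i in mcs.items())
    return mcs, total


mcs, total = incremental_loading({(0, 0): 10.0, (0, 1): 1.0}, target_bits=3.0)
```

In this toy case the strong cell absorbs all the bits, because each of its upgrades costs less power than activating the weak cell.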

A threshold ϒ is set to remove some users with a lower 697 probability of obtaining multiple data stream assignments.

After the screening, the resulting candidate set is expressed

where $C_2$ is the modified set of candidates. Setting a higher value of ϒ brings the performance closer to that of the version with the full user set, but the complexity increases.

The performance loss becomes significant if the value of ϒ is too small. Therefore, the value of ϒ is a trade-off between performance and complexity.

$k \in C_2$, $n_{s,k} \leftarrow n_{s,k} + 1$.
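The full and reduced-complexity versions of the stream incremental algorithm can be sketched with one loop and an optional candidate list. In this sketch, `total_power` is a stand-in for the beamformer design plus loading that returns the total transmission power for a trial allocation, and the screening rule (keep users within the threshold of the best user's metric, in dB) is an assumption, since the screening expression is not reproduced in this text.

```python
def stream_incremental(n_s, n_extra, N_rRF, total_power, candidates=None):
    """Assign n_extra additional streams one at a time.

    n_s: list of current stream counts per user.
    total_power: callable taking a stream-count list and returning total power.
    candidates: optional user indices allowed to receive extra streams
        (reduced-complexity variant); all users if None (full version).
    """
    n_s = list(n_s)
    users = range(len(n_s)) if candidates is None else list(candidates)
    for _ in range(n_extra):
        best_user, best_power = None, float("inf")
        for u in users:
            if n_s[u] >= N_rRF:          # user cannot demodulate more streams
                continue
            trial = n_s.copy()
            trial[u] += 1
            p = total_power(trial)
            if p < best_power:
                best_user, best_power = u, p
        if best_user is None:
            break                        # no candidate can take another stream
        n_s[best_user] += 1
    return n_s


def screen_candidates(metric_db, threshold_db):
    """Keep users whose metric is within threshold_db of the best user (assumed rule)."""
    best = max(metric_db)
    return [u for u, m in enumerate(metric_db) if best - m <= threshold_db]


# Toy power model (hypothetical): each user's power falls as it gets more streams.
weights = [8.0, 2.0, 1.0, 1.0]
toy_power = lambda alloc: sum(w / n for w, n in zip(weights, alloc))

full = stream_incremental([1, 1, 1, 1], 2, N_rRF=2, total_power=toy_power)
kept = screen_candidates([30.0, 22.0, 12.0, 28.0], threshold_db=10.0)
reduced = stream_incremental([1, 1, 1, 1], 2, 2, toy_power, candidates=kept)
```

The number of trial beamformer designs per added stream drops from the number of users to the size of the screened set, which is where the complexity saving comes from.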
is the $(N_{rRF} K) \times N_t$ longitudinal tensor unfolding of the three-dimensional matrix for all equivalent subcarrier channels. The multiplication of the two matrices is performed $K$ times, so the complexity required to calculate $H^{(l)}_{effu}$ is $O(N_{rRF} N_r N_t K)$. After that, the analog precoder $F_{RFu}$ is obtained by calculating the SVD of $H^{(l)}_{effu}$ and taking the first $n_{s,u}$ columns of the matrix $\tilde{V}_u$. Therefore, the complexity required to calculate $F_{RFu}$ is $O(n_{s,u} K N_{rRF} N_t)$. The above analysis is per user, and the computation can be performed in parallel. The iteration needs $N_{avg}$ times on average, so the overall complexity of the analog beamformer design is

These simulations were categorized into two parts. First, we considered the scenario of the base station transmitting $N_s = 5$ data streams and simultaneously serving $U = 4$ users. For the beamforming methods, we compared various schemes under different scenarios in Figs. 3-9. Among the schemes under comparison, one is the fully digital BD beamforming scheme mentioned earlier, which is expected to achieve sub-optimal performance; another is the proposed hybrid beamforming scheme. For the data stream allocation, three allocation methods were compared: random assignment, the stream incremental algorithm (SIA), and the reduced-complexity stream incremental algorithm. As the simulation result in Fig. 3 reveals, using the stream incremental algorithm to dynamically allocate resources can reduce the transmission power of the base station. The performance improvement obtained by allocating different numbers of data streams to users according to the channel state information is significant. Regardless

scheme. The reason is likely that the performance of the fully digital BD beamforming scheme is affected by the number of antennas. When the value of $N_t$ is much larger than $U N_r$, the system can achieve better performance. As the value of $N_t$ approaches $U N_r$, the overall performance may degrade.

In the second part of the simulation, we considered the scenario

users is also verified. In Fig. 6, two schemes use fixed data stream allocation, and their performance is worse than that of the other three schemes. Even though both schemes use fixed data stream allocation, the transmission power of our proposed hybrid beamforming scheme is about 1.7 dB lower than that of the modified PE-AltMin algorithm. In addition, the proposed hybrid beamforming scheme, even with the reduced-complexity version of SIA, outperforms the fully digital BD scheme for ϒ = 5, 10, 15, and 20 dB. Similar to the previous simulation, the required transmission power for various cell radii was also evaluated. As indicated in the simulation results of Fig. 7, a similar performance trend was observed. A base station with a cell radius of two hundred meters required approximately 10 dB more transmission power on average than one with a cell radius of one hundred meters. Similarly, in Fig. 8, as the number of antennas increased in the simulation, the required transmission power decreased. These results are consistent with the previous simulation results.

Finally, the impact of varying the number of users on the performance was evaluated. As mentioned before, an increase in the number of users had a large impact on the performance of the fully digital BD beamforming scheme; the required power increased significantly. To avoid this issue, the number of base station antennas was increased to $N_t = 192$ while the other parameters remained unchanged. The total data rate of the base station was fixed at 39.2 Mbps, and the system then allocated the resources for varying numbers of users. When the inter-user interference was appropriately controlled, the power requirement could be reduced owing to multi-user diversity.
When the amount of interference exceeded the suppression capability of the beamforming mechanism, the required power increased significantly.

the proposed method decreased by more than approximately 1 dB as the number of users increased from four to six. In contrast, the required power of the fully digital BD beamforming scheme could not be reduced by increasing the number of users. Thus, the proposed method has a better capability of handling the interference effect.

In this study, we considered the hybrid beamforming and the data stream allocation algorithm designs for a multi-carrier