Improved Iterative Inverse Matrix Approximation Algorithm for Zero Forcing Precoding in Large Antenna Arrays

Precoding algorithms are used in massive multiple-input multiple-output (mMIMO) communication systems to ensure effective signal transmission. The zero-forcing (ZF) is one of the most common linear precoding algorithms used in MIMO and mMIMO systems. ZF precoding is complex to implement because it requires direct matrix inversion of the gram matrix. Iterative algorithms have been proposed for approximating matrix inversions. However, iterative algorithms require initial conditions and pre-computations to converge to the optimal transmitted signal vector. This paper proposes a new improved iterative algorithm that guarantees convergence under any circumstances without dependency on any optimized initial parameter or condition. The proposed algorithm is based on a three-step iterative and iterative generalized inverse matrix approximation algorithm. The proposed algorithm was verified using a new correlated channel model that included mutual coupling effects and gain and phase variances caused by radio frequency elements at a base station (BS). The computational complexity of the proposed algorithm was then computed. This study analyzes and compares the bit error rate (BER) performance of the proposed algorithm with that of prominent existing algorithms. Moreover, the sum-rate performance of the proposed algorithm was analyzed. Simulations were performed under both correlated and uncorrelated channel conditions, for comparison and analysis. The simulation results demonstrate that the proposed algorithm outperforms the compared algorithms in terms of the convergence and convergence rates.

generation (5G) communication systems are expected to 23 provide an improved communication quality with extended 24 coverage. Massive multiple-input multiple-output (mMIMO) 25 technology is considered a key factor that enables people 26 The associate editor coordinating the review of this manuscript and approving it for publication was Parul Garg.
to use fifth-generation communication systems [1] and 27 thing-to-thing prospective sixth-generation (6G) systems [2]. 28 To simultaneously serve many users at the same time, 29 mMIMO systems consist of a large number of antennas. 30 The densely numbered antenna structure of mMIMO sys-31 tems provides high transmission rate, spectral efficiency, and 32 power efficiency [3], [4], [5]. However, a large number of 33 antennas have drawbacks that must be carefully considered, 34 such as pilot-signal contamination and interference at the 35 base station (BS) and user equipment (UE) [1], [4], [6]. 36 In mMIMO systems, precoding algorithms are used to 37 inversion problem by solving the system equation, and the 94 optimal transmitted signal vector is calculated by applying 95 iterative processes to decomposed matrix elements, such as 96 the Gauss-Seidel (GS) algorithm and its derivative succes-97 sive over-relaxation (SOR) algorithm [7]. Finally, IMRN 98 algorithms focus on the minimization of the resid-99 ual norm application order to bypass the approximate 100 matrix inversion operations and directly find the transmitted 101 signal vector, such as the conjugate residual (CR) algori-102 thm (further improvement of the conjugate gradient 103 algorithm) [17] and generalized minimal residual (GMRES) 104 algorithm [18]. Moreover, iterative algorithms can be 105 obtained by combining two or more algorithms, such as 106 the joint CI and Neumann series (CI-NS) algorithms and 107 the SOR-based approximate matrix inversion (SOR-AMI) 108 algorithm [15]. Among the three types of iterative algorithms, 109 the AMIA is the most inefficient. As the dimensions of the 110 channel matrix increase, the number of polynomial terms 111 or iterations increases to maintain the accuracy. Hence, the 112 computational complexity of the algorithm increases signifi-113 cantly. It should be noted that the implementation complexity 114 and computational complexity are different. Iterative algo-115 rithms have a tradeoff between implementation complexity 116 and convergence, as shown in Fig. 1. However, under inap-117 propriate initial conditions, such as when the channel matrix 118 is nonsymmetric, positive, definitive, and strictly diagonally 119 dominant, many IASLE and IMRN algorithms fail to con-120 verge to a proper solution [9]. The three aforementioned types of algorithms require pre-122 computations for the optimal initial parameters to guar-123 antee convergence. The advantages and disadvantages of 124 iterative algorithms can be found in [7] and with brief 125 descriptions. 126 In general, one algorithm alone is not sufficient to sat-127 isfy the convergence, fast convergence rate, and accuracy 128 requirements. However, most algorithms proposed in the lit-129 erature require initial conditions and an appropriate initial 130 matrix, known as the precondition matrix to guarantee con-131 vergence. Hence, in this study, we propose a new improved 132 method based on the combination of Homeier's cubically 133  134 constraint, [20] resulting in an approach different from Xia's 135 iterative method [21]. Thus, the proposed method converges 136 globally. The proposed method is globally convergent, with-137 out the need for any preconditions, can use any square matrix 138 without symmetry, and the initial matrix can be diagonally 139 dominant. Hence, there is no need for any preconditions, and 140 pre-computation is discarded. The proposed algorithm was 141 divided into two parts. The first part computes the initial 142 approximation, whereas the second part processes the iter-143 ative generalized inverse matrix approach. The two parts are 144 described as follows:

145
-The first part of the algorithm was a three-step iterative 146 method based on Homeier's method. However, in the 147 third step, a secant approach was adopted to keep the 148 polynomial order low to avoid a higher computational 149 complexity [22]. After one iteration, the output of the 150 three-step iterative method was passed as an input to the 151 second iterative method. The second method is based on 152 iteratively approximating the generalized inverse using 153 the KKT conditions [21]. If the KKT conditions hold for 154 a problem, optimality is guaranteed [23]. The advantage 155 of this method is that no initial condition is needed 156 for convergence because the Moore-Penrose rules hold 157 for KKT [24] as in [21]. Thus, the proposed algorithm can be catego-171 rized as AMIA-type.

172
In the ZF precoding algorithm, interference is forced to 173 zero. In this study, ZF precoding was selected because inter-174 ference, rather than additive noise, is the dominant factor 175 when the number of antennas at the BS increases [15]. The 176 bit error rate (BER) and sum rate of the proposed algorithm 177 were evaluated using both the correlated and uncorrelated 178 channel models. One of the main focuses of this study was to 179 evaluate and analyze the proposed algorithm under realistic 180 conditions; thus, a new correlated channel model was con-181 sidered. Referring to the channel model, a mutual coupling 182 channel model [3] was used in this study, including array 183 manifolds, which were modeled as in, [25] and extended to 184 mMIMO. Array manifolds include manufacturing tolerances, 185 active radio frequency (RF) element gain, phase variations, 186 and mutual coupling, which are modeled using the k-nearest 187 neighbor approach, as explained in detail in [25]. By contrast, 188 the Rayleigh fading channel model was used for a simpler 189 evaluation of the performance of the proposed algorithm. 190 In addition, some of the most prominent iterative algorithms, 191 such as the CI, SOR, CR, and GMRES algorithms, were 192 investigated for a performance comparison with the proposed 193 algorithm.  Table 1.

210
Based on these results, the effectiveness of the proposed 211 algorithm was discussed.

212
The remainder of this paper is organized as follows.

213
In Section II, the system model and the ZF precoding 214 matrix are described. In Section III, the investigated itera-215 tive algorithms and the proposed algorithms are described.

216
In Section IV, a computational complexity analysis of the 217 investigated iterative algorithms and proposed algorithms is Section VI concludes this paper.

227
This section describes the system model used in this study. 228 We consider a downlink mMIMO system in which M trans- In this study, we consider a mutual coupling channel model  To obtain a more realistic channel model, array manifolds 267 including a coupling matrix were included in this study. The 268 array manifold channel model H ∈ C K ×M is expressed as 269 follows: where M ∈ C M ×M denotes the BS antenna array manifold, 272 and, which considers the mutual coupling matrix. The array 273 manifold matrix (θ, ϕ) is given [25] by: diagonal matrices with complex elements representing each 278 (θ, ϕ), is the gain of the RF circuit, and the effect of the gain 279 and phase uncertainty sources owing to the active antenna 280 array components and antenna elements, respectively [25], 281 [26]. In addition, g (θ, ϕ) ∈ C M ×1 denotes the amplitude and 282 phase of the m-th element, and a (θ, ϕ) ∈ C M ×1 denotes the 283 ideal steering vector of the array containing the information 284 for each (θ, ϕ) [25]. The mutual coupling matrix C ∈ C M ×M 285 consists of the amplitude and phase coupling coefficients 286 C m,k , k is the k-th neighbor of the antenna element m. 287 The coefficient of mutual coupling C can be calculated as 288 follows:.
λ (x m sin θ cos ϕ+y m sin θ sin ϕ) The received complex baseband signal y ∈ C K ×1 is given where ρ is the normalized average transmit symbol energy by 298 the number of transmit antennas, which denotes the signal-299 to-noise power ratio (SNR); n ∈ C 1×K denotes the additive 300 white Gaussian vector; and x ∈ C M ×1 represents the trans-301 mitted signal vector after precoding and can be expressed as where s ∈ C K ×1 is the symbol vector of the constellation 304 symbols to be transmitted and G ∈ C M ×K is the precoding 305 matrix, which can be expressed as where the scalar β is chosen to satisfy the equation G 2 F = 308 tr GG H = P. P is the total transmitted power. The BER 309 and sum rate were considered as criteria for measuring the 310 performance of the precoding algorithms.

311
After precoding, the sum-rate capacity [5] of the mMIMO 312 system can be calculated as

314
where γ k = ρ K |g kk | 2 , and g kk is the k-th row and k-th 315 column of matrix G.

318
Approximate matrix inversion algorithms are based on num-319 ber series. These algorithms are derived from the higher-order 320 recursions [27] in (12), which can be expressed as: where i is the number of iterations, p is the order of the 324 polynomial series, X i ∈ C K ×K is the estimated inverse matrix at the i-th iteration can be expressed as: iteration, and can be expressed as 337 The convergence of the approximate matrix inversion algo-338 rithms depends on the number of iterations i, the order p 339 and as well as on the preconditioning matrix X 0 . A Better 340 convergence can be achieved as the i and p increase with the 341 cost of increased complexity. However, the initial values of 342 the preconditioning matrix also affect the convergence and 343 complexity. In [16], the optimized initial values of X 0 were 344 calculated for both the NI and CI algorithms, and it was 345 shown that both algorithms had better convergence with the 346 optimized values. where z ∈ C K ×1 is an unknown vector solution. First, the 352 gram matrix W is decomposed into its subcomponents, and 353 then, an iterative process is applied to the decomposed parts 354 of the W. 355

Gauss-Seidel Algorithm and Successive Over-Relaxation 356
Algorithm: The GS and SOR algorithms were similar. The 357 Gram matrix is decomposed into W = D + L + U, where 358 D ∈ C K ×K is a diagonal matrix containing the diagonal 359 elements of matrix W , L ∈ C K ×K contains the lower 360 triangular components, and the U ∈ C K ×K contains the 361 upper triangular component matrix W . Subsequently, the 362 estimated transmitted signal vector z is computed iteratively 363 using the decomposed components. The GS algorithm can be 364 expressed as The SOR algorithm was derived from the GS algorithm 367 with a slight difference. The iteratively calculated estimated 368 transmitted signal vector z using the SOR algorithm is given 369 as where ω denotes the relaxation parameter, which is crucial 372 to the performance of the SOR algorithm. The relaxation 373 parameter must be between 0 < ω < 2 to satisfy the 374 convergence [7]. However, in [15], an optimized value for the 375 relaxation parameter is given as In [15], the authors combined the SOR and approximate 378 matrix inversion (AMI) algorithms and proposed a joint 379 SOR-AMI to increase the convergence of the SOR algorithm. 380 Moreover, the authors combined the CI algorithm with the 381 SOR-AMI and proposed a joint CI-SOR-AMI to increase the 382 convergence rate. In the first step, one iteration is spared for 383 the CI, and then the SOR-AMI is performed. The structure 384 is similar to that of our proposed algorithm; however, the CI-385 SOR-AMI is highly dependent on the initial conditions and 386 relaxation parameters.

387
As mentioned previously, the IMRN algorithms focus on 390 minimizing the residual norm rather than approximating a 391 direct solution [7]. This type of algorithm directly estimates 392 the transmitted signal vector without computing matrix inver-393 sion. In IMRN algorithms, the norm of the residual vector r i 394 is reduced until the desired tolerance is obtained or the direct 395 solution of z is obtained.

397
Residual Algorithm: The CR algorithm is derived from the 398 well-known conjugate gradient (CG) algorithm to achieve 399 better BER performance than the CG algorithm [7]. The 400 CR algorithm can be expressed as basis [28]. The formulation of GMRES can be found in [18].

441
AMIA has fast convergence, fair precision, and easy 442 implementation but at the cost of increased computational 443 complexity. Thus, in the first step, we selected a fifth-444 order (p = 5) three-step iterative algorithm based on (12). 445 However, the computational complexity of the algorithm 446 increased with the number of orders. In [22], a three-step 447 iterative algorithm was proposed based on two-step cubi-448 cally iterative Homeier and secant algorithms. The proposed 449 three-step algorithm for solving any function that equals zero 450 (f (x) = 0) is given as To iteratively approximate the matrix inversion, f (x) = 457 x −1 − W was applied to the above equations. The iterative 458 process can then be expressed as: In the second part of the algorithm, we adopt a novel 462 iterative algorithm to compute the generalized inverse matrix 463 in [21]. It applies the KKT condition to minimize the 464 Frobenius norm and iteratively solves the Moore-Penrose 465 generalized inverse conditions [21] with vector-matrix mul-466 tiplications. The KKT condition was used for the convex 467 optimization of the Frobenius norm. In [21], an accelera-468 tion scheme that replaces vector-matrix multiplications with 469 matrix-matrix multiplications was proposed. Unlike the algo-470 rithms in, the proposed solution does not require an initial 471 precondition matrix, norm condition, or symmetric or con-472 jugate symmetric matrix, as in the Newton and Chebyshev 473 algorithms. It is important to highlight that their algorithm for 474 the acceleration scheme uses a direct inversion matrix com-475 putation. Thus, the inverse matrix approximation problem 476 has not been properly solved. The first part of the proposed 477 algorithm replaces the direct inversion matrix using a robust 478 iterative algorithm.

479
Computation of the initial values of the initial matrix 480 allows for faster computation and easier implementation, as is 481 the case with the proposed algorithm. However, the inverse 482 matrix approximation algorithm must converge under any 483 circumstances, such as uncertainties in the gain and phase as 484 well as mutual coupling between adjacent antenna elements 485 in a real implementation.

486
To adopt an iterative algorithm for computing the general-487 ized inverse matrix, significant algorithms in the literature are 488 VOLUME 10, 2022 based on the Moore-Penrose condition, [24] which denotes 489 that for any matrix A ∈ C a×b , there exists only one matrix 490 P ∈ C b×a that satisfies the following equations: where P is the generalized inverse of matrix A. Subsequently, Subsequently, given matrix R ∈ C a×a , as well as a positive 505 scalar , the iterative algorithm is given as Proof of the global convergence of (32) can be found in [21]. proposed algorithm can be expressed as: where X is the estimated inverse matrix of M by (28).

522
The proposed improved iterative algorithm is described in // second step 6. In our algorithm, the computational complexity in terms 548 of the required number of complex matrix multiplications is 549 calculated as follows: We assume A T A, 6I K , 9I K , 14I K , 16I K 550 and the constant β are known. According to (28), 5K 3 + K 2 551 complex multiplications are required because the equation 552 includes five matrix-matrix multiplications and one matrix-553 scalar multiplication. In the second part of the proposed 554 algorithm, because matrix B is assumed to be known, only 555 2K 3 complex multiplications are required. In total 7K 3 + 556 2K 2 M + K 2 + KM + M complex multiplication was required 557 to calculate the transmitted signal vector x. According to the 558 above analysis, the computational complexities for the first 559 iteration (i = 1) of the NI and TSGIM algorithms and the 560 computational complexity of the ZF precoding with DMI are 561 listed in Table 2. ZF precoding with DMI was chosen as the reference and 563 ZF with the NI algorithm was chosen because it is the simplest 564 algorithm in terms of implementation among the algorithms 565 mentioned in Section III. It is observed that ZF with direct 566 matrix conversion precoding has the lowest computational 567 complexity. However, as previously mentioned, direct inver-568 sion is unfavorable for hardware. Although the proposed 569 algorithm has more computational complexity than the NI 570 algorithm, as described in Section III, it converges without 571 depending on any initial conditions. However, convergence 572 is only possible with appropriate initial conditions in inverse-573 matrix approximation algorithms including the NI algorithm.

594
Simulations based on the system model explained in algorithm was compared to different types of algorithms 603 (AMIA, IASLE, and IMRN). The optimized initial values 604 in [16] were used for the CI algorithm, whereas the optimized 605 relaxation parameter according to (16) was used for the SOR 606 and joint SOR-AMI algorithms. In addition, a noisy random 607 x vector is used as the initial vector, a diagonal matrix that 608 contains the inverse diagonal elements of the resultant matrix 609 H H H is used for the CR and SOR algorithms, and a diagonal 610 matrix that contains the inverse diagonal elements of the 611 resultant matrix H H H . ZF precoding, which contains direct 612 matrix inversion, was included as a benchmark.

613
With respect to the use case, a typical downlink massive 614 MIMO configuration with M × K = 256 × 32 [15], [16] 615 is considered for the BER analysis. For all combinations and 616 analyses, there were L = 8 propagation paths. In this study, 617 QPSK was used as the modulation scheme; however, any 618 modulation technique can be used. The transmitted signal 619 was normalized in the BER and sum-rate analyzes. The SNR, 620 denoted as ρ in Section II, is the ratio of transmitted signal 621 power to received noise power for BER and sum-rate analysis, 622 and can simply be expressed as; where H 2 F Frobenius norm of the channel and σ 2 n is the 625 noise variance.

626
The parameters for the correlated channel model are as 627 follows: the mutual coupling between adjacent antenna ele-628 ments is uniformly distributed between −20 dB and −10 dB, 629 as used in [26] for the first tier of neighboring elements; gain 630 and phase variations are ±5% and manufacturing tolerance is 631 ±10% which affects the inter-antenna element spacing, d. 632 The BER performances of the different algorithms are 633 compared in Fig. 5. The algorithms are simulated using the 634 proposed correlated channel model. The number of iterations 635 for all algorithms was i = 1. ZF precoding with direct 636 matrix inversion was used as the benchmark. According to 637 the results, the BER performances of the CI, SOR, CR, and 638 VOLUME 10, 2022 well. This is because the CI, SOR, and CR are highly depen-641 dent on the initial conditions. In this case, the Gram matrix 642 is not a symmetric or conjugate symmetric matrix; hence, 643 the BER performances of CI and SOR are poor. In addition, 644 because we did not consider multiple restarts, the BER perfor-645 mance of GMRES was insufficient. However, it can be seen 646 that the BER performance of the CR algorithm is fair because 647 the random initial vector is appropriate.   i.e., CSI is incomplete, then the performance of the proposed 678 algorithm decreases. Performance degradation due to the 679 imperfect channel estimation is valid for all algorithms even 680 if they are not shown in Fig. 8. In reality, perfect channel 681 estimation is impossible but near-perfect approximations can 682 be made. Referring to (1), τ = 0 states that perfect channel 683 estimation and τ = 0.3 states that ≈ 95% of the channel is 684 estimated correctly.

685
The Frobenius norm errors of the CI, SOR-AMI, and 686 proposed TSGIM algorithms are compared in Fig. 9 and 10 687 according to the increasing number of BS antennas. In addi-688 tion, in Fig. 11 the number of BS antennas is fixed and 689 the Frobenius norm errors are compared according to the 690 increasing number of iterations for the algorithms. A total 691 of 1,000 Monte Carlo (MC) trials were conducted under 692 correlated channel conditions. It is worth noting that the BER 693 performances of the other algorithms were not considered 694 in this analysis because they skipped the inverse matrix cal-695 culation. In the Frobenius norm error analysis, we consider 696  rithms were optimized. However, Fig. 10 shows that, after 707 five iterations, TSGIM outperformed CI and CI-SOR-AMI.

708
As shown in Fig. 11, the convergence rate of the proposed