An Efficient Modified Gauss Seidel Precoder for Downlink Massive MIMO Systems

As the demand for spectral efficiency has grown, massive multiple-input multiple-output (MIMO) systems have attracted attention in wireless communications. In massive MIMO, the zero forcing (ZF) precoder provides near-optimal performance. However, the complexity of the matrix inversion it requires is a burden for practical implementation, so many studies have investigated approximate inversion of the channel matrix to reduce the complexity. A typical linear precoder based on approximate matrix inversion is the Gauss Seidel (GS) precoder, which provides precoded signals similar to those of the ZF precoder with low complexity. However, the inner structure of the GS precoder does not permit parallel implementation. Consequently, the GS iterative method takes a long time to estimate the precoded signal, which makes the GS precoder impractical. In this article, the punctured GS (PGS) precoder is proposed to mitigate the parallel-operation problem by modifying the inner structure of the GS precoder. However, the modified inner structure degrades the performance of the PGS precoder. Therefore, the ordering PGS precoder, which mitigates this degradation, is additionally introduced. As a result, the PGS precoder obtains the precoded signal with less delay than the GS precoder, but its BER performance is worse than that of the GS precoder. In contrast, the ordering PGS precoder provides improved BER performance together with reduced delay compared with the GS precoder.


I. INTRODUCTION
Because future services will demand ever higher throughput, wireless communication systems are required to achieve good bit error rate (BER) performance [1]-[3]. For this reason, massive multiple-input multiple-output (MIMO) systems have attracted attention. Although the base station (BS) transmits precoded signals under a power constraint, a massive MIMO system obtains an improved signal to noise ratio (SNR) by equipping the BS with a huge antenna array. A huge antenna array also makes the statistical characteristics of the channels predictable through channel hardening [4]-[11]. Because of these advantages, simple linear precoders such as the matched filter (MF) and zero forcing (ZF) show a great improvement in performance [12]. In particular, the ZF precoder is near-optimal in massive MIMO systems. However, as the number of user equipments (UEs) and transmit antennas grows, the dimension of the channel matrix increases, which raises the complexity of the precoder. Therefore, various techniques that reduce the complexity, such as the Neumann Series (NS) method and the Gauss Seidel (GS) method, have been proposed [13].
The associate editor coordinating the review of this manuscript and approving it for publication was Luyu Zhao.
The NS method approximates the inverse of the channel matrix with lower complexity than the ZF precoder. However, a large number of iterations is required to obtain an output close to the exact inverse, and the complexity of the NS precoder increases significantly as the number of iterations grows [14], [15]. On the other hand, the GS precoder offers precoded signals that are relatively close to the output of the ZF precoder within a small number of iterations, and it maintains low complexity even as the number of iterations grows [16], [17]. However, the GS precoder cannot be implemented in parallel because of its inner structure: although the internal sequential iteration of GS enhances performance, it does not allow parallel implementation [13], [18]. Many modified GS precoders were proposed in [19]-[22]. Though these methods improve on the conventional GS precoder, they do not mitigate the problem of parallel implementation.
The GS precoder uses previously estimated symbols as inputs to estimate the next symbol. Therefore, the last symbol can be obtained only after all preceding symbols have been estimated, and this sequential process delays the estimation of the precoded signal. To solve this problem, the author of [18] proposed a novel GS precoder. However, its performance is significantly lower than that of the GS precoder. Thus, a new scheme is required that mitigates the parallel-operation problem while keeping performance similar to GS.
This article proposes the punctured GS (PGS) precoder, which mitigates the parallel-operation problem. Unlike GS, the proposed scheme divides the Gram matrix of the channel into blocks. This modified structure mitigates the chief problem of GS. However, because there is a trade-off between parallel operation and the accuracy of the precoded signals, the performance of the proposed scheme is degraded relative to GS. Therefore, to reduce this degradation, the ordering PGS method, in which the Gram matrix of the channel is rearranged according to the correlation between the channels of the UEs, is additionally proposed.
The remainder of this article is organized as follows. Section II addresses the downlink massive MIMO system model. Section III explains conventional precoding schemes. In Section IV, the PGS precoder and the ordering PGS precoder are proposed. The simulation results for BER performance and the comparisons of complexity and of the number of required calculations are given in Section V. Finally, Section VI concludes this article.
In this article, upper-case and lower-case bold-face letters denote matrices (e.g. $\mathbf{H}$, $\mathbf{Z}$, $\mathbf{L}$, $\mathbf{B}$) and vectors (e.g. $\mathbf{x}$, $\tilde{\mathbf{x}}$, $\hat{\mathbf{x}}$, $\mathbf{y}$, $\mathbf{w}$), respectively. Scalars are represented by non-bold letters (e.g. $N_T$, $K$, $P_T$, $i$, $c$).

II. DOWNLINK MASSIVE MIMO SYSTEM MODEL
Fig. 1 shows the downlink massive MIMO system model considered in this article. The BS employs $N_T$ transmit antennas and serves $K$ UEs, where $K \ll N_T$. Each UE has only one receive antenna. The BS transmits $K$ data streams over a complex Rayleigh flat fading channel to $K$ separate UEs. The $K$ data streams are precoded according to the channel state information (CSI) by passing through the precoder. In this system model, the precoded signal vector $\tilde{\mathbf{x}} \in \mathbb{C}^{N_T \times 1}$ is given by,

$$\tilde{\mathbf{x}} = \mathbf{F}\mathbf{x}, \tag{1}$$

where $\mathbf{F} \in \mathbb{C}^{N_T \times K}$ is the precoding matrix and $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the data signal vector, whose symbols are uniformly distributed with $E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}_K$. The precoded signal vector $\tilde{\mathbf{x}}$ is limited by the transmit power $P_T$ ($E[\|\tilde{\mathbf{x}}\|^2] = P_T$) and is transmitted from the BS to the UEs. The received signal vector $\mathbf{y} \in \mathbb{C}^{K \times 1}$ is represented as follows,

$$\mathbf{y} = \mathbf{H}\tilde{\mathbf{x}} + \mathbf{w}, \tag{2}$$

where $\mathbf{H} \in \mathbb{C}^{K \times N_T}$ is the complex Rayleigh flat fading channel matrix and $\mathbf{w} \in \mathbb{C}^{K \times 1}$ is the additive white Gaussian noise (AWGN) vector. Both have independent and identically distributed (i.i.d.) complex components with zero mean and unit variance.
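The system model above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the QPSK alphabet, the matched-filter placeholder precoder and the per-vector power normalization are assumptions made for the example, not details taken from the article.

```python
import numpy as np

def rayleigh_channel(K, N_T, rng):
    """K x N_T channel with i.i.d. CN(0, 1) entries (flat Rayleigh fading)."""
    return (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)

def awgn(K, rng):
    """Noise vector with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)

def qpsk_symbols(K, rng):
    """Unit-power QPSK symbols so that E[x x^H] = I_K (the alphabet is an assumption)."""
    bits = rng.integers(0, 2, size=(K, 2))
    return ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def transmit(H, F, x, P_T, rng):
    """Precode x with F, scale to total power P_T, and apply y = H x_pre + w."""
    x_pre = F @ x
    # Assumed simplification: normalize every vector instead of the average power.
    x_pre = np.sqrt(P_T) * x_pre / np.linalg.norm(x_pre)
    y = H @ x_pre + awgn(H.shape[0], rng)
    return x_pre, y

rng = np.random.default_rng(0)
K, N_T, P_T = 12, 100, 1.0
H = rayleigh_channel(K, N_T, rng)
x = qpsk_symbols(K, rng)
F_mf = H.conj().T  # matched filter, used here only as a placeholder precoder
x_pre, y = transmit(H, F_mf, x, P_T, rng)
```

Any of the precoders discussed later could be substituted for `F_mf`; only the shape ($N_T \times K$) matters for this sketch.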

III. CONVENTIONAL PRECODING SCHEME
A. CONVENTIONAL ZERO FORCING PRECODER
In a massive MIMO system, the ZF precoder provides near-optimal performance by completely eliminating inter user interference (IUI). To remove the IUI completely, the pseudo-inverse of the channel matrix is used as the precoding matrix of the ZF precoder, which is represented as follows,

$$\mathbf{F}_{ZF} = \gamma_{ZF}\,\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H\right)^{-1} = \gamma_{ZF}\,\mathbf{H}^H\mathbf{Z}^{-1}, \tag{3}$$

where $\mathbf{Z} = \mathbf{H}\mathbf{H}^H$ is the Gram matrix, which is diagonally dominant when the number of transmit antennas is much larger than the number of UEs [7]. $\gamma_{ZF}$ is the scaling factor for ZF and is given by,

$$\gamma_{ZF} = \sqrt{\frac{P_T}{\mathrm{tr}\left(\mathbf{Z}^{-1}\right)}}. \tag{4}$$

The Gram matrix $\mathbf{Z}$ has a Wishart distribution, so $\mathrm{tr}(\mathbf{Z}^{-1})$ can be approximated by $K/(N_T - K)$. Therefore, $\gamma_{ZF}$ is represented as follows,

$$\gamma_{ZF} \approx \sqrt{\frac{P_T\,(N_T - K)}{K}}. \tag{5}$$

Consequently, the precoded signal vector $\tilde{\mathbf{x}}_{ZF}$ for ZF is represented as follows,

$$\tilde{\mathbf{x}}_{ZF} = \gamma_{ZF}\,\mathbf{H}^H\mathbf{Z}^{-1}\mathbf{x}. \tag{6}$$

Technically, the minimum mean square error (MMSE) precoder provides the optimal performance in massive MIMO. The performance of the ZF precoder is close to optimal, and the ZF and MMSE precoders perform the same when the number of transmit antennas per UE is large. However, because the MMSE precoder requires the additional step of measuring the received SNR, it is more complex than the ZF precoder. Therefore, in massive MIMO, the ZF precoder is regarded as the optimal precoder [6]. However, the computational complexity of the ZF precoder grows cubically with the number of UEs served by the BS. Thus, many schemes that obtain an approximate inverse of the channel Gram matrix with low complexity have been proposed.
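As a hedged illustration of the ZF precoder described above, the sketch below forms the Gram matrix, applies the approximate scaling factor based on $\mathrm{tr}(\mathbf{Z}^{-1}) \approx K/(N_T - K)$, and checks that the channel output is interference-free. The function name `zf_precode`, the QPSK symbols and the use of `np.linalg.solve` instead of an explicit inverse are choices made for this example only.

```python
import numpy as np

def zf_precode(H, x, P_T=1.0):
    """Return (gamma * H^H Z^{-1} x, gamma) with Z = H H^H."""
    K, N_T = H.shape
    Z = H @ H.conj().T
    gamma = np.sqrt(P_T * (N_T - K) / K)  # uses tr(Z^{-1}) ~ K / (N_T - K)
    x_sol = np.linalg.solve(Z, x)         # Z^{-1} x without forming the inverse
    return gamma * (H.conj().T @ x_sol), gamma

rng = np.random.default_rng(1)
K, N_T = 12, 100
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
# Unit-modulus QPSK symbols (assumed alphabet).
x = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, K) + 1))
x_pre, gamma = zf_precode(H, x)

# Without noise, every UE receives its own symbol scaled by gamma: H x_pre = gamma x.
residual = np.linalg.norm(H @ x_pre - gamma * x)
```

The residual is zero up to rounding because $\mathbf{H}\mathbf{H}^H\mathbf{Z}^{-1} = \mathbf{I}_K$, which is exactly the IUI-cancelling property that the approximate schemes below try to reproduce at lower cost.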

B. GAUSS SEIDEL PRECODER
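The GS precoder was proposed to mitigate the complexity problem of the ZF precoder. The GS precoder obtains a precoded signal that approximates (6) by iterative calculation. The Gram matrix $\mathbf{Z}$ can be divided into three parts as follows,

$$\mathbf{Z} = \mathbf{L} + \mathbf{Z}_{\text{diag}} + \mathbf{L}^H, \tag{7}$$

where $\mathbf{L}$, $\mathbf{Z}_{\text{diag}}$ and $\mathbf{L}^H$ are the strictly lower triangular matrix, the diagonal matrix and the strictly upper triangular matrix, respectively. The GS precoder obtains the precoded signal as follows,

$$\tilde{\mathbf{x}}_{GS} = \gamma_{GS}\,\mathbf{H}^H\hat{\mathbf{x}}_{GS}, \tag{8}$$

where $\gamma_{GS}$ is the scaling factor for the GS precoder and is equal to $\gamma_{ZF}$ [16]. $\hat{\mathbf{x}}_{GS}$ is the final solution vector estimated by the GS iteration and satisfies the following criterion,

$$\mathbf{Z}\hat{\mathbf{x}}_{GS} = \mathbf{x}. \tag{9}$$

Also, (9) can be modified as follows,

$$\left(\mathbf{L} + \mathbf{Z}_{\text{diag}}\right)\hat{\mathbf{x}}_{GS} = \mathbf{x} - \mathbf{L}^H\hat{\mathbf{x}}_{GS}. \tag{10}$$

Then, the iterative GS method is given by [16], [23],

$$\hat{\mathbf{x}}^{(i+1)}_{GS} = \left(\mathbf{L} + \mathbf{Z}_{\text{diag}}\right)^{-1}\left(\mathbf{x} - \mathbf{L}^H\hat{\mathbf{x}}^{(i)}_{GS}\right), \tag{11}$$

where $i$ is the number of iterations and $\hat{\mathbf{x}}_{GS}$ satisfies $\hat{\mathbf{x}}_{GS} = \lim_{i\to\infty}\hat{\mathbf{x}}^{(i+1)}_{GS}$. Through the iterative calculation (11), the GS precoder performs better than other methods for approximate matrix inversion. This advantage comes from the structure of the GS method. In more detail, (11) can be represented as follows,

$$\hat{\mathbf{x}}^{(i+1)}_{GS} = \mathbf{Z}_{\text{diag}}^{-1}\left(\mathbf{x} - \mathbf{L}\hat{\mathbf{x}}^{(i+1)}_{GS} - \mathbf{L}^H\hat{\mathbf{x}}^{(i)}_{GS}\right), \tag{12}$$

and (12) can be expressed element-wise as follows,

$$\hat{x}^{(i+1)}_{j} = \frac{1}{z_{j,j}}\left(x_j - \sum_{k=1}^{j-1} z_{j,k}\,\hat{x}^{(i+1)}_{k} - \sum_{k=j+1}^{K} z_{j,k}\,\hat{x}^{(i)}_{k}\right), \tag{13}$$

where $z_{j,k}$ is the $(j,k)$-th element of $\mathbf{Z}$, and $\hat{x}^{(i+1)}_{j}$ and $\hat{x}^{(i)}_{j}$ are the $j$-th components of the estimated signal at the present step and the previous step, respectively. According to (13), the first to $(j-1)$-th elements of the present step are required in order to obtain $\hat{x}^{(i+1)}_{j}$. In this article, the previously estimated symbols that are used to obtain a following symbol in the same $(i+1)$-th step are called the feedback symbols. The feedback symbols are multiplied by the strictly lower triangular matrix $\mathbf{L}$, so $\mathbf{L}$ is called the feedback matrix.
The feedback-matrix structure of the GS precoder improves its performance. However, since the previous symbols must be obtained before the following symbol can be estimated, this structure makes parallel implementation difficult. Therefore, a new scheme that mitigates this shortcoming of the GS precoder is required.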
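The sequential dependence described above is easiest to see in code. The following is a minimal sketch of the element-wise GS sweep, assuming a zero initial vector; the name `gs_solve` and the test dimensions are illustrative, not taken from the article.

```python
import numpy as np

def gs_solve(Z, x, iters):
    """Approximate Z^{-1} x with `iters` Gauss-Seidel sweeps from a zero start."""
    K = Z.shape[0]
    x_sol = np.zeros(K, dtype=complex)
    for _ in range(iters):
        for j in range(K):
            # Entry j needs entries 0..j-1 of the *current* sweep (feedback symbols),
            # so the inner loop cannot be run in parallel.
            feedback = Z[j, :j] @ x_sol[:j]
            previous = Z[j, j + 1:] @ x_sol[j + 1:]  # still holds previous-sweep values
            x_sol[j] = (x[j] - feedback - previous) / Z[j, j]
    return x_sol

rng = np.random.default_rng(2)
K, N_T = 12, 100
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
Z = H @ H.conj().T
x = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, K) + 1))  # assumed QPSK symbols

exact = np.linalg.solve(Z, x)
for iters in (1, 2, 3):
    err = np.linalg.norm(gs_solve(Z, x, iters) - exact) / np.linalg.norm(exact)
    print(f"iterations={iters}: relative error {err:.2e}")
```

Because the update is done in place, `x_sol[:j]` already contains the present-step values while `x_sol[j + 1:]` still contains the previous-step values, which is what makes one sweep equivalent to a forward substitution with the lower triangular part of $\mathbf{Z}$.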

IV. PUNCTURED GAUSS SEIDEL SCHEME
To mitigate the parallel-operation problem, the PGS precoder partitions the Gram matrix into $N \times N$ ($N \le K$) separate blocks. This modified structure makes parallel operation efficient because $N$ symbols are calculated simultaneously.
The GS precoder divides $\mathbf{Z}$ into three matrices and uses the strictly lower triangular matrix $\mathbf{L}$ as the feedback matrix. In contrast, the PGS precoder separates $\mathbf{Z}$ into $N \times N$ square blocks, where $N$ is a factor of $K$. $N$ also represents the number of simultaneously operating precoders, and the PGS precoder is equal to the GS precoder when $N = 1$. In more detail, the divided Gram matrix that consists of $N \times N$ square blocks is represented as follows,

$$\mathbf{Z} = \begin{bmatrix} \mathbf{B}_{1,1} & \cdots & \mathbf{B}_{1,N} \\ \vdots & \ddots & \vdots \\ \mathbf{B}_{N,1} & \cdots & \mathbf{B}_{N,N} \end{bmatrix}, \tag{14}$$

where $\mathbf{B}_{i,j} \in \mathbb{C}^{\frac{K}{N} \times \frac{K}{N}}$ is the square block in the $i$-th block row and the $j$-th block column, which consists of components of $\mathbf{Z}$. Each $\mathbf{B}_{i,j}$ can in turn be separated into three parts (15): a strictly lower triangular matrix, a diagonal matrix and a strictly upper triangular matrix.
After $\mathbf{Z}$ is divided into small blocks, the proposed scheme separates $\mathbf{Z}$ into three parts as follows,

$$\mathbf{Z} = \mathbf{Z}_{\text{fd}} + \mathbf{Z}_{\text{diag}} + \mathbf{Z}_{\text{un-fd}}, \tag{16}$$

where the feedback matrix $\mathbf{Z}_{\text{fd}}$ consists of the $N^2$ strictly lower triangular parts of the blocks, $\mathbf{Z}_{\text{diag}}$ is the diagonal matrix, and the unfeedback matrix $\mathbf{Z}_{\text{un-fd}}$ holds the remaining entries of $\mathbf{Z}$, including the strictly upper triangular parts of the $N$ diagonal blocks. In more detail, each part can be written block by block as in (17), where $\mathbf{0}$ denotes the all-zero matrix. The number of components of the feedback matrix decreases as $N$ increases. Therefore, the performance of the PGS precoder is degraded when the number of simultaneously operating precoders is large. However, the time required to calculate the precoded signal is reduced as the Gram matrix is divided into smaller blocks. In other words, there is a trade-off between parallel implementation and the accuracy of the precoded signals.
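The three-way split of the Gram matrix can be sketched as follows. The sketch assumes that the diagonal part is the main diagonal of $\mathbf{Z}$ and that the unfeedback part is simply whatever remains after removing the feedback and diagonal parts; the function name `pgs_split` is hypothetical.

```python
import numpy as np

def pgs_split(Z, N):
    """Split Z into (Z_fd, Z_diag, Z_unfd) on an N x N grid of K/N x K/N blocks."""
    K = Z.shape[0]
    if K % N != 0:
        raise ValueError("N must be a factor of K")
    pos = np.arange(K) % (K // N)          # position of each index inside its block
    fb_mask = pos[:, None] > pos[None, :]  # strictly lower part of every block
    Z_fd = np.where(fb_mask, Z, 0)
    Z_diag = np.diag(np.diag(Z))
    Z_unfd = Z - Z_fd - Z_diag             # everything that is not fed back
    return Z_fd, Z_diag, Z_unfd

rng = np.random.default_rng(3)
K, N_T = 12, 100
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
Z = H @ H.conj().T

for N in (1, 2, 3, 4):
    Z_fd, _, _ = pgs_split(Z, N)
    print(f"N={N}: {np.count_nonzero(Z_fd)} feedback entries")
```

Under these assumptions the feedback matrix has $K(K-N)/2$ entries, so for $K = 12$ the count drops from 66 at $N = 1$ (plain GS) to 48 at $N = 4$, which is the loss of feedback symbols that the text attributes the performance degradation to.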
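The PGS precoder obtains the precoded signal as follows,

$$\tilde{\mathbf{x}}_{PGS} = \gamma_{PGS}\,\mathbf{H}^H\hat{\mathbf{x}}_{PGS}, \tag{18}$$

where $\gamma_{PGS}$ is the scaling factor for the PGS precoder and is equal to $\gamma_{ZF}$, according to subsection IV-B. The final solution vector $\hat{\mathbf{x}}_{PGS}$ for the PGS precoder satisfies the following criterion,

$$\mathbf{Z}\hat{\mathbf{x}}_{PGS} = \mathbf{x}. \tag{19}$$

Equation (19) can be rewritten as follows,

$$\left(\mathbf{Z}_{\text{fd}} + \mathbf{Z}_{\text{diag}}\right)\hat{\mathbf{x}}_{PGS} = \mathbf{x} - \mathbf{Z}_{\text{un-fd}}\hat{\mathbf{x}}_{PGS}. \tag{20}$$

As in the GS case, the iteration for the PGS precoder is given by,

$$\hat{\mathbf{x}}^{(i+1)}_{PGS} = \left(\mathbf{Z}_{\text{fd}} + \mathbf{Z}_{\text{diag}}\right)^{-1}\left(\mathbf{x} - \mathbf{Z}_{\text{un-fd}}\hat{\mathbf{x}}^{(i)}_{PGS}\right). \tag{21}$$

To obtain improved performance, the initial solution vector $\hat{\mathbf{x}}^{(0)}_{PGS}$ can be chosen by various methods. However, to compare only the performance of each precoder, this article assumes that the initial solution vector of all schemes is the zero vector.
Also, (21) is modified as follows,

$$\hat{\mathbf{x}}^{(i+1)}_{PGS} = \mathbf{Z}_{\text{diag}}^{-1}\left(\mathbf{x} - \mathbf{Z}_{\text{fd}}\hat{\mathbf{x}}^{(i+1)}_{PGS} - \mathbf{Z}_{\text{un-fd}}\hat{\mathbf{x}}^{(i)}_{PGS}\right). \tag{22}$$

In more detail, (22) can be represented in terms of the block matrices $\mathbf{B}$ as (23), where $\mathbf{x}_u$ and $\hat{\mathbf{x}}_u$ denote the sub-vectors associated with the $u$-th row block.
When the number of UEs is a prime number, the PGS precoder does not serve some UEs, so that the prime number becomes a composite number. By calculating the Gram matrix, the PGS precoder can identify the UE whose channel is least diagonally dominant and omit this UE, which has a poor channel condition. In this way, the PGS precoder can divide the Gram matrix into $N \times N$ blocks even when the number of UEs is prime.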
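The PGS iteration can be sketched as a staged update in which the $N$ symbols that occupy the same position inside their row blocks are computed together. The sketch below is an illustration under stated assumptions: the diagonal part is the main diagonal of $\mathbf{Z}$, the start vector is zero, and `pgs_solve` is a hypothetical name. A NumPy vector operation stands in for the $N$ precoders that would run simultaneously in hardware.

```python
import numpy as np

def pgs_solve(Z, x, N, iters):
    """Approximate Z^{-1} x with `iters` PGS sweeps on an N x N block partition."""
    K = Z.shape[0]
    m = K // N                                  # block size K/N
    pos = np.arange(K) % m
    Z_fd = np.where(pos[:, None] > pos[None, :], Z, 0)  # feedback matrix
    d = np.diag(Z)
    Z_unfd = Z - Z_fd - np.diag(d)              # unfeedback matrix
    x_sol = np.zeros(K, dtype=complex)
    for _ in range(iters):
        x_prev = x_sol.copy()
        for p in range(m):                      # K/N sequential stages per sweep
            idx = p + m * np.arange(N)          # N symbols computed together
            # Rows idx of Z_fd only touch positions < p, which are already updated.
            x_sol[idx] = (x[idx] - Z_fd[idx] @ x_sol - Z_unfd[idx] @ x_prev) / d[idx]
    return x_sol

rng = np.random.default_rng(5)
K, N_T = 12, 100
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
Z = H @ H.conj().T
x = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, K) + 1))  # assumed QPSK symbols

exact = np.linalg.solve(Z, x)
for N in (1, 2, 3, 4):
    err = np.linalg.norm(pgs_solve(Z, x, N, 2) - exact) / np.linalg.norm(exact)
    print(f"N={N}, 2 iterations: relative error {err:.2e}")
```

With $N = 1$ the stages reduce to the ordinary GS sweep, and each sweep needs $K/N$ sequential stages instead of $K$, which is where the reduction in delay comes from.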
However, because the PGS precoder separates the Gram matrix into small square blocks and therefore uses fewer feedback symbols than the GS precoder, its performance is degraded relative to the GS precoder. To compensate for this degradation, the ordering PGS precoder is additionally proposed.

A. ORDERING PGS PRECODER
In this subsection, the ordering PGS precoder is proposed. According to the channel condition, the ordering PGS precoder changes the order of the columns and rows of the Gram matrix $\mathbf{Z}$. In a massive MIMO system, large diagonal elements and small non-diagonal elements of $\mathbf{Z}$ indicate that the channel condition is favorable. In other words, a UE that has a diagonally dominant channel enjoys a favorable channel condition. Also, the accuracy of the estimated symbols increases as more feedback symbols are used. Therefore, the ordering PGS precoder places a UE with an unfavorable channel condition at the bottom of a block matrix $\mathbf{B}$, so that it uses more feedback symbols, and places a UE with a favorable channel condition at the top of a block matrix $\mathbf{B}$. In order to rearrange $\mathbf{Z}$, the ordering PGS precoder calculates $\mathbf{c} = \mathrm{sum}_{\text{row}}\left(\mathbf{Z}_{\text{diag}} - \mathrm{abs}_{\text{element}}\left(\mathbf{Z}_{\text{fd}} + \mathbf{Z}_{\text{un-fd}}\right)\right)$, where $\mathbf{c}$ is the factor vector for the channel condition, $\mathrm{sum}_{\text{row}}(\cdot)$ is the sum operator for each row, and $\mathrm{abs}_{\text{element}}(\cdot)$ is the operator that takes the absolute value of each element of a matrix. In other words, $\mathbf{c}$ is obtained as follows,

$$c_j = z_{j,j} - \sum_{k \ne j} \left|z_{j,k}\right|, \tag{24}$$

where $c_j$ is the factor that represents the channel condition of the $j$-th UE. Then, the ordering PGS precoder sorts these factors in a specific order and rearranges the Gram matrix $\mathbf{Z}$ into the sorted Gram matrix $\mathbf{Z}^{\text{order}}$ according to $\mathbf{c}$. Fig. 2 represents this specific order in more detail. $\bar{\mathbf{B}}_i \in \mathbb{C}^{\frac{K}{N} \times K}$ is the row block that consists of the block matrices in the $i$-th block row ($\bar{\mathbf{B}}_i = [\mathbf{B}_{i,1} \cdots \mathbf{B}_{i,N}]$). Firstly, the $N$ rows with the $N$ smallest components of $\mathbf{c}$ are extracted and placed at the bottom of each row block. Then, the $N$ rows with the next $N$ smallest components of $\mathbf{c}$ are placed directly above the previous $N$ rows. In this way, all rows are rearranged. Since the Gram matrix is Hermitian, the columns of the rearranged Gram matrix are sorted in the same order.
VOLUME 8, 2020
After that, the ordering PGS precoder obtains the precoded signal via $\mathbf{Z}^{\text{order}}$ with the same iteration as the PGS precoder. The ordering PGS precoder is summarized in Algorithm 1. Firstly, the Gram matrix $\mathbf{Z}$ is obtained and the diagonal components of $\mathbf{Z}$ are extracted on line 4. Then, the factor vector is calculated, and $\mathrm{index}_c$, the row index of the arranged factor vector, is obtained by sorting $\mathbf{c}$ (lines 5-6). $\mathrm{sort}_{\text{des}}(\cdot)$ is the operator that extracts the row indexes of a vector in the specific order, so that a UE with an unfavorable channel utilizes more feedback symbols.
On lines 7-8, $\mathbf{Z}$ and $\mathbf{x}$ are rearranged into the sorted Gram matrix $\mathbf{Z}^{\text{order}}$ and $\mathbf{x}^{\text{order}}$, respectively, via $\mathrm{index}_c$. On line 9, $\mathbf{Z}^{\text{order}}$ is separated into the feedback matrix $\mathbf{Z}^{\text{order}}_{\text{fd}}$, the diagonal matrix $\mathbf{Z}^{\text{order}}_{\text{diag}}$ and the unfeedback matrix $\mathbf{Z}^{\text{order}}_{\text{un-fd}}$, as in the PGS precoder. The initial solution vector of the ordering PGS precoder, $\hat{\mathbf{x}}^{\text{order},(0)}_{\text{ordering PGS}}$, and the iteration index $i$ are set to the zero vector and zero, respectively (line 10). On lines 11-14, the ordering PGS precoder estimates the precoded signal by using the PGS iteration, and the last estimated signal $\hat{\mathbf{x}}^{\text{order},(I)}_{\text{ordering PGS}}$ is rearranged into the original ordering. The ordering PGS precoder then provides the precoded signal $\hat{\mathbf{x}}^{(I)}_{\text{ordering PGS}}$ with higher accuracy than the PGS precoder.
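The ordering steps can be sketched as below. Since Fig. 2 and Algorithm 1 are not reproduced here, several details are assumptions: which of the $N$ row blocks receives which UE within one group of $N$ is chosen arbitrarily, ties in the factor vector are ignored, and all function names are hypothetical. The PGS iteration is repeated in compact form so that the example runs on its own.

```python
import numpy as np

def channel_factor(Z):
    """c_j = z_jj - sum_{k != j} |z_jk| for every row j of the Gram matrix."""
    off_diag_sum = np.abs(Z).sum(axis=1) - np.abs(np.diag(Z))
    return np.real(np.diag(Z)) - off_diag_sum

def ordering_permutation(c, N):
    """Put the N smallest factors at the bottom of the N row blocks, then work upward."""
    m = c.size // N
    ranked = np.argsort(-c)  # descending: most favorable UE first
    # ranked[p * N + r] goes to position p of row block r (block choice is assumed).
    return ranked.reshape(m, N).T.reshape(-1)

def pgs_solve(Z, x, N, iters):
    """Compact PGS iteration (zero start), see the assumptions in the lead-in."""
    K = Z.shape[0]
    m = K // N
    pos = np.arange(K) % m
    Z_fd = np.where(pos[:, None] > pos[None, :], Z, 0)
    d = np.diag(Z)
    Z_unfd = Z - Z_fd - np.diag(d)
    x_sol = np.zeros(K, dtype=complex)
    for _ in range(iters):
        x_prev = x_sol.copy()
        for p in range(m):
            idx = p + m * np.arange(N)
            x_sol[idx] = (x[idx] - Z_fd[idx] @ x_sol - Z_unfd[idx] @ x_prev) / d[idx]
    return x_sol

def ordering_pgs_solve(Z, x, N, iters):
    """Reorder Z and x, run PGS, then restore the original UE order."""
    order = ordering_permutation(channel_factor(Z), N)
    sol_order = pgs_solve(Z[np.ix_(order, order)], x[order], N, iters)
    sol = np.empty_like(sol_order)
    sol[order] = sol_order  # undo the permutation
    return sol

rng = np.random.default_rng(6)
K, N_T = 12, 100
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
Z = H @ H.conj().T
x = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, K) + 1))  # assumed QPSK symbols
x_sol = ordering_pgs_solve(Z, x, N=2, iters=2)
```

Rows and columns are permuted with the same index vector, which keeps the reordered Gram matrix Hermitian and leaves the solution of the linear system unchanged up to the permutation.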

B. TRANSMIT POWER CONSTRAINT
This subsection shows that the scaling factor $\gamma_{PGS}$ of the PGS precoder has an appropriate value for radiating the precoded signal with suitable power. Fig. 3 represents the total radiated power of the ZF and PGS precoders when the number of UEs and the power constraint factor are fixed at 12 and 1, respectively. In this case, the PGS precoder divides the Gram matrix $\mathbf{Z}$ into $2 \times 2$ blocks. Fig. 3 shows that the scaling factor $\gamma_{PGS}$ provides a proper total transmit power. Also, the total radiated power of the PGS precoder approaches that of the ZF precoder as the number of iterations or the number of transmit antennas per UE increases.

C. CONVERGENCE RATE
The PGS precoder can achieve an improved convergence rate by reducing $N$ or by using the ordering scheme. From (20) and (21), the error between the final solution vector $\hat{\mathbf{x}}_{PGS}$ and the approximate solution vector $\hat{\mathbf{x}}^{(i+1)}_{PGS}$ can be obtained as follows,

$$\hat{\mathbf{x}}^{(i+1)}_{PGS} - \hat{\mathbf{x}}_{PGS} = \mathbf{M}_{PGS}\left(\hat{\mathbf{x}}^{(i)}_{PGS} - \hat{\mathbf{x}}_{PGS}\right), \tag{25}$$

where $\mathbf{M}_{PGS} = -\left(\mathbf{Z}_{\text{fd}} + \mathbf{Z}_{\text{diag}}\right)^{-1}\mathbf{Z}_{\text{un-fd}}$ is the iteration matrix of the PGS precoder, and (25) is rewritten as follows,

$$\hat{\mathbf{x}}^{(i+1)}_{PGS} - \hat{\mathbf{x}}_{PGS} = \mathbf{M}_{PGS}^{\,i+1}\left(\hat{\mathbf{x}}^{(0)}_{PGS} - \hat{\mathbf{x}}_{PGS}\right). \tag{26}$$

The convergence rate improves when the Frobenius norm of $\mathbf{M}_{PGS}$ is small [24]. Fig. 4 represents the Frobenius norm of the iteration matrix of the GS, PGS and ordering PGS precoders when the number of UEs is fixed at 16. Since the GS precoder utilizes the most feedback symbols, it has the highest convergence rate among the precoders. As $N$ decreases, the Frobenius norm of the iteration matrix of the PGS and ordering PGS precoders decreases. Conversely, because the number of feedback symbols decreases when $N$ is large, the convergence rate of both schemes declines. Also, the ordering PGS precoder converges faster than the PGS precoder when both precoders divide the Gram matrix into the same $N \times N$ blocks. The gap between the ordering PGS and PGS precoders shrinks as $N$ grows. In other words, when $N$ is small, the ordering PGS precoder obtains a larger convergence-rate gain over the PGS precoder.
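The iteration matrix and its Frobenius norm can be computed numerically with the sketch below. It is a single-realization illustration, not a reproduction of Fig. 4: $K = 16$ follows the text, while $N_T = 128$, the random seed and the function name `pgs_iteration_matrix` are assumptions, and the diagonal part is again taken to be the main diagonal of $\mathbf{Z}$.

```python
import numpy as np

def pgs_iteration_matrix(Z, N):
    """M = -(Z_fd + Z_diag)^{-1} Z_unfd for an N x N block partition of Z."""
    K = Z.shape[0]
    pos = np.arange(K) % (K // N)
    Z_fd = np.where(pos[:, None] > pos[None, :], Z, 0)
    Z_diag = np.diag(np.diag(Z))
    Z_unfd = Z - Z_fd - Z_diag
    return -np.linalg.solve(Z_fd + Z_diag, Z_unfd)

rng = np.random.default_rng(4)
K, N_T = 16, 128  # K = 16 follows the text; N_T = 128 is an assumed value
H = (rng.standard_normal((K, N_T)) + 1j * rng.standard_normal((K, N_T))) / np.sqrt(2)
Z = H @ H.conj().T

for N in (1, 2, 4, 8):
    M = pgs_iteration_matrix(Z, N)
    fro = np.linalg.norm(M, "fro")
    rho = np.max(np.abs(np.linalg.eigvals(M)))
    print(f"N={N}: Frobenius norm {fro:.3f}, spectral radius {rho:.3f}")
```

The spectral radius is printed alongside the Frobenius norm because the error recursion contracts asymptotically at the rate of the spectral radius; the norm used in the text is an upper bound on it.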

V. SIMULATION RESULTS
This section is composed of four parts. Subsection V-A compares the complexity of the PGS, ordering PGS and GS precoders. Subsection V-B presents the number of calculations required to obtain the final calculated symbol, which is useful for verifying the efficiency of parallel operation. Subsection V-C compares the BER performance of the proposed and conventional schemes. Subsection V-D gives a comprehensive analysis.

A. COMPARISON OF COMPLEXITY
This subsection compares the multiplication complexity of the PGS, ordering PGS and GS precoders. It is assumed that one complex multiplication needs four real multiplications.
The step in which the Gram matrix is divided into multiple small square blocks does not use any multiplication. Therefore, the complexity of the PGS precoder is caused only by the calculation of the precoded signal. The processes of obtaining the Gram matrix and of multiplying the last estimated signal by $\mathbf{H}^H$ are common to the PGS, ordering PGS and GS precoders, so the multiplications caused by these processes are not considered in this subsection. On the other hand, since the ordering process requires additional multiplications to obtain the factor vector for the channel condition (line 5 of Algorithm 1), the complexity of the ordering PGS precoder is higher than that of the PGS and GS precoders. In more detail, the total complexity of the ZF, PGS, ordering PGS and GS precoders is represented in Table 1.
According to Table 1, the highest-order term of the total complexity of the ZF precoder is $K^3$, whereas the highest-order term for the other precoders is $K^2$. Therefore, the complexity of the ZF precoder grows much faster than that of the approximate matrix inversion methods when the number of UEs is large. The GS and PGS precoders have the same multiplication complexity, caused by (13) and (23), respectively. Also, because the ordering process of the ordering PGS precoder utilizes absolute values, the ordering PGS precoder requires additional multiplications. Consequently, the total complexity of the PGS precoder is the same as that of the GS precoder, and the total complexity of the ordering PGS precoder is slightly higher than that of the other precoders.

B. COMPARISON FOR EFFICIENCY OF PARALLEL OPERATION
This subsection presents the number of calculations required to obtain the final estimated symbol for the ordering PGS, PGS and GS precoders. This comparison is useful for identifying how efficiently each precoder can be implemented in parallel. It is assumed that one calculation can obtain multiple estimated symbols. Since the feedback symbols are used to estimate another symbol, they must be estimated beforehand. In other words, a symbol can be calculated in the present step only when all feedback symbols used to estimate it have been obtained. Therefore, a precoder that needs a small number of required calculations is highly efficient for parallel operation. In more detail, the number of calculations required to obtain the final estimated symbol for each precoder is represented in Table 2.
According to Table 2, the total number of required calculations for the PGS and ordering PGS precoders decreases as $N$ increases. Also, the gap between the GS precoder and the proposed schemes grows as the number of iterations increases. These advantages mean that the PGS and ordering PGS precoders are efficient in terms of parallel implementation.
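Table 2 is not reproduced in this text, so the following count is an inference rather than the article's own formula. It assumes that one sweep needs $K/N$ sequential calculation stages, which is consistent with the figures quoted in Section V-D (24 calculations for GS with $K = 12$ and $i = 2$, and reductions of 50%, 67% and 75% for $2 \times 2$, $3 \times 3$ and $4 \times 4$ blocks). The function name is hypothetical.

```python
def required_calculations(K, N, iters):
    """Sequential calculation stages: K/N stages per sweep times the number of sweeps."""
    if K % N != 0:
        raise ValueError("N must be a factor of K")
    return (K // N) * iters

K, iters = 12, 2
gs_count = required_calculations(K, 1, iters)  # N = 1 corresponds to plain GS
for N in (1, 2, 3, 4):
    count = required_calculations(K, N, iters)
    reduction = 100 * (1 - count / gs_count)
    print(f"N={N}: {count} calculations ({reduction:.0f}% fewer than GS)")
```

Under this assumed count the absolute saving over GS is $K(1 - 1/N)\,i$, so it grows linearly with the number of iterations, in line with the observation that the gap widens as the iteration count increases.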

C. COMPARISON OF BER
This subsection compares the BER performance of the ZF, GS and proposed schemes. The simulation parameters are represented in Table 3. It is assumed that the BS can obtain perfect CSI for all UEs.
Fig. 5 and Fig. 6 compare the BER performance when the BS has 100 transmit antennas and the number of UEs is 12. Fig. 7 and Fig. 8 compare the BER performance when the BS has 140 transmit antennas and the number of UEs is 12. The number of iterations is 2 in Fig. 5 and Fig. 7 and 3 in Fig. 6 and Fig. 8. Because the ZF precoder provides the optimal performance in massive MIMO, its BER performance is the best among the precoders in all comparisons.
Since the PGS precoder uses fewer feedback symbols than the GS precoder, the BER performance of the PGS precoder is worse than that of the GS precoder in all cases. Also, the BER performance of the PGS precoder degrades as $N$ increases. This result is caused by the decreased number of feedback symbols when $N$ is large.
In contrast, according to Fig. 5, although the ordering PGS precoder also uses fewer feedback symbols to estimate the precoding symbols, like the PGS precoder, it provides better BER performance than the GS precoder by rearranging the Gram matrix. The gap between the ordering PGS precoder and the GS precoder increases when $N$ is small and the SNR is high.
According to Fig. 6, as the number of iterations grows, the BER performance of all precoders except the ZF precoder improves. Also, the BER performance of the ordering PGS precoder is always better than that of the PGS precoder when both precoders have the same $N$. Though the ordering PGS precoder obtains better BER performance than the GS precoder when the Gram matrix is divided into $2 \times 2$ blocks, its BER performance is worse than that of the GS precoder in the other cases ($3 \times 3$ and $4 \times 4$). Because the ordering method provides a smaller convergence-rate gain than increasing the number of feedback symbols by reducing $N$ (Fig. 4), the ordering PGS precoder benefits less from a larger number of iterations $i$ than a scheme that uses more feedback symbols.
The channel in Fig. 7 and Fig. 8 has a larger number of transmit antennas per UE than the channel in Fig. 5 and Fig. 6, and therefore has a more favorable condition. As a result, the performance of all precoders improves.
According to Fig. 7, the ordering PGS precoder provides better BER performance than the GS precoder for the same reason as in Fig. 5. Also, the gap between the two precoders is large when $N$ is small and the SNR is high.
In Fig. 8, all precoders have the same BER performance. This improvement is caused by the growth in the number of iterations and in the number of transmit antennas per UE. In other words, the PGS precoder achieves the same BER performance as the GS precoder with fewer required calculations when the wireless system has a favorable channel and the precoder can execute multiple iterations. In this case, however, the ordering PGS precoder does not obtain any further benefit over the PGS precoder. Even if the ordering PGS precoder provides a small BER gain, the additional processes, such as rearranging the Gram matrix, are not worth the performance benefit.

D. COMPREHENSIVE ANALYSIS
This subsection provides comprehensive results and analysis to show in which situations the proposed schemes are useful. Table 4 shows the rate of change of the complexity of the proposed schemes compared with the GS precoder when the BS has 140 transmit antennas and the number of UEs is 12 with $i = 2$. Table 4 also presents the number of required calculations for the proposed schemes and the maximum difference in BER gain between the proposed schemes and the GS precoder. In this case, the number of calculations for the GS precoder is 24.
According to Table 4, the BER gain of the PGS precoder decreases as $N$ grows. In addition, the PGS precoder always has a lower BER gain than the GS precoder. The gap between the PGS and GS precoders increases when the SNR is large in Fig. 5, Fig. 6, Fig. 7 and Fig. 8. On the other hand, the PGS precoder reduces the time required to obtain the precoded signal by 50% for $2 \times 2$, 67% for $3 \times 3$ and 75% for $4 \times 4$ blocks, without any additional complexity. When the SNR is low and the iteration number $i$ is large, the GS, PGS and ordering PGS precoders provide similar BER performance, so the PGS precoder with a large $N$ is more efficient in terms of required time.
Like the PGS precoder, the BER gain of the ordering PGS precoder decreases as $N$ grows. The number of calculations for the ordering PGS precoder is the same as for the PGS precoder. On the other hand, the ordering PGS precoder obtains more BER gain compared with the GS precoder as $N$ decreases. This improvement comes from rearranging the Gram matrix in the specific order, which costs additional complexity. In Table 4, the complexity of the ordering PGS precoder is increased by 50% in all cases. When the SNR is large and the iteration number $i$ is small, the ordering PGS precoder provides better BER performance than the other precoders together with a shorter time for obtaining the precoded signal, so the ordering PGS precoder is efficient.

VI. CONCLUSION
In this article, the PGS precoder is proposed to mitigate the parallel-operation problem of the GS precoder by modifying the structure of the feedback matrix. The PGS precoder reduces the number of calculations required to obtain the final estimated symbol and has the same total complexity as the GS precoder. However, since the PGS precoder divides the Gram matrix of the channel into small blocks, it utilizes fewer feedback symbols. Therefore, the convergence rate and BER performance of the PGS precoder are degraded compared with the GS precoder.
Thus, the ordering PGS precoder is additionally proposed to compensate for this degradation by rearranging the Gram matrix. Although the ordering PGS precoder obtains only a small gain in convergence rate and its complexity slightly increases, the degradation of the PGS precoder is mitigated by allocating more feedback symbols to UEs with unfavorable channel conditions. The simulation results show that the ordering PGS precoder provides better BER performance than the GS precoder. Consequently, the ordering PGS precoder provides efficient parallel operation and enhanced performance in massive MIMO.