An Efficient Method for Combining Multi-user MIMO Tomlinson-Harashima Precoding with User Selection Based on Spatial Orthogonality

Combining multi-user multiple-input and multiple-output (MU-MIMO) Tomlinson-Harashima precoding (THP) with semiorthogonal user selection (SUS) yields both space diversity and multi-user diversity benefits; however, it entails a significant computational effort. To overcome this limitation while retaining system-level performance, this paper proposes an efficient method for combining MU-MIMO THP with SUS. The proposed method is inspired by the fact that the ordering calculations in both THP and SUS are generally based on spatial orthogonality. Particularly, the ordering result of SUS is fully reused for the ordering in THP, thereby eliminating the ordering calculation in the latter. Through a system-level performance evaluation, the proposed method is compared to a traditional method with the ordering process, thereby demonstrating the effectiveness of the proposed method. Numerical results obtained via computer simulations show that although the proposed method achieves almost the same system capacity as the traditional MU-MIMO THP method with ordering, the computational complexity of the proposed method decreases substantially with an increase in the number of antennas.


I. INTRODUCTION
With the rapid growth in the use of smart devices such as smartphones and tablets, the demand for mobile wireless services has exponentially increased, and in 2022, the amount of wireless traffic is estimated to reach 71% of all IP traffic [1]. Fifth-generation mobile communication systems (5G) cover various scenarios, such as enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine-type communications (mMTC) [2], [3]. For eMBB, the target peak data rate is set to 20 Gbps [2], [3], which is expected to be improved to at least 1 Tbps in sixth-generation mobile communication systems (6G) [4].
Multi-user multiple-input and multiple-output (MU-MIMO) is a promising technique for achieving a larger system capacity because simultaneous transmission can be realized via a single antenna mounted on a mobile station (MS) [5], [6]; moreover, it can be practically extended to massive MIMO using analog beamforming [3], [7]. Considering that capacity enhancement will be a high-priority requirement in 6G, MU-MIMO will continue to be an essential elemental technology [4].
In MU-MIMO systems, precoding techniques are essential for creating spatial orthogonality among multiple users. Precoding is categorized into two classes, namely linear and nonlinear precoding. Linear precoding (LP) [5], [6] is a prevalent and simple precoding scheme; notably, it has been adopted in IEEE 802.11ac [8] and LTE-Advanced [9]. However, LP entails noise enhancement at MSs, which restricts the transmission performance. By contrast, nonlinear precoding (NLP) [10]-[16] can achieve better transmission performance than LP. Particularly, NLP enables the suppression of noise enhancement based on a perturbation vector. Thus, NLP has emerged as a candidate technique to realize 5G systems and beyond [11]-[14], [17]. Among the various NLP schemes, the vector perturbation (VP) approach provides near-optimal performance using sphere encoding [10], [11]. However, finding the optimal perturbation vector in VP is a non-deterministic polynomial-time hard (NP-hard) problem. By contrast, Tomlinson-Harashima precoding (THP) can be considered a practical approach because the perturbation vector can be generated by a simple modulo operation [14]-[16].
The number of existing users generally exceeds the number of antennas at a base station (BS); therefore, the achievable system capacity depends on the possible simultaneous user groups. In this context, user scheduling techniques are essential to enhance the capacity of MU-MIMO systems. Among user scheduling techniques, greedy user selection could be a promising approach to realize throughput maximization [18], [19]. However, a heavy computational load is required to calculate system capacities for possible user groups considered for existing users. Hence, practical approaches that reduce the computational complexity have been proposed [20], [21]. Among such approaches, semiorthogonal user selection (SUS) is a potentially effective technique because only spatial orthogonality, which is easily obtained from channel state information (CSI), is utilized for user selection [21].
From a system-level perspective, it is important to consider the effective combination of MU-MIMO THP with SUS. In [22], the contribution of SUS to MU-MIMO THP has been investigated, and its multi-user diversity benefit has been intensively discussed. However, the effectiveness of the ordering process of MU-MIMO THP, which creates the space diversity benefit, has not been addressed. Nevertheless, it is essential to consider the effect of this ordering process in the system-level performance evaluation of THP. Moreover, considering that the ordering process entails multiple calculations of the inverse of the channel matrix [23], practical approaches to reduce the computational cost of THP have been proposed [24], [25]. However, these approaches have primarily focused on the link performance, without considering the impact of user scheduling such as SUS. Therefore, to the best of our knowledge, there is no prior research that considers the cooperation between user scheduling, such as SUS, and the ordering process in THP to enhance the system-level performance at a low computational cost.
Considering this aspect, we propose a method for efficiently combining MU-MIMO THP with SUS. The key feature of the proposed method is that the ordering in the SUS is reused as that for MU-MIMO THP. In this case, the computations required for the ordering process of MU-MIMO THP can be eliminated. The proposed concept is inspired by the fact that the ordering calculations of both MU-MIMO THP and SUS are based on spatial orthogonality. Specifically, the ordering process of MU-MIMO THP ensures sufficient orthogonality for lower users, whereas SUS requires high spatial orthogonality for upper users. The proposed method exploits this property by reversing the ordering results for SUS to be used as the ordering for MU-MIMO THP. To evaluate the performance of the proposed method, considering the impact of the modulo loss specific to MU-MIMO THP, the system capacity is accurately determined by conducting a mod-Λ-channel-based analysis [26]. Moreover, the effectiveness of the proposed method is demonstrated by comparing its performance with that of the traditional MU-MIMO THP method with ordering through computer simulations.
This study extends our previous work [27], in which the effectiveness of the proposed method was not clarified in terms of the computational cost. Moreover, in the previous work [27], the traditional Shannon-Hartley theorem-based capacity analysis was shown instead of the more accurate mod-Λ-channel-based analysis [26], which captures the impact of the modulo loss unique to THP. In the performance evaluation in the former work [27], a cellular scenario in which the carrier-to-noise ratio (CNR) differs among the existing users was not considered; instead, a constant CNR was assumed as a basic illustration. Overall, the differences between this work and that reported in the conference paper can be summarized as follows. The computational complexity of the proposed method in terms of the number of floating-point operations (FLOPs) is clarified (see Section II-D). Moreover, the performance evaluation is conducted using mod-Λ-channel-based analysis [26] for a typical single-cell scenario (see Section III-B).

A. SYSTEM CONCEPT
MU-MIMO THP is an effective technique to substantially improve the transmission performance of MU-MIMO. To enhance the system capacity of MU-MIMO THP, a user scheduling technique such as SUS must be deployed before MU-MIMO THP signal processing. Therefore, it is meaningful to consider the effective combination of MU-MIMO THP with SUS from a practical point of view. Fig. 1 shows the system configuration, wherein SUS is performed prior to MU-MIMO THP. As shown in Fig. 1(a), MU-MIMO THP generally orders the signals for the scheduled MSs to suppress the noise enhancement. Because this ordering process entails multiple calculations of the inverse of the channel matrix, it imposes an additional computational load on MU-MIMO THP. It is therefore necessary to reduce the computational load while retaining the benefit of the ordering in MU-MIMO THP.
We propose an efficient method for combining MU-MIMO THP with SUS, based on the fact that the ordering result of SUS is determined by spatial orthogonality, which is almost the same criterion as that used by the ordering operation in MU-MIMO THP. Thus, the ordering result of SUS can be fully reused for the ordering in MU-MIMO THP, which is performed after SUS signal processing. Specifically, MU-MIMO THP requires sufficient spatial orthogonality for the lower users ordered at the end, whereas SUS requires the same for the upper users selected at the beginning. The proposed method acquires the ordering result of MU-MIMO THP by arranging that of SUS in the reverse order, thereby eliminating the computational effort of the ordering process of MU-MIMO THP. As shown in Fig. 1(b), the ordering process of MU-MIMO THP is omitted, enabling the efficient combination of MU-MIMO THP and SUS, in contrast with the traditional approach shown in Fig. 1(a).

B. TOMLINSON-HARASHIMA PRECODING
In this section, we briefly introduce the operating principle of THP. Fig. 2 shows the system configuration for MU-MIMO THP, where $N_t$ and $N_r$ denote the numbers of transmitting antennas and simultaneous MSs, respectively. Herein, each MS is assumed to have a single receiving antenna. As shown in Fig. 2, the feedforward (FF) filter $\mathbf{F}$ and the feedback (FB) filter $\mathbf{B}$ are employed to retain spatial orthogonality among multiple MSs.
In general, THP can be implemented by an LQ decomposition [16]. Applying the LQ decomposition to the channel matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, we have

$$\mathbf{H} = \mathbf{L}\mathbf{Q},$$

where $\mathbf{L} \in \mathbb{C}^{N_r \times N_r}$ is a lower triangular matrix and $\mathbf{Q} \in \mathbb{C}^{N_r \times N_t}$ is a matrix with orthonormal rows ($\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$). Assuming that the precoding weight is determined by the zero-forcing (ZF) criterion, both the FF filter $\mathbf{F}$ and the FB filter $\mathbf{B}$ of the THP algorithm are obtained from $\mathbf{Q}$ and from $\mathbf{L}$ normalized by its diagonal elements, where $L_{ii}$ is the $i$-th diagonal element of the lower triangular matrix $\mathbf{L}$.

Next, we briefly describe the modulo operation of THP, which limits the transmit power. Fig. 3 shows how the transmit signal of THP is generated. In THP, the interference subtraction vector generated by the FB filter $\mathbf{B}$ is added to the original modulated signal vector $\mathbf{x} = [x_1, \cdots, x_{N_r}]^T \in \mathbb{C}^{N_r}$, which increases the transmit power. The modulo operation is performed to limit this increase. In detail, the transmit signal of the $i$-th antenna after the modulo operation, $v_i$, is given by

$$v_i = x_i + \tau z_i - \sum_{j=1}^{i-1} b_{ij} v_j,$$

where $\tau$, $z_i$, and $b_{ij}$ represent the modulo width, the perturbation term added to the transmit signal of the $i$-th antenna at the BS, and the $(i, j)$ element of the FB filter $\mathbf{B}$, respectively. The power normalization factor is therefore determined by the ratio between $E_{tx}$ and $\mathrm{tr}(\mathbf{F}\mathbf{C}_v\mathbf{F}^H)$, where $E_{tx}$ denotes the total transmit power and $\mathbf{C}_v \in \mathbb{C}^{N_r \times N_r}$ is the covariance matrix of the transmit signal vector after the modulo operation.

The ordering process is generally performed to enhance the transmission performance by rearranging the users to maximize the signal-to-noise ratio (SNR) of the precoded signals. The SNR in THP is determined by the transmit signal power $\sigma_x^2$, the power $\sigma_n^2$ of the noise $\mathbf{n}$, and the power normalization through $\mathrm{tr}(\mathbf{F}\mathbf{C}_v\mathbf{F}^H)$. Considering that $\mathrm{tr}(\mathbf{F}\mathbf{C}_v\mathbf{F}^H)$ depends on the order in which the users are sorted in the precoding process, the order that minimizes $\mathrm{tr}(\mathbf{F}\mathbf{F}^H)$ provides the best transmission performance.
Taking advantage of the fact that $(\mathbf{H}\mathbf{H}^H)^{-1}$ is related to $\mathbf{F}\mathbf{F}^H$, the vertical Bell Laboratories layered space-time (V-BLAST) algorithm makes it possible to derive suboptimal ordering results by calculating $(\mathbf{H}\mathbf{H}^H)^{-1}$ as many times as the number of antenna elements [23], [25]. As mentioned in the previous section, the proposed method aims to reduce this computational burden by reusing the ordering results of SUS, as MU-MIMO THP is combined with SUS in this study.
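The THP signal flow described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the ZF variant in which $\mathbf{F} = \mathbf{Q}^H \mathrm{diag}(1/L_{ii})$ and $\mathbf{B} = \mathbf{L}\,\mathrm{diag}(1/L_{ii})$ (so that $b_{ij} = L_{ij}/L_{jj}$), computes the LQ decomposition by Gram-Schmidt orthogonalization of the rows of $\mathbf{H}$, and omits both the power normalization and the ordering. The function names `lq_rows`, `modulo`, and `thp_precode` and the toy channel values are hypothetical.

```python
import math

def lq_rows(H):
    """LQ decomposition H = L Q via Gram-Schmidt on the rows of H.
    Assumes the rows of H are linearly independent."""
    n, m = len(H), len(H[0])
    L = [[0j] * n for _ in range(n)]
    Q = []
    for i in range(n):
        r = list(H[i])
        for j in range(i):
            # L_ij = h_i q_j^H: projection of row i onto the j-th orthonormal row
            L[i][j] = sum(H[i][c] * Q[j][c].conjugate() for c in range(m))
            r = [r[c] - L[i][j] * Q[j][c] for c in range(m)]
        norm = math.sqrt(sum(abs(e) ** 2 for e in r))
        L[i][i] = complex(norm)
        Q.append([e / norm for e in r])
    return L, Q

def modulo(a, tau):
    """Fold the real and imaginary parts of a into [-tau/2, tau/2)."""
    def fold(t):
        return t - tau * math.floor(t / tau + 0.5)
    return complex(fold(a.real), fold(a.imag))

def thp_precode(H, x, tau):
    """Return (v, s): v is the signal after the modulo operation, s = F v."""
    L, Q = lq_rows(H)
    n, m = len(H), len(H[0])
    v = []
    for i in range(n):
        # b_ij = L_ij / L_jj (assumed unit-diagonal FB filter B = L diag(1/L_ii))
        interference = sum(L[i][j] / L[j][j] * v[j] for j in range(i))
        v.append(modulo(x[i] - interference, tau))
    # Assumed FF filter F = Q^H diag(1/L_ii); power normalization is omitted
    s = [sum(Q[j][c].conjugate() * v[j] / L[j][j] for j in range(n))
         for c in range(m)]
    return v, s

# Toy example: 2 single-antenna MSs, 3 BS antennas, QPSK symbols
H = [[1 + 0.5j, 0.3 - 0.2j, -0.4 + 0.1j],
     [0.2 - 0.7j, 0.9 + 0.1j, 0.5 + 0.5j]]
x = [(1 + 1j) / math.sqrt(2), (-1 + 1j) / math.sqrt(2)]
tau = 2 * math.sqrt(2)  # modulo width commonly used for unit-power QPSK
v, s = thp_precode(H, x, tau)
# Noise-free reception: each MS applies the same modulo to its own signal
y = [sum(H[i][c] * s[c] for c in range(3)) for i in range(2)]
x_hat = [modulo(yi, tau) for yi in y]
```

In this noise-free toy case, `x_hat` recovers `x` because the pre-subtraction removes the inter-user interference and the receiver-side modulo removes the perturbation $\tau z_i$. The elements of `v` stay within the modulo region, which is how the modulo operation bounds the transmit power.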

C. SUS ALGORITHM
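SUS is a practical user scheduling technique that takes advantage of the spatial orthogonality among MSs. Fig. 4 shows the spatial orthogonality among MSs in SUS. The MS with the largest channel norm is selected as the first MS. Then, the MS that is the most orthogonal to the first MS is selected as the second MS. Next, the MS that is the most orthogonal to both the first and second MSs is selected as the third MS, and this procedure is repeated. Here, the $i$-th selected MS is given by [21]

$$\pi(i) = \arg\max_{k} \left\| \mathbf{q}_k^{(i)} \right\|,$$

where $\mathbf{q}_k^{(i)} \in \mathbb{C}^{1 \times N_t}$ and $\mathbf{h}_k \in \mathbb{C}^{1 \times N_t}$ are the orthogonal vector and channel row vector corresponding to the $k$-th MS, respectively; $\mathbf{q}_k^{(i)}$ is the component of $\mathbf{h}_k$ orthogonal to the channels of the MSs already selected. This user selection process restricts the simultaneous MSs by using the threshold $\alpha$, which bounds the normalized correlation allowed between the channel vector of a candidate MS and the orthogonal vectors of the MSs already selected [21]. When $\alpha$ is relatively large, more MSs can be selected for simultaneous transmission. By contrast, when $\alpha$ is relatively small, the transmission quality of each MS can be improved owing to spatial redundancy.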
In this manner, SUS can select MSs with high spatial orthogonality using simple inner product calculations; thus, inverse matrix calculations for system capacity analysis can be circumvented.
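The selection procedure can be sketched as follows. This is a minimal illustration that follows the general SUS procedure of [21] as summarized above, not the exact implementation evaluated in this paper. It assumes that a candidate is kept only if its normalized correlation with the orthogonal vector of the newly selected MS is below $\alpha$; the function name `sus`, its parameters, and the toy channel vectors are hypothetical.

```python
import math

def inner(a, b):
    """Inner product a b^H of two row vectors."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def norm(a):
    return math.sqrt(sum(abs(x) ** 2 for x in a))

def sus(channels, n_max, alpha):
    """Return the indices of the selected MSs in selection order."""
    candidates = list(range(len(channels)))
    selected, basis = [], []
    while candidates and len(selected) < n_max:
        residual = {}
        for k in candidates:
            # Component of h_k orthogonal to the MSs selected so far
            q = list(channels[k])
            for g in basis:
                coef = inner(channels[k], g) / norm(g) ** 2
                q = [qc - coef * gc for qc, gc in zip(q, g)]
            residual[k] = q
        best = max(candidates, key=lambda k: norm(residual[k]))
        g = residual[best]
        if norm(g) < 1e-12:
            break  # no spatial dimension left for another MS
        selected.append(best)
        basis.append(g)
        # Keep candidates whose normalized correlation with g is below alpha
        candidates = [k for k in candidates
                      if k != best and
                      abs(inner(channels[k], g))
                      < alpha * norm(channels[k]) * norm(g)]
    return selected

# Toy example with 2 BS antennas and 3 candidate MSs
channels = [[2 + 0j, 0j], [1 + 0j, 1 + 0j], [0j, 0.5 + 0j]]
loose = sus(channels, n_max=2, alpha=1.0)   # correlation is not restricted
strict = sus(channels, n_max=2, alpha=0.5)  # correlated MS 1 is excluded
```

Only inner products and norms are needed, which reflects why SUS avoids the matrix inversions required by capacity-based greedy selection. With the stricter threshold, the MS that is partially aligned with the first selected MS is dropped in favor of a weaker but orthogonal one.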

D. SYSTEM CONFIGURATION
In MU-MIMO THP, a high degree of spatial orthogonality is required especially for the lower users, which are ordered at the end, to suppress the noise enhancement; by contrast, SUS requires high spatial orthogonality for the upper users, which are selected at the beginning, to improve the channel capacity. Therefore, there is an inverse relationship between the ordering results of MU-MIMO THP and SUS. Fig. 5 shows the operating principle of the proposed method. As shown in Fig. 5, reversing the order obtained from SUS yields the ordering result of MU-MIMO THP, substantially simplifying the overall system configuration of MU-MIMO THP in combination with SUS.
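The reverse-ordering rule can be illustrated with a small numerical sketch. It is an illustration for one toy channel, not a general proof and not the authors' code. It assumes that the noise enhancement is measured by $\sum_i 1/L_{ii}^2$, which equals $\mathrm{tr}(\mathbf{F}\mathbf{F}^H)$ if the FF filter is taken as $\mathbf{F} = \mathbf{Q}^H \mathrm{diag}(1/L_{ii})$; the function names and channel values are hypothetical.

```python
import math

def diag_of_l(channels, order):
    """Diagonal of L in H = L Q when the rows of H follow the given order."""
    basis, diag = [], []
    for k in order:
        r = list(channels[k])
        for q in basis:
            coef = sum(a * b.conjugate() for a, b in zip(channels[k], q))
            r = [rc - coef * qc for rc, qc in zip(r, q)]
        n = math.sqrt(sum(abs(e) ** 2 for e in r))
        diag.append(n)
        basis.append([e / n for e in r])
    return diag

def noise_metric(channels, order):
    """Sum of 1/L_ii^2, i.e. tr(F F^H) under the assumed F = Q^H diag(1/L_ii)."""
    return sum(1.0 / d ** 2 for d in diag_of_l(channels, order))

def thp_order_from_sus(sus_order):
    """Proposed rule: the THP order is the SUS selection order reversed."""
    return list(reversed(sus_order))

# Toy example: MS 0 has the largest norm, so SUS selects it first
channels = [[2 + 0j, 0j], [1 + 0j, 1 + 0j]]
sus_order = [0, 1]
thp_order = thp_order_from_sus(sus_order)
metric_sus_order = noise_metric(channels, sus_order)
metric_reversed = noise_metric(channels, thp_order)
```

For this channel, the reversed order gives the smaller metric because it balances the diagonal elements of $\mathbf{L}$, whereas the SUS order leaves a small $L_{ii}$ for the last user. The reversal itself costs no arithmetic, which is the source of the complexity saving.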
Next, prior to the performance evaluation, we clarify the computational complexity of the proposed method in terms of the number of FLOPs. Here, $C_{\mathrm{SUS}}$, $C_{\mathrm{THP}}$, and $C_{\mathrm{Ordering}}$ denote the numbers of FLOPs for SUS, THP filter generation, and ordering, respectively, each obtained by summing the FLOPs required for the constituent matrix operations. If the SUS threshold $\alpha = 1.0$, which implies that the number of simultaneous MSs matches the number of transmitting antennas, $C_{\mathrm{SUS}}$, $C_{\mathrm{THP}}$, and $C_{\mathrm{Ordering}}$ are expressed in terms of $N$ and $K$, where $N = N_t = N_r$ is the number of antennas and $K$ denotes the number of existing MSs. Moreover, because the computational effort of ordering in Eq. (14), which the proposed method eliminates, is proportional to the fourth power of $N$, the number of FLOPs is reduced accordingly.

III. NUMERICAL RESULTS
This section verifies the effectiveness of the proposed method by comparing it with the traditional method in terms of the bit error rate (BER), system capacity, and computational complexity. Here, for the BER evaluations, link-level simulations are performed by assuming a specific modulation scheme. For the system capacity evaluations, system-level simulations are conducted for a cellular scenario in which the CNR differs among existing users. In the computational complexity evaluations, the proposed method is compared with the traditional method with ordering in terms of the number of FLOPs.

A. LINK-LEVEL PERFORMANCE EVALUATION
First, we perform a link-level performance evaluation to verify the effectiveness of the proposed method. Table 1 lists the simulation parameters.

FIGURE. Performance comparison between proposed and traditional methods in terms of BER versus average CNR [dB], where QPSK modulation is employed.

In the evaluations, the BER performance of LP is shown for reference. Note that the CNR is conceptually the same as the SNR; because radio propagation is considered in our performance evaluation, the CNR is used instead of the SNR. The ordering process of the traditional MU-MIMO THP method improves the BER performance; moreover, the BER performance of the proposed method approaches that of the traditional method with ordering, regardless of the MIMO antenna configuration. These results demonstrate that reverse ordering, which is the main feature of the proposed method, alleviates the noise enhancement. Moreover, the performance of both traditional MU-MIMO THP and the proposed method is better than that of MU-MIMO LP, even in the presence of time-selective fading. Particularly, in the case of a large MIMO antenna configuration, such as 8 × 8, the superiority of MU-MIMO THP is remarkable. This is because MU-MIMO THP yields a larger space diversity benefit, thereby alleviating the enhancement of the multi-user interference due to time-selective fading as well as the noise enhancement.
Second, we clarify the impact of spatial channel correlation on the proposed method. The Kronecker model is widely used for spatially correlated MIMO channels, and the channel matrix for the existing MSs $\mathbf{H}_K \in \mathbb{C}^{N_t \times K}$ is given by [28]

$$\mathbf{H}_K = \mathbf{R}_t^{1/2}\,\mathbf{G}\,\mathbf{R}_r^{1/2},$$

where $\mathbf{R}_t \in \mathbb{C}^{N_t \times N_t}$ and $\mathbf{R}_r \in \mathbb{C}^{K \times K}$ are the transmitting and receiving correlation matrices, respectively; $\mathbf{G} \in \mathbb{C}^{N_t \times K}$ denotes the i.i.d. complex Gaussian matrix with zero mean and unit variance. Assuming that the correlation matrix follows the exponential correlation model, the $(i, j)$ element of the correlation matrix $\mathbf{R}$ is given by [29]

$$[\mathbf{R}]_{i,j} = \begin{cases} \rho^{\,j-i}, & i \le j, \\ \left(\rho^{\,i-j}\right)^{*}, & i > j, \end{cases}$$

where the correlation coefficient satisfies $|\rho| \le 1$. Fig. 8 shows the BER performance versus the receiving correlation coefficient $\rho_r$, where 16QAM is employed, the average CNR is 12 dB, and the transmitting correlation coefficient $\rho_t$ is set to 0.0 or 0.75. From Fig. 8, it is noted that as the receiving correlation increases, the BER performance of the proposed method degrades relative to that of the traditional method with ordering. However, the superiority of the proposed method over the traditional method without ordering remains intact. Moreover, it can be seen that both the traditional MU-MIMO THP and the proposed method outperform MU-MIMO LP regardless of the transmitting and receiving correlations.
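As a small illustration of the exponential correlation model [29], the sketch below builds the correlation matrix for a given coefficient. It assumes the commonly used form in which the $(i, j)$ element is $\rho^{j-i}$ for $i \le j$ and the complex conjugate of $\rho^{i-j}$ otherwise; the function name `exponential_correlation` is hypothetical, and the square roots needed for the Kronecker model are not computed here.

```python
def exponential_correlation(n, rho):
    """n x n matrix with [R]_ij = rho^(j-i) for i <= j, conjugate for i > j."""
    rho = complex(rho)
    return [[rho ** (j - i) if i <= j else (rho ** (i - j)).conjugate()
             for j in range(n)]
            for i in range(n)]

# Transmitting-side example with the coefficient used in the evaluation
R_t = exponential_correlation(3, 0.75)
# A coefficient of 0.0 gives the identity matrix (uncorrelated antennas)
R_uncorrelated = exponential_correlation(3, 0.0)
```

The resulting matrix is Hermitian with a unit diagonal, and the correlation between two antennas decays geometrically with their index separation.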

B. SYSTEM-LEVEL PERFORMANCE EVALUATION
Next, we evaluate the system-level performance of the proposed method. Table 2 shows the simulation parameters. In our performance evaluation, $K$ MSs are uniformly distributed in a single cell with radius $R = 1{,}000$ m, and spatially uncorrelated MIMO channels are assumed. Moreover, considering that MU-MIMO THP entails modulo loss due to the modulo operation at MSs, the sum-rate is calculated based on mod-Λ channel analysis [26]. This analysis evaluates the rate from $p(z)$ ($-\tau/2 < z < \tau/2$), the probability density function of the noise after the modulo operation.

Fig. 9 shows the performance comparison between the proposed and traditional methods in terms of the sum-rate versus the number of existing MSs $K$, where the SUS threshold $\alpha = 1.0$. From Fig. 9, it is observed that the proposed method provides almost the same system capacity as the traditional MU-MIMO THP method with ordering, regardless of the number of existing MSs. Moreover, the performance gap between the proposed method and MU-MIMO LP increases with an increase in the size of the MIMO antenna configuration because MU-MIMO THP can obtain a larger space diversity benefit than MU-MIMO LP.

Fig. 10 shows the system capacity versus the SUS threshold $\alpha$, and Table 3 lists the relationship between the SUS threshold $\alpha$ and the average number of simultaneous MSs. From Fig. 10, as the threshold $\alpha$ approaches 1.0, the system capacity tends to increase irrespective of the precoding scheme because the number of simultaneous MSs $N_r$ increases. Moreover, it is observed that the system capacity of the proposed method is almost the same as that of the traditional MU-MIMO THP method with ordering, regardless of the threshold $\alpha$. For relatively low SUS thresholds, such as $\alpha \le 0.3$, the system capacities of MU-MIMO THP are lower than those of MU-MIMO LP owing to the impact of modulo losses.
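To make the notion of modulo loss concrete, the following sketch evaluates the rate of a one-dimensional modulo channel numerically. It rests on assumptions that may differ from the exact expression in [26] and from the sum-rate used in this evaluation: the noise after the modulo operation is modeled as a wrapped Gaussian with standard deviation `sigma`, the input is taken as uniform over the modulo interval, and the rate per real dimension is taken as $\log_2 \tau - h(p)$, where $h(p)$ is the differential entropy of $p(z)$. The function names and parameters are hypothetical.

```python
import math

def wrapped_gaussian_pdf(z, sigma, tau):
    """PDF of zero-mean Gaussian noise folded into [-tau/2, tau/2)."""
    k_max = int(math.ceil(8.0 * sigma / tau)) + 1  # replicas within about 8 sigma
    scale = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return scale * sum(math.exp(-((z - k * tau) ** 2) / (2.0 * sigma ** 2))
                       for k in range(-k_max, k_max + 1))

def modulo_channel_rate(sigma, tau, steps=4000):
    """Assumed rate log2(tau) - h(p) in bits per real dimension (midpoint rule)."""
    dz = tau / steps
    entropy = 0.0
    for i in range(steps):
        z = -tau / 2.0 + (i + 0.5) * dz
        p = wrapped_gaussian_pdf(z, sigma, tau)
        if p > 0.0:
            entropy -= p * math.log2(p) * dz
    return math.log2(tau) - entropy

rate_low_noise = modulo_channel_rate(sigma=0.05, tau=2.0)
rate_high_noise = modulo_channel_rate(sigma=10.0, tau=2.0)
```

When the noise is small relative to $\tau$, the folded density is close to an ordinary Gaussian and the rate is high. When the noise is large, the folded density approaches a uniform distribution and the rate tends to zero, which is the regime in which the modulo loss can make THP fall behind LP.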

C. COMPUTATIONAL COMPLEXITY
Finally, we demonstrate the extent to which the computational complexity can be reduced using the proposed method.

FIGURE 11. Normalized computational complexity of the proposed method versus the number of antennas.

Fig. 11 shows the relationship between the normalized computational complexity of the proposed method and the number of antennas $N$ ($= N_t = N_r$), with the number of existing MSs $K$ as a parameter. Here, the normalized computational complexity is defined as the number of FLOPs required for the proposed method normalized by that for the traditional method, which is calculated by using Eqs. (11)-(15). From Fig. 11, it is clear that the proposed method dramatically reduces the computational complexity with an increase in the number of antennas $N$, regardless of the number of existing MSs $K$. This is because the proposed method eliminates the ordering process in MU-MIMO THP, which entails a high computational effort proportional to the fourth power of $N$.

IV. CONCLUSION
In this paper, we proposed an efficient method for combining MU-MIMO THP with SUS. The proposed method eliminates the ordering process in MU-MIMO THP by fully reusing the ordering result of SUS, which is obtained in advance. The proposed concept is inspired by the fact that the ordering in both MU-MIMO THP and SUS is performed based on the spatial orthogonality among MSs. Numerical results showed that the proposed method achieves almost the same system capacity as the traditional MU-MIMO THP method with ordering, and that the benefit of the ordering in MU-MIMO THP can be retained even in the presence of time-selective fading. Moreover, the proposed method reduces the computational complexity regardless of the number of existing MSs, and the reduction effect becomes remarkable in the case of relatively large MIMO antenna configurations, where MU-MIMO THP is effective in terms of the transmission performance.