Efficient User Subset Selection for Multi-User Space-Time Line Code Systems

This paper considers user subset (US) selection for minimizing the bit error rate (BER) of multi-user space-time line code (MU-STLC) multiple-input multiple-output systems with fairness-aware per-user power allocation. The optimal selection criterion for MU-STLC transmissions based on zero-forcing (ZF) precoding is given, and two efficient algorithms are then proposed. First, an incremental search approach is presented for US selection in MU-STLC systems. This suboptimal solution to BER minimization starts with an empty US and adds users one by one, using a low-complexity recursive computation of the block matrix inverse. Second, by avoiding the matrix computations that recur at each incremental step, a more efficient algorithm is developed. Simulation results show that the proposed incremental algorithms achieve most of the US selection gain with very low complexity. In addition, it is demonstrated that with $N_T$ transmit antennas, $U$ users (each with 2 receive antennas), and $K$ selected users, the achievable upper diversity order of ZF precoding-based MU-STLC systems with optimal US selection is $2(N_T - K + 1)(U - K + 1)$. The analytical diversity order matches the simulation results well.


I. INTRODUCTION
Recently, the space-time line code (STLC) of [1] has been designed as a novel transmission approach. In an STLC scheme achieving full spatial diversity gain, two information symbols are encoded using the channel gains from multiple receive antennas and are sent consecutively in time [1]-[3]. The scheme requires perfect knowledge of the full channel state information (CSI) at the transmitter and uses a simple STLC combining structure without CSI at the receiver. Furthermore, a multi-user (MU) STLC system, which can simultaneously deliver multiple STLC streams to multiple users, has been proposed for multiple-input multiple-output (MIMO) downlink transmissions [4], [5]. By employing a zero-forcing (ZF) precoder, the MU-STLC system with fairness-aware per-user (FAPU) power allocation has been shown to offer near-optimal performance in terms of the achievable sum rate.
In wireless communication systems, the number of downlink users ($U$) is often larger than the number of transmit antennas ($N_T$) and/or the number of users that can be served at the same time. The base station then has to perform user subset (US) selection based on the CSI of all the available users in an MU MIMO communication system where ZF-based precoding is employed in downlink transmissions [6]-[15]. The optimal US selection can be performed by an exhaustive search, but because of its impractical computational complexity, the development of efficient suboptimal selection algorithms has drawn great attention. Suboptimal US selection algorithms fall into two categories: incremental search-based algorithms and decremental search-based algorithms. In [7], suboptimal greedy US selection schemes based on ZF dirty-paper (DP) precoding and simple ZF beamforming without DP coding have been considered for wireless broadcast channels. Since linear ZF precoding has many advantages over DP coding in practical systems, many of the existing US selection algorithms are based on ZF precoding. In [8] and [9], a semi-orthogonal US (SUS) selection algorithm has been developed under a ZF beamforming strategy. The low-complexity SUS incrementally selects a new user with the largest effective channel norm that is nearly orthogonal to the channels of the other selected users. In [11], a decremental US selection algorithm based on ZF beamforming has been proposed for the case when the number of users is less than the number of transmit antennas. In [12], a novel greedy US selection procedure with swap has been presented.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Although various US selection schemes have been presented for MU MIMO transmissions, they are unsuitable for MU-STLC MIMO systems because the transmission methods differ. To the best of our knowledge, no previous study has addressed US selection in MU-STLC systems. Moreover, previous works on US selection for different MIMO systems have mainly focused on maximizing the sum rate. Although the achievable sum rate could be employed as a US selection criterion from an information-theoretic point of view, it would be more practical to use the minimum error probability for a given modulation scheme and MIMO detector [16], [17]. It should be noted that one key problem in US selection is the design of a proper selection criterion.
In this paper, a US selection criterion suitable for ZF-based MU-STLC MIMO systems with FAPU power allocation is first presented to minimize the bit error rate (BER). It has been shown in [4] that FAPU power allocation in MU-STLC systems offers fairness in terms of allocated power as well as low-complexity power allocation. This work therefore assumes a simple FAPU power allocation. Efficient incremental search-based suboptimal algorithms are proposed to greatly reduce the computational complexity of the optimal US selection algorithm. Furthermore, the original low-complexity SUS selection algorithm of [8] is not suitable for MU-STLC transmission systems and has to be adapted to them. We therefore present a modified SUS algorithm with lower complexity for MU-STLC, which is simulated as a benchmark for comparison with the proposed incremental US selection algorithms. By exploring an analytical bound on the pairwise error probability (PEP), the achievable diversity order is derived. We demonstrate that ZF-based MU-STLC MIMO systems with FAPU power allocation can achieve an upper diversity order of $2(N_T - K + 1)(U - K + 1)$ when $K$ users are selected, which simultaneously exploits receive, transmit, and MU diversity. Simulation results verify that the achieved diversity order matches the analytical one. The analytical and numerical results show that the proposed incremental US selection strategy efficiently decreases the computational complexity of obtaining a US.
The main contributions of this study are summarized as follows:
• An efficient US selection algorithm based on the incremental search approach is developed for MU-STLC systems. A second proposed incremental US selection algorithm efficiently reduces the computational complexity further by exploiting the orthogonal STLC encoding structure as well as the recursive block matrix inversion operations. To our knowledge, the proposed efficient US selection algorithms are the first efforts to provide low-complexity US selection in MU-STLC systems.
• The computational complexity of the proposed US selection algorithms is analyzed and compared to that of the optimal and conventional SUS selection schemes. The complexity comparison demonstrates the efficiency of the proposed algorithms.
• The overall diversity order achieved in the ZF-based MU-STLC system with optimal US selection is analytically provided and verified by simulation results.
The remainder of this paper is organized as follows. In Section II, a system model for the MU-STLC transmission with US selection, based on the ZF precoder, is briefly presented. In Section III, three efficient US selection algorithms for MU-STLC systems are presented together with the computational complexity analysis. The achievable diversity order is analyzed in Section IV. The simulation results are presented in Section V. Finally, some conclusions are drawn in Section VI.
submatrix obtained by deleting the $(2k-1)$-th and $2k$-th row vectors of the matrix $\mathbf{X}$.

II. SYSTEM MODEL OF MU-STLC WITH US SELECTION
We consider a downlink MU-STLC time division duplex (TDD) system with $N_T$ transmit antennas and $U$ users. Each user has two receive antennas for STLC [1]-[6], as shown in Fig. 1. In this work, it is assumed that $K$ ($\le U$) users are selected from the $U$ users. Let $x_{k,t}$ be the $t$-th transmitted symbol of the $k$-th user, with Then the MU-STLC signal matrix is defined as 2 12 [] bt -th element of the matrix $\mathbf{S} = \mathbf{W}\mathbf{X}$ and 22 12 [ , which is assumed to be perfectly known at the transmitter. In TDD mode, channel reciprocity between uplink and downlink channels can be exploited to estimate the CSI by using the pilot/training signals from all the available users [4], [18]. Here , is a channel matrix between all transmit antennas and each user, which is static for $t = 1$ and $t = 2$, and whose elements are independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian random variables with zero mean and unit variance. 22 12 , is an i.i.d. additive white Gaussian noise (AWGN) matrix whose elements are zero-mean circularly symmetric complex Gaussian noise components with variance $\sigma_z^2$.
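The channel statistics described above can be illustrated with a minimal sketch. It assumes the stacked full channel has dimension $2U \times N_T$ (two rows per user) and draws each entry as CN(0, 1), i.e., with real and imaginary parts of variance 1/2 each. The function names `cn01`, `user_channel`, and `full_channel` are hypothetical and only serve this illustration.

```python
import random

def cn01(rng):
    """Draw one circularly symmetric complex Gaussian sample, CN(0, 1)."""
    s = 0.5 ** 0.5  # real and imaginary parts each have variance 1/2
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def user_channel(n_tx, rng):
    """2 x n_tx channel of one user (two receive antennas), i.i.d. CN(0, 1)."""
    return [[cn01(rng) for _ in range(n_tx)] for _ in range(2)]

def full_channel(num_users, n_tx, rng):
    """Stack all user channels into a (2 * num_users) x n_tx matrix (assumed layout)."""
    rows = []
    for _ in range(num_users):
        rows.extend(user_channel(n_tx, rng))
    return rows

rng = random.Random(0)          # fixed seed for reproducibility
H = full_channel(num_users=6, n_tx=2, rng=rng)
```

The channel is drawn once and reused for both symbol times, which mirrors the assumption that it is static for $t = 1$ and $t = 2$.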
For the MU-STLC decoding, the received signal matrix of (4) is re-expressed in a linear form as [5]   2 01 10      Q (8) and $\mathbf{z} \in \mathbb{C}^{4K \times 1}$ is the AWGN vector with $E[\mathbf{z}\mathbf{z}^H] = \sigma_z^2 \mathbf{I}_{4K}$. By the simple STLC combining procedure at the receiver, which is described in [5], user $k$ conducts STLC combining as * , 1 2 ,2 T k k  r Q r . Therefore, the MU combined-STLC received signal vector can be given as H W x z (9) where the expression (5) is used and  (12) where the power normalization factor related to the selected US is given as

III. US SELECTION ALGORITHMS
In this section, we present the optimal and two suboptimal incremental US selection algorithms for the ZF-precoded MU-STLC systems. It can be easily shown from (9) that maximizing the received signal-to-noise ratio (SNR) of the ZF-precoded MU-STLC systems is equivalent to maximizing the term 2 S  of (13). Note that the US selection design objective of 2 max S  is identical to minimizing the error probability from the PEP expression (55), which will be presented in Section IV. Then the optimal US selection algorithm for the ZF-precoded MU-STLC system can be described as (14) where $S_n$ is the $n$-th enumeration of the set of all available USs. Here ${}_U C_K$ is the total number of combinations of selecting $K$ users out of $U$ available users. Obviously, the exhaustive search algorithm to solve (14) requires ${}_U C_K$ matrix inverse operations, whose computational complexity is huge, especially when the number of possible USs is large. To evaluate the computational complexity, we count the number of real multiplications (RMs) and the number of real summations (RSs). Here a complex multiplication corresponds to 4 RMs and 2 RSs, whereas a complex summation requires 2 RSs. By performing a computational complexity analysis similar to that used in [6], [19], [20], [21], [22], and [23], they are given as 2 3 2 (16
It should be pointed out that US selection algorithms are closely coupled with the employed transmission method. The conventional low-complexity US selection algorithms are based on the channel (or ) S HH , not (or ) S HH and thus unsuitable for US selection in the MU-STLC systems.
Because of the difference between two channel matrices of , they are inappropriate for direct use in the MU-STLC systems. The succeeding efficient US selection algorithm (Algorithm 5 in subsection III.D) designed for the MU-STLC systems by effectively utilizing the orthogonal STLC encoding structure of the definition (10) in this work is distinct from the conventional US selection algorithms.
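The exhaustive search described above can be sketched as follows. This is only an illustration of the enumeration over all $K$-user subsets: the `metric` argument is a hypothetical placeholder for the criterion of (14), which in the paper requires one matrix inverse per candidate subset, and the toy weights in the usage lines are invented for the example.

```python
from itertools import combinations
from math import comb

def num_candidate_subsets(num_users, k):
    """Number of K-user subsets the exhaustive search has to evaluate."""
    return comb(num_users, k)

def exhaustive_us_selection(num_users, k, metric):
    """Return (best_subset, best_value) maximizing `metric` over all K-subsets.

    `metric` is a caller-supplied function of a tuple of user indices; it
    stands in for the selection criterion and is an assumption of this sketch.
    """
    best_subset, best_value = None, float("-inf")
    for subset in combinations(range(num_users), k):
        value = metric(subset)
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value

# Toy usage: a made-up additive metric over 4 users, selecting 2 of them.
weights = [0.3, 1.2, 0.7, 2.0]
best, value = exhaustive_us_selection(4, 2, lambda s: sum(weights[u] for u in s))
```

For the (12,12,8)-type scenarios with $U = 12$ and $K = 8$, `num_candidate_subsets(12, 8)` already gives 495 candidate subsets, each needing its own metric evaluation, which is what motivates the incremental alternatives.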

A. ORIGINAL SUS SELECTION ALGORITHM
The semi-orthogonal US selection method with the full channel matrix $\mathbf{H} \in \mathbb{C}^{2U \times N_T}$, originally proposed in [8], can be described as Algorithm 1, termed SUS. It incrementally finds the $K$ best users, which have large channel gains as well as a good level of orthogonality. In the $i$-th iteration,   $(i-1)$-th iteration as presented in lines 9, 12, and 14 of Algorithm 1. Note that each user in the MU-STLC system has 2 receive antennas, which is why two channel vectors are selected for each user at a time. Algorithm 1 does not enforce semi-orthogonality among users, for simplicity of use and comparison. That is, we do not adopt the step of measuring the orthogonality of pp  Ψ and comparing it to a predetermined threshold, which has been employed in [8]. It is shown in Section V that the modified low-complexity SUS selection algorithm (Algorithm 3) without this step has a negligible difference in BER performance compared to the proposed incremental algorithms.
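The greedy idea behind SUS can be illustrated with a simplified sketch. It is not Algorithm 1 itself: it assumes one channel vector per user (whereas Algorithm 1 handles two vectors per user), omits the orthogonality threshold as described above, and uses hypothetical helper names (`inner`, `residual`, `sus_select`). At each step it picks the candidate whose channel has the largest component orthogonal to the channels already selected.

```python
def inner(u, v):
    """Hermitian inner product: sum of u_i * conj(v_i)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def residual(h, basis):
    """Part of h orthogonal to the span of mutually orthogonal basis vectors."""
    g = [complex(x) for x in h]
    for b in basis:
        coeff = inner(h, b) / inner(b, b).real  # projection coefficient onto b
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g

def sus_select(channels, k):
    """Greedy SUS-style selection of k user indices, without the threshold step."""
    candidates = list(range(len(channels)))
    selected, basis = [], []
    for _ in range(k):
        residuals = {u: residual(channels[u], basis) for u in candidates}
        best = max(candidates, key=lambda u: inner(residuals[u], residuals[u]).real)
        selected.append(best)
        basis.append(residuals[best])  # residuals are orthogonal to earlier picks
        candidates.remove(best)
    return selected

# Toy usage with four single-vector users and three transmit antennas.
order = sus_select([[1, 0, 0], [0.9, 0.1, 0], [0, 2, 0], [0, 0, 0.5]], 3)
```

The sketch assumes every selected residual is nonzero, which holds for generic channels when $k$ does not exceed the number of transmit antennas.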

B. MODIFIED SUS SELECTION ALGORITHM WITH LOW-COMPLEXITY
The original SUS selection approach in Algorithm 1 is based on the full channel matrix $\mathbf{H}$. It will be shown through simulations in Section V that its performance is very poor. Therefore, for a fair comparison, it should be modified using the channel to apply to the MU-STLC systems. Then the SUS selection algorithm for MU-STLC signals can be presented as Algorithm 2.   The computational complexity of the SUS selection algorithm in terms of the RMs and RSs, respectively, can be obtained as 1 2 11 (( 1)(32 56 40) 24 ) To reduce the computational complexity of Algorithm 2, the step that builds the subspace { (1, :), ( 2, :), , can be moved from inside the for-loop with index $j$ to outside it, as shown in Algorithm 3, which is a low-complexity SUS (called LC-SUS) selection algorithm for the MU-STLC. Then

C. INCREMENTAL US SELECTION ALGORITHM
For US selection with reduced complexity, another incremental user selection strategy is introduced. Note that this work focuses on the incremental selection algorithm because its complexity is substantially smaller than that of the decremental method when the number of available users is large and, at the same time, much larger than $K$. A US is constructed by adding users one by one in an incremental manner. Assuming that $(i-1)$ users have been selected, the $i$-th user is selected according to the following criterion.  (24) where ( 1) , ,, To further reduce the computational complexity, we employ the block matrix inverse [12] and then have the following result. 1 1 where we assume that ( 1) i  Π and ij S are both nonsingular and

HH
for the complexity reduction. Note in Algorithm 4 that is replaced with another matrix symbol notation of ( 1)   12:
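For reference, the standard block matrix inverse identity that underlies this kind of recursive update can be written as follows. The symbols $A$, $B$, $D$, and $S$ are generic placeholders chosen for this illustration and are not the paper's notation in (30): $A$ plays the role of the Hermitian matrix associated with the already selected users, while $B$ and $D$ collect the cross terms and self terms of the candidate user.

```latex
\begin{bmatrix} A & B \\ B^{H} & D \end{bmatrix}^{-1}
=
\begin{bmatrix}
A^{-1} + A^{-1} B S^{-1} B^{H} A^{-1} & -A^{-1} B S^{-1} \\
-S^{-1} B^{H} A^{-1} & S^{-1}
\end{bmatrix},
\qquad
S = D - B^{H} A^{-1} B .
```

The identity holds when $A$ and the Schur complement $S$ are both nonsingular. Since $A^{-1}$ is available from the previous incremental step, only the small matrix $S$ has to be inverted when a user is added; under the assumption that each user contributes two rows, $S$ is a $2 \times 2$ matrix.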

D. INCREMENTAL US SELECTION ALGORITHM WITH REDUCED COMPLEXITY
In the incremental US selection algorithm proposed in subsection III.C, the block matrix computation of (30) involves several matrix-by-matrix multiplications at each incremental step (within the double for-loops of Algorithm 4). By exploiting the definition of (10) used for the STLC encoding, we can reduce the computational complexity of these matrix-by-matrix multiplications.
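One way an orthogonal encoding structure can save multiplications is sketched below. The sketch assumes, purely for illustration, that the $2 \times 2$ blocks involved have the Alamouti-type form [[a, b], [-b*, a*]]; whether the blocks arising from (10) take exactly this form is an assumption here, not something stated in the text. Matrices of this form are closed under multiplication, so only the first row of a product needs to be computed and the second row follows by conjugation. The names `alamouti`, `alamouti_product`, and `matmul2` are hypothetical.

```python
def alamouti(a, b):
    """Build the 2x2 block [[a, b], [-conj(b), conj(a)]]."""
    return ((a, b), (-b.conjugate(), a.conjugate()))

def alamouti_product(p, q):
    """Product of two such blocks using only their first rows (4 complex mults)."""
    (a, b), _ = p
    (c, d), _ = q
    return alamouti(a * c - b * d.conjugate(), a * d + b * c.conjugate())

def matmul2(p, q):
    """Plain 2x2 complex matrix product (8 complex mults), for comparison."""
    return tuple(
        tuple(sum(p[i][n] * q[n][j] for n in range(2)) for j in range(2))
        for i in range(2)
    )

# Usage: both routes give the same block, the structured one with half the work.
p = alamouti(1 + 2j, 3 - 1j)
q = alamouti(-2 + 0.5j, 1j)
fast = alamouti_product(p, q)
full = matmul2(p, q)
```

Halving the number of complex multiplications per block product is consistent in spirit with the reported complexity of R-inc-US being less than half that of inc-US, although the exact savings depend on the paper's own derivation.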
First of all, we can easily show from (10)    H .
The proposed incremental US selection algorithm using the above-mentioned complexity reduction methods is described in Algorithm 5 (named R-inc-US), where ( 1) i  V takes the place of In the algorithm, the optimization criterion of (24) is modified for further reduction of the complexity. The US selection criterion of (24) and (30) can be rewritten as The second term of (48) can be rewritten as Finally, it should be noted that to compute the updated 1 () i  Π such as the right side of (30), the same complexity-reduction approach as before is employed after finding the user index

IV. ACHIEVABLE DIVERSITY ORDER
Using the combined symbols in (9) and (12), the maximum likelihood (ML) detection can be performed at the receiver side by minimizing the following metric: where  is the set of user message symbols from the M-ary signal constellation. To derive the achievable diversity order of the MU-STLC system with optimal US selection, we consider the PEP of detecting $\hat{\mathbf{x}}$ when $\mathbf{x}$ is transmitted, with the assumption that the optimal US for a given channel realization $\mathbf{H}_S$ is denoted by $S_n$, which is given as  Next, to obtain the additional diversity order that the MU-STLC system achieves from the optimal US selection scheme selecting $K$ users among $U$ users, the same analysis used in [27] can be straightforwardly applied. Using Craig's result, the conditional PEP between signal vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$ is given as   As shown in [27], the optimal US selection for ML detection with respect to the additional diversity order is to determine $K$ users in the $n$-th US, which satisfies Then, the diversity order of the optimal US selection can be obtained by [27]   

V. SIMULATION RESULTS
Fig. 2 shows the BER performance of the proposed US selection algorithms for $(2, U, 2)$ MU-STLC systems with respect to SNR in dB. Here three different scenarios of 3, 4, and 6 users are considered for US selection. Note that the (2,2,2) scenario, representing no US selection, is simulated as another reference. It can be observed that the BER performance improves as the total number of available users increases. The (2,2,2), (2,3,2), (2,4,2), and (2,6,2) systems have the diversity orders of $G = 2$, 4, 6, and 10, respectively. To draw the BER reference black solid lines, the constants associated with $G = 2$, 4, 6, and 10 are selected as $c = 8.9$, 950, $1.8 \times 10^{5}$, and $1.9 \times 10^{10}$, respectively. Given that the analytical diversity orders are in good agreement with the simulation results, we can verify the accuracy of our diversity order analysis.
It is shown that the proposed inc-US selection algorithm offers a large BER performance gain over the case without US selection, which has only a receive diversity order of 2. For example, the proposed US selection algorithm for (2,6,2) yields a gain of about 8.3 dB at a BER of $10^{-3}$. This gain stems from a diversity order of 10, the product of a diversity order of 5 (MU diversity) and that of 2 (receive diversity). Although the LC-SUS, inc-US, and R-inc-US selection algorithms perform worse than the optimal one, the BER performance gap is minor.
The achievable diversity orders of (3,4,2) and (4,4,2) are $G = 12$ and 18, respectively, which result from MU diversity and receive diversity in addition to transmit diversity. Here the constants $c = 2.3 \times 10^{11}$ and $3.8 \times 10^{16}$ are used for $G = 12$ and 18, respectively. For comparison, simulation results of the original SUS selection scheme (Algorithm 1) are also presented. It is shown that the proposed algorithms outperform the original SUS selection algorithm, which cannot achieve the full diversity gain of the MU-STLC transmission systems. Compared with the optimal algorithm, the performance loss of the proposed algorithms is small. Note that, as expected from the diversity order analysis, the BER performance improves as the number of transmit antennas grows.
Next, the (2,2,2), (3,3,2), (4,4,2), and (5,5,2) systems are considered. Except for (2,2,2), the other three systems can achieve three kinds of diversity. Here (3,3,2) and (5,5,2) can achieve the diversity orders of $G = 8$ and 32, respectively, whose BER reference lines are plotted with the corresponding constants $c$. It is clear that as the numbers of available users and transmit antennas simultaneously increase, the diversity gain is greatly improved. In (5,5,2) in particular, the proposed efficient US selection algorithms offer BER results very close to that of the optimal one.
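The diversity orders quoted for the simulated systems follow directly from the expression $2(N_T - K + 1)(U - K + 1)$, with each system written as $(N_T, U, K)$. A small sketch that evaluates it is given below; the function name `diversity_order` and the validity check $1 \le K \le \min(N_T, U)$ are assumptions made for this illustration.

```python
def diversity_order(n_tx, num_users, k):
    """Upper diversity order 2 * (N_T - K + 1) * (U - K + 1) for an (N_T, U, K) system."""
    if not 1 <= k <= min(n_tx, num_users):
        # Assumed validity range for this sketch: 1 <= K <= min(N_T, U).
        raise ValueError("k must satisfy 1 <= k <= min(n_tx, num_users)")
    return 2 * (n_tx - k + 1) * (num_users - k + 1)

# Systems discussed in the simulation section, as (N_T, U, K) triples.
systems = [(2, 2, 2), (2, 3, 2), (2, 4, 2), (2, 6, 2),
           (3, 4, 2), (4, 4, 2), (3, 3, 2), (5, 5, 2)]
orders = {s: diversity_order(*s) for s in systems}
```

In the (2,6,2) case, the factor $(U - K + 1) = 5$ is the MU diversity and the factor 2 is the receive diversity, while $(N_T - K + 1) = 1$ means no additional transmit diversity is available.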
The complexities of the proposed algorithms under the scenarios considered in Figs. 2, 3, and 4 are given in terms of RMs plus RSs in Table I. We observe that the proposed inc-US and R-inc-US selection algorithms yield a complexity reduction compared to the LC-SUS and the optimal one. In particular, the complexity of the R-inc-US selection algorithm is the lowest for all the given systems and less than half that of the inc-US. Moreover, as the system parameter values increase, the complexity of the inc-US and R-inc-US grows at a substantially lower rate than that of the LC-SUS and the optimal one. In Fig. 5, the BER results of the proposed US selection algorithms are given for (8,12,8), (10,12,8), (12,12,8), and (14,12,8) systems with higher numbers of transmit antennas.
It is also found that the proposed algorithms provide a substantial performance gain over the (8,8,8) scenario. The SNR gains for (8,12,8), (10,12,8), (12,12,8), and (14,12,8) are about 6 dB, 10 dB, 12 dB, and 13.2 dB, respectively, at a BER of $10^{-2}$. Further, the performance of the proposed algorithms under (14,12,8) is close to that of the optimal one. The complexity comparison for the scenarios presented in Fig. 5 is also made in Table I. Additionally, it is observed that the original SUS selection scheme (Algorithm 1) yields significantly worse BER performance than the other algorithms, especially for T NU  . Compared to the system parameters used in Figs. 2, 3, and 4, the system parameters in Fig. 5 have larger values. In scenarios with large numbers of transmit antennas and users, the optimal algorithm has a tremendous complexity. On the other hand, the proposed inc-US and R-inc-US algorithms have much smaller complexity than the LC-SUS. Furthermore, it is found that the proposed R-inc-US algorithm requires less than half the complexity of the proposed inc-US algorithm. Hence the complexity reduction of the proposed R-inc-US is remarkable for large numbers of transmit antennas and users. Next, we investigate the complexity of the proposed algorithms in terms of the number of transmit antennas for $U = 12$ and $K = 8$ in Fig. 6. As expected, the complexity of the proposed algorithms grows as the number of transmit antennas increases. The complexity slope of the LC-SUS is much steeper than those of the R-inc-US and inc-US. The R-inc-US has the lowest complexity. Note that since the optimal algorithm has a huge complexity, it is not included in the plot. Fig. 7 compares the complexity of the proposed algorithms as a function of the number of selected users for $N_T = 30$ and $K = 8$. It is found that the complexity of the inc-US and R-inc-US is lower than that of the LC-SUS, and the R-inc-US still has the smallest complexity.
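The RM and RS counting convention used for these complexity figures (a complex multiplication costs 4 RMs and 2 RSs, a complex summation costs 2 RSs) can be sketched as follows. The helper names `real_ops` and `matmul_real_ops` are hypothetical, and the matrix-product count assumes a plain schoolbook product; it is not one of the paper's closed-form complexity expressions.

```python
def real_ops(complex_mults, complex_adds):
    """Convert complex operation counts into (RMs, RSs).

    One complex multiplication = 4 RMs + 2 RSs; one complex summation = 2 RSs.
    """
    rms = 4 * complex_mults
    rss = 2 * complex_mults + 2 * complex_adds
    return rms, rss

def matmul_real_ops(m, n, p):
    """(RMs, RSs) of a schoolbook (m x n) by (n x p) complex matrix product.

    Assumes m*p*n complex multiplications and m*p*(n-1) complex summations.
    """
    return real_ops(m * p * n, m * p * (n - 1))

# Usage: cost of multiplying a 2x3 complex matrix by a 3x2 complex matrix.
rms, rss = matmul_real_ops(2, 3, 2)
total = rms + rss  # "RMs plus RSs", the figure of merit used in Table I
```

Summing RMs and RSs into a single number, as in the last line, matches how the complexities are reported in Table I.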

VI. CONCLUSION
This paper examines the MU-STLC system based on ZF precoding with US selection and shows that it can achieve a large performance gain compared to the system without US selection. The conventional low-complexity SUS selection algorithm has been adapted to MU-STLC transmission systems and used as a benchmark for comparison. A more efficient incremental US selection algorithm has been proposed by combining an incremental strategy with the recursive computation of the block matrix inverse. The proposed inc-US algorithm is capable of achieving near-optimal performance with very low complexity. In addition, the inc-US selection scheme has been modified to reduce its complexity further. This has been achieved by exploiting the fact that the matrix-by-matrix computation involved in MU-STLC transmission has recurring operations at each incremental step, which is due to the orthogonal STLC encoding structure. Moreover, we have analyzed an achievable upper diversity order for the MU-STLC system with optimal US selection, which can offer receive and transmit antenna diversity as well as MU diversity. Simulation results have verified the analytical diversity orders.