Optimal Precoder Selection for Spatially Multiplexed Multiple-Input Multiple-Output Systems With Maximum Likelihood Detection: Exploiting the Concept of Sphere Decoding

In this paper, a computationally efficient implementation technique for optimal precoder selection in spatially multiplexed (SM) multiple-input multiple-output (MIMO) systems with maximum-likelihood (ML) detection at the receiver is proposed. Previously developed suboptimal precoder selection techniques reduce the processing time by relying on lower bounds of the free distances of the precoders. However, these techniques suffer significant degradation in error performance when the number of spatial streams approaches the number of receiving antennas. Optimal performance can be achieved with the conventional optimal precoder selection technique, but its exhaustive search entails a long processing time. Thus, in this paper, we propose a precoder selection technique that maintains optimal performance without the prohibitive processing time of the conventional optimal precoder selection. The processing time is reduced by (1) exploiting the symmetric structure of quadrature amplitude modulation (QAM) constellations to reduce the search space; (2) adopting the concept of sphere decoding (SD); (3) eliminating the last stage of SD; and (4) performing the SD-like process selectively. Simulations confirm both the optimal error performance and the reduced processing time of the proposed technique.


I. INTRODUCTION
Codebook-based precoding techniques are known to improve the performance of multiple-input multiple-output (MIMO) systems without requiring feedback of full channel information. Thus, they have been standardized in contemporary communication systems, including long-term evolution-advanced (LTE-A) systems [1]. In this paper, we focus on codebook-based precoder selection.
For linear detection techniques, such as zero forcing or minimum mean square error detection, the optimal precoder can be selected in a computationally efficient manner based on the post-equalization signal-to-noise ratio (SNR) or signal-to-interference-plus-noise ratio (SINR) [2], [3]. However, linear detection generally does not achieve sufficient error performance for spatially multiplexed (SM) MIMO systems. In contrast, maximum-likelihood (ML) detection achieves good error performance for SM MIMO systems. However, when ML detection is employed, the corresponding optimal precoder selection requires prohibitive complexity: the Euclidean distance must be calculated for every pair of candidate transmitted symbol vectors and for each precoder matrix in the codebook, which leads to a long processing time [2]. This is the main issue that we address in this paper.
The associate editor coordinating the review of this manuscript and approving it for publication was Nan Wu. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To lessen the infeasible complexity of the optimal precoder selection technique when ML detection is used at the receiver, several suboptimal techniques have been proposed [4]-[8]. In general, the minimum Euclidean distance for a precoder matrix is referred to as the "free distance" of the precoder [6]. In [4], the free distance of the precoder was shown to be lower-bounded by the minimum singular value of the product of the channel matrix and the precoder matrix, and the precoder that maximized the minimum singular value was selected. In [5], another lower bound for the free distance was derived via QR decomposition (QRD), and the precoder that maximized the minimum diagonal entry of the upper triangular matrix was selected.
In [6], it was demonstrated that the QRD-based lower bound is tighter than the singular value decomposition (SVD)-based lower bound. By allowing column-wise permutations of the product of the channel and precoder matrices, the performance of QRD-based precoder selection can be improved further [6]. However, when the number of data streams approaches the number of receiving antennas, the lower bound loses its tightness, which results in degraded error performance, as demonstrated in Section V.
The lattice reduction (LR) technique is known to reduce the condition number of a matrix. Thus, it is conventionally applied to MIMO channel matrices prior to linear detection at the receiver side [7]. In [8], a more accurate assessment of the free distance was attempted without considering all possible candidate symbol vectors. Specifically, LR was applied to the product of the channel and precoder matrices before a QRD-based technique was applied. It should be noted that, in [8], LR is performed at the transmitter side, whereas ML detection is performed at the receiver side. In other words, LR is used only to assess the free distance of a given precoder matrix; it is not applied at the receiver side.
The technique in [9] was motivated by the observation that the free distance of a given precoder matrix is highly likely to be achieved by a vector (to be exact, the difference between two distinct symbol vectors) with only a few nonzero entries. Thus, the free distance was calculated over a limited set of candidate vectors, which successfully reduced the search space and hence the processing time. This approach achieves sufficient error performance when the number of streams is smaller than the number of receiving antennas. In contrast, its performance degrades significantly when the number of streams approaches the number of receiving antennas, as will be discussed in Section V.
Precoding encompasses a wide range of techniques and can be employed for various purposes. In [13], transmit precoding and signal detection techniques were combined in relay systems. In [14], sparse code multiple access and faster-than-Nyquist signaling techniques were jointly used to achieve high spectral efficiency. The optimality of a precoder depends on the assumed receiver type: these two recent works, [13] and [14], assume a decision feedback receiver and an iterative receiver, respectively. Thus, they cannot be directly applied to the problem addressed here, namely, optimal precoder selection for an ML receiver.
In this paper, we establish a novel precoder selection technique for SM MIMO systems with ML detection at the receiver that achieves optimal error performance without the prohibitive complexity and long processing time of the conventional optimal precoder selection technique. Without compromising optimality, the processing time of the conventional optimal precoder selection is reduced using four technical recipes.
First, in the calculation of the free distance of the precoder, only a small subset of the difference vectors between transmitted symbol vectors is searched instead of the set of all possible difference vectors. This does not compromise optimality because we exploit the symmetric structure of QAM constellations.
Second, we exploit the sphere decoding (SD) concept to reduce the calculation time of the free distance of the precoder. SD is well known as an ML detection technique for SM MIMO systems at the receiver side [11]. In this paper, we also exploit the SD concept to calculate the exact free distance for the precoder matrix at the transmitter side. It should be noted that our proposed algorithm functions independently of the way in which ML detection is implemented at the receiver side.
Third, to reduce the search space, we eliminate the last stage of SD, which searches for the first element of a difference vector. In the conventional SD at the receiver side, Euclidean distances are needed for all (or most) candidate signal vectors to calculate log-likelihood ratios. However, in the precoder selection technique introduced in this paper, only the minimum Euclidean distance (i.e., the free distance) needs to be calculated.
Finally, to further reduce the processing time, the SD-like process is performed selectively. Because we only need to choose the precoder matrix with the largest free distance, the free distance of a given precoder matrix does not have to be calculated explicitly when it is clear in advance that it is smaller than that of another precoder matrix. In such cases, we skip the calculation, thereby reducing the processing time.
It should be noted that parts of the proposed work were reported in conference papers [11], [12]. In [11], only the second and fourth technical recipes above were used for SM MIMO systems. In [12], the work in [11] was extended to SM MIMO-OFDM systems composed of multiple subchannels, and the similarities between adjacent subchannels were exploited. Here, we focus on SM MIMO systems without extension to OFDM systems. To demonstrate the efficacy of the first and third technical recipes, we show in the simulation section the reduction in processing time obtained by adding these two recipes. The algorithm in [11] has a longer processing time than the LR-based technique [8], which, to the best of our knowledge, is the state-of-the-art technique. However, the LR-based technique has a longer processing time than the proposed algorithm using all four recipes. We reiterate that the algorithms in [11], [12] achieve the same optimal error performance as the proposed algorithm using all four technical recipes.
The remainder of this paper is organized as follows. Section II describes SM MIMO systems that adopt a codebook shared by the transmitter and the receiver. Section III presents the conventional technique for optimal precoder selection and reviews previous techniques for suboptimal precoder selection, including the SVD-based, QRD-based, limited search space, and LR-based techniques. Section IV proposes a novel SD-based precoder selection technique that achieves optimal error performance. Section V provides a set of simulations to validate the efficacy of the proposed technique. Finally, Section VI concludes the paper.

II. SYSTEM DESCRIPTION
In this section, we describe precoded SM MIMO systems. Let $N_t$ and $N_r$ denote the numbers of transmitting and receiving antennas, respectively. The relationship between the transmitted and received symbol vectors can be expressed as follows:
$$\mathbf{y} = \mathbf{H}\mathbf{F}_l\mathbf{x} + \mathbf{z}, \tag{1}$$
where $\mathbf{x} = [x_1\ x_2\ \cdots\ x_{N_s}]^T$ denotes the transmitted symbol vector with $N_s\ (\le \min(N_t, N_r))$ symbols, and $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{N_r}]^T$, with $y_j$ denoting the received signal at the $j$-th antenna. $\mathbf{H}$ denotes an $N_r \times N_t$ channel matrix whose entry $h_{ji}$ is the unit-power Rayleigh-fading complex gain between the $i$-th transmitting antenna and the $j$-th receiving antenna, whereas $\mathbf{z} = [z_1\ z_2\ \cdots\ z_{N_r}]^T$, with $z_j$ denoting the additive white Gaussian noise with zero mean and variance $\sigma_z^2$ at the $j$-th receiving antenna. Finally, $\mathbf{F}_l$ denotes an $N_t \times N_s$ precoder in a codebook $\mathcal{F} = \{\mathbf{F}_1, \mathbf{F}_2, \cdots, \mathbf{F}_L\}$ that is available at both the transmitter and the receiver sides.
In this paper, we assume that $x_i\ (i = 1, 2, \cdots, N_s)$ are drawn from a square QAM constellation $\mathcal{X}$. $|\mathcal{X}|$ denotes the cardinality of $\mathcal{X}$, and $P$ denotes the average power of $\mathcal{X}$.

III. PREVIOUS PRECODER SELECTION TECHNIQUES
The precoder can be selected either to maximize the capacity or to minimize the error rate. In this paper, we focus on the minimum error rate criterion. In this section, we review the previous techniques for precoder selection.

A. OPTIMAL PRECODER SELECTION TECHNIQUE
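To minimize the error rate, a precoder can be selected as
$$\mathbf{F}_{\mathrm{opt}} = \arg\max_{\mathbf{F}_l \in \mathcal{F}} d_{\min}(\mathbf{H}\mathbf{F}_l), \tag{3}$$
with
$$d_{\min}(\mathbf{H}\mathbf{F}_l) = \min_{\mathbf{x}_p, \mathbf{x}_q \in \mathcal{X}^{N_s},\ \mathbf{x}_p \neq \mathbf{x}_q} \left\|\mathbf{H}\mathbf{F}_l(\mathbf{x}_p - \mathbf{x}_q)\right\|, \tag{4}$$
where $\mathcal{X}^{N_s}$ denotes the set of transmitted symbol vectors [2]. The precoder selection in (3) is optimal from the viewpoint of error performance. However, its processing time is very long due to the exhaustive search over all pairs of $\mathbf{x}_p$ and $\mathbf{x}_q$ for each $\mathbf{F}_l$. Numerous suboptimal precoder selection techniques have been proposed to avoid the prohibitively long processing time of the optimal precoder selection in (3) and (4).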
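The exhaustive search in (3) and (4) can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' implementation: it assumes unnormalized square QAM points (levels ±1, ±3, ...), matrices stored as lists of rows, and the helper names (`square_qam`, `free_distance_bruteforce`, `select_optimal`) are hypothetical.

```python
import itertools
import math


def square_qam(m):
    """Unnormalized square M-QAM points with levels {..., -3, -1, 1, 3, ...}."""
    side = math.isqrt(m)
    levels = [2 * k - side + 1 for k in range(side)]
    return [complex(a, b) for a in levels for b in levels]


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def norm_of_product(a, v):
    """Euclidean norm ||a v|| of a matrix-vector product."""
    return math.sqrt(sum(abs(sum(row[i] * v[i] for i in range(len(v)))) ** 2 for row in a))


def free_distance_bruteforce(hf, constellation):
    """d_min(HF_l) in (4): minimum ||HF_l (x_p - x_q)|| over all distinct symbol-vector pairs."""
    n_s = len(hf[0])
    vectors = list(itertools.product(constellation, repeat=n_s))
    return min(norm_of_product(hf, [p - q for p, q in zip(xp, xq)])
               for xp, xq in itertools.combinations(vectors, 2))


def select_optimal(h, codebook, constellation):
    """(3): index of the precoder with the largest free distance, and all free distances."""
    dists = [free_distance_bruteforce(mat_mul(h, f), constellation) for f in codebook]
    return max(range(len(codebook)), key=dists.__getitem__), dists


# Toy usage: H = I and a codebook {I, 0.5 I} with 4-QAM and N_s = 2.
best, dists = select_optimal([[1, 0], [0, 1]], [[[1, 0], [0, 1]], [[0.5, 0], [0, 0.5]]], square_qam(4))
print(best, dists)  # 0 [2.0, 1.0]
```

Even this toy version makes the cost visible: with 16-QAM and $N_s = 4$ there are $16^4 = 65536$ symbol vectors and about $2.1 \times 10^9$ distinct pairs to evaluate for every precoder in the codebook.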

B. SINGULAR-VALUE-DECOMPOSITION-BASED TECHNIQUE
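Let $\sigma_k(\mathbf{H}\mathbf{F}_l)\ (k = 1, 2, \cdots, N_s)$ be the $k$-th largest singular value of $\mathbf{H}\mathbf{F}_l$, and let
$$\sigma_{\min}(\mathbf{H}\mathbf{F}_l) = \sigma_{N_s}(\mathbf{H}\mathbf{F}_l). \tag{5}$$
In [4], the following was demonstrated:
$$d_{\min}(\mathbf{H}\mathbf{F}_l) \ge \sigma_{\min}(\mathbf{H}\mathbf{F}_l) \min_{\mathbf{x}_p \neq \mathbf{x}_q} \|\mathbf{x}_p - \mathbf{x}_q\|. \tag{6}$$
Based on (6), the precoder can be selected as follows:
$$\mathbf{F}_{\mathrm{SVD}} = \arg\max_{\mathbf{F}_l \in \mathcal{F}} \sigma_{\min}(\mathbf{H}\mathbf{F}_l). \tag{7}$$
Compared with the conventional optimal precoder selection in (3) and (4), this SVD-based technique has a much lower complexity, although its error performance is degraded because it uses a lower bound instead of the exact free distance $d_{\min}(\mathbf{H}\mathbf{F}_l)$. Furthermore, when $N_t = N_r = N_s$, a unitary precoder (as in LTE-A systems) cannot improve the SVD-based selection metric, because the singular values of $\mathbf{H}$ are not altered by multiplication with square unitary matrices; all precoders in such a codebook then yield the same value in (7).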
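To illustrate (7) and the remark on unitary precoders, the following sketch computes $\sigma_{\min}$ in closed form for the special case of two columns ($N_s = 2$), from the eigenvalues of the $2 \times 2$ Gram matrix $\mathbf{A}^H\mathbf{A}$. The closed form is our simplification to stay within the standard library; a general implementation would call a numerical SVD routine. The function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sigma_min_2x2(a):
    """Smallest singular value of a matrix with two columns, via eigenvalues of A^H A."""
    g = [[sum(a[r][i].conjugate() * a[r][j] for r in range(len(a))) for j in range(2)]
         for i in range(2)]
    tr = (g[0][0] + g[1][1]).real
    det = (g[0][0] * g[1][1] - g[0][1] * g[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    # The smaller eigenvalue of the Hermitian Gram matrix equals sigma_min^2.
    return math.sqrt(max((tr - disc) / 2.0, 0.0))


def select_svd(h, codebook):
    """(7): index of the precoder maximizing sigma_min(H F_l), and all metric values."""
    metrics = [sigma_min_2x2(mat_mul(h, f)) for f in codebook]
    return max(range(len(codebook)), key=metrics.__getitem__), metrics


# A square unitary precoder leaves sigma_min unchanged (N_t = N_r = N_s = 2).
H = [[1, 0.5], [0.2j, 0.8]]
s = 1 / math.sqrt(2)
_, metrics = select_svd(H, [[[s, s], [s, -s]], [[1, 0], [0, 1j]]])
print([round(m, 9) for m in metrics], round(sigma_min_2x2(H), 9))
```

For both unitary precoders the printed metrics coincide with $\sigma_{\min}(\mathbf{H})$ up to round-off, which is why (7) cannot discriminate among unitary precoders when $N_t = N_r = N_s$.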

C. QR-DECOMPOSITION-BASED TECHNIQUE
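Let $\mathbf{H}\mathbf{F}_l = \mathbf{Q}_l\mathbf{R}_l$ be the QRD of $\mathbf{H}\mathbf{F}_l$. In [5] and [6], the following was demonstrated:
$$d_{\min}(\mathbf{H}\mathbf{F}_l) \ge \min_{k} |r_{k,k}(\mathbf{H}\mathbf{F}_l)| \min_{\mathbf{x}_p \neq \mathbf{x}_q} \|\mathbf{x}_p - \mathbf{x}_q\|, \tag{8}$$
where $r_{k,k}(\mathbf{H}\mathbf{F}_l)$ denotes the $k$-th diagonal entry of $\mathbf{R}_l$. Based on (8), the precoder can be selected as follows [5]:
$$\mathbf{F}_{\mathrm{QRD}} = \arg\max_{\mathbf{F}_l \in \mathcal{F}} \min_{k} |r_{k,k}(\mathbf{H}\mathbf{F}_l)|. \tag{9}$$
Owing to the tighter bound in (8), the QRD-based technique achieves better performance than the SVD-based technique [8]. Moreover, as will be discussed in Section V, the QRD-based technique requires less processing time. To further enhance the error performance of (9), the columns of $\mathbf{H}\mathbf{F}_l$ can be permuted at the expense of additional processing time [6].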
Although the QRD-based technique is capable of achieving a shorter processing time, it still suffers from degraded error performance due to the use of lower bounds instead of the exact free distance value d min (HF l ).
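The QRD-based criterion (9) can be sketched with a modified Gram-Schmidt QRD. This minimal sketch assumes that $\mathbf{H}\mathbf{F}_l$ has full column rank, omits the column-permutation refinement of [6], and uses hypothetical function names.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def qr_mgs(a):
    """Thin QR of a full-column-rank complex matrix (list of rows) by modified Gram-Schmidt.

    Returns (q_cols, r): the orthonormal columns of Q and the upper-triangular R whose
    diagonal is real and positive, so that a = Q R.
    """
    n_r, n_s = len(a), len(a[0])
    q_cols = []
    r = [[0j] * n_s for _ in range(n_s)]
    for k in range(n_s):
        v = [complex(a[i][k]) for i in range(n_r)]
        for i, q in enumerate(q_cols):
            # Project the partially orthogonalized column onto q_i and remove that component.
            r[i][k] = sum(q[t].conjugate() * v[t] for t in range(n_r))
            v = [v[t] - r[i][k] * q[t] for t in range(n_r)]
        r[k][k] = complex(math.sqrt(sum(abs(c) ** 2 for c in v)))
        q_cols.append([c / r[k][k] for c in v])
    return q_cols, r


def qrd_metric(hf):
    """min_k |r_kk(HF_l)|, the quantity maximized in (9)."""
    _, r = qr_mgs(hf)
    return min(abs(r[k][k]) for k in range(len(r)))


def select_qrd(h, codebook):
    """(9): index of the precoder maximizing min_k |r_kk(H F_l)|, and all metric values."""
    metrics = [qrd_metric(mat_mul(h, f)) for f in codebook]
    return max(range(len(codebook)), key=metrics.__getitem__), metrics
```

The metric needs only the diagonal of $\mathbf{R}_l$, so its cost per precoder is that of one QRD, independent of the constellation size.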

D. LIMITED SEARCH SPACE TECHNIQUE
In [9], the search space of (4) was reduced by considering only a limited set of difference vectors $\mathbf{d} = \mathbf{x}_p - \mathbf{x}_q$. This limitation was motivated by the observation that the minimum value of $\|\mathbf{H}\mathbf{F}_l\mathbf{d}\|$ (i.e., the free distance of the precoder $\mathbf{F}_l$) is highly likely to be achieved by a vector $\mathbf{d}_l^{\mathrm{opt}}$ that has only a few nonzero entries. Thus, only the vectors $\mathbf{d}$ with one or two nonzero entries were considered in calculating the free distance $d_{\min}(\mathbf{H}\mathbf{F}_l)$.
The limited search space approach offers a good tradeoff between error performance and processing time when the number of spatial data streams is smaller than that of the receiving antennas. Unfortunately, when the number of data streams approaches the number of the receiving antennas, its performance is significantly degraded.
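A sketch of the limited candidate set of [9], i.e., difference vectors with at most two nonzero entries, is shown below. It assumes unnormalized square QAM, does not model the norm cap used in the simulations of Section V, and the function names are hypothetical. Because only a subset of the difference vectors is searched, the returned value can only overestimate $d_{\min}(\mathbf{H}\mathbf{F}_l)$.

```python
import itertools
import math


def square_qam(m):
    """Unnormalized square M-QAM points with levels {..., -3, -1, 1, 3, ...}."""
    side = math.isqrt(m)
    levels = [2 * k - side + 1 for k in range(side)]
    return [complex(a, b) for a in levels for b in levels]


def difference_set(constellation):
    """All pairwise differences a - b of constellation points (entries of x_p - x_q)."""
    return sorted({a - b for a in constellation for b in constellation},
                  key=lambda z: (z.real, z.imag))


def limited_difference_vectors(d_set, n_s, max_nonzero=2):
    """Difference vectors of length n_s with between 1 and max_nonzero nonzero entries."""
    nonzero = [d for d in d_set if d != 0]
    vectors = []
    for k in range(1, max_nonzero + 1):
        for support in itertools.combinations(range(n_s), k):
            for values in itertools.product(nonzero, repeat=k):
                vec = [0j] * n_s
                for pos, val in zip(support, values):
                    vec[pos] = val
                vectors.append(vec)
    return vectors


def limited_free_distance(hf, vectors):
    """Approximate d_min(HF_l): minimum ||HF_l d|| over the limited candidate set only."""
    return min(math.sqrt(sum(abs(sum(row[i] * d[i] for i in range(len(d)))) ** 2 for row in hf))
               for d in vectors)
```

For 16-QAM ($|\mathcal{D}| = 49$ distinct differences) and $N_s = 4$, this leaves $4 \cdot 48 + 6 \cdot 48^2 = 14016$ candidates instead of $49^4 - 1 = 5764800$; the price is that the true minimizer may need three or four nonzero entries, which becomes likely as $N_s$ approaches $N_r$.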

E. LATTICE-REDUCTION-BASED TECHNIQUE
LR is conventionally applied to MIMO channel matrices prior to linear detection at the receiver side [7] owing to its capability to reduce the condition number of a matrix. In [8], the LR technique was employed in conjunction with the QRD-based technique presented in (9) in an attempt to more accurately assess the free distance without prohibitive complexity.
In LR, a unimodular matrix $\mathbf{P}_l$ is found such that $\mathbf{H}\mathbf{F}_l\mathbf{P}_l$ has a smaller condition number. When LR is used in conjunction with the QRD-based technique presented in (9), the matrix $\mathbf{P}_l$ satisfies the following inequality: The above inequality guarantees that the minimum value of $r_{k,k}(\mathbf{H}\mathbf{F}_l\mathbf{P}_l)$ enables a more accurate assessment of the free distance $d_{\min}(\mathbf{H}\mathbf{F}_l)$.
Among the various suboptimal precoder selection techniques discussed in this section, the LR-based technique offers the best error performance. For this technique, the achieved error performance is close to optimal, even when the number of spatial streams is the same as that of the receiving antennas.

IV. THE PROPOSED SD-BASED PRECODER SELECTION TECHNIQUE
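In this section, we propose a novel precoder selection technique that directly uses (4) rather than its lower bounds (as in the previously outlined methods). This technique aims to maintain the optimality of the selection, which is especially important when $N_t = N_r = N_s$. Moreover, we reduce the prohibitive complexity of (3) and (4) using four technical recipes, which are outlined in the following subsections. To this end, note that the set of difference vectors $\mathbf{x}_p - \mathbf{x}_q$ in (4) is $\mathcal{D}^{N_s}$, so that $d_{\min}(\mathbf{H}\mathbf{F}_l)$ in (3) and (4) can be rewritten as
$$d_{\min}(\mathbf{H}\mathbf{F}_l) = \min_{\mathbf{x} \in \mathcal{D}^{N_s} \setminus \{\mathbf{0}\}} \|\mathbf{H}\mathbf{F}_l\mathbf{x}\|,$$
where $\mathcal{D}$ is defined as follows:
$$\mathcal{D} = \{a - b : a, b \in \mathcal{X}\}.$$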

A. REDUCING THE SEARCH SPACE
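The set $\mathcal{D}$ derived from the assumed square QAM constellations has the following symmetric structure:
$$e^{j\theta}\mathcal{D} = \mathcal{D}, \quad \theta \in \left\{\tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2}\right\}.$$
Figure 1 presents an example of the symmetric structure of $\mathcal{D}$, where 16-QAM and $\theta = \pi/2$ are assumed: the set $\mathcal{D}$ does not change when multiplied by $e^{j\pi/2}$ (i.e., when rotated 90° counterclockwise).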
In Fig. 2, the cross symbols, except for the cross at the origin, denote the elements of $\mathcal{D}^+$, a subset of $\mathcal{D} \setminus \{0\}$ defined such that $\mathcal{D}^+$, $e^{j\pi/2}\mathcal{D}^+$, $e^{j\pi}\mathcal{D}^+$, and $e^{j3\pi/2}\mathcal{D}^+$ are disjoint and their union is $\mathcal{D} \setminus \{0\}$. In Fig. 2, the round, triangular, and square symbols denote the set $\mathcal{D}^+$ multiplied by $e^{j\pi/2}$, $e^{j\pi}$, and $e^{j3\pi/2}$, respectively. Since $\|\mathbf{H}\mathbf{F}_l\mathbf{x}\| = \|\mathbf{H}\mathbf{F}_l e^{j\theta}\mathbf{x}\|$, if $\mathbf{x}^{\mathrm{opt}}$ is a minimizer, then $e^{j\theta}\mathbf{x}^{\mathrm{opt}}\ (\theta = \pi/2, \pi, 3\pi/2)$ are also minimizers. Thus, these redundant minimizers can be excluded from the search space without compromising optimality.
Let $\mathbf{x}^{\mathrm{opt}} = [x_1^{\mathrm{opt}}\ x_2^{\mathrm{opt}}\ \cdots\ x_{N_s}^{\mathrm{opt}}]^T$ be a minimizer, and assume without loss of generality that $x_{N_s}^{\mathrm{opt}}$ is one of the square symbols. Then $e^{j\pi/2}\mathbf{x}^{\mathrm{opt}}$ is also a minimizer, and $e^{j\pi/2}x_{N_s}^{\mathrm{opt}}$ is a cross symbol. Thus, we can confine $x_{N_s}^{\mathrm{opt}}$, which corresponds to the first stage of SD, to the cross symbols without sacrificing optimality. It should be noted that all the other rotated entries $e^{j\pi/2}x_i^{\mathrm{opt}}\ (i = 1, 2, \cdots, N_s - 1)$ still belong to $\mathcal{D}$, and that no rotation is needed if $x_{N_s}^{\mathrm{opt}} = 0$; hence, $x_{N_s}^{\mathrm{opt}}$ can be confined to $\{0\} \cup \mathcal{D}^+$. Now, the optimal precoder selection of (3) and (4) can be equivalently expressed as follows:
$$\mathbf{F}_{\mathrm{opt}} = \arg\max_{\mathbf{F}_l \in \mathcal{F}} \min_{\mathbf{x} \in (\mathcal{D}^{N_s-1} \times \mathcal{D}_{\mathrm{Reduced}}) \setminus \{\mathbf{0}\}} \|\mathbf{H}\mathbf{F}_l\mathbf{x}\|, \tag{14}$$
with
$$\mathcal{D}_{\mathrm{Reduced}} = \{0\} \cup \mathcal{D}^+.$$
Since $|\mathcal{D}_{\mathrm{Reduced}}| = 1 + (|\mathcal{D}| - 1)/4$, the search space in (14) is approximately one-quarter the size of the search space $\mathcal{D}^{N_s}$ in (3) and (4). As long as the constellation is symmetric with respect to the real and imaginary axes of the signal space, this symmetric structure can be utilized to reduce the processing time. For example, the symmetric structure of M-PSK constellations can also be exploited. We focus on QAM constellations because of their widespread adoption in contemporary communication systems.
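The following sketch checks the search-space reduction numerically for 16-QAM. It assumes unnormalized 16-QAM points and takes $\mathcal{D}^+$ to be the half-open quadrant $\{d : \mathrm{Re}(d) > 0,\ \mathrm{Im}(d) \ge 0\}$; this particular quadrant is our assumption, and any subset whose four rotations tile $\mathcal{D} \setminus \{0\}$ would serve equally well. The names are hypothetical.

```python
import itertools
import math


def square_qam(m):
    """Unnormalized square M-QAM points with levels {..., -3, -1, 1, 3, ...}."""
    side = math.isqrt(m)
    levels = [2 * k - side + 1 for k in range(side)]
    return [complex(a, b) for a in levels for b in levels]


def rotate(points, quarter_turns):
    """Multiply every point by exp(j*pi/2)**quarter_turns (exact for integer-valued points)."""
    factor = [1, 1j, -1, -1j][quarter_turns % 4]
    return {factor * p for p in points}


def min_over(r, vectors):
    """Minimum ||R x|| over the nonzero vectors x in `vectors`."""
    return min(math.sqrt(sum(abs(sum(row[i] * x[i] for i in range(len(x)))) ** 2 for row in r))
               for x in vectors if any(v != 0 for v in x))


qam16 = square_qam(16)
D = {a - b for a in qam16 for b in qam16}               # difference set of 16-QAM
D_plus = {d for d in D if d.real > 0 and d.imag >= 0}  # assumed choice of D^+
D_reduced = {0j} | D_plus

# The four rotated copies of D^+ are disjoint and together cover D \ {0}.
copies = [rotate(D_plus, t) for t in range(4)]
assert sum(len(c) for c in copies) == len(D) - 1
assert set().union(*copies) == D - {0j}
print(len(D), len(D_plus), len(D_reduced))  # 49 12 13

# Same minimum when the last entry is restricted to D_reduced (N_s = 2, arbitrary triangular R).
R = [[1.0, 0.6 + 0.3j], [0.0, 0.4]]
full = min_over(R, itertools.product(D, repeat=2))
reduced = min_over(R, itertools.product(D, D_reduced))
print(abs(full - reduced) < 1e-9)  # True
```

Here $|\mathcal{D}^+| = 12 = (49 - 1)/4$, so restricting the last entry keeps 13 of 49 values; for $N_s = 2$ the search shrinks from 2401 to 637 vectors while the minimum is unchanged.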

B. ADOPTION OF THE SPHERE DECODING CONCEPT
SD is a well-known ML detection technique that reduces the prohibitively long processing time of the exhaustive search in brute-force ML detection by considering only the candidate vectors within a sphere [10], [16]. In this paper, the complex SD presented in [15] is applied to the precoder selection in (14) to further reduce the processing time.
Consider the minimization part of (14) for a given $\mathbf{F}_l$:
$$R_{\mathrm{SD},l}^{\min} = \min_{\mathbf{x} \in (\mathcal{D}^{N_s-1} \times \mathcal{D}_{\mathrm{Reduced}}) \setminus \{\mathbf{0}\}} \|\mathbf{R}_l\mathbf{x}\|, \tag{16}$$
where we assume the QRD $\mathbf{H}\mathbf{F}_l = \mathbf{Q}_l\mathbf{R}_l$, so that $\|\mathbf{H}\mathbf{F}_l\mathbf{x}\| = \|\mathbf{R}_l\mathbf{x}\|$. The SD concept can be exploited to solve (16) in a computationally efficient manner by searching over the candidate vectors $\mathbf{x}$ inside the following sphere with initial radius $R_{\mathrm{SD},l}^{\mathrm{init}}$:
$$\|\mathbf{R}_l\mathbf{x}\| \le R_{\mathrm{SD},l}^{\mathrm{init}}. \tag{17}$$
The choice of the initial radius $R_{\mathrm{SD},l}^{\mathrm{init}}$ strongly affects the processing time of SD. We propose to consider $\mathbf{x} \in \{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_{N_s}\}$ when setting $R_{\mathrm{SD},l}^{\mathrm{init}}$, where $\mathbf{e}_i$ denotes the length-$N_s$ unit vector with 1 at the $i$-th position and 0s elsewhere. The initial radius is set as follows:
$$R_{\mathrm{SD},l}^{\mathrm{init}} = \min_{i} \|\mathbf{r}_i^l\|, \tag{18}$$
where $\mathbf{r}_i^l$ denotes the $i$-th column of $\mathbf{R}_l$ in (17), so that $\|\mathbf{R}_l\mathbf{e}_i\| = \|\mathbf{r}_i^l\|$. In (18), we aim to set the radius to a small value to reduce the processing time while guaranteeing a non-empty sphere: setting the initial radius as in (18) ensures that at least one vector exists inside the sphere (17). For example, if $R_{\mathrm{SD},l}^{\mathrm{init}} = \min_i \|\mathbf{r}_i^l\| = \|\mathbf{r}_2^l\|$, then at least $\mathbf{x} = \mathbf{e}_2$ is inside the sphere. With the initial radius in (18), the sphere radius only ever needs to be decreased, unlike in the conventional SD, where the radius is increased to include at least one candidate vector when the sphere is empty. This is because the center of the sphere (17) is always $\mathbf{0}$, whereas the center of the conventional sphere used for signal detection at the receiver side depends on the random received signal vector. Thus, in the conventional SD, it is not straightforward to set a small initial radius while ensuring a non-empty sphere.
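A minimal depth-first sketch of (16)-(18) follows. It is a plain enumeration with radius shrinking, not the complex SD of [15]; it takes the upper-triangular factor $\mathbf{R}_l$ directly (the QRD is not shown) and assumes the difference set has been normalized to a square integer grid, so that the unit vectors $\mathbf{e}_i$ used in (18) belong to the search space (for 16-QAM with levels ±1, ±3, dividing the differences by 2 gives the grid $\{-3, \dots, 3\}^2$). All names are hypothetical.

```python
import itertools
import math


def integer_grid(half_width):
    """Assumed normalized difference set {a + jb : -w <= a, b <= w} with w = half_width."""
    rng = range(-half_width, half_width + 1)
    return [complex(a, b) for a in rng for b in rng]


def sd_free_distance(r, d_set, d_reduced):
    """Sketch of (16)-(18): min ||R x|| over nonzero x in D^(Ns-1) x D_reduced.

    r is the upper-triangular factor R_l (list of rows). The depth-first search assigns
    x_Ns first (the first SD stage) and x_1 last; the radius starts at (18) and only shrinks.
    """
    n = len(r)
    # (18): ||R e_i||^2 is the squared norm of column i of R (only rows 0..i are nonzero).
    best_sq = min(sum(abs(r[row][i]) ** 2 for row in range(i + 1)) for i in range(n))
    x = [0j] * n

    def search(k, partial_sq):
        nonlocal best_sq
        for c in (d_reduced if k == n - 1 else d_set):
            x[k] = c
            # Row k of R x involves only x_k, ..., x_Ns because R is upper triangular.
            total = partial_sq + abs(sum(r[k][j] * x[j] for j in range(k, n))) ** 2
            if total > best_sq:
                continue  # outside the current sphere (17): prune this branch
            if k > 0:
                search(k - 1, total)
            elif any(v != 0 for v in x):
                best_sq = total  # shrink the sphere to the newly found vector
        x[k] = 0j

    search(n - 1, 0.0)
    return math.sqrt(best_sq)


def brute_force_free_distance(r, d_set):
    """Reference: exhaustive min ||R x|| over all nonzero x in D^Ns (for checking only)."""
    n = len(r)
    return min(math.sqrt(sum(abs(sum(r[i][j] * x[j] for j in range(n))) ** 2 for i in range(n)))
               for x in itertools.product(d_set, repeat=n) if any(x))


D = integer_grid(3)  # 16-QAM differences divided by 2
D_reduced = [0j] + [d for d in D if d.real > 0 and d.imag >= 0]
R = [[1.0, 0.6 + 0.3j], [0.0, 0.4]]
print(sd_free_distance(R, D, D_reduced), brute_force_free_distance(R, D))
```

Because the sphere is centered at $\mathbf{0}$ and the initial radius is attained by some $\mathbf{e}_i$, the search never needs to enlarge the radius; if no shorter vector is found, the initial radius itself is the answer.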
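Once all of $R_{\mathrm{SD},l}^{\min}\ (l = 1, 2, \cdots, L)$ have been calculated, the precoder selection in (14) is performed as follows:
$$\mathbf{F}_{\mathrm{opt}} = \mathbf{F}_{\hat{l}}, \quad \hat{l} = \arg\max_{1 \le l \le L} R_{\mathrm{SD},l}^{\min}. \tag{19}$$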

C. ELIMINATION OF THE LAST SPHERE DECODING STAGE
ML signal detection for SM MIMO systems with two spatial streams is addressed in [17], where it is demonstrated that the optimal value of one stream can be determined as a function of the other stream during the Euclidean distance calculation. In the SD employed for ML signal detection at the receiver side, the Euclidean distances of all signal vectors are necessary to calculate soft decisions. However, in choosing the optimal precoder matrix, only the Euclidean distance of the optimal difference vector needs to be calculated. In the case of $N_t = N_r = N_s$ (to which we give particular attention in this paper), given the candidate entries $[x_2\ x_3\ \cdots\ x_{N_s}]$ at the $(N_s - 1)$-th stage, $x_1$ can be determined as follows:
$$x_1 = \left\lfloor -\frac{1}{r^l_{1,1}} \sum_{k=2}^{N_s} r^l_{1,k} x_k \right\rceil,$$
where $r^l_{1,k}$ denotes the $(1,k)$-th entry of $\mathbf{R}_l$ and $\lfloor\cdot\rceil$ denotes rounding to the nearest integer for both the real and imaginary parts. Thus, we can eliminate the last stage of SD for each $\mathbf{F}_l$, thereby speeding up the precoder selection presented in (19).
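The rounding rule for $x_1$ can be sketched as follows. The sketch assumes that $x_1$ ranges over the unbounded integer grid $\mathbb{Z} + j\mathbb{Z}$ (a bounded difference set would additionally require clipping the rounded value to its range), and it adds a hypothetical guard for the case in which all other entries are zero, because the all-zero difference vector must be excluded. The function name is hypothetical.

```python
def first_entry_by_rounding(r_row1, x_rest):
    """Choose x_1 given x_2, ..., x_Ns without enumerating a final SD stage.

    r_row1 = [r_11, r_12, ..., r_1Ns] is the first row of R_l; x_rest = [x_2, ..., x_Ns].
    Assumes x_1 may be any point of the integer grid Z + jZ.
    """
    offset = sum(rk * xk for rk, xk in zip(r_row1[1:], x_rest))  # c = sum_k r_1k x_k
    target = -offset / r_row1[0]
    # Nearest grid point: round the real and imaginary parts separately.
    x1 = complex(round(target.real), round(target.imag))
    if x1 == 0 and all(v == 0 for v in x_rest):
        x1 = 1 + 0j  # hypothetical guard: the all-zero difference vector is not allowed
    return x1


row = [0.9 + 0.1j, 0.6 + 0.3j, -0.2j]
print(first_entry_by_rounding(row, [1, 1 + 1j]))  # (-1+0j)
```

The rule works because, for fixed $x_2, \dots, x_{N_s}$, only the first row of $\mathbf{R}_l\mathbf{x}$ depends on $x_1$, and with $c = \sum_{k \ge 2} r^l_{1,k}x_k$ the term $|r^l_{1,1}x_1 + c|^2 = |r^l_{1,1}|^2 |x_1 + c/r^l_{1,1}|^2$ is minimized by the grid point closest to $-c/r^l_{1,1}$.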

D. SELECTIVE SPHERE DECODING
The precoder selection presented in (19) requires $R_{\mathrm{SD},l}^{\min}\ (l = 1, 2, \cdots, L)$ to be calculated. Assuming that $R_{\mathrm{SD},l}^{\min}$ is calculated sequentially from $l = 1$ to $l = L$, consider the calculation of $R_{\mathrm{SD},l}^{\min}$ after $R_{\mathrm{SD},k}^{\min}\ (k = 1, 2, \cdots, l-1)$ have been calculated. Because we are searching for the largest sphere radius in (19), we need to compare $\max_{1 \le k \le l-1} R_{\mathrm{SD},k}^{\min}$ with $R_{\mathrm{SD},l}^{\min}$. If it can be determined that $R_{\mathrm{SD},l}^{\min}$ is smaller than $\max_{1 \le k \le l-1} R_{\mathrm{SD},k}^{\min}$ without explicitly calculating $R_{\mathrm{SD},l}^{\min}$, the unnecessary SD can be skipped without compromising optimality, thereby speeding up the algorithm. Figure 3 illustrates the concept of our selective SD. First, consider the case in which $R_{\mathrm{SD},l}^{\min} < \max_{1 \le k \le l-1} R_{\mathrm{SD},k}^{\min}$, as in Fig. 3(a). As soon as any candidate vector is found inside the dashed sphere of radius $\max_{1 \le k \le l-1} R_{\mathrm{SD},k}^{\min}$, such as $\mathbf{R}_l\mathbf{x}_3$ and $\mathbf{R}_l\mathbf{x}_4$ in Fig. 3(a), there is no need to calculate $R_{\mathrm{SD},l}^{\min}$ further, because it would be discarded in the calculation of $\max_{1 \le k \le l} R_{\mathrm{SD},k}^{\min}$. In contrast, when $R_{\mathrm{SD},l}^{\min} > \max_{1 \le k \le l-1} R_{\mathrm{SD},k}^{\min}$, as in Fig. 3(b), no candidate vector is found inside the dashed sphere. Then, $R_{\mathrm{SD},l}^{\min}$ needs to be calculated based on the SD concept, and it is guaranteed to equal $\max_{1 \le k \le l} R_{\mathrm{SD},k}^{\min}$. This selective process makes the precoder selection in (19) more computationally efficient.
It should be noted that a unique vector exists inside each of the minimum spheres, namely, $\mathbf{R}_l\mathbf{x}_3$ in Fig. 3(a) and $\mathbf{R}_l\mathbf{x}_2$ in Fig. 3(b). These vectors are unique because the rotated vectors $\mathbf{R}_l e^{j\pi/2}\mathbf{x}_3$, $\mathbf{R}_l e^{j\pi}\mathbf{x}_3$, and $\mathbf{R}_l e^{j3\pi/2}\mathbf{x}_3$ in Fig. 3(a), and $\mathbf{R}_l e^{j\pi/2}\mathbf{x}_2$, $\mathbf{R}_l e^{j\pi}\mathbf{x}_2$, and $\mathbf{R}_l e^{j3\pi/2}\mathbf{x}_2$ in Fig. 3(b), are excluded by searching over $\mathcal{D}^{N_s-1} \times \mathcal{D}_{\mathrm{Reduced}}$ instead of over $\mathcal{D}^{N_s}$. Algorithm 1 summarizes the proposed precoder selection technique. The minimization presented in (16), which exploits the SD concept using a sphere with initial radius $R_{\mathrm{SD},l}^{\mathrm{init}}$, is denoted as $\mathrm{SD}(\mathbf{H}\mathbf{F}_l, R_{\mathrm{SD},l}^{\mathrm{init}})$. The minimization $\mathrm{SD}(\mathbf{H}\mathbf{F}_l, R_{\mathrm{SD},l}^{\mathrm{init}})$ in Algorithm 1 does not contain the last stage of the conventional SD at the receiver side.
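The selective SD of Algorithm 1 can be sketched as follows. This is a hypothetical sketch, not Algorithm 1 itself: it operates on precomputed upper-triangular factors $\mathbf{R}_l$, assumes the difference set is normalized to a square integer grid (so that each $\mathbf{e}_i$ lies in the search space), does not eliminate the last SD stage, and the names `sphere_min` and `selective_sd_select` are ours.

```python
import itertools
import math


def sphere_min(r, d_set, d_reduced, radius_sq, stop_at_first=False):
    """Depth-first search over nonzero x in D^(Ns-1) x D_reduced with ||R x||^2 <= radius_sq.

    Returns (found, best_sq); best_sq starts at radius_sq and shrinks whenever a vector is
    found. With stop_at_first=True the search returns as soon as one such vector exists.
    """
    n = len(r)
    x = [0j] * n
    state = {"found": False, "best": radius_sq}

    def search(k, partial):
        for c in (d_reduced if k == n - 1 else d_set):
            x[k] = c
            total = partial + abs(sum(r[k][j] * x[j] for j in range(k, n))) ** 2
            if total > state["best"]:
                continue  # outside the sphere: prune this branch
            if k > 0:
                if search(k - 1, total):
                    return True
            elif any(v != 0 for v in x):
                state["found"], state["best"] = True, total
                if stop_at_first:
                    return True  # Fig. 3(a): a vector lies inside the dashed sphere
        x[k] = 0j
        return False

    search(n - 1, 0.0)
    return state["found"], state["best"]


def selective_sd_select(r_list, d_set, d_reduced):
    """Returns (index of the selected precoder, largest R^min) over the given R_l factors."""
    best_sq, best_idx = -1.0, None
    for l, r in enumerate(r_list):
        if best_idx is not None:
            inside, _ = sphere_min(r, d_set, d_reduced, best_sq, stop_at_first=True)
            if inside:
                continue  # R^min_l cannot exceed the current maximum: skip the full SD
        # (18): the squared initial radius is the smallest squared column norm of R_l.
        init_sq = min(sum(abs(r[i][k]) ** 2 for i in range(k + 1)) for k in range(len(r)))
        _, rmin_sq = sphere_min(r, d_set, d_reduced, init_sq)
        if rmin_sq > best_sq:
            best_sq, best_idx = rmin_sq, l
    return best_idx, math.sqrt(best_sq)


def brute_force_min(r, d_set):
    """Reference: exhaustive min ||R x|| over all nonzero x in D^Ns (for checking only)."""
    n = len(r)
    return min(math.sqrt(sum(abs(sum(r[i][j] * x[j] for j in range(n))) ** 2 for i in range(n)))
               for x in itertools.product(d_set, repeat=n) if any(x))


D = [complex(a, b) for a in range(-3, 4) for b in range(-3, 4)]
D_red = [0j] + [d for d in D if d.real > 0 and d.imag >= 0]
Rs = [[[1.0, 0.6 + 0.3j], [0.0, 0.4]], [[1.0, 0.0], [0.0, 0.9]], [[0.5, 0.2], [0.0, 1.0]]]
print(selective_sd_select(Rs, D, D_red))
```

The skip test accepts vectors on the sphere boundary, so a precoder whose $R_{\mathrm{SD},l}^{\min}$ ties the current maximum is also skipped; this does not affect optimality because either precoder attains the largest free distance.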

V. SIMULATION RESULTS
In this section, we present simulation results for the precoder selection techniques outlined in this paper to demonstrate the efficacy of the proposed algorithm, comparing the techniques in terms of error performance and processing time. We assume SM MIMO systems with ML detection at the receiver side and adopt the unitary codebook used in LTE-A systems [1]. Table 1 lists the precoder selection techniques compared in the simulation.

A. SIMULATION ENVIRONMENT
The elements of a $4 \times 4$ channel matrix $\mathbf{H}$ were assumed to be independent and identically distributed complex Gaussian random variables with zero mean and unit variance. A 16-QAM constellation was used. For the limited search space technique, the maximum number of nonzero entries of the difference vectors was limited to 2 and their norms to 9. For LR, the complex Lenstra-Lenstra-Lovász (CLLL) algorithm was used with its parameter $\delta$ set to 0.99 [8], and the number of channel columns to be permuted was 3 when $N_s = 4$.
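To reproduce this setup, the channel and noise can be generated as in the following sketch, which assumes i.i.d. zero-mean unit-variance complex Gaussian channel entries (the unit-power Rayleigh model of Section II) and a noise variance obtained from the per-antenna SNR $1/\sigma_z^2$ of Section V-B expressed in dB. The seed and function names are hypothetical.

```python
import math
import random


def rayleigh_channel(n_r, n_t, rng):
    """i.i.d. CN(0, 1) entries: real and imaginary parts are each N(0, 1/2)."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n_t)]
            for _ in range(n_r)]


def noise_variance(snr_db):
    """sigma_z^2 such that the per-antenna SNR 1/sigma_z^2 equals snr_db (in dB)."""
    return 10.0 ** (-snr_db / 10.0)


def awgn(n_r, sigma2, rng):
    """Complex AWGN vector with zero mean and variance sigma2 per entry."""
    s = math.sqrt(sigma2 / 2.0)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n_r)]


rng = random.Random(2020)
H = rayleigh_channel(4, 4, rng)
z = awgn(4, noise_variance(10.0), rng)
print(len(H), len(H[0]), len(z))  # 4 4 4
```

Splitting the unit variance equally between the real and imaginary parts gives $E[|h_{ji}|^2] = 1$, matching the unit-power assumption.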
It should be noted that even though the LTE-A codebook that corresponds to N t = N s = 4 is composed of 16 matrices F = {F 1 , F 2 , · · · , F 16 } [1], only five precoder matrices {F 1 , F 2 , F 5 , F 6 , F 13 } were considered in choosing the optimal precoder. The reason for this is that each of the other matrices was equivalent to one of these five matrices in terms of ||HF l x|| 2 . For example, ||HF 3 x|| 2 = ||HF 1 D x|| 2 , where D is given as follows: Due to the symmetric structure of x, F 1 is also optimal when F 3 is optimal.
To run the simulations, we used MATLAB on a 3.80 GHz Intel Core i7-9800X CPU with 32 GB of RAM under the Ubuntu OS.

B. COMPARISON OF ERROR PERFORMANCE
In this subsection, we compare the precoder selection techniques in terms of the bit error rate (BER) versus SNR. With unitary precoding, the SNR at each receiving antenna in (1) is given by $1/\sigma_z^2$. Two cases were considered: $N_s = 2$ and $N_s = 4$. Figure 4(a) compares the error performance of the techniques when $N_s = 2$. All of the precoder selection techniques, except for the SVD-based technique, achieved near-optimal performance. Compared with the non-precoded system, the improvement in error performance achieved by all of the techniques was dramatic: at a BER of $10^{-5}$, the SNR gain was approximately 6 dB. Thus, simple precoder selection techniques appear sufficient if the number of spatial streams is much smaller than the number of receiving antennas. As the number of spatial streams $N_s$ approaches the number of receiving antennas, however, the precoder selection technique must be chosen with care. Figure 4(b) compares the error performance of the various techniques when $N_s = N_r = N_t = 4$. The three suboptimal techniques (i.e., the QRD-based, SVD-based, and limited search space techniques) offered only negligible SNR gain, whereas the LR-based technique and the proposed SD-based technique achieved sufficient error performance. Based on Fig. 4(b), a sophisticated precoder selection technique, such as the LR-based technique or the proposed SD-based technique, is needed when the number of data streams is close to the number of receiving antennas. Both processing time and error performance need to be considered in choosing an algorithm; this issue is addressed in Section V-C. Figure 5 illustrates the optimality of the proposed precoder selection technique.
In the simulations, independent random $4 \times 4$ MIMO channels were generated, and a precoder was selected from $\{\mathbf{F}_1, \mathbf{F}_2, \mathbf{F}_5, \mathbf{F}_6, \mathbf{F}_{13}\}$ using the optimal method in Section III-A, the previous LR-based method in Section III-E, and the proposed method. Figure 5(a) demonstrates that the precoder selected by the proposed method (denoted by empty circles) is the same as the optimal precoder selected by the conventional time-consuming optimal precoder selection method (denoted by small black dots). In contrast, Fig. 5(b) demonstrates that the previous LR-based technique quite often chooses non-optimal precoders, which are denoted by empty circles. Although not presented in Fig. 5, the proposed method was observed to occasionally select a non-optimal precoder for specific MIMO channels. However, in such cases, the difference between the minimum distances (4) of the two precoders was on the order of machine epsilon, which indicates that these discrepancies are caused by finite-precision arithmetic. Thus, we argue that Fig. 5 successfully demonstrates the optimality of the proposed method.

C. COMPARISON OF THE PROCESSING TIME
In this subsection, we assess the cost-effectiveness of the proposed technique in terms of its processing time. Figure 6 compares the processing times required by all the simulated precoder selection techniques. Assuming the same system parameters as in the simulations presented in Fig. 4, we measured the average processing time for precoder matrix selection over 1000 MIMO channels. For both cases ($N_s = 2$ and $N_s = 4$), the previous limited search space technique had the longest processing time, whereas the previous SVD-based and QRD-based techniques had short processing times. However, as presented in Section V-B, the three suboptimal techniques did not achieve sufficient error performance when $N_s = N_r = 4$. Figure 6 also shows that the proposed SD-based technique had a shorter processing time than the LR-based technique for both $N_s = 2$ and $N_s = 4$. Considering the superior error performance of the proposed SD-based technique in Fig. 4 and the processing time comparison in Fig. 6, we conclude that the proposed SD-based technique achieves the best tradeoff between error performance and processing time when $N_s = N_r = 4$. Moreover, Fig. 6(b) presents the processing time of the proposed scheme using only two technical recipes, namely, the adoption of SD in Section IV-B and selective SD in Section IV-D; this corresponds to our previous works [11], [12] and is denoted as Proposed (Intermediate) in the figure. With only these two recipes, the proposed scheme requires a longer processing time than the LR-based technique, which is the best conventional technique.
One interesting phenomenon observed in Fig. 6 is that the processing times of the previous SVD-based and QRD-based techniques decreased when $N_s$ was increased from 2 to 4. This is because when $N_s = 2$, the codebook is composed of 16 matrices of size $4 \times 2$, whereas when $N_s = 4$, only five of the $4 \times 4$ matrices need to be considered (see Section V-A). However, the processing times of the limited search space, LR-based, and proposed SD-based techniques increased when $N_s$ was increased from 2 to 4. This is because, for these three techniques, the increase in processing time per precoder matrix outweighed the decrease due to the reduced codebook size.

VI. CONCLUSION
In this paper, we used the SD concept to propose a novel technique for optimal precoder selection. We exploited the differences between the conventional SD used for signal detection at the receiver side and the SD-like process used for precoder selection. The algorithmic changes applied to SD do not compromise its optimality in terms of error performance. Thus, unlike the various existing suboptimal techniques, the proposed SD-based precoder selection technique achieves optimal error performance for ML detection in SM MIMO systems. The advantage in error performance of the proposed technique becomes significant when the number of spatial streams approaches the number of receiving antennas. Using computer simulations, we also demonstrated that the proposed SD-based technique achieves a good tradeoff between error performance and processing time.