When Should We Use Geometrical-Based MIMO Detection Instead of Tree-Based Techniques? A Pareto Analysis

The soft-output multiple-input multiple-output (MIMO) detection problem has been extensively studied, and a large number of heuristics and metaheuristics have been proposed to solve it. Unlike classical tree-search-based detectors, geometrical heuristic algorithms involve two consecutive steps: (i) an exploration step based on the geometry of the channel matrix singular vectors; (ii) a local exploitation step performed to refine the solutions found. In this paper, new enhancements for geometrical heuristics are introduced to significantly reduce the complexity with quadrature phase-shift keying (QPSK) and to enable 16 quadrature amplitude modulation (QAM) through new exploration techniques. The performance-complexity trade-off between the new detector and two tree-based algorithms is investigated through Pareto efficiency. The Pareto framework also allows us to select the most efficient tuning parameters based on an exhaustive search. The proposed detector can be customized on the fly using only one or two parameters to balance the trade-off between computational complexity and bit error rate (BER) performance. Moreover, the Pareto fronts demonstrate that the new geometrical heuristic is especially efficient with QPSK since it provides a significant reduction in computational complexity while preserving good BER performance and ensuring high flexibility.


I. INTRODUCTION
In the last decades, the increase in the quantity of data sent over wireless channels has led to a shortage of available frequency bands. This scarcity has driven researchers and operators to improve the spectrum efficiency of wireless communications systems and, more recently, seek new frequency bands with THz technologies. Specifically, spectrum efficiency improvement increases the data throughput and the link quality without using new frequency bands.
In this context, MIMO systems have been widely adopted for their capability to multiplex transmitted data streams over the time-frequency-space dimensions. The spatial multiplexing MIMO technique is mainly used to increase the data transmission rate or spectral efficiency. The space-division multiplexing (SDM) MIMO technology can transmit several data streams in the same time-frequency slot and separate them according to spatial considerations. This multiplexing technique increases the spectrum efficiency through the addition of antennas. However, the more antennas there are, the more complex the receiver design is. Therefore, these systems require new algorithms to exploit the spatial information to separate data streams efficiently.
WiFi standards like IEEE 802.11n/ac, long-term evolution (LTE), WiMAX, and 5G, among other modern standards, rely on MIMO technologies. All of these standards use known pilot signals to estimate the channel state information (CSI). It is common to assume that the receiver gets a perfect CSI whereas the transmitter is CSI-agnostic. This operating regime is easier to set up as it does not require the CSI to be fed back to each emitting antenna. The separation of received streams has been widely studied, and many algorithms have already been proposed in the literature. This detection problem is known to be NP-hard [1], which implies that an optimal solution cannot be computed in polynomial time (unless under the unattainable assumption of P=NP). Such optimal algorithms include the naive resolution with the maximum likelihood (ML) detector or the optimal tree-path search using sphere decoding (SD) as initiated by [2]-[4]. However, their exponential complexity makes them unsuitable for hardware implementation, especially in low signal-to-noise ratio (SNR) regimes. Still, it should be noted that some SD implementations can compete with polynomial algorithms in terms of complexity when the number of antennas and the constellation size are rather small and when the SNR is high enough [5].
Several approaches have been considered to offer heuristics and metaheuristics that provide good performance in polynomial time. Under ideal circumstances, where the amount of information available is significant, linear detectors provide an acceptable result. In more difficult cases, more advanced algorithms are necessary. The earliest heuristics are based on interference cancellation between the different received data streams. Two versions of this heuristic coexist. The successive interference cancellation (SIC) detection scheme suppresses the interference by iteratively maximizing the signal-to-interference-plus-noise ratio (SINR). This approach is well suited when the received signals present different individual quality metrics for each of the received data streams [6], [7]. In the opposite scenario, when data streams have similar quality, parallel interference cancellation (PIC) detectors are preferred [8], [9].
The detection problem turns out to be a complicated combinatorial optimization problem. Thus, to solve it approximately at a reasonable cost, tree-based heuristic algorithms can be used. These algorithms are classified according to the method of searching the tree: depth-first algorithms look for the best possible leaf through a descend-prune-backtrack process; breadth-first algorithms keep only a fixed number of paths at each step [10]-[13]; and best-first algorithms exploit metrics to determine how to explore the tree [14], [15].
Alternative solutions have been proposed thanks to a shift in the problem perspective. For instance, detectors based on Markov chain Monte Carlo (MCMC) algorithms have been developed by addressing the problem through a probabilistic approach [16], [17]. The emergence of deep networks trained to tackle the detection problem is also studied in recent work [18]. Finally, bio-inspired metaheuristics based on ant colony optimization (ACO) [19] or on the firefly algorithm (FA) [20], [21] have also been proposed.
We previously investigated the potential of geometrical heuristics for solving the MIMO detection problem [22]. We compared the geometrical approach and the tree-based one for a bit-interleaved coded modulation (BICM) scheme with QPSK modulation. The criteria of BER and complexity revealed that the geometrical method was close to, but not yet as good as, the state-of-the-art detectors.
The geometrical approach provided in [22] was restricted to lower-order constellations (i.e., binary phase-shift keying (BPSK) and QPSK), whereas higher-order modulations were not investigated at all. Moreover, the geometrical heuristic was compared with a tree-based reference on both BER performance and complexity, but these two criteria were not considered simultaneously within a trade-off perspective. This paper presents two new contributions that help to overcome the highlighted limitations of [22]:
• Enhancements that improve geometrical heuristics on QPSK and extend their use cases to high-order modulations (e.g., 16-QAM);
• A performance-complexity trade-off study using the Pareto efficiency.
The paper is organized as follows. In Section II, the MIMO system model under consideration is presented. In Section III, a detailed description of the soft-output computation is given, while in Section IV, we provide the geometrical-based MIMO detection framework. The proposed enhanced geometrical-based detection algorithm is presented in Section V. A detailed performance-complexity trade-off analysis of the proposed algorithm through Pareto-front curves is given in Section VI for both QPSK and 16-QAM. Section VII concludes the paper.

1) NOTATIONS
In the following, bold uppercase letters (resp. lowercase letters) denote matrices (resp. vectors), whereas the other letters refer to scalars. All sets are noted with calligraphic uppercase letters, and the corresponding lowercase letter refers to their cardinality. The vector h_j denotes the j-th column of matrix H, and H^T denotes the transpose of H. The natural logarithm is denoted by ln. The real and imaginary parts of a complex number a ∈ C are denoted by Re(a) and Im(a), respectively. We denote the l2-norm of vector y as ∥y∥.

II. TRANSMISSION MODEL
Consider an N × N Rayleigh-fading MIMO system transmitting N data streams to N receiving antennas. We assume a perfect CSI at the receiver (ideal channel estimator), whereas the transmitter is CSI-agnostic. From a model perspective, the receiver knows the channel matrix H_c ∈ C^(N×N), where h_c(i, j) is the complex channel gain from transmit antenna j to receive antenna i. The MIMO channel is modeled as quasi-static block fading, where the channel path gains can be considered constant during a large block comprising hundreds of transmitted vectors [23]. The channel gains change according to a statistical model given by an independent Rayleigh-distributed envelope. Therefore, the computational complexity of any preprocessing phase, such as singular value decomposition (SVD), QR decomposition, or pseudo-inverse calculation, can be neglected. Indeed, they are performed once for several transmitted vectors.
Let Q_c be the set of all constellation symbols from a square QAM. We denote by y_c ∈ C^N the signals received on each antenna after the propagation of the symbols x_c ∈ Q_c^N through the channel and after the addition of the complex Gaussian noise w_c ∼ CN(0, σ²). With these notations, the system model is expressed as

y_c = H_c x_c + w_c. (1)

This model can be rewritten as an equivalent real-valued expression such that

y = [Re(y_c); Im(y_c)], x = [Re(x_c); Im(x_c)], w = [Re(w_c); Im(w_c)], H = [Re(H_c), −Im(H_c); Im(H_c), Re(H_c)],

where the new parameter n = 2N defines the size of the real-valued matrices and vectors. It can be interpreted as the number of real-valued data streams. Indeed, switching from a complex-valued to a real-valued perspective is equivalent to processing real and imaginary parts independently. In this real-valued model, (1) becomes the following equivalent system model:

y = Hx + w. (2)

The BICM transceiver is composed of an encoder, an assumed-perfect interleaver, and a modulator. The receiver uses the corresponding components in reversed order: demodulator, deinterleaver, and then decoder.
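As an illustration (not part of the original paper), the complex-to-real conversion described above can be sketched in a few lines of NumPy; the function name `to_real` is ours:

```python
import numpy as np

def to_real(Hc, yc):
    """Convert the complex model y_c = H_c x_c + w_c into the equivalent
    real-valued model y = H x + w with n = 2N real data streams."""
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag,  Hc.real]])     # 2x2 block structure mimics complex product
    y = np.concatenate([yc.real, yc.imag])
    return H, y
```

The block structure preserves complex multiplication, so the real-valued model is numerically identical to the complex one.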

III. SOFT-OUTPUT COMPUTATIONS
Let b_ij be the i-th bit encoded in the j-th symbol of x. Basic hard-output detectors provide an estimate of the vector of transmitted symbols. To improve the performance, soft-output detectors search for the log-likelihood ratio (LLR) of each bit, defined as

L_ij = ln( P(b_ij = 1 | H, y) / P(b_ij = 0 | H, y) ), (9)

with P(b_ij | H, y) the probability mass function of b_ij given the channel state and the received vector. Expression (9) is not suitable for practical use as it has been shown that its computation is exponentially complex [24]. A common solution is to use the max-log approximation:

L_ij ≈ (1/σ²) ( min_{x ∈ X⁰_ij} ∥y − Hx∥² − min_{x ∈ X¹_ij} ∥y − Hx∥² ), (10)

where X^k_ij = {x ∈ Q^n : b_ij = k} denotes the set of points carrying the bit value k at position (i, j). This new expression (10) is still exponentially complex as the norm must be computed for each point in the constellation. Indeed, we clearly have X⁰_ij ∪ X¹_ij = Q^n, which contains q^n points. Therefore, we introduce a new subset S ⊂ Q^n with a lower cardinality and approximate (10) on it. The new expression becomes

L_ij ≈ (1/σ²) ( min_{x ∈ S ∩ X⁰_ij} ∥y − Hx∥² − min_{x ∈ S ∩ X¹_ij} ∥y − Hx∥² ). (11)

The reduced subset S may have no points from X^k_ij (i.e., S ∩ X^k_ij = ∅). In such a configuration, we assign the complementary value k̄ = 1 − k to the bit b_ij, as the probability P(b_ij = k̄ | H, y) is then considered equal to one. Therefore, |L_ij| is set to its maximum value. In the remainder of this paper, the objective function denotes the squared norm involved in the LLRs, and a point is considered better than another if it has a smaller objective function.
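The reduced-set max-log computation of (11) can be sketched as follows. This is an illustrative implementation under our own naming: `bits_of` is an assumed callback returning the bit labeling of a candidate, and `llr_max` stands in for the saturation value used when one hypothesis has no candidate in S:

```python
import numpy as np

def max_log_llrs(H, y, S, bits_of, n_bits, sigma2, llr_max=50.0):
    """Approximate the LLRs of (11) over a reduced candidate set S.
    S is a list of real-valued symbol vectors; bits_of(x) returns the
    n_bits-long bit labeling of x."""
    metrics = [float(np.sum((y - H @ x) ** 2)) for x in S]
    # Per bit position: minimum metric observed for hypothesis b=0 and b=1.
    best = [[None, None] for _ in range(n_bits)]
    for x, m in zip(S, metrics):
        for i, b in enumerate(bits_of(x)):
            if best[i][b] is None or m < best[i][b]:
                best[i][b] = m
    llrs = []
    for m0, m1 in best:
        if m0 is None:
            llrs.append(llr_max)      # P(b = 1) considered equal to one
        elif m1 is None:
            llrs.append(-llr_max)     # P(b = 0) considered equal to one
        else:
            llrs.append((m0 - m1) / sigma2)
    return llrs
```

The sign convention follows (9): a positive LLR favors the bit value 1.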

IV. GEOMETRICAL-BASED DETECTION
Soft-output geometrical heuristics are based on three main steps: exploration, exploitation, and the LLR computation. This section reviews the first two steps, as the last phase has already been discussed in the previous section.
Exploration and exploitation steps are both designed to search for feasible solutions in Q^n. They can be viewed as coarse and fine search methods. The coarse search (exploration) is performed over the whole solution set Q^n in order to find pertinent solutions, whereas the exploitation step (fine search) refines the quality of the solutions through local searches.

A. SVD-BASED EXPLORATION
The exploration step produces a set P of promising points to be exploited in the next step. Let H = UΣV^T be the SVD of H, with (U, V) two orthogonal matrices and Σ a diagonal matrix containing the singular values. Without loss of generality, we assume that the singular values are sorted in ascending order:

λ_1 ≤ λ_2 ≤ … ≤ λ_n. (13)

Let x̂ ∈ R^n be the real vector minimizing the objective function. This point can be obtained using the Moore-Penrose inverse H⁺ through x̂ = H⁺y. The regular inverse can be used if the channel matrix is well conditioned; if not, the Moore-Penrose inverse is required. We can rewrite the objective function as

∥y − Hx∥² = ∥y − UΣV^T x∥². (14)

Given that U is orthogonal, the previous equation gives

∥y − Hx∥² = ∥U^T y − ΣV^T x∥². (15)

As V is orthogonal, its columns {v_i}_{i=1…n} constitute a basis, and we can introduce α_i, the coordinates of x − x̂ on this basis. The matrix Σ is diagonal, so the objective function can be expressed as

∥y − Hx∥² = Σ_{i=1}^{n} λ_i² α_i², (16)

which provides hints on the objective function evolution. The ascending order of the singular values implies that the first α_i are less impacting than the last ones. Therefore, points with similar coordinates, except for the first ones, can be considered equivalent with respect to their objective function. The exploration step aims at building the promising set P from points as different as possible but with equivalently low objective functions. The promising set P is also used to initialize S.
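The decomposition (16) can be checked numerically. The following NumPy snippet (ours, for illustration only) draws a random channel and verifies that the objective function of an arbitrary point equals the weighted sum of its squared coordinates in the singular-vector basis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
H = rng.standard_normal((n, n))
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(H)        # NumPy returns singular values in descending order
order = np.argsort(s)              # re-sort ascending, as in (13)
s, Vt = s[order], Vt[order]
x_hat = np.linalg.pinv(H) @ y      # unconstrained minimizer x̂ = H⁺ y

x = rng.standard_normal(n)         # any candidate point
alpha = Vt @ (x - x_hat)           # coordinates of x − x̂ in the {v_i} basis
lhs = np.sum((y - H @ x) ** 2)     # objective function
rhs = np.sum((s * alpha) ** 2)     # Σ λ_i² α_i², as in (16)
assert np.isclose(lhs, rhs)
```

The identity holds here because H is square and (almost surely) invertible, so y = H x̂ exactly.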

B. ITERATIVE LOCAL EXPLOITATION
The exploitation step gets the most out of the previously selected points by exploring each promising feasible solution belonging to the subset P. For each promising point in the subset, an iterative local search is performed to find better feasible solutions around the starting point. One iteration can be decomposed into three steps:
1) Generate at most 2n new points equal to the candidate except for one coordinate. This modified coordinate is set once to the previous and once to the next symbol in the real-valued constellation. If the coordinate to be modified is already at an extreme value (i.e., the first or last symbol in the constellation), then a single point is generated.
2) Compute the objective function for each generated feasible solution and add them to S.
3) Select the best feasible solution among the initial candidate and the newly generated points. The best point becomes the new starting candidate for the next iteration. If there is a tie, prefer the initial point.
It is easy to prove that this algorithm reaches a stable point. First, the objective function never increases, and the candidate changes only when it strictly decreases over the finite set Q^n of possibilities. Moreover, oscillations between two equally good points are prevented by the tie rule. Therefore, this process ends on a stable point that is not guaranteed to be the global optimum. The performance-complexity trade-off can be tuned by stopping the algorithm after a predefined number of iterations rather than waiting for a stable point.
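The three-step iteration above can be sketched as follows (an illustrative NumPy implementation; names and signatures are ours):

```python
import numpy as np

def local_exploitation(H, y, start, Q, S, max_iter=None):
    """Iterative local search from one promising point. Q is the sorted
    real-valued constellation (e.g. [-3, -1, 1, 3] for 16-QAM); visited
    neighbours are appended to the shared candidate set S."""
    obj = lambda x: float(np.sum((y - H @ x) ** 2))
    cand = np.array(start, dtype=float)
    it = 0
    while max_iter is None or it < max_iter:
        it += 1
        neighbours = []
        for j in range(len(cand)):
            k = Q.index(cand[j])
            for k2 in (k - 1, k + 1):            # previous / next symbol
                if 0 <= k2 < len(Q):             # skip outside the constellation
                    p = cand.copy()
                    p[j] = Q[k2]
                    neighbours.append(p)
        S.extend(neighbours)                     # step 2: enrich S
        best = min(neighbours, key=obj)
        if obj(best) < obj(cand):                # step 3: move only on strict
            cand = best                          # improvement (ties keep the
        else:                                    # initial point)
            break
    return cand
```

The strict-improvement rule implements the tie-breaking that guarantees termination.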

V. ENHANCEMENTS TO GEOMETRICAL DETECTION
In this section, we present some enhancements to the geometrical-based detection framework that was presented in the previous section. Section V-A describes the previous exploration technique and highlights its limitations; then, Section V-B provides the new exploration techniques to build the set of promising feasible solutions. Section V-C introduces a lossless method to reduce the complexity, and Section V-D summarizes the proposed geometrical detector and provides the algorithmic computational costs of the detection process.

A. PREVIOUS EXPLORATION TECHNIQUE
The SVD-based exploration is the core step of the geometrical heuristic since providing a good set P of promising points is the key to the success of the following steps. In previous works, P was built using an intersection-based process [22], [28]. Fig. 1 illustrates this exploration technique with n = 2 and a 16-QAM modulation scheme. The process starts by computing the line passing through x̂ and directed by the singular vector v_1. The dashed blue line represents this straight line. Then, each intersection of this line with the basis axes is projected on the 16-QAM constellation to build the subset P. This process is illustrated by the purple arrows. Fig. 1 shows the process with one direction for readability's sake, but real detectors apply this method for several directions. As an example, we used 3 directions for a 4 × 4 channel in [22] and 4 directions for a 30 × 30 channel in [28]. This exploration technique is able to find promising points, as the elements of P should have similar objective function values, but it has two major drawbacks. First, the construction of P implies that it contains only points that are near the basis hyperplanes. For instance, the point (−3, −3) cannot be reached by this exploration process, regardless of its quality, since no intersection could be projected to this point. That is why this method performs well with QPSK, where all points are accessible, but leads to poor results with higher-order modulation schemes. Besides, this exploration technique is very complex with large channels. Indeed, building P requires computing the intersections of several straight lines with the n basis hyperplanes. Therefore, the more n grows, the more intersections are needed. That is why the new exploration technique should rely on steps that do not increase in complexity with additional antennas.

B. PROPOSED EXPLORATION TECHNIQUES
As exposed in the previous section, the exploration technique presented in [22] has two significant drawbacks. In this section, we propose new exploration techniques that solve these issues. Indeed, the new methods are able to select, without bias, any point of the constellation, so that they can perform well on higher-order modulation schemes. Moreover, they are composed of steps that are independent of the number of transmitted data streams n.

1) INTERSECTION-LESS (IL) EXPLORATION
As stated in Section IV-A, the exploration builds a subset of promising points P ⊂ Q^n, as different as possible but with equivalently low objective functions. On the one hand, (16) shows that equivalent points differ only in their first coordinates when expressed on the basis {v_i : 1 ≤ i ≤ n}. On the other hand, x̂ is obviously a good point since it is the global optimum on R^n. Therefore, the sought equivalently good points can be constructed by adding to x̂ some linear combinations of the first v_i. Let n_d ≤ n be the number of v_i used during this new exploration step.
To have equivalent points, we build them as

x = x̂ + f Σ_{i=1}^{n_d} ε_i v_i / λ_i, with ε_i ∈ {−1, +1}, (17)

with f some scalar whose value will be discussed later. Expressed in the V basis and after the translation to x̂, these points' coordinates are

α_i = ε_i f / λ_i for i ≤ n_d, and α_i = 0 otherwise. (18)

Equation (16) highlights that these points all have exactly the same objective function, namely f² n_d. However, these points do not belong to Q^n, so they cannot be used directly to build the subset P. Therefore, the promising set is built by taking the nearest value in Q for each coordinate. This last step is the same as the one used in the zero-forcing (ZF) detector and will be referred to as the ''projection on a set'' in the remainder of this paper. Fig. 2 and 3 provide examples of the intersection-less exploration process for a 16-QAM, f = 1, and n = 2. The exploration process starts at its center x̂. Blue arrows represent the addition of the scaled singular vectors that appear in (17), and purple ones correspond to the projection on the constellation. Fig. 2 illustrates the case n_d = 1, where two different points are obtained: (−3, −1) and (−1, 3). Fig. 3 shows the same scenario with n_d = 2. In this situation, three points are generated: (−3, −1), (−3, 1), and (−1, 3).
The points from (17) can fall either within or outside the constellation, depending on the magnitude of each singular value. That is why the projection can produce either good or bad points. Fig. 2 and 3 show an ideal case, whereas Fig. 4 presents a poorly conditioned case with λ_1 ≪ λ_2. In the latter, the explored points are not at all promising, which compromises the construction of P. For this reason, we introduce the scaling coefficient f in (17), which selects the sampled iso-value. A large f induces a large iso-value and thus widely spaced points, whereas a small scaling coefficient produces closer points. Thus, the proposed exploration uses a few different scales to increase the chance that at least one of the iso-values will generate suitable points.
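The intersection-less exploration can be summarized in a short sketch. The snippet below (ours, assuming the point construction of (17) followed by a per-coordinate nearest-symbol projection on Q) builds the promising set for a list of scaling factors:

```python
import numpy as np
from itertools import product

def il_exploration(H, y, Q, n_d, factors):
    """Intersection-less exploration: add ± f v_i / λ_i around x̂ for the
    n_d smallest singular values, then project each coordinate on the
    nearest symbol of the real-valued constellation Q."""
    Qa = np.array(Q, dtype=float)
    U, s, Vt = np.linalg.svd(H)
    order = np.argsort(s)                       # singular values ascending
    s, V = s[order], Vt[order].T                # columns of V are the v_i
    x_hat = np.linalg.pinv(H) @ y
    project = lambda x: Qa[np.argmin(np.abs(Qa[None, :] - x[:, None]), axis=1)]
    P = set()
    for f in factors:                           # a few iso-value scales
        for signs in product((-1.0, 1.0), repeat=n_d):
            x = x_hat + sum(signs[i] * f / s[i] * V[:, i] for i in range(n_d))
            P.add(tuple(project(x)))            # duplicates removed by the set
    return [np.array(p) for p in P]
```

With n_f scaling factors, at most 2^{n_d} n_f distinct points are produced, matching the cardinality p used later in the complexity analysis.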

2) SUB-CONSTELLATION PROJECTION
For (11) to be equivalent to the reference (10), S must include the points minimizing each of the two terms. If not, the result of (11) is a close but inexact approximation of (10). The described exploration is not guaranteed to yield these two minima. Thus, the process can be modified to force the search for points in each sub-constellation X^k_ij. This goal is easily achieved by altering the final projection. The point construction steps from x̂ and the singular vectors can be kept as is. Subsequently, the projection is no longer performed on Q^n but on every set X^k_ij. The overhead of this method is to move from one projection to 2nq projections.
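A projection on one sub-constellation X^k_ij can be sketched as follows. This is an illustration with our own names; in particular, `labels` is an assumed bit-labeling table (a Gray mapping in the test), not a mapping defined in the paper:

```python
import numpy as np

def project_subconstellation(x, Q, labels, j, i, k):
    """Project x on X^k_ij: coordinate j is forced to the nearest symbol of Q
    whose i-th label bit equals k; the other coordinates are projected on the
    full set Q. `labels[s]` is the (assumed) bit tuple of symbol s."""
    Qa = np.array(Q, dtype=float)
    xa = np.asarray(x, dtype=float)
    # Ordinary nearest-symbol projection for every coordinate.
    out = Qa[np.argmin(np.abs(Qa[None, :] - xa[:, None]), axis=1)]
    # Constrained projection for coordinate j only.
    sub = np.array([s for s in Q if labels[s][i] == k], dtype=float)
    out[j] = sub[np.argmin(np.abs(sub - xa[j]))]
    return out
```

Repeating this call for every bit position (i, j) and every value k yields the 2nq projections mentioned above.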

C. COMPLEXITY REDUCTION
As for the majority of detectors, the overriding computational cost is the objective function evaluation. Thus, any refinement of this step greatly decreases the complexity. The efforts are focused on the number of products since this type of operation is considerably more complex than an addition.
The products involved in the objective function evaluation are due to the squares in the norm and to the computation of

Hx = Σ_{j=1}^{n} x_j h_j, (19)

with h_j the j-th column of H. A naive computation of (19) would require n products for each term x_j h_j (i.e., n² products in total) and then about n² additions to add up all the vectors. However, it is noteworthy that the values x_j are in the finite set Q, so that there are only q possibilities for each product x_j h_j. For instance, with Q = {−3, −1, 1, 3} (i.e., using a 16-QAM), a product x_j h_j can be either −3h_j, −h_j, h_j, or 3h_j, but nothing else. Generally speaking, there are q different possibilities. Based on this property, and remarking that H is known for a whole block, we can preprocess and store the q possible products x_j h_j. This is equivalent to preprocessing and storing the matrices −3H, −H, H, and 3H for 16-QAM and, in the general case, to computing and storing the set {sH : s ∈ Q}. Then, the computation of (19) becomes the sum of n known vectors and does not require any product at all.
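The product-free evaluation of (19) can be sketched as follows (illustrative NumPy code with our own names; the halved-storage variant exploiting sign symmetry is omitted for brevity):

```python
import numpy as np

def precompute(H, Q):
    """Block-level preprocessing: store s*H for every symbol s in Q."""
    return {s: s * H for s in Q}

def Hx_no_products(tables, x):
    """Compute H @ x with additions only, using the precomputed tables.
    x is a sequence of symbols from Q."""
    acc = np.zeros(tables[x[0]].shape[0])
    for j, xj in enumerate(x):
        acc += tables[xj][:, j]          # lookup of the column x_j * h_j
    return acc
```

The preprocessing is paid once per channel block, while every subsequent evaluation of (19) costs only additions.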
This technique dramatically reduces the number of products required to decode the symbols at the cost of storing qn² coefficients. Indeed, one has to store the channel matrix H multiplied by each possible symbol in Q. This storage space and the number of precalculations can be halved in ordinary situations since the usual constellations allow the negative symbols to be obtained by a sign change. Continuing the previous 16-QAM example, storing H and 3H is enough to compute (19) with only additions since −H and −3H can be obtained by subtracting rather than adding the preprocessed vectors. The lower part of the figure illustrates the detection process for a symbol from the received vector and the estimated noise variance. During the exploration step, the optimal vector x̂ is first derived using the pre-calculated H⁺. It is then added to the combinations Σ_{i=1}^{n_d} ε_i f v_i / λ_i, which are also precomputed by the previous preprocessing. These points are then projected on the constellation and optionally on the sub-constellations, as described in Section V-B. All these points are used to initialize the subset S. During the exploitation step, some local search iterations are performed to add new promising points to S (see Section IV-B). Eventually, the LLRs are estimated using the approximation (11). Table 1 details the complexity of each step of the proposed geometrical detector, with p = 2^{n_d} n_f the number of promising points in P and n_f the number of scaling factors used. The preprocessed operations computed at a block level are neglected.

VI. COMPARISONS WITH TREE-BASED REFERENCES
The proposed algorithm is to be compared with the reference detectors according to two criteria: BER performance and computational complexity. Indeed, highly complex detectors are required to provide the best performance in unfavorable use cases (low SNR regime). On the contrary, a simpler algorithm is preferable to increase the transmission rate under favorable conditions. Besides, the flexibility of the algorithm is an attractive feature for matching the performance-complexity trade-off to the use cases. Section VI-A describes the tree-based references to be simulated using the Monte-Carlo setup proposed in Section VI-B. Section VI-C introduces the Pareto efficiency, and Section VI-D provides the comparisons using this framework.

A. REFERENCE TREE-BASED DETECTORS
The tree-based paradigm represents the detection problem as the search for the best path in a weighted tree. Any path starting from the root and reaching a leaf represents a decoded vector x whose total weight corresponds to the objective function. Each node has as many children as the number of symbols in Q. Selecting a node at each level corresponds to detecting the corresponding component of x. In the following, we refer to ''extending a node'' as the process of computing the objective functions of the children of a particular node to extend the current path.
As discussed in the introduction, tree-based detectors can be classified according to the method of searching the tree. In this section, we describe two tree-based references: a canonical breadth-first K-best with Schnorr-Euchner enumeration [11] and a state-of-the-art best-first detector [15].

1) BREADTH-FIRST DETECTION: K-BEST ALGORITHM
The breadth-first approach builds paths from the root to the leaves with no backward step. At each level of the tree, each surviving path is extended. The K best new paths are preserved to be expanded at the next level, while the others are pruned. The process ends when it reaches the leaf level. Table 2 details the computational complexity of the K-best detector as described in [11] with two metrics: the number of products and the number of additions. The complexity grows linearly with K and quadratically with n. It is assumed that the QR decomposition is performed on a block basis and can therefore be neglected. Moreover, some preprocessing can be performed on a block basis to divide by a factor of about five the value of K required to reach a specific BER vs. SNR performance. As in [11], we refer to the algorithm without any preprocessing as mode 1 and to the one with the best preprocessing as mode 3. Step numbers refer to those described in Section III-B of [11].
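For illustration, a bare-bones K-best search on the QR-triangularized system can be sketched as below. This toy version (ours) omits the Schnorr-Euchner enumeration and the mode-3 preprocessing of [11]; it only conveys the keep-K-paths principle:

```python
import numpy as np

def k_best_detect(R, z, Q, K):
    """Breadth-first K-best on the triangular system z = R x, obtained from
    y = Q_m R x + w with z = Q_m^T y. Components are detected from the last
    one to the first; the best surviving leaf is returned."""
    n = len(z)
    paths = [((), 0.0)]                          # (partial symbols, cumulative weight)
    for lvl in range(n - 1, -1, -1):             # one tree level per component
        children = []
        for tail, w in paths:
            for s in Q:                          # extend every surviving path
                x_tail = (s,) + tail             # covers components lvl .. n-1
                r = z[lvl] - sum(R[lvl, lvl + t] * x_tail[t]
                                 for t in range(len(x_tail)))
                children.append((x_tail, w + r * r))
        paths = sorted(children, key=lambda c: c[1])[:K]   # keep the K best
    return np.array(paths[0][0], dtype=float)
```

With K at least q^n the search is exhaustive; small K trades optimality for a complexity linear in K.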

2) BEST-FIRST DETECTION: CROSS-LEVEL PARALLEL TREE-SEARCH
We describe in this section the algorithm from [15]. The best-first approach selects the path to extend based on the partial cumulative weight of a node, rather than exploring straight to the leaves (as in the depth-first paradigm) or keeping a fixed number of paths (as in the breadth-first paradigm). The cross-level variant keeps track of the best nodes at each level through several stacks of finite length. At each iteration, the best node from each stack is popped out and extended, and the best siblings and the best child are inserted in the corresponding stacks. If a stack reaches its maximal length, then the worst path is pruned. The process ends when all the stacks are empty. Moreover, the algorithm saves the cumulative weights of the best leaves found so far and uses them to prune paths that are already worse. This pruning criterion, inspired by SD, avoids the extension of paths that are known to lead to worse solutions.
Unlike K-best detection, the best-first cross-level algorithm does not expand the same number of nodes at each run. Indeed, the number of paths pruned due to stack overflows or to poor cumulative weights depends on H and y. That is why it is not possible to find a closed form for its complexity. Table 3 details the complexity of the three steps and neglects the QR decomposition for the same reason as for K-best. Reference [15] reports that the algorithm visits, on average, hundreds of nodes per detection with n = 4 and a 64-QAM. In the following, we will use the average observed complexity over all the detections as the complexity of this algorithm. This metric is computed at run time using the data from Table 3.

B. SIMULATION SETUP
The message is encoded in blocks of 720 bits with an irregular, systematic low-density parity-check (LDPC) code of rate 1/2. The parity-check matrix is designed according to the WiMAX standard (IEEE 802.16e) [29]. All receivers exploit the generated LLRs to decode the message with 15 iterations of a min-sum belief-propagation algorithm. All the simulations are run using the CommPy framework [30].

C. PARETO FRONTS: A TOOL TO STUDY TRADE-OFFS
The detectors are compared based on the Pareto efficiency to study the performance-complexity trade-off objectively. A detector is said to be Pareto efficient if it is impossible to find an alternative detector that reduces either the complexity or the BER without losing on the other metric. The set of all Pareto-efficient detectors, called the Pareto front, represents the trade-off options. Indeed, switching from one Pareto-efficient detector to another means favoring complexity or BER in the trade-off. Conversely, a detector that is not Pareto efficient should not be selected because one can improve at least one of the metrics without losing on the other.
The computational complexity is assessed as both the number of products alone and the total number of operations (products plus additions). The first metric refers to implementations on application-specific integrated circuits (ASICs), where the overriding complexity is the number of products due to their cost. The second comparison is better adapted to field-programmable gate arrays (FPGAs) since embedded digital signal processors (DSPs) can compute an addition for free when computing a product. Table 4 lists all the parameters tested during the described Monte-Carlo simulations for both QPSK and 16-QAM. All combinations of the listed parameters are tested for the geometrical detector. To improve the readability of the figures, only a subset of these parameters is plotted in the following sections. In any case, all the detectors claimed to be Pareto efficient are Pareto efficient with respect to all the parameters from Table 4.
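Extracting the Pareto front from a set of (complexity, BER) operating points is straightforward; a minimal sketch (ours) is:

```python
def pareto_front(points):
    """Return the Pareto-efficient subset of (complexity, BER) pairs.
    A point is kept unless some other point is at least as good on both
    metrics and strictly better on one (i.e., it dominates it)."""
    front = []
    for c, b in points:
        dominated = any(c2 <= c and b2 <= b and (c2 < c or b2 < b)
                        for c2, b2 in points)
        if not dominated:
            front.append((c, b))
    return sorted(front)
```

Each tested parameter combination yields one operating point per SNR; the surviving points form the trade-off curves plotted in the following figures.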

D. SIMULATION RESULTS
The preprocessing described in Section V-C could be adapted to simplify the tree-based references. Therefore, the results are provided in two scenarios to permit a meaningful comparison: first, with the strict implementation of the references, using preprocessing only for the geometrical detectors (Sections VI-D1 and VI-D2); then, with the preprocessing extended to all the compared algorithms (Section VI-D3). Fig. 6 and 7 present the Pareto-efficient algorithms for a QPSK. The first one approximates the complexity by the number of products, whereas the second one estimates it by the number of operations (products plus additions). Table 5 details the parameters corresponding to the Pareto-efficient detectors for the number of products/BER trade-off.

1) COMPARISONS FOR QPSK
For readability reasons, only a subset of the tested geometrical detectors is shown. The plots are restricted to detectors using the sub-constellation projection presented in Section V-B2 with two scale factors (f = 0.25, f = 4.0) and no iterations. Indeed, these parameters include all the Pareto-efficient geometrical detectors.
These parameters highlight that, for a very small constellation, the geometrical detector should handle the performance-complexity trade-off using the number of dimensions n_d. Indeed, the receivers can select the number of dimensions based on the measured SNR, build the promising set P using the sub-constellation variant, and then apply the max-log approximation without further exploitation. Fig. 6 shows that the proposed algorithm requires ten times fewer products than the canonical K-best mode 1 at the cost of 0.7 dB. Moreover, the geometrical detector allows for an on-the-fly configuration since changing n_d is enough to select the working point. Fig. 7 shows tighter spreads, with the three detectors being efficient in some SNR range.
2) COMPARISONS FOR 16-QAM
Fig. 8 and 9 provide the same trade-off analysis for a 16-QAM, and Table 6 details the parameters of the Pareto-efficient detectors. In this section, we only plot the geometrical detectors with n_d = 2 and no sub-constellation projection. It is no longer efficient to project on each sub-constellation, as the number of projections required grows linearly with the constellation size. Therefore, the overhead can no longer be neglected when switching from QPSK to 16-QAM. Table 6 highlights that the on-the-fly configuration relies on the number of iterations and the number of scaling factors when using a 16-QAM. Both Pareto fronts show that the geometrical detector is efficient in some SNR ranges. However, the complexity gap between tree-based and geometrical detectors is not as large as for QPSK.
The hierarchy observed in QPSK remains the same for the 16-QAM scheme:
• the canonical K-best remains effective for the worst SNRs with a very high K,
• best-first is effective but not flexible,
• the geometrical algorithm provides good results and adaptability for moderate SNRs.
Fig. 10 and 11 present the same comparison when all detectors use the preprocessing described in Section V-C. For tree-based algorithms, this halves the number of products. Indeed, the norm computation then requires products only for the squaring. In this scenario, the Pareto-efficient detectors were all Pareto efficient in the previous one, and we refer to the previous tables for the parameters. We do not know of any tree-based implementation featuring this kind of preprocessing. However, [15] reports in Section IV-C1 a computation method leading to an equivalent decrease in complexity. Indeed, the products Hx are computed only with shifts and additions, which similarly restricts the number of products required. Fig. 11 illustrates that, although it is Pareto efficient, the geometrical detector is not a wise choice in 16-QAM when preprocessing is possible for the tree-based detectors: K-best is preferable at the worst SNRs, and best-first provides a significantly simpler option. Fig. 10 shows that the proposed geometrical detector is particularly relevant in QPSK: the complexity gain is considerable, and the trade-off between computational complexity and BER performance can be easily tuned with a few parameters.

VII. CONCLUSION
In this paper, we proposed an extensive comparison of a new geometrical detector with tree-based references. The performance-complexity trade-off is studied using the Pareto efficiency framework. We presented new exploration techniques to improve the performance-complexity trade-off and extend the geometrical detectors to higher-order modulation schemes. Moreover, a preprocessing method is introduced to reduce further the number of products required.
The Pareto fronts show that K-best is suitable in the worst SNR regimes, whereas the geometrical detector and the best-first are efficient when SNRs are moderate. Besides, the proposed detector outperforms the two references in the QPSK scenario by providing a significant complexity reduction and a simple on-the-fly configuration. Moreover, the gap is large enough that switching from mode 1 to mode 3 is not sufficient for K-best to reach the same efficiency as the proposed algorithm.