Approximate Soft Successive Detection for Grassmannian Product Superposition Coding

Abstract—We consider non-coherent single-input multiple-output transmission using a previously proposed Grassmannian product superposition constellation. We develop a novel soft successive detection (SSD) for this structure, which allows the considered product symbol constellation to be effectively combined with available forward error correction channel coding schemes that support soft decoding. We derive bit-level log-likelihood ratios (LLRs) for SSD and propose three approaches for successive approximate LLR computation of increasing accuracy and complexity. We demonstrate the performance of the proposed approach by numerical simulations.


Stefan Schwarz, Senior Member, IEEE, and Michael Girsch

I. INTRODUCTION
THE Grassmann manifold has been successfully used for non-coherent transmissions over block-fading channels in many previous works, such as [1], [2], [3], [4], [5], [6], [7], [8], and [9], to name just a few. Good Grassmannian transmission constellations are characterized by a large minimum chordal distance between pairs of symbols. However, constructing such constellations is challenging for larger constellation sizes because the associated constellation design optimization problem is non-convex. Similarly, symbol detection for larger constellations is challenging because Voronoi regions of symbols within the manifold are difficult to determine. A viable solution to these problems has been proposed in [10] in the form of a structured constellation that provides good distance properties and allows for efficient symbol detection. An alternative approach has been developed by the authors in [11], which is based on a non-coherent hierarchical modulation approach using a product superposition of smaller, easily constructed constellations. This approach allows efficient approximate maximum likelihood (ML) symbol detection of large constellations using a trellis structure.
In this work, we adopt this product constellation, which superimposes multiple streams on a Grassmann manifold, and develop a novel SSD. This approach allows the Grassmannian product constellation to be effectively combined with any available forward error correction channel coding scheme that supports soft decoding, in a structure similar to bit-interleaved coded modulation. To this end, we derive the corresponding bit-level LLRs for SSD. To enable computationally efficient successive detection of the superimposed streams, we propose three novel approaches for approximate LLR computation of increasing accuracy and complexity. These approximations allow LLRs to be computed successively, stream by stream, which significantly reduces the complexity compared to the known approach of computing LLRs jointly for the product constellation formed by all superimposed streams. We investigate the performance of the proposed non-coherent SSD by numerical simulations and demonstrate substantial gains over hard symbol detection.

A. Notation
The Grassmann manifold of $m$-dimensional subspaces of the complex-valued $n$-dimensional Euclidean space is $\mathcal{G}(n, m)$. The conjugate transpose of matrix $\mathbf{A}$ is $\mathbf{A}^H$ and the Frobenius norm is $\|\mathbf{A}\|$. The operation $a_{\max} = \arg\max_{a \in \mathcal{A}} f(a)$ determines the maximizer $a_{\max}$ of the function $f(a)$ over the set $\mathcal{A}$. The expected value of a random variable $r$ is $\mathbb{E}(r)$ and its probability density is $P_r$.
II. SYSTEM MODEL

We consider non-coherent single-input multiple-output (SIMO) transmissions from a transmitter equipped with a single antenna to a receiver equipped with $N_r$ antennas. The signal $\mathbf{x} \in \mathbb{C}^{n \times 1}$ with $\|\mathbf{x}\| = 1$ is transmitted over $n$ consecutive time instances. We assume a block-fading channel with a coherence block length of $n$, such that the channel vector $\mathbf{h} \in \mathbb{C}^{N_r \times 1}$ is constant during the transmission. Hence, the received signal is:

$$\mathbf{Y} = \mathbf{x}\,\mathbf{h}^T + \mathbf{Z}, \quad (1)$$

with $\mathbf{Z} \in \mathbb{C}^{n \times N_r}$ denoting independent and identically distributed (iid) Gaussian noise of variance one. The transmit signal $\mathbf{x}$ is generated according to [11] using a Grassmannian product superposition of $R$ independent streams:

$$\mathbf{x} = \prod_{i=1}^{R} \mathbf{Q}_{\ell_i} = \mathbf{Q}_{\ell_1}\mathbf{Q}_{\ell_2}\cdots\mathbf{Q}_{\ell_R}, \quad (2)$$

where $\mathbf{Q}_{\ell_i} \in \mathbb{C}^{d_{i-1} \times d_i}$, with $d_0 = n$ and $d_R = 1$, is the transmit symbol of the $i$-th stream and is selected from a Grassmannian constellation:

$$\mathcal{Q}_i = \{\mathbf{Q}_1, \ldots, \mathbf{Q}_{D_i}\} \subset \mathcal{G}(d_{i-1}, d_i), \quad b_i = \log_2 D_i. \quad (3)$$

Each of the $R$ streams carries $n_i$ coded bits with code rate $r_i^{(c)} = k_i/n_i$; a block of $n_i$ coded bits is mapped onto $n_i/b_i$ symbols (assumed integer), where $k_i$ denotes the information block length; see fig. 1. By individually encoding the $R$ streams (rather than jointly), we can exploit a coding gain at the receiver during SSD. The adopted structure shown in fig. 1 also allows rate adaptation to be performed per stream.
By forming all possible products of symbols according to (2), we can construct the corresponding product constellation:

$$\mathcal{Q}_1^{(n)} = \left\{ \mathbf{Q}_{\ell_1}\mathbf{Q}_{\ell_2}\cdots\mathbf{Q}_{\ell_R} \;\middle|\; \mathbf{Q}_{\ell_i} \in \mathcal{Q}_i \right\} \subset \mathcal{G}(n, 1). \quad (4)$$

This constellation is of size $D = \prod_{i=1}^{R} D_i = 2^b$, carrying $b = \sum_{i=1}^{R} b_i$ bits.
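As a numerical illustration, the product construction in (2) can be sketched as follows. The constellation matrices below are random points on the respective Grassmannians, serving only as stand-ins for an optimized minimum-distance design; the dimension chain [6, 3, 1] and the per-stream sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_grassmann_point(n, m, rng):
    """Random n x m matrix with orthonormal columns, a representative of a point on G(n, m)."""
    A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    Q, _ = np.linalg.qr(A)
    return Q[:, :m]

# Dimension chain d_0 = n > d_1 > d_2 = 1, here [6, 3, 1] (R = 2 streams).
dims = [6, 3, 1]
D = [4, 4]  # per-stream constellation sizes D_i (b_i = 2 bits each, illustrative)

# Stand-in constellations Q_i; an actual design maximizes the minimum chordal distance.
constellations = [[rand_grassmann_point(dims[i], dims[i + 1], rng) for _ in range(D[i])]
                  for i in range(len(D))]

# Transmit symbol per eq. (2): product of one selected symbol per stream.
x = constellations[0][1] @ constellations[1][3]
# Semi-unitary factors preserve the unit norm required of the transmit signal.
print(round(np.linalg.norm(x), 6))  # 1.0
```

The unit-norm check reflects that each factor has orthonormal columns, so the product of a point on $\mathcal{G}(d_0, d_1)$ with a point on $\mathcal{G}(d_1, 1)$ is again a unit-norm vector, i.e., a point on $\mathcal{G}(n, 1)$.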

A. Hard Symbol Detection
For Rayleigh fading channels, non-coherent ML symbol detection is achieved by [1]:

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}_\ell \in \mathcal{Q}_1^{(n)}} \left\| \mathbf{x}_\ell^H \mathbf{Y} \right\|^2. \quad (5)$$

This search has exponential complexity in $b$, which may be computationally infeasible for larger product codebook sizes. In [11], a low-complexity trellis detector is developed, which can closely approximate ML detection. This trellis detector proceeds recursively through the $R$ streams using Viterbi's algorithm, with a path metric obtained from a Grassmannian chordal distance between the transmit symbols and a recursive projection of the dominant subspace of $\mathbf{Y}$.
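A minimal sketch of this ML rule, using a random unit-norm codebook as a stand-in for the product constellation and a fixed channel realization for reproducibility (the trellis detector of [11] is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
n, Nr = 6, 2

def unit(v):
    return v / np.linalg.norm(v)

# Stand-in codebook of unit-norm candidates x_ell (an optimized design in practice).
codebook = [unit(rng.standard_normal(n) + 1j * rng.standard_normal(n)) for _ in range(16)]

# Transmit codeword 5 over an unknown channel h (fixed here) plus mild noise.
x = codebook[5][:, None]                                  # n x 1
h = np.array([[1.0 + 0.2j, -0.7 + 0.5j]])                 # 1 x Nr (assumed realization)
Z = 0.05 * (rng.standard_normal((n, Nr)) + 1j * rng.standard_normal((n, Nr)))
Y = x @ h + Z                                             # received block, n x Nr

# Non-coherent ML: pick the codeword whose direction captures the most energy of Y.
metrics = [np.linalg.norm(c.conj() @ Y) ** 2 for c in codebook]
x_hat = int(np.argmax(metrics))
print(x_hat)  # 5
```

The detector never estimates $\mathbf{h}$; it only compares the received energy along each candidate direction, which is exactly the metric in (5).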

B. Soft Detection
Given the product constellation $\mathcal{Q}_1^{(n)}$, it is possible to calculate bit-level LLRs according to [10]:

$$\mathrm{LLR}_j = \ln \frac{\sum_{\mathbf{x}_\ell \in \mathcal{C}_j^{(1)}} P(\mathbf{Y} \mid \mathbf{x}_\ell)}{\sum_{\mathbf{x}_\ell \in \mathcal{C}_j^{(0)}} P(\mathbf{Y} \mid \mathbf{x}_\ell)}, \quad (6)$$

where $\mathcal{C}_j^{(\ell)}$ is the set of all symbols transmitting bit $\ell \in \{0, 1\}$ in position $j \in \{1, \ldots, b\}$. The complexity of computing each sum-term of (6) is mainly determined by the vector-matrix product and the norm calculation, which is of order $O(nN_r)$. However, since the size of $\mathcal{C}_j^{(\ell)}$ grows exponentially with $b$, the number of terms to be calculated is proportional to $2^b$, which is computationally infeasible for larger constellations. To alleviate this issue, we derive below stream-based LLRs that are obtained successively following a SSD approach, by applying a similar strategy as soft successive interference cancellation (SIC) in coherent detection.
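The computation in (6) can be sketched as follows. For this sketch, the non-coherent likelihood is taken as $\exp(\gamma \|\mathbf{x}_\ell^H \mathbf{Y}\|^2)$ up to a common factor, with $\gamma$ treated as a known SNR-dependent constant (an assumption of this sketch), and the bit label of a symbol is simply its codebook index; a strong fixed channel is used so the hard decisions are unambiguous.

```python
import numpy as np

rng = np.random.default_rng(2)
n, Nr, b = 4, 2, 3
gamma = 1.0  # assumed SNR-dependent likelihood constant

def unit(v):
    return v / np.linalg.norm(v)

codebook = [unit(rng.standard_normal(n) + 1j * rng.standard_normal(n))
            for _ in range(2 ** b)]        # symbol index doubles as its bit label

# Transmit symbol 5 (bits 101) over a strong, fixed channel realization.
Y = codebook[5][:, None] @ np.array([[4.0, 2.0j]]) \
    + 0.01 * (rng.standard_normal((n, Nr)) + 1j * rng.standard_normal((n, Nr)))

loglik = np.array([gamma * np.linalg.norm(c.conj() @ Y) ** 2 for c in codebook])

def logsumexp(v):
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def llr(j):
    """Bit-level LLR per eq. (6): summed likelihoods with bit j = 1 vs. bit j = 0."""
    ones = loglik[[idx for idx in range(2 ** b) if (idx >> j) & 1]]
    zeros = loglik[[idx for idx in range(2 ** b) if not (idx >> j) & 1]]
    return logsumexp(ones) - logsumexp(zeros)

llrs = [llr(j) for j in range(b)]
hard = sum(int(llrs[j] > 0) << j for j in range(b))  # hard decision from LLR signs
```

The log-sum-exp formulation avoids overflow of the exponentials; the exponential cost noted in the text shows up here as the loop over all $2^b$ codebook entries.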

III. APPROXIMATE LLRS FOR STREAM-LEVEL SSD
We assume a SSD structure that successively iterates over the $R$ streams. Consider detecting the $i$-th stream; at this point, estimates of the prior $i-1$ symbols are available and combined as:

$$\hat{\mathbf{Q}}_{i-} = \prod_{k=1}^{i-1} \hat{\mathbf{Q}}_{\ell_k} \in \mathbb{C}^{n \times d_{i-1}}, \quad (7)$$

where $\hat{\mathbf{Q}}_{\ell_k}$ denotes a detected symbol of the $k$-th stream. These detected symbols are obtained at the receiver after a block of $n_k$ symbols has been decoded and re-encoded, to take advantage of the error correction capabilities of the channel code during successive detection. For the first stream $i = 1$ we set $\hat{\mathbf{Q}}_{1-} = \mathbf{I}_n$. Similarly, we combine the symbols of the $R - i$ streams subsequent to stream $i$:

$$\mathbf{q}_{i+} = \prod_{k=i+1}^{R} \mathbf{Q}_{\ell_k} \in \mathbb{C}^{d_i \times 1}. \quad (8)$$

These symbols are considered unknown when the symbol $\mathbf{Q}_{\ell_i}$ is detected, as they are detected after stream $i$. To obtain LLRs for the $b_i$ bits carried by $\mathbf{Q}_{\ell_i}$, we consider the distribution of $\mathbf{Y}$ conditioned on $\mathbf{Q}_{\ell_i}$ [12]:

$$P(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = \mathbb{E}_{\mathbf{q}_{i+}}\!\left( P\!\left(\mathbf{Y} \mid \hat{\mathbf{Q}}_{i-} \mathbf{Q}_{\ell_i} \mathbf{q}_{i+}\right) \right). \quad (9)$$

In the following, we use the notation $P(\mathbf{Y} \mid \mathbf{x}) \propto \exp\!\left(\gamma \|\mathbf{x}^H \mathbf{Y}\|^2\right)$ for the non-coherent likelihood, with an SNR-dependent constant $\gamma$. If the number $2^{\sum_{k=i+1}^{R} b_k}$ of possibilities for $\mathbf{q}_{i+}$ is relatively small, we can explicitly calculate the expectation in (9). However, for larger constellations, this is prohibitively complex.
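When the number of possibilities for $\mathbf{q}_{i+}$ is small, the expectation in (9) can be evaluated by direct enumeration. The sketch below does so for the first stream (so the prior product is the identity); $\gamma$ is again an assumed SNR-dependent likelihood constant and the constellations are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d1, Nr = 6, 3, 2
gamma = 1.0  # assumed SNR-dependent likelihood constant

def rand_grassmann_point(n, m, rng):
    A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    return np.linalg.qr(A)[0][:, :m]

Q_prior = np.eye(n)                                                  # i = 1: no detected streams
stream_i = [rand_grassmann_point(n, d1, rng) for _ in range(4)]      # hypotheses for Q_{l_i}
stream_plus = [rand_grassmann_point(d1, 1, rng) for _ in range(4)]   # enumerable q_{i+}

# Transmit stream_i[2] together with stream_plus[0] over a strong fixed channel.
x = stream_i[2] @ stream_plus[0]
Y = x @ np.array([[3.0, 1.0 + 1.0j]]) \
    + 0.01 * (rng.standard_normal((n, Nr)) + 1j * rng.standard_normal((n, Nr)))

def cond_lik(Q):
    """P(Y | Q_{l_i}) via the explicit expectation in eq. (9), uniform prior over q_{i+}."""
    liks = [np.exp(gamma * np.linalg.norm((Q_prior @ Q @ q).conj().T @ Y) ** 2)
            for q in stream_plus]
    return np.mean(liks)

liks = [cond_lik(Q) for Q in stream_i]
best = int(np.argmax(liks))
print(best)  # 2: the transmitted stream-i hypothesis dominates the expectation
```

This brute-force average over all $\mathbf{q}_{i+}$ candidates is exactly what the three approximations of the following subsections are designed to avoid.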
To avoid this complex calculation, we neglect the constellation structure imposed on $\mathbf{q}_{i+}$ and apply a Bayesian prior under the assumption that $\mathbf{q}_{i+}$ can take on any value on the respective Grassmannian $\mathcal{G}(d_i, 1)$. We thus assume that $\mathbf{q}_{i+}$ is uniformly distributed on $\mathcal{G}(d_i, 1)$. To calculate the expected value in (9), we propose three approaches in the following.

A. Jensen LLRs
Jensen's inequality provides a lower bound on (9):

$$P(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = \mathbb{E}_{\mathbf{q}_{i+}}\!\left( \exp\!\left( \mathbf{q}_{i+}^H \mathbf{W}_{\ell_i} \mathbf{q}_{i+} \right) \right) \geq \exp\!\left( \mathbb{E}_{\mathbf{q}_{i+}}\!\left( \mathbf{q}_{i+}^H \mathbf{W}_{\ell_i} \mathbf{q}_{i+} \right) \right), \quad (10)$$

with the quadratic form $\mathbf{W}_{\ell_i} = \gamma\, \mathbf{Q}_{\ell_i}^H \hat{\mathbf{Q}}_{i-}^H \mathbf{Y} \mathbf{Y}^H \hat{\mathbf{Q}}_{i-} \mathbf{Q}_{\ell_i}$ collecting the SNR-dependent constant $\gamma$ of the non-coherent likelihood. For $\mathbf{q}_{i+}$ uniformly distributed on $\mathcal{G}(d_i, 1)$ we have:

$$\mathbb{E}_{\mathbf{q}_{i+}}\!\left( \mathbf{q}_{i+}^H \mathbf{W}_{\ell_i} \mathbf{q}_{i+} \right) = \frac{1}{d_i} \operatorname{tr}\!\left( \mathbf{W}_{\ell_i} \right) = \frac{1}{d_i} \sum_m \lambda_{\ell_i}^{(m)}, \quad (11)$$

with $\lambda_{\ell_i}^{(m)}$ denoting the non-zero (positive) eigenvalues of the quadratic form $\mathbf{W}_{\ell_i}$. With that we approximate the LLRs of stream $i$ as:

$$\mathrm{LLR}_j^{(i)} = \ln \frac{\sum_{\mathbf{Q}_{\ell_i} \in \mathcal{C}_j^{(1,i)}} \tilde{P}(\mathbf{Y} \mid \mathbf{Q}_{\ell_i})}{\sum_{\mathbf{Q}_{\ell_i} \in \mathcal{C}_j^{(0,i)}} \tilde{P}(\mathbf{Y} \mid \mathbf{Q}_{\ell_i})}, \quad (12)$$

$$\tilde{P}(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = \exp\!\left( \frac{1}{d_i} \sum_m \lambda_{\ell_i}^{(m)} \right), \quad (13)$$

where $\mathcal{C}_j^{(\ell,i)}$ is the set of stream-$i$ symbols transmitting bit $\ell$ in position $j \in \{1, \ldots, b_i\}$. Although the per-term complexity of (13) is higher than that of (6), it can still provide less overall complexity, since the total number of sum terms for all $R$ streams is proportional to $\sum_{i=1}^{R} 2^{b_i}$ rather than $2^b$.
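The Jensen bound can be checked numerically with a random positive semi-definite matrix standing in for the quadratic form: the bound $\exp(\operatorname{tr}(\mathbf{W})/d_i)$ is compared against a Monte-Carlo estimate of $\mathbb{E}(\exp(\mathbf{q}^H \mathbf{W} \mathbf{q}))$ for $\mathbf{q}$ uniform on $\mathcal{G}(d_i, 1)$.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
# Random rank-2 PSD stand-in for the quadratic form W_{l_i}.
A = rng.standard_normal((d, 2)) + 1j * rng.standard_normal((d, 2))
W = 0.3 * (A @ A.conj().T)

# Jensen lower bound: for q uniform on G(d, 1), E[q^H W q] = tr(W) / d.
jensen = np.exp(np.trace(W).real / d)

# Monte-Carlo estimate of the exact expectation E[exp(q^H W q)].
vals = []
for _ in range(20_000):
    q = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    q /= np.linalg.norm(q)                       # uniform direction on G(d, 1)
    vals.append(np.exp((q.conj() @ W @ q).real))
mc = float(np.mean(vals))
print(jensen <= mc)  # True: the Jensen approximation underestimates the expectation
```

Note that (13) only needs the trace of $\mathbf{W}_{\ell_i}$, which equals $\gamma \|(\hat{\mathbf{Q}}_{i-}\mathbf{Q}_{\ell_i})^H \mathbf{Y}\\|^2$ and can thus be computed without forming $\mathbf{W}_{\ell_i}$ explicitly.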

B. Max LLRs
A simple upper bound on (9) is obtained as:

$$P(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) \leq \tilde{P}(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = \exp\!\left( \lambda_{\ell_i}^{(\max)} \right), \quad (14)$$

where $\lambda_{\ell_i}^{(\max)}$ is the largest eigenvalue of $\mathbf{W}_{\ell_i}$. Plugging into (12) provides another approximation of the LLRs. Compared to (13), the complexity is further increased, as we now additionally have to calculate $\mathbf{W}_{\ell_i}$ ($O(d_{i-1} d_i^2)$) and its largest eigenvalue. Using power iterations [13], the complexity of finding the largest eigenvalue is of order $O(k d_i)$, where $k$ is the number of iterations used (usually small).
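The largest eigenvalue needed in (14) can be obtained by plain power iterations; a sketch with a random stand-in for $\mathbf{W}_{\ell_i}$ (a generous iteration count is used here, whereas the $k$ in the text is typically much smaller):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
A = rng.standard_normal((d, 2)) + 1j * rng.standard_normal((d, 2))
W = A @ A.conj().T          # PSD stand-in for W_{l_i} (rank 2, mimicking N_r = 2)

def largest_eigenvalue(W, k=300):
    """Power iteration: repeatedly apply W and renormalize, then take the Rayleigh quotient."""
    v = np.ones(W.shape[0], dtype=complex)
    for _ in range(k):
        v = W @ v
        v /= np.linalg.norm(v)
    return (v.conj() @ W @ v).real

lam_max = largest_eigenvalue(W)
ref = np.linalg.eigvalsh(W)[-1]   # full dense decomposition as a reference
print(abs(lam_max - ref) < 1e-6)  # True
```

In practice, the low rank of $\mathbf{W}_{\ell_i}$ can be exploited by keeping it in factored form, so each iteration only costs matrix-vector products with the thin factors.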

C. Hypergeometric LLRs
We next propose an LLR approximation that relies on a signal- and noise-subspace decomposition of the relevant norm terms. Consider an eigen-decomposition of matrix $\mathbf{W}_{\ell_i}$:

$$\mathbf{W}_{\ell_i} = \mathbf{U}_{\ell_i} \mathbf{\Lambda}_{\ell_i} \mathbf{U}_{\ell_i}^H. \quad (15)$$

We have $\operatorname{rank}(\mathbf{\Lambda}_{\ell_i}) = \min(N_r, d_i)$, since the noise contribution in $\mathbf{Y}$ has rank $N_r$. However, the signal contribution in $\mathbf{Y}$, corresponding to transmit signal $\mathbf{x}$, is only of rank one. Therefore, let us consider the one-dimensional signal subspace only (largest subspace of this eigen-decomposition):

$$\mathbf{W}_{\ell_i}^{(\max)} = \lambda_{\ell_i}^{(\max)} \mathbf{u}_{\ell_i} \mathbf{u}_{\ell_i}^H, \quad (16)$$

where $\mathbf{u}_{\ell_i}$ is the eigenvector corresponding to the largest eigenvalue. Notice that for $N_r = 1$, eqs. (15) and (16) are the same. This provides the following lower bound:

$$\mathbf{q}_{i+}^H \mathbf{W}_{\ell_i} \mathbf{q}_{i+} \geq \lambda_{\ell_i}^{(\max)} \left| \mathbf{u}_{\ell_i}^H \mathbf{q}_{i+} \right|^2, \quad (17)$$

which is tight for $N_r = 1$.
Since $\mathbf{u}_{\ell_i}$ is a fixed unit-norm vector and $\mathbf{q}_{i+}$ is a uniformly distributed unit-norm vector, the term $|\mathbf{u}_{\ell_i}^H \mathbf{q}_{i+}|^2$ follows a beta distribution with parameters $\alpha = 1$ and $\beta = d_i - 1$ [14]. Let us denote the corresponding random variable by $\beta = |\mathbf{u}_{\ell_i}^H \mathbf{q}_{i+}|^2$. Thus, to evaluate (9) under the lower bound (17), we require the moment-generating function (MGF) of $\beta$, calculated from the confluent hypergeometric function of the first kind $M_\beta(x) = {}_1F_1(\alpha, \alpha + \beta, x)$ [15]. We provide a few specific examples of the MGF in Table I. With this result, we approximate the LLRs as:

$$\tilde{P}(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = M_\beta\!\left( \lambda_{\ell_i}^{(\max)} \right), \quad (18)$$

plugged into (12). Here, $\lambda_{\ell_i}^{(\max)}$ is a function of $\mathbf{Q}_{\ell_i}$ according to (15). Notice that for $N_r = 1$ this result is exact; for $N_r > 1$ it is only an approximation, since the noise subspace is missing. Complexity-wise, this requires similar calculations as the max LLRs; yet, calculating $M_\beta(x)$ is slightly more complex than the exponential terms in (14), depending on the parameters $\alpha, \beta$.
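The beta law and its hypergeometric MGF can be verified numerically. The sketch below assumes SciPy's `hyp1f1` for ${}_1F_1$ and uses a random unit vector as a stand-in for $\mathbf{u}_{\ell_i}$.

```python
import numpy as np
from scipy.special import hyp1f1

rng = np.random.default_rng(6)
d, lam = 4, 2.0     # d_i and a stand-in value for lambda_max

u = rng.standard_normal(d) + 1j * rng.standard_normal(d)
u /= np.linalg.norm(u)                       # fixed unit-norm direction u_{l_i}

# Draw q uniform on G(d, 1): normalized iid complex Gaussian vectors.
Q = rng.standard_normal((200_000, d)) + 1j * rng.standard_normal((200_000, d))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)

# |u^H q|^2 ~ Beta(1, d - 1), so E[exp(lam * |u^H q|^2)] = 1F1(1, d, lam).
mc = float(np.mean(np.exp(lam * np.abs(Q @ u.conj()) ** 2)))
mgf = float(hyp1f1(1, d, lam))               # alpha = 1, alpha + beta = d
print(abs(mc - mgf) < 0.05)  # True up to Monte-Carlo error
```

For $\alpha = 1$ the series ${}_1F_1(1, d, x) = \sum_{k \geq 0} x^k / (d)_k$ converges quickly, which keeps the per-symbol cost of (18) close to that of a plain exponential.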

IV. SIMULATIONS
In our simulations, we use Grassmannian constellations (3) optimized to maximize the minimum Grassmannian chordal distance between symbol pairs using the gradient-based approach of [16]. In addition, we optimize the bit-to-symbol mapping to maximize the Hamming distance of bit-vectors associated with nearest-neighbor symbols (in terms of chordal distance) using a variant of the binary switching algorithm [17]. Both approaches are local optimizations that cannot guarantee global optimality. We use the turbo code of 3GPP LTE with its base code rate $r_i^{(c)} = 1/3\ \forall i$ and an information block length $k_i = 2^{10}\ \forall i$. We use a bit-interleaved coded modulation structure as shown in fig. 1, with a bit-level interleaver between the turbo coder and constellation symbol mapper to eliminate error correlations. We consider $N_r = 1$ and uncorrelated unit-variance Rayleigh fading channels. Results are averaged over 1500 code blocks.

A. Performance Comparison of LLR Calculations
In our first simulation, we compare the bit error ratio (BER) and block error ratio (BLER) achieved with the different LLR calculations of Section III. We consider a fading block-length of $n = 10$ symbols and superimpose $R = 2$ streams with dimensions $[d_0, d_1, d_2] = [10, 5, 1]$. Fig. 2 shows the BER/BLER of the first stream $i = 1$, with $d_{i-1} = 10$ and $d_i = 5$. For the second stream $i = 2$ we have $d_i = 1$, which results in all LLR calculations coinciding. As expected, the most accurate LLR calculation, based on the hypergeometric function, achieves the best BER performance. Hence, for the remaining simulations we only use this approach. Although not shown in this simulation, the observed gains of the hypergeometric LLR calculation also generalize to scenarios with more than two streams.

B. Performance Comparison With Single Constellation
We next compare the performance of the product superposition using SSD against single (non-product) constellations. We consider a fading block-length of n = 12 symbols and transmit a total of b ∈ {8, 12, 16} coded bits per fading block.
For the product constellation, we use setups transmitting the same total number of $b$ coded bits per fading block. We compare this approach against a single Grassmannian constellation $\mathcal{Q}_1^{(n)}$ that directly carries $b$ bits of information over the fading block of length $n$. Here, we calculate LLRs using (6). For complexity reasons, we can simulate this system only for $b \in \{8, 12\}$. This single constellation can achieve better minimum distance properties than a product superposition because it is not restricted to the product structure [16].
As an alternative scheme with low complexity, we consider a time-split approach where we split each coherence block into sub-blocks and transmit $\tilde{b} = 4$ bits per sub-block. This allows LLRs to be calculated using (6) per sub-block. For each sub-block we use a single Grassmannian constellation $\mathcal{Q}_1^{(\tilde{n})}$ carrying $\tilde{b}$ bits as follows:
• For $b = 8$: we split the fading block of length $n = 12$ into two sub-blocks, each of length $\tilde{n} = 6$.
• For $b = 12$: we split into three sub-blocks of length $\tilde{n} = 4$.
• For $b = 16$: we split into four sub-blocks of length $\tilde{n} = 3$.
Because this scheme uses only a smaller block-length $\tilde{n}$, it has a lower diversity order.
In fig. 3, we show the BER achieved by these three systems, all transmitting at the same net bit rate. We observe that the lowest BER is achieved by the product constellation, followed by the large single Grassmannian constellation $\mathcal{Q}_1^{(n)}$ and the time-split approach using smaller sub-block constellations $\mathcal{Q}_1^{(\tilde{n})}$. The product constellation achieves the lowest BER as it can exploit the coding gain of the individual streams during SSD. Complexity-wise, the time-split approach is significantly cheaper than the other two, followed by the product constellation and the large single constellation. For $b = 16$, calculating LLRs for the large single constellation using (6) is computationally too complex.

C. Performance of Individual Streams
We next investigate the BER of the individual streams of the product constellation. We consider $R = 3$ streams; the per-stream BER is shown in fig. 4. We can see that the BER of the latter streams is worse than that of the former. This is expected due to error propagation during SSD, similar to what is observed in conventional SIC detectors. This behavior can be exploited, for example, for multi-resolution video transmission, where higher streams provide additional resolution to users with a sufficiently good signal-to-noise ratio (SNR), whereas a low-resolution stream can already be decoded at lower SNR.
To realize rate adaptation using adaptive modulation and coding (AMC), the BER and transmission rate of the streams can be influenced in three ways: 1) we can adapt the individual constellation sizes $D_i$; 2) we can adapt the code rates $r_i^{(c)}$; 3) we can adapt the dimension step size $\Delta_i = d_{i-1} - d_i$, with larger $\Delta_i$ providing more robustness. Note, however, that we should always keep the former streams more robust than the latter to avoid excessive error propagation during SSD.

D. Performance Comparison With Hard Trellis Detection
We next compare the proposed SSD with the hard symbol trellis detector of [11] combined with soft bit-level decoding. For this hard symbol detection approach, when detecting stream $i$, the symbol estimates of the $i-1$ prior streams in (7) are set according to the hard symbol decisions of the trellis detector. Given this hard decision, we then calculate the LLRs according to (18). The system is less complex than SSD because it does not require re-encoding of bits at the receiver for successive detection; however, it also cannot use coding gain to minimize error propagation in successive detection. We consider $R = 2$ streams with $[d_0, d_1, d_2] = [6, 3, 1]$ and $b_i = 4$. The BER performance of both systems is shown in fig. 5. For the first stream, both systems perform equally well, since the first stream is not subject to error propagation, but the second stream performs significantly better with SSD by exploiting the coding gain for successive detection. Adding further streams exhibits the same behavior: with SSD, we can always achieve a gain over hard successive detection because we can use the error correction capabilities of the outer turbo code.
V. CONCLUSION

We have derived approximate LLRs for SSD of Grassmannian product constellations. We have compared the developed scheme with single (non-product) Grassmannian constellations, demonstrating performance improvements due to the coding gain between SSD stages. We have also compared against hard symbol detection using a previously developed trellis detector, showing that SSD can provide substantial gains. In terms of complexity, the proposed scheme achieves a favorable trade-off because the required LLR calculations are relatively simple, since even for larger product constellations, the sub-constellations of each individual stream are generally small.

APPENDIX MULTIDIMENSIONAL EXTENSION
Consider the multidimensional case $\mathbf{x} \to \mathbf{X} \in \mathbb{C}^{n \times m}$, with $m = N_t$ denoting the number of transmit antennas. We use the product superposition of (2) with $d_R = m$. Consider detecting a symbol $\mathbf{Q}_{\ell_i}$ of the $i$-th stream; the conditional distribution of $\mathbf{Y}$ is the same as in (9), with $\mathbf{q}_{i+}$ replaced by $\mathbf{Q}_{i+} \in \mathbb{C}^{d_i \times m}$:

$$P(\mathbf{Y} \mid \mathbf{Q}_{\ell_i}) = \mathbb{E}_{\mathbf{Q}_{i+}}\!\left( P\!\left(\mathbf{Y} \mid \hat{\mathbf{Q}}_{i-} \mathbf{Q}_{\ell_i} \mathbf{Q}_{i+}\right) \right). \quad (19)$$

Performing an eigen-decomposition of $\mathbf{W}_{\ell_i}$ and restricting to the $m$-dimensional signal subspace similar to (16) yields $\mathbf{W}_{\ell_i}^{(\max)} = \mathbf{U}_{\ell_i} \mathbf{\Lambda}_{\ell_i}^{(\max)} \mathbf{U}_{\ell_i}^H$, with $\mathbf{\Lambda}_{\ell_i}^{(\max)}$ containing the $m$ largest eigenvalues (assuming $N_r \geq N_t = m$). With this, we can lower-bound the norm term in (19) as:

$$\operatorname{tr}\!\left( \mathbf{Q}_{i+}^H \mathbf{W}_{\ell_i} \mathbf{Q}_{i+} \right) \geq \sum_{j=1}^{m} \lambda_{\ell_i}^{(j)} \left\| \mathbf{Q}_{i+}^H \mathbf{u}_{\ell_i}^{(j)} \right\|^2, \quad (20)$$

where $\lambda_{\ell_i}^{(j)}$ is the $j$-th eigenvalue of $\mathbf{\Lambda}_{\ell_i}^{(\max)}$ and $\mathbf{u}_{\ell_i}^{(j)}$ is the corresponding column of $\mathbf{U}_{\ell_i}$. This is a lower bound, since we ignore the contribution of the noise subspace. Each term $\beta_j = \|\mathbf{Q}_{i+}^H \mathbf{u}_{\ell_i}^{(j)}\|^2$ follows a beta distribution with parameters $\alpha = m$ and $\beta = d_i - m$; note, however, that these terms are correlated.
To evaluate the expected value of (19), we still assume that the random variables $\beta_j$, $j \in \{1, \ldots, m\}$, are uncorrelated, such that we can split the calculation:

$$\mathbb{E}\!\left( \exp\!\Big( \sum_{j=1}^{m} \lambda_{\ell_i}^{(j)} \beta_j \Big) \right) \approx \prod_{j=1}^{m} \mathbb{E}\!\left( \exp\!\left( \lambda_{\ell_i}^{(j)} \beta_j \right) \right). \quad (21)$$

We can now again use the moment-generating function to evaluate each product term via $M_{\beta_j}(x) = {}_1F_1(\alpha, \alpha + \beta, x)$.
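The accuracy of this independence approximation can be checked numerically; a sketch assuming SciPy's `hyp1f1`, with random stand-ins for the eigen-directions and eigenvalues:

```python
import numpy as np
from scipy.special import hyp1f1

rng = np.random.default_rng(7)
d, m = 5, 2                        # d_i and m = N_t
lams = np.array([1.5, 0.8])        # stand-ins for the m largest eigenvalues lambda^(j)

def rand_grassmann_point(n, k, rng):
    A = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    return np.linalg.qr(A)[0][:, :k]

U = rand_grassmann_point(d, m, rng)   # fixed orthonormal directions u^(1), ..., u^(m)

# Independence approximation: product of the per-term Beta(m, d - m) MGFs.
approx = float(np.prod([hyp1f1(m, d, l) for l in lams]))

# Monte-Carlo reference with the true (correlated) beta_j.
trials = 40_000
acc = np.empty(trials)
for t in range(trials):
    Qp = rand_grassmann_point(d, m, rng)                  # uniform Q_{i+} on G(d_i, m)
    betas = np.array([np.linalg.norm(Qp.conj().T @ U[:, j]) ** 2 for j in range(m)])
    acc[t] = np.exp(lams @ betas)
mc = float(acc.mean())
print(abs(approx - mc) / mc < 0.1)  # True: close, though the beta_j are not independent
```

The residual gap between the two values reflects the (mild, negative) correlation among the $\beta_j$, which (21) deliberately ignores.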