Soft-In Soft-Out Polar Decoding Aided Three-Stage Concatenated Iterative MIMO-Turbo Transceivers

A novel Soft-Input Soft-Output (SISO) polar decoding algorithm is proposed, which is capable of iterating between an inner and outer decoder in a three-stage serial concatenated iterative receiver. The proposed polar decoding algorithm leverages a hybrid of Soft Cancellation (SCAN) and g-function aided-SCAN (G-SCAN) decoding. The SCAN decoder enables iterative soft-information exchange with the outer decoders and the G-SCAN decoder facilitates iterative soft-information exchange with the inner decoders while exploiting the error correction capability of the classic Successive Cancellation List (SCL) decoder. Furthermore, we present the Three-Dimensional (3D) Extrinsic Information Transfer (EXIT) chart analysis of polar codes for the first time, in order to characterise the iterative exchange of extrinsic information between these three concatenated stages. This offers an insight into the interactions of these three decoders and characterises their iterative convergence. In this three-stage serial concatenated scheme, the first stage is a Joint Source Channel Coding (JSCC) decoder, the second stage is a 5th Generation (5G) 3rd Generation Partnership Project (3GPP) New Radio (NR) polar decoder based on our novel hybrid SISO polar algorithm, and the third stage is a 2 × 2 Multiple Input Multiple Output (MIMO) detector. We characterise the Symbol Error Rate (SER) vs. complexity of the proposed scheme, and compare it to various soft- and hard-decision benchmarkers, as well as to the relevant JSCC and Separate Source Channel Coding (SSCC) schemes. In comparison to a three-stage serial concatenated JSCC benchmarker, the proposed SISO scheme offers an 11% complexity reduction over the state-of-the-art SISO SCAN polar decoder at a similar SER performance. Additionally, the proposed SISO scheme achieves a 0.75 dB SNR gain over the SCAN polar decoder of a two-stage serial concatenated SSCC benchmarker.


I. INTRODUCTION
Recently, the 3rd Generation Partnership Project's (3GPP) 5th Generation (5G) New Radio (NR) standard has adopted Low-Density Parity-Check (LDPC) codes for data channels and polar codes for control channels [1]. Although polar codes were chosen for their superior error correction capabilities at short information block lengths only for the control channels, they have potential in many applications beyond control channels. Their strong performance makes them a promising option for different applications including joint iterative detection and decoding (JIDD) schemes [2], [3], [4], [5], [6], as well as joint source and channel coding (JSCC) schemes [7], [8], [9].
Following the standardization of polar codes in the 3GPP 5G NR [1], a number of advanced polar decoding algorithms have been proposed [10], [11], [12], [13], [14]. Although the state-of-the-art Successive Cancellation List (SCL) [11] decoder has been shown to offer the best error correction performance, it only produces hard-decision outputs, which prevents achieving iterative gains in iterative decoders [15], [16], [17], [18]. This has motivated the design of several soft-output polar decoders including the Soft Cancellation (SCAN) [12] algorithm. It has been shown that the SCAN polar decoder allows the iterative exchange of extrinsic information with an inner decoder, enabling joint decoding and detection for polar codes concatenated with an inner component such as a Multiple Input Multiple Output (MIMO) detector [18], [19], [20]. However, our previous work [21] showed that even when iterating with an inner detector, the SCAN decoder may not perform better than the SCL decoder, which is incapable of exchanging extrinsic information with a MIMO detector and hence can only benefit from 'one-shot' MIMO detection. Motivated by this, our previous work [21] introduced a novel soft-in and soft-out G-SCAN polar decoder which is particularly suitable for iterations with an inner decoder. This offers the best of both worlds by simultaneously generating soft outputs for attaining iterative gains, as well as hard outputs that can improve upon the SCL performance.
However, Soft-Input Soft-Output (SISO) polar decoders have not been characterised in the context of three-stage schemes in concatenation with both an inner and an outer decoder. Such a polar decoder would have to accept a-priori information pertaining to both the encoded as well as decoded bits, and produce extrinsic information pertaining to both bit sequences in return. This would then allow an iterative exchange of extrinsic information with a concatenated inner component, such as a MIMO detector [15], [16], [17], [18], as well as a concatenated outer JSCC decoder [22], [23]. Motivated by this, the application of SISO polar decoders to a three-stage serial concatenated scheme is introduced in this article, where we propose a novel hybrid polar decoder that behaves as a SCAN decoder when iterating with an outer code, and as a G-SCAN decoder when iterating with an inner detector. Against this background, we boldly and explicitly contrast our contributions to the recently published literature in Table 1. In detail, the novel contributions of this article are summarised as follows:
• For the first time, we demonstrate the three-stage serial concatenation of a polar decoder, which iteratively exchanges extrinsic soft information both with an inner and an outer decoder. More specifically, a novel SISO hybrid polar decoding algorithm is proposed, which is capable of exchanging extrinsic soft information pertaining to both encoded and decoded bits.
• We propose a beneficial realisation of the three-stage serial concatenated scheme, in which the outer code is constituted by the recently introduced Unary Error Correction (UEC) joint source and channel code [24], the middle code is the 5G NR polar code, and the inner component is a 2 × 2 MIMO detector.
• Furthermore, we present the Three-Dimensional (3D) Extrinsic Information Transfer (EXIT) chart analysis of polar codes for the first time, in order to offer an insight into the iterative convergence of three decoders, as they iteratively exchange soft extrinsic information.
• Furthermore, the Symbol Error Rate (SER) vs. complexity of the proposed hybrid polar decoder is characterised and compared to various soft- and hard-decision output benchmarkers, as well as to the relevant three-stage serial concatenated JSCC counterparts and two-stage serial concatenated Separate Source Channel Coding (SSCC) schemes.
• We demonstrate that in comparison to a three-stage serial concatenated JSCC benchmarker, the proposed SISO scheme offers an 11% complexity reduction compared to the state-of-the-art SISO Soft Cancellation (SCAN) polar decoder, while achieving a similar SER performance. Additionally, the proposed SISO scheme achieves a 0.75 dB SNR gain over the SCAN polar decoder in a two-stage serial concatenated SSCC benchmarker.
The rest of the article is organized as follows. Section II introduces the proposed hybrid polar decoder, while Section III highlights our novel three-stage serial concatenated scheme. Following this, Section IV presents various soft- and hard-decision benchmarkers. Sections V and VI discuss our novel EXIT charts and the complexity of the proposed scheme and benchmarkers, respectively. Then, Section VII characterises the SER performance of the proposed hybrid decoder and compares it to various JSCC and SSCC schemes. Finally, Section VIII offers our conclusions.

II. PROPOSED HYBRID POLAR DECODER
In this section, a novel hybrid polar decoder is proposed, which has two main components that operate simultaneously, namely an upper SCAN polar decoder [12] and a lower G-SCAN polar decoder [21], as shown in Fig. 1. Here, the G-SCAN polar decoder comprises a conventional SCL decoder [11], which outputs a hard-decision bit vector $\hat{u}$, as well as a modified-SCAN decoder, which is capable of exploiting $\hat{u}$ in order to improve the error correction capability [21]. In this way, the G-SCAN decoder achieves superior hard-decision performance over the SCL decoder and outperforms the soft-in soft-out SCAN decoder [21]. In the proposed hybrid decoder, the upper SCAN decoder and the lower G-SCAN decoder beneficially complement each other. More specifically, the SISO SCAN decoder is capable of exploiting both the decoded a-priori Logarithmic-Likelihood Ratios (LLRs) $\tilde{u}^a$ and the encoded a-priori LLRs $d^a$ in order to generate the decoded extrinsic LLRs $\tilde{u}^e$ and the encoded extrinsic LLRs $d_1^e$. However, it has limited error correction capability even when performing multiple internal iterations $I^{\max}_{\text{inner}}$, which requires relatively high complexity.
By contrast, the G-SCAN decoder offers superior error correction capability at lower complexity than the SISO SCAN decoder, even when performing only a single internal iteration within its modified-SCAN component. However, it is not suitable for iterating with a concatenated outer decoder, since it cannot accept decoded a-priori LLRs $\tilde{u}^a$. In the proposed hybrid polar decoder, we exploit the complementary advantages of the SCAN and the G-SCAN decoders in order to compensate for each other's disadvantages. To elaborate further, this hybrid polar decoder enables the iterative exchange of decoded extrinsic LLRs $\tilde{u}^e$ between the SCAN polar decoder and an outer decoder, as well as the iterative exchange of encoded extrinsic LLRs $d^e$ between the high-performance yet low-complexity G-SCAN polar decoder and an inner decoder.
In particular, when the hybrid polar decoder is concatenated with an inner and an outer decoder as illustrated in Fig. 1, the iterative decoding process starts with the inner decoder. Here, the inner decoder uses the signal r received from the channel, as well as the a-priori LLR vector $f^a$, in order to generate the extrinsic LLR vector $f^e$. Note that the a-priori LLRs in the vector $f^a$ are initialized to zero for the first iteration. Next, the order of the extrinsic LLRs in the vector $f^e$ is rearranged by the deinterleaver $\pi_2^{-1}$, in order to generate the E a-priori LLRs in the vector $\tilde{e}^a$, which form the input of the rate-dematching component. Here, the deinterleaving operation $\pi_2^{-1}$ employs the reverse of the interleaving pattern used by the interleaver $\pi_2$ in the transmitter. Then, rate-dematching is applied to the E a-priori LLRs in the vector $\tilde{e}^a$ in order to generate the N a-priori LLRs in the vector $d^a$, as shown in Fig. 1.
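The relationship between an interleaver and its deinterleaver can be illustrated by the following minimal Python sketch. The function names and the index-list representation of the pattern are assumptions made purely for illustration; the actual interleaver designs are not specified here.

```python
def interleave(values, pattern):
    """Reorder values so that output position k holds values[pattern[k]].

    `pattern` is a hypothetical permutation of range(len(values)).
    """
    return [values[i] for i in pattern]


def deinterleave(values, pattern):
    """Undo interleave(): place each element back at its original index."""
    restored = [None] * len(values)
    for position, original_index in enumerate(pattern):
        restored[original_index] = values[position]
    return restored


# Illustrative 4-element pattern (an assumption, not taken from the scheme)
example_pattern = [2, 0, 3, 1]
example_llrs = [0.5, -1.2, 3.0, -0.1]
round_trip = deinterleave(interleave(example_llrs, example_pattern), example_pattern)
```

Applying `deinterleave` after `interleave` with the same pattern restores the original ordering, which is the property the receiver relies upon.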
Following that, the decoding process of the hybrid polar decoder of Fig. 1 begins with the operation of the SCAN decoder, which takes its inputs from the encoded a-priori LLR vector $d^a$, as well as from the decoded a-priori LLR vector $\tilde{u}^a$, which is provided by the outer decoder. Note that, during the first iteration, each of the decoded a-priori LLRs in $\tilde{u}^a$ is initialized depending on the frozen bit pattern, which is known by both the transmitter and the receiver, where an infinite-valued LLR is adopted if it corresponds to a frozen bit, and a zero-valued LLR is adopted if it corresponds to an information bit. Observe in Fig. 1 that the SCAN decoder processes the a-priori LLR vector $d^a$ and the a-priori LLR vector $\tilde{u}^a$, in order to generate the decoded extrinsic LLR vector $\tilde{u}^e$.
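The first-iteration initialisation of the decoded a-priori LLRs may be sketched as follows. This is a minimal illustration, assuming that frozen bits are zero-valued and that a positive LLR favours a zero-valued bit; the function name and the boolean-mask representation of the frozen bit pattern are hypothetical.

```python
import math


def init_decoded_apriori_llrs(frozen_mask):
    """Return the first-iteration decoded a-priori LLR vector.

    frozen_mask[i] is True for a frozen bit position and False for an
    information bit position (hypothetical representation).
    Assumption: frozen bits equal 0 and LLR = ln(P(0)/P(1)), so a
    frozen position receives +infinity; information bits receive 0.
    """
    return [math.inf if frozen else 0.0 for frozen in frozen_mask]


# Example with N = 8 positions, four of which are frozen (illustrative only)
example_mask = [True, True, True, False, True, False, False, False]
u_a_first_iteration = init_decoded_apriori_llrs(example_mask)
```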
Following the operation of the SCAN decoder, the order of the decoded extrinsic LLRs in the vector $\tilde{u}^e$ is rearranged by the deinterleaver $\pi_1^{-1}$, in order to generate the a-priori LLR vector $z^a$, which will then be input to the outer decoder of Fig. 1. As before, the deinterleaving operation $\pi_1^{-1}$ uses the reverse of the interleaving pattern of the corresponding interleaver $\pi_1$ in the transmitter. Based on the a-priori LLR vector $z^a$, the outer decoder generates the extrinsic LLR vector $z^e$ of Fig. 1. Following this, the order of the extrinsic LLRs in the vector $z^e$ is rearranged by the interleaver $\pi_1$, in order to generate the decoded a-priori LLR vector $\tilde{u}^a$, which will be taken as its input by the upper polar component of the hybrid polar decoder in order to pass on the iteration gain offered by the outer decoder, as seen in Fig. 1. The SCAN decoder also generates a vector of encoded extrinsic LLRs $d_1^e$ by processing the input a-priori LLR vector $d^a$ and the a-priori LLR vector $\tilde{u}^a$. This SCAN encoded extrinsic LLR output $d_1^e$ will be combined with the output of the lower G-SCAN decoder through averaging, as will be detailed in this section.
In parallel with this, the lower component of the hybrid polar decoder is operated, namely the G-SCAN decoder [21], which takes the same encoded a-priori LLR vector $d^a$ in order to generate a second encoded extrinsic LLR vector $d_2^e$ in two steps, as shown in Fig. 1. More explicitly, the first step comprises the operation of the SCL algorithm having a list size of L [11], in order to generate a hard-decision vector of decoded bits $\hat{u}$, which is selected as the best block of decoded bits among a list of L candidates. Then, the second step entails a single internal iteration, $I^{\max}_{\text{inner}} = 1$, performed by a modified-SCAN algorithm, which accepts the b bits of the decoded vector $\hat{u}$ as an input. These bits are interleaved with the (N − b) frozen bits and processed together with the vector of N encoded a-priori LLRs $d^a$ in order to generate a vector $d_2^e$ of encoded extrinsic LLRs. Following that, in order to generate a vector $d^e$ of N encoded extrinsic LLRs, the average of the extrinsic LLRs provided by the SCAN and the G-SCAN decoders is obtained according to $d^e = (d_1^e + d_2^e)/2$, as seen in Fig. 1. Following this, the N encoded extrinsic LLRs in the vector $d^e$ are processed by the rate-matching block of Fig. 1, in order to generate the E encoded extrinsic LLRs of the vector $\tilde{e}^e$. For further information regarding the schedule and equations of the G-SCAN polar decoder, please refer to [21].
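The element-wise averaging of the two encoded extrinsic LLR vectors can be sketched as follows. The function name is hypothetical and the LLR vectors are represented as plain Python lists for illustration only.

```python
def combine_encoded_extrinsic(d1_e, d2_e):
    """Average the SCAN output d1_e and the G-SCAN output d2_e element-wise.

    Both vectors are expected to hold N encoded extrinsic LLRs.
    """
    if len(d1_e) != len(d2_e):
        raise ValueError("both LLR vectors must have the same length N")
    return [(a + b) / 2 for a, b in zip(d1_e, d2_e)]


# Illustrative values only: where the two decoders disagree in sign,
# the averaged LLR has a reduced magnitude (third element).
d_e = combine_encoded_extrinsic([1.0, -2.0, 0.5, 4.0], [3.0, 0.0, -0.5, 2.0])
```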
Again, the order of the encoded extrinsic LLRs in the vector $\tilde{e}^e$ is rearranged by the interleaving operation $\pi_2$ of Fig. 1, in order to generate the a-priori LLR vector $f^a$, which may be taken as its input by the inner decoder in the next decoding iteration. As shown in Fig. 1, in the successive iterations, the hybrid decoder will process more accurate encoded a-priori LLRs of the vector $\tilde{e}^a$, as well as more accurate decoded a-priori LLRs of the vector $\tilde{u}^a$, which may be exchanged iteratively with the inner and the outer decoders according to the decoding schedule of the inner decoder, SCAN decoder, outer decoder, and the G-SCAN decoder. Note that the notation UEC-hybrid(2, L=4)-MIMO indicates that the upper SCAN polar decoder component of the proposed hybrid polar decoder employs $I^{\max}_{\text{inner}} = 2$ internal iterations, while the lower G-SCAN polar decoder component has a list size of L = 4. As will be detailed in the following sections, the proposed hybrid polar decoder imposes a lower complexity than a scheme that relies on SCAN decoding for both the upper and lower polar decoders, while providing a 0.25 dB SNR gain (see Section VII).
Note that the performance of the hybrid polar decoder may be further enhanced in some applications by appending a Cyclic Redundancy Check (CRC) to the bit vector u before performing polar encoding. In the receiver, this CRC may be exploited by the SCL decoder in order to perform CRC-aided SCL decoding (CA-SCL), as detailed in [1]. Note that further modification is required in this case to remove the LLRs that pertain to CRC bits from the extrinsic LLR vector $\tilde{u}^e$, as well as to append zero-valued LLRs to represent the CRC bits in the a-priori LLR vector $\tilde{u}^a$. For further information on the comparison between the CRC-aided-SCL and CRC-aided-G-SCAN algorithms in a two-stage serially concatenated scheme, please see [21].

III. SYSTEM OVERVIEW
This section introduces a novel three-stage concatenated scheme, which demonstrates how polar coding can iteratively exchange extrinsic soft information with the relevant inner and outer codes. More specifically, Fig. 2 shows the outer JSCC decoder of our three-stage concatenated receiver, which may be applied, for example, to video decoding. In particular, we adopt the recently proposed UEC [24] code, which is capable of near-capacity operation at modest complexity for source symbols selected from an infinite cardinality set, such as in the example of video encoding [24]. Meanwhile, the inner decoder is provided by a MIMO detector, which is a potent application for inner decoding in three-stage serial concatenated receivers. More specifically, we adopt a Gray-mapped 2 × 2 Quadrature Phase Shift Keying (QPSK) MIMO scheme, which we use for communicating over an uncorrelated narrowband Rayleigh fading channel. Here, the spatial multiplexing MIMO scheme subdivides the data stream into two independent sub-streams, which are mapped onto a pair of transmit antennas. Note that, since joint source-channel coding applications can tolerate residual bit and symbol errors, there is no need to use CRC bits. Hence, for the proposed scheme of Fig. 2, we choose not to adopt a CRC-aided polar decoding algorithm. The signal received from the two receiver antennas is then detected under the assumption of having perfect channel knowledge, with no feedback to the transmitter [28]. The details of the transmitter and receiver design of the proposed three-stage concatenated scheme will be provided in Section III-A and Section III-B, respectively.

A. TRANSMITTER
As shown in Fig. 2, the transmitter of the proposed three-stage concatenated scheme comprises three encoders, namely the UEC encoder, the polar encoder, and the MIMO modulator. The UEC encoder consists of two main components, namely the unary encoder and the trellis encoder, as detailed in [24]. The encoding process of Fig. 2 begins with the UEC encoder taking a vector of symbols $x = [x_i]_{i=1}^{a}$, where each symbol $x_i$ is a realisation of a corresponding random variable $X_i$, which adopts a symbol value from the infinite cardinality set comprising all positive integers $\mathbb{N}_1 = \{1, 2, 3, \ldots\}$, according to the probability distribution $\Pr(X_i = x) = P(x)$ [24]. Here, we consider the specific example where the source obeys the zeta distribution, which is defined as
$$P(x) = \frac{x^{-s}}{\zeta(s)},$$
where $s > 1$ is a parameter of the distribution and $\zeta(s)$ is the Riemann zeta function obeying
$$\zeta(s) = \sum_{x \in \mathbb{N}_1} x^{-s}.$$
Alternatively, the distribution can be parameterized by $p_1 = 1/\zeta(s)$, which quantifies the probability of a zeta-distributed symbol adopting the value of 1, according to $p_1 = \Pr(X_i = 1)$.
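The zeta distribution and its parameterization by p_1 can be explored numerically with the following Python sketch. It is a minimal illustration under stated assumptions: the Riemann zeta function is approximated by a partial sum with an Euler-Maclaurin tail correction (a numerical shortcut chosen here, not a method taken from [24]), the parameter s is recovered from p_1 by bisection, and all function names are hypothetical. The mean symbol value zeta(s − 1)/zeta(s) is the standard mean of the zeta distribution, which is finite only for s > 2.

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Partial sum of the first `terms` terms plus an Euler-Maclaurin
    correction for the remaining tail.
    """
    if s <= 1:
        raise ValueError("s must exceed 1")
    n = terms
    head = sum(k ** -s for k in range(1, n + 1))
    tail = n ** (1 - s) / (s - 1) - 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail


def zeta_pmf(x, s):
    """P(x) = x^(-s) / zeta(s) for a positive integer x."""
    return x ** -s / zeta(s)


def s_from_p1(p1, lo=1.0001, hi=60.0):
    """Find s such that 1/zeta(s) = p1 by bisection (1/zeta increases with s)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 / zeta(mid) < p1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_symbol_value(s):
    """Mean of the zeta distribution, zeta(s-1)/zeta(s); requires s > 2."""
    if s <= 2:
        raise ValueError("the mean is finite only for s > 2")
    return zeta(s - 1) / zeta(s)


# p_1 at the boundary s = 2 equals 6/pi^2, since zeta(2) = pi^2/6
p1_at_s_equal_2 = 1.0 / zeta(2.0)
```

For instance, `mean_symbol_value(s_from_p1(0.8))` evaluates the mean symbol value for the parameterization p_1 = 0.8.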
In the case of the zeta distribution, the symbol entropy is given by
$$H_X = \sum_{x \in \mathbb{N}_1} P(x) \log_2 \frac{1}{P(x)} = \frac{\ln \zeta(s)}{\ln 2} - \frac{s\,\zeta'(s)}{\ln(2)\,\zeta(s)},$$
where $\zeta'(s) = -\sum_{x \in \mathbb{N}_1} \ln(x)\, x^{-s}$ represents the derivative of the Riemann zeta function [24]. Table 2 exemplifies the first ten codewords of the unary encoder, which correspond to the first ten symbols of $\mathbb{N}_1$ [24], together with the corresponding probabilities of occurrence for the case of zeta distributions having the three example parameterizations of $p_1 \in \{0.7, 0.8, 0.9\}$. For example, the symbol $x_i = 5$ corresponds to the unary encoded codeword $y_i = [00001]$. In this manner, the symbols of x are mapped to unary codewords, which are concatenated in order to obtain the unary encoded bit vector $y = [y_j]_{j=1}^{b}$ [24], which has a length of b. For example, when the source symbol vector comprises a = 5 symbols, such as x = [2, 1, 3, 1, 1], we obtain the b = 8-bit unary encoded vector y = [01100111]. In the case of the zeta distribution, the average unary codeword length $l$ [24] can be expressed as
$$l = \sum_{x \in \mathbb{N}_1} P(x)\, x = \frac{\zeta(s-1)}{\zeta(s)}.$$
In the case of the example parameterizations of $p_1 \in \{0.7, 0.8, 0.9\}$, the average unary codeword length is given by $l = \{2.69, 1.51, 1.16\}$, respectively. Note that, upon using the unary code for zeta-distributed symbols, $l$ only remains finite for $s > 2$ and hence for $p_1 > 1/\zeta(2) \approx 0.608$ [24]. Note that the length b of the bit sequence y varies from block to block, and it has been shown that the SER performance of the UEC is dominated by the shortest block length [29]. This may be explained by the typical behavior of iterative receiver schemes, in which superior performance is obtained when using longer blocks. In order to mitigate this, we adopt the fixed block length method proposed in [29]. This technique has the benefit of improved SER performance, while the fixed block length enables the interleaver $\pi_1$ shown in Fig. 2 to adopt a constant design that does not change from block to block, hence avoiding the potentially excessive memory requirements of storing diverse interleaver designs [29].
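The unary encoding step can be reproduced with the short Python sketch below, which maps each positive integer symbol to a run of zeros terminated by a single one and concatenates the codewords. The function name and the list-of-integers bit representation are assumptions made for illustration.

```python
def unary_encode(symbols):
    """Concatenate the unary codewords of a sequence of positive integers.

    The symbol x is mapped to (x - 1) zero-valued bits followed by a
    single one-valued bit, so its codeword comprises x bits.
    """
    bits = []
    for x in symbols:
        if x < 1:
            raise ValueError("symbols must be positive integers")
        bits.extend([0] * (x - 1) + [1])
    return bits


# The worked example of the text: x = [2, 1, 3, 1, 1]
y_example = unary_encode([2, 1, 3, 1, 1])
```

Since each codeword has as many bits as the symbol value, the length b of the encoded vector equals the sum of the symbol values, which is why b varies from block to block.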
To elaborate further, a buffer is placed after the unary encoder, which stores the remaining part of any codeword that extends past the fixed interleaver length p. For example, in the case of a fixed interleaver length of p = 4, the concatenated unary encoded vector y = [01100111] may be decomposed into two consecutive frames, according to y1 = [0110] and y2 = [0111]. Note that, in this example, the codeword that corresponds to the third symbol of x is split between y1 and y2. In this case, the part of this codeword appearing in y2 is stored in the buffer between the generation of y1 and y2. More explicitly, until the transmission of the first frame y1 = [0110] is finished, y2 = [0111] must wait in the buffer.
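The fixed block length buffering can be sketched in Python as follows. This is a simplified illustration of the idea rather than the method of [29]: the class name, its `push` interface and the `pending` attribute are hypothetical.

```python
class FrameBuffer:
    """Cut a stream of unary encoded bits into frames of fixed length p."""

    def __init__(self, p):
        self.p = p
        self.pending = []  # bits waiting for the next frame

    def push(self, bits):
        """Append new bits and return every complete p-bit frame."""
        self.pending.extend(bits)
        frames = []
        while len(self.pending) >= self.p:
            frames.append(self.pending[:self.p])
            self.pending = self.pending[self.p:]
        return frames


# Worked example of the text: y = [01100111] with p = 4
frame_buffer = FrameBuffer(p=4)
example_frames = frame_buffer.push([0, 1, 1, 0, 0, 1, 1, 1])
```

Bits that do not fill a complete frame remain in `pending` until further unary codewords arrive, so every frame handed to the trellis encoder has exactly p bits.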
Following unary encoding, each successive frame y is forwarded to the trellis encoder of Fig. 2. The UEC trellis is parametrized by its number of states r and the number of output bits n generated by each trellis stage, where r ≥ 2 and n ≥ 1. A trellis encoder is exemplified in Fig. 3 for the case of r = 4 states and n = 1 output bit. As will be detailed in Section VI, r = 4 states are chosen to strike a performance vs. complexity trade-off for the UEC decoder. Furthermore, n = 1 is chosen, since the additional error correction that would be afforded by using n ≥ 2 would be made redundant by the concatenated polar code, which has strong error correction performance.
Before encoding each frame y, the trellis encoder is initialised to the start state of $m_0 = 1$ [24]. Each successive bit $y_j$ of the frame y is considered by the trellis encoder in order of increasing index j, and stimulates a transition in the trellis from its previous state $m_{j-1} \in \{1, 2, \ldots, r\}$ to a next state $m_j \in \{1, 2, \ldots, r\}$ according to
$$m_j = \begin{cases} 1 + \mathrm{odd}(m_{j-1}) & \text{if } y_j = 1, \\ \min\left[m_{j-1} + 2,\; r - \mathrm{odd}(m_{j-1})\right] & \text{if } y_j = 0, \end{cases}$$
where $\mathrm{odd}(\cdot)$ yields 1 if its argument is odd and 0 if it is even. Including the start state of $m_0 = 1$, the path identified through the UEC trellis traverses through p + 1 states, as exemplified in Fig. 3, which shows how synchronization is maintained between the unary encoded symbols and the trellis path [24]. By extending the previous buffer example, the paths traversed through the trellis for the consecutive frames y1 and y2 can be represented by the vectors m1 = [1, 3, 2, 1, 3] and m2 = [1, 3, 2, 1, 2], respectively. Following this, the trellis encoder converts each unary encoded bit $y_j$ into an n-bit UEC encoded codeword $z_j$ depending on the specific path selected through the UEC trellis. The UEC encoded bits are then concatenated, in order to obtain a $b \cdot n$-bit UEC encoded vector $z = [z_k]_{k=1}^{b \cdot n}$. In the example above, we obtain the two consecutive frames z1 = [0100] and z2 = [0101]. Note that the UEC encoded bit vector z is guaranteed to contain equiprobable binary values if the codewords mapped onto the top and bottom halves of the UEC trellis adopt complementary values [24]. For example, the transition from state $m_{j-1} = 3$ to state $m_j = 3$ in Fig. 3 is associated with an encoded bit value of $z_j = 0$, which is the complement of the transition mirrored in the bottom half of the trellis from state $m_{j-1} = 4$ to state $m_j = 4$, which is associated with an encoded bit value of $z_j = 1$. Hence, the average coding rate of the outer UEC encoder $R_o$ is given by [24]
$$R_o = \frac{H_X}{l \cdot n},$$
where $H_X$ is the symbol entropy and $l$ is the average unary codeword length.
Following the UEC trellis encoding, the order of the bits in the UEC-encoded frame z is rearranged by the fixed-length interleaving operation $\pi_1$ of Fig. 2, in order to obtain the interleaved UEC encoded frame u. Note that once the interleaving operation of $\pi_1$ is complete, the bit vector u no longer represents a sequence of UEC codewords. From the perspective of the polar encoder, u simply represents a sequence of bits. As an example, the frames z1 and z2 may be interleaved by the interleaver pattern in order to obtain the interleaved frame f. Following this, a Gray-mapped 2 × 2 QPSK MIMO scheme may be employed for communication over an uncorrelated narrowband Rayleigh fading channel, as shown in Fig. 2.
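The trellis encoding of the worked example can be reproduced with the Python sketch below for the case of n = 1 output bit per stage. Two assumptions are made and should be checked against [24]: the state transition rule is the one commonly given for the UEC trellis (an odd/even test combined with a saturating step of two states), and the output bit rule (the input bit is emitted from odd-indexed states and its complement from even-indexed states) is inferred solely from the example paths and frames quoted in the text. The function name is hypothetical.

```python
def uec_trellis_encode(frame, r=4):
    """Encode one unary encoded frame with an r-state, n = 1 UEC trellis.

    Returns (path, encoded_bits), where path lists the p + 1 visited states.
    Assumed transition rule (to be verified against [24]):
      y = 1: next = 1 + odd(state)
      y = 0: next = min(state + 2, r - odd(state))
    Assumed output rule, inferred from the worked example:
      z = y from odd states, z = 1 - y from even states.
    """
    state = 1                      # start state m_0 = 1
    path = [state]
    encoded = []
    for y in frame:
        odd = state % 2
        encoded.append(y if odd else 1 - y)
        state = 1 + odd if y == 1 else min(state + 2, r - odd)
        path.append(state)
    return path, encoded


m1, z1 = uec_trellis_encode([0, 1, 1, 0])
m2, z2 = uec_trellis_encode([0, 1, 1, 1])
```

Under these assumptions the sketch returns the paths and encoded frames of the example, and the transitions from state 3 to state 3 and from state 4 to state 4 produce complementary output bits.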
The effective throughput η of this three-stage serially concatenated scheme may be quantified in terms of bits per transmission, according to
$$\eta = N_{Tx} \cdot R_o \cdot R_i \cdot \log_2(M),$$
where $N_{Tx}$ is the number of transmit antennas, $R_o$ is the coding rate of the UEC scheme, $R_i$ is the coding rate of the rate-matched polar code, and M is the modulation order, which is M = 4 for QPSK modulation. More specifically, the throughput of the transmitter in Fig. 2 is given by $\eta = 2 \cdot R_o$ bits/transmission, when $R_i = 1/2$ and 2 × 2 QPSK MIMO modulation are applied. Fig. 4 plots the Discrete-input Continuous-output Memoryless Channel (DCMC) capacity for 2 × 2 QPSK modulated MIMO transmission over the uncorrelated narrowband Rayleigh fading channel, which provides a theoretical upper bound for reliable communication in the proposed scheme [29], [30]. Here, $E_b/N_0$ is expressed as
$$E_b/N_0\ [\text{dB}] = E_s/N_0\ [\text{dB}] - 10 \log_{10}(\eta).$$
As an example, the effective throughput becomes $\eta = 2 \cdot R_o = 1.52$ bits/transmission when $p_1 = 0.797$ and $R_o = 0.7618$, as shown in Table 3. As seen in Fig. 4, the effective throughput of η = 1.52 bits/transmission is achieved at $E_s/N_0 = -0.44$ dB. Hence, the corresponding $E_b/N_0$ capacity bound may be calculated, as shown in Table 3.
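The throughput arithmetic of the example can be checked with the Python sketch below. The throughput is taken as the product of the number of transmit antennas, the two coding rates and the number of bits per modulated symbol, which agrees with the quoted relation of η = 2 · R_o for R_i = 1/2 and 2 × 2 QPSK. The conversion between E_s/N_0 and E_b/N_0 follows the common convention of normalising by the throughput; this convention is an assumption and may differ from the definition used for Table 3. Function names are hypothetical.

```python
import math


def effective_throughput(n_tx, r_outer, r_inner, modulation_order):
    """Effective throughput in bits per transmission."""
    return n_tx * r_outer * r_inner * math.log2(modulation_order)


def es_n0_to_eb_n0_db(es_n0_db, eta):
    """Convert Es/N0 [dB] to Eb/N0 [dB].

    Assumption: Eb/N0 = Es/N0 - 10*log10(eta), a common convention that
    may not match the normalisation adopted in the original analysis.
    """
    return es_n0_db - 10 * math.log10(eta)


# Example of the text: 2 x 2 QPSK, R_i = 1/2, R_o = 0.7618
eta_example = effective_throughput(n_tx=2, r_outer=0.7618, r_inner=0.5, modulation_order=4)
```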

B. RECEIVER
Upon receiving a frame, the receiver of the three-stage concatenated scheme of Fig. 2 carries out iterative decoding, in which the MIMO detector, polar decoder and UEC decoder iteratively exchange their extrinsic LLRs. More specifically, each iteration adopts a decoder activation order of {MIMO, upper polar component, UEC, lower polar component}. Hence, the MIMO detection and the UEC decoding are performed once per iteration, and polar decoding is performed twice per iteration.
To elaborate further, the iterative decoding process of Fig. 2 begins with the operation of 2 × 2 QPSK MIMO detection. Here, the 2 × 2 QPSK MIMO detector processes the received signal r provided by the channel, as well as the a-priori LLR vector $f^a$. Note that the a-priori LLR vector $f^a$ is populated by zero-valued LLRs at the beginning of the first decoding iteration. During each iteration, the 2 × 2 QPSK MIMO detector generates the extrinsic LLR vector $f^e$.
The order of the extrinsic LLRs in the vector $f^e$ is rearranged by the deinterleaver $\pi_2^{-1}$, which adopts the reverse of the interleaving pattern $\pi_2$ used in the transmitter. The resultant vector of E a-priori LLRs $\tilde{e}^a$ is then provided as the input of the polar decoder's rate-dematching component shown in Fig. 2. Then, rate-dematching converts the vector of E a-priori LLRs $\tilde{e}^a$ into a vector of N a-priori LLRs $d^a$. Following this, the scheme of Fig. 2 activates the SISO polar decoder, which comprises two separate polar decoding components, referred to as the upper and the lower polar component, as mentioned above.
According to the prescribed decoding activation order of {MIMO, upper polar component, UEC, lower polar component}, the encoded a-priori LLR vector $d^a$ is entered into the upper polar decoder, which also processes the decoded a-priori LLR vector $\tilde{u}^a$ of Fig. 2. Note that the decoded a-priori LLR vector $\tilde{u}^a$ comprises N LLRs, which are initialized in the first iteration depending on the frozen bit pattern known by both the transmitter and the receiver. Specifically, an infinite value is adopted if the LLR corresponds to a frozen bit, while a zero value is adopted if the LLR corresponds to an information bit. In response to the a-priori LLR input vectors $d^a$ and $\tilde{u}^a$ of Fig. 2, the upper polar decoder generates the encoded extrinsic LLR vector $d_1^e$ and the decoded extrinsic LLR vector $\tilde{u}^e$.
Next, the order of the polar decoded extrinsic LLR vector $\tilde{u}^e$ is rearranged by the deinterleaver $\pi_1^{-1}$ in order to generate the a-priori LLR vector $z^a$ of Fig. 2. Similar to the deinterleaver $\pi_2^{-1}$, the deinterleaver $\pi_1^{-1}$ uses the reverse of the interleaving pattern of the corresponding interleaver $\pi_1$ seen at the transmitter of Fig. 2. The a-priori LLR vector $z^a$ is then input to the UEC trellis decoder, which generates the extrinsic LLR vector $z^e$ in response. Next, the order of the extrinsic LLRs in the vector $z^e$ is rearranged by the interleaver $\pi_1$ in order to generate the polar decoded a-priori LLR vector $\tilde{u}^a$, which is forwarded to both the upper and lower polar decoder components, as mentioned above.
The UEC decoding is followed by the operation of the lower polar decoder. This takes the polar decoded a-priori LLR vector $\tilde{u}^a$, as well as the encoded a-priori LLR vector $d^a$, as its input, and generates the encoded extrinsic LLR vector $d^e$, as seen in Fig. 2. Next, the N encoded extrinsic LLRs of the vector $d^e$ are rate-matched in order to generate the E extrinsic LLRs of the vector $\tilde{e}^e$. The order of the extrinsic LLRs in the vector $\tilde{e}^e$ is rearranged by the interleaver $\pi_2$, which results in the a-priori LLR vector $f^a$ forwarded to the 2 × 2 QPSK MIMO detector of Fig. 2. This completes an iteration of the proposed receiver according to the activation schedule of {MIMO, upper polar component, UEC, lower polar component}. Note that the notation $I^{\max}_{\text{inner}}$ relates to the number of internal iterations of the polar decoder, while $I^{\max}$ relates to the number of receiver-level iterations carried out between the {MIMO, upper polar component, UEC, lower polar component}.
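The activation schedule can be summarised by the following structural Python sketch. It captures only the order in which the components are activated: the four components are passed in as callables, the interleavers, deinterleavers and rate (de)matching are assumed to be folded into those callables, and the stub components used in the usage example merely record their activation and echo their first input. None of this represents the actual detection or decoding algorithms, and all names are hypothetical.

```python
def run_receiver_iterations(mimo, upper_polar, uec, lower_polar, f_a, u_a, i_max):
    """Run i_max receiver-level iterations in the order
    {MIMO, upper polar component, UEC, lower polar component}."""
    for _ in range(i_max):
        d_a = mimo(f_a)              # MIMO detection -> encoded a-priori LLRs
        u_e = upper_polar(d_a, u_a)  # upper polar component -> decoded extrinsic LLRs
        u_a = uec(u_e)               # UEC decoding -> decoded a-priori LLRs
        f_a = lower_polar(d_a, u_a)  # lower polar component -> a-priori LLRs for MIMO
    return f_a, u_a


def make_stub(name, log):
    """Build a placeholder component that logs its activation."""
    def component(*llr_vectors):
        log.append(name)
        return list(llr_vectors[0])
    return component


# Usage with stubs: two receiver-level iterations
activation_log = []
run_receiver_iterations(
    make_stub("MIMO", activation_log),
    make_stub("upper polar", activation_log),
    make_stub("UEC", activation_log),
    make_stub("lower polar", activation_log),
    f_a=[0.0] * 4,
    u_a=[0.0] * 4,
    i_max=2,
)
```

With the stubs, the log confirms that MIMO detection and UEC decoding are each activated once per iteration, while a polar component is activated twice per iteration.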
The iterative decoding process continues for a prescribed number of iterations $I^{\max}$, whereupon the UEC trellis decoder outputs the a-posteriori LLR vector $\tilde{y}^p$. Similar to the transmitter, a buffer is placed between the trellis decoder and the unary decoder of Fig. 2. This buffer enables the a-posteriori LLR vector $\tilde{y}^p$ to be appended to any LLRs remaining from the previous frame. The bits are processed by the UEC decoder sequentially, and each one-valued bit encountered by the UEC decoder marks the end of a unary codeword, since every unary codeword comprises a series of zero-valued bits followed by a single one-valued bit [29]. In this way, the unary decoder consumes each sequence of positive-valued LLRs followed by a single negative-valued LLR and interprets the result as a legitimate unary codeword. This is followed by writing any remaining LLRs into the buffer, ready to be prepended to the next a-posteriori LLR vector $\tilde{y}^p$ [29].
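The buffered unary decoding of the a-posteriori LLRs can be sketched in Python as follows. The sketch assumes that a positive LLR represents a zero-valued bit and a negative LLR represents a one-valued bit, in line with the description above, and that a zero-valued LLR is treated as a zero-valued bit (an arbitrary choice made here). The class and attribute names are hypothetical.

```python
class UnaryLLRDecoder:
    """Hard-decision unary decoder with a buffer for incomplete codewords."""

    def __init__(self):
        self.pending = []  # LLRs of a codeword that has not ended yet

    def push(self, llrs):
        """Consume one a-posteriori LLR vector and return the decoded symbols."""
        symbols = []
        current = list(self.pending)
        for llr in llrs:
            current.append(llr)
            if llr < 0:                       # one-valued bit ends the codeword
                symbols.append(len(current))  # codeword length equals the symbol
                current = []
        self.pending = current                # kept for the next frame
        return symbols


# Two frames whose hard decisions are [0110] and [0111] (illustrative LLRs)
unary_decoder = UnaryLLRDecoder()
first_symbols = unary_decoder.push([2.1, -0.5, -1.0, 3.0])
second_symbols = unary_decoder.push([1.5, -2.0, -0.3, -4.0])
```

After the first frame, one positive LLR remains in the buffer, since the third codeword is split across the two frames. It is prepended to the second frame, so that the symbol 3 is recovered correctly.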

IV. SCENARIOS AND BENCHMARKERS
In order to characterise the advantages of the proposed hybrid polar decoder, this section introduces a three-stage serial concatenated JSCC benchmarker, as well as a two-stage serial concatenated SSCC benchmarker. Furthermore, variations of the schemes are considered to employ diverse polar decoders having both hard- and soft-decision outputs.

A. UEC-SCAN-MIMO BENCHMARKER
This benchmarker adopts the same schematic as the proposed UEC-hybrid-MIMO scheme of Fig. 2. However, rather than adopting the proposed hybrid polar decoder, the upper and the lower polar decoder components are both provided by the SCAN decoder in this benchmarker. Note that the SCAN decoder is adopted because it has been shown to offer reasonable decoding performance at a much lower complexity than the SISO Belief Propagation (BP) polar decoder [12], [26]. As described in Section III, the same fixed decoding activation order of {MIMO, upper polar component, UEC, lower polar component} is employed until reaching the maximum number of receiver-level iterations $I^{\max}$. As in the proposed UEC-hybrid-MIMO scheme, the UEC code uses an n = 1-bit UEC trellis having a fixed number of r = 4 states, as recommended in [22], [31].
In order to consider different scenarios, various numbers of internal iterations $I^{\max}_{\text{inner}}$ are employed within the SCAN decoder for both the upper and lower polar decoder components. Here, the notation UEC-SCAN(1,2)-MIMO with $I^{\max} = 4$ indicates that $I^{\max}_{\text{inner}} = 1$ iteration is used for the upper SCAN polar decoder, while $I^{\max}_{\text{inner}} = 2$ iterations are used for the lower SCAN polar decoder, when a constant number of $I^{\max} = 4$ receiver-level iterations is applied between the three stages of UEC decoding, SCAN decoding, and MIMO detection.

B. UNARY-SCAN-MIMO BENCHMARKER
This is a two-stage serial concatenated SSCC benchmarker, which comprises a unary source code and a polar code that is serially concatenated with a 2 × 2 QPSK MIMO detector, as shown in Fig. 5. In the transmitter, the unary encoder takes the source symbol vector x and generates the unary encoded vector y, as discussed in Section III. Here, the buffer mechanism described in Section III is employed in order to produce an encoded bit vector y that has a constant length. In this context, the unary code operates in isolation, without the use of a UEC trellis. As a result, when the interleaver π 1 reorders the bits in the vector y in order to produce the bit vector u, the latter has non-equiprobable bit values. Hence, when these non-equiprobable bits are channel coded by the polar encoder, some capacity loss may be expected [22]. Following polar encoding, the order of the bits in the polar encoded bit vector e is rearranged by the interleaver π 2 in order to generate the bit vector f, which is provided to the 2 × 2 QPSK MIMO modulator in order to generate the QPSK modulated vector g. This is then transmitted over an uncorrelated Rayleigh fading channel.
As shown in Fig. 5, the receiver of the Unary-SCAN-MIMO scheme employs a 2 × 2 QPSK MIMO detector to receive the signal r from the channel. This is combined with the a-priori LLR vector fa , in order to generate the extrinsic LLR vector fe , as described in Section III. Following this, the 2 × 2 QPSK MIMO detector and the SISO SCAN polar decoder iteratively exchange their extrinsic LLR vectors fe and ẽe through the deinterleaver π −1 2 and the interleaver π 2 , respectively, in order to obtain the a-priori LLR vectors ẽa and fa , as seen in Fig. 5. Here, each operation of the SISO SCAN polar decoder involves I max inner internal iterations. The iterations between the SISO SCAN polar decoder and the 2 × 2 QPSK MIMO detector continue until the maximum affordable number of receiver-level iterations I max is reached, whereupon the SISO SCAN polar decoder generates the a-posteriori LLR vector ũp of Fig. 5. Following this, the order of the a-posteriori LLRs in the vector ũp is rearranged by the deinterleaver π −1 1 in order to obtain the a-posteriori LLR vector ỹp , which is forwarded to the soft-decision unary decoder. This uses the buffer mechanism described in Section III for reconstructing the source symbol vector x. Note that in subsequent discussions, the notation Unary-SCAN(2)-MIMO is adopted, for example, in the case where a maximum of I max inner = 2 internal iterations are used within the SCAN decoder.

C. UNARY-G-SCAN-MIMO BENCHMARKER
The Unary-G-SCAN-MIMO SSCC benchmarker operates in a similar manner to the Unary-SCAN-MIMO benchmarker, with the only difference being that the G-SCAN polar decoder is utilized instead of the SCAN polar decoder.
Here, the G-SCAN decoder is parametrised by the list size L, and the notation Unary-G-SCAN(L=4)-MIMO indicates that a list size of L = 4 is employed for the G-SCAN polar decoder.

D. UNARY-SCL-MIMO BENCHMARKER
This is a non-iterative SSCC benchmarker in which the unary encoder is employed as shown in Fig. 6. Here, the transmitter operates identically to that of the Unary-SCAN-MIMO and the Unary-G-SCAN-MIMO benchmarkers of Fig. 5. However, the receiver differs in terms of the number of iterations employed. Explicitly, once the 2 × 2 QPSK MIMO detector has received the signal vector r and has produced the extrinsic LLR vector fe , 'one-shot' polar decoding is employed. More specifically, the order of the LLRs in the vector fe is rearranged by the deinterleaver π −1 2 in order to obtain the a-priori LLR vector ẽa . Following this, an SCL polar decoder is employed in order to obtain the decoded bit vector ûp . Then, the bits in the vector ûp are rearranged by the deinterleaver π −1 1 in order to obtain the bit vector ŷp . Finally, the unary decoder operates as described in the context of the Unary-SCAN-MIMO and the Unary-G-SCAN-MIMO benchmarkers of Fig. 5, in order to reconstruct the source symbol vector x. As an example, the notation Unary-SCL(L=32)-MIMO indicates that a list size of L = 32 is used for the SCL polar decoder.

V. THREE-DIMENSIONAL (3D) EXTRINSIC INFORMATION TRANSFER (EXIT) CHART ANALYSIS
This section characterises the iterative decoding schedule of the proposed three-stage serial concatenated scheme of Section III. A novel 3D EXIT chart analysis is presented, which is capable of visually characterizing the iterative decoding convergence of the proposed system, as well as verifying the correct operation of the proposed algorithms [4], [32], [33]. More explicitly, these 3D EXIT charts characterise the iterative exchange of extrinsic information between the UEC, polar, and MIMO components of Fig. 2, which adopt the decoder activation order of {MIMO, upper polar component, UEC, lower polar component}. We begin by discussing the EXIT functions of the 2 × 2 QPSK MIMO detector and of the UEC codes in Section V-A and Section V-B, respectively. Furthermore, we introduce the more complicated EXIT functions of polar codes in Section V-C, before introducing the iterative decoding trajectories between these EXIT functions in Section V-D.

A. EXIT CHART ANALYSIS OF 2 × 2 QPSK MIMO DETECTOR
In general, the EXIT function of a component can be characterized by quantifying the Mutual Information (MI) of the extrinsic LLR vector at the output of the component [34], as a function of the a-priori LLR vector at the input of the component.The 2 × 2 QPSK MIMO detector of Fig. 2  Note that, the operation of each of the UEC decoder, polar decoder, and the MIMO detector of Fig. 2 is related to one or both of the LLR vector ẽa and the LLR vector ũa .In the case of the MIMO detector, this relationship is related to ẽa and it

B. EXIT CHART ANALYSIS OF THE UEC DECODER
As shown in Fig. 2, the UEC decoder generates the extrinsic LLR vector ze as a function of the a-priori LLR vector za . Hence, the operation of the UEC decoder may be characterized by the EXIT function I(z e , z) = F UEC [I(z a , z)] [22], [31]. Fig. 8(a) visualizes the 2D EXIT function of the UEC decoder for zeta distributed source symbols parameterized by various values of p 1 = {0.797, 0.85, 0.9, 0.95}, for the case where the UEC trellis has n = 1 encoded bit per transmission and r = 4 states. Throughout the remainder of this article, p 1 = 0.797 will be adopted, since this maximizes the DCMC capacity, as mentioned in Section III-A. Note that the 2D EXIT function of the UEC decoder has been extensively explored considering various p 1 and r values in [22], [31]. Furthermore, as detailed in [22], in the case of n = 1 the EXIT function of the UEC decoder does not reach I(z e , z) = 1 when I(z a , z) = 1. In the case of a two-stage serial concatenated iterative decoding scheme, this would prevent iterative decoding from achieving a low decoding error rate. However, since we have a three-stage concatenation with a powerful polar code, we may expect to achieve iterative decoding convergence to a low decoding error rate.
As mentioned above, the operation of each of the UEC decoder, polar decoder, and the MIMO detector of Fig. 2 is related to one or both of the LLR vector ẽa and the LLR vector ũa . In the case of the UEC decoder, this relationship with ũa is established by interleaving the extrinsic LLR vector ze . In order to jointly characterise the UEC decoder, polar decoder, and the MIMO detector, it is beneficial to include both I( ũa , u) and I(ẽ a , e) in the EXIT function analysis. This motivates the conception of the 3D EXIT function plot of Fig. 8

C. EXIT CHART ANALYSIS OF THE POLAR DECODER
In general, in the case of SISO polar decoding, there are two extrinsic LLR vector outputs, namely the polar decoded extrinsic LLR vector ũe and the encoded extrinsic LLR vector ẽe .Additionally, the SISO polar decoder has two a-priori LLR vector inputs, namely the polar decoded a-priori LLR vector ũa and the encoded a-priori LLR vector ẽa .The MI of the polar decoded extrinsic LLR vector output I( ũe , u) relies not only on the MI of the decoded a-priori LLR vector I( ũa , u) but also on the MI of the encoded a-priori LLR vector I(ẽ a , e).Hence, the MI of decoded extrinsic LLR characteristic of the SISO polar decoder may be expressed as the EXIT     For the sake of comparison, Figs.11 and 12 provide the 3D EXIT characteristics of the UEC-SCAN-MIMO benchmarker of Fig. 2 when I max inner = 1 internal iteration is used within the upper and lower SCAN decoders.More specifically, Fig. 11 shows the EXIT characteristic of this benchmarker from the point of view of the 2 × 2 QPSK MIMO detector for transmission over an uncorrelated narrowband Rayleigh fading channel having SNR= 3 dB.Finally, Fig. 12 shows the EXIT characteristics of the UEC-SCAN-MIMO benchmarker from the point of view of the UEC decoder, when using r = 4 states and the zeta distribution parameter of p 1 = 0.797.

D. TRAJECTORIES
The proposed three-stage concatenated scheme can be further validated by measuring the MI obtained during an iterative decoding process, when employing the {MIMO, upper polar component, UEC, lower polar component} decoder activation order.As an example, Table 4 characterises the iterative exchange of extrinsic information between the components of the proposed three-stage concatenated UEC-hybrid polar decoder-MIMO scheme for up to two iterations.More specifically, when the MIMO detector is activated for the first time using the a-priori MI of I( fa , f ) = 0, an extrinsic MI of I( fe , f ) = 0.5484 is obtained, as highlighted in Table 4.As shown in the block diagram of Fig. 2, the extrinsic LLR vector fe is passed through the deinterleaver π −1 2 in order to obtain the a-priori LLR vector ẽa , which accordingly has the same MI of I(ẽ a , e) = 0.5484.Hence as shown in both Figs. 9 and 10, the trajectory evolves along the axis label I(ẽ a , e), I( fe , f ) from the coordinates [I( fe , f ), I( ũe , u), I(z e , z), I(ẽ e , e)] = [0, 0, 0, 0] to the coordinates [I( fe , f ), I( ũe , u), I(z e , z), I(ẽ e , e)] = [0.5484,0, 0, 0], as shown in the first two rows of Table 4. Next, this encoded a-priori MI of I(ẽ a , e) = 0.5484 is entered into the upper SCAN polar decoder component of the proposed hybrid polar decoder, together with a decoded a-priori MI of I( ũa , u) = 0, in order to generate the decoded extrinsic MI of I( ũe , u) = 0.1306.As shown in the block diagram of Fig. 2, the decoded extrinsic LLR vector ũe is passed through the deinterleaver π −1 1 in order to obtain the decoded a-priori LLR vector za , which accordingly has the same MI of I(z a , z) = 0.1306.In this way, the trajectory evolves along the axis label I( ũe , u), I(z a , z), which can only be observed in Fig. 10, since Fig. 9 does not have this axis.Following that, the UEC decoder is activated and the extrinsic MI of I(z e , z) = 0.0801 is obtained, as highlighted in Table 4.
Similarly, as shown in the block diagram of Fig. 2, the extrinsic LLR vector ze is passed through the interleaver π 1 in order to obtain decoded a-priori LLR vector ũa , which accordingly has the same MI of I( ũa , u) = 0.0801.Hence, the trajectory evolves along the axis label I( ũa , u), I(z e , z), as shown in both Fig. 9 and in Fig. 10.Then, the lower G-SCAN polar decoder is activated in order to obtain the encoded extrinsic MI of I(ẽ e , e) = 0.6150.Similarly, as shown in the block diagram of Fig. 2, the encoded extrinsic LLR vector ẽe is passed through the interleaver π 2 in order to obtain a-priori LLR vector fa , which accordingly has the same MI of I( fa , f ) = 0.6150.In this way, the trajectory evolves along the axis label I(ẽ e , e), I( fa , f ), which can only be observed in Fig. 9, since Fig. 10 does not have this axis.Once this first iteration is completed, a second iteration begins with a trajectory that evolves again along the axis label I(ẽ a , e), I( fe , f ), where the MI of I( fe , f ) = 0.6800 is obtained, as highlighted in Table 4.This process continues until reaching the maximum affordable number of receiver-level iterations I max among the three stages.
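The activation order traced above can be sketched as a short Python loop that propagates MI values from one component to the next. The function `run_schedule` and the four toy EXIT functions below are hypothetical placeholders chosen purely for illustration; they are not the measured EXIT surfaces of Figs. 9 and 10, and they do not reproduce the values of Table 4.

```python
from typing import Callable, Dict, List


def run_schedule(
    f_mimo: Callable[[float], float],
    f_upper: Callable[[float, float], float],
    f_uec: Callable[[float], float],
    f_lower: Callable[[float, float], float],
    i_max: int,
) -> List[Dict[str, float]]:
    """Trace MI through {MIMO, upper polar, UEC, lower polar} for i_max iterations."""
    i_fa = 0.0   # a-priori MI of the MIMO detector
    i_ua = 0.0   # decoded a-priori MI of the polar decoder
    trajectory = []
    for _ in range(i_max):
        i_fe = f_mimo(i_fa)            # MIMO detector
        i_ea = i_fe                    # deinterleaver pi_2^-1 preserves the MI
        i_ue = f_upper(i_ua, i_ea)     # upper polar component
        i_za = i_ue                    # deinterleaver pi_1^-1 preserves the MI
        i_ze = f_uec(i_za)             # UEC decoder
        i_ua = i_ze                    # interleaver pi_1 preserves the MI
        i_ee = f_lower(i_ea, i_ua)     # lower polar component
        i_fa = i_ee                    # interleaver pi_2 preserves the MI
        trajectory.append({"fe": i_fe, "ue": i_ue, "ze": i_ze, "ee": i_ee})
    return trajectory


# Hypothetical toy EXIT functions, NOT the surfaces of Figs. 9 and 10.
toy_mimo = lambda i_fa: 0.5 + 0.3 * i_fa
toy_upper = lambda i_ua, i_ea: min(1.0, 0.2 * i_ea + 0.8 * i_ua)
toy_uec = lambda i_za: 0.9 * i_za
toy_lower = lambda i_ea, i_ua: min(1.0, i_ea + 0.2 * i_ua)

trace = run_schedule(toy_mimo, toy_upper, toy_uec, toy_lower, i_max=2)
for row in trace:
    print(row)
```

Each dictionary in `trace` corresponds to one receiver-level iteration, so successive rows play the role of the corner points of the staircase-shaped trajectory.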
By plotting the above trajectories between the 3D EXIT surfaces of the hybrid polar decoder and either the 2 × 2 MIMO detector of Fig. 9 or the UEC decoder of Fig. 10, the iterative decoding process can be validated.As seen in Fig. 9, the corner points of the stair-case-shaped decoding trajectory are in close agreement with both the 3D EXIT surfaces of the encoded hybrid polar decoder and of the 2 × 2 MIMO detector, which validates the accuracy of our EXIT functions.More specifically, if the trajectory reaches the encoded extrinsic MI of I(ẽ e , e) = 1, then this implies that an open EXIT tunnel exists and that the iterative decoding process converges to low block error rate [33].The trajectory of Fig. 9 indeed confirms that an open tunnel does exist for the UEC-hybrid polar-MIMO scheme at SNR= 3 dB, reaching the value of I(ẽ e , e) = 1, as shown in Table 4. Similarly, as shown in Fig. 10, the corner points of the stair-case-shaped decoding trajectory are also in close agreement with the 3D EXIT functions of both the decoded hybrid polar decoder and of the UEC decoder.Again, if the trajectory approaches a decoded MI of I( ũe , u) = 1, this implies that an open EXIT tunnel exists and that iterative decoding converges to a low block error rate.Similarly to Fig. 9, the trajectory of Fig. 10 confirms that an open tunnel does exist for the UEC-hybrid polar-MIMO scheme, since the decoded extrinsic MI of I( ũe , u) = 1 is reached in Table 4.As shown in Figs.11 and 12, the corner points of the staircase-shaped decoding trajectory of the UEC-SCAN-MIMO benchmarker are also in close agreement with the 3D EXIT surfaces.The corresponding trajectories are exemplified in Table 5. 
Observe that the proposed hybrid polar decoder has a wider open tunnel between the 3D EXIT surfaces than the SCAN polar decoder benchmarker. Hence, fewer iterations are required for reaching the I( ũe , u) = 1 or I(ẽ e , e) = 1 points. More specifically, Table 4 shows that only two iterations are required by the UEC-hybrid polar decoder-MIMO scheme to reach the I( ũe , u) = 1 and I(ẽ e , e) = 1 points. By comparison, five iterations are required by the UEC-SCAN-MIMO benchmarker of Fig. 2. Hence, the proposed hybrid polar decoder scheme can be expected to have a better performance than that of the UEC-SCAN-MIMO benchmarker of Fig. 2, as will be shown in Section VII.

VI. COMPLEXITY ANALYSIS
This section quantifies the computational complexity of the proposed hybrid polar decoder and compares it to that of the relevant JSCC and SSCC benchmarkers. Note that the complexity of the 2 × 2 QPSK MIMO demodulator is identical for all schemes, where the computational complexity of the ML detector is on the order of O(M^(N T x )), which grows exponentially with the number N T x of transmit antennas, where M is the cardinality of the QAM modulated-signal constellation [35]. In this article, N T x = 2 and M = 4 are adopted in all the schemes considered. Additionally, in all the JSCC schemes considered, the complexity of the UEC decoder is common and depends on the number of states r used in the UEC trellis, as detailed in [22], [31]. In this article, a UEC trellis having r = 4 states is adopted, since this was shown in [22], [31] to be sufficient for avoiding significant capacity loss.
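As a small illustration of the exponential growth of the ML search space, the following Python sketch enumerates the candidate transmit vectors for M = 4 and N T x = 2. The helper name `ml_candidates` and the unit-energy QPSK alphabet are assumptions of this sketch, and the bit labelling is deliberately omitted, since only the number of candidates matters here.

```python
import itertools

# Unit-energy QPSK alphabet (bit labelling is irrelevant for counting).
QPSK = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]


def ml_candidates(constellation, n_tx):
    """Enumerate every transmit vector an exhaustive ML detector must test."""
    return list(itertools.product(constellation, repeat=n_tx))


candidates = ml_candidates(QPSK, 2)
print(len(candidates))   # M ** N_Tx = 4 ** 2 = 16
```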
The complexity of a polar decoder can be quantified by considering the number of Add, Compare and Select (ACS) operations performed [11], [21]. More explicitly, the complexity of the SCL decoder [11] mostly depends on the list size L, as well as on the encoded block length N, according to O(LN log N ) [11]. By contrast, the complexity of the SCAN decoder depends on the maximum number I max inner of internal iterations and on the encoded block length N, according to O(I max inner N log N ) [12]. Note that the first step in each iteration of the G-SCAN algorithm is to carry out SCL decoding, while the second step is to activate the modified-SCAN decoding [21]. Hence, the complexity of the G-SCAN algorithm depends mainly on the list size L, on the encoded block length N and on the maximum number of internal iterations I max inner .
In order to make a more detailed comparison, Table 6 compares the computational complexity of the various schemes considered in terms of the number of ACS operations performed by the various polar decoders, when adopting a polar coding rate of R i = 1/2 and an encoded block length of N = 1024, as well as various list sizes L, I max inner internal iterations, and I max receiver-level iterations. Table 6 reveals that the proposed UEC-hybrid(2,L=2)-MIMO scheme imposes approximately 11% lower complexity than the JSCC benchmarker UEC-SCAN(2,2)-MIMO. This comparison is particularly relevant, since in Section VII we will show that UEC-SCAN(2,2)-MIMO is the best-performing JSCC benchmarker and that it offers a similar SER performance to the proposed UEC-hybrid(2,L=2)-MIMO scheme for I max = 4.
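To give a feel for the scaling laws quoted above, the following Python sketch evaluates the terms L N log2 N and I max inner N log2 N for N = 1024. These are order-of-growth terms only; they are not the ACS counts of Table 6, and the function names are assumptions of this sketch.

```python
import math


def scl_term(list_size: int, n: int) -> float:
    """Scaling term L * N * log2(N) of SCL decoding."""
    return list_size * n * math.log2(n)


def scan_term(inner_iterations: int, n: int) -> float:
    """Scaling term I_inner * N * log2(N) of SCAN decoding."""
    return inner_iterations * n * math.log2(n)


N = 1024
print(scl_term(32, N))   # 327680.0
print(scan_term(2, N))   # 20480.0
```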
Furthermore, when increasing the list size from L = 2 to L = 4, the proposed UEC-hybrid(2,L=4)-MIMO scheme offers a 0.15 dB gain over the JSCC benchmarker UEC-SCAN(2,2)-MIMO at an SER of 10 −2 , while still having 1.8% lower complexity, as shown in Table 6. As will be shown in Section VII, although the proposed UEC-hybrid(2,L=4)-MIMO scheme may have higher complexity than some of the benchmarkers, such as UEC-SCAN(2,1)-MIMO, it is the only scheme that outperforms the best-performing SSCC benchmarker, which is the Unary-G-SCAN(L=8)-MIMO scheme of Fig. 5. In terms of the two-stage serially concatenated SSCC benchmarkers, Table 6 shows that the Unary-G-SCAN(L=2)-MIMO scheme imposes 20% lower polar decoding complexity than the Unary-SCAN(2)-MIMO benchmarker, while providing a 0.3 dB gain, as the analysis of Section VII will show.

VII. PERFORMANCE ANALYSIS
This section characterises the error correction and error detection performance of the proposed hybrid polar decoder in the context of our three-stage serial concatenated scheme, which we refer to as the UEC-hybrid-MIMO scheme. We compare this to the relevant JSCC and SSCC benchmarkers of Fig. 2 and Fig. 5, respectively. More specifically, we begin by characterising the proposed hybrid polar decoder in the context of the UEC-hybrid-MIMO JSCC scheme for various system parameters in Figs. 13 and 14. Then we characterise and compare the various benchmarkers introduced in Section IV, in Fig. 15. Finally, we compare the proposed hybrid polar decoder to the best of the benchmarkers in Fig. 16.
In order to characterise the performance of the various schemes, the SER versus the signal-to-noise ratio per bit (E b /N 0 ) is recorded for the case of 2 × 2 Gray-mapped QPSK MIMO for transmission over an uncorrelated narrowband Rayleigh fading channel. Throughout our investigation, we adopted the polar coding rate of R i = 1/2 and the encoded block length of N = 1024. Furthermore, source symbol values x that obey a zeta distribution having a parameter value of p 1 = 0.797 are adopted, as in [22]. Note that all benchmarkers considered have the same DCMC capacity bound of −2.26 dB, since they all have the same effective throughput of η = 1.5236, as detailed in Section III-A.
Fig. 13 illustrates the beneficial impact of increasing the numbers of receiver-level iterations I max = {1, 2, 3, 4, 5, 6}, when the proposed hybrid polar decoder is applied in the three-stage concatenated JSCC scheme of Fig. 2.More specifically, the proposed hybrid polar decoder adopted I max inner = 2 within the upper SCAN decoder, as well as the list size of L = 4 within the lower G-SCAN polar decoder.As expected, during the successive iterations between the outer UEC decoder, hybrid polar decoder, and the inner MIMO detector, the soft-decision estimates of the transmitted symbols become more accurate as and when more extrinsic information is iteratively exchanged.Fig. 13 shows that increasing the number of receiver-level iterations I max performed by the proposed JSCC UEC-hybrid-MIMO scheme improves the performance, owing to the iterative feedback gain offered by not only the inner 2 × 2 QPSK detector, but also by the outer UEC decoder.For example, increasing the number of receiver-level iterations from I max = 1 to I max = 2 provides more than 1.5 dB gain.However, it may also be observed that the performance improvements gradually saturate, as the number of receiver-level iterations is increased.For example, increasing the number of receiver-level iterations from I max = 3 to I max = 4 provides only 0.2 dB gain.Hence, as mentioned in Section VI, I max = 4 may be recommended for striking an attractive performance vs. complexity trade-off for the proposed UEC-hybrid-MIMO scheme.
Fig. 14 illustrates the impact of adopting various numbers of inner iterations I max inner within the upper SCAN decoder, as well as various list sizes L for the lower G-SCAN polar decoder, when the proposed hybrid polar decoder is applied in the three-stage concatenated JSCC UEC-hybrid-MIMO scheme of Fig. 2. As may be expected, increasing the number of inner iterations I max inner of the upper SCAN decoder improves the performance.For example, increasing I max inner = 1 to I max inner = 2 provides 0.4 dB gain, when a constant list size of L = 2 is used by the lower G-SCAN decoder.However, further increasing the number of inner iterations from I max inner = 2 to I max inner = 3 provides only 0.2 dB gain.Hence, as mentioned in Section VI, I max inner = 2 may be recommended for striking an attractive trade-off between the performance and the complexity of the proposed hybrid polar decoder.
Similarly, when a constant number of inner iterations I max inner is used for the upper SCAN decoder, increasing the list size used in the lower G-SCAN decoder improves the performance. For example, increasing the list size from L = 2 to L = 4 provides 0.15 dB gain, when a constant number of I max inner = 2 inner iterations is used by the upper SCAN decoder. However, it may also be observed that the performance improvements saturate, as the list size is further increased. For example, increasing the list size from L = 4 to L = 8 does not provide any significant performance gain, despite increasing the complexity by about 15%. Hence, as mentioned in Section VI, a list size of L = 4 may be recommended for striking an attractive trade-off between the performance and complexity of the proposed hybrid polar decoder.
The two-stage concatenated SSCC benchmarker schemes of Section IV are characterized in Fig. 15 for various polar decoders. More specifically, the Unary-SCAN-MIMO benchmarker of Fig. 5 employs a SISO SCAN polar decoder, the Unary-G-SCAN-MIMO benchmarker of Fig. 5 uses a SISO G-SCAN polar decoder, while the Unary-SCL-MIMO benchmarker of Fig. 6 relies on a hard-decision SCL decoder. As mentioned in Section VI, while the SCAN and G-SCAN decoders carry out I max iterations with the MIMO detector, the SCL decoder is unable to iteratively exchange extrinsic information with either an outer or an inner decoder. Hence, in this scenario, the Unary-SCL-MIMO benchmarker of Fig. 6 can only benefit from one-shot MIMO detection, polar decoding, and source decoding. Fig. 15 shows that the Unary-SCAN(1)-MIMO benchmarker of Fig. 5 provides 0.3 dB gain by using I max = 4 receiver-level iterations at an SER of 10 −2 , when compared to the Unary-SCL(L=32)-MIMO benchmarker of Fig. 6. Additionally, Fig. 15 shows that increasing the number of internal iterations I max inner of the SCAN decoder in the Unary-SCAN-MIMO benchmarker provides only limited gain, despite increasing the polar decoding complexity. For example, increasing the number of internal iterations from I max inner = 1 to I max inner = 2 provides only 0.2 dB gain at the cost of doubling the polar decoding complexity, as mentioned in Section VI. Furthermore, Fig. 15 also shows that the Unary-G-SCAN-MIMO benchmarker of Fig. 5 outperforms all other SSCC benchmarkers. For example, the Unary-G-SCAN(L=2)-MIMO scheme provides 0.8 dB gain over the Unary-SCL(L=32)-MIMO benchmarker, as seen in Fig. 15. Additionally, Fig. 15 demonstrates that the Unary-G-SCAN(L=2)-MIMO benchmarker provides 0.3 dB gain over the Unary-SCAN(2)-MIMO benchmarker, while requiring 20% lower polar decoding complexity, as mentioned in Section VI. Clearly, the Unary-G-SCAN(L=8)-MIMO benchmarker is the best-performing SSCC benchmarker, which is capable of operating within 3.56 dB of the DCMC capacity bound at an SER of 10 −2 .
On the other hand, the three-stage concatenated JSCC UEC-SCAN-MIMO benchmarker of Fig. 2 is also characterised in Fig. 15.Here, the fixed decoder activation order of {MIMO, upper SCAN polar decoder, UEC, lower SCAN polar decoder} is employed until the maximum number of receiver-level iterations I max is reached.As shown in Fig. 15, the UEC-SCAN(2,1)-MIMO benchmarker offers superior performance compared to the UEC-SCAN(1,2)-MIMO benchmarker by providing 0.25 dB gain.This shows that the iterations performed by the upper SCAN decoder are more beneficial than those of the lower SCAN decoder.This may be explained by the higher mutual information that the SCAN decoder generates for its decoded extrinsic LLR vector ũe relative to that generated for the encoded extrinsic LLR de .Additionally, Fig. 15 reveals that the JSCC UEC-SCAN-MIMO benchmarker of Fig. 2 may outperform the SSCC Unary-SCAN-MIMO benchmarker.For example, Fig. 15 shows that the JSCC benchmarker UEC-SCAN(2,2)-MIMO provides 0.40 dB gain over the SSCC benchmarker Unary-SCAN(2)-MIMO, when they both employ I max = 4 receiver-level iterations.This reveals that increasing the maximum number of internal iterations only provides limited gain, even when the SCAN decoder exchanges extrinsic information with an inner decoder, such as the 2 × 2 QPSK MIMO detector.On the other hand, when the SCAN decoder is concatenated with both an inner and an outer decoder, higher gains may be achieved, since extrinsic information is beneficially exchanged amongst three concatenated decoders.However, although the SCAN decoder is iterated with both an inner and the outer decoder in the UEC-SCAN(2,2)-MIMO benchmarker, it is still unable to outperform the best SSCC benchmarker, namely the Unary-G-SCAN(L=8)-MIMO.
Fig. 16 compares the SER performance of the proposed hybrid polar decoder to that of the above-mentioned JSCC and SSCC benchmarkers, while considering different list sizes L, but a constant number of inner iterations, namely I max inner = 2. As shown in Fig. 16, the proposed hybrid polar decoder is the only one that is able to outperform the best SSCC benchmarker, which is the Unary-G-SCAN(L=8)-MIMO. More specifically, the proposed JSCC UEC-hybrid(2,L=4)-MIMO scheme provides 0.2 dB gain over the SSCC benchmarker Unary-G-SCAN(L=8)-MIMO at an SER of 10 −3 . In comparison to a three-stage serial concatenated JSCC benchmarker, the proposed SISO UEC-hybrid(2,L=2)-MIMO scheme offers approximately 11% complexity reduction over the UEC-SCAN(2,2)-MIMO benchmarker, while achieving similar SER performance. Additionally, the proposed UEC-hybrid(2,L=2)-MIMO scheme achieves a 0.75 dB SNR gain over the two-stage serial concatenated SSCC Unary-SCAN(2)-MIMO benchmarker.

VIII. SUMMARY AND CONCLUSION
Prior to this article, SISO polar decoders have not been characterized in a holistic MIMO transceiver context in a three-stage concatenated turbo architecture along with an inner and an outer decoder.Motivated by filling this knowledge gap, we have proposed a novel hybrid polar decoder capable of accepting a-priori LLRs pertaining to both the decoded and encoded bits, and producing extrinsic LLRs pertaining to both the decoded and encoded bits in return.More specifically, we proposed a hybrid polar decoder that inherits the error correction performance benefits of the SCL polar decoder, as well as the soft decision benefit of the SCAN decoder.To elaborate further, we adopted a SCAN decoder for processing the LLRs provided by the concatenated outer decoder and the G-SCAN decoder for processing the LLRs provided by the concatenated inner decoder.Furthermore, we presented the 3D EXIT chart analysis of our turbo-MIMO transceiver in order to characterise the iterative exchange of extrinsic information amongst the three concatenated codes leading to iterative detection convergence.
Our simulation results demonstrated that the proposed three-stage serially concatenated JSCC UEC-hybrid(2,L=2)-MIMO scheme is capable of outperforming the SSCC Unary-SCAN(2)-MIMO benchmarker by offering 0.75 dB gain. Furthermore, the proposed UEC-hybrid(2,L=2)-MIMO scheme imposes approximately 11% lower complexity than the best-performing JSCC UEC-SCAN(2,2)-MIMO benchmarker, when both schemes operate at a similar SER. Additionally, increasing the SCL list size from L = 2 to L = 4 enables the proposed UEC-hybrid(2,L=4)-MIMO scheme to achieve 0.25 dB gain over the UEC-SCAN(2,2)-MIMO benchmarker, while imposing 1.8% lower complexity. Our future work may consider different combinations of upper and lower polar decoders within the framework of the proposed hybrid polar decoder. Furthermore, other applications of polar codes in three-stage concatenated iterative receivers may also be considered. Finally, near-capacity irregular polar coding schemes may be designed by relying on the principles detailed in [36].

FIGURE 1. Detailed block diagram of the proposed hybrid polar decoder in the context of a three-stage serial concatenated scheme, where the first stage is an outer decoder, the second stage is a hybrid polar decoder, and the third stage is an inner decoder.

FIGURE 2. Block diagram of a three-stage serial concatenated scheme, where a UEC code is serially concatenated with a SISO polar code and a 2 × 2 QPSK MIMO modulator and the demodulator.
1] in order to obtain the interleaved UEC-encoded frames u1 = [0100] and u2 = [0110]. After this interleaving operation, polar encoding is applied to the UEC encoded frame u, in order to obtain the polar encoded frame d of Fig. 2. In the example of the input vectors u1 = [0100] and u2 = [0110], we obtain N = 8 polar encoded output bits d1 = [11001100] and d2 = [01100110], respectively, when using the frozen bit pattern [0, 0, 0, u 1 , 0, u 2 , u 3 , u 4 ]. Here, N represents the encoded block length, which must be a power of 2. Following this, flexible coding block lengths are supported by applying rate matching to adjust the length of the polar encoded frame from N to E bits, giving a polar coding rate of R i = b • n/E . Then, the order of the E bits in the rate-matched frame e is rearranged by using the second interleaving operation π 2 of Fig. 2
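The N = 8 encoding example above can be checked with the following minimal Python sketch, which multiplies the input vector by the three-fold Kronecker power of the kernel [[1, 0], [1, 1]] over GF(2). The function names are assumptions of this sketch, which models neither the rate matching nor any other NR-specific processing.

```python
from typing import List, Sequence


def polar_transform(u: Sequence[int]) -> List[int]:
    """Multiply u by the n-fold Kronecker power of [[1, 0], [1, 1]] over GF(2)."""
    x = list(u)
    n = len(x)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]      # upper branch: XOR of the pair
        step *= 2
    return x


def polar_encode(info_bits: Sequence[int], info_positions: Sequence[int], n: int) -> List[int]:
    """Place info bits at info_positions, freeze the rest to 0, then transform."""
    u = [0] * n
    for pos, bit in zip(info_positions, info_bits):
        u[pos] = bit
    return polar_transform(u)


# Pattern [0, 0, 0, u1, 0, u2, u3, u4] -> information positions 3, 5, 6, 7.
print(polar_encode([0, 1, 0, 0], [3, 5, 6, 7], 8))   # [1, 1, 0, 0, 1, 1, 0, 0]
print(polar_encode([0, 1, 1, 0], [3, 5, 6, 7], 8))   # [0, 1, 1, 0, 0, 1, 1, 0]
```

The first output agrees with the encoded frame d1 = [11001100] of the example, which suggests that the example uses the Kronecker-power transform without a bit-reversal permutation.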

TABLE 3 .FIGURE 4 .
FIGURE 4. Discrete-input Continuous-output Memoryless Channel (DCMC) capacity of an uncorrelated Rayleigh fading channel for a 2 × 2 QPSK MIMO scheme, where N Tx and N Rx represent the number of transmit and receive antennas, respectively.

FIGURE 5. Block diagram of a serially concatenated SSCC scheme, where a unary decoder is serially concatenated with a SISO polar decoder and a 2 × 2 QPSK MIMO detector.

FIGURE 6 .FIGURE 7 .
FIGURE 6. Block diagram of a non-iterative SSCC scheme, where a unary decoder is serially concatenated with an SCL polar decoder and a 2 × 2 QPSK MIMO detector.
generates the extrinsic LLR vector fe , as a function of the corresponding a-priori LLR vector fa as well as of the signal r received over the channel. More specifically, the MI I( fe , f ) of fe is related to both the MI I( fa , f ) as well as to the channel's Signal-to-Noise Ratio (SNR). Hence, the 2 × 2 QPSK MIMO detector may be characterized by the EXIT function I( fe , f ) = F MIMO [I( fa , f ), SNR]. Fig. 7(a) visualizes the Two-Dimensional (2D) EXIT function of the 2 × 2 QPSK MIMO detector for various SNR values. As shown in Fig. 7(a), a higher SNR value corresponds to a larger area under the EXIT function of the 2 × 2 QPSK MIMO detector, which results in a wider open EXIT chart tunnel for reliable communication [33], as we will show in Section V-D.
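The MI values plotted in EXIT functions of this kind are commonly estimated from LLR samples by a time-averaging approximation. The following Python sketch shows one such estimator; it assumes the convention LLR = ln(P(b = 0)/P(b = 1)) and that the LLRs are statistically consistent, and it is offered as an illustration rather than as the measurement procedure used for Fig. 7.

```python
import math
from typing import Sequence


def mutual_information(llrs: Sequence[float], bits: Sequence[int]) -> float:
    """Estimate I(L; b) in bits by averaging 1 - log2(1 + exp(-s * L)).

    s = +1 for bit 0 and s = -1 for bit 1, i.e. a positive LLR favours a
    zero-valued bit (an assumed sign convention).
    """
    total = 0.0
    for llr, bit in zip(llrs, bits):
        z = (1 - 2 * bit) * llr
        # numerically stable evaluation of ln(1 + exp(-z))
        softplus = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += 1.0 - softplus / math.log(2.0)
    return total / len(llrs)


print(mutual_information([0.0, 0.0], [0, 1]))      # 0.0: uninformative LLRs
print(mutual_information([50.0, -50.0], [0, 1]))   # close to 1: reliable LLRs
```

Applying this estimator to the input and output LLR vectors of a component, for a range of a-priori MI values, yields one sampled point of its EXIT function per a-priori value.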

FIGURE 8. (a) 2D and (b) 3D EXIT function of the UEC decoder for different zeta distributed source symbols parameterized by various values of p_1 = {0.797, 0.85, 0.9, 0.95}, for the case of using n = 1 encoded bit per transmission and r = 4 states.

is obtained by deinterleaving the extrinsic LLR vector fe. In order to jointly characterise the UEC decoder, the polar decoder and the MIMO detector, it is beneficial to include I(ũa, u) and I(ẽa, e) in the EXIT function analysis. This motivates the 3D EXIT function plot of Fig. 7(b), in which I(ũa, u) is presented on one axis. Furthermore, I(ẽa, e) is presented on a second axis together with I(fe, f), owing to the relationship established through the deinterleaver π_2^{-1}. Similarly, I(fa, f) appears on a third axis together with I(ẽe, e), owing to the relationship established through the interleaver π_2. Note that the EXIT function of the 2 × 2 QPSK MIMO detector does not depend on the MI I(ũa, u), hence the EXIT function of Fig. 7(b) does not vary along the corresponding axis.
(b) in which I(ẽa, e) is presented along one axis. Furthermore, I(ũe, u) is presented along a second axis together with I(za, z), owing to the relationship established through the deinterleaver π_1^{-1}. Similarly, I(ũa, u) appears on a third axis together with I(ze, z), owing to the relationship created through the interleaver π_1. Note that the EXIT function of the UEC decoder does not depend on I(ẽa, e), and hence the EXIT function of Fig. 8(b) does not vary along the corresponding axis.
function I(ũe, u) = F_uncoded-polar[I(ũa, u), I(ẽa, e)]. Similarly, the MI I(ẽe, e) of the encoded extrinsic LLR vector of the polar decoder relies not only on the MI I(ũa, u) of the decoded a-priori LLR vector, but also on the MI I(ẽa, e) of the encoded a-priori LLR vector. Hence, it may be expressed as I(ẽe, e) = F_encoded-polar[I(ẽa, e), I(ũa, u)]. Since each output

FIGURE 9. 3D EXIT chart characteristics of the proposed encoded hybrid polar decoder from the point of view of the 2 × 2 QPSK MIMO detector for the case of transmission over an uncorrelated narrowband Rayleigh fading channel at an SNR of 3 dB, with I^max_inner = 2 internal iterations used within the upper SCAN polar decoder component and a list size of L = 4 used within the lower G-SCAN polar decoder component.

FIGURE 10. 3D EXIT chart characteristics of the proposed hybrid polar decoder from the point of view of the UEC decoder for the case of r = 4 states and the zeta distribution parameter of p_1 = 0.797.

FIGURE 11. 3D EXIT chart characteristics of the UEC-SCAN-MIMO benchmarker from the point of view of the 2 × 2 QPSK MIMO detector for the case of transmission over an uncorrelated narrowband Rayleigh fading channel at an SNR of 3 dB, with I^max_inner = 1 internal iteration used within the SCAN polar decoder.

FIGURE 12. 3D EXIT chart characteristics of the UEC-SCAN-MIMO benchmarker from the point of view of the UEC decoder for the case of r = 4 states and the zeta distribution parameter of p_1 = 0.797.

TABLE 5. Example of Measuring the MI Trajectory of the Extrinsic LLRs Obtained During a Particular Simulated Iterative Decoding Process in the Three-Stage Concatenated UEC-SCAN-MIMO Scheme When Employing a {MIMO, Upper Polar Component, UEC, Lower Polar Component} Decoder Activation Order for Five Iterations, I^max = 5

inner performed within the modified-SCAN algorithm. Hence we have O[(I^max_inner N log N) + (L N log N)] [21]. The complexity of the proposed hybrid decoder may be obtained by adding the complexity of the conventional SCAN algorithm to that of the G-SCAN algorithm. More specifically, the complexity depends not only on the maximum number of internal iterations I^max_inner performed by the SCAN decoder, but also on the list size L of the G-SCAN decoder. Note that in the proposed hybrid polar decoder only I^max_inner = 1 iteration is performed by the modified-SCAN algorithm within the G-SCAN decoder. Hence, the complexity of the G-SCAN polar decoder in the proposed hybrid decoder is O[(N log N) + (L N log N)].
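The additive structure of these complexity orders can be made concrete with a small sketch. It evaluates only the leading terms, with all constant factors dropped and a base-2 logarithm assumed, so the numbers are growth-rate proxies rather than the ACS operation counts reported for the decoders. The conventional SCAN term I^max_inner N log N and the function names are assumptions introduced for this illustration.

```python
import math


def scan_ops(n, inner_iterations):
    """Leading-order term assumed for SCAN: I_inner * N * log2(N)."""
    return inner_iterations * n * math.log2(n)


def g_scan_ops(n, list_size, inner_iterations=1):
    """Leading-order term for G-SCAN: (I_inner * N * log2 N) + (L * N * log2 N)."""
    return inner_iterations * n * math.log2(n) + list_size * n * math.log2(n)


def hybrid_ops(n, inner_iterations, list_size):
    """Hybrid decoder: SCAN with I_inner iterations plus G-SCAN with one inner iteration."""
    return scan_ops(n, inner_iterations) + g_scan_ops(n, list_size, inner_iterations=1)


# Example: N = 1024, two SCAN iterations, list size L = 4.
print(hybrid_ops(1024, 2, 4))  # 71680.0
```

In this proxy the hybrid cost grows as (I^max_inner + 1 + L) N log N, so the SCAN iteration count and the list size trade off linearly against each other.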

TABLE 6. Computational Complexity Analysis of Polar Decoder Algorithms in Terms of ACS Operations, as Well as the Required E_b/N_0 [dB] to Achieve a Symbol Error Rate (SER) of 10^−3

FIGURE 13. SER performance of the proposed UEC-hybrid(2,L=4)-MIMO scheme when using various numbers of receiver-level iterations I^max = {1, 2, 3, 4, 5, 6} in the context of a three-stage concatenated JSCC scheme, where the source is zeta distributed with the parameter p_1 = 0.797, I^max_inner = 2 internal iterations are used within the SCAN decoder and the list size L = 4 is adopted in the G-SCAN decoder of the hybrid polar decoder, when communicating over an uncorrelated narrowband Rayleigh fading channel.

FIGURE 14. SER performance of the proposed UEC-hybrid-MIMO scheme when adopting various numbers of inner iterations I^max_inner as well as various list sizes L, where the source is zeta distributed with the parameter p_1 = 0.797 and I^max = 4 receiver-level iterations are applied, when communicating over an uncorrelated narrowband Rayleigh fading channel.

FIGURE 16. SER performance of the proposed UEC-hybrid-MIMO scheme and of various benchmarkers when adopting various list sizes L, where the source is zeta distributed with the parameter p_1 = 0.797 and I^max = 5 receiver-level iterations are applied, when communicating over an uncorrelated narrowband Rayleigh fading channel.