Low-Complexity CRC Aided Joint Iterative Detection and SCL Decoding Receiver of Polar Coded SCMA System

As key technologies of the fifth-generation (5G) wireless networks, SCMA and polar codes, and in particular their joint design, are of interest for future communication systems. In this paper, a CRC aided joint iterative detection and successive cancellation list (SCL) decoding (CAJIDS) receiver is proposed for the polar coded SCMA (PC-SCMA) system. For the receiver, the SCL decoder's extrinsic messages construction algorithm is designed by using the Bayes rule and the soft cancellation algorithm. Additionally, the distributed CRC aided polar (DCA-polar) code and a variable list size are used to reduce the receiver's decoding latency and complexity. Simulation results demonstrate that the CAJIDS receiver has better error rate performance than the joint iterative detection and decoding (JIDD) receiver. It also outperforms the LDPC coded SCMA (LDPC-SCMA) system. Specifically, when the code length is $N=256$ and the code rate is $R = 1/3$, the CAJIDS outperforms the JIDD and LDPC-SCMA by 1.2 dB and 1.9 dB over the Rayleigh fading channel, respectively. The results also show that, compared with the fixed-list-size receiver and at a similar error rate performance, the CAJIDS receiver has lower decoding latency and complexity.


I. INTRODUCTION
With the development of the mobile internet, more and more devices are connected to the mobile network. To meet the growing demand for mobile traffic, the fifth-generation (5G) wireless networks have emerged. The typical application scenarios of 5G wireless networks consist of enhanced Mobile Broad-Band (eMBB), massive Machine Type Communications (mMTC), and Ultra-Reliable and Low Latency Communications (URLLC). For the 5G scenarios, non-orthogonal multiple access (NOMA) is an essential enabling technology to meet the heterogeneous demands on low latency, high reliability, massive connectivity, improved fairness, and high throughput [1]. Recently, many NOMA schemes were proposed for 5G, such as the sparse code multiple access (SCMA) [2], pattern division multiple access (PDMA) [3], multi-user shared access (MUSA) [4], low-density signatures (LDS) multiple access [5], and so on. Among the available NOMA schemes, SCMA is a coding-based multiplexing scheme, which can achieve an obvious increase in spectral efficiency compared to orthogonal multiple access (OMA). Its coded bits are mapped to codewords from multi-dimensional sparse codebooks for transmission. Due to the sparsity of the codebook, the message passing algorithm (MPA) [6] with moderate complexity can be used for multi-user detection (MUD).
The associate editor coordinating the review of this manuscript and approving it for publication was Jie Tang.
Most of the current work on NOMA focuses on optimizing throughput and connectivity within the NOMA technology itself [7], [8]. In practice, the quality of service (QoS) of a NOMA system can also be ensured by effective channel coding. Turbo codes are adopted in many existing NOMA systems for error correction [9], [10]. Following the turbo principle, turbo decoding and NOMA multiuser detection can be processed jointly to form an outer-loop iterative multiuser receiver, which enhances the system performance significantly. Since the LDPC code can achieve good error-rate performance and is hardware-friendly for high-throughput realization when the code length is moderate, it is also used as the outer channel coding scheme for the SCMA system in some literature [11], [12]. Furthermore, according to the results in [12], the new radio (NR) LDPC-coded NOMA schemes have almost the same block error rate (BLER) performance as the corresponding turbo-coded NOMA schemes. As an alternative to the LDPC code, the polar code has attracted researchers' interest as a way to improve the SCMA system's performance.
The polar code, discovered by Arikan [13], features a highly structured encoder and decoder that asymptotically achieve capacity on discrete memoryless channels. Arikan proposed a hard-output successive cancellation (SC) decoder that achieves capacity in the limit of large block lengths. But for short and moderate lengths, the error rate performance of polar codes with SC decoding isn't as good as that of turbo and LDPC codes. A successive cancellation list (SCL) decoding algorithm was proposed by Tal and Vardy [14], which is superior to the simple SC decoder and close to a maximum likelihood (ML) decoder. Then, [15], [16] proposed a cyclic redundancy check (CRC) aided SCL algorithm by concatenating the CRC with the polar code, which further improved the performance of polar codes. For short packets, the performance of polar codes with CRC aided (CA) SCL decoding is better than that of LDPC codes. Although all of these decoders offer better performance than the SC decoder, none provides the soft outputs essential for turbo-based receivers. The first soft-output decoder for polar codes is the belief propagation (BP) decoder [17]. It is reported to have a significant improvement over the SC decoder but has very high storage and processing complexity. The authors of [18] developed a low-complexity soft-output version of the SC decoder called the soft cancellation (SCAN) decoder. This algorithm produces reliability information for both the coded and message bits. The SCAN decoder achieves comparable performance with far less complexity and storage than the BP decoder. For long packet transmissions, the SCAN decoder is a good choice because of its low decoding complexity. However, the SCAN algorithm isn't maximum likelihood decoding, and it can't achieve the performance of SCL decoding with a large list size.
According to 3GPP TS 38.212 [19], the polar code is adopted as the standard coding technique for the control channel in support of the eMBB service, since it has an excellent error-correction capability based on CA-SCL decoding, especially for short packet transmissions.
Building on this body of research on polar codes, the joint design of polar codes and NOMA technology has also made some progress. In [20], the authors proposed a three-stage channel transform structure of the polar-coded NOMA system and designed a joint successive cancellation (JSC) detection and decoding receiver. The JSC receiver makes a hard decision on the transmitted symbol of a specific user and then subtracts the estimated symbol from the received signal to reduce interference. Clearly, the JSC receiver isn't a soft-input-soft-output (SISO) iterative detection scheme. Based on the sequential user partition (SUP) method, the JSC receiver improves the system's performance but increases the processing latency. Reference [21] presented a joint list decoding algorithm with a sphere detection (SD) scheme for a polar coded SCMA (PC-SCMA) system. It employs a minimum mean square error (MMSE) estimate of the transmitted vector to reduce the number of visited tree nodes of the SD scheme and hence its complexity. It also provides better performance than the conventional SUP method with independent SCL decoding. Similar to [20], due to the SUP receiver, the decoding latency of the system is also relatively high. Reference [22] provided an iterative multiuser detection framework, which consists of an MPA based multiuser detector and a SISO SC decoder. Benefiting from the soft re-encoding algorithm, which is concatenated to the original SC decoder, the resultant iterative detection strategy can obtain a salient coding gain. However, its performance is limited by the decoding ability of the SC decoder. In [23], based on the joint factor graph of SCMA and polar codes, the authors proposed a joint iterative detection and decoding (JIDD) receiving algorithm using the SCAN decoder. It improves the performance of the PC-SCMA system significantly while also reducing complexity.
The simulation shows that the JIDD receiver has better performance than the general iterative detection and decoding (IDD) scheme. However, the SCAN decoder's advantage lies in its tradeoff among performance, complexity, and storage. It can't achieve the performance of the SCL decoder with a larger list size and can't approach the ML decoder. Therefore, the error rate performance of the JIDD receiver isn't good enough. In particular, the JIDD-based PC-SCMA system's performance is worse than that of LDPC coded SCMA (LDPC-SCMA) at a low code rate. In [24], the authors proposed a serial CRC aided joint iterative detection and decoding (S-C-JIDD) scheme to improve the performance and reduce the computational complexity of the PC-SCMA system. Although the performance of the system is improved at low $E_b/N_0$, the performance is still limited by the SCAN decoder. To enhance the performance of the JIDD receiver, a straightforward approach is to use a better-performing polar decoder, such as the SCL decoder.
VOLUME 8, 2020
Guided by the idea of JIDD, our goal is to design a low-complexity receiver based on an SCL decoder. Two key issues need to be addressed in applying the SCL decoder to the PC-SCMA system. Firstly, to iteratively exchange the extrinsic messages between the constituent MPA detector and polar decoder, both must be SISO. Therefore, hard-output decoding algorithms, such as SCL, can't be directly applied to the iterative receiver. The calculation of the extrinsic messages of the SCL decoder remains to be resolved. Secondly, with the SCL decoder, the performance of the PC-SCMA system can be improved by increasing the list size, at the cost of increased processing and storage complexity. While guaranteeing the performance, the receiver's complexity should be reduced as far as possible. We aim to tackle these challenges in this paper, and the main contributions are summarized as follows.
Based on an analysis of the SCL decoding algorithms for CRC aided polar (CA-polar) and distributed CRC aided polar (DCA-polar) codes, we have constructed the extrinsic messages of the SCL decoder by the Bayes rule and the SCAN algorithm. On this basis, a basic joint iterative detection and SCL decoding (BJIDS) receiver is proposed for the PC-SCMA system. It improves the performance of the SCMA system significantly.
To reduce the receiver's complexity, we applied the DCA-polar code to the PC-SCMA system. Based on early termination decoding, the decoding latency and complexity are reduced. Then, a CRC aided joint iterative detection and SCL decoding (CAJIDS) receiver is designed using a variable list size. The users who terminate decoding improve their performance by increasing the list size in the next iteration, while other users maintain a smaller list size. A series of approximate relationships are developed to evaluate the decoding latency and complexity of the PC-SCMA system. Based on the tradeoffs among latency, complexity, and performance, we identified a set of key parameters for the CAJIDS receiver. The simulation results show that the CAJIDS receiver reduces the decoding latency and complexity while preserving the BJIDS receiver's performance.
The rest of this article is organized as follows. Section II describes the preliminaries of polar codes in the 3GPP standard and SCL decoding for different polar codes. In Section III, the system model of CRC aided uplink SCMA is described. In Section IV, the SCL decoder's extrinsic messages are constructed and applied to the iterative receiver. Then, the CAJIDS algorithm is proposed to reduce the complexity. Section V provides the analysis of decoding complexity and latency for the PC-SCMA system. Section VI shows the error rate performance comparison of these schemes. Finally, conclusions are reached in Section VII, and the acronyms used in this paper are summarized in the appendix.

II. PRELIMINARIES
Based on the idea of ''using assistant bits for polar decoding'', besides CA-polar [25], several types of polar codes were designed and discussed in 3GPP meetings, including joint parity-check and CRC-aided (PCCA) polar [26], [27], Hash-CRC polar [28], and DCA-polar [29]-[31]. The various solutions differ in terms of the number of assistant bits, their usage (for error detection and/or correction), positions, and values. In terms of error rate, the PCCA-polar uses the parity check bits for error correction in the SCL decoder and performs better than the other codes at short block sizes and low code rates. It also has a slightly higher decoding complexity than the others. At the same time, the Hash-CRC polar and DCA-polar have comparable performance to CA-polar.
On the other hand, due to early termination, the DCA-polar can save about 30% of the decoding complexity [31]. In this section, to reduce the complexity of the PC-SCMA system, we focus on the DCA-polar, which is used in the 3GPP standard [19] for the downlink control channel. CA-polar is taken as the baseline scheme.
A. POLAR ENCODING
Fig. 1 shows the location relationships between the assistant bits and the information bits in the two different polar codes.

FIGURE 1. Positions of the information bits and CRC bits.
As shown in Fig. 1, the difference between DCA-polar and CA-polar is the positions of the parity bits. For the DCA-polar, after a permutation of the CRC generator matrix, some distributed CRC bits are relocated in the middle of an information block. In fact, this process is accomplished through interleaving.
Thus, the process of polar coding is described as follows. Firstly, the information block $u$ is encoded to $b = \{b_1, b_2, \ldots, b_K\}$ by the CRC encoder. In the case of DCA-polar, $b$ is interleaved to $b'$; otherwise it is not interleaved. Secondly, $b$ or $b'$, corresponding to CA-polar and DCA-polar respectively, is placed on the unfrozen bits of the sequence $a = \{a_1, a_2, \ldots, a_N\}$ as the input of the basic polar encoder, where the frozen bits of $a$ are 0. Finally, the basic polar coding is performed under the constraint $c = aG_N$, where $G_N$ is the generator matrix. The generator matrix can be defined as $G_N = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes n}$ with $n = \log_2 N$, where $\otimes$ denotes the Kronecker product.
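The encoding step $c = aG_N$ can be illustrated with a short sketch. The code below is only an illustration under stated assumptions, not the 3GPP encoder: it assumes that $G_N$ is the $n$-fold Kronecker power of the 2×2 kernel with no bit-reversal permutation, the unfrozen positions in `info_positions` are a hypothetical choice rather than the output of a real code construction, and CRC attachment and interleaving are left out.

```python
def polar_transform(a):
    """Return c = a * G_N over GF(2), where G_N is the Kronecker power of [[1,0],[1,1]]."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    c = list(a)
    step = 1
    while step < n:
        # One butterfly stage: XOR the second half of each block into the first half.
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                c[j] ^= c[j + step]
        step *= 2
    return c


def build_input(bits, info_positions, n):
    """Place the (CRC-encoded) bits on the unfrozen positions; frozen bits stay 0."""
    if len(bits) != len(info_positions):
        raise ValueError("one position is needed per bit")
    a = [0] * n
    for bit, pos in zip(bits, info_positions):
        a[pos] = bit
    return a


# Hypothetical toy example: N = 8 with four unfrozen positions.
info_positions = [3, 5, 6, 7]
a = build_input([1, 0, 1, 1], info_positions, 8)
c = polar_transform(a)
print("a =", a)
print("c =", c)
```

Because the kernel is its own inverse over GF(2), applying `polar_transform` twice returns the original sequence, which gives a quick sanity check for any block length.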

B. POLAR DECODING
Based on [16], in this paper, we introduce the log-likelihood-ratio (LLR)-based SCL algorithm. In this algorithm, instead of deciding to set the value of an information bit $\hat{a}_i$ to either 0 or 1, both options are inspected. When decoding an information bit at level $i$, each decoding path is split into two paths. However, to avoid exponential growth of the number of decoding threads, as soon as the number of parallel decoding threads reaches the list size $L_o$, only the $L_o$ most likely paths (out of $2L_o$ tentative ones) are retained at each information bit. Each decoding path is associated with a path metric, which represents the probability of that path. Assume the input prior messages of the decoder are $z = \{z_1, \ldots, z_N\}$. For each bit $i$, the LLR of bit $\hat{a}_i$ on path $l$ can be described as

$$L_N^{(i)}[l] = \ln \frac{W_N^{(i)}\left(z, \hat{a}_1^{i-1}[l] \mid 0\right)}{W_N^{(i)}\left(z, \hat{a}_1^{i-1}[l] \mid 1\right)},$$

where $W_N^{(i)}\left(z, \hat{a}_1^{i-1}[l] \mid u_i\right)$ is the partial likelihood of bit $u_i$ given $z$ and the past trajectory of the path, $\hat{a}_1^{i-1}[l]$. Thus, the corresponding path metric is defined as [equation missing]. At level $N$, we obtain the most likely paths $\hat{a}_l$ and the corresponding path metrics $P_l$, $1 \le l \le L_o$.
For the CA-polar, with CRC aiding, the SCL decoder first discards the paths that don't pass the CRC and then chooses the most likely path among the remaining ones. Note that the CRC check is performed after all of the information bits have been decoded.
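The split-and-prune step of list decoding can be sketched as follows. This is a minimal illustration, assuming the penalty form $\ln(1 + e^{-(1 - 2\hat{a}_i)L})$ that is commonly used in LLR-based SCL decoders, where a smaller metric means a more likely path. The paper's own path metric equation is not reproduced here, and the function names are hypothetical. The per-path LLRs are taken as given, since computing them requires the full SC recursion.

```python
import math


def path_penalty(llr, bit):
    """Penalty ln(1 + exp(-(1 - 2*bit) * llr)), evaluated in a numerically stable way."""
    x = -(1 - 2 * bit) * llr
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def extend_and_prune(paths, metrics, llrs, list_size):
    """Split every path on bit 0 and bit 1, then keep the list_size smallest metrics."""
    candidates = []
    for path, metric, llr in zip(paths, metrics, llrs):
        for bit in (0, 1):
            candidates.append((metric + path_penalty(llr, bit), path + [bit]))
    candidates.sort(key=lambda item: item[0])
    kept = candidates[:list_size]
    return [p for _, p in kept], [m for m, _ in kept]


# Two surviving paths, each with its own LLR for the next information bit.
new_paths, new_metrics = extend_and_prune(
    paths=[[0], [1]], metrics=[0.1, 1.5], llrs=[2.0, -3.0], list_size=2
)
print(new_paths, new_metrics)
```

In this toy run each path keeps the extension that agrees with the sign of its LLR, because disagreeing with a strong LLR adds a penalty of roughly its magnitude.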
For the DCA-polar, those parity check bits that are relocated in the middle of an information block are treated as information bits for error detection. Once an error is detected, the decoding is terminated. Fig. 2 shows the early termination algorithm, which we simulate with an SCL decoder that treats the distributed CRC bits as information bits. At the initial stage, all the surviving paths are labeled as ''failure-free.'' Once a distributed CRC bit is decoded, the decoder uses it to check all the ''failure-free'' surviving paths and labels a path as ''failure'' if it doesn't pass the check. Note that we shouldn't prune these entries from the list, even though we know that they will fail the CRC. We should continue decoding these list entries; otherwise, we will damage the error detection capability of the CRC. If and only if all paths are labeled as ''failure,'' early termination is activated, and we can't get the complete candidate paths.

III. UPLINK SCMA SYSTEM MODEL
Fig. 3 shows a CRC aided uplink SCMA system. In this paper, we focus on polar coding, but the system is also applicable to LDPC or turbo codes. Information bits of $V$ users $U = \{u_1, u_2, \cdots, u_V\}$ are encoded to $C = \{c_1, c_2, \cdots, c_V\}$ by the CRC and channel encoders, where each user's codeword $c_v$ has length $N$. Then, each user's coded bits are interleaved by the random interleaver, and every $Q$ interleaved bits are mapped to a $J$-dimensional SCMA codeword by the SCMA mapper, so the overall number of SCMA codewords is $L = N/Q$. Thus, the overloading factor of SCMA is defined as $\lambda = V/J$, and the structure of an SCMA code can be represented by a $J \times V$ binary mapping matrix $F$. As shown in Fig. 4, an SCMA factor graph with 6 variable nodes (VNs) and 4 function nodes (FNs) is illustrated, where VNs and FNs represent users and resources, respectively. The corresponding mapping matrix is denoted as [equation (3) missing].
Suppose the channel gain matrix between the base station and user $v$ at the $l$-th block is denoted as

$$h_l^v = \mathrm{diag}\left(h_{l,1}^v, h_{l,2}^v, \cdots, h_{l,J}^v\right),$$

where $h_{l,j}^v$ represents the channel gain between user $v$ and resource $j$ at the $l$-th block. Thus, at the receiver, the $l$-th received symbols can be described as

$$y_l = \sum_{v=1}^{V} h_l^v s_l^v + n_l,$$

where $1 \le l \le N/Q$, $y_l = \left[y_{l,1}, y_{l,2}, \cdots, y_{l,J}\right]^T$ and $s_l^v = \left[s_{l,1}^v, s_{l,2}^v, \cdots, s_{l,J}^v\right]^T$. $n_l = \left[n_{l,1}, n_{l,2}, \cdots, n_{l,J}\right]^T$ is a complex additive white Gaussian noise (AWGN) vector with mean vector $0$ and covariance matrix $N_0 I$, where $N_0$ denotes the variance of the Gaussian noise. At the receiver, the received symbols $y_l$ are first processed by the MPA detector. The output extrinsic messages of the MPA detector are in the form of LLRs and are de-interleaved to $z_v$. Then, $z_v$ is decoded by the channel decoder as a priori information. The output extrinsic messages $L_e^v$ are then interleaved and fed into the MPA detector as a priori information. After several iterations, the receiver converges and outputs each user's decoded information bits $\hat{u}_1, \hat{u}_2, \cdots, \hat{u}_V$. In this system, the CRC plays an important role in designing the polar decoder's extrinsic messages and the iterative receiver. The detailed algorithm is given in Section IV.

IV. JOINT ITERATIVE DETECTION AND SCL DECODING RECEIVER
To improve the PC-SCMA system's performance, we choose the SCL algorithm as the decoding algorithm of polar code. The joint iterative receivers are designed based on the MPA detector and SCL decoder.

A. BASIC JOINT ITERATIVE DETECTION AND SCL DECODING
As described in [23], in the JIDD receiver, the extrinsic messages are exchanged between the MPA detector and the SCAN decoder. In particular, the MPA detector has no inner iteration. The information update on the VNs depends completely on the extrinsic messages fed back by the SCAN decoder through the outer iteration. Therefore, the performance of the JIDD based PC-SCMA system is limited by the SCAN decoder, and is even worse than that of the LDPC-SCMA system at a low code rate. This section proposes a joint iterative detection and decoding algorithm based on the SCL decoder. To distinguish it from the improved algorithm presented later, we call it the basic joint iterative detection and SCL decoding algorithm.
To better describe the algorithm, we define some notations as follows:
- $M_{g_i \to q_j}$: The messages passing from the $i$-th FN $g_i$ to the $j$-th VN $q_j$.
- $M_{q_j \to g_i}$: The messages passing from the $j$-th VN $q_j$ to the $i$-th FN $g_i$.
- $\{Q_j\}$: The set of FNs that connect to VN $q_j$.
- $\{G_i\}$: The set of VNs that connect to FN $g_i$.
- $\{Q_j \backslash i\}$: The set $\{Q_j\}$ with FN $g_i$ excluded.
- $\{G_i \backslash j\}$: The set $\{G_i\}$ with VN $q_j$ excluded.
- $P(s_l^v)$: The prior information of the $l$-th SCMA codeword for user $v$.
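The neighbor sets in this notation follow directly from the mapping matrix, as the sketch below shows. The 4×6 matrix used here is an assumption: it is a commonly used example for 6 users on 4 resources and is not necessarily the matrix in (3). Indices are zero-based, and the helper names are hypothetical.

```python
# Hypothetical 4 x 6 mapping matrix: rows are resources (FNs), columns are users (VNs).
F = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]


def neighbor_sets(mapping):
    """Return (Q, G): Q[j] lists the FNs of VN j, G[i] lists the VNs of FN i."""
    num_fn, num_vn = len(mapping), len(mapping[0])
    q_sets = {j: [i for i in range(num_fn) if mapping[i][j]] for j in range(num_vn)}
    g_sets = {i: [j for j in range(num_vn) if mapping[i][j]] for i in range(num_fn)}
    return q_sets, g_sets


def exclude(neighbors, node):
    """Return the neighbor set with one node removed, as in Q_j \\ i or G_i \\ j."""
    return [x for x in neighbors if x != node]


Q, G = neighbor_sets(F)
overloading = len(F[0]) / len(F)  # lambda = V / J

print("FNs connected to VN 0:", Q[0])
print("VNs connected to FN 0:", G[0])
print("G_0 without VN 1:", exclude(G[0], 1))
print("overloading factor:", overloading)
```

With this matrix every user occupies two resources and every resource carries three users, which is the sparsity that keeps the MPA update sums short.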

1) FUNCTION NODES OF MPA DETECTOR UPDATE PROCESS
The FN $g_i$ of the MPA detector updates its information and passes it to the connecting VNs when the receiver receives the signal $y_1, \ldots, y_l, \ldots, y_L$ from the channel. The messages passing from the $i$-th FN $g_i$ to the $j$-th VN $q_j$ can be expressed as [equation missing]. Then, we can calculate the MPA detector's bit-level extrinsic messages by [equation missing], where $\{s_l^v \mid q_m^v = x\}$ is the set of SCMA codewords whose elements satisfy the mapping relationship [equation missing].

2) PRIORI INFORMATION OF MPA DETECTOR UPDATE PROCESS
When the MPA detector's extrinsic messages $z_v$ are input into the polar decoder, the polar decoder performs decoding using the CA-SCL algorithm. Because the CA-SCL algorithm is hard-output, the polar decoder needs to reconstruct the soft information and then feed it back to the MPA detector as the prior information. The detailed process is as follows.
After the decoding of the $N$-th bit, the SCL decoder of user $v$ outputs the most likely paths $\hat{a}_l^v$ and the corresponding path metrics $P_l^v$, $1 \le l \le L_o$. For each candidate path $l$, the corresponding codeword can be obtained by re-encoding the path, $\hat{c}_l^v = \hat{a}_l^v G_N$. We use the Bayes rule to calculate the likelihood information of each codeword bit. Firstly, the path metric $P_l^v$ of each candidate path is normalized by [equation missing]. Then, the probability of the bit $c_i^v$ can be written as [equation missing]. Thus, the extrinsic messages of the SCL decoder $L_e^v = \{L_{e,polar}^{v,1}, \ldots, L_{e,polar}^{v,N}\}$ can be calculated by [equation missing]. Assume that the $l$-th codeword is selected as the decoding result. In other words, it passes the CRC detection and has the highest reliability. We can correct $L_{e,polar}^{v,i}$ by [equation missing].
The above method is named the Bayes construction (BC) algorithm. It is obviously applicable to the CA-polar code. However, the list size and the CRC detection results determine its accuracy. When the list size is very small and the check fails, it will provide an erroneous extrinsic message. On the other hand, for the DCA-polar code, the full candidate paths can't be obtained when the CRC check fails, and the above method can't be used. Therefore, we propose a hybrid construction (HC) algorithm to construct the extrinsic messages of the SCL decoder, as shown in Fig. 5. When the CRC check fails, the SCAN algorithm is used to decode and construct the extrinsic message. For the details of the SCAN algorithm and the extrinsic message calculation, we refer to [18] and [23], respectively.
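The idea of building soft output from a hard-output list can be sketched as below. This is not the paper's BC algorithm: it is a generic list-based construction under assumptions that are not confirmed by the text above. Specifically, path probabilities are assumed proportional to $e^{-P_l}$ (the metric behaves like a negative log-likelihood), the extrinsic LLR is assumed to be the list posterior LLR minus the prior LLR, and LLRs are clipped to a hypothetical bound `clip` when every list entry agrees on a bit. The CRC-based correction step and the SCAN fallback of the HC algorithm are not modeled.

```python
import math


def list_extrinsic_llrs(codewords, metrics, prior_llrs, clip=20.0):
    """Extrinsic LLRs from an SCL list, assuming path probability ~ exp(-metric)."""
    best = min(metrics)
    weights = [math.exp(-(m - best)) for m in metrics]  # shift for numerical stability
    total = sum(weights)
    weights = [w / total for w in weights]              # normalized path probabilities

    extrinsic = []
    for i, prior in enumerate(prior_llrs):
        p0 = sum(w for w, cw in zip(weights, codewords) if cw[i] == 0)
        p1 = sum(w for w, cw in zip(weights, codewords) if cw[i] == 1)
        if p1 <= 0.0:
            posterior = clip        # every candidate says 0
        elif p0 <= 0.0:
            posterior = -clip       # every candidate says 1
        else:
            posterior = max(-clip, min(clip, math.log(p0 / p1)))
        extrinsic.append(posterior - prior)  # remove what the detector already knew
    return extrinsic


# Two candidate codewords; the first is three times as likely as the second.
llrs = list_extrinsic_llrs(
    codewords=[[0, 0], [0, 1]],
    metrics=[0.0, math.log(3.0)],
    prior_llrs=[0.5, 0.0],
)
print(llrs)
```

The clipping case illustrates the accuracy concern raised above: with a very small list, most bits are unanimous across candidates, so the construction returns saturated values that are only trustworthy when the CRC confirms the decision.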
The extrinsic message $L_e^v$ of user $v$ is interleaved to $L_a^v = \{L_a^{v,1}, \ldots, L_a^{v,N}\}$ and transformed into the probability domain. Finally, it is re-mapped to symbol-level messages as the a priori information of the MPA detector.

3) VARIABLE NODES OF MPA DETECTOR UPDATE PROCESS
The VNs update their information when they receive the prior information $P(s_l^v)$ and then send it to the connecting FNs. The message can be updated by [equation (16) missing]. After this processing, one iteration between the MPA detector and the polar decoder has been completed. Therefore, we can summarize the basic joint iterative detection and SCL decoding algorithm as Algorithm 1.

B. CRC AIDED JOINT ITERATIVE DETECTION AND SCL DECODING
Although the CA-SCL algorithm provides better performance than the SCAN algorithm, it also increases decoding complexity. The complexity of the SCL decoder is proportional to the list size. In this section, we focus on reducing the complexity of the iterative receiver.
In fact, at high $E_b/N_0$, we observed that for most of the received blocks, the SCL decoder with a minimal list size could successfully decode the information bits, and very few blocks need a large list size for successful decoding. Therefore, a simple way of reducing complexity is to keep each user's list size as small as possible. A minimum list size $L_{min}$ is given to each user at the beginning of the iteration and is maintained in subsequent iterations if the user is decoded correctly. In other words, only those users who fail to decode increase the list size of the decoder in the next iteration. The block error rate is dominated by the probability that the correct path isn't in the list, so increasing the list size in the next iteration essentially increases the probability that the correct path is in the list. Additionally, a maximum list size $L_{max}$ is given to avoid excessive complexity. The CA-SCL decoding already provides the result of the check without additional operations. The decoding complexity can be further reduced using the DCA-polar code owing to early termination decoding. CAJIDS and BJIDS have the same function node and variable node processes; only the a priori information update differs, as given in Algorithm 2. Note that if early termination is activated in the last iteration, we take the result of SCAN decoding as the output.
Algorithm 1 Basic Joint Iterative Detection and SCL Decoding
Input: signal $y_1, \ldots, y_l, \ldots, y_L$ from the channel, maximum number of iterations $I_{max}$
Output: the decoding decision $\hat{u} = \{\hat{u}_1, \ldots, \hat{u}_V\}$
Initialize: $M_{q_j \to g_i} = 1/M$
for iter_num = 1 → $I_{max}$ do
    Update the function nodes:
    for l = 1 → L do
        for v = 1 → V do
            Update the FN messages and compute the MPA extrinsic messages $L_{e,MPA}^v$.
        end for
    end for
    for v = 1 → V do \\ Update the Priori Information
        De-interleave $L_{e,MPA}^v$ to $z_v$, and input it to the polar decoder.
        Run the polar decoder by the SCL algorithm, output the candidate sequences $\hat{a}_l^v$ and corresponding path metrics $P_l^v$, $1 \le l \le L_o$.
        Construct the extrinsic message $L_e^v$.
        Interleave $L_e^v$ to $L_a^v$.
    end for
    for l = 1 → L do
        for v = 1 → V do
            Transform the priori information $L_a^v$ into the probability domain by (15).
            Update the information of variable nodes by (16).
        end for
    end for
end for
Get the decoding decision $\hat{u} = \{\hat{u}_1, \ldots, \hat{u}_V\}$

According to Algorithm 2, the maximum average decoding complexity per user in the $i$-th iteration can be approximately represented by $O\left(\min\left(L_{max}, 2^{i-1} L_{min}\right) N \log_2 N\right)$. Actually, with the improvement of channel conditions, the average complexity approaches $O\left(L_{min} N \log_2 N\right)$. However, the effect of early termination is ignored in the approximate formula, and the decoding complexity of the system can only be estimated roughly. In the next section, we analyze in detail the impact of the proposed algorithms on decoding complexity and latency.
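The list-size schedule behind the bound $\min(L_{max}, 2^{i-1}L_{min})\,N\log_2 N$ can be sketched as follows. The sketch assumes that a user's list size doubles after each failed iteration and is capped at $L_{max}$, which is what the factor $2^{i-1}$ suggests. The exact update rule of Algorithm 2 is not recoverable here, so treat `next_list_size` as a hypothetical stand-in.

```python
import math


def next_list_size(current, decoding_failed, l_max):
    """Double the list size after a failure (assumed rule), capped at l_max."""
    return min(2 * current, l_max) if decoding_failed else current


def max_complexity(iteration, l_min, l_max, n):
    """Evaluate min(L_max, 2^(i-1) * L_min) * N * log2(N) for iteration i = 1, 2, ..."""
    return min(l_max, 2 ** (iteration - 1) * l_min) * n * math.log2(n)


# Worst case for one user with (L_min, L_max) = (4, 32): decoding fails four times.
sizes = []
size = 4
for failed in [True, True, True, True, False]:
    sizes.append(size)
    size = next_list_size(size, failed, 32)

print("list sizes per iteration:", sizes)
print("bound in iteration 1:", max_complexity(1, 4, 32, 256))
print("bound in iteration 5:", max_complexity(5, 4, 32, 256))
```

A user that decodes correctly in the first iteration stays at $L_{min}$ for all five iterations, which is why the average complexity approaches the $L_{min}$ term as the channel improves.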

V. DECODING LATENCY AND COMPLEXITY ANALYSIS
This section investigates the influence of the BJIDS/CAJIDS receiver on the decoding latency and complexity of the SCL decoder. We develop a series of simple, approximate relationships, similar to those in [32], to provide an intuitive understanding of decoding latency and complexity. Additionally, the complexity and latency of building extrinsic messages are negligible compared to those of the SCL decoding.

A. DECODING LATENCY
Algorithm 2 CRC Aided Joint Iterative Detection and SCL Decoding
Input: signal $y_1, \ldots, y_l, \ldots, y_L$ from the channel, maximum number of iterations $I_{max}$
Output: the decoding decision $\hat{u} = \{\hat{u}_1, \ldots, \hat{u}_V\}$
Initialize: $M_{q_j \to g_i} = 1/M$, $L_v = L_{min}$
for iter_num = 1 → $I_{max}$ do
    Update the function nodes.
    for v = 1 → V do \\ Update the Priori Information
        De-interleave $L_{e,MPA}^v$ to $z_v$, and input it into the polar decoder.
        Run the polar decoder by the SCL algorithm, output the candidate sequences $\hat{a}_l^v$ and corresponding path metrics $P_l^v$, $1 \le l \le L_v$.
        if early termination is activated then
            Calculate the extrinsic message $L_e^v$ by the SCAN decoder.
            Modify the list size by [equation missing].
        else
            Calculate the extrinsic message $L_e^v$ by (10) to (14).
        end if
        Interleave $L_e^v$ to $L_a^v$.
    end for
    Update the information of variable nodes.
end for
Get the decoding decision $\hat{u} = \{\hat{u}_1, \ldots, \hat{u}_V\}$

Since the latency of SCL decoding is based on that of SC decoding, we discuss the latency of SC decoding first. For an SC decoder, the detailed complexity and latency are depicted in Fig. 6. It is seen that the total number of clock cycles is $2N - 2$, and the number of decoding clock cycles used after the $i$-th bit is decoded is approximately $2i - 2$ for variable $N$. The decoding latency increases linearly with the number of decoded bits in an SC decoder. For an SCL decoder, the latency is approximated by the sum of the $2N - 2$ cycles and the sorting latency for the non-frozen bits. It is assumed that all the candidate paths are decoded in parallel and that each sorting operation takes a single clock cycle. Therefore, the total decoding latency $\Lambda_{scl}(N, K)$ is $2N - 2 + K$, with $K$ clock cycles for the sorting operations of the $K$ non-frozen bits.
For the $t$-th DCA-polar decoding, with list size $L_t$, suppose the early termination happens at index $P_t$. The decoding latency can be computed by

$$\Lambda_{scl}(P_t, K - \kappa_t) = 2P_t - 2 + K - \kappa_t,$$

where $\kappa_t$ stands for the number of non-frozen bits that aren't decoded after early termination. Denote $0 < \Delta_t = N - P_t < N$ as the number of bits that aren't decoded after early termination. Therefore, the latency saving is expressed as

$$\Lambda_{scl}(N, K) - \Lambda_{scl}(P_t, K - \kappa_t) = 2\Delta_t + \kappa_t.$$

So, the latency gain can be defined as [equation missing], where $T$ is the total number of decodings, $\mathcal{D} = \{\Delta_t \mid 1 \le t \le T\}$ is the set of numbers of un-decoded frozen/non-frozen bits, and $\mathcal{K} = \{\kappa_t \mid 1 \le t \le T\}$ is the set of numbers of un-decoded non-frozen bits. On the surface, the latency is independent of the list size. However, from the early termination decoding algorithm, the larger the list size $L_t$, the smaller the probability of early termination, and the larger the corresponding latency $\Lambda_{scl}(P_t, K - \kappa_t)$. Therefore, the latency gain is constrained by the list size; in other words, it decreases as the list size increases.
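The latency relations above reduce to a few lines of arithmetic. The sketch below uses the stated model of $2N - 2 + K$ clock cycles for a full SCL decoding. The numbers $N = 256$, $K = 93$, the termination index, and the aggregate `average_relative_saving` are illustrative assumptions, since the exact definition of the latency gain is not reproduced here.

```python
def scl_latency(n, k):
    """Clock cycles of an SCL decoding: 2N - 2 for SC plus one sorting cycle per non-frozen bit."""
    return 2 * n - 2 + k


def early_termination_saving(n, k, stop_index, undecoded_info):
    """Cycles saved when decoding stops at stop_index with undecoded_info non-frozen bits left."""
    full = scl_latency(n, k)
    partial = scl_latency(stop_index, k - undecoded_info)
    return full - partial


def average_relative_saving(n, k, events):
    """Assumed aggregate: total saved cycles over total cycles without early termination."""
    saved = sum(early_termination_saving(n, k, p, kappa) for p, kappa in events)
    return saved / (len(events) * scl_latency(n, k))


# Hypothetical example: N = 256, K = 93, one decoding stops at bit 200 with 30 bits left.
saving = early_termination_saving(256, 93, stop_index=200, undecoded_info=30)
print("saved cycles:", saving)
print("relative saving over two decodings:",
      average_relative_saving(256, 93, [(200, 30), (256, 0)]))
```

The saved cycles equal $2\Delta_t + \kappa_t$ with $\Delta_t = 56$ and $\kappa_t = 30$, and a decoding that runs to completion contributes no saving.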
The decoding latency gain over the AWGN channel is given in Fig. 7. Assume 6 users are multiplexed over 4 orthogonal resources, so the overloading factor is 150%. The mapping matrix has already been given in (3), and the corresponding SCMA codebook is designed according to [33]. The polar code is constructed by the Bhattacharyya parameter bound method [13]. The channel code is the DCA-polar code, and the CRC-8 with the generator polynomial $g(x) = x^8 + x^2 + x + 1$ is used. Additionally, the CRC interleaver pattern and rate-matching pattern are obtained according to [31] and [19]. The number of outer-loop iterations is set to 5. The parametric pair $(L_{min}, L_{max})$ refers to the minimum and maximum list sizes of the CAJIDS, and the list size of BJIDS is set to 16.
As shown in Fig. 7, the PC-SCMA system obtains different latency gains using the BJIDS and CAJIDS receivers. For the BJIDS receiver, the latency gain is only provided by the early termination of the DCA-polar code. In addition to early termination, the variable list size provides additional latency gain to the CAJIDS receiver. Thus, the CAJIDS receiver can get more latency gain than the BJIDS receiver. In the case of (2,16), the CAJIDS achieves the most latency gain. As the list size increases, the latency gain decreases. The CAJIDS(4,16) and CAJIDS(4,32) have similar latency gains over most of the $E_b/N_0$ range because they share the same $L_{min}$. Therefore, we consider the latency gain to be approximately determined by $L_{min}$. In other words, the smaller the $L_{min}$, the larger the gain.

B. DECODING COMPLEXITY
For an SC decoder, as shown in Fig. 6, the total number of $f$ functions and $g$ functions is $N \log_2 N$, with half for the $f$ function and half for the $g$ function. Therefore, the total complexity is

$$C_{sc}(N) = \frac{N \log_2 N}{2}\left(C_f + C_g\right),$$

where $C_f$ and $C_g$ are the complexity of the $f$ function and the $g$ function, respectively. According to the analysis in contribution [34], the complexity of one $f$ function is 3 additions and one comparison, where a comparison is assumed to have the same complexity as an addition. The complexity of one $g$ function is 2 additions. Thus, we can get $C_f = 4$ and $C_g = 2$.
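As a worked check of this count, the sketch below evaluates the SC complexity with $C_f = 4$ and $C_g = 2$. The function name is hypothetical, and the values for $N = 256$ are simply the arithmetic that follows from the statement above.

```python
C_F = 4  # one f function: 3 additions + 1 comparison (counted as an addition)
C_G = 2  # one g function: 2 additions


def sc_complexity(n):
    """Total additions of an SC decoder: N*log2(N) node operations, half f and half g."""
    if n < 2 or n & (n - 1):
        raise ValueError("block length must be a power of two, at least 2")
    stages = n.bit_length() - 1      # exact log2 for a power of two
    operations = n * stages          # total number of f and g evaluations
    return (operations // 2) * C_F + (operations // 2) * C_G


for n in (2, 256, 1024):
    print(n, sc_complexity(n))
```

With these constants the expression simplifies to $3N\log_2 N$, so doubling the block length slightly more than doubles the SC decoding cost.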
For an SCL decoder with list size $L_t$, the decoding complexity is the sum of $L_t C_{sc}(N)$ and the complexity of sorting for the non-frozen bits. Typically, the sorting complexity grows exponentially with the number of path metrics to sort. For a list size of $L_t$, each decoding path is extended by 2 possibilities. Technically, it would require a sort module with $2L_t$ input path metrics. The complexity for sorting is given by [equation missing]. Therefore, the total decoding complexity of the SCL decoder is [equation missing].
Suppose $C_{sc}^s(\Delta_t) = \omega_f C_f + \omega_g C_g$ stands for the saved decoding complexity after the $P_t$-th bit is decoded in the SC decoder, where $\omega_f$ and $\omega_g$ are the numbers of $f$ functions and $g$ functions saved. From Fig. 6, when $\Delta_t$ is even, the saving can be computed using the recursive relation

$$C_{sc}^s(\Delta_t) = C_{sc}^s\left(2^n\right) + C_{sc}^s\left(\Delta_t - 2^n\right), \quad \Delta_t \ne 2^n, \tag{25}$$

where $n = \lfloor \log_2 \Delta_t \rfloor$. When $\Delta_t$ is odd, $C_{sc}^s(\Delta_t)$ can be calculated by [equation missing]. Thus, for an SCL decoder with list size $L_t$, the saved decoding complexity is expressed as [equation missing]. Based on the above discussion, the decoding complexity of DCA-polar is given as [equation (28) missing]. Therefore, the complexity gain can be defined as [equation (29) missing], where $\mathcal{L} = \{L_t \mid 1 \le t \le T\}$ is the set of list sizes and $L_b$ is the list size of the benchmark decoder used for comparison. In this subsection, the list size of the benchmark decoder is set to $L_b = 16$.
Fig. 8 shows the average list size of the BJIDS receiver and the CAJIDS receiver. With the increase of $E_b/N_0$, the average list size of the CAJIDS receiver decreases gradually. The approximate complexity can be calculated from the average list size $\bar{L}$. For example, the $\bar{L}$ of the CAJIDS(4,32) is about 6.69 at $E_b/N_0 = 4$ dB, where the code rate is $R = 1/2$. Compared with the BJIDS receiver, the approximate complexity gain of the CAJIDS(4,32) is about $1 - \bar{L}/L_b = 58.2\%$. However, this approximation ignores the complexity saving of early termination and of sorting for the non-frozen bits. Fig. 9 shows the complexity gain, which is estimated by (29).
As shown in the figure, for the BJIDS receiver, the gain from early termination is less than 5%. For CAJIDS(4,32), the complexity increases by 10% when the channel condition is particularly bad. In such a case, the list size grows over the iterations and becomes larger than that of the BJIDS receiver. However, as the channel condition improves, the average list size of the CAJIDS receiver decreases gradually and more complexity gain is obtained. In most cases, the decoding complexity gain of the CAJIDS receiver is much larger than that of the BJIDS receiver. For example, the complexity gain at $E_b/N_0 = 4$ dB is around 61.9%. This result is close to the approximation obtained from Fig. 8 but more accurate. Additionally, the maximum complexity gain obtained by CAJIDS is determined by $L_{\min}$, while the minimum gain depends on the maximum average list size. In general, the smaller the list size, the larger the gain. Thus, CAJIDS(2,16) achieves the largest complexity gain.
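The approximate gain quoted above for CAJIDS(4,32) follows directly from the average list size read from Fig. 8 and the benchmark list size $L_b = 16$. A minimal sketch of that arithmetic is given below; the function name `approx_complexity_gain` is a hypothetical helper introduced here for illustration, and, like the approximation in the text, it ignores early termination and sorting.

```python
def approx_complexity_gain(avg_list_size: float, benchmark_list_size: int) -> float:
    """Approximate complexity gain 1 - L_avg / L_b.

    Hypothetical helper: it only compares list sizes and ignores the
    savings from early termination and the cost of path-metric sorting.
    """
    if benchmark_list_size <= 0:
        raise ValueError("benchmark list size must be positive")
    return 1.0 - avg_list_size / benchmark_list_size


if __name__ == "__main__":
    # Values from the text: average list size 6.69 at Eb/N0 = 4 dB, L_b = 16.
    gain = approx_complexity_gain(6.69, 16)
    print(f"{gain:.1%}")  # prints 58.2%
```

A negative return value corresponds to the bad-channel case discussed above, in which the average list size exceeds that of the benchmark decoder.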

VI. ERROR RATE PERFORMANCE EVALUATION
In this section, simulation results are provided to evaluate the BLER and bit error rate (BER) performance of the uplink SCMA system over the AWGN channel and the Rayleigh fading channel. The SCMA and polar codes are configured as in Section V. The LDPC encoder and rate-matching algorithm of the 3GPP standard [19] are employed, and the LDPC decoding algorithm is log-BP with 30 inner-loop iterations. In the receiver, MPA detection with one inner-loop iteration is employed for MUD. The number of outer-loop iterations between the MPA detector and the channel decoder is set to 5.
In Fig. 10, we compare the BLER performance of CA-polar with the BC and HC algorithms in the BJIDS scheme over the Rayleigh fading channel. The code rate is $R = 1/3$, and the list size of the SCL decoder is set to 4 or 16. Compared to the BC algorithm, the HC algorithm achieves about 0.5 dB and 0.1 dB performance gain at BLER $= 10^{-3}$ with list sizes 4 and 16, respectively. The gain arises because, when the CRC check fails, the HC algorithm can correct the extrinsic messages by SCAN decoding. The larger the list size, the closer the extrinsic information constructed by the two algorithms, and the smaller the gain. In other words, when the list size is small, the HC algorithm is needed to achieve better performance. Therefore, in the following simulations, we use the HC algorithm to construct the extrinsic messages of the SCL decoder. On the other hand, the SCL decoder, which is closest to ML decoding, provides better performance than the SCAN decoder, so the performance of the system is improved remarkably. As shown in Fig. 10, regardless of which extrinsic message construction algorithm is used, BJIDS performs better than the JIDD in [23].
In Fig. 11, we compare the BLER of the BJIDS and CAJIDS receivers over the AWGN channel. Compared with the BJIDS receiver, CAJIDS(4,32) has only 0.1 dB performance loss. The performance loss of the other CAJIDS configurations is about 0.25-0.35 dB. Combined with the discussion in Section V, CAJIDS(4,32) achieves a good tradeoff among error rate performance, latency gain, and complexity gain. Therefore, in the following simulations, we use the configuration $(L_{\min}, L_{\max}) = (4, 32)$ for the CAJIDS receiver. Fig. 12 shows the BLER performance of BJIDS and CAJIDS over the Rayleigh fading channel. It can be seen that there is almost no performance loss. Thus, we conclude that the CAJIDS receiver preserves the BJIDS receiver's performance while reducing the decoding latency and complexity.

Fig. 13 and Fig. 14 show the BLER and BER performance comparison between CAJIDS and JIDD with different code rates over the AWGN channel. Additionally, we apply the LDPC code to the JIDD scheme (labeled LDPC-SCMA in the legend) and include it in the comparison. From Fig. 13, at high code rates, the JIDD-based PC-SCMA has better BLER performance than LDPC-SCMA over the AWGN channel, but LDPC-SCMA outperforms the JIDD-based PC-SCMA by 0.7 dB at code rate $R = 1/3$. Besides, compared to JIDD, the proposed CAJIDS scheme obtains 0.4 dB to 1.5 dB gain across the code rate configurations. Importantly, compared with LDPC-SCMA, the CAJIDS scheme also obtains 0.8 dB gain at code rate $R = 1/3$. A similar result holds for the BER performance, as shown in Fig. 14.
For the Rayleigh fading channel, the BLER and BER results are shown in Fig. 15 and Fig. 16, respectively. As over the AWGN channel, for the PC-SCMA system, CAJIDS outperforms JIDD by 0.8 dB to 1.9 dB across the code rate configurations. In particular, at code rate $R = 1/3$, the performance gain of CAJIDS is about 1.2 dB and 1.9 dB compared with LDPC-SCMA and JIDD, respectively.

Fig. 17 shows the BER performance comparison among CAJIDS, JIDD, and the S-C-JIDD proposed in [24]. The polar code is constructed by the Gaussian approximation (GA) method. In addition, CRC-10 with the generator polynomial $g(x) = x^{10} + x^{9} + x^{8} + x^{7} + x^{4} + x^{2} + 1$ is used in the case of $N = 1024$. From Fig. 17, the CAJIDS receiver outperforms S-C-JIDD by about 0.5 dB when $N = 256$ and $R = 1/2$. Fig. 17 also compares the BER performance of CAJIDS and S-C-JIDD when the polar code is extended to a lower code rate, where CAJIDS still outperforms S-C-JIDD by about 0.75 dB. In addition, S-C-JIDD performs better than JIDD at low $E_b/N_0$. As $E_b/N_0$ increases, the performance of the two receivers converges.
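To make the CRC-10 generator polynomial quoted above concrete, the sketch below computes and checks a 10-bit CRC by plain binary long division over GF(2). It assumes a conventional systematic CRC with the parity bits appended after the message, zero initial register, and no bit reflection; this is an illustrative assumption and does not reproduce the distributed CRC placement of the DCA-polar code. The names `CRC10_POLY`, `crc10_remainder`, and `crc10_check` are hypothetical.

```python
CRC10_POLY = 0b11110010101  # x^10 + x^9 + x^8 + x^7 + x^4 + x^2 + 1
CRC10_LEN = 10


def _divide(bits):
    """Return the remainder of bits(x) divided by g(x) over GF(2)."""
    reg = 0
    for b in bits:
        reg = (reg << 1) | (b & 1)
        if reg >> CRC10_LEN:        # degree reached 10: subtract g(x)
            reg ^= CRC10_POLY
    return reg


def crc10_remainder(message):
    """CRC bits (MSB first) for a message given as a list of 0/1 values."""
    reg = _divide(list(message) + [0] * CRC10_LEN)  # message(x) * x^10 mod g(x)
    return [(reg >> i) & 1 for i in range(CRC10_LEN - 1, -1, -1)]


def crc10_check(codeword):
    """True if the codeword (message followed by CRC bits) divides by g(x)."""
    return _divide(codeword) == 0


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    codeword = msg + crc10_remainder(msg)
    print(crc10_check(codeword))    # True: a valid codeword passes the check
    codeword[3] ^= 1                # flip one bit
    print(crc10_check(codeword))    # False: a single-bit error is detected
```

In a list decoder, a check of this kind is what selects among candidate paths or triggers early termination; how the check is interleaved with decoding is specific to the receiver design and is not modeled here.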

VII. CONCLUSION
In this paper, two iterative receivers, BJIDS and CAJIDS, are designed for the PC-SCMA system. The extrinsic message construction algorithms of the SCL decoder are designed using the Bayes rule and the SCAN algorithm. Simulation results show that the CAJIDS receiver significantly improves the error rate performance of the PC-SCMA system while keeping the decoding latency and complexity low. The CAJIDS receiver thus offers advantages in performance, latency, and complexity. However, there is still room for improvement, for example by combining PC-SCMA with spatially coupled technology, high-order modulation, and probabilistic shaping.

APPENDIX NOMENCLATURE
The acronyms used in this paper are summarized in Tab. 1.