A Very Fast Joint Detection for Polar-Coded SCMA

We propose a very fast-convergence joint iterative detection and decoding (JIDD) scheme for channel-coded sparse code multiple access (SCMA). In the conventional JIDD, all users’ channel decoding iterations are performed in parallel after all users’ variable nodes in the SCMA factor graph are updated. The proposed JIDD scheme, however, slices all channel decoding iterations into per-user channel decoding, and inserts them deeply into the factor graph. By doing this, message enhancement by channel decoding immediately propagates to the connected users’ messages in the factor graph, even within one message passing algorithm iteration, while maintaining the same total computational complexity per JIDD iteration. Numerical results confirm that in various link scenarios, the proposed scheme requires only two or three iterations for BER convergence, while the conventional JIDD scheme requires more than six iterations. In downlink scenarios, the proposed scheme achieves an even faster convergence rate. Moreover, in an uplink scenario with perfect power control, the converged BER of the proposed scheme is significantly lower than that of the conventional scheme, and the proposed scheme requires only two iterations to reach the same BER level as the conventional scheme. Consequently, thanks to fast convergence, the proposed scheme dramatically reduces the overall computational complexity for achieving BER convergence. In addition, the fast convergence rate compensates for the multi-user detection latency issue, which is inherent in sequential algorithms, and the issue is further overcome by employing the group-wise sequential version of the proposed scheme.


I. INTRODUCTION
Sparse code multiple access (SCMA) is a non-orthogonal multiple access scheme that has attracted much attention in recent years. SCMA substantially improves spectral efficiency in linear sparse sequences through multi-dimensional shaping gain of codebooks [1]. In addition, a message passing algorithm (MPA) enables SCMA to achieve near-optimal multi-user detection (MUD) [2], [3]. However, there is a major challenge to be resolved in MPA-based detectors for more practical usages: poor MUD performance in low signal-to-noise ratio (SNR) regions due to high overloading factors [4]. Therefore, in order to overcome this issue, combining SCMA with channel coding is essential.
The associate editor coordinating the review of this manuscript and approving it for publication was Faissal El Bouanani.
As for improving MUD performance by combining SCMA with channel coding, iterative user detection and decoding schemes for channel-coded SCMA systems have been studied [5]-[8]. In [5], the authors proposed an iterative detection and decoding (IDD) scheme for turbo-coded SCMA systems. The IDD scheme, in which each outer iteration consists of two separate inner loop iterations for the SCMA detector and turbo decoder, can achieve considerable MUD performance improvement. However, this scheme requires many iterations, causing high computational complexity. To overcome this problem, a tighter combination of SCMA detection and channel decoding (termed joint IDD (JIDD) throughout this paper to distinguish it from IDD [5]) was recently proposed [6]-[8]. This structure utilizes the output extrinsic messages of the channel decoder in the middle of one cycle of the MPA process, and thus, MUD performance is significantly improved, and BER convergence is accelerated, compared to the IDD scheme in [5].
In particular, the JIDD scheme for Polar-coded SCMA systems shows an outstanding convergence rate compared to other channel-coded SCMA schemes, such as turbo-coded or LDPC-coded SCMA systems [8]. This is due to the fast convergence characteristics of Polar code [9]. In addition, Polar code is drawing attention for its lower complexity compared to other channel codes [10], [11], and it has already been adopted for downlink control channels in 5G new radio (NR) interfaces. Although the JIDD scheme improves the MUD performance of a Polar-coded SCMA system, the number of required iterations is still high (more than five) [8]. Therefore, it is very timely to study and improve Polar-coded SCMA systems, and in this paper, we mainly consider the Polar-coded SCMA system.
Meanwhile, in the context of fast-convergence SCMA without channel coding, a sequential MPA called a shuffled MPA (S-MPA) was proposed, where the two processes for message update (i.e., updating messages from function node (FN) to variable node (VN) and updating messages from VN to FN) are shuffled [12], [14]. This scheme makes updated messages immediately propagate to other message updates connected in a factor graph, even within one MPA iteration, and thus, fast convergence is achieved. However, because S-MPA-based SCMA improves only the convergence rate and not the converged BER level, channel coding needs to be combined for acceptable MUD in low SNR regions. The problem is that just combining S-MPA-based SCMA with channel coding or iterative channel decoding fails to conserve the benefits from the S-MPA, since the decoding iteration is performed separately from the S-MPA iteration, and thus, there is no synergy between the S-MPA and channel decoding. Moreover, the aforementioned JIDD cannot be applied to the S-MPA. This is because the JIDD scheme performs channel decoding iterations in between two one-directional parallel message updates in the factor graph (i.e., FN-to-VN-messages-only updates and VN-to-FN-messages-only updates), and this does not allow VN/FN-shuffled message updates during one joint iteration of MPA and channel decoding.
The main goal of this paper is to develop a new Polar-coded SCMA system achieving fast convergence as well as improved MUD performance. We propose a new JIDD, termed user-sequential JIDD (us-JIDD), which constructively combines the sequential message update idea with JIDD to maximize the synergy between sequential message update and JIDD in terms of fast convergence capabilities. To this end, we first need to fundamentally solve the aforementioned issue, i.e., the structural conflict between JIDD and sequential message updates. The proposed us-JIDD scheme resolves this issue by slicing all channel decoding iterations into per-user channel decoding and inserting them into the SCMA factor graph as deep as the message update unit. Specifically, each user's channel decoding iteration is performed right after the corresponding user's message is updated in each MPA iteration. In addition, due to this tightly shuffled channel decoding and MPA, careful consideration should be taken in combining a message updated from the MPA process and the enhanced message from the channel decoder.
The contributions of this paper are summarized as follows:
• We propose user-sequential JIDD, which constructively combines the conventional JIDD with the idea of sequential message updates in the MPA. So far, research on applying a user-sequential manner to channel-coded SCMA has not been reported. The proposed scheme can be extended to SCMA with other iterative channel codes, such as LDPC and turbo code.
• The proposed scheme brings a dramatic performance improvement to channel-coded SCMA. In various link scenarios, the proposed scheme requires only two or three iterations for BER convergence, and achieves an even faster convergence rate in downlink scenarios. On the other hand, the conventional JIDD-based SCMA scheme requires more than six iterations. Moreover, the proposed scheme ameliorates the multi-user detection latency issue, which is typically inherent in sequential algorithms.
• In light of its low complexity from fast convergence without the latency issue, the proposed us-JIDD scheme is expected to be used as an essential component of next-generation SCMA, rather than an option.
The remainder of the paper is structured as follows. In Sec. II, the channel-coded SCMA system model is presented. In Sec. III, we give an overview of the conventional JIDD scheme for the Polar-coded SCMA. In Sec. IV, we detail the user-sequential JIDD scheme for Polar-coded SCMA. In addition, we provide an analysis of the latency in JIDD versus the proposed us-JIDD scheme, and we provide a latency reduction method. In Sec. V, we provide an analysis of operational complexities of JIDD and the proposed us-JIDD scheme. Sec. VI presents simulated channel-coded BER as a function of the number of joint MPA iterations, and compares the convergence speeds of the JIDD scheme and the proposed us-JIDD. In addition, we compare computational complexity and latency for JIDD versus the proposed us-JIDD. Finally, in Sec. VII, concluding remarks are provided.
The simulation code for the JIDD-based SCMA and the proposed us-JIDD-based SCMA can be downloaded from [13]. The numerical results in this paper can be regenerated by using this code.

II. SYSTEM MODEL
Consider a Polar-coded SCMA system where $K$ users are multiplexed, each with a data bit sequence of length $m$. In the transmitter, the data bit sequence of user $k$ is Polar-encoded into the coded bit sequence $\mathbf{c}^{(k)} = [c_1^{(k)}, c_2^{(k)}, \ldots, c_N^{(k)}]$, where $N$ is the length of the coded bit sequence. Then, the Polar-coded bit sequence is interleaved and mapped to complex domain codewords by the SCMA encoder. A random interleaver is used between the Polar encoder and the SCMA encoder. The interleaved sequence is denoted by $\mathbf{b}^{(k)} = [b_1^{(k)}, b_2^{(k)}, \ldots, b_N^{(k)}]$. Assume that the constellation size of SCMA is $M$, and every $Q$ ($= \log_2 M$) bits of $\mathbf{b}^{(k)}$ are grouped and mapped to a $D$-dimensional complex codeword. Thus, the total number of SCMA codewords per frame, $L$, is $N/Q$, and the $l$-th SCMA codeword of user $k$ is denoted by $\mathbf{x}_l^{(k)}$.
In the SCMA system, $K$ users' codewords are multiplexed and spread over $D$ orthogonal resources and transmitted. SCMA codewords contain zero and nonzero elements, indicating the use of orthogonal resources. For more than $K$ active users in a cell, the users are grouped with $K$ members in each group, and separate sets of $D$ resources are exclusively allocated to each of the multiple user groups. We consider a widely used SCMA codeword set where $K = 6$, $D = 4$, and $M = 4$ (a diagram is shown in [4], [6]-[8], [12], [15], [16], [18] and [20]). Fig. 1 shows the corresponding factor graph with variable nodes (VNs) for 6 ($= K$) users and function nodes (FNs) for 4 ($= D$) resources, where $u_k$ denotes the $k$-th VN for user $k$, and $r_d$ denotes the $d$-th FN for the $d$-th resource.
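The received signal of the $l$-th SCMA codeword, denoted by $\mathbf{y}_l = [y_{l,1}, y_{l,2}, \ldots, y_{l,D}]^T$, is expressed as
$$\mathbf{y}_l = \sum_{k=1}^{K} \mathbf{H}_l^{(k)} \mathbf{x}_l^{(k)} + \mathbf{z}_l \tag{1}$$
where $\mathbf{H}_l^{(k)}$ is the channel matrix between the base station and user $k$ at the $l$-th SCMA codeword, and is expressed as follows [8]:
$$\mathbf{H}_l^{(k)} = \mathrm{diag}\big(h_{l,1}^{(k)}, h_{l,2}^{(k)}, \ldots, h_{l,D}^{(k)}\big) \tag{2}$$
where $h_{l,d}^{(k)}$ represents the channel gain for the $d$-th resource. The term $\mathbf{z}_l = [z_{l,1}, z_{l,2}, \ldots, z_{l,D}]^T$ is the AWGN vector, and $\mathbf{z}_l \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$, in which $\sigma^2$ is the variance of $z_{l,d}$. We consider a power-normalized SCMA codebook such that the total signal power per resource, i.e., $\sum_{k=1}^{K} \|\mathbf{x}_l^{(k)}\|^2 / D$, is equal to unity [26], and a power-normalized channel gain of 1. This signal model corresponds to the case when the data bit energy, $E_b$, is $\frac{1}{C_r} \times \frac{D}{KQ}$, where $C_r$ denotes the Polar code rate, $\frac{D}{KQ}$ is equal to the channel bit energy, and the noise density satisfies $N_0/2 = \sigma^2/2$. Consequently, for a simulation with a given $E_b/N_0$, we set
$$\sigma^2 = \frac{D}{C_r K Q} \left(\frac{E_b}{N_0}\right)^{-1}.$$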
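As a worked example of the noise setting described above, the following Python sketch computes $\sigma^2$ for a target $E_b/N_0$ using $E_b = \frac{1}{C_r} \cdot \frac{D}{KQ}$ and $N_0 = \sigma^2$. The function name `noise_variance` and the code rate of 0.5 in the usage lines are illustrative assumptions; they are not taken from the paper's parameter table.

```python
def noise_variance(ebn0_db, code_rate, K=6, Q=2, D=4):
    """Return sigma^2 (= N0) for a target Eb/N0 in dB.

    Assumes unit total signal power per resource, so the data bit energy is
    Eb = (1 / code_rate) * D / (K * Q), as stated in the system model.
    """
    eb = D / (code_rate * K * Q)            # data bit energy
    ebn0_linear = 10.0 ** (ebn0_db / 10.0)  # dB -> linear
    return eb / ebn0_linear                 # N0 = Eb / (Eb/N0), and sigma^2 = N0


if __name__ == "__main__":
    # Hypothetical rate-1/2 Polar code with K = 6, D = 4, M = 4 (Q = 2).
    for ebn0_db in (0.0, 6.0, 8.0):
        print(ebn0_db, noise_variance(ebn0_db, code_rate=0.5))
```

With these assumed values, $E_b/N_0 = 0$ dB gives $\sigma^2 = 4/(0.5 \cdot 6 \cdot 2) = 2/3$, and every additional 10 dB divides $\sigma^2$ by ten.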

III. THE CONVENTIONAL JIDD SCHEME
In this section, we introduce the core concept of the conventional JIDD scheme, which is essential to understanding our scheme in Section IV. Fig. 2a shows the structure of the conventional JIDD, which tightly combines SCMA detection and channel decoding [6]-[8]. Hereafter, this conventional JIDD is simply called JIDD. First, each MPA iteration is split into two one-directional parallel message updates in the factor graph, i.e., FN-to-VN-messages-only updates and VN-to-FN-messages-only updates. The FN-to-VN-messages-only updates correspond to upward message passing in Fig. 1, and the VN-to-FN-messages-only updates correspond to downward message passing. The key idea in JIDD is to perform channel decoding iterations in between these two one-directional message updates, as shown in Fig. 2a, where the left side grey box indicates FN-to-VN-messages-only updates, and the right side grey box indicates VN-to-FN-messages-only updates. By doing this, the messages enhanced by channel decoding can be used in the middle of each MPA iteration.

A. JOINT MPA-BASED DETECTION IN JIDD
The joint factor graph including the MPA and channel decoding can be illustrated as shown in Fig. 3. In each joint iteration, steps 1 to 5, described as follows, are performed sequentially and repeated until the BER converges.

1) STEP 1 (FN-TO-VN-MESSAGES-ONLY UPDATE)
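Let $\xi_k$ and $\zeta_d$ denote the set of FNs connected to VN $u_k$ and the set of VNs connected to FN $r_d$, respectively. In addition, let $\{\xi_k \setminus d\}$ and $\{\zeta_d \setminus k\}$ denote $\xi_k$ with $r_d$ omitted, and $\zeta_d$ with $u_k$ omitted, respectively. In the JIDD algorithm, the FN-to-VN-messages-only updates follow the conventional MPA [8]. Specifically, the message from FN $r_d$ to VN $u_k$ for complex codeword $\mathbf{x}_l^{(k)}$ at the $\tau$-th iteration, denoted by $I^{(\tau)}_{r_d \to u_k}(\mathbf{x}_l^{(k)})$, is calculated as follows [12]:
$$I^{(\tau)}_{r_d \to u_k}\big(\mathbf{x}_l^{(k)}\big) = \sum_{\mathbf{x}_l^{(i)}:\, i \in \zeta_d \setminus k} \frac{1}{\pi \sigma^2} \exp\!\left(-\frac{1}{\sigma^2} \Big| y_{l,d} - \sum_{i \in \zeta_d} h_{l,d}^{(i)} x_{l,d}^{(i)} \Big|^2 \right) \prod_{i \in \zeta_d \setminus k} I^{(\tau-1)}_{u_i \to r_d}\big(\mathbf{x}_l^{(i)}\big) \tag{3}$$
where $x_{l,d}^{(i)}$ is the element of $\mathbf{x}_l^{(i)}$ on the $d$-th resource, $I^{(\tau-1)}_{u_i \to r_d}(\mathbf{x}_l^{(i)})$ is the message from VN $u_i$ to FN $r_d$ updated at the $(\tau - 1)$-th iteration, and the initial value of $I^{(0)}_{u_i \to r_d}(\mathbf{x}_l^{(i)})$ is $1/M$.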
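The FN-to-VN update described above can be sketched in Python for a single resource and a single SCMA codeword index. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `fn_to_vn`, the toy per-resource constellation values, and the final normalization of each outgoing message are all hypothetical choices, and the Gaussian likelihood assumes complex noise with variance $\sigma^2$ as in the system model.

```python
import itertools
import math


def fn_to_vn(y_d, gains, entries, msgs_in, sigma2):
    """FN-to-VN messages at one function node (resource d).

    y_d     : complex received sample on resource d
    gains   : channel gains h of the d_f users connected to this FN
    entries : entries[i][m] = value user i puts on resource d for codeword m
    msgs_in : msgs_in[i][m] = VN-to-FN message of user i from the previous pass
    Returns one normalized message (list of M values) per connected user.
    """
    d_f, M = len(gains), len(entries[0])
    out = [[0.0] * M for _ in range(d_f)]
    # Enumerate every joint codeword hypothesis of the connected users.
    for combo in itertools.product(range(M), repeat=d_f):
        superposed = sum(gains[i] * entries[i][combo[i]] for i in range(d_f))
        lik = math.exp(-abs(y_d - superposed) ** 2 / sigma2) / (math.pi * sigma2)
        for k in range(d_f):
            weight = lik
            for i in range(d_f):
                if i != k:  # extrinsic rule: exclude the target user's own message
                    weight *= msgs_in[i][combo[i]]
            out[k][combo[k]] += weight
    # Normalization is an added convenience for numerical stability.
    return [[v / sum(row) for v in row] for row in out]


if __name__ == "__main__":
    base = [1, -1, 1j, -1j]                      # toy values, not a real codebook
    entries = [[s * v for v in base] for s in (1, 3, 9)]
    uniform = [[0.25] * 4 for _ in range(3)]     # uniform initial messages (1/M)
    y = entries[0][2] + entries[1][0] + entries[2][3]
    for row in fn_to_vn(y, [1, 1, 1], entries, uniform, sigma2=0.1):
        print([round(v, 4) for v in row])
```

The cost of the enumeration grows as $M^{d_f}$, which is why the sparsity of the factor graph (three users per resource in the considered codebook) keeps the MPA tractable.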

2) STEP 2 TO STEP 4 (MAPPING AND CHANNEL DECODING PROCESS)
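To perform a channel decoding iteration (between FN-to-VN-messages-only updates and VN-to-FN-messages-only updates, as shown in Fig. 2a), the FN-to-VN messages $I^{(\tau)}_{r_d \to u_k}(\mathbf{x}_l^{(k)})$ arriving at VN $u_k$ are combined into a symbol likelihood as follows [8]:
$$P_{\mathrm{SCMA}}\big(\mathbf{x}_l^{(k)}\big) = \prod_{d \in \xi_k} I^{(\tau)}_{r_d \to u_k}\big(\mathbf{x}_l^{(k)}\big).$$
Then, the likelihood for each hypothesis of bit $b^{(k)}_{(l-1)Q+q}$, i.e., $b^{(k)}_{(l-1)Q+q} = 0$ and $b^{(k)}_{(l-1)Q+q} = 1$, is calculated by the SCMA mapping function as follows [8]:
$$P\big(b^{(k)}_{(l-1)Q+q} = \nu\big) = \sum_{\mathbf{x} \in \chi:\ q\text{-th bit of } \mathbf{x} \text{ is } \nu} P_{\mathrm{SCMA}}(\mathbf{x}), \quad \nu \in \{0, 1\}.$$
As for the soft-input/soft-output (SISO) decoding algorithm required for the Polar decoding process in the JIDD scheme, the soft cancellation (SCAN) algorithm is used [8]. The SCAN algorithm converges quickly and requires less memory, compared to other SISO Polar decoding algorithms [8]. The input messages of the Polar decoders are in the form of a log-likelihood ratio (LLR), and thus, the bit likelihood of the SCMA detector is transformed into LLR form, denoted by $\mathrm{LLR}_{\mathrm{SCMA}}$, as follows [8]:
$$\mathrm{LLR}_{\mathrm{SCMA}}\big(b^{(k)}_{(l-1)Q+q}\big) = \ln \frac{P\big(b^{(k)}_{(l-1)Q+q} = 0\big)}{P\big(b^{(k)}_{(l-1)Q+q} = 1\big)}.$$
Then, the bit-extrinsic information sequence in LLR form, denoted by $\mathrm{LLR}^{B}_{\mathrm{SCMA}}(\mathbf{b}^{(k)})$, is de-interleaved to be used as a priori information for the Polar decoder. The de-interleaved bit-extrinsic information sequence, denoted by $\mathrm{LLR}^{B}_{a}(\mathbf{c}^{(k)})$, is expressed as
$$\mathrm{LLR}^{B}_{a}\big(\mathbf{c}^{(k)}\big) = \Pi^{-1}\big\{\mathrm{LLR}^{B}_{\mathrm{SCMA}}\big(\mathbf{b}^{(k)}\big)\big\}$$
where $\Pi^{-1}\{\cdot\}$ is the de-interleaving function.
We consider the factor graph structure of the Polar code shown in Fig. 1 from [22] or in Fig. 4 from [8], and we use the same notations and terms used in [8]. The Polar decoder based on the SCAN algorithm has two LLR messages: left messages and right messages. The left messages at stage 0 are initialized with the de-interleaved LLRs from the SCMA detector, and the right message $R^{(k)}_{\log_2(N),t}$ is initialized to 0 or $\infty$, according to the index of frozen bits. Then, the decoding iteration is performed. More details of the SCAN algorithm and the frozen bit index set are provided in [8], [22], and [25], and they can also be found in our simulation code at [13].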
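After the decoding iteration is performed (i.e., one iteration of Polar decoding), the output extrinsic messages of the Polar decoder, denoted by $\mathrm{LLR}^{B}_{\mathrm{Polar}}(c_n^{(k)})$, $1 \le n \le N$, are calculated as a weighted combination of the stage-0 left message (which corresponds to the message from the SCMA detector) and the stage-0 right message $R^{(k)}_{0,c_n^{(k)}}$ [8], where $\alpha$ is the weight factor for the Polar decoder in the JIDD scheme [8]. These output messages are interleaved, and the interleaved messages, denoted by $\mathrm{LLR}^{B}_{\mathrm{out}}(\mathbf{b}^{(k)})$, can be expressed as follows:
$$\mathrm{LLR}^{B}_{\mathrm{out}}\big(\mathbf{b}^{(k)}\big) = \Pi\big\{\mathrm{LLR}^{B}_{\mathrm{Polar}}\big(\mathbf{c}^{(k)}\big)\big\}$$
where $\Pi\{\cdot\}$ is the interleaving function. Next, $\mathrm{LLR}^{B}_{\mathrm{out}}(\mathbf{b}^{(k)})$ is transformed into the likelihood probabilities for the binary hypotheses, which can be expressed as follows [8]:
$$P\big(b^{(k)}_{n} = 0\big) = \frac{e^{\mathrm{LLR}^{B}_{\mathrm{out}}(b^{(k)}_{n})}}{1 + e^{\mathrm{LLR}^{B}_{\mathrm{out}}(b^{(k)}_{n})}}, \qquad P\big(b^{(k)}_{n} = 1\big) = \frac{1}{1 + e^{\mathrm{LLR}^{B}_{\mathrm{out}}(b^{(k)}_{n})}}.$$
Then, these likelihood probabilities are re-mapped into the symbol likelihood probability, denoted by $P(\mathbf{x}_l^{(k)})$, as follows:
$$P\big(\mathbf{x}_l^{(k)}\big) = \prod_{q=1}^{Q} P\big(b^{(k)}_{(l-1)Q+q} = b_{\mathbf{x}_l^{(k)}}(q)\big)$$
where $b_{\mathbf{x}_l^{(k)}}(q)$ denotes the $q$-th bit of the $Q$ data bits mapped to $\mathbf{x}_l^{(k)}$.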
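The two mapping steps around the Polar decoder (symbol likelihoods to bit LLRs before decoding, and bit LLRs back to symbol likelihoods after decoding) can be sketched as follows. The sketch assumes a natural binary labeling of the $M = 2^Q$ codewords and the LLR convention $\ln[P(b=0)/P(b=1)]$; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def bit_of(m, q, Q):
    """q-th bit (0-based, most significant first) of codeword index m."""
    return (m >> (Q - 1 - q)) & 1


def symbol_probs_to_bit_llrs(p_sym, Q):
    """Marginalize M = 2**Q symbol likelihoods into Q bit LLRs."""
    llrs = []
    for q in range(Q):
        p0 = sum(p for m, p in enumerate(p_sym) if bit_of(m, q, Q) == 0)
        p1 = sum(p for m, p in enumerate(p_sym) if bit_of(m, q, Q) == 1)
        llrs.append(math.log(p0 / p1))
    return llrs


def bit_llrs_to_symbol_probs(llrs):
    """Re-map Q bit LLRs into 2**Q symbol likelihoods (bits treated as independent)."""
    Q = len(llrs)
    p0 = [1.0 / (1.0 + math.exp(-llr)) for llr in llrs]  # equals e^L / (1 + e^L)
    probs = []
    for m in range(2 ** Q):
        p = 1.0
        for q in range(Q):
            p *= (1.0 - p0[q]) if bit_of(m, q, Q) else p0[q]
        probs.append(p)
    return probs


if __name__ == "__main__":
    decoder_llrs = [1.2, -0.7]                 # hypothetical decoder output for one codeword
    p_sym = bit_llrs_to_symbol_probs(decoder_llrs)
    print(p_sym, symbol_probs_to_bit_llrs(p_sym, Q=2))
```

Because the re-mapping builds a product distribution over the $Q$ bits, marginalizing it again returns the original LLRs; likelihoods coming from the SCMA detector are generally not of product form, so the reverse round trip is lossy.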

3) STEP 5 (VN-TO-FN-MESSAGES-ONLY UPDATE)
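In the VN-to-FN-message update process, the re-mapped symbol likelihood probability $P(\mathbf{x}_l^{(k)})$ is merged into the message update process. Specifically, $I^{(\tau)}_{u_k \to r_d}(\mathbf{x}_l^{(k)})$ is computed as
$$I^{(\tau)}_{u_k \to r_d}\big(\mathbf{x}_l^{(k)}\big) = \mathrm{normalize}\Big( P\big(\mathbf{x}_l^{(k)}\big) \prod_{i \in \xi_k \setminus d} I^{(\tau)}_{r_i \to u_k}\big(\mathbf{x}_l^{(k)}\big) \Big) \tag{14}$$
where $\mathrm{normalize}(\cdot)$ scales the message so that it sums to one over the $M$ codewords.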

4) SCMA CODEWORDS ESTIMATION
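The computation sequence from (3) to (14) is repeatedly carried out until the BER converges or the number of iterations reaches the parameter predetermined for the system, and then, the estimated codewords $\hat{\mathbf{x}}_l^{(k)}$ are computed as follows [3]:
$$\hat{\mathbf{x}}_l^{(k)} = \arg\max_{\mathbf{x} \in \chi} \prod_{d \in \xi_k} I^{(\tau)}_{r_d \to u_k}(\mathbf{x}) \tag{15}$$
where $\chi$ denotes the set of all possible codewords for user $k$.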

B. PARALLEL MESSAGE PASSING SCHEDULE OF JIDD
Note that in each step of the flow described in the previous subsection, all users' messages can be updated or decoded together and in parallel. By enclosing the operations that can be performed in parallel in each step of Fig. 3 with a pair of braces, { }, the list of message updates and decoding iterations performed in each JIDD iteration, denoted by $L_{\text{JIDD}}$, is expressed as follows:
$L_{\text{JIDD}}$: $\{\, r_d \to u_k \text{ for all } k,\ d \in \xi_k, \text{ and all } l\text{'s} \,\} \Rightarrow \{\, CD_1, CD_2, \cdots, CD_6 \,\} \Rightarrow \{\, u_k \to r_d \text{ for all } k,\ d \in \xi_k, \text{ and all } l\text{'s} \,\}$
where the first and the last (the third) pairs of braces correspond to step 1 and step 5, respectively, in Fig. 3. The expression $r_d \to u_k$ denotes the message update from FN $r_d$ to VN $u_k$, i.e., the calculation of $I^{(\tau)}_{r_d \to u_k}$, and the expression $u_k \to r_d$ denotes the message update from VN $u_k$ to FN $r_d$, i.e., the calculation of $I^{(\tau)}_{u_k \to r_d}$. Note that each of these expressions ($r_d \to u_k$ or $u_k \to r_d$) implies multiple separate updates for all SCMA codewords belonging to the current channel codeword. The second pair of braces, $\{ CD_1, CD_2, \cdots, CD_6 \}$, corresponds to step 3 in Fig. 3, and $CD_k$ denotes the channel decoding iteration for user $k$ with de-interleaving pre-performed. The two big arrows ($\Rightarrow$) in $L_{\text{JIDD}}$ represent the coded bit mapping and de-mapping process from or to the messages in the variable nodes, and thus, they correspond to step 2 and step 4, respectively, in Fig. 3.

IV. THE PROPOSED US-JIDD SCHEME
A. USER-SEQUENTIAL MESSAGE PASSING SCHEDULE FOR US-JIDD
In this subsection, we present the user-sequential message passing schedule in the proposed us-JIDD scheme for channel-coded SCMA systems. The main difference from JIDD is that the proposed us-JIDD scheme slices all channel decoding iterations into per-user channel decoding, inserting them into the SCMA factor graph as deep as the message update unit, as shown in Fig. 2b. To do this, we break the three processes in braces in $L_{\text{JIDD}}$ shown in Section III-B, and then fully rearrange the elements into a user-sequential schedule. In updating messages sequentially from user $u_1$ to user $u_6$, without loss of generality, the list of message updates and decoding iterations in each us-JIDD iteration, denoted by $L_{\text{us-JIDD}}$, is ordered as follows:
$L_{\text{us-JIDD}}$:
$\{\, r_2 \to u_1,\ r_4 \to u_1 \text{ for all } l\text{'s} \,\} \Rightarrow \{ CD_1 \} \Rightarrow \{\, u_1 \to r_2,\ u_1 \to r_4 \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{\, r_1 \to u_2,\ r_3 \to u_2 \text{ for all } l\text{'s} \,\} \Rightarrow \{ CD_2 \} \Rightarrow \{\, u_2 \to r_1,\ u_2 \to r_3 \text{ for all } l\text{'s} \,\}$
$\vdots$
$\Rightarrow \{\, r_2 \to u_6,\ r_3 \to u_6 \text{ for all } l\text{'s} \,\} \Rightarrow \{ CD_6 \} \Rightarrow \{\, u_6 \to r_2,\ u_6 \to r_3 \text{ for all } l\text{'s} \,\}$
from which it can be noted that in order to maximally accelerate message propagation within one joint iteration of the MPA and channel decoding, we employ three important rules in $L_{\text{us-JIDD}}$ as follows.
• Rule 1: Each VN update¹ is followed by a channel decoding iteration only for the user corresponding to the just-updated VN. For instance, if VN $u_k$ is updated from all FNs connected to $u_k$ and for all $l$'s (for all SCMA codewords in the current channel codeword), then $CD_k$ is directly performed.
• Rule 2: Each channel decoding iteration is followed by updating the FNs that are connected to the VN just updated by channel decoding. For instance, if $CD_k$ is performed, then all FNs that are connected to $u_k$ are updated.
• Rule 3: Each FN update is followed by updating the VN of the next user to be channel-decoded. For instance, if the FNs that are connected to $u_k$ are updated for all $l$'s, then the next user's VN, i.e., $u_{k+1}$, is updated from all FNs connected to VN $u_{k+1}$ for all $l$'s.
Note that $L_{\text{us-JIDD}}$ and $L_{\text{JIDD}}$ have the same operational elements, but the elements are listed in a different order. Therefore, JIDD and the proposed us-JIDD have identical computational complexity per joint MPA iteration. However, they differ considerably in how quickly messages propagate. In JIDD, all users' channel decoding iteration results are used at the same time in between two one-directional parallel message updates in the MPA, i.e., FN-to-VN-messages-only updates and VN-to-FN-messages-only updates, as shown in $L_{\text{JIDD}}$. In us-JIDD, by contrast, each user's channel decoding iteration is inserted into a different timing position in the sequential MPA, as shown in $L_{\text{us-JIDD}}$, and is sequentially and sparsely performed in the middle of the sequential MPA. Thus, even with a single joint MPA iteration, user-sequential channel decoding gradually enhances message reliability, and $L_{\text{us-JIDD}}$ effectively achieves a number of channel decoding iterations equal to $K$, which significantly accelerates message propagation. The power of fast convergence with us-JIDD comes from this user-sequential joint MPA mechanism.
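The two schedules can be generated and compared programmatically. In the following Python sketch, the factor graph connections for $u_1$, $u_2$, and $u_6$ follow the text, while those for $u_3$, $u_4$, and $u_5$ are assumed from the commonly used $4 \times 6$ SCMA indicator matrix; the function and variable names are hypothetical. The sketch only orders the operations and does not compute any messages.

```python
from collections import Counter

# FN indices connected to each VN (user). Entries for u3, u4, u5 are assumed.
XI = {1: (2, 4), 2: (1, 3), 3: (1, 2), 4: (3, 4), 5: (1, 4), 6: (2, 3)}


def jidd_schedule(xi):
    """Three parallel stages: all FN->VN updates, all decodings, all VN->FN updates."""
    users = sorted(xi)
    return [
        [("FN->VN", d, k) for k in users for d in xi[k]],
        [("CD", k) for k in users],
        [("VN->FN", k, d) for k in users for d in xi[k]],
    ]


def us_jidd_schedule(xi, order=None):
    """User-sequential stages following Rules 1 to 3."""
    stages = []
    for k in (order or sorted(xi)):
        stages.append([("FN->VN", d, k) for d in xi[k]])  # VN update for u_k
        stages.append([("CD", k)])                         # Rule 1: decode user k next
        stages.append([("VN->FN", k, d) for d in xi[k]])   # Rule 2: then its FN update
        # Rule 3 is realized by the loop moving on to the next user.
    return stages


def operations(stages):
    """Multiset of all operations, ignoring their order."""
    return Counter(op for stage in stages for op in stage)


if __name__ == "__main__":
    parallel, sequential = jidd_schedule(XI), us_jidd_schedule(XI)
    print(len(parallel), len(sequential))
    print(operations(parallel) == operations(sequential))
```

The comparison of the two multisets illustrates the point made above: both schedules contain exactly the same operations, and only the number of sequential stages differs (3 for JIDD versus $3K = 18$ for us-JIDD).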

B. JOINT MPA-BASED DETECTION IN US-JIDD
To accommodate the rearranged processing order in us-JIDD, as shown in $L_{\text{us-JIDD}}$, the joint MPA needs to be modified accordingly. In this subsection, the mathematical operations of the us-JIDD scheme are described based on those for JIDD shown in Section III-A. Thus, only the modified parts from the JIDD scheme are described.
¹ We define the term VN update as updating messages from all FNs connected to a single VN such that the phrase VN update for $u_k$ includes ($r_j \to u_k$) for all $r_j$'s connected to $u_k$ and for all SCMA codewords in the current channel codeword, i.e., for all $l$'s. The term FN update is defined likewise.

1) FN-TO-VN-MESSAGES UPDATE IN THE US-JIDD SCHEME
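In the us-JIDD scheme, the update process $r_j \to u_k$ follows the S-MPA [12] (where the message update is performed in a sequential manner), rather than the original MPA (where the message update is performed in a parallel manner). Thus, Eq. (3) is modified as follows:
$$I^{(\tau)}_{r_j \to u_k}\big(\mathbf{x}_l^{(k)}\big) = \sum_{\mathbf{x}_l^{(i)}:\, i \in \zeta_j \setminus k} \frac{1}{\pi \sigma^2} \exp\!\left(-\frac{1}{\sigma^2} \Big| y_{l,j} - \sum_{i \in \zeta_j} h_{l,j}^{(i)} x_{l,j}^{(i)} \Big|^2 \right) \prod_{\substack{i \in \zeta_j \setminus k \\ i < k}} I^{(\tau)}_{u_i \to r_j}\big(\mathbf{x}_l^{(i)}\big) \prod_{\substack{i \in \zeta_j \setminus k \\ i > k}} I^{(\tau-1)}_{u_i \to r_j}\big(\mathbf{x}_l^{(i)}\big) \tag{16}$$
Note that, unlike (3), the message $I^{(\tau)}_{u_i \to r_j}$ in (16) is the sequentially updated version in the middle of the current MPA iteration, i.e., the $\tau$-th MPA iteration. Moreover, distinguished from the S-MPA, message $I^{(\tau)}_{u_i \to r_j}$ in (16) is enhanced further by sequential channel decoding in the middle of the current MPA iteration. This sequential message update scheme can utilize more-reliable and fresh messages, i.e., messages updated in the current iteration, and thus, it can achieve accelerated convergence.

2) CHANNEL DECODING PROCESS IN THE US-JIDD SCHEME
In JIDD, $\mathrm{LLR}^{B}_{\mathrm{Polar}}(c_n^{(k)})$ in (9) is a weighted combination of the stage-0 left message (which corresponds to the message from the SCMA detector) and the right message $R^{(k)}_{0,c_n^{(k)}}$, and the weight between the two messages is adjusted by weight factor $\alpha$. This is because, in JIDD, all users' messages are updated in parallel, and thus, the weighted combination of the Polar decoding message and the SCMA detector message is performed inside the SCAN algorithm.
In us-JIDD, by contrast, the users' Polar decoding messages are sequentially updated in the joint MPA, and thus, they need to be combined with the SCMA detector message right after they are updated. Hence, $\mathrm{LLR}^{B}_{\mathrm{Polar}}(c_n^{(k)})$ in (9) is modified as follows:
$$\mathrm{LLR}^{B}_{\mathrm{Polar}}\big(c_n^{(k)}\big) = R^{(k)}_{0,c_n^{(k)}}, \quad 1 \le n \le N \tag{17}$$
and instead, the weighted combination of the Polar decoding message with the SCMA detector message is performed in the VN-to-FN-messages update stage, which is explained next.

3) VN-TO-FN-MESSAGES UPDATE IN THE US-JIDD SCHEME
In the us-JIDD scheme, update process $u_k \to r_j$ in (14) is modified to
$$I^{(\tau)}_{u_k \to r_j}\big(\mathbf{x}_l^{(k)}\big) = \mathrm{normalize}\Big( \big[P\big(\mathbf{x}_l^{(k)}\big)\big]^{\beta} \prod_{i \in \xi_k \setminus j} I^{(\tau)}_{r_i \to u_k}\big(\mathbf{x}_l^{(k)}\big) \Big) \tag{18}$$
where $\beta$ is the weight factor in log scale for the symbol likelihood probability from the Polar decoder, i.e., $P(\mathbf{x}_l^{(k)})$. The role of $\beta$ in us-JIDD is the same as that of $\alpha$ in (10) for JIDD in the sense that it adjusts the portions of the two messages, i.e., one from the Polar decoding iteration and the other from the MPA iteration in SCMA.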
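The role of $\beta$ in the VN-to-FN update can be illustrated with a short Python sketch. It assumes that a weight "in log scale" means raising the decoder's symbol likelihood $P(\mathbf{x}_l^{(k)})$ to the power $\beta$ before multiplying it with the FN-to-VN messages from the other connected FNs, followed by normalization; this reading, the function name `vn_to_fn`, and the numerical values are assumptions for illustration only.

```python
def vn_to_fn(fn_msgs, p_decoder, beta, target):
    """VN-to-FN message from one user toward FN `target`.

    fn_msgs   : dict mapping FN index -> list of M FN-to-VN messages for this user
    p_decoder : list of M symbol likelihoods re-mapped from the Polar decoder output
    beta      : weight applied as an exponent (a multiplier in the log domain)
    """
    out = []
    for m in range(len(p_decoder)):
        value = p_decoder[m] ** beta
        for d, msg in fn_msgs.items():
            if d != target:  # extrinsic rule: skip the FN being addressed
                value *= msg[m]
        out.append(value)
    total = sum(out)
    return [v / total for v in out]


if __name__ == "__main__":
    # Hypothetical messages for a user connected to FNs r2 and r4 (M = 4).
    fn_msgs = {2: [0.1, 0.2, 0.3, 0.4], 4: [0.4, 0.3, 0.2, 0.1]}
    p_decoder = [0.7, 0.1, 0.1, 0.1]
    for beta in (0.0, 0.5, 1.0):
        print(beta, [round(v, 4) for v in vn_to_fn(fn_msgs, p_decoder, beta, target=2)])
```

With $\beta = 0$ the decoder output is ignored and the update reduces to the uncoded MPA rule; larger values of $\beta$ give the decoder's belief more influence on the outgoing message.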

C. LATENCY ISSUE AND ITS SOLUTION
1) LATENCY ISSUE
The overall latency to get the final decoding result, denoted by $T_D$, is given as
$$T_D = S \times T_{itr} \tag{19}$$
where $T_{itr}$ denotes the execution time required for one joint MPA iteration, and $S$ denotes the number of joint MPA iterations required for BER convergence.
Recall that in $L_{\text{JIDD}}$ and $L_{\text{us-JIDD}}$, all operational elements in each pair of braces can be executed in parallel. The brace pairs themselves, however, must be executed one after another, because the results of the preceding braces are used for the operations in the next braces. Consequently, there exists an inevitable computation time equal to the sum of the processing times for each brace pair ({ }). Let $T_{FN}$, $T_{CD}$, and $T_{VN}$ denote the processing times for the three pairs of braces in $L_{\text{JIDD}}$, which correspond to the FN-to-VN-messages update, the channel decoding iteration, and the VN-to-FN-messages update, respectively. Then, $T_{itr}$ for JIDD, denoted by $T^{(\text{JIDD})}_{itr}$, is expressed as
$$T^{(\text{JIDD})}_{itr} = T_{FN} + T_{CD} + T_{VN}. \tag{20}$$
On the other hand, the number of brace pairs in $L_{\text{us-JIDD}}$ is $K$ times that in $L_{\text{JIDD}}$, because the operations are grouped and reordered in user-sequential order. In return, each of the three kinds of brace pairs in $L_{\text{us-JIDD}}$ contains $K$ times fewer operations than in $L_{\text{JIDD}}$. Nonetheless, in order to consider the most pessimistic latency in us-JIDD compared to JIDD, let us assume that the computation times for the three kinds of brace pairs in $L_{\text{us-JIDD}}$ are identical to those for $L_{\text{JIDD}}$ in (20). Then, the worst case (i.e., the upper bound) of $T_{itr}$ for us-JIDD, denoted by $T^{(\text{us-JIDD})}_{itr}$, is $K$ times that of JIDD, as follows:
$$T^{(\text{us-JIDD})}_{itr} = K \times (T_{FN} + T_{CD} + T_{VN}) = K \times T^{(\text{JIDD})}_{itr}. \tag{21}$$

2) SOLUTION TO LATENCY ISSUE
We alleviate this latency issue in two ways. First, the fast convergence rate of us-JIDD inherently reduces latency, because overall latency $T_D$ is simply proportional to $S$, as shown in (19). The numerical results in the following section show that $S$ for us-JIDD is substantially smaller than $S$ for JIDD. Despite the reduced $S$ in us-JIDD, $T_D$ of us-JIDD is still longer than $T_D$ in JIDD because $T_{itr}$ of us-JIDD is $K$ times that of JIDD, as shown in (21). Hence, in order to fundamentally decrease scaling factor $K$ for $T_{itr}$ in (21), we additionally apply a group method to us-JIDD. If we set the number of groups $G$ to 3, then six ($= K$) users can be grouped into three sets, i.e., $\{u_1, u_2\}$, $\{u_3, u_4\}$, $\{u_5, u_6\}$, and the list of message updates and decoding iterations, denoted by $L^{G=3}_{\text{us-JIDD}}$, is structured as follows:
$L^{G=3}_{\text{us-JIDD}}$:
$\{\, r_d \to u_1\ (d \in \xi_1),\ r_d \to u_2\ (d \in \xi_2) \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{ CD_1, CD_2 \}$
$\Rightarrow \{\, u_1 \to r_d\ (d \in \xi_1),\ u_2 \to r_d\ (d \in \xi_2) \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{\, r_d \to u_3\ (d \in \xi_3),\ r_d \to u_4\ (d \in \xi_4) \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{ CD_3, CD_4 \}$
$\Rightarrow \{\, u_3 \to r_d\ (d \in \xi_3),\ u_4 \to r_d\ (d \in \xi_4) \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{\, r_d \to u_5\ (d \in \xi_5),\ r_d \to u_6\ (d \in \xi_6) \text{ for all } l\text{'s} \,\}$
$\Rightarrow \{ CD_5, CD_6 \}$
$\Rightarrow \{\, u_5 \to r_d\ (d \in \xi_5),\ u_6 \to r_d\ (d \in \xi_6) \text{ for all } l\text{'s} \,\}$
where the number of brace pairs decreases to half ($= G/K$) of that for $L_{\text{us-JIDD}}$. Therefore, $K$ in (21) is replaced by $G$, which yields a $T_{itr}$ that is only 3 ($= G$) times that of JIDD. Moreover, $L^{G=3}_{\text{us-JIDD}}$ obtains exactly the same message updates as $L_{\text{us-JIDD}}$. This is because the users are paired (grouped) to satisfy the condition that their VNs do not share resources in the SCMA code structure. For instance, Fig. 1 shows that VNs $u_1$ and $u_2$ are connected to disjoint resource sets, i.e., $(r_2, r_4)$ and $(r_1, r_3)$, respectively. This is the case for the other user pairs, $(u_3, u_4)$ and $(u_5, u_6)$, as well. Therefore, the message updates from or to the paired VNs (users) do not affect each other, and thus, even if they are performed in parallel, the message update results are the same as when done sequentially. Thus, $L_{\text{us-JIDD}}$ and $L^{G=3}_{\text{us-JIDD}}$ have identical detection performance, which will be confirmed in the next section. This also implies that even with a fully user-sequential MPA by $L_{\text{us-JIDD}}$ for the considered factor graph structure in Fig. 1, the number of effective user-sequential message propagations within a single MPA iteration is 3, i.e., half the number of users.
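The pairing condition described above (grouped users must not share any resource) can be checked mechanically. The following Python sketch greedily groups users whose FN sets are pairwise disjoint. The connections for $u_1$, $u_2$, and $u_6$ follow the text, while those for $u_3$, $u_4$, and $u_5$ are assumed from the commonly used $4 \times 6$ SCMA indicator matrix; the greedy strategy and the name `group_users` are illustrative choices rather than the paper's procedure.

```python
# FN indices connected to each VN (user). Entries for u3, u4, u5 are assumed.
XI = {1: (2, 4), 2: (1, 3), 3: (1, 2), 4: (3, 4), 5: (1, 4), 6: (2, 3)}


def group_users(xi):
    """Greedily group users whose resource (FN) sets are pairwise disjoint."""
    groups, assigned = [], set()
    for k in sorted(xi):
        if k in assigned:
            continue
        group = [k]
        assigned.add(k)
        for j in sorted(xi):
            # A candidate joins only if it shares no FN with any group member.
            if j not in assigned and all(set(xi[j]).isdisjoint(xi[g]) for g in group):
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


if __name__ == "__main__":
    print(group_users(XI))
```

Under the assumed connections, the sketch returns the three pairs used in the text, $\{u_1, u_2\}$, $\{u_3, u_4\}$, and $\{u_5, u_6\}$. Since each user occupies two of the four resources, no group can hold more than two users, so $G = 3$ is the smallest possible number of groups for this factor graph.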
Summing up, owing to the two factors of us-JIDD, i.e., 1) the substantially smaller $S$ compared to JIDD, and 2) the reduced $T_{itr}$ from the grouping method with $L^{G=3}_{\text{us-JIDD}}$ instead of $L_{\text{us-JIDD}}$, the overall latency of us-JIDD becomes equal to that of JIDD. This will also be confirmed in the next section.

3) FURTHER ACCELERATION IN DOWNLINK SCENARIOS
Due to the sequential message propagation of us-JIDD, the message of a user updated later becomes more reliable. Hence, in downlink scenarios where the user detection order in $L^{G=3}_{\text{us-JIDD}}$ can be set differently, user by user, we can further accelerate the convergence rate simply by setting the detection order so that the desired user's message is updated last. For instance, $L^{G=3}_{\text{us-JIDD}}$ illustrated in the previous subsection is the optimal detection order for the receivers of user 5 and user 6, who are updated last. In the receivers for user 1 and user 2, we only need to move the first three lines of $L^{G=3}_{\text{us-JIDD}}$ in the previous subsection to the end of the list, and in the receivers for user 3 and user 4, we only need to move the second three lines (lines 4 to 6) to the end of the list. Note that this further acceleration technique is not valid for uplink scenarios, because running different orderings in a single receiver on the BS side is not allowed due to computational complexity, which is the main concern of this paper.

V. COMPLEXITY ANALYSIS
The overall number of operations for BER convergence, denoted by $O_C$, is given as
$$O_C = S \times O_{itr} \tag{22}$$
where $O_{itr}$ denotes the number of operations for one joint MPA iteration.
Recall that $L_{\text{JIDD}}$ and $L_{\text{us-JIDD}}$ contain the same operational elements. Therefore, $O_{itr}$ is identical for JIDD and us-JIDD, and is given as
$$O_{itr} = O_{FN} + O_{PD} + O_{VN}$$
where $O_{FN}$, $O_{PD}$, and $O_{VN}$ denote the numbers of operations for the FN-to-VN-messages update, the Polar decoding iteration, and the VN-to-FN-messages update, respectively. For reference, Table 1 provides details about $O_{FN}$, $O_{PD}$, and $O_{VN}$ [11], [23], where $d_f$ and $d_v$ are the number of branches connected to each FN and VN, respectively. Because $O_{itr}$ is the same for JIDD and us-JIDD, overall complexity $O_C$ in (22) is determined by $S$, which is different for JIDD and us-JIDD. In Section VI, we compare $O_C$ between JIDD and us-JIDD based on the simulated $S$ values for both schemes.

VI. NUMERICAL RESULTS
A. BER PERFORMANCE
Table 2 shows the system parameters used in our simulations. The SCMA codebook used is shown in Section II.A of [26].
Fig. 4 shows the simulated channel-coded BERs as a function of the number of joint MPA iterations for a downlink scenario on an AWGN channel, where $h_{l,d}^{(k)}$ is commonly set to 1 for all $k$, $l$, and $d$ [27]. The S-MPA combined with IDD achieves worse BER performance than JIDD. This confirms that the benefit of the S-MPA (i.e., sequential message propagation within an MPA iteration) fails to be conserved in the IDD process. Remarkably, the proposed us-JIDD scheme converges to the same BER as the converged BER of JIDD after only two iterations, i.e., $S = 2$, whereas for JIDD, $S = 6$. In accordance with our analysis in Section IV-C2, the user grouping method ($G = 3$) for latency reduction achieves BER performance identical to that of the fully user-sequential method ($G = 6$ ($= K$)). Also shown is that with message update order $L^{G=3}_{\text{us-JIDD}}$, the users updated first, i.e., user 1 and user 2, require $S = 3$, whereas the users updated last, i.e., user 5 and user 6, require $S = 2$. This confirms the property of the proposed us-JIDD, mentioned in Section IV-C3, that BER convergence can be further accelerated by updating the desired user last in the user-sequential message update order.
Fig. 5 shows the simulated channel-coded BERs as a function of the number of joint MPA iterations for an uplink scenario on a flat channel with perfect power control, where $h_{l,d}^{(k)}$ is set to $e^{j\theta}$, and random phase $\theta$ is uniformly distributed over $[0, 2\pi]$ and is identical for different $l$ and $d$, but independently generated for different $k$. In the uplink scenarios, the designed coding gain from SCMA with the multidimensional complex codebook is not preserved due to phase mismatch among the signals received from the users. This is confirmed in Fig. 5, which shows increased BERs compared to the downlink scenario in Fig. 4, despite the increased SNR. Note that the converged BER level of JIDD on a flat channel with perfect power control is unacceptably high. Meanwhile, the converged BER level with us-JIDD is significantly lower than with JIDD. This result is meaningful in that, for the flat channel with perfect power control, the proposed us-JIDD outperforms JIDD in terms of converged BER level rather than in terms of convergence speed. From a practical point of view, this performance advantage can improve the convergence speed as well. For instance, if the target BER is the same as the converged BER with JIDD, the effective $S$ of the proposed us-JIDD scheme is reduced to 2, because the proposed us-JIDD scheme requires only two iterations to reach the same BER level as the converged BER with JIDD.
Fig. 6 shows the simulated channel-coded BERs as a function of the number of joint MPA iterations for a downlink scenario on a Rayleigh fading channel, where $h_{l,d}^{(k)}$ is a complex Gaussian random variable with unit variance, and is identical for different $k$, but is independently generated for different $l$ and $d$ [27]. The proposed us-JIDD shows the same trends as on the AWGN channel in Fig. 4: very fast convergence compared to JIDD, identical performance with the grouping method, and further acceleration from the proper detection order in the downlink scenario.
Most of all, the proposed us-JIDD scheme achieves the same BER as the converged BER with JIDD after only two iterations (i.e., $S = 2$), whereas with JIDD, $S = 6$. Also shown is that all these features hold for different SNRs.
Fig. 7 shows the simulated channel-coded BERs as a function of the number of joint MPA iterations for an uplink scenario on a Rayleigh fading channel, where $h_{l,d}^{(k)}$ is a complex Gaussian random variable with unit variance, and is independently generated for different $k$, $l$, and $d$ [28]. Unlike the results for the uplink scenario on a flat channel in Fig. 5, there is no improvement in the converged BER performance, but very fast BER convergence is achieved. For $E_b/N_0 = 6$ dB, the proposed us-JIDD converges when $S = 3$, whereas JIDD converges when $S = 7$. For $E_b/N_0 = 8$ dB, both the proposed us-JIDD and JIDD converge when $S = 3$, but the proposed us-JIDD has almost converged after only two iterations, whereas the BER under JIDD after two iterations is significantly larger than under the proposed us-JIDD.

B. COMPUTATIONAL COMPLEXITY AND LATENCY COMPARISON
The ultra-reliable and low-latency communication (URLLC) service is one of the major service categories of 5G, and on downlink, where the receiver is a small device, computational complexity and latency are more important system design factors, compared to uplink. Thus, in this subsection, we analyze computational complexity and latency in a downlink scenario with a Rayleigh fading channel.
Fig. 8a shows the overall number of operations for BER convergence, i.e., $O_C$ in (22), for the three schemes in comparison. Note that $O_{itr}$ in (22) is common to the different schemes, and thus, the relative difference in $O_C$ among the different schemes is determined by $S$ in each scheme. Recall that in Fig. 6, $S = 6$ for the JIDD scheme, whereas $S = 2$ for both the us-JIDD scheme and the us-JIDD scheme with the group method. Hence, the proposed us-JIDD scheme saves 66.7% of the operations, compared to the JIDD scheme. Reducing computational complexity by decreasing $S$ is a remarkable benefit of the us-JIDD scheme, especially on downlink, because power consumption is more critical in small receivers such as user terminals, and thus, a low-complexity detection algorithm is preferable on downlink.
Fig. 8b shows the total latency for BER convergence normalized by $T^{(\text{JIDD})}_{itr}$, i.e., $T_D / T^{(\text{JIDD})}_{itr}$ is plotted. Despite the reduced $S$, the latency of the us-JIDD scheme is longer than that of the JIDD scheme due to the sequential message update strategy. However, by using a group-wise sequential method, the us-JIDD scheme can reduce latency and achieve the same latency as the JIDD scheme. Summing up the results in Fig. 8a and Fig. 8b, we conclude that the us-JIDD scheme with the group method substantially reduces computational complexity while maintaining the same latency as the conventional JIDD scheme.
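The arithmetic behind this comparison can be reproduced with a few lines of Python. The sketch uses the $S$ values quoted above for the downlink Rayleigh scenario ($S = 6$ for JIDD, $S = 2$ for us-JIDD), the relation $O_C = S \times O_{itr}$, and the worst-case latency bound in which one joint iteration of a scheme with $G$ sequential groups takes $G \times T^{(\text{JIDD})}_{itr}$. The function names and the dictionary layout are hypothetical.

```python
def normalized_cost(S, groups):
    """Return (O_C / O_itr, worst-case T_D / T_itr^(JIDD)).

    S      : joint MPA iterations needed for BER convergence
    groups : sequential groups per iteration (1 for JIDD, K for us-JIDD, G if grouped)
    """
    return S, S * groups


def operation_saving(S_new, S_ref):
    """Fraction of operations saved when S_ref iterations are reduced to S_new."""
    return 1.0 - S_new / S_ref


# (S, number of sequential groups) for the three schemes being compared.
SCHEMES = {"JIDD": (6, 1), "us-JIDD": (2, 6), "us-JIDD (G=3)": (2, 3)}

if __name__ == "__main__":
    for name, (S, groups) in SCHEMES.items():
        ops, latency = normalized_cost(S, groups)
        print(f"{name}: O_C/O_itr = {ops}, T_D/T_itr(JIDD) = {latency}")
    print(f"operation saving: {operation_saving(2, 6):.1%}")
```

Under these inputs, the fully sequential us-JIDD has twice the worst-case latency of JIDD, while the grouped variant matches the JIDD latency and keeps the reduction in operations.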

VII. CONCLUSION
In this paper, we proposed a novel, user-sequential JIDD scheme for a channel-coded SCMA system. In the proposed scheme, message enhancement from channel decoding immediately propagates to the connected users' messages in the factor graph within each joint MPA iteration. We showed that the proposed scheme significantly reduces computational complexity by dramatically accelerating BER convergence, compared to conventional JIDD schemes. We also confirmed that the latency issue with the sequential message update idea was alleviated by two factors: 1) a fast convergence rate, and 2) the group-wise sequential method. Therefore, we conclude that the proposed scheme is a competitive solution for channel-coded SCMA systems.