Power Allocation and Receiver Design for D2D Assisted Cooperative Relaying Downlink Systems Using NOMA

This paper considers device-to-device (D2D) assisted cooperative relaying downlink transmission using non-orthogonal multiple access (DC-NOMA) within the same time or frequency resource, aiming to enhance both the ergodic sum rate (ESR) of the whole system and the output block error rate (BER) of the detection receiver. Firstly, we investigate the convexity of the ESR maximization problem and prove that it is not strictly concave. Unlike existing works that optimize the power coefficient in only one stage, a joint two-stage power allocation scheme based on iterative search is proposed for the formulated ESR maximization problem. It is proved that the proposed algorithm converges to a stable state within a limited number of iterations. Secondly, in order to improve the detection performance of the NOMA downlink receiver, we propose a novel detection algorithm based on the max-log-MAP algorithm and design a new receiver structure adapted to this algorithm. Finally, we confirm the feasibility of the receiver design and verify its performance enhancement. The numerical results illustrate that: i) the joint two-stage power allocation scheme can achieve up to 45.6% performance gain in terms of ESR in comparison with the one-stage optimizing scheme; and ii) the delay and BER of the receiver both decrease significantly with the novel detection algorithm in the case of high-order signal modulation and multi-user multiplexing.


I. INTRODUCTION
As one of numerous candidate multiple access technologies for the fifth-generation (5G) wireless network, non-orthogonal multiple access (NOMA) has been recognized as a promising enabling technique to improve spectrum efficiency [1], [2]. The core idea of NOMA is to serve more than one user in each orthogonal resource block, e.g., a time slot, a frequency channel, a spreading code or an orthogonal spatial degree of freedom, but at different transmit power levels. Under the general principle of NOMA, a variety of mainstream non-orthogonal multiple access schemes have been derived, such as sparse code multiple access (SCMA) [3], [4], pattern division multiple access (PDMA) [5], [6], low density extension (LDS) [7], lattice partition multiple access
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang.
PD-NOMA has been studied in the multiuser superposition transmission (MUST) study item of LTE in 3GPP [12]. It differs from OMA, in which multiple users are placed on mutually orthogonal resource blocks in the frequency, time or code domain: in PD-NOMA, multiplexing on the same resource block is realized by allocating different power levels. PD-NOMA thus opens a new multiplexing domain and can increase the access density and spectrum efficiency by orders of magnitude. Therefore, it is considered to be one of the most promising multiple access techniques. With PD-NOMA as the multiple access technique, only successive interference cancellation (SIC) is needed to separate the signals of multiple users at the receiving end. Therefore, although the signal-to-noise ratio (SNR) of the useful signal is reduced, the spectrum efficiency (SE) can be increased as a result of multiple users sharing the same spectrum, and the system can obtain a higher capacity. Apart from invoking NOMA to reuse the frequencies of cellular networks and to increase SE, D2D communication is also a promising technique that can relieve the pressure on core networks and increase mobility without rerouting data through the NodeB [13] in 5G application scenarios. With D2D pairs communicating over short distances at the same frequency, the SE and energy efficiency (EE) can be improved significantly. Many similar communication techniques already exist, such as Bluetooth, Wi-Fi Direct and FlashLinQ. The main distinction among them is whether they are permitted to work in licensed frequency bands.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. RELATED WORKS
Since the NOMA technique (see footnote 1) combines readily with other technologies, many studies have combined it with multiple-input multiple-output (MIMO) [14], [15], device-to-device (D2D) communication [15], [16], cooperative relaying systems (CRS) [17]-[19] and other emerging technologies in which computation-intensive tasks are offloaded to either the cloud or the computing resources at the edge of the cell, such as heterogeneous cloud radio access networks (H-CRANs) [20] and mobile edge computing (MEC) [21], to improve user experience. Because it offers spatial diversity to mitigate fading while avoiding the difficulty of mounting multiple antennas on small communication terminals, cooperative NOMA (C-NOMA) has gained a great deal of attention [19]. Research on C-NOMA has generally focused on outage probability [17], [22]-[24], ergodic sum rate [25], [26], user fairness [27], [28], bit error rate [29], packet delay [30], and energy efficiency [20], [21]. In addition to verifying the performance of the cooperative NOMA technique, some other works have focused on ways to strengthen the performance of C-NOMA.
An early scheme combining cooperative communication with NOMA was proposed in [17], which consists of only two users communicating with one antenna. For single-carrier C-NOMA (SC-NOMA), sub-channel allocation is not involved, and performance improvement is achieved mainly through optimizing the power allocation strategies of base stations [18], [31]-[33], user/relay selection and relay model selection, where the last two were investigated in [34]. The work in [18] analyzed an approximate expression of the average rate of the cooperative relay system using NOMA (CRS-NOMA) and proposed a sub-optimal NOMA power allocation scheme. Based on the system model proposed in [18], the work in [31] proposed a new detection scheme, namely joint decoding, which adopts maximum ratio combining (MRC) to improve the SNR at the receiving end before SIC. By solving the derivative of the ESR with respect to the power allocation factor, the optimal power allocation factor can be obtained. Considering the impact of the outage behaviors of the source-relay and relay-destination links on the system ESR, [32] proposed an optimal power allocation scheme that minimizes the outage probability to ensure the ESR, established a min-max problem, proved its convexity, and finally obtained the optimal power allocation coefficient through a derivative approximation. Taking fairness between the central user and the edge user into account, the study in [33] characterized the optimal power allocation at the BS and the relay with closed-form expressions to maximize the achievable rate of users.
Footnote 1: The word NOMA refers to PD-NOMA in the following.
Unlike the SC-NOMA systems above, multi-carrier cooperative NOMA (MC-NOMA) involves sub-channel allocation [13], [35]-[37]. Considering the joint optimization of sub-channel and power allocation in CRS-NOMA, the work in [35] proposed a combination of a stable matching algorithm and a water-filling power allocation scheme. Similarly, to enhance the system sum rate, [36] proposed a novel solution based on many-to-one two-sided matching theory for sub-channel assignment, while for power allocation it adopted sequential convex programming to transform the original power allocation problem into a convex one by the DC algorithm. Apart from optimizing the sum rate, [13] investigated sub-channel assignment and control of the total transmission power. Notably, a heuristic solution is applied to D2D user pairing in that work. Furthermore, to maximize the energy efficiency of the D2D pair while guaranteeing the quality of service of cellular users, [37] proposed a low-complexity iterative algorithm that analyzes the Karush-Kuhn-Tucker conditions to derive a closed-form expression of the optimal solution.
In addition, for NOMA receiver design, [38] developed a joint user identification, channel estimation, and signal detection algorithm, and then proposed a low-complexity rotationally invariant Gaussian mixture model to achieve signal detection. Departing from the joint multiuser matching pursuit detector, an adaptive multiuser matching pursuit detector is proposed in [39], which does not require any prior knowledge of the user activity information. However, both of the above schemes are hard to implement on a large scale because collecting user status information before decoding is time-consuming. Dispensing with user state information awareness, a joint maximum ratio combining and SIC detection algorithm is proposed in [31]. Owing to the enhancement from two-stage communication in CR-NOMA, the receiver can collect a greater SINR. Considering the strength of the downlink NOMA with multiple receiver structures, the work in [31] proposed a new type of NOMA transmitter and receiver design, where multiple user signals are jointly modulated at the base station (i.e., the transmitting end) and the signals are verified at the user (i.e., the receiving end).

B. MOTIVATIONS AND CONTRIBUTIONS
Recently, a novel scheme of D2D-aided CRS using NOMA (DC-NOMA) was considered for three target users [13], where the whole system works in the same frequency bandwidth, and the relay is predefined and configured with the ability to decode and forward multiple messages. Furthermore, there is always a nearby user paired with the relay. Compared with the cooperative relaying system using NOMA (C-NOMA), DC-NOMA offers an efficient way to adopt non-orthogonal D2D transmission at the relay to improve the achievable rate. However, that work only focuses on the scaling of the system rate with fixed transmit power under a limit on the signal-to-interference-plus-noise ratio (SINR).
In order to improve the SE and system capacity while considering the difference in transmit power between the BS and D2D devices, power allocation in the first stage has attracted much attention [40]. Based on the system model proposed in [13], the work in [41] divided the original optimization problem into two stages, explained the importance of the first stage, and then focused on optimizing the power allocation coefficient in the first stage. However, it is worth noting that the second stage of D2D-NOMA transmission is equally important: as in the barrel effect, the weaker stage limits the performance of the whole DC-NOMA system. Therefore, it is worth quantifying the impact of the second stage of D2D-assisted transmission on the system.
As we know, NOMA allows more than one user to transmit messages in the same spectrum, which means that NOMA adds extra interference signals at the transmitting end, so interference cancellation must be performed at the receiving end to obtain an accurate target signal. At present, the mainstream interference cancellation technology at the receiver is SIC, because it has been maturely used in third-generation (3G) mobile communication [9], [42]. The basic idea of SIC is to eliminate multiple access interference step by step: the receiving end first detects the strongest interfering user, then uses the channel information to reconstruct that user's signal from its estimate and removes it, and repeats the process until the multiple access interference from all users has been removed. From the information-theoretic point of view, SIC does not incur any sum rate loss. But if a non-ideal channel code is adopted for each user, error propagation will occur, which means that a decoding error for the strong user harms the decoding of the weak user [23]. Another problem with SIC is "user delay", because weak users must wait for the decoding of the strong users. Therefore, using a suitable signal detection strategy to reduce the influence of interfering signals has become the core of NOMA downlink receiver design.
Based on the above previous works, the main contributions of this paper are summarized as follows:
1) The ESR maximization problem of joint two-stage power allocation in the DC-NOMA system is proved, by means of the Hessian matrix, to be a non-convex and not strictly concave optimization problem.
2) Moreover, we propose a two-dimensional plane optimization scheme based on the golden section to solve such non-convex optimization problems. In addition, we prove the convergence of the algorithm and derive its complexity.
3) Finally, we propose a multi-user constellation interference cancellation (MCIC) algorithm for signal detection, which improves on soft-decision signal detection. In addition, in order to fully exploit the performance of the algorithm, the downlink receiver structure is redesigned.

C. PAPER ORGANIZATION
The remainder of this paper is organized as follows. Section II presents the DC-NOMA transmission system model with its assumptions, the ESR problem formulation for optimal power allocation, and the analysis of the convexity of the formulated problem. Section III introduces the joint two-stage power allocation scheme and verifies its convergence and global search capability. Section IV discusses the novel receiver design, including the principle of the detection, a novel interference cancellation scheme and the feasibility of the design. Section V discusses the performance gain of the power allocation scheme and evaluates the strength of the new receiver. Finally, Section VI provides the main conclusions of this paper.

1) NOTATIONS
E(X) denotes the expectation of the random variable (r.v.) X , means the function f(x) can be replaced by the function g(x). Matrix calculations are used in the proofs of this article; the operator |·| computes the determinant of a matrix. The operator $\nabla^2$ denotes the second-order derivative of a function. For an r.v. X, $X \sim \mathcal{N}_c(a, b)$ denotes that X is a complex Gaussian variable with mean a and variance b.

A. SYSTEM MODEL AND ASSUMPTIONS
As shown in Figure 1, the scenario in this paper contains a base station (BS) and three cellular users UE1, UE2 and UE3, where UE2 is closest to the base station and has the best channel condition. UE1 is located at the edge of the cell, and its channel condition is the worst. UE2 can act as a relay for UE1 in the two-stage DC-NOMA model. Meanwhile, UE2 can also communicate directly with UE3 as a transmitter. In the first stage, the BS sends signals to UE1 and UE2 simultaneously using NOMA. In the second stage, UE2 acts as both relay and D2D transmitter, forwarding the signal to UE1 after coding and sharing its own message with UE3 using NOMA.
FIGURE 1. DC-NOMA downlink system model.
Hereafter, the subscripts B, U1, R and U3 denote the base station, user UE1, the relay (i.e., UE2) and user UE3, respectively. All fading channels are assumed to be Rayleigh fading, where $h_{XY} \sim \mathcal{N}_c(0, \beta_{XY})$ denotes the channel coefficient between $X \in \{B, R\}$ and $Y \in \{R, U1, U3\}$. In this paper, we assume that the receiving end can perform perfect SIC, that each node in the system is equipped with a single antenna, and that the relay works in half-duplex mode using the decode-and-forward strategy. In order to increase the SE, we adopt the C-NOMA scheme to deliver three independent data streams in two time slots.
During the first time slot, the BS broadcasts the superposition-coded signal $\sqrt{a_1 P_B} X_1 + \sqrt{a_2 P_B} X_2$ to UE1 and the relay using NOMA, where $P_B$ represents the total transmit power of the BS and $a_i$ denotes the power allocation coefficient for message $X_i$. The receivers of UE1 and the relay then obtain the superimposed messages from the BS, as expressed in (1) and (2). Treating symbol $X_2$ as noise, the relay first decodes $X_1$ and removes it from (1) using SIC. After that, the relay can obtain its own target SINR. The SINRs for symbols $X_1$ and $X_2$ received at UE2 are respectively given by (3) and (4). Similarly, since the target signal for UE1 is the strongest with regard to UE1, the expression of the SINR for UE1 is easily obtained as (5).
UE1 at the cell edge is regarded as the NOMA-far user from the viewpoint of the BS. According to the NOMA principle, UE1 is the weak user (whereas UE2 is the strong user) and should be allocated more transmit power, so $a_1 > a_2$ with $a_1 + a_2 = 1$.
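For readers who want the first-slot expressions in explicit form, the following is a hedged sketch of the standard power-domain NOMA model implied by the paragraph above. The noise terms $n_R$ and $n_{U1}$, the SINR symbols $\gamma$ and the common noise variance $\sigma^2$ are notational assumptions made for this sketch and are not taken from the source.

```latex
% Received signals in the first slot (assumed notation)
y_R    = h_{BR}\left(\sqrt{a_1 P_B}\,X_1 + \sqrt{a_2 P_B}\,X_2\right) + n_R, \qquad
y_{U1} = h_{BU1}\left(\sqrt{a_1 P_B}\,X_1 + \sqrt{a_2 P_B}\,X_2\right) + n_{U1}.

% Relay (UE2): decode X_1 with X_2 treated as noise, then decode X_2 after SIC
\gamma_{R}^{X_1} = \frac{a_1 P_B |h_{BR}|^2}{a_2 P_B |h_{BR}|^2 + \sigma^2}, \qquad
\gamma_{R}^{X_2} = \frac{a_2 P_B |h_{BR}|^2}{\sigma^2}.

% UE1: decode X_1 with X_2 treated as noise
\gamma_{U1}^{X_1} = \frac{a_1 P_B |h_{BU1}|^2}{a_2 P_B |h_{BU1}|^2 + \sigma^2}.
```

With $a_1 > a_2$, both interference-limited SINRs are bounded above by $a_1/a_2$, which is why giving more power to the far user's symbol does not by itself remove the rate ceiling of $X_1$.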
To further improve the SE as well as offload traffic, during the second time slot UE2 establishes a link with the proximate user (i.e., UE3) and sends its own signal to UE3 while forwarding $X_2$ to the NOMA-far user (i.e., UE1) using NOMA. In the superposition signal formed at UE2 and in the superimposed messages received at UE1 and UE3 during the second time slot, $P_R$ is the transmit power of the relay, $b_1$ is the power allocation factor for the signal sent to UE1, and $b_2$ is the power allocation factor for the signal sent to UE3. In the second stage, UE3, as the strong user, first decodes $X_2$, removes it using SIC, and then obtains the target SINR for $X_3$; the corresponding SINRs are given by (8) and (9). On the UE1 side, the SINR of the received signal $X_2$ is given by (10). It is assumed that all the wireless channels are frequency-flat fast-fading channels, and $h_{XY} \sim \mathcal{N}_c(0, \beta_{XY})$ denotes the channel coefficients, which are independent random variables.
The achievable rates for each symbol are obtained as follows. Because the symbol $X_1$ has gone through two stages of NOMA transmission, its achievable rate follows from formulas (3) and (5). By the same token, the achievable rate of $X_2$ is obtained from formulas (4), (8) and (10). The achievable rate of $X_3$, which is transmitted only in the second stage, is obtained from formula (9). The factor 1/2 in these expressions indicates that the durations $t_1$ and $t_2$ of the two stages are equal.
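The rate definitions above can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' closed-form analysis. It assumes the standard interference-limited SINR forms for both slots, assumes that each symbol's rate is limited by its weakest decoding link (a min rule, as is usual for decode-and-forward), and uses hypothetical mean channel gains; the function name `ergodic_rates` and all default values are illustrative.

```python
import numpy as np

def ergodic_rates(a2, b2, rho_b, rho_r, beta=None, n_trials=20000, seed=0):
    """Monte Carlo estimate of the ergodic rates of X1, X2 and X3 (bit/s/Hz)."""
    if beta is None:
        # Hypothetical mean channel gains beta_XY, chosen only for illustration.
        beta = {"BR": 1.0, "BU1": 0.1, "RU1": 0.5, "RU3": 1.0}
    a1, b1 = 1.0 - a2, 1.0 - b2
    rng = np.random.default_rng(seed)
    # Under Rayleigh fading, |h_XY|^2 is exponential with mean beta_XY.
    g = {link: rng.exponential(mean, n_trials) for link, mean in beta.items()}

    # First slot: X1 is decoded with X2 as interference; X2 after SIC at the relay.
    sinr_r_x1 = a1 * rho_b * g["BR"] / (a2 * rho_b * g["BR"] + 1.0)
    sinr_r_x2 = a2 * rho_b * g["BR"]
    sinr_u1_x1 = a1 * rho_b * g["BU1"] / (a2 * rho_b * g["BU1"] + 1.0)

    # Second slot: b1 weights X2 (sent to UE1), b2 weights X3 (sent to UE3).
    sinr_u3_x2 = b1 * rho_r * g["RU3"] / (b2 * rho_r * g["RU3"] + 1.0)
    sinr_u3_x3 = b2 * rho_r * g["RU3"]
    sinr_u1_x2 = b1 * rho_r * g["RU1"] / (b2 * rho_r * g["RU1"] + 1.0)

    # Assumption: each symbol is limited by its weakest decoding link.
    r1 = 0.5 * np.log2(1.0 + np.minimum(sinr_r_x1, sinr_u1_x1))
    r2 = 0.5 * np.log2(1.0 + np.minimum(np.minimum(sinr_r_x2, sinr_u3_x2), sinr_u1_x2))
    r3 = 0.5 * np.log2(1.0 + sinr_u3_x3)
    return float(r1.mean()), float(r2.mean()), float(r3.mean())

if __name__ == "__main__":
    rates = ergodic_rates(a2=0.2, b2=0.3, rho_b=100.0, rho_r=10.0)
    print("ergodic rates (X1, X2, X3):", rates, "sum:", sum(rates))
```

Sweeping `a2` and `b2` over a grid with this function gives a quick picture of how both stages, not only the first, shape the sum rate.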

B. PROBLEM FORMULATION AND ANALYSIS
The ergodic sum capacity of the DC-NOMA system is the sum of the ergodic rates of the three symbols, where $R_{X_i}$ denotes the achievable rate corresponding to $X_i$, $E\{\cdot\}$ is the expectation operator, and $E[R_{X_i}] = \bar{R}_{X_i}$. Let $\rho_B = P_B/\sigma^2$ and $\rho_R = P_R/\sigma^2$ be the transmit SNRs of the BS and the relay, respectively. Closed-form expressions of $\bar{R}_{X_1}$, $\bar{R}_{X_2}$ and $\bar{R}_{X_3}$ can then be derived; for the specific derivation process, please refer to the Appendix.
In these expressions, Ei(·) is the exponential integral function.
In this section, we aim to improve the ESR of the whole system by optimizing both power allocation coefficients $a_i$ and $b_i$. From the above, the total ESR maximization problem P1 of this system is formulated over these coefficients. In view of the complexity of the complete formula, $\bar{R}_{sum}$ is rewritten in terms of a fixed term $C_{on}$ and the functions $f_1(a_2)$, $f_2(a_1, a_2)$ and $f_3(b_2)$. Since $C_{on}$ is fixed, the original optimization problem is equivalent to maximizing $f_1(a_2) + f_2(a_1, a_2) + f_3(b_2)$, which defines the converted optimization problem with objective $f_{sum}(a_2, b_2)$.
The converted problem with objective $f_{sum}(a_2, b_2)$ is a non-convex problem, and it is also not a strictly concave problem.
Proof: Since the constraints of the problem above are linear, we only need to check the objective function $f_{sum}(a_2, b_2)$. In the following, we show that the Hessian matrix H [43] of the objective function is not positive semi-definite. We start by rewriting the objective function and forming its Hessian matrix H, using [46, eq. (3.352.4)] and the approximation $e^x \approx 1 + x$ for small x at high ρ, where $E_c$ is a constant. Assuming that $f_{sum}(a_2, b_2)$ is twice differentiable, an approximation T is obtained. It is hard to confirm whether the upper and lower bounds of T, which depends on the two variables $a_2$ and $b_2$, are greater or less than zero under uncertain environmental parameters. However, we can derive that T = 0 when $\rho_B, \rho_R \to \infty$. Therefore, H is not always a positive semi-definite matrix. Thus, P1 can be treated as a non-convex optimization problem that is also not strictly concave.
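As a hedged aside, the appearance of Ei(·) in the ergodic rates of this section can be traced to a standard identity for Rayleigh fading, sketched below. The symbols g (channel power gain), β (its mean) and ρ (transmit SNR) are generic placeholders rather than the paper's exact notation, and the second relation uses $a_1 + a_2 = 1$. The integral is the one tabulated in [46, eq. (3.352.4)].

```latex
% g = |h|^2 is exponentially distributed with mean \beta, and \rho > 0
\mathbb{E}\left[\log_2(1+\rho g)\right]
  = \frac{1}{\ln 2}\int_0^{\infty}\frac{e^{-x/(\rho\beta)}}{1+x}\,dx
  = -\frac{1}{\ln 2}\,e^{1/(\rho\beta)}\,\mathrm{Ei}\!\left(-\frac{1}{\rho\beta}\right).

% Interference-limited term, using a_1 + a_2 = 1
\log_2\!\left(1+\frac{a_1\rho g}{a_2\rho g+1}\right)
  = \log_2(1+\rho g) - \log_2(1+a_2\rho g),

% so its expectation is a difference of two terms of the first kind
\mathbb{E}\left[\log_2\!\left(1+\frac{a_1\rho g}{a_2\rho g+1}\right)\right]
  = -\frac{1}{\ln 2}\left[e^{1/(\rho\beta)}\,\mathrm{Ei}\!\left(-\frac{1}{\rho\beta}\right)
  - e^{1/(a_2\rho\beta)}\,\mathrm{Ei}\!\left(-\frac{1}{a_2\rho\beta}\right)\right].
```

When a rate is limited by the weaker of two independent Rayleigh links, the minimum of the two exponential gains is again exponential, with mean $\beta_1\beta_2/(\beta_1+\beta_2)$, so the same identity applies with that mean.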

III. POWER ALLOCATION SCHEMES ANALYSIS FOR DC-NOMA SYSTEM
A. FSO ALGORITHM FOR POWER ALLOCATION
For simplicity, the work in [41] proposed the FSO scheme, listed as Algorithm 1, to solve this optimization problem, based on the analysis that the transmission powers of the two stages are quite different. Generally speaking, the relay transmission power is one-tenth of the base station transmission power. Therefore, the optimization problem was divided into two parts: the first-stage power allocation factor $a_2$ is optimized first because it plays a decisive role in improving the system capacity, while the second-stage power allocation factor $b_2$ is kept fixed.
Obviously, the ingenuity of the FSO algorithm was to transform the binary function into a unary function by ignoring secondary influence factors. Consequently, the maximum point could be obtained by differentiating the unary function. Although this yields a closed-form expression with low complexity o(K), it sacrifices part of the capacity enhancement.
Algorithm 1 FSO
Input: the number of iterations k, and a set A for storing the optimal power factor generated after each iteration.
[43, eq. (34), eq. (35), (36)]; $a_2 = e^{\xi}$ refers to [43, eq. (37)].
5. If $a_2 < 0.5$ then
6. $A(i) = a_2$
7. Substitute $A(i)$ and $b_2$ into $\bar{R}_{X_1}$, $\bar{R}_{X_2}$, $\bar{R}_{X_3}$
8. $\bar{R}_{sum} = \bar{R}_{X_1} + \bar{R}_{X_2} + \bar{R}_{X_3}$
9. End If
10. $\bar{R}^*_{sum} = E(\bar{R}_{sum})$
11. End For
12. Output $\bar{R}^*_{sum}$

B. TPG ALGORITHM FOR POWER ALLOCATION
Generally, to solve an optimization problem with linear constraints and a concave objective, convex relaxation can be considered, which modifies the form or definition of the problem. However, we find it hard to split formula (18) above into multiple convex functions. Thus, we consider an iterative scheme, the two-dimensional plane golden-section (TPG) optimization scheme, to obtain a sub-optimal value of the power allocation coefficients.
The basic idea is to take a feasible rectangular area of the plane, $D = \{(x, y) \mid a \le x \le b, c \le y \le d\}$, and divide it at the 0.382 and 0.618 points in both the vertical and horizontal directions, as shown in Figure 2, which yields 9 small rectangles. Each small rectangle is then divided at 0.382 and 0.618 in the vertical and horizontal directions in the same way, and the cycle continues until the diameter of the rectangles is less than the specified precision. At each step, the center position with the smallest function value among all the small rectangles is recorded. When the segmentation reaches the required accuracy, the last recorded point and its function value are taken as the global optimal solution.
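The subdivision idea described above can be sketched in a few lines. This is an illustrative reading of the scheme, not the authors' Algorithm 2: it keeps every sub-rectangle at each level, evaluates the objective at each center, and stops once the largest rectangle diameter falls below the precision. Taking the diameter as the longer side, the function names, and the toy objective are assumptions of this sketch; maximizing the ESR corresponds to minimizing its negative.

```python
GOLD_LOW, GOLD_HIGH = 0.382, 0.618  # golden-section split points

def split(rect):
    """Split a rectangle (x0, x1, y0, y1) into 9 sub-rectangles."""
    x0, x1, y0, y1 = rect
    xs = [x0, x0 + GOLD_LOW * (x1 - x0), x0 + GOLD_HIGH * (x1 - x0), x1]
    ys = [y0, y0 + GOLD_LOW * (y1 - y0), y0 + GOLD_HIGH * (y1 - y0), y1]
    return [(xs[i], xs[i + 1], ys[j], ys[j + 1])
            for i in range(3) for j in range(3)]

def center(rect):
    x0, x1, y0, y1 = rect
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0

def diameter(rect):
    # Assumption of this sketch: diameter is the longer side of the rectangle.
    x0, x1, y0, y1 = rect
    return max(x1 - x0, y1 - y0)

def tpg_minimize(f, rect=(0.0, 1.0, 0.0, 1.0), eps=0.05, max_depth=6):
    """Return (x, y, f(x, y), depth) for the best center found."""
    rects = [rect]
    cx, cy = center(rect)
    best = (f(cx, cy), cx, cy)
    depth = 0
    while depth < max_depth and max(diameter(r) for r in rects) >= eps:
        rects = [child for r in rects for child in split(r)]
        depth += 1
        for r in rects:
            cx, cy = center(r)
            value = f(cx, cy)
            if value < best[0]:  # keep the smallest center value seen so far
                best = (value, cx, cy)
    return best[1], best[2], best[0], depth

if __name__ == "__main__":
    # Toy objective with a known minimum at (0.3, 0.7).
    toy = lambda x, y: (x - 0.3) ** 2 + (y - 0.7) ** 2
    print(tpg_minimize(toy))
```

Because all $9^n$ rectangles are retained, the search is exhaustive at the chosen resolution, which matches the global-search claim of the scheme but also explains its exponential cost in the number of divisions.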
Suppose the objective function $f_{sum}(a_2, b_2)$ has upper and lower bounds on the closed area $D = \{(a_2, b_2) \mid 0 < a_2 < 1, 0 < b_2 < 1\}$; the optimization problem P3 is then formulated on D. For convenience of presentation, $a_2$ and $b_2$ are replaced by x and y, respectively, in the following algorithm for solving P3, in which the diameter of rectangle i after the n-th division is denoted $\varphi_i^n$, defined as the minimum of the two sides of rectangle i, and $[a_n, b_n, c_n, d_n]_i$ denotes the position of rectangle i after the n-th iteration.
Algorithm 2 Iterative Two-Dimensional Plane Golden-Section (TPG)
1. Input: $\varepsilon \in R^+$, a = 0, b = 1, c = 0, d = 1, n = 0; the center point is $(x_0^*, y_0^*)$, the corresponding diameter for D is $\varphi_i^n$, the function value at this point is $f^*_{sum,0}$, and $i = 1, 2, \ldots, 9^n$.
2. Calculate: separately calculate the locations of $a_1, b_1, c_1, d_1, p_1, p_2, p_3, p_4, p_5, p_6, p_7$ and $p_8$ in Figure 2.
In the first division, the rectangle is divided into 9 small rectangles, each of which has a different diameter according to this method, but their diameters satisfy $\varphi_i^1 \le \phi_1 = 0.618\phi_0$ $(i = 1, 2, \ldots, 9)$. In the second division, each of the 9 small rectangles obtained above is divided in the same way, giving $9^2$ smaller rectangles whose diameters are not greater than 0.618 times the diameter of their parent rectangle, that is, $\varphi_i^2 \le \phi_2 = 0.618\phi_1$ $(i = 1, 2, \ldots, 81)$. Suppose the diameters of the $9^n$ small rectangles obtained after the n-th division are $\varphi_i^n$ $(i = 1, 2, \ldots, 9^n)$, which satisfy $\varphi_i^n \le \phi_n = 0.618^n \phi_0$. Then, after the (n+1)-th division, the diameters of the $9^{n+1}$ small rectangles are $\varphi_i^{n+1}$ $(i = 1, 2, \ldots, 9^{n+1})$, each of which is at most 0.618 times the diameter of its parent rectangle, so that $\varphi_i^{n+1} \le \phi_{n+1} = 0.618\phi_n = 0.618^{n+1}\phi_0$ for all $n \in N^+$. Thus, $\lim_{n\to\infty} \varphi_i^n = 0$ $(i = 1, 2, 3, \ldots)$.
Given that the function z = f(x, y) is continuous and has a lower bound on the region D consisting of the rectangle a, b, c, d, if a, b, c and d are divided by 0.618, the minimum value of the objective function at the centers of all small rectangles after the n-th division is denoted $f_n^*$. Then $f_n^*$ is convergent and must be the globally optimal solution of the function z.
Proof: Suppose the initial best point and the best value are $(x_0, y_0)^*$ and $f^*$; record the center positions of the rectangles formed from a, b, c, d and the function values at them. After each small rectangle is formed, compare the function value at its center position with $f^*$. If it is smaller, replace $(x, y)^*$ and $f^*$ with that point and its function value. The values after the n-th replacement are denoted $(x_n, y_n)^*$ and $f_n^*$, so that $f_n^* > f_{n+1}^*$. By the assumption that f has a lower bound on D, the decreasing sequence $f_n^*$ is bounded below, and it must therefore be convergent.
Theorem 2: The computational complexity of the proposed algorithm is of the order $o(9^n)$.
As shown in the TPG algorithm, the complexity of the proposed algorithm mainly depends on the number of iterations used in splitting the feasible region. In the first division, the rectangle is divided into 9 small rectangles; in the second division, each of these 9 rectangles is divided in the same way, giving $9^2$ smaller rectangles. Thus $9^n$ small rectangles are obtained after the n-th division, and the evaluation at each of them can be regarded as one computation. Therefore, the complexity of the proposed algorithm is of the order $o(9^n)$.
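To make the growth concrete, the short sketch below combines the diameter bound $\phi_n = 0.618^n \phi_0$ stated earlier with the $9^n$ rectangle count to estimate how many divisions and center evaluations a given precision implies. The function name `tpg_cost` and the example precision are illustrative assumptions.

```python
import math

def tpg_cost(eps, phi0=1.0, shrink=0.618):
    """Smallest n with shrink**n * phi0 < eps, and the 9**n rectangles at that level."""
    # Solve shrink**n * phi0 < eps for n; both logarithms are negative.
    n = math.ceil(math.log(eps / phi0) / math.log(shrink))
    return n, 9 ** n

if __name__ == "__main__":
    n, rectangles = tpg_cost(eps=0.01)
    print(f"divisions: {n}, rectangles at the last level: {rectangles}")
```

Under this worst-case bound, a precision of 0.01 on the unit square needs 10 divisions and roughly 3.5 billion center evaluations at the last level, so in practice the precision has to be chosen with the exponential cost in mind.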
Theorem 3: The solution of the algorithm must be the global optimal solution.
Proof: Finally, we prove that the limit of $f_n^*$ must be the global optimal solution. Suppose, to the contrary, that the value of the function at a certain point $(x_1, y_1)$ is less than $f_0^*$. Then it follows from the continuity of the function that there must be a small neighborhood U of this point in which the function value at all points is less than $f_0^*$. From Theorem 1, we know that the diameter of the constructed small rectangles tends to 0, so there must be a small rectangle that falls completely into U. The function value at the center of this small rectangle is therefore less than $f_0^*$, which contradicts the fact that $f_0^*$ is the minimum of the function values at the centers of all small rectangles. Therefore, $f_n^*$ must be the globally optimal solution.

IV. A NEW RECEIVER DESIGN AND FEASIBILITY ANALYSIS IN NOMA DOWNLINK SYSTEM
A. THE WORKING PRINCIPLE OF THE RECEIVING DETECTOR
The SIC detection receiver in NOMA can be divided into two categories according to its structure: the symbol-level SIC detection receiver and the codeword-level SIC detection receiver. After demodulating the received signal, the symbol-level SIC receiver performs a hard decision and then reconstructs the decision result (re-modulation of the interfering signal) to obtain the interference signal (the cell-edge signal) and remove it from the received signal. After that, the received signal enters the next level and the operation is repeated until the target signal is detected successfully. The codeword-level SIC receiver performs a soft decision: the receiver first decodes the received signal after demodulation, and afterwards re-encodes and re-modulates the interference signal. Compared with symbol-level SIC detection, the channel decoding and re-encoding in codeword-level SIC detection bring additional computational complexity and increase the detection delay. However, it has error correction ability due to the decoding and re-encoding processing, which improves the accuracy of signal recovery for users at the edge of the cell, reduces the deviation of the interference signal reconstruction, and ultimately makes the detection result more reliable.
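The symbol-level procedure described above can be sketched for the simplest case of two superimposed QPSK users. This is a minimal illustration under stated assumptions (perfect channel knowledge, a single flat channel coefficient, bit 0 mapped to the positive axis); the function names and the power split are hypothetical and not taken from the paper.

```python
import numpy as np

def qpsk_map(bit_i, bit_q):
    """Map two bits to a unit-energy QPSK symbol (0 -> positive, 1 -> negative)."""
    return ((1 - 2 * bit_i) + 1j * (1 - 2 * bit_q)) / np.sqrt(2)

def qpsk_hard_bits(symbol):
    """Hard decision: recover the two bits from the signs of the I and Q parts."""
    return int(symbol.real < 0), int(symbol.imag < 0)

def symbol_level_sic(y, h, p_far, p_near):
    """Detect the high-power (far-user) symbol first, cancel it, then detect the rest."""
    z = y / h                                  # channel compensation
    far_bits = qpsk_hard_bits(z)               # near-user signal treated as noise
    residual = z - np.sqrt(p_far) * qpsk_map(*far_bits)   # interference cancellation
    near_bits = qpsk_hard_bits(residual)
    return far_bits, near_bits

if __name__ == "__main__":
    h = 0.7 - 0.3j
    tx = np.sqrt(0.8) * qpsk_map(1, 0) + np.sqrt(0.2) * qpsk_map(0, 1)
    print(symbol_level_sic(h * tx, h, p_far=0.8, p_near=0.2))
```

If the first hard decision is wrong, the subtraction adds interference instead of removing it, which is the error propagation that motivates the codeword-level variant.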
In the codeword-level SIC detection receiver, the demodulated data needs to be descrambled and de-rate-matched before the decoding operation is performed. In order to reduce the error rate of the decoding process, a soft-decision algorithm is often used for signal detection. Similar to the signal detection performed in the LTE system, the soft-decision signal detection of the NOMA system can use the max-log-MAP algorithm. Assuming that there are $2^{2n}$ constellation points on the modulation constellation diagram, the modulated complex symbol is expressed as $x = x_I + j x_Q$, where n information bits are mapped onto each of the in-phase component $x_I$ and the quadrature component $x_Q$. The mapping relationship between the complex symbol and the information bits is
$a_1, a_2, \ldots, a_{2n-1}, a_{2n} = a_{1,x_I}, a_{1,x_Q}, \ldots, a_{n,x_I}, a_{n,x_Q}$ (29)
From this mapping relationship, it can be seen that the odd-numbered information bits correspond to the in-phase component of the complex symbol, and the even-numbered bits correspond to the quadrature component. The two components are independent in the demodulation process, so the two-dimensional modulation constellation diagram is simplified to a one-dimensional modulation constellation diagram. The received signal is expressed as in (30), where ω is the AWGN with a mean value of 0 and a variance of $\sigma^2$.
When there is no error in channel estimation, the received signal after channel compensation is given by (31). The specific steps of the max-log-MAP algorithm used to detect the received signal are as follows.
Step 1: For the input information bits $a_{i,x_I}$ and $a_{i,x_Q}$, calculate the log posterior probability ratios $\mathrm{LLR}(a_{i,x_I})$ and $\mathrm{LLR}(a_{i,x_Q})$ corresponding to the received signal y, where
$\mathrm{LLR}(a_i) = \ln \frac{p\{a_i = 1 \mid y, h\}}{p\{a_i = 0 \mid y, h\}}$ (32)
Step 2: Taking $a_{i,x_I}$ as an example, let $C_0$ and $C_1$ be the symbol sets with $a_i = 0$ and $a_i = 1$, with $\alpha \in C_0$ and $\beta \in C_1$; $\mathrm{LLR}(a_{i,x_I})$ is then written as the logarithm of a ratio over these two sets.
Step 3: The maximum approximation function is used to simplify the calculation process and to reduce the computational complexity.
Step 4: The max-log-MAP algorithm is used to calculate the LLR value of each information bit, and the value of each information bit can be restored by performing a soft decision based on the LLR value.
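Steps 2 and 3 can be illustrated on a one-dimensional constellation. The sketch below compares the exact log-likelihood ratio with its max-log approximation for one bit of a Gray-mapped 4-PAM axis. It assumes equiprobable symbols and real Gaussian noise with variance `noise_var` per dimension; the level sets, the function names and this noise convention are assumptions of the sketch rather than the paper's exact expressions.

```python
import math

def exact_llr(y, c0, c1, noise_var):
    """ln P(a=1|y) / P(a=0|y) for equiprobable symbols in real Gaussian noise."""
    num = sum(math.exp(-(y - b) ** 2 / (2 * noise_var)) for b in c1)
    den = sum(math.exp(-(y - a) ** 2 / (2 * noise_var)) for a in c0)
    return math.log(num / den)

def max_log_llr(y, c0, c1, noise_var):
    """Max-log approximation: keep only the nearest point of each set."""
    d0 = min((y - a) ** 2 for a in c0)
    d1 = min((y - b) ** 2 for b in c1)
    return (d0 - d1) / (2 * noise_var)

if __name__ == "__main__":
    # Gray-mapped 4-PAM levels -3, -1, +1, +3 with labels 00, 01, 11, 10:
    # the first bit is 0 on the negative levels and 1 on the positive levels.
    c0, c1 = (-3.0, -1.0), (1.0, 3.0)
    y, noise_var = 0.5, 0.5
    print("exact:", exact_llr(y, c0, c1, noise_var))
    print("max-log:", max_log_llr(y, c0, c1, noise_var))
```

The approximation replaces each sum of exponentials by its largest term, so it only needs distance comparisons, and the gap to the exact value shrinks as the noise variance decreases.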

B. MULTI-CONSTELLATION MODULATION DIAGRAM INTERFERENCE CANCELLATION ALGORITHM
In this part, a multi-user modulation constellation interference cancellation (MCIC) algorithm for signal detection, improved from the soft-decision signal detection algorithm, is proposed. This algorithm uses multiple pieces of characteristic information (such as the modulation mode and power allocation ratio) of multiple user signals to construct a multi-user joint modulation constellation, from which the LLR value of each user information bit is calculated. With this algorithm, in addition to the redesign of the NOMA downlink receiver, the design of the NOMA downlink transmitter also needs to be improved, as shown in Figure 3. The multi-user modulation constellation is constructed by jointly modulating information bits from different user signals onto the same symbol according to the Gray mapping rule. It is worth noting that although joint modulation is used, the specific modulation and coding scheme is still determined by the traditional NOMA downlink scheme. Considering the constraints of user fairness, the system adopts an appropriate power allocation scheme for the central user and the edge user to maximize the sum rate.
The steps to design a multi-user modulation constellation diagram are as follows:
Step 1: The user signals $x_n$ superimposed and transmitted on the sub-carriers are arranged in descending order of transmission power.
Step 2: The complex symbols obtained after the modulation of the user signal $x_n$ are $x_n = x_{n,I} + j x_{n,Q}$; then the numbers of modulation bits in the multi-user modulation constellation diagram are
$2m_1, 2m_2, \ldots, 2m_{n(s)}$ (37)
The odd-numbered modulation bits correspond to the in-phase components $x_{n,I}$, and the even-numbered modulation bits correspond to the quadrature components $x_{n,Q}$. Therefore, the information bits of these n(s) superimposed user signals corresponding to the modulation bits on the constellation diagram can be set as
$a_{x_1,1,I}, a_{x_1,1,Q}, \ldots, a_{x_1,m_1,I}, a_{x_1,m_1,Q};\ a_{x_2,1,I}, a_{x_2,1,Q}, \ldots, a_{x_2,m_2,I}, a_{x_2,m_2,Q};\ \ldots$ (38)
Step 3: According to the modulation method of each user signal, there are $2^{2m_1 + 2m_2 + \cdots + 2m_{n(s)}}$ modulation constellation points on the multi-user modulation constellation map. The modulation bits on each modulation constellation point are
$a_1, a_2, \ldots, a_{2m_1};\ a_{2m_1+1}, a_{2m_1+2}, \ldots, a_{2m_1+2m_2};\ \ldots;\ a_{2m_1+2m_2+\cdots+1}, \ldots, a_{2m_1+2m_2+\cdots+2m_{n(s)}}$ (39)
Step 4: The information bits of the user signal $x_n$ map one-to-one onto the modulation bits on the multi-user modulation constellation map:
$a_{x_n,1,I}, a_{x_n,1,Q}, \ldots, a_{x_n,m_n,I}, a_{x_n,m_n,Q} \leftrightarrow a_{2m_1+\cdots+2m_{n-1}+1}, \ldots, a_{2m_1+\cdots+2m_{n-1}+2m_n}$ (40)
At the receiving end, using the mapping relationship between information bits and modulation bits, the LLR value of each information bit of the received signal is obtained according to the multi-user modulation constellation diagram. Then, the information bits of the transmitted signal can be obtained by sending the LLR values to the decoder.
For example, in a cell containing only two users, the edge user UE1 and the center user UE2 both use QPSK modulation, and the transmit power $P_1$ allocated to UE1 is greater than the transmit power $P_2$ allocated to UE2. The information bits of UE1 (UE2) are $a_{x_1,1,I}$ and $a_{x_1,1,Q}$ ($a_{x_2,1,I}$ and $a_{x_2,1,Q}$), and the corresponding modulation bits on the modulation constellation diagram are $a_1$ and $a_2$ ($a_3$ and $a_4$). Then the LLR of $a_1$, $a_2$, $a_3$ and $a_4$ can be derived as where $\sqrt{2}$, $\mathrm{Re}\{\cdot\}$ denotes the real part, and $\mathrm{Im}\{\cdot\}$ denotes the imaginary part.
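To make the two-user QPSK example concrete, the sketch below computes max-log-MAP LLRs directly on a 16-point joint constellation. It is not the paper's closed-form LLR expression. It rests on several assumptions: the Gray labelling is built by letting UE2's bit select the inner or outer amplitude level, the LLR is defined as $\ln[P(a_k = 0)/P(a_k = 1)]$ so that a positive value favours bit 0, the channel is a unit-gain AWGN channel with noise variance `noise_var`, and the helper names are hypothetical.

```python
import itertools

import numpy as np


def joint_constellation(p1, p2):
    """Joint constellation of two QPSK users with powers p1 > p2.

    Returns (points, labels); labels[i] = (a1, a2, a3, a4), where a1, a2
    belong to UE1 (I, Q) and a3, a4 belong to UE2 (I, Q).
    """
    amp1, amp2 = np.sqrt(p1 / 2), np.sqrt(p2 / 2)
    labels = np.array(list(itertools.product([0, 1], repeat=4)))
    a1, a2, a3, a4 = labels.T
    # UE1's bit sets the sign, UE2's bit picks the outer (0) or inner (1)
    # level, so neighbouring points differ in exactly one bit (assumed Gray rule).
    i_part = (1 - 2 * a1) * (amp1 + (1 - 2 * a3) * amp2)
    q_part = (1 - 2 * a2) * (amp1 + (1 - 2 * a4) * amp2)
    return i_part + 1j * q_part, labels


def max_log_llr(y, points, labels, noise_var):
    """Max-log-MAP LLR of every modulation bit for one received sample y."""
    d2 = np.abs(y - points) ** 2
    llr = np.empty(labels.shape[1])
    for k in range(labels.shape[1]):
        # Closest point with bit k = 1 versus closest point with bit k = 0.
        llr[k] = (d2[labels[:, k] == 1].min()
                  - d2[labels[:, k] == 0].min()) / noise_var
    return llr


points, labels = joint_constellation(p1=0.8, p2=0.2)
rng = np.random.default_rng(0)
noise = 0.05 * (rng.standard_normal() + 1j * rng.standard_normal())
llr = max_log_llr(points[5] + noise, points, labels, noise_var=0.005)
print(labels[5], (llr < 0).astype(int))  # transmitted bits and hard decisions
```

All four LLRs come from a single pass over the joint constellation, which is the property that lets both users' bits be evaluated in parallel rather than one after another.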
The design of the multi-user modulation constellation diagram enables the receiver of the NOMA system to calculate, by means of (40), the LLR values of multiple superimposed downlink user signals at the same time, thus eliminating the SIC processing required in the original NOMA receiver. This effectively reduces the number of operations and the working delay of the NOMA receiver. Since the detection of multiple superimposed user signals can be performed in parallel through the multi-user modulation constellation diagram, the cancellation of multiple interfering user signals can also be performed in parallel. In the reception and detection process, the user signals are arranged according to the allocated transmission power and divided into $L$ user signal sets $l_1, l_2, \dots, l_L$. The user signal set $l_1$ is selected at the first stage for signal detection and interference cancellation, and the remaining signal then enters the next stage; this continues until the detection is completed. The advantage is that when a large number of user signals are superimposed on a sub-carrier at the transmitting end of the NOMA system, or when the user signals adopt high-order modulation, the receiver design is relatively easy to implement.

C. RECEIVER DESIGN AND FEASIBILITY ANALYSIS
The design and working steps of the two stages of the NOMA receiver based on the MCIC algorithm are shown in figure 4. The working steps of receiver design are given by decoding to obtain the information bits of each user signal in $l_1$. 4) Use the information bits of each user signal in $l_1$ to reconstruct the interference signal and obtain the estimated signal set $l_1$. 5) Remove the estimated signal set $l_1$ from the original signal to eliminate the multipath interference caused by the interference signal set $l_1$. 6) The remaining received signal enters the next stage, and the above process is repeated until the desired signal is detected. In terms of the feasibility of the NOMA receiver, both the MLM algorithm and the MCIC algorithm calculate the LLR value of each information bit. The numbers of addition and multiplication operations in both algorithms equal the number of modulated bits of the user signal, so the MCIC algorithm adds no extra computational complexity to the algorithm flow. Moreover, the MCIC algorithm constructs a multi-user modulation constellation diagram at the base station transmitting end, so that the detection and cancellation of multiple received signals can be completed at the user receiving end at the same time. Compared with the symbol-level SIC receiver and the codeword-level SIC receiver, it reduces the number of receiver processing stages and effectively reduces the computational complexity.
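The detect, reconstruct and remove loop across signal sets can be sketched as follows. This is a simplified illustration and not the receiver of figure 4: channel decoding is omitted, each stage makes a hard nearest-point decision instead of computing LLRs, the channel is assumed to have unit gain, and `staged_detection` and `stage_points` are hypothetical names. Each entry of `stage_points` stands for the joint constellation of one signal set, strongest set first.

```python
import numpy as np


def staged_detection(y, stage_points):
    """Detect one signal set per stage and cancel it from the residual.

    y            : received complex sample
    stage_points : list of 1-D complex arrays, one joint constellation per
                   signal set, ordered from the strongest set to the weakest
    Returns (decisions, residual), where decisions[i] is the index of the
    point chosen at stage i.
    """
    residual = complex(y)
    decisions = []
    for points in stage_points:
        idx = int(np.argmin(np.abs(residual - points)))  # detect this set
        decisions.append(idx)
        residual -= points[idx]          # reconstruct and remove the estimate
    return decisions, residual


# Example: two sets, each a single QPSK user, with power split 0.8 / 0.2.
qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
strong, weak = np.sqrt(0.8) * qpsk, np.sqrt(0.2) * qpsk
y = strong[2] + weak[1]                  # noise-free superposition
decisions, residual = staged_detection(y, [strong, weak])
print(decisions)                         # [2, 1]
```

When a stage holds the joint constellation of several users rather than a single user, all users of that set are detected in one stage, so the number of stages follows the number of signal sets and not the number of users.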

A. POWER ALLOCATION SIMULATION
This section presents numerical results for the capacity performance of DC-NOMA with FPA, FSO and TPG, respectively. The cell radius is 500 m, the number of cellular users is 3, and the minimum rates of UE1, UE2 and UE3 are limited to $r_1^{\min} = r_2^{\min} = r_3^{\min} = 1$ bps/Hz. The transmit power $P_R$ of the relay or D2D transmitter is set to 1/10 of the base station transmit power $P_B$, and the noise at all receivers is assumed to be identical. All channels undergo independent small-scale Rayleigh fading. Without loss of generality, the average channel gains of the Rayleigh fading channels are assumed to be $\beta_{BU_1} = 0.1$, $\beta_{BU_2} = 1.5$, $\beta_{U_2U_3} = 0.5$ and $\beta_{U_2U_1} = 2$. The transmit SNRs of the BS and the relay are related by $\rho_R = \rho_B/10$, and $\gamma_{\max} = 35$ dB. Figure 5 shows the variation of ESC with the two-stage power allocation factors $a_2$, $b_2$. When $a_2 \to 0$ and $b_2 \to 1$, the entire communication system is effectively an OMA direct communication system. When $a_2 \to 1$ and $b_2 \to 0$, or $a_2 \to 1$ and $b_2 \to 1$, it is a traditional two-user C-OMA system. Their ESRs are very low compared to DC-NOMA. In particular, the bright region of the figure shows that the ESR reaches its maximum value when $a_2, b_2 \in (0, 0.5)$. Therefore, it is meaningful to find an optimal allocation of the two-stage power factors of the system. Figure 6 shows a set of values randomly selected from the results of over a thousand calculations under a unified computer environment. The values of $a_2$, $b_2$ stabilize after 7 iterations, computed within 100 ms, and the time consumption would be reduced further if industrial-grade computing equipment were applied. According to the parameter settings of the DC-NOMA transmission model above, the power allocation coefficients $a_2$, $b_2$ of the three algorithms are summarized in table 1.
In table 1, the values of $a_2$ and $b_2$ in FPA are estimated according to the channel conditions of all links or figure 5, the value of $a_2$ refers to [42, eq. (37)], while the values of $a_2$, $b_2$ in TPG are calculated by the TPG algorithm. Figure 7 and figure 8 show the ESCs of the system as a function of SNR for the one-stage (i.e., first stage only) and two-stage (i.e., both stages) optimization, respectively. It can be seen from the two figures that the ESCs grow as the SNR increases. In the high signal-to-noise ratio region (SNR > 30 dB), the capacity improvement of these two algorithms is more apparent. Both optimization schemes outperform the fixed power allocation scheme. This is because in the fixed scheme the power factors of both stages are fixed at $a_2 = \min(0.01, 1/\rho_B)$ and $b_2 = 0.2$, whereas the dynamic optimal allocation of $a_2$, $b_2$ takes the variable transmit SNRs of the BS and relay into account, thereby bringing better capacity performance to the system. Moreover, since ESCs are bounded by the SNR upper bound $\gamma_{\max}$ corresponding to the maximum modulation and coding scheme in practical scenarios, the ESCs with the joint two-stage optimization scheme are larger than those with only first-stage optimization, as shown by comparing the convergence values indicated by the dashed lines.
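The saturating effect of the SNR upper bound can be seen in a small Monte Carlo experiment. The sketch below covers only a single Rayleigh-faded link and is not the DC-NOMA sum-rate expression of the paper; it assumes that the channel power gain is exponentially distributed with mean $\beta$ and that the instantaneous SNR is clipped at $\gamma_{\max}$. The function name `ergodic_rate`, the sample size and the seed are illustrative choices.

```python
import numpy as np


def ergodic_rate(snr_db, beta, gamma_max_db=None, n=200_000, seed=0):
    """Monte Carlo estimate of E[log2(1 + SNR)] in bps/Hz for one link.

    snr_db       : transmit SNR in dB
    beta         : average channel power gain (Rayleigh fading, so the
                   power gain |h|^2 is exponentially distributed)
    gamma_max_db : optional cap on the received SNR in dB (MCS limit)
    """
    rng = np.random.default_rng(seed)
    gain = rng.exponential(scale=beta, size=n)
    snr = 10 ** (snr_db / 10) * gain
    if gamma_max_db is not None:
        snr = np.minimum(snr, 10 ** (gamma_max_db / 10))
    return float(np.mean(np.log2(1 + snr)))


for snr_db in (20, 40, 60):
    free = ergodic_rate(snr_db, beta=1.5)
    capped = ergodic_rate(snr_db, beta=1.5, gamma_max_db=35)
    print(f"{snr_db} dB: unbounded {free:.2f}, capped {capped:.2f} bps/Hz")
```

With the cap in place the rate cannot exceed $\log_2(1 + 10^{3.5}) \approx 11.6$ bps/Hz per link, which gives a saturation level of the same kind as the one the dashed lines mark.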
Comparing figure 7 with figure 8, we can see clearly that the capacity gain with TPG is higher than that with TPF. In addition, when the transmission power of the relay is flexibly allocated, the improvement from D2>D1 to D1>D2 is more obvious under the same SNR. This is because TPG optimizes the two-stage power allocation jointly: although the BS power is greater than that of the relay, the similarity of the channel states of the R-Edge and B-R links can compensate for the power disparity between the BS and the relay once the relay power distribution is optimized.
Concretely, figure 9 details the gain ratio of optimizing $b_2$ with fixed $\rho_R$ and $\rho_B$, from which we see that the share of the capacity gain contributed by optimizing $b_2$ in the two-stage optimization shows an upward trend. When the SNR increases to 40 dB, the ratio grows to 45.6%. Furthermore, when MCS levels are finite, the achievable capacity is bounded by the SNR upper bound $\gamma_{\max}$ corresponding to the maximum MCS level, which is indicated by the dashed line in figure 7 and figure 8. In the above two cases, when SNR = 60 dB, TPG achieves 216.7% and 190% rate gain, respectively.

1) SIMULATION OF RECEIVER DESIGN
In order to verify the performance advantages of the MCIC algorithm for the NOMA downlink receiver, we build a modem system in Simulink and compare the influence of the SINR (signal-to-interference-plus-noise ratio) on the BLER for three algorithms: the symbol-level SIC receiver based on the hard decision algorithm, the codeword-level SIC receiver based on the soft decision algorithm, and the new MCIC algorithm. In the results, the horizontal axis denotes the SINR of the central user UE1 of the cell, and the vertical axis denotes the block error rate (BLER) of the central user UE1. Then, by changing the user modulation mode and the user power allocation, we investigate the impact of these two factors on the detection performance of the receiving end. The simulation parameters are based on the existing LTE/LTE-A specification [44] and modified for the experimental environment required in this article. The specific parameters are shown in Table 2. In terms of system operating delay, a NOMA downlink system including three user terminals with different modulation methods is analyzed, as in figure 10. In this NOMA system, UE1 uses 64 QAM modulation while UE2 uses 16 QAM modulation, and the cell center user (UE1) uses an MCIC receiver. Compared with the SIC receiver, the delay of the whole system is effectively reduced.
From the analysis of figure 10, the SINR/BLER performance of codeword-level SIC detection is closest to that of ideal SIC detection, which theoretically eliminates interference completely. The MCIC algorithm ranks second, and symbol-level SIC detection shows the least performance advantage.
Comparing figure 11(a) with figure 11(b), it can be seen that when cell users adopt higher-order modulation methods, the detection performance of the receiver decreases. Compared with the SIC receiver, the receiver based on the MCIC algorithm is more robust to the detection performance degradation caused by the change of the user modulation method. Comparing figure 11(b) with figure 11(c), we can see that the block error rate is significantly reduced after optimizing the power allocation. Unlike the power distribution coefficients $a_2$ and $b_2$ in the FPA algorithm, which are estimated roughly based on channel conditions, those in TPG are calculated accurately through iteration. This means that $b_2 \approx 0.21$ and $a_2 \approx 0.28$ are closer to the optimal allocation. The signal detection performance of all four NOMA receiver algorithms improves. Among them, the performance of the receiver based on the MCIC algorithm changes most significantly. The reason is that the power allocation scheme allocates more power to weak users while ensuring that strong users still meet the detection conditions.

VI. CONCLUSION
In this paper, the power allocation scheme and the downlink detection receiver in the DC-NOMA scenario have been studied. Firstly, the optimization problem of ESC maximization in this scenario has been proved to be a non-strictly and non-convex optimization problem. To solve it, the TPG algorithm was proposed, and its convergence, complexity and global optimality were analyzed. Numerical results showed that the ESC of DC-NOMA using TPG is higher than that using either FSO or FPA, with or without the limitation of the MCS level. Then, in order to enhance the reliability in practical mobile communication systems, a NOMA downlink receiver design with acceptable complexity was proposed to reduce the working delay and BLER. Moreover, the feasibility of the design has been analyzed and verified.
However, DC-NOMA cooperative communication with a single antenna, under sufficient channel resources and relay users, is only an ideal case. Considering the trend toward widespread use of MIMO in next-generation wireless networks, power allocation optimization in DC-NOMA using MIMO will be a challenge. In addition, in parallel with the explosive growth of Internet of Things (IoT) devices, the participation of MEC nodes with storage, processing, and forwarding functions will provide new options for relay selection and resource allocation in DC-NOMA coexisting with cellular users. These extensions are left for future work.