Over-the-Air Implementation of NOMA: New Experiments and Future Directions

Non-orthogonal multiple access (NOMA) is widely recognized as a way to increase the number of users and enhance the spectral efficiency in fifth-generation (5G) wireless networks and beyond. NOMA is still in the theoretical analysis and simulation phases, and few experimental works have been reported to date. In this paper, we design and implement NOMA in software-defined radio and evaluate its performance. This includes the real-time realization of the key components of NOMA, i.e., superposition and successive interference cancellation. The main novelty of this paper is the construction of superimposed signals with varying symbol rates to enlarge the achievable rate region of the experimented NOMA. By applying varying symbol rates, the set of possible transmission rate pairs enlarges, and we can reach higher data rates compared to existing modulation and coding schemes (MCS). We also propose an algorithm to efficiently find the rate pairs. Simulations and experiments demonstrate that NOMA with a varying symbol rate not only can reach higher data rates than orthogonal schemes such as time division multiple access, but can also outperform existing MCS-based methods, which have a fixed symbol rate. The experiments also show that there is a noticeable gap between NOMA in theory and practice. In addition to the new NOMA experiments, we review the state of the art in experimental NOMA. We also discuss several directions for future experiments that can help bridge the gap between theory and practice and bring NOMA to practical communication systems.


I. INTRODUCTION
Non-orthogonal multiple access (NOMA) in the downlink, also known as power-domain NOMA, is a promising solution for increasing the number of users and enhancing the spectral efficiency of communication systems [1]. Unlike orthogonal multiple access (OMA) techniques that serve only one user per time/frequency resource block, in theory, NOMA can serve an arbitrary number of users in each resource block. For this reason, NOMA has emerged as an appealing solution for increasing connectivity and improving the spectral efficiency in fifth-generation (5G) cellular networks and beyond [1]-[6].
NOMA exploits the difference of channel gains between multiplexed users. The key techniques for single-input and single-output (SISO)-NOMA are superposition coding (SC) at the transmitter and successive interference cancellation (SIC) at the receiver. SIC is used at receivers with a stronger channel to decode and cancel interference from all weaker users before decoding their desired information. If K users are multiplexed to use one resource block, and the users are sorted based on their channel strength, the ith user should apply interference cancellation i − 1 times to decode and remove the signals of all users who have a weaker channel gain. Each interference cancellation incurs some error, and this error propagation increases the bit error rate (BER). In this sense, two-user SISO-NOMA is the simplest case, in which only one interference cancellation is required.
The associate editor coordinating the review of this manuscript and approving it for publication was Miguel López-Benítez.
While NOMA has attracted significant attention in academic research, an important question moving forward is whether this method will enjoy the theoretically promised gains over OMA in practice. Generally, although NOMA has been under investigation in academia, standardization bodies [7], and industry [8], the research is still mostly in the theoretical analysis and simulation phases [5], [6], [8]. Attempts at experimental validation and feasibility analysis of NOMA in real wireless environments remain limited. Some pioneering work can be found in [2], [9]-[11]. There are still several open questions and challenges on practical implementation issues, which are the main interest of this article. Software-defined radio (SDR) [12] is considered a valuable tool to evaluate the performance of practical systems. One can flexibly reconfigure an SDR and implement communication modules in software instead of hardware. The universal software radio peripheral (USRP) developed by National Instruments (NI) [13] is a type of SDR that is commonly used by research labs and universities.
This article is meant to serve as a reference for experimental NOMA. We have used the NI USRP-2974 [13] with the LabVIEW communication system design suite [14], [15] to set up new experiments for two-user SISO-NOMA to enhance spectral efficiency and shed more light on when and to what extent NOMA can improve spectral efficiency in practice. Besides, we have summarized the existing literature on experimental NOMA and discussed challenges and future directions.
We present both simulations and experiments for over-the-air transmission of the two-user NOMA system. The contributions of this paper can be summarized as follows:
1. One novelty in the designed simulation and experiment is the introduction of varying symbol rate transmission to improve the achievable rates. By varying the symbol rate of each NOMA user, the solution space is enlarged compared to only applying the existing modulation and coding scheme (MCS). This, in turn, helps to achieve a larger rate region for NOMA users. In the experiments, practical imperfections in synchronization, channel estimation, SIC, and NOMA header transmission are included.
2. It is well known that NOMA reaches higher rates than OMA in theory. However, in this paper, we show that this is not always the case in practice. Specifically, the experiment shows that NOMA outperforms OMA only in asymmetric channels. In symmetric channels, where the channel gains of the two users are very close, the NOMA rate region could even be worse than that of OMA for various reasons, including SIC error and imperfect channel state information.
3. We propose an algorithm to find the rate pairs for implementation more efficiently. The algorithm first finds OMA rate pairs for each user based on the code tables with varying symbol rates and then uses a bisection search to find the feasible NOMA rate pairs for both users. The algorithm takes less time than Monte Carlo simulations over all possible rate pairs.
4. Unlike several existing results, e.g., [9], [16], where the modulation, coding rate, and/or power allocation information are assumed to be known by the receivers and sometimes a cable connection is used for stable power control, in our experiments we transmit this information wirelessly, which is more practical. Moreover, we provide various details of the design that are not always reported in the literature, hoping they will be useful for reproducing the results. This includes the preamble and the NOMA header, which contains the power allocation factor, adaptive MCS, symbol rate, etc. The NOMA header is designed in detail and transmitted over the air.
The remainder of the paper is organized as follows. We divide the paper into three main parts.
1. Existing literature: We first review existing experimental results in Section II and discuss practical challenges in Section III. In Section IV, we describe the theory of NOMA.
2. New experiments development: The implementation of NOMA is introduced in Section V, in which we discuss the details of our designed experiment and transmission protocols. We present the concept of varying symbol rates by simulation in Section VI and experimental results in Section VII.
3. Future directions and outlook: We summarize future directions in Section VIII, and we conclude the paper in Section IX.

II. PRIOR EXPERIMENTAL VALIDATIONS OF NOMA
The earliest work on the implementation of downlink two-user SISO-NOMA can be traced back to [9], where the authors show that SC can provide significant spectral efficiency gain over time-division multiplexing. The experiment is designed using orthogonal frequency division multiplexing (OFDM) and the IEEE 802.11a parameters, and achievable rates are measured using the MCS [17]. The USRP boards of the transmitter (Tx) and user equipments (UEs) are connected via cables. NOMA combined with multiple-input multiple-output (MIMO) has been investigated by Benjebbour et al. from NTT DOCOMO in [2], [18]-[20]. The benefits of two-user NOMA over OMA with a two-transmit and two-receive antenna system (2 × 2) are evaluated in a system-level simulation and experimental trials in [2]. In [19], [20], a setup of an outdoor experimental trial using NOMA with open-loop 2 × 2 MIMO is introduced, and closed-loop 4 × 2 MIMO with multiple users is investigated in [18], [19]. The results show that a significant difference in channel gain can increase the performance gain, and that user pairing and feedback in the closed-loop case are important for NOMA performance.
Xiong et al. [11] and Wei et al. [10], [21] implement a few SISO-NOMA testbeds. Both offline and real-time experiments are considered. In the offline experiment, the USRP is treated as the radio frequency (RF) terminal, and the decoding and SIC are processed offline. In the real-time experiment, the time complexity of real-time SIC has been evaluated, showing that the SIC procedure requires around twice the time of direct decoding [10]. Paper [11] includes upper-layer protocols in the SISO-NOMA system and compares the throughput of NOMA and OMA. Later, the authors increased their testbed bandwidth from 5 MHz to 10 MHz and evaluated the throughput loss [21]. The authors in [22] have exploited network coding in NOMA. The scheme is named network-coded multiple access, which jointly uses network coding and multi-user decoding to boost NOMA throughput. Paper [23] has realized NOMA over downlink shared channels according to the Third Generation Partnership Project (3GPP) standard. Specifically, low-density parity-check (LDPC) channel coding is applied instead of Turbo coding. The simulation and an offline experiment show the feasibility of NOMA but also indicate a high implementation complexity in SDR. An uplink NOMA implementation is presented in [24], where the preamble frame structure for each uplink user is designed.
Very recently, a multiple-input and single-output (MISO) NOMA system (2 × 1) with two and three users is considered in [25]. Linear precoders are designed in PHY and medium access control (MAC) protocols, and a SIC scheme is proposed using reference signals for interference subtraction and signal detection, which can achieve higher weighted sum rates in NOMA networks compared with zero-forcing SIC. Paper [26] studies the performance of quadrature phase-shift keying (QPSK) modulation over the downlink NOMA system and derives and validates the error probability on a USRP SDR platform. A four-user NOMA network is implemented in [27], [28], [32]. The bit error behavior of binary phase-shift keying (BPSK) for NOMA has been analyzed and validated on the SDR platform [28]. The BER loss is evaluated with and without SIC [32]. Khorov et al. introduced the IEEE 802.11 WiFi standards into NOMA [29]-[31], [35]. Specifically, in [29], they have validated the WiFi network in a slave computer (e.g., a field-programmable gate array (FPGA) board) rather than a master computer (CPU in the host), which is more efficient than offline experiments. In [30], modifications of the physical (PHY) layer frames and the MAC protocols for downlink transmissions are considered based on IEEE 802.11 standards. To decrease the influence of phase noise in WiFi NOMA systems, [31], [35] extend their NOMA WiFi prototype to enable constellation rotation. The impacts of channel estimation and channel quantization on NOMA OFDM have been investigated in [33] with varying pilot-to-data ratios. The results indicate that longer channel estimation pilots can improve the BER performance. A dedicated relay-cooperative NOMA implementation is explored in [34] to enhance the performance of the weak user.
Overall, the studies in [9]-[11], [25] conclude that NOMA can improve system performance compared with OMA. Besides, papers [9], [10], [23], [32] also indicate that real-time implementation and SIC require high complexity. The summary of technical contributions to NOMA implementation is shown in Table 1. One main difference between the existing literature and our work is that we propose and use a varying symbol rate in this paper. Besides, we design a detailed NOMA header and transmit it over the air. In [9], [16], the modulation, coding rate, and power allocation information are assumed to be known by the receiver, and a cable is connected for stable power control. In contrast, in this paper, we transmit the users' information wirelessly, which is more practical.

III. PRACTICAL CHALLENGES
There are several implementation challenges in experimental NOMA. We mainly elaborate on synchronization and SIC in the following.

A. SYNCHRONIZATION
Synchronization is crucial in achieving reliable and robust communication system performance. In long-term evolution (LTE) [37], there are two downlink synchronization reference signals: a primary synchronization signal (PSS) for symbol timing and cell search, and a secondary synchronization signal (SSS) for frame timing. In IEEE 802.11 wireless local area network (WLAN) standards [17], all synchronization is realized within the symbol sequences of the legacy preambles, i.e., the short training field (L-STF) for packet detection and carrier frequency offset (CFO) correction, and the long training field (L-LTF) for fine synchronization and channel estimation [38]. There are three major synchronization functions in PHY [39], as described below.
a: FRAME SYNCHRONIZATION
Frame synchronization, also known as initial acquisition, is the first and coarse task through which a receiver (Rx) establishes a communication link with the Tx and calibrates its parameters. Employing an external clock module [26]-[28] can help coarsely synchronize multiple devices.

b: TIME SYNCHRONIZATION
This is needed to achieve fine time and/or frequency synchronization between the Rx sampling clock and the Tx channel symbol clock. In [26]-[28], symbol timing synchronization is desired for a single-carrier system. In NOMA-OFDM systems [25], [34], time and frequency synchronization can be realized by the Schmidl-Cox algorithm [36]. Briefly, it requires inserting a known preamble at the beginning of the transmission, and receivers detect its periodicity via correlation and accumulation.
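As a rough illustration of the correlation-and-accumulation idea, the following sketch computes a Schmidl-Cox style timing metric for a preamble made of two identical halves. It is not taken from the cited implementations: the function names, the chirp-like training half, the stand-in payload, and the idle gap length are all assumptions chosen only to make the example self-contained and noise-free.

```python
import cmath
import math


def timing_metric(r, d, L):
    """Schmidl-Cox style metric M(d) = |P(d)|^2 / R(d)^2 for half-length L."""
    # P(d): correlation between two windows that are L samples apart.
    p = sum(r[d + m].conjugate() * r[d + m + L] for m in range(L))
    # R(d): energy of the second window, used for normalization.
    energy = sum(abs(r[d + m + L]) ** 2 for m in range(L))
    return abs(p) ** 2 / energy ** 2 if energy > 0 else 0.0


def detect_preamble(r, L):
    """Return the offset d that maximizes the timing metric."""
    return max(range(len(r) - 2 * L + 1), key=lambda d: timing_metric(r, d, L))


L = 16
# Hypothetical training half (unit-modulus chirp); repeated twice as the preamble.
half = [cmath.exp(1j * math.pi * n * n / L) for n in range(L)]
# Stand-in payload with a different chirp rate (an assumption of this sketch).
data = [cmath.exp(1j * math.pi * (0.37 * n * n + 0.5)) for n in range(40)]
idle = [0j] * 25                      # idle gap before the frame
rx = idle + half + half + data
start = detect_preamble(rx, L)        # offset where the repeated halves align
```

In this noise-free setting the metric reaches one where the two windows cover the two identical halves. With noise, or with a cyclic prefix in OFDM, the peak flattens, so practical receivers usually add averaging or a threshold on top of this metric.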

c: CARRIER SYNCHRONIZATION
This concerns the estimation and compensation of the carrier frequency and phase differences between the Tx and Rx oscillators. CFO synchronization is necessary for uplink NOMA [24] since CFO leads to multiple access interference. The users use the auto-correlation of the PSS in LTE to perform synchronization [24], [40]. Table 2 shows the synchronization types and technologies employed in NOMA papers.

B. SUCCESSIVE INTERFERENCE CANCELLATION
d: IMPERFECT SYNCHRONIZATION AND CHANNEL ESTIMATION
Timing mismatch or imperfect channel estimation will result in residual cancellation errors [41] and, ultimately, in imperfect SIC. The synchronization errors may come from pulse shaping and the alignment of symbols. On the other hand, if the strong user's channel state information (CSI) is not estimated perfectly, then the weak user's signal cannot be reconstructed correctly from the superimposed signal.

e: SIC ERROR PROPAGATION
Once a decoding error occurs for a user, it is quite likely that all subsequent users in the SIC decoding order will be decoded improperly as well. There are several ways to reduce error propagation. If the number of users is small enough, this effect can be mitigated by utilizing stronger codes or longer blocks [41]. The choice of SIC scheme can also affect the BER and the achievable rate in practice; for example, codeword-level SIC can reduce the BER compared with symbol-level SIC [42]. The main difference between the two methods is that channel decoding and re-encoding are included in codeword-level SIC but not in symbol-level SIC.

IV. NOMA: FROM THEORY TO PRACTICE
We summarize the system model that achieves the capacity of SISO-NOMA using SC and SIC. To realize the superimposed signal, signal reconstruction, and interference cancellation in NOMA receivers, we build the NOMA modules in the LabVIEW communication system design suite based on Fig. 1.

A. SUPERPOSITION CODING
As illustrated in Fig. 1, the Tx generates two independent random data streams for UE1 and UE2 following a Bernoulli distribution with $p = \frac{1}{2}$. The data of the two paired UEs are separately encoded, modulated into complex-valued symbols using modulation types such as BPSK, QPSK, and 16QAM, and filtered. In general, UE1 upsamples with $k_1$ samples per symbol and applies the pulse shaping filter $p_1(m)$, whereas UE2 upsamples with $k_2$ samples per symbol and applies the pulse shaping filter $p_2(m)$, in which $m$ denotes the input index. Denote the signal sequence for each user per channel use as $s_i$, $i \in \{1, 2\}$. Then, a fraction $\alpha \in [0, 1]$ of the power $P$ is assigned to UE1, and the rest of the power is allocated to UE2. The signals of the two users with different powers are superposed in the power domain, which can be constructed as
$$x = \sqrt{\alpha}\, s_1 + \sqrt{1 - \alpha}\, s_2. \tag{1}$$
The superimposed signal is then sent out through the USRP at a radio frequency (RF) of 3 GHz. To elaborate on how SC is performed, schematic diagrams are given in Fig. 2. When α is small (Fig. 2(a) to Fig. 2(c)), the QPSK constellation of UE1 with smaller power is superposed on that of UE2 with a higher power. The super-constellation of the superimposed signal performs as a 16-ary quadrature amplitude modulation (16QAM). When α is large (Fig. 2(d) to Fig. 2(f)), UE1 chooses QPSK and UE2 chooses BPSK, and the super-constellation performs as an 8PSK constellation with possible constellation points overlapping.
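To make the superposition step concrete, the sketch below superposes two QPSK streams for a given power fraction α and lists the distinct points of the resulting super-constellation. It assumes unit-energy QPSK symbols and the weighting $\sqrt{\alpha}\, s_1 + \sqrt{1 - \alpha}\, s_2$. The function names and the symbol mapping are illustrative and are not part of the LabVIEW implementation.

```python
import math

# Unit-energy QPSK points; the exact bit mapping is an assumption of this sketch.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def superpose(s1, s2, alpha):
    """Superpose two symbol streams; a fraction alpha of the power goes to UE1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a1, a2 = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [a1 * u + a2 * v for u, v in zip(s1, s2)]


def super_constellation(c1, c2, alpha):
    """Distinct points of the superimposed signal for constellations c1 and c2."""
    s1 = [u for u in c1 for _ in c2]   # every UE1 symbol ...
    s2 = [v for _ in c1 for v in c2]   # ... paired with every UE2 symbol
    points = superpose(s1, s2, alpha)
    # Round so that numerically identical points collapse into one entry.
    return {(round(p.real, 9), round(p.imag, 9)) for p in points}


points_small_alpha = super_constellation(QPSK, QPSK, 0.2)   # 16 distinct points
points_equal_power = super_constellation(QPSK, QPSK, 0.5)   # points overlap
mean_power = sum(x * x + y * y for x, y in points_small_alpha) / len(points_small_alpha)
```

With α = 0.2 the two QPSK layers form 16 distinct points, while equal power (α = 0.5) makes several points coincide, which illustrates why the power allocation has to keep the super-constellation free of overlaps.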

B. SUCCESSIVE INTERFERENCE CANCELLATION
Denote the complex channel gains for UE1 and UE2 as $h_1$ and $h_2$, and assume, without loss of generality, $|h_1| \ge |h_2|$. The received signal at each user is
$$y_i = h_i x + n_i, \quad i = 1, 2, \tag{2}$$
where $n_i \sim \mathcal{CN}(0, N_0)$, $i = 1, 2$, denotes the additive white Gaussian noise (AWGN) at each user. UE1 has a stronger channel to the Tx and hence can always decode the information that can be decoded at UE2. The key steps are [4]:
• The weaker UE (UE2) estimates its channel and decodes its data from $y_2$ by treating $s_1$ as Gaussian noise. Let $\hat{h}_2$ be UE2's estimated channel. The recovered signal of UE2 is
$$\hat{y}_2 = y_2 / \hat{h}_2. \tag{3}$$
The signal is then filtered using the matched filter $p_2(-m)$, downsampled by $k_2$, and demodulated using modulation type 2 to recover UE2's data stream.
• At the stronger UE (UE1), four steps are required to recover its message from the received signal $y_1$:
- Estimate the CSI ($\hat{h}_1$).
- Filter, decode, and demodulate the signal of UE2 from $y_1$, then reconstruct UE2's signal using modulation type 2 and $k_2$ to obtain $\hat{s}_2$.
- Subtract $\hat{h}_1 \sqrt{1 - \alpha}\, \hat{s}_2$ from the received signal $y_1$:
$$\hat{y}_1 = y_1 - \hat{h}_1 \sqrt{1 - \alpha}\, \hat{s}_2. \tag{4}$$
- Decode and demodulate the signal of UE1 from $\hat{y}_1$ and obtain the estimated signal $\hat{s}_1$.
The Tx needs to share the knowledge of the power allocation coefficient α and the modulation types with both users so that they can realize SIC and decode their signals. In the example of Fig. 2(a) to Fig. 2(c), UE1, with a stronger channel, can decode the QPSK for UE2 first and then subtract it from the superimposed signal to obtain its own QPSK. UE2, with a weaker channel, treats UE1's signal as noise and decodes its message by treating the superimposed 16QAM as a QPSK. In the example of Fig. 2(f), the 8QAM super-constellation is cross overlapped. To successfully decode and satisfy BER requirements, one approach is to avoid constellation cross overlapping and erroneous decisions by assigning a suitable power allocation [43], [44].
In practice, although channel coding can be used to ensure that signals are decoded correctly, imperfect synchronization may cause imperfect SIC at UE1 and degrade the performance of NOMA. A more detailed analysis of these imperfections is given in Section III.
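The receiver steps listed above can be sketched at the symbol level as follows. This is a simplified illustration rather than the experimental receiver: it assumes a flat channel, perfect CSI, no pulse shaping or channel coding (i.e., symbol-level SIC), unit-energy QPSK for both users, and the weighting $\sqrt{\alpha}\, s_1 + \sqrt{1 - \alpha}\, s_2$ at the Tx. All function names and the channel value are hypothetical.

```python
import cmath
import math

QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def nearest(points, z):
    """Minimum-distance hard decision."""
    return min(points, key=lambda p: abs(z - p))


def detect_ue2(y, h_hat, alpha, c2=QPSK):
    """Equalize y and detect s2 while treating UE1's signal as noise."""
    z = y / (h_hat * math.sqrt(1.0 - alpha))
    return nearest(c2, z)


def detect_ue1_with_sic(y, h_hat, alpha, c1=QPSK, c2=QPSK):
    """Detect s2, subtract its reconstruction, then detect s1 (needs 0 < alpha < 1)."""
    s2_hat = detect_ue2(y, h_hat, alpha, c2)
    y_clean = y - h_hat * math.sqrt(1.0 - alpha) * s2_hat   # interference cancellation
    s1_hat = nearest(c1, y_clean / (h_hat * math.sqrt(alpha)))
    return s1_hat, s2_hat


# Noise-free check over all symbol pairs with a hypothetical channel and perfect CSI.
alpha = 0.2
h1 = 0.9 * cmath.exp(0.3j)
recovered = all(
    detect_ue1_with_sic(
        h1 * (math.sqrt(alpha) * s1 + math.sqrt(1.0 - alpha) * s2), h1, alpha
    ) == (s1, s2)
    for s1 in QPSK
    for s2 in QPSK
)
```

If `detect_ue2` makes a wrong decision, the subtraction adds interference instead of removing it, which is the error propagation effect discussed in Section III. A codeword-level variant would decode and re-encode UE2's data before the subtraction.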

C. ACHIEVING RATE PAIRS
The achievable data rates for UE1 and UE2 in NOMA are given by [41]
$$r_1 = \log_2\left(1 + \frac{\alpha P |h_1|^2}{N_0}\right), \tag{5a}$$
$$r_2 = \log_2\left(1 + \mathrm{SINR}_2\right), \tag{5b}$$
in which $P$ represents the total power. The signal-to-interference-plus-noise ratio (SINR) of UE2 in (5b) is given by
$$\mathrm{SINR}_2 = \frac{(1 - \alpha) P |h_2|^2}{\alpha P |h_2|^2 + N_0}. \tag{6}$$
The capacity region of the NOMA channel is obtained by varying α from 0 to 1 to get all possible transmission rate pairs $(r_1, r_2)$. It is worth noting that when all power is allocated to UE1 (α = 1), the channel reduces to the point-to-point (P2P) transmission between the Tx and UE1, which implies an OMA scheme since only UE1 is served and $r_2 = 0$. When α = 0, only UE2 is active in the network and $r_1 = 0$. In the OMA case, assuming time division multiple access (TDMA), the achievable rate region for UE1 and UE2 is given by [45], [46]
$$r_1 = \tau \log_2\left(1 + \frac{P |h_1|^2}{N_0}\right), \tag{7a}$$
$$r_2 = (1 - \tau) \log_2\left(1 + \frac{P |h_2|^2}{N_0}\right), \tag{7b}$$
in which a fraction $\tau \in [0, 1]$ of the time is dedicated to UE1, and the remaining fraction $1 - \tau$ of the time is dedicated to UE2. With NOMA, the entire time or bandwidth is simultaneously shared by the two users. With OMA, however, UE1 uses part of the time or bandwidth and the remaining resource is assigned to UE2.
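As a small numerical illustration of the two rate regions, the sketch below evaluates the standard two-user rate expressions for NOMA with SIC and for TDMA. It assumes the usual Shannon-rate forms with per-user SNRs $P|h_i|^2/N_0$. The SNR values of 17.7 dB and 5.7 dB are the ones used later in the simulations, and the function names are illustrative.

```python
import math


def db_to_linear(x_db):
    return 10.0 ** (x_db / 10.0)


def noma_rates(alpha, snr1, snr2):
    """(r1, r2) in bps/Hz; snr_i = P|h_i|^2 / N0 in linear scale."""
    r1 = math.log2(1.0 + alpha * snr1)                    # UE1 after perfect SIC
    sinr2 = (1.0 - alpha) * snr2 / (alpha * snr2 + 1.0)   # UE1's signal seen as noise
    return r1, math.log2(1.0 + sinr2)


def tdma_rates(tau, snr1, snr2):
    """(r1, r2) when UE1 gets a fraction tau of the time."""
    return tau * math.log2(1.0 + snr1), (1.0 - tau) * math.log2(1.0 + snr2)


def alpha_for_r1(r1, snr1):
    """Power fraction that gives UE1 the rate r1 under NOMA."""
    return (2.0 ** r1 - 1.0) / snr1


snr1, snr2 = db_to_linear(17.7), db_to_linear(5.7)
r1_tdma, r2_tdma = tdma_rates(0.5, snr1, snr2)
# Match UE1's TDMA rate with NOMA and compare what is left for UE2.
r1_noma, r2_noma = noma_rates(alpha_for_r1(r1_tdma, snr1), snr1, snr2)
```

For these asymmetric SNRs, NOMA gives UE2 a higher rate than equal-time TDMA at the same UE1 rate. This is a theoretical comparison only and does not include the practical losses discussed in this paper.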

V. NOMA WITH VARYING SYMBOL RATES
A. PRACTICAL NOMA USING AMC
Adaptive modulation and coding (AMC) is used to adjust the modulation types and coding rates based on the channel conditions. The algorithm that selects the optimum MCS is key to AMC [47]. When the channel conditions are poor, a lower modulation order and coding rate are used; when the channel conditions are good, a higher modulation order is favored to maximize the transmission rate. Table 3 shows part of the MCS parameters used in IEEE 802.11ac [17], [48]. $M \in \{2, 4, 16, \ldots\}$ denotes the modulation order corresponding to the modulation types BPSK, QPSK, 16QAM, etc., $\rho$ is the coding rate, and the coding efficiency $\eta = \rho \log_2 M$ represents the useful data bits each symbol can carry.
For the OMA case, where only one user is served, the optimal MCS can be selected by comparing the instantaneous signal-to-noise ratio (SNR) to the range of SNR values [47]. In NOMA, MCS adaptation for the users should be done simultaneously within targeted BERs. Besides, employing MCS adaptation based on Table 3 should also consider power allocation and user scheduling [3]. There are some solutions for MCS selection. In [10], a fixed MCS combination (UE1: QPSK, 16QAM, and 64QAM; UE2: QPSK) is selected, and the targeted block error rate is satisfied by varying the SNR and power allocation. In [11], [21], suitable MCS schemes of LTE (UE1: 64QAM; UE2: QPSK) are roughly chosen and applied manually according to the average throughput for each UE. In [27], the 4QAM constellation is applied for all users with different channel gains. In [9], [16], MCS adaptation is employed based on the SNR for initial reference, and then the power allocation is adjusted to support the targeted packet error rates.
We employ MCS adaptation for OMA cases while maintaining the BER below a threshold. In NOMA scenarios, we first use the MCS selected in OMA. Then, we change the MCS to assure the targeted BERs for both users. Table 3 shows part of the MCS parameters used in IEEE 802.11ac. Convolutional coding can be applied to reduce the bit error probability for a noisy channel [49], [50]. However, the system will suffer a remarkable power penalty, coding gain, and rate reduction [51]. Our main goal is to reach as high an achievable rate as possible for the NOMA system. Thus, to increase the transmission rate within an acceptable BER, we choose to switch between enabling and disabling the convolutional coding.
The MCS provides the code tables to determine a set of largest simultaneously achievable rates at each user. Thus, we can traverse the library to find all possible rate pairs and keep the BER within a practical reliability regime ($10^{-3}$) [44], [52], [53].

B. SYMBOL RATE AND RATE TABLE
In this section, we design three finite libraries of finite-blocklength codes based on finite constellations. The main novelty in building these code libraries, also called code tables, is the introduction of varying symbol rates. These libraries are named $C_1$, $C_2$, and $C_3$, which include various possible combinations of the transmission parameters. In the following, we describe their construction and differences in detail.
VOLUME 9, 2021
Each code table is built using three independent parameters $M$, $\rho$, and $k$, where $M$ denotes the modulation order, $\rho$ is the channel coding rate (convolutional coding), and $k$ is the number of samples per symbol, or oversampling factor; see Table 4 for an example of $C_1$. The last column of each code table is the spectral efficiency $r$ [bps/Hz], evaluated by
$$r = \frac{\rho \log_2(M)\, R_s}{B}, \tag{8}$$
in which $B$ is the assigned bandwidth [Hz] and the variable $R_s$ is the symbol rate [symbols/s]. The M-QAM constellations have Nyquist data pulses, $B = R_{s,\max} \ge R_s$ [54]. In digital communication, the sampling frequency $f_s$ is provided at the UEs. We have $R_s = f_s / k$, and $R_{s,\max}$ results in the minimum samples per symbol $k_{\min}$. Thus, the spectral efficiency can be rewritten as
$$r = \rho \log_2(M)\, \frac{k_{\min}}{k}. \tag{9}$$
We should indicate that we will use r for the measured spectral efficiency. Further, all parameters may be used with a subscript $i \in \{1, 2\}$ referring to the UE index. For example, $r_1$ denotes the spectral efficiency of UE1. The difference between the code tables $C_1$, $C_2$, and $C_3$ is that in $C_1$ and $C_2$ one parameter is fixed, whereas in $C_3$ all parameters are varied. Specifically, in $C_1$, $\rho$ is fixed to 1 (no channel coding), and in $C_2$, $k$ is fixed to 10.
An example of code table $C_1$ is given in Table 4. The solution space of its rate pairs, i.e., $r_1$ versus $r_2$, has a size of 32 × 32 combinations in total, as shown in Fig. 3.
The code table $C_2$, in which the number of samples per symbol is fixed whereas the coding rate varies, is built for the sake of comparison with [9], [25]. Specifically, two modulation orders $M \in \{2, 4\}$ and four coding rates $\rho \in \{\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{5}{6}\}$ are considered. Then all 8 possible pairings of MCS are $(M, \rho) \in \{2, 4\} \times \{\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{5}{6}\}$. The code table example can be found in Table 5. The cardinality of $C_2$ is 8, i.e., there are eight possible achievable rates for one user. Then, the solution space of the rate pairs is 8 × 8 combinations in total, as shown in Fig. 4 for $k_1 = k_2 = 10$. Hence, a fixed number of samples per symbol may reduce the solution space. Thus, in one frequency carrier, varying the symbol rate can enlarge the solution space.
• $C_1$: Varying symbol rate with no channel coding ($\rho = 1$).
• $C_2$: Fixed symbol rate with varying coding rate. This code table is the existing MCS-based method commonly used in many works [9], [11].
• $C_3$ (proposed): Varying symbol rate and varying coding rate. We combine the two methods to enlarge the solution space.
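The construction of such code tables can be sketched in a few lines. The example below builds a table from all combinations of $M$, $\rho$, and $k$, computes the spectral efficiency as $\rho \log_2(M)\, k_{\min}/k$, and sorts the rows in decreasing order of rate. The parameter sets passed in the two calls are illustrative assumptions: the first follows the $C_2$ description ($M \in \{2, 4\}$, four coding rates, $k = 10$, with $k_{\min} = 10$ assumed), while the second is only a $C_1$-like example whose modulation orders and $k$ range are not taken from Table 4.

```python
import math
from fractions import Fraction
from itertools import product


def spectral_efficiency(M, rho, k, k_min):
    """r = rho * log2(M) * k_min / k in bps/Hz."""
    return float(rho) * math.log2(M) * k_min / k


def build_code_table(mod_orders, coding_rates, samples_per_symbol, k_min):
    """All (M, rho, k) combinations with their rate r, sorted by decreasing r."""
    rows = [
        {"M": M, "rho": rho, "k": k, "r": spectral_efficiency(M, rho, k, k_min)}
        for M, rho, k in product(mod_orders, coding_rates, samples_per_symbol)
    ]
    rows.sort(key=lambda row: row["r"], reverse=True)
    return rows


mcs_rates = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6)]
# Fixed symbol rate, varying coding rate (C2-like; k_min = 10 is an assumption).
table_c2 = build_code_table([2, 4], mcs_rates, [10], k_min=10)
# Varying symbol rate, no channel coding (C1-like; parameter sets are assumptions).
table_c1 = build_code_table([2, 4, 16], [1], range(8, 25, 2), k_min=8)
```

Sorting the rows by decreasing rate is what later allows a bisection search over the table, since a smaller row index then always means a higher rate.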
Given an α and any code table with finite feasible rate pairs, the problem of finding the set of largest simultaneously achievable rates becomes a search over the feasible rate space. Thus, the problem can be written as the following optimization [9]: where $\mathrm{BER}_i$ is the BER for UEi. This problem will give the rates for both UEs, and the optimization order, i.e., swapping indices 1 and 2, will not affect the achievable rate region.
The throughput $R_i$ in megabits per second [Mbps] and the spectral efficiency $r_i$ for user $i$ have the following relation:
$$R_i = r_i B. \tag{11}$$
Usually, the coding efficiency $\eta_i = \rho_i \log_2 M_i$ for UEi is adjusted based on the MCS. Thus, if we can also adjust $k_i$, which is equivalent to varying the symbol rate $R_s$, the solution space of the achievable rates can be enlarged.
It is worth mentioning that employing varying symbol rates does not mean AMC cannot be applied. On the contrary, the varying symbol rate scheme can help real-world AMC-based NOMA to reach a required throughput. The role of AMC is to provide a reasonable MCS corresponding to the channel gain.

C. THE ALGORITHM FOR THE RATE REGION
Given the SNR of each user, one can solve (10) by an exhaustive search over $C_3$, i.e., by checking the feasibility of every candidate rate pair in $C_3$. However, traversing the entire code table is time-consuming. For this reason, we first find the OMA points in Algorithm 1 and then use them to find the NOMA rate pairs in Algorithm 2.¹

Algorithm 1 finds the transmission rates of UE1 and UE2 achieved by OMA. First, the threshold in (10c), modulation types, coding rates, and samples per symbol $k$ are set based on the requirements. Second, we sort the code table in decreasing order of the last column $r$. This step is a preparation for a bisection search. For each user $i$, using a bisection search over the sorted $C_3$, we search for the $l_i^*$th combination that satisfies the BER constraint and record its corresponding combination of $\{M, \rho, k, r\}$. The order of the UEs does not affect the results. For simplicity, we set UE1 first in Line 4. The outputs are the indices of the rates $r$ for UE1 and UE2 in the decreasingly sorted $C_3$.

Algorithm 2 starts from the knowledge of the OMA points obtained in Algorithm 1. We traverse the rest of the code table for UE2. This is because NOMA points on the rate region cannot exceed the OMA boundary points. For each iteration, the parameters of UE2 are first set according to row $l_2$ in the sorted $C_3$. Then, in Line 7, the power assigned to UE2 is reduced as far as the constraint on $\mathrm{BER}_2$ allows; thus, UE1 can get more power (a higher α) to reach a higher rate. Next, in Line 8, a bisection search is used to find the highest rate for UE1 that just satisfies the constraint on $\mathrm{BER}_1$, and the $l_1^*$th combination is recorded. Note that in the decreasingly sorted code table, a smaller index $l_1$ means a higher rate. The method for NOMA utilizes the test results of OMA and reduces the search space and time compared with the Monte Carlo simulation, which traverses the possible combinations in the code tables [9].

Algorithm 2 (excerpt)
Set the transmission parameters of UE2 to those in the $l_2$th row of the decreasingly sorted $C_3$;
7: Bisection search for the maximum $\alpha^* \in [0, \alpha]$ that keeps $\mathrm{BER}_2$ within the threshold;

¹ In downlink NOMA, stronger users need to apply SIC. Thus, the algorithm complexity rises with the number of users. A practical way to limit complexity is to divide the users into groups. A Tx can exploit the spatial deployment to pair the users located close to the Tx with those near the cell edge. Within each group, SC and SIC are performed between a few users with disparate channels.
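The two-stage search described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' Algorithms 1 and 2: the BER test is replaced by a hypothetical feasibility oracle based on the Shannon rate, feasibility is assumed to be monotone in the rate and in α, and the five-row rate table is a toy example.

```python
import math


def first_feasible(rows, is_feasible):
    """Index of the highest-rate feasible row in a decreasingly sorted table.

    Assumes monotone feasibility: if a row is feasible, so is every
    lower-rate row after it. Returns None when no row is feasible.
    """
    lo, hi = 0, len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(rows[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo if lo < len(rows) else None


def max_alpha(is_feasible, tol=1e-4):
    """Largest alpha in [0, 1] keeping UE2 feasible (UE2 must be feasible at 0)."""
    if is_feasible(1.0):
        return 1.0
    lo, hi = 0.0, 1.0              # invariant: feasible at lo, infeasible at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def noma_rate_pairs(rows, ue1_ok, ue2_ok):
    """Return (r1, r2, alpha) triples, one per feasible UE2 row."""
    start = first_feasible(rows, lambda row: ue2_ok(row, 0.0))   # UE2 OMA point
    pairs = []
    if start is None:
        return pairs
    for row2 in rows[start:]:      # traverse the rest of the table for UE2
        alpha = max_alpha(lambda a: ue2_ok(row2, a))
        l1 = first_feasible(rows, lambda row: ue1_ok(row, alpha))
        if l1 is not None:
            pairs.append((rows[l1]["r"], row2["r"], alpha))
    return pairs


SNR1, SNR2 = 10 ** 1.77, 10 ** 0.57    # 17.7 dB and 5.7 dB


def ue1_ok(row, alpha):
    # Stand-in for "measured BER_1 is within the threshold" (an assumption).
    return row["r"] <= math.log2(1.0 + alpha * SNR1)


def ue2_ok(row, alpha):
    # Stand-in for "measured BER_2 is within the threshold" (an assumption).
    return row["r"] <= math.log2(1.0 + (1.0 - alpha) * SNR2 / (alpha * SNR2 + 1.0))


rows = [{"r": r} for r in (4.0, 3.0, 2.0, 1.0, 0.5)]   # toy table, sorted by rate
pairs = noma_rate_pairs(rows, ue1_ok, ue2_ok)
```

In an experiment, each oracle call corresponds to a BER measurement, so replacing a full traversal of the table by bisection reduces the number of measurements per user from linear to logarithmic in the table size.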

D. FRAME STRUCTURE
Transmission occurs in frames. The frame structure includes a preamble, a header, and payloads, as shown in Fig. 5. The preamble is designed for data synchronization and channel equalization. It is a bit sequence that receivers use to locate the start of the transmission. The header provides information about the packet configuration, such as the MCS schemes, symbol rates, and power allocation coefficient. Finally, the payloads contain the data streams of UE1 and UE2.

1) SYNCHRONIZATION
Synchronization is one of the key procedures in practical communications. Accurate control of timing and the ability to synchronize operations will help improve the performance of SIC. Time synchronization is needed for symbol detection. We apply a predetermined threshold for rising edge detection. The decision on symbol timing is made according to a threshold calculated from the statistical properties of the noise.
We use the trigger operation in the USRP transceiver to collect complete and finite packets. Transmission at the Tx is triggered once the host sends a "start trigger" command after running the code. In the two UE chains, the transceiver chooses the same trigger time as the Tx chain for timing. To make the transmission more practical, an idle gap is appended at the beginning of each transmission. The duration of the idle gap is unknown to the UEs. This simulates a practical scenario in which the UEs need to synchronize and detect the start of the signals. Besides, due to hardware response delay, the trigger operation alone cannot achieve perfect frame synchronization. The threshold detection is robust enough to detect the signal despite the delay and the idle gap.
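The threshold-based rising edge detection can be illustrated with the following sketch. It assumes that the threshold is a fixed multiple of the noise power measured on a noise-only record, and that a detection is declared only after several consecutive samples exceed it. The factor of 20, the hold length, and the sample values are hypothetical and are not taken from the experiment.

```python
def estimate_noise_power(noise_samples):
    """Average power of a record captured while no signal is present."""
    return sum(abs(x) ** 2 for x in noise_samples) / len(noise_samples)


def detect_rising_edge(samples, threshold, hold=4):
    """First index where `hold` consecutive samples exceed the power threshold."""
    run = 0
    for i, x in enumerate(samples):
        run = run + 1 if abs(x) ** 2 > threshold else 0
        if run == hold:
            return i - hold + 1
    return None


noise_only = [0.01, -0.01j, -0.01, 0.01j] * 8            # stand-in noise record
threshold = 20.0 * estimate_noise_power(noise_only)       # factor 20 is an assumption
# Idle gap of unknown length followed by unit-power symbols.
frame = [0.01] * 37 + [1.0, -1.0, 1.0j, -1.0j] * 10
start = detect_rising_edge(frame, threshold)
```

Requiring several consecutive samples above the threshold makes the detector less sensitive to isolated noise spikes, at the cost of a slightly later decision.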

2) CHANNEL EQUALIZATION
The goal of equalization is to correct the distortion incurred by the channel and mitigate the effects of inter-symbol interference (ISI). We apply the zero-forcing (ZF) equalizer. Specifically, the ZF equalizer is designed as the inverse of the frequency response of the channel [55]. In the system design, the Tx sends the reference signals for channel estimation. Upon reception of the pilots, the users perform channel equalization and compensate for the channel distortion.
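A minimal sketch of pilot-based ZF equalization is given below. It assumes that the channel is described by one complex coefficient per frequency bin, that it is estimated by dividing the received pilots by the known pilots, and that the same coefficients apply to the data. The function names and the numerical values are illustrative only.

```python
def estimate_channel(rx_pilots, tx_pilots):
    """Least-squares estimate of the channel response on each pilot bin."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]


def zf_equalize(rx_symbols, h_hat):
    """Zero-forcing: multiply each bin by the inverse of the estimated response."""
    return [y / h for y, h in zip(rx_symbols, h_hat)]


# Hypothetical noise-free example with three frequency bins.
channel = [0.8 + 0.3j, -0.5 + 0.9j, 0.2 - 0.7j]
tx_pilots = [1 + 0j, -1 + 0j, 1j]
tx_data = [0.7 + 0.7j, -0.7 + 0.7j, 0.7 - 0.7j]
rx_pilots = [h * p for h, p in zip(channel, tx_pilots)]
rx_data = [h * s for h, s in zip(channel, tx_data)]
h_hat = estimate_channel(rx_pilots, tx_pilots)
equalized = zf_equalize(rx_data, h_hat)
```

Because ZF inverts the channel, it also amplifies the noise on bins where the channel response is small, which is one reason why imperfect channel estimation matters for the SIC step.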

3) NOMA HEADER DESIGN
We design the PHY layer header with the information that the proposed NOMA system requires. Specifically, in Fig. 5, $k_1$ and $k_2$ are 7-bit binary numbers representing even decimal numbers in the range of 10 to 264. The modulation types for OMA and the combined modulation types for NOMA include 15 possible cases, thus 4 bits are needed. Three coding rates $\rho \in \{1, 1/2, 3/4\}$ are considered for each user, where 1 means that convolutional coding is disabled. Whether to perform SIC is an on-off switch requiring one bit: ''0'' means to perform SIC, and ''1'' means to treat interference as noise. Finally, the power allocation factor $\alpha$ supports four decimal digits, which requires at least 13 bits. Based on the IEEE 802.11b packet format [38], the modulation type for the preamble and header is BPSK. As a low-rate modulation, BPSK helps ensure that this important information is received correctly.
Besides, to simplify implementation, [9] and [16] assume that the receivers know the coding and power allocation information, which is not a practical assumption. In our experiment, we transmit the NOMA parameters over the air.
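The field widths quoted above can be checked with a few lines of arithmetic. The sketch below also shows one possible mapping of the samples-per-symbol field to 7 bits; the linear index mapping, the constant names, and the helper functions are hypothetical and only illustrate that the 128 even values from 10 to 264 fit exactly into 7 bits.

```python
K_MIN, K_MAX, K_STEP = 10, 264, 2  # even samples-per-symbol values in the header

SIC_PERFORM = 0        # header bit "0": perform SIC
SIC_TREAT_AS_NOISE = 1  # header bit "1": treat interference as noise


def bits_needed(num_values):
    """Smallest number of bits that can index num_values distinct cases."""
    return (num_values - 1).bit_length()


def encode_k(k):
    """Map an even k in [K_MIN, K_MAX] to a 7-bit string (assumed linear mapping)."""
    if k < K_MIN or k > K_MAX or (k - K_MIN) % K_STEP != 0:
        raise ValueError("k must be an even number between 10 and 264")
    return format((k - K_MIN) // K_STEP, "07b")


def decode_k(bits):
    """Inverse of encode_k."""
    return K_MIN + K_STEP * int(bits, 2)


num_k_values = (K_MAX - K_MIN) // K_STEP + 1  # 128 possible values
k_field_bits = bits_needed(num_k_values)       # 7 bits
modulation_bits = bits_needed(15)              # 4 bits for 15 modulation cases
```

For instance, `encode_k(18)` gives the bit string `0000100`, and `decode_k` recovers 18 from it.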

VI. SIMULATION RESULTS AND ANALYSIS
Before we move to the implementation results, we perform simulations without imperfections, i.e., we assume that synchronization is perfect and that the channel and NOMA header are known. The goal is to validate the varying symbol rate method in MATLAB R2020a. In this simulation, we include three scenarios:
• Varying symbol rate, which results in varying samples per symbol $k_i \in \{8, 10, \ldots, 24\}$, and $\rho = 1$.
• Fixed symbol rate ($k_i = 8$ for each user) and varying coding rate $\rho$.
• The combination of the two, i.e., varying symbol rate together with MCS (code table $C_3$).
The results are shown in Fig. 6, where $\gamma_i$ is the estimated SNR at UEi, measured according to (12), in which $N_s$ is the number of samples for UEi, $P_i$ is the received signal power at UEi, $P_{N_0}$ is the noise power at the antennas of the UEs, and $n$ is the number of noise samples (the noise power is measured separately when there is no signal). For all cases, the SNRs are set as $\gamma_1 = 17.7$ dB and $\gamma_2 = 5.7$ dB. For a fair comparison with the second case in [9], where the unit is bits per second per Hz (bps/Hz) with packet error rate (PER), we set the threshold to $10^{-4}$. The feasible rate pairs are limited to those within this threshold.
The red square points denote the feasible rate pairs in $C_1$ with varying $R_s$. The red dashed line is the set of optimal rate pairs obtained by solving (10). The rate region boundary is the convex hull of the optimal set along with the single-user OMA points. The fixed symbol rate cases with MCS are shown in blue. The yellow curve denotes the rate region obtained from $C_3$. For UE1, the OMA rate point $D_1$ achieved by varying $R_s$ is larger than point $C_2$ achieved by fixed $R_s$. This is because UE1 has a better channel and thus can reach the system limit, i.e., $\rho = 1$ for 16QAM, and $R_s = 1.8 \times 10^6$ with $k_1 = 8$. Point $C_2$ is reached by setting $\rho = 5/6$ for 16QAM. On the other hand, point $A_2$ ($\rho = 2/3$ for 16QAM, and $R_s = 1.8 \times 10^6$ with $k_1 = 8$) is greater than point $A_1$ ($\rho = 1$ for 16QAM, and $R_s = 7.14 \times 10^5$ with $k_1 = 14$). The reason is that UE2 suffers from a poor channel, so channel coding is required to reach a high rate while keeping the BER below the threshold. Among the achievable NOMA rate pairs, points $B_1$ and $C_1$ span a larger rate region ($A_1 \to B_1 \to C_1 \to D_1$) compared with the fixed $R_s$ ($A_2 \to B_2 \to C_2$). The combination of the two code tables, that is, the existing MCS with varying symbol rates, can further enlarge the rate region ($A_2 \to B_1 \to B_2 \to C_1 \to D_1$) for NOMA compared with applying only MCS [9].
Table 7 shows the detailed configuration of the NOMA rate pairs. An observation from Fig. 6 is that the rate pairs below the OMA rate region do not have to be searched. Thus, we can apply a Monte Carlo simulation via Algorithm 1 to first find the OMA points and then reduce the search space by removing the rate pairs below the OMA rate region. The operation time is listed in Table 8. The Monte Carlo simulation realized in [9] takes about 4 hours. Reproducing their main work (without OFDM) takes about one and a half hours in our simulations. We apply bisection search over $C_2$, which reduces the operation time from hours to several minutes. Besides, using $C_3$ takes more time since the number of feasible rate pairs is increased. We ran simulations on a Windows 10 workstation with a 3.2 GHz CPU and 16 GB of RAM, compared with a dual-core Linux machine running at 2.4 GHz with 2 GB of RAM in [9].
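The search-space reduction mentioned above, discarding candidate rate pairs that lie below the OMA rate region, can be sketched as follows. The OMA region is taken here as the time-sharing line between the two single-user points; the function names and the numerical values are hypothetical and serve only to illustrate the pruning step, not Algorithm 1 itself.

```python
def above_time_sharing(pair, r1_oma, r2_oma):
    """True if (R1, R2) lies strictly above the OMA time-sharing line.

    The line connects (r1_oma, 0) and (0, r2_oma), so a pair is on or
    below it whenever R1 / r1_oma + R2 / r2_oma <= 1.
    """
    r1, r2 = pair
    return r1 / r1_oma + r2 / r2_oma > 1.0


def prune_candidates(pairs, r1_oma, r2_oma):
    """Keep only the candidate pairs that could improve on OMA."""
    return [p for p in pairs if above_time_sharing(p, r1_oma, r2_oma)]


# Hypothetical single-user OMA rates and candidate NOMA rate pairs (arbitrary units).
candidates = [(2.0, 1.0), (3.0, 1.0), (1.0, 0.5), (1.0, 1.75)]
kept = prune_candidates(candidates, r1_oma=4.0, r2_oma=2.0)
```

Only the surviving pairs then need a BER evaluation, which is where most of the simulation time is spent.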

VII. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENT SETUP

1) THE TESTBED
We experiment with the two-user SISO system using one SDR transceiver, a USRP 2974, and a Windows 10 host whose central processing unit (CPU) is an AMD Ryzen 3 2300X. We develop NOMA on top of the USRP 2974 single-device streaming sample project [13] in the LabVIEW Communications System Design Suite [14], which is supported by the LabVIEW NXG environment. The USRP 2974 transceiver supports an inner loop mode that allows signal transmission and reception. The host and USRP device should be in the same wireless local area network (WLAN). We connect the USRP and the router with one Ethernet cable. The USRP 2974 supports a frequency range from 10 MHz to 6 GHz. In the experiment, the carrier frequency is set to 3 GHz. Based on the latest release of 3GPP, this lies within frequency range 1 (FR1) of the 5G new RF bands, which spans 410 MHz to 7125 MHz [56].
The experimental environment and the space map of our test scenarios are shown in Fig. 7. The Tx is placed in an indoor environment at a height of 23 cm. UE1 and UE2 are placed at distances $d_1$ and $d_2$ from the Tx at heights $h_1$ and $h_2$. The fundamental experiment parameters and configurations for the Tx and UEs are listed in Table 9.

2) POWER CONTROL
We adjust the SNR for each user through the antenna direction and location. The work in [9] emulated a perfectly power-controlled environment by connecting the Tx and UEs with coaxial cables to mitigate the propagation loss caused by variations in the physical environment. In this paper, the experiment is carried out in an indoor wireless environment at a stable temperature.

B. EXPERIMENT SCENARIOS
We consider three scenarios in NOMA, which are $\gamma_1 \gg \gamma_2$, $\gamma_1 > \gamma_2$, and $\gamma_1 \approx \gamma_2$.
Example 1 ($\gamma_1 \gg \gamma_2$): The measured SNRs for user 1 and user 2 are $\gamma_1 = 17.7$ dB and $\gamma_2 = 5.7$ dB according to (12). The distances and heights of each user in Fig. 7(a) for this SNR configuration are shown in Table 10. The encoded signals with power allocation of UE1 (blue circle line) and UE2 (purple dashed line) are shown in Fig. 8 and Fig. 9. The superimposed signal, filtered by a root raised cosine filter with roll-off factor $\beta = 0.5$, is shown as the green solid line. At UE1, the received in-phase (I) signal after equalization with power allocation coefficient $\alpha = 0.0301$ is shown in Fig. 8 as the yellow circle line (only 500 points are plotted). We demodulate and decode UE2's data from the received signal at UE1 (yellow circle line in Fig. 8), and then reconstruct the signal for UE2 with the knowledge of UE2's modulation type and power allocation coefficient. The value of $\alpha$ affects the amplitude of the reconstructed signal. The red line represents the reconstructed signal of UE2 at UE1. Finally, the signal of UE1 (yellow solid line) is obtained by subtracting UE2's signal from the received signal at UE1, i.e., the yellow circle line minus the red line. Figure 9 shows the received signal.
Figure 10(a) shows the rate pairs. The OMA rate region (yellow line) connecting the two marginal points is obtained by time-sharing. The circles represent the measured rate pairs obtained by solving (10). To obtain the rate region boundary, we find the convex hull of the set including the single-user OMA points, which yields the convex polygon shown as a solid blue line.
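The SIC steps described for UE1 (detect UE2's symbols, reconstruct UE2's signal using its modulation type and the power allocation coefficient, then subtract) can be sketched for an uncoded, noiseless QPSK/QPSK case. The power split $\sqrt{\alpha}$ for UE1 and $\sqrt{1-\alpha}$ for UE2 is the usual superposition-coding convention and is assumed here; the function names are hypothetical, and channel coding, pulse shaping, and equalization are omitted.

```python
import math


def qpsk(i_sign, q_sign):
    """Unit-energy QPSK point with the given I and Q signs."""
    return complex(i_sign, q_sign) / math.sqrt(2)


def qpsk_detect(sample):
    """Hard decision: nearest QPSK point from the signs of I and Q."""
    return qpsk(1.0 if sample.real >= 0 else -1.0,
                1.0 if sample.imag >= 0 else -1.0)


def superpose(x1, x2, alpha):
    """Superposition coding: alpha is the power fraction given to UE1 (assumed convention)."""
    a1, a2 = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [a1 * s1 + a2 * s2 for s1, s2 in zip(x1, x2)]


def sic_at_ue1(received, alpha):
    """Detect UE2 first, subtract its reconstructed signal, then detect UE1."""
    a2 = math.sqrt(1.0 - alpha)
    x2_hat = [qpsk_detect(r) for r in received]                # UE1's signal acts as noise
    residual = [r - a2 * s for r, s in zip(received, x2_hat)]  # remove reconstructed UE2 signal
    x1_hat = [qpsk_detect(r) for r in residual]
    return x1_hat, x2_hat


alpha = 0.0301  # power allocation coefficient used in Example 1
x1 = [qpsk(1, 1), qpsk(-1, 1), qpsk(1, -1), qpsk(-1, -1)]
x2 = [qpsk(-1, -1), qpsk(-1, 1), qpsk(1, 1), qpsk(1, -1)]

received = superpose(x1, x2, alpha)
x1_hat, x2_hat = sic_at_ue1(received, alpha)
```

The amplitude of the reconstructed UE2 signal depends on $\alpha$ through the factor $\sqrt{1-\alpha}$, which is why the receiver needs the power allocation coefficient from the header.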
To obtain the achievable rate regions in Fig. 10(a), we can adjust the power allocation coefficient. When $\alpha = 1$, the P2P communication of UE1 is realized, which is also the point ($R_1$, 0) on the rate region boundary for OMA and an extreme case of NOMA. To reach this point, the modulation type is set as 16QAM ($M = 16$), and the minimum $k_1$ is measured to be 22 while keeping the BER below the 0.1% threshold in (10). Thus, the rate obtained from (11) is $R_1 = \log_2(M_1) \frac{f_s}{k_1} = \log_2 16 \cdot \frac{10 \times 10^6}{22} = 1.82$ Mbps. For the rate pairs ($R_1$, $R_2$) achieved by SC and SIC, the modulation types and symbol rates for both users can be found by Algorithm 2. For example, when $\alpha = 0.0301$, UE1 and UE2 both use QPSK, i.e., $M_1 = M_2 = 4$, and the samples per symbol are $k_1 = 18$ for UE1 and $k_2 = 16$ for UE2. The decoded QPSK constellation for each user is shown in Fig. 11(b). Other values of $\alpha$ reaching the boundary in Fig. 10(a) are shown in Table 10. The constellation for the boundary points is shown in Fig. 11(b), in which the red points are the samples and the blue points are the decoded symbols.
Example 2 ($\gamma_1 > \gamma_2$): In this case, $\gamma_1 = 18.4$ dB and $\gamma_2 = 10.5$ dB. The configuration of each user is shown in Table 10-Example 2. Figure 10(b) shows the rate pairs. The rate region gets close to that of the OMA scheme because the difference between the channel gains is reduced. The measurement of the blue points in Fig. 10(b) is very similar to that in Example 1. Two points in Fig. 10(b) are of particular interest: point A ($\rho = 1$) and point B ($\rho = 3/4$), with the same $\alpha = 0.1$ and modulation type. Point B with $\rho = 3/4$ can support a higher symbol rate, which can compensate for the cost of the coding rate reduction in this experiment. In fact, without convolutional coding, the minimum $k_1$ for an acceptable BER is 28, whereas with $\rho = 3/4$, $k_1$ and $k_2$ are 18 for both users, so the transmission rate with convolutional coding enabled can be higher than that obtained by varying the symbol rate alone. In the experiment, the BER for point A is 0, while the BER for point B lies in [0%, 0.18%]. We show the decoded constellation for each user in Fig. 12. Specifically, the two P2P points for UE1 and UE2 are shown in Fig. 12(a) and Fig. 12(b), respectively. The constellation for the boundary point $\alpha = 0.1$, $\rho = 3/4$ is shown in Fig. 12(c), in which a set of red samples jump away from the constellation, possibly due to the start-oscillation. The four clusters of QPSK get closer and blurrier with a higher rate pair (0.83, 1.11) compared with that of Fig. 12(b), where a lower rate pair (0.71, 1.11) is tested. This is reasonable because, with a lower rate, the decoding performance is better; thus, the constellation in Fig. 12(b) is more distinguishable than that in Fig. 12(c).
Example 3 ($\gamma_1 \approx \gamma_2$): When $\gamma_1 = \gamma_2$, the channel is called symmetric. The capacity regions for OMA and NOMA should be identical [57]. In this experiment, we set $\gamma_1 = 8.8$ dB and $\gamma_2 = 7.3$ dB. From the results in Fig. 10(c) and Fig. 13, it is hard to find a NOMA rate pair lying on the rate region. In other words, a point lying on the rate region in practice will have an unacceptable BER.
In summary, when $\gamma_1$ is much higher than $\gamma_2$, as in Example 1, SIC can be implemented successfully with a low BER. NOMA points can be easily found and can outperform OMA rates. We also find that in some cases, such as Example 2, MCS along with convolutional coding is very useful to reduce the BER while maintaining a high rate. When the channel gains become closer, the NOMA rate region could be even lower than that of OMA, which is the reason why many papers assume $|h_1|^2 > |h_2|^2$ to ensure successful encoding and decoding in NOMA.
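The rate arithmetic in Examples 1 and 2 can be reproduced with a short calculation. The sketch below uses $R = \rho \log_2(M) f_s / k$; the uncoded form matches the expression quoted for (11), while the multiplication by the coding rate $\rho$ is an assumption made here, chosen because it reproduces the 0.83 and 0.71 Mbps values quoted for Example 2. The function name `rate_bps` is hypothetical.

```python
import math

F_S = 10e6  # sampling rate in samples per second (10 Msps)


def rate_bps(mod_order, k, coding_rate=1.0, f_s=F_S):
    """Bit rate for modulation order M, k samples per symbol, and coding rate rho."""
    return coding_rate * math.log2(mod_order) * f_s / k


# Example 1, alpha = 1: 16QAM with k1 = 22 (P2P point of UE1).
r_p2p_mbps = rate_bps(16, 22) / 1e6

# Example 1, alpha = 0.0301: QPSK for both users, k1 = 18 and k2 = 16.
r1_mbps = rate_bps(4, 18) / 1e6
r2_mbps = rate_bps(4, 16) / 1e6

# Example 2: QPSK with rho = 3/4 at k = 18 versus uncoded QPSK at k = 28.
r_coded_mbps = rate_bps(4, 18, coding_rate=0.75) / 1e6
r_uncoded_mbps = rate_bps(4, 28) / 1e6
```

Rounded to two decimals, these evaluate to 1.82, 1.11, 1.25, 0.83, and 0.71 Mbps, respectively, which shows how a lower coding rate combined with fewer samples per symbol can still yield the higher net rate.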

C. GAP BETWEEN PRACTICE AND THEORY
In this part, we discuss the gap between simulation, experiments, and capacity. To illustrate this, let $\gamma_1 = 9.5$ dB and $\gamma_2 = 4$ dB. The theoretical SNR $\gamma_i^{\mathrm{th}}$ is defined with respect to $N_0$, the power spectral density of the noise, whereas the noise power in the measured $\gamma_1$ and $\gamma_2$ given in (12) is taken over the whole bandwidth $W = f_s$. The experimental rates are obtained from (11) and (10), while the theoretical capacity region is obtained from (5). Similar to Figs. 10(a)-10(c), in Fig. 14, the yellow line shows the OMA rate region obtained by time-sharing the experimental results. The blue circles denote the measured rate pairs obtained by Algorithm 2. The solid blue line represents the convex hull of all measured rate pairs. The green line is achieved using code table $C_3$ in Section VI with $k_{\min} = 10$. Each green circle denotes a rate pair obtained by simulation. The simulation is the ideal case of the implementation, in which practical issues such as synchronization errors, channel estimation errors, and SIC errors are absent. Those issues and hardware limits reduce the transmission rate in practice.
The theoretical capacity is shown by a red line. Points $A_1$, $B_1$, $C_1$, $D_1$, $E_1$, and $F_1$ on the boundary of the NOMA capacity region are obtained for $\alpha$ equal to 0, 0.05, 0.25, 0.5, 0.8, and 1, respectively. The experimental values of $\alpha$ at corner points $A_2$, $B_2$, $C_2$, and $D_2$ are 0, 0.1081, 0.1918, and 1, respectively. In theory, to achieve the capacity, the power allocation $\alpha$ could be any number between 0 and 1 [41, Chapter 6.2.2, p. 279], i.e., the power allocated to the weak user (UE2) can be higher than, equal to, or less than that of the other user. That is, $|h_1| > |h_2|$ does not imply $\alpha < 1/2$, and it is not necessary to allocate less power to the strong user [45, Myth 1]. In a practical communication system where a specific modulation type is used, the power allocation may be subject to certain constraints to avoid constellation overlap and ensure successful decoding [43], [44].
The gap between practical experiments and theoretical capacity is still noticeable. Although AMC is well developed and favored as a baseline scheme in practice, it is suboptimal and cannot achieve the Shannon capacity. As seen in [25, Fig. 3] and [58, Fig. 5], there is a significant gap between the Shannon capacity and the data rate achieved by the MCS-based approach. Specifically, the Shannon capacity [59] assumes a code table of infinite length whose elements are drawn from a Gaussian distribution, whereas the inputs of MCS are discrete and of finite length. Therefore, the system cannot achieve capacity with a near-zero BER.
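To see how the boundary points move with $\alpha$, the sketch below evaluates the textbook two-user degraded Gaussian broadcast channel rates, $R_1 = \log_2(1 + \alpha\gamma_1)$ and $R_2 = \log_2\!\left(1 + \frac{(1-\alpha)\gamma_2}{\alpha\gamma_2 + 1}\right)$, at the six values of $\alpha$ listed above. This standard expression is assumed here as a stand-in for (5), and the quoted 9.5 dB and 4 dB are used directly as the SNRs, which ignores the conversion between measured and theoretical SNR; the resulting numbers are therefore illustrative only.

```python
import math


def db_to_linear(db):
    """Convert a dB value to a linear power ratio."""
    return 10 ** (db / 10)


def noma_rate_pair(alpha, snr1, snr2):
    """Textbook SC-SIC rate pair in bps/Hz; alpha is UE1's power fraction."""
    r1 = math.log2(1 + alpha * snr1)  # strong user after removing UE2's signal
    r2 = math.log2(1 + (1 - alpha) * snr2 / (alpha * snr2 + 1))  # weak user, UE1 as noise
    return r1, r2


snr1 = db_to_linear(9.5)
snr2 = db_to_linear(4.0)
alphas = [0, 0.05, 0.25, 0.5, 0.8, 1]
boundary = [noma_rate_pair(a, snr1, snr2) for a in alphas]
```

The two end points reduce to the single-user rates, $(0, \log_2(1+\gamma_2))$ at $\alpha = 0$ and $(\log_2(1+\gamma_1), 0)$ at $\alpha = 1$, and every intermediate $\alpha$ trades rate from one user to the other along the boundary.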

VIII. FUTURE DIRECTIONS
In this section, we describe some important practical issues and future directions for NOMA experiments. Several experiments need to be carried out before NOMA is accepted in wireless communications standards.

A. MIMO-NOMA
As listed in Table 1, SISO-NOMA has been tested experimentally by various groups, whereas the number of experiments on MIMO-NOMA is limited. More importantly, those experiments use SC-SIC for MIMO-NOMA, although SC-SIC is not optimal in the multi-antenna BC due to the non-degradedness of the BC [45], [60]. It is well known in information theory that dirty paper coding (DPC) is the capacity-achieving scheme for the multiple-antenna BC [45], [60]-[62]. Due to the high computational cost of DPC, linear precoding is used for the transmit signal design [63]. Linear precoding can achieve the same region as DPC. MIMO-NOMA [19], [20] and MISO-NOMA [25] require precoding and power allocation to achieve spatial multiplexing; thus, the precoders are sent in the preamble before the data transmission. Low-complexity linear precoding schemes to reach the capacity, such as iterative water-filling [64] and simultaneous triangularization [65], can be adapted to practical constraints and applied in future experiments.

B. MULTI-CELL NOMA
All existing experimental results are based on one Tx, i.e., a single-cell network. Today's cellular networks are multi-cell, however. In a multi-cell network, inter-cell interference (ICI) is a major issue that affects the performance of cell-edge users. A classical method to avoid interference at the receivers is to orthogonalize the transmissions of different users, which is not spectrally efficient. In single-antenna networks, Han-Kobayashi (HK) encoding is the best-known scheme [66]. HK is based on decoding part of the interference and treating the rest as noise. In MIMO systems, interference alignment (IA) is a cooperative transmission technique to remove the ICI, in which the Txs jointly optimize their beamforming vectors. IA validation via testbed measurements can be found in [67]-[72] and the references therein.
The above IA implementations are based on OMA. The combination of IA with NOMA increases the number of users and the spectral efficiency in multi-cell systems [73], [74], and is worth implementation trials if NOMA is to be adopted in practical networks. To perform multi-cell NOMA, global CSI normally needs to be perfectly known at all terminals. However, acquiring such channel knowledge is a challenging problem in practice. Blind interference alignment techniques are important to this end [75].

C. SYSTEM-LEVEL IMPLEMENTATION
A system-level implementation refers to the large-scale deployment of NOMA with multiple Txs and Rxs in multiple cells. In large-scale NOMA, two key issues may affect the performance. The first is high inter-user interference, i.e., the interference caused by other NOMA users sharing the same resource [76]. The second is that the hardware complexity of SIC scales significantly with the number of NOMA users [73], [76]. To address these issues, coordinated and joint transmission NOMA schemes pair cell-edge users with cell-center users as a NOMA group, thereby reducing the inter-user interference and enhancing the diversity [76]. In an implementation, synchronization should be considered, and the primary and secondary synchronization signals (PSS and SSS) will be applied for cell search.

D. SIGNAL PROCESSING USING MACHINE LEARNING
Machine learning (ML) and deep learning are becoming ubiquitous solutions in wireless communications, e.g., in blind detection of modulation orders [77], modulation classification [78], [79], real-time interference cancellation [80], precoding design in MIMO networks [81], and so on. ML could be used in real-time SDR-based experiments [80] or for offline processing. Real-time applications in the context of NOMA could include SIC and synchronization. Possible applications of ML in offline processing include symbol detection and BER reduction. ML can also be used to design a fast and efficient precoder [81] in MIMO-NOMA experiments. This is a promising field and is expected to flourish in the near future.

IX. CONCLUSION
We have developed a NOMA wireless transmission system on LabVIEW NXG and USRP. We implemented the key NOMA methods, namely SC at the Tx and SIC at the strong user, while the other user treats interference as noise. Our system supports a varying transmission symbol rate with and without convolutional coding. By varying the symbol rate of each NOMA user, the code table, or solution space, based on existing MCS can be enlarged, which in turn provides additional rate-achieving schemes in terms of modulation types, coding rates, and symbol rates. This variation helps NOMA users achieve a larger rate region within acceptable BERs. We proposed two algorithms to efficiently find the achievable rate pairs for OMA and NOMA. The algorithms save time compared with Monte Carlo simulation over all possible rate-pair combinations. We have also provided a detailed design for the NOMA protocol, including the preamble and NOMA headers. The results verify that NOMA can achieve a better rate region than OMA in asymmetric channels, but there is still room for practical NOMA implemented by AMC to further bridge the gap between theory and experiment. Finally, we have discussed implementation challenges and some possible future directions.

FIGURE 1. Two-user SISO-NOMA system configuration used in this paper.

FIGURE 3. Achievable rate pairs using different symbol rates, coding rates, and modulation combinations in code table $C_1$.

FIGURE 5. The proposed frame structure for NOMA transmission.

FIGURE Transmitted and received signals at UE1 before and after SIC with $\alpha = 0.0301$ and $\gamma_1 = 17.7$ dB.

FIGURE 11. Constellation of Example 1.


FIGURE 13. Constellation of Example 3.

FIGURE 14. A comparison among experimental, simulation-based, and theoretical achievable regions.

TABLE 1. Summary of existing NOMA implementations.

TABLE 5. An example of code table $C_2$.

Lastly, by combining the varying symbol rate method and MCS with the fixed symbol rate method, we create a larger rate table $C_3$ in which the search is over $\rho$, $M$, and $k$ for both users. The cardinality of this code table is $|C_3| = |M||\rho||k| = 2 \times 5 \times 16 = 160$. The corresponding code table example can be found in Table 6. The solution space of the rate pairs in this case is $160 \times 160$. To summarize, three code tables are considered:
• $C_1$ (proposed): Varying symbol rate with fixed coding rate. We use this code table to verify the efficiency of the proposed varying symbol rate method.

FIGURE 4. Achievable rate pairs only using different coding rate and modulation combinations in code table $C_2$ [9].

TABLE 6. An example of code table $C_3$.

Searching for OMA Rate pairs
. . ., $|C_3|$} that ensures the BER$_i$ does not exceed the threshold;
1: Set a threshold for BER;
2: Set desired parameters for {$M$, $\rho$, $k$} and build $C_3$;
3: Sort $C_3$ based on r's column in descending order;
4: for UEi, $i \in \{1, 2\}$ do

In all cases, the sampling rate $f_s$ is 10 Msps, and the modulation types include BPSK, QPSK, and 16QAM. For the first case (proposed), the feasible solutions are from code table $C_1$. We vary the samples per symbol $k_i$ from 8 to 24 with a step of two, in which the symbol rate varies correspondingly from $4.167 \times 10^5$ to $1.8 \times 10^6$ symbols per second. In the second case, the MCS is used with a varying coding rate.

TABLE 7. Simulation parameters for each point in Fig. 6.

TABLE 8. Average execution time comparison over different methods.

TABLE 10. Parameter settings of the achievable rates.