Augmented Envelope Neural Networks on RF-SoC for Digital Self-Interference Cancellation

This paper addresses the challenge of self-interference in in-band full-duplex radios, which can double the operational bandwidth and wireless channel capacity in 5G New Radio’s sub-6 GHz spectrum. To achieve high isolation between simultaneously transmitted and received signals, the study explores envelope neural networks for self-interference cancellation. These networks model non-linear artifacts arising from both the transmit power amplifier and the receiver’s low-noise amplifier. Trained model parameters are subsequently applied in real-time via a neural network-based digital signal processor to mitigate self-interference. A real-time prototype operating at 2.4 GHz, featuring direct-RF sampling at 4.096 GS/s and a 20 dBm transmit power through an external PA, was implemented using an AMD-Xilinx ZCU-111 RF-SoC. The system demonstrates digital self-interference cancellation exceeding 30 dB in real-time over a 32 MHz passband bandwidth, utilizing a novel augmented envelope neural network realized as a systolic array architecture.


I. INTRODUCTION
Modern radio standards, such as 5G NR, aim to connect not only people but also vehicles, drones, and household equipment using massive machine-type communications in machine-to-machine, vehicle-to-vehicle, and vehicle-to-infrastructure networks [1], [2], [3]. Therefore, the sustainable growth of wireless communications and networking technology is only possible if new technologies are invented and adapted to improve spectrum efficiency in bands where access to spectrum is scarce. In particular, there is spectral scarcity in sub-6 GHz "legacy" bands, where bandwidth is at a premium. Although the mm-wave spectrum is abundant and the future of wireless relies on moving to mm-wave bands, the legacy frequencies remain extremely important for communications due to favorable physics for highly-scattering urban environments (e.g., low path-loss) [4]. As a consequence, there is and always will be much demand for the sub-6 GHz spectral band from commercial, public safety, and military systems. In-band full-duplex (IBFD) communication is a promising technology focused on addressing this impending spectrum scarcity challenge [5]. IBFD radios can simultaneously transmit and receive (STAR) different information-carrying signals over the same bandwidth [6], [7], thus doubling the bandwidth of operation and therefore the capacity of the wireless channel [8]. Such a doubling of capacity is a tremendous advantage for wireless communications in legacy spectral bands with massive applications for future 5G/6G systems [9], [10], [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Pedro Miguel Cabral.
The primary challenge faced by IBFD communication is that the transmit signal (Tx) leaks into the receiver (Rx). Such leakage is known as self-interference (SI), which can shadow the signal of interest (SOI) and may even damage a sensitive receiver. In typical wide-area cellular communications, the mobile receiver sensitivity can be as low as −80 dBm, while the base station transmit power can be as high as 40 dBm [12]. In an example scenario, to achieve a 20 dB signal-to-interference-plus-noise ratio (SINR), the IBFD radio should achieve 140 dB of SI cancellation: 120 dB to bridge the gap between the 40 dBm transmit power and the −80 dBm sensitivity, plus the 20 dB SINR margin. This is a very challenging engineering specification for a host of technical reasons. Despite these challenges, previous work has shown encouraging performance using a combination of electromagnetic, analog, and digital cancellation [8], [13], [14], [15], [16]. In [15], we demonstrated a two-stage electromagnetic-analog SI canceller with up to 85 dB of SI cancellation.
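The cancellation budget quoted above is simple decibel arithmetic, and a minimal sketch may help make it explicit. The function name and argument names below are illustrative and not taken from the paper.

```python
def required_si_cancellation_db(tx_power_dbm, rx_sensitivity_dbm, target_sinr_db):
    """SI cancellation (dB) needed so that the Tx leakage sits target_sinr_db
    below a signal of interest arriving at the receiver sensitivity level."""
    return (tx_power_dbm - rx_sensitivity_dbm) + target_sinr_db


# Example scenario from the text: 40 dBm Tx, -80 dBm sensitivity, 20 dB SINR
total = required_si_cancellation_db(40.0, -80.0, 20.0)
print(total)
```

The sketch ignores any isolation that is not counted as cancellation (for example path loss between the antennas), so it should be read as an upper-level budget only.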
In this work, we present a digital canceller that utilizes envelope neural networks for real-time SI cancellation and achieves up to an additional 30 dB of SI cancellation using machine learning (ML). This digital canceller can be combined with an RF/analog canceller, as shown in Fig. 1, to achieve the target SI cancellation performance typically required in practical communication systems. This paper makes two primary contributions to the domain of IBFD systems. First, we introduce and evaluate three envelope neural network architectures designed for SI cancellation using simulation data. Second, we demonstrate a real-time digital hardware implementation using a system-on-chip (SoC) architecture, which facilitates real-time data acquisition and augmented envelope neural network-based SI cancellation. Our experimental results confirm that the system achieves over 30 dB of digital SI cancellation and supports simultaneous transmission and reception. The rest of the paper is organized as follows. Section II presents a review of current SI cancellation methods. The effect of RF front-end components on SI is modeled in Section III. The proposed envelope models for SI cancellation are introduced in Section IV, which is followed by a performance comparison using simulation data in Section V. Section VI describes the hardware architecture for real-time digital SI cancellation. The experimental measurements are presented in Section VII, which also demonstrates the STAR capability using a remote quadrature phase shift keying (QPSK) transmitter as the SOI. The final section summarizes our findings and presents possible future work.
Antenna cancellation can be achieved using differential feeding [26], [27] or polarization diversity [28]. In arrays or multiple input multiple output (MIMO) systems, antenna cancellation can be achieved by placing the Tx elements at a specific distance from the Rx elements, such that the SI experiences destructive interference at the Rx antenna [16], [29]. In this work, we simply used Tx-Rx antenna separation as the front-end isolation technique. The typical approach to achieve RF cancellation is to use a ferrite circulator to isolate the Tx and Rx [30]. Recently, alternatives to ferrite circulators have been proposed that can operate over a wider bandwidth, with low insertion loss, and with the ability to integrate into monolithic microwave integrated circuits (ICs) [31], [32], [33], [34]. We demonstrated how a sequentially switched delay line (SSDL) circulator achieves over 25 dB Tx-Rx isolation over a 1 GHz bandwidth [35]. Another method to achieve RF cancellation is balanced duplexing [36]. This technique has shown even better isolation at the cost of poor insertion loss [37], [38], [39]. Tapped delay line filters can be used to generate an estimate of the SI signal, which in turn is subtracted from the received signal to perform RF cancellation [15], [22], [40], [41]. Antenna- and RF/microwave circuit-based SI cancellation play a crucial role in IBFD radios because, if the SI signal is much stronger than the received SOI when it reaches the analog-to-digital converter (ADC), the SOI can be completely shadowed by quantization noise. Digital SI cancellation is useful to capture higher-order non-linearities that are very difficult to model using analog circuitry. Several approaches to achieve SI cancellation in the digital domain are reviewed next.
A simple but effective method to perform digital SI cancellation is to use an adaptive finite-impulse-response (FIR) filter [42], [43], where the coefficients are calculated using the least square error method. In [44], variable fractional delay filters are used to further improve the performance. The performance of FIR interpolation degrades significantly when the SI signal becomes more non-linear at high transmit power levels. As an alternative, non-linear models such as the memory polynomial [45], [46] and the Hammerstein model [47] are used to model non-linear SI signals. In [48], spline-based interpolation is used to reduce the computational complexity of Hammerstein model-based SI cancellers.
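As a rough illustration of a linear adaptive FIR canceller, the sketch below adapts complex taps with the normalized LMS (NLMS) rule. Note that NLMS is a sample-by-sample substitute for the batch least-squares solution mentioned above, chosen here only because it fits in a few lines; the coupling channel, tap count, and step size are all hypothetical.

```python
import math
import random


def nlms_cancel(tx, rx, num_taps=4, mu=0.5, eps=1e-9):
    """Adapt complex FIR taps so the filtered tx tracks rx.
    Returns (residual after cancellation, final taps)."""
    taps = [0j] * num_taps
    history = [0j] * num_taps              # most recent tx sample first
    residual = []
    for x, d in zip(tx, rx):
        history = [x] + history[:-1]
        y_hat = sum(w * h for w, h in zip(taps, history))   # predicted SI
        e = d - y_hat                                       # cancellation residual
        norm = sum(abs(h) ** 2 for h in history) + eps
        taps = [w + mu * e * h.conjugate() / norm for w, h in zip(taps, history)]
        residual.append(e)
    return residual, taps


def power_db(samples):
    mean_power = sum(abs(s) ** 2 for s in samples) / len(samples)
    return 10 * math.log10(mean_power + 1e-30)   # small floor avoids log10(0)


random.seed(0)
tx = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3000)]
channel = [0.05 + 0.02j, -0.01 + 0.03j]          # hypothetical linear coupling
rx = [sum(c * tx[n - k] for k, c in enumerate(channel) if n - k >= 0)
      for n in range(len(tx))]
residual, taps = nlms_cancel(tx, rx)
cancellation_db = power_db(rx[-500:]) - power_db(residual[-500:])
print(f"cancellation over the last 500 samples: {cancellation_db:.1f} dB")
```

Because the simulated coupling here is purely linear and noise-free, the filter converges almost perfectly; a non-linear PA in the path would leave a residual that a linear filter of this kind cannot remove.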
In recent years, there has been a growing interest in leveraging neural networks for digital SI cancellation [23], [49], [50], [51], [52], [53]. One notable approach, presented in [54], utilizes a real-valued time-delay neural network (TDNN) for performing digital SI cancellation. In [55], two modified structures for the TDNN are proposed, namely the ladder-wise grid structure (LWGS) and the moving-window grid structure (MWGS), resulting in significant complexity reduction. Additionally, in [56], the authors explore the utilization of dual-neuron neural networks, which effectively model memory effects, further enhancing the efficiency and performance of the SI cancellation process. Our research draws inspiration from TDNN-based SI cancellation and proposes an approach that improves performance by incorporating envelope terms for self-interference modeling.

III. SELF-INTERFERENCE MODELING
Digital SI cancellation is done by subtracting a predicted SI signal from the received signal [25], [57]. Therefore, digital cancellation performance relies on the accuracy of the self-interference modeling. This section highlights the key contributors to self-interference and the challenges in accurately modeling these effects. For a more comprehensive study of self-interference modeling, readers are referred to the dissertation by Korpi [58].
If we consider a perfectly linear RF front-end, the SI signal y(t) can be expressed by Eq. (1), where x(t) is the transmit signal and h_21 is the unit impulse response of the mutual coupling between the Tx and Rx. In this work, two separate Tx and Rx antennas are used for the RF front end. In such cases, the mutual coupling can be expressed as the sum of the direct propagation path and other significant nearby reflected paths. Then y(t) can be described by Eq. (2), where w_i and t_i are the attenuation and the delay of each propagation path [59].
In reality, RF front-ends have analog impairments, which introduce both linear impairments and non-linear harmonic and intermodulation distortion to the (baseband) transmit signal x(t) ∈ C. The dominant source of non-linear distortion is the power amplifier [60]. The prevalent issue of the mixer is IQ imbalance, which introduces an image component to the signal whose spectrum is inverted [61]. The output of the mixer x_IQ(t) ∈ C in the presence of IQ imbalance is given by Eq. (3), where g_mix1(τ) and g_mix2(τ) are the I- and Q-phase impulse responses of the transmitter IQ mixer [47]. However, this effect can be ignored for RF systems with IQ calibration [62], [63] or for systems that perform digital down-conversion using a numerically controlled oscillator (NCO), as does the system proposed in this paper.
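The bodies of Eqs. (1) to (3) are not reproduced in this text. The forms below are a hedged reconstruction from the symbol definitions above and should be treated as assumptions: Eq. (1) is taken to be a convolution with the coupling response, Eq. (2) a sum over discrete propagation paths, and Eq. (3) the commonly used widely linear model in which the second term acts on the complex conjugate and produces the spectrally inverted image.

```latex
\begin{align}
y(t) &= h_{21}(t) * x(t) \tag{1} \\
y(t) &= \sum_{i} w_i \, x(t - t_i) \tag{2} \\
x_{IQ}(t) &= \int g_{\mathrm{mix}1}(\tau)\, x(t-\tau)\, d\tau
           + \int g_{\mathrm{mix}2}(\tau)\, x^{*}(t-\tau)\, d\tau \tag{3}
\end{align}
```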
In the presence of high-power jamming/blockers, the receiver's LNA or gain stages will cause signal distortion. However, the power amplifier (PA) is the main source of non-linear distortion. PA non-linearities are introduced by PA saturation and become more dominant when the transmit power is high, since power efficiency considerations call for biasing into a non-linear regime of operation. PA non-linearity can be modeled using the memory polynomial shown in Eq. (4) [64], [65]. An effective digital SI canceller should be able to model the PA output at both low and high transmit power levels.
In Section IV, we propose an envelope-based NN model to efficiently cancel the SI where the non-linear artifacts contaminating the SI are dominated by the transmit PA.
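To make the memory polynomial concrete, the sketch below evaluates the commonly used form y(n) = Σ_p Σ_m a_{p,m} x(n−m) |x(n−m)|^(p−1). Eq. (4) itself is not reproduced in this text, so this exact form, the coefficient layout, and the coefficient values are assumptions for illustration only.

```python
def memory_polynomial(x, coeffs):
    """Evaluate a memory polynomial on the baseband samples x.
    coeffs maps (p, m) -> coefficient, with p the non-linearity order
    and m the delay in samples (hypothetical layout)."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for (p, m), a in coeffs.items():
            if n - m >= 0:
                xd = x[n - m]
                acc += a * xd * abs(xd) ** (p - 1)
        y.append(acc)
    return y


# Illustrative coefficients: linear gain, third-order compression, one memory tap
coeffs = {(1, 0): 1.0, (3, 0): -0.1, (1, 1): 0.05}
x = [0.1, 1.0, 2.0]
y = memory_polynomial(x, coeffs)
small_signal_gain = abs(y[0]) / x[0]
large_signal_gain = abs(y[2]) / x[2]
print(y, small_signal_gain, large_signal_gain)
```

With these values the gain seen by the largest sample is lower than the small-signal gain, which is the compression behaviour the paragraph attributes to PA saturation.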

IV. PROPOSED ENVELOPE NN BASED DIGITAL CANCELLATION
In this section, we describe three variations of envelope neural network models for capturing non-linear distortions. The baseband SI canceller operates in discrete time t = T_clk k, where t ∈ R, k = 0, 1, . . ., and T_clk is the sample period. A memory-less PA non-linearity can be represented by AM/AM and AM/PM conversion functions [64]. This relationship can be expressed using Eq. (5), where A(t) ∈ R and θ(t) ∈ [−2π, 2π] represent the amplitude and phase of the input signal, respectively; furthermore, G_A(·) and φ_G(·) are the non-linear AM/AM and AM/PM conversion functions, respectively [64].
Eq. (5) can be rearranged in terms of x(t) and f(·), where x(t) ∈ C is the baseband input signal and f(·) ∈ C is a complex-valued function that represents the non-linear distortion; here, f(·) depends on the PA type and operational condition (i.e., DC bias point, temperature, and age) [66]. The dynamic nature of the wireless environment will introduce time-dependent changes to the SI signal. Therefore, an adaptive approximation function is required to model f(·). Let f_nn be an approximator for f(·); then Eq. (6) must be satisfied ∀ x(t) ∈ T, where T ⊂ C is the set of all complex baseband transmit symbols and ϵ ∈ R is a positive tolerance parameter.
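The bodies of Eq. (5), its rearrangement, and Eq. (6) are not reproduced in this text. A plausible reconstruction, assuming the standard AM/AM and AM/PM formulation and assuming that f(·) takes the envelope |x(t)| as its argument, is sketched below; it is consistent with the later statement that the predicted SI is obtained by multiplying the network output by x(t), but it remains an assumption.

```latex
\begin{align}
y(t) &= G_A\big(A(t)\big)\, e^{\,j\left[\theta(t) + \phi_G(A(t))\right]},
        \qquad x(t) = A(t)\, e^{\,j\theta(t)} \tag{5} \\
y(t) &= x(t)\, \frac{G_A\big(|x(t)|\big)}{|x(t)|}\, e^{\,j\phi_G(|x(t)|)}
      = x(t)\, f\big(|x(t)|\big) \\
\big|\, f\big(|x(t)|\big) - f_{nn}\big(|x(t)|\big) \big| &< \epsilon
        \quad \forall\, x(t) \in T \tag{6}
\end{align}
```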

A. ENVELOPE NEURAL NETWORK (ENN)
We employ a feed-forward network with a single hidden layer that contains a number of hidden neurons as the approximator.
The output of such a network is given by Eq. (7), where w_i and b_i represent the output weight and the bias of the i-th hidden neuron, respectively [67]. Here, a_ij are the input weights between the j-th input neuron and the i-th hidden neuron, and ϕ is a non-linear activation function.
From the universal approximation theorem [68], it follows that ∀ ϵ > 0, ∃ n ∈ N such that Eq. (6) is satisfied, where n is the number of hidden neurons. Therefore, a feed-forward NN with a sufficiently large number of hidden neurons can approximate the non-linearity in an SI signal with an arbitrarily small tolerance. We need to keep in mind, however, that the computations occur in real time on a digital hardware processor, which inherently uses finite-precision arithmetic. For example, the processor may use the two's-complement fixed-point format. Therefore, the non-linear effects arising from digital arithmetic must be negligible compared to the tolerance one is aiming to achieve. To ensure this, we use 32-bit precision throughout the implementation, to serve as a baseline for future reduced-precision digital hardware processors.
So far, we have assumed memory-less PA behavior. However, the output of a PA depends on both the current and past input waveforms due to memory effects, which are caused by electromagnetic coupling between the PA's output and input ports. The input to the digital NN is fed by a tapped delay line (i.e., a form of digital FIR filter) to account for the memory effect. This type of NN is referred to as a time-delay neural network (TDNN) [69]. The modified NN output is described by Eq. (8), where m represents the memory length of the PA, and where τ_j = T_clk j ∈ R are time delays that are integer multiples of the digital sampling/clock period.
Since f(·) ∈ C, the output of Eq. (8) should be complex. Therefore, the input/output weights and biases should be complex-valued. Instead of using complex weights and biases, the network can be represented using a real-valued network [70]. Since our input to the network is real, we can simply treat the single complex output as two independent real outputs and represent the network using only real weights and biases. We will call this network the envelope model for the rest of the paper. The final output of the predicted SI signal is given by Eq. (9).
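A minimal sketch of the envelope model's forward pass may clarify how the pieces fit together: the tapped delay line supplies the envelopes of the current and past Tx samples, a real-valued hidden layer produces two real outputs, and these are combined into a complex gain that multiplies the current Tx sample. The weight layout, the tanh activation, and the absence of an output bias are assumptions made for illustration; Eqs. (7) to (9) are not reproduced in this text.

```python
import cmath
import math


def enn_predict(x_hist, A, b, w_re, w_im):
    """x_hist = [x(t), x(t - T_clk), ..., x(t - (m-1) T_clk)] (complex).
    A[i][j]: input weight from tap j to hidden neuron i; b[i]: hidden bias;
    w_re / w_im: output weights for the real and imaginary outputs."""
    env = [abs(s) for s in x_hist]                       # real envelope inputs
    hidden = [math.tanh(sum(a * e for a, e in zip(row, env)) + bi)
              for row, bi in zip(A, b)]
    f_nn = complex(sum(w * h for w, h in zip(w_re, hidden)),
                   sum(w * h for w, h in zip(w_im, hidden)))
    return x_hist[0] * f_nn                              # predicted SI sample


# Hypothetical weights: 2 hidden neurons, memory length 2
A = [[0.3, -0.2], [0.1, 0.4]]
b = [0.05, -0.1]
w_re, w_im = [0.7, -0.3], [0.2, 0.5]
x_hist = [0.8 + 0.6j, -0.3 + 0.4j]

y = enn_predict(x_hist, A, b, w_re, w_im)
rot = cmath.exp(0.7j)                                    # common phase rotation
y_rot = enn_predict([s * rot for s in x_hist], A, b, w_re, w_im)
print(y, y_rot)
```

Because the network only sees envelopes, rotating every Tx sample by a common phase rotates the prediction by the same phase; this is a structural property of the sketch above rather than a measured result.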

B. AUGMENTED ENVELOPE NEURAL NETWORK (AENN)
We can absorb the multiplication in Eq. (9) into the neural network input to generate a new model. To keep the model as a real-valued neural network, we provide the real and imaginary parts of the x(t) signal as two independent inputs. We will refer to this model as the augmented envelope model for the remainder of the paper. The model output is described in Eq. (10).
In Section V, we compare the performance of both the envelope model and the augmented envelope model against the reference NN model proposed in [71].

C. AENN WITH PIECE-WISE LINEAR ACTIVATION (AENN+PWL)
Traditional neural networks use continuous functions such as the sigmoid or hyperbolic tangent to achieve the non-linear activation operation. The relatively high computational complexity of such functions when realized in parallel digital hardware results in high consumption of both energy and logic resources. We propose using a simple piecewise linear function inspired by the PA AM/AM curve as the activation function. Although a typical PA has multiple transistors, a single-transistor amplifier (Fig. 2(a)) can be used as a simplified model to understand the non-linear behavior. The relationship between output and input power for a common-source amplifier [72] is shown in Fig. 2(b) (blue curve). The P_out/P_in relationship is linear at low input power levels and starts saturating after the 1 dB compression point. Using this characteristic as an inspiration, a piecewise linear (PWL) function described by Eq. (11) is proposed for the activation function, where n is the weighted input to the neuron. The proposed function acts as a linear activation function for small input values and then saturates (i.e., the gradient reduces to 0.125 ≪ 1) after exceeding the threshold value δ, a hyperparameter set to 0.6 in all our simulations and experiments. The gradient 0.125 = 2^−3 is chosen because the multiplication can be implemented as a 3-bit right shift, thus simplifying the digital implementation.
It is shown in [68] that piecewise linear functions satisfy the universal approximator property. For instance, the piecewise linear function known as the rectified linear unit (ReLU) is said to learn faster and perform more efficiently [73]. Therefore, the proposed function can be considered a valid activation function for the augmented envelope neural networks. The output of this model is given by Eq. (12). Our simulation results show that this activation function achieves better performance than the hyperbolic tangent activation function while simultaneously reducing the hardware complexity.
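The activation described above can be sketched in a few lines. Eq. (11) is not reproduced in this text, so the version below assumes that the function is continuous at the threshold and odd-symmetric about zero; only the unit slope below δ, the slope of 0.125 above it, and δ = 0.6 come from the description.

```python
def pwl(n, delta=0.6, slope=0.125):
    """Piecewise linear activation: identity for |n| <= delta, then a
    reduced gradient (assumed continuous and odd-symmetric)."""
    mag = abs(n)
    if mag <= delta:
        return n
    out = delta + slope * (mag - delta)
    return out if n > 0 else -out


for value in (-1.4, -0.3, 0.0, 0.3, 0.6, 1.4):
    print(value, pwl(value))
```

For an input of 1.4 the excess over the threshold is 0.8, which is scaled by 0.125 to 0.1 and added to 0.6.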
In our simulations, a MATLAB dataset comprising the measured input and output signals of an NXP Airfast PA was employed [74]. Although additional non-idealities affect the SI signal, such as IQ imbalance and oscillator phase noise, this dataset was chosen under the assumption that the PA is the primary source of signal distortion [60]. The transmit waveform is an OFDM signal with 100 MHz bandwidth. MATLAB's Statistics and Machine Learning Toolbox is used to train and test the models. The comparison covers three models: TDNN, ENN, and AENN. All models have a single hidden layer with 20 hidden neurons and a memory length of 10. Fig. 3(d) shows the normalized mean squared error (NMSE) comparison of the three models. The AENN model has the best performance of −33 dB, while the other two models have almost the same performance of −29 dB. An important factor to consider in the comparison is the number of learnable parameters, since it determines the model complexity. Since all models have the same number of hidden neurons, the difference in learnable parameters is proportional to the number of inputs to the neural network. The number of inputs per delay tap is 1, 2, and 3 for the ENN, TDNN, and AENN, respectively. With 20 hidden neurons and a memory length of 10, the number of learnable parameters in the hidden layer is therefore 200, 400, and 600 for the ENN, TDNN, and AENN models, respectively. The ENN model shows only a 0.07 dB increase in NMSE despite using 200 fewer parameters than the TDNN model, which makes it more suitable for deployment in low-power IBFD radios.
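The parameter counts and the error metric used in this comparison can be reproduced with a short sketch. The count below covers only the hidden-layer input weights (inputs per tap × memory length × hidden neurons), which is the reading that matches the quoted figures; biases and output weights are deliberately left out. The NMSE definition is the usual one and is assumed rather than taken from the paper.

```python
import math


def hidden_layer_params(inputs_per_tap, memory_length=10, hidden_neurons=20):
    """Number of hidden-layer input weights (biases and output weights excluded)."""
    return inputs_per_tap * memory_length * hidden_neurons


def nmse_db(reference, predicted):
    """Normalized mean squared error in dB between two sample sequences."""
    err = sum(abs(r - p) ** 2 for r, p in zip(reference, predicted))
    ref = sum(abs(r) ** 2 for r in reference)
    return 10 * math.log10(err / ref)


counts = {name: hidden_layer_params(k)
          for name, k in (("ENN", 1), ("TDNN", 2), ("AENN", 3))}
print(counts)
print(nmse_db([1.0, 1.0], [0.9, 0.9]))   # a 10 % amplitude error is about -20 dB
```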

A. SI PREDICTION ERROR VS. TRANSMIT POWER
The SI signal can be modeled using a finite impulse response (FIR) digital filter when the transmitter is linear. Linear FIR filters typically require fewer hardware resources than an NN-based non-linear digital predictor for approximating the SI signal. However, real-world transmitters exhibit significant non-linearity, resulting in a non-linear mapping between the transmit signal and the received SI signal. We first explore the onset of non-linear distortion as a function of system parameters using computer simulation of microwave components. Specifically, a circuit envelope simulation [75] was carried out using the vendor-provided simulation model parameters (i.e., gain, 1 dB compression point, and IP3) of the Mini-Circuits ZX60-83LN-S+ amplifier, which was chosen for purposes of experimental verification, to compare the performance of FIR interpolation with the proposed augmented envelope neural network for different output power levels. The simulated results of predicted SI using FIR interpolation (blue), the augmented envelope neural network with the hyperbolic tangent activation function (green), and the same network with the piecewise linear activation function (red) are shown in Fig. 4.
The simulation results in Fig. 4 show that the ability of FIR interpolation to cancel SI degrades as the output power level increases; in contrast, the envelope neural network maintains similar performance across a wide range of transmit power levels. This occurs because the PA output becomes more non-linear at higher output power levels, which can be verified by observing the shape of the constellation of the SI signal. Simulations further show that the augmented envelope model with hyperbolic tangent activation functions performs rather poorly compared to linear digital FIR interpolation when the PA operates in its linear region. However, the augmented envelope model with the piecewise linear activation function consistently performs better than the other models in both linear and non-linear regions. The normalized mean squared prediction error for different output power levels is shown in Fig. 5. The digital FIR linear interpolation works better at low PA power levels but quickly deteriorates when the PA is close to saturation. The first envelope model performs better than FIR interpolation with both activation functions but suffers from the same performance degradation at high power levels. The augmented envelope model shows more resilience to high output power than the other models and delivers the best performance in both linear and non-linear regions. Based on these results, we decided to implement a digital design to perform real-time SI cancellation using the augmented envelope model. The real-time implementation on the AMD-Xilinx ZCU-111 RF-SoC is described in the next section.

VI. DIGITAL IMPLEMENTATION OF SI CANCELLATION USING AMD-XILINX ZCU-111
We explored a real-time implementation of the proposed augmented envelope neural network-based digital canceller using the AMD-Xilinx ZCU-111 RF-SoC, which is a field programmable gate array (FPGA) with integrated ADCs and digital-to-analog converters (DACs). The overall system architecture on the RF-SoC is shown in Fig. 6.

A. OVERVIEW OF RF-SOC TECHNOLOGY
The Zynq UltraScale+ RF-SoC ZCU-111 board serves as a versatile platform for high-performance RF applications, including wireless communication and radar systems [76], [77], [78], [79]. The AMD-Xilinx ZCU-111 RF-SoC features 8 ADC channels with 12-bit precision and sample rates up to 4.096 GS/s, alongside 8 DAC channels with 14-bit precision and sample rates up to 6.554 GS/s [80]. The system also incorporates 8 soft-decision forward error correction units (SD-FECs) for enhanced reliability. The device combines Arm Cortex-A53 and Cortex-R5 processor subsystems with UltraScale+ programmable logic, which facilitates system-on-chip designs with high signal processing bandwidth.

B. EXPERIMENTAL SETUP
The Tx and Rx blocks are a 64-ary quadrature amplitude modulation (64-QAM) modulator and demodulator that encode and decode a text message. A preamble that uses the Barker code [81] of length 13 is added to every frame for synchronization purposes. The frame size is set to 56 symbols. The baseband signal is up-converted to an RF carrier centered at 2.4 GHz in the digital domain using the built-in numerically controlled oscillator (NCO) available inside the RF-SoC. The DAC output is fed into the PA (Mini-Circuits ZHL-2W-63-S+). The amplified Tx signal is then fed to a 2.4 GHz panel antenna consisting of a static array of patch antenna elements. The receiver is fed by an identical patch antenna array. The receiver uses the Mini-Circuits ZRL-2400LN+ LNA and performs digital down-conversion using the NCO. The experiments do not utilize multi-level analog cancellation because the objective is to evaluate the performance of SI cancellation at the baseband level using the proposed NN-based non-linear canceller operating in real time. We stress, however, that practical communication systems would require multi-level SI cancellation across the electromagnetic, analog, and digital domains.
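Preamble-based frame synchronization of the kind described above can be illustrated with a sliding correlation against the length-13 Barker code. The sketch uses ±1 symbols and a deterministic dummy payload purely for illustration; how the preamble is mapped onto the 64-QAM constellation and whether the 56-symbol frame includes the preamble are not specified here, so both are assumptions.

```python
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def correlate(symbols, preamble=BARKER_13):
    """Sliding correlation of the symbol stream against the preamble."""
    n = len(preamble)
    return [sum(symbols[k + i] * preamble[i] for i in range(n))
            for k in range(len(symbols) - n + 1)]


def find_frame_start(symbols, preamble=BARKER_13):
    """Index at which the correlation with the preamble is largest."""
    corr = correlate(symbols, preamble)
    return max(range(len(corr)), key=lambda k: corr[k])


# 7 idle symbols, then a hypothetical 56-symbol frame: preamble + 43 payload symbols
stream = [-1] * 7 + BARKER_13 + [-1] * 43
start = find_frame_start(stream)

# Aperiodic autocorrelation of the code itself (zero-padded on both sides)
padded = [0] * 12 + BARKER_13 + [0] * 12
autocorr = correlate(padded)
peak = autocorr[12]
max_sidelobe = max(abs(c) for k, c in enumerate(autocorr) if k != 12)
print(start, peak, max_sidelobe)
```

The low sidelobes of the Barker autocorrelation are what make the correlation peak easy to distinguish from partial overlaps with the preamble.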

C. DIGITAL ARCHITECTURE
The NN processor architecture is shown in Fig. 6(b). In the digital design, we limit the number of hidden neurons to ten. Since the network has many tunable parameters, a digital controller is designed to serially feed the NN weights and bias values. The controller starts the weight update routine when the processor signals the availability of new weights by setting the in_update signal. The internal counter requests the weight and bias values by their address and stores them in the internal block random access memories (RAMs). Next, the controller uses a bit-serial connection to store these values in the weight and bias registers. The main computation blocks of the design are the multiply and accumulate (MAC) unit and the activation function. Because of the proposed low-complexity activation function, its implementation reduces to a comparator, a multiplexer, and a bit-shifter. These are realized on the RF-SoC's programmable logic fabric as a field programmable custom computing machine. The parallel MAC units form a systolic processor that performs vector operations at the clock rate F_clk = 1/T_clk; this part of the design consumes most of its computational resources. The complexity of the MAC unit grows in proportion to the input size, as can be seen from the resource utilization in Table 1. These results are based on the high-level synthesis report from MATLAB HDL Coder [82]. The generated design is then used in AMD-Xilinx Vivado for FPGA implementation. The implemented design showed 27k, 15k, and 36k configurable logic block (CLB) utilization for the TDNN, ENN, and AENN models, respectively.
The NN model needs to be periodically updated because of the dynamic nature of the wireless environment. The embedded processor is preferred for training because it can perform complex floating-point arithmetic effectively. However, this requires low-latency access to the IQ data samples from the processor. The FPGA design runs at 16 MHz, which means it produces an IQ sample every 62.5 ns. Although the ARM processor is clocked at 1 GHz, an application running on the processor takes several milliseconds to access the data stored in RAM. Therefore, two FIFO buffers are created to store 2048 samples of Tx and Rx IQ data, which are transferred to the double data rate 3 (DDR3) RAM on the ZCU-111 RF-SoC prototyping board. Multiple buffers are stored in the RAM and synchronously transferred to the processor as a single data frame using a batch transfer in order to perform the NN training. The Tx and Rx frames are communicated to a host computer running Linux via a 1 Gbps Ethernet connection using the user datagram protocol (UDP). This Ethernet-based data link is used to verify the SI cancellation performance on the Linux host computer.
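The neuron datapath described above (a MAC followed by a comparator, multiplexer, and shifter) can be mimicked in integer arithmetic. The sketch below is a behavioural model only: the Q-format with 16 fractional bits, the helper names, and the odd-symmetric handling of negative inputs are assumptions and are not taken from the implemented design.

```python
FRAC_BITS = 16                                   # hypothetical fixed-point format
DELTA_FX = round(0.6 * (1 << FRAC_BITS))         # threshold delta = 0.6


def to_fixed(value):
    return round(value * (1 << FRAC_BITS))


def from_fixed(q):
    return q / (1 << FRAC_BITS)


def mac(weights_fx, inputs_fx, bias_fx):
    """Multiply-accumulate: bias + sum of weight * input, rescaled per product."""
    acc = bias_fx
    for w, x in zip(weights_fx, inputs_fx):
        acc += (w * x) >> FRAC_BITS
    return acc


def pwl_fixed(n_fx):
    """Piecewise linear activation using only compare, select, and shift."""
    mag = -n_fx if n_fx < 0 else n_fx
    if mag <= DELTA_FX:                          # comparator
        out = mag                                # multiplexer: linear branch
    else:
        out = DELTA_FX + ((mag - DELTA_FX) >> 3) # x 0.125 as a 3-bit right shift
    return -out if n_fx < 0 else out


n_fx = mac([to_fixed(0.5)], [to_fixed(2.0)], to_fixed(0.4))   # about 1.4
activated = from_fixed(pwl_fixed(n_fx))
print(from_fixed(n_fx), activated)
```

The result differs from the floating-point value of 0.7 only by quantization error, which illustrates why the finite word length has to be small relative to the target tolerance.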

A. EXPERIMENTAL HARDWARE SETUP
The experimental setup is shown in Fig. 7(a). A 20 dB attenuator is used to limit the radiated power to 20 dBm, selected to emulate the transmit power of a typical WiFi access point. The AMD-Xilinx ZCU-111 RF-SoC device is utilized as the digital signal processing platform to generate the baseband transmit symbols and to receive the signals containing SI, such that the proposed algorithm can be applied on a per-clock-cycle basis. A Mini-Circuits ZHL-2W-63-S+ is used as the PA for the transmitter and a Mini-Circuits ZRL-2400LN+ is used as the LNA for the receiver. The measured S-parameters of the Tx and Rx antennas and their mutual coupling are shown in Fig. 7(b). The S-parameters demonstrate excellent power matching, and a mutual coupling between the Tx and Rx arrays of about −30 dB (nominal), which sets the level of SI in the received signal.

B. MODULATION AND WIRELESS CHANNEL
To demonstrate augmented envelope NN-based SI cancellation, we chose to modulate the Tx signal with random binary information using 64 quadrature amplitude modulation (64-QAM). The modulated Tx signal is used in full-duplex mode, while a separate quadrature phase shift keying (QPSK) signal is applied to the remote transmitter. The remote transmitter communicates with the full-duplex system to form its Rx SOI; this signal is an over-the-air wireless signal. The remote source Tx uses a horn antenna, as shown in Fig. 7(c), with a gain of 10 dBi and a transmit power of 20 dBm.

C. DIRECT-RF DIGITIZATION
ADC resolution is crucial for effective digital SI cancellation. Here, we analyze the ADC requirements for advanced modulation used in high-capacity communications. Let the Tx signal be an orthogonal frequency division multiplexing (OFDM) waveform loaded with M-ary QAM and using a fast Fourier transform (FFT) with N discrete frequency bins. The dynamic range of the Tx signal is equivalent to that of a (log2 M + log2 N)-bit quantized signal. Recall that the SNR of an n-bit ADC is approximately 1.76 + 6n dB. Let the desired SINR of the remote Rx SOI required for error-free demodulation be at least γ dB, which corresponds to L = ⌈(γ − 1.76)/6⌉ bits. Therefore, the ADC must quantize the baseband signals (assume real-valued channels) at B = log2 M + log2 N + L effective number of bits (ENOB) so that digital SI cancellation can subtract the Tx signal without corrupting the SOI.
For example, for N = 2048 bin FFTs loaded by 64-QAM, as commonly found in 4G cellular communications, and assuming there is a remote site transmitting QPSK with a desired SINR of (say) 20 dB at the FD Rx, we need to quantize at a minimum resolution of B = 6 + 11 + ⌈(20 − 1.76)/6⌉ = 21 ENOB at the ADC for SI cancellation to be viable. The RF ADCs of the ZCU-111 have 12-bit outputs. However, their ENOB is less than 12 bits, as the real-world performance of RF ADCs is poorer than the number of physical bits at the output port due to non-linearity in the internal circuitry. Clearly, the ADCs on the RF-SoC do not have sufficient ENOB resolution for effective SI cancellation in the above case. Consider a second example: 64-QAM at the Tx with no OFDM and 12 bits available from the ADC, of which 6 bits are required to accommodate the 64-QAM constellation. This leaves L_SIC = 6 additional bits of dynamic range for SI cancellation. The best-case SIR achievable via digital SI cancellation is therefore about 1.76 + 6 L_SIC = 37.76 dB. Unless ENOB is very large, ADC performance is dominated by quantization noise, and therefore ADC thermal noise has been neglected.
In the subsequent sections, we show that our real-time measurements reach 35 ± 0.7 dB, only marginally below the expected best case. The loss of performance (roughly 2 to 3 dB) is likely due to thermal noise in the ADC, antenna/microwave component mismatches, or experimental/measurement errors.
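The two ADC examples can be checked numerically with the 1.76 + 6n dB rule quoted in the text. The helper names below are illustrative. Evaluated with the ceiling that appears in the expression for L, the first example comes to 21 bits (20.04 bits if the margin is left unrounded), and the second gives 37.76 dB.

```python
import math


def required_enob(qam_order, fft_bins, target_sinr_db):
    """Bits needed for the Tx dynamic range plus the SOI margin L."""
    margin_bits = math.ceil((target_sinr_db - 1.76) / 6)
    return math.log2(qam_order) + math.log2(fft_bins) + margin_bits


def best_case_sir_db(adc_bits, qam_order):
    """Best-case SIR from the bits left over after the constellation."""
    spare_bits = adc_bits - math.log2(qam_order)
    return 1.76 + 6 * spare_bits


enob_ofdm = required_enob(64, 2048, 20)      # 64-QAM on a 2048-bin OFDM waveform
sir_single = best_case_sir_db(12, 64)        # 64-QAM, no OFDM, 12-bit ADC
print(enob_ofdm, sir_single)
```

Both helpers assume an ideal ADC whose ENOB equals its nominal resolution, so the second figure is an upper bound.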

D. ADC/DAC RATES AND POLYPHASE SAMPLE RATE CONVERSION
The DAC operates at a sample rate of 4.096 GS/s via four parallel time-interleaved streams having sample rates of 1.024 GS/s.Each DAC sample stream operates at a clock rate of 128 MHz using an up-sample factor of ×8 that yields the 1.024 GS/s stream rate.The digital design for processing 64QAM modulation is targeted at 16 MHz clock rate (32 MHz passband bandwidth), yielding a data rate of 96 Mbps (maximum).The processing core updates samples at the rate of 16 MHz, which is applied to the 128 MHz rate DAC sample stream using over-sampling and interpolation using sample copying (i.e., zero-order hold).Up-conversion to 2.4 GHz occurs within the DAC core using a polyphase NCO.
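The rate chain on the transmit side is easy to verify with a few multiplications, and the sample-copying interpolation amounts to a zero-order hold. In the sketch below, the ×8 factor between the 16 MHz core and the 128 MHz stream clock is inferred from the two quoted rates, and the variable names are illustrative.

```python
def zero_order_hold(samples, factor):
    """Up-sample by repeating each sample `factor` times (sample copying)."""
    return [s for s in samples for _ in range(factor)]


CORE_RATE_HZ = 16e6                            # NN processing core update rate
BITS_PER_SYMBOL = 6                            # 64-QAM

dac_stream_clock_hz = CORE_RATE_HZ * 8         # 128 MHz DAC sample stream clock
stream_rate_sps = dac_stream_clock_hz * 8      # x8 up-sampling -> 1.024 GS/s
dac_rate_sps = stream_rate_sps * 4             # four interleaved streams -> 4.096 GS/s
max_bit_rate_bps = CORE_RATE_HZ * BITS_PER_SYMBOL

held = zero_order_hold([1, 2], 3)
print(dac_stream_clock_hz, stream_rate_sps, dac_rate_sps, max_bit_rate_bps, held)
```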
On the ADC side, the RF-ADC samples at 4.096 GS/s, and a DDC driven by an NCO produces the down-shifted signal in a multi-rate format. The baseband samples at 4.096 GS/s are split into 4 parallel streams in an internal decimator core to produce a single stream of IQ baseband samples with 12 bits of precision at 1.024 GS/s. The stream is further decimated in a polyphase signal processing block within the RF-ADC unit to yield a sample rate of 128 MS/s. That is, the receiver produces two streams (I and Q), each sampled at 128 MHz, with a maximum baseband bandwidth of 64 MHz. However, due to guard bands and filtering requirements, our example is limited to 16 MHz, which is produced via a rate-transition block within the RF-SoC. The final data stream for processing consists of two 12-bit streams, each updated at 16 MHz. These streams are applied to the machine learning processor for real-time inference using the augmented envelope NN. The machine learning processor can operate at a clock rate of up to about 50 MHz, although this example requires only 16 MHz. At present, for testing purposes, the measured performance is estimated off-line by streaming IQ data to the host PC using the User Datagram Protocol (UDP) packet engine running on the RF-SoC, because the predictions of the ML cores must be packet-synchronized with the transmit waveform available within the RF-SoC.
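As a quick consistency check on the receive path, the sketch below walks the decimation stages from the ADC rate down to the 16 MHz processing rate. The per-stage factors (4, 8, 8) are inferred from the rates quoted above, and the stage labels are descriptive names rather than identifiers from the RF-SoC tooling.

```python
ADC_RATE_SPS = 4_096_000_000

# (stage label, decimation factor); factors are inferred from the quoted rates.
RX_STAGES = [
    ("internal decimator core", 4),      # 4.096 GS/s -> 1.024 GS/s
    ("polyphase decimation block", 8),   # 1.024 GS/s -> 128 MS/s
    ("rate-transition block", 8),        # 128 MS/s   -> 16 MS/s
]


def decimation_chain(input_rate_sps, stages):
    """Return the output rate after each stage, checking integer ratios."""
    rates = []
    rate = input_rate_sps
    for name, factor in stages:
        if rate % factor != 0:
            raise ValueError(f"{name}: {rate} is not divisible by {factor}")
        rate //= factor
        rates.append((name, rate))
    return rates


rx_rates = decimation_chain(ADC_RATE_SPS, RX_STAGES)
for name, rate in rx_rates:
    print(f"{name}: {rate / 1e6:.0f} MS/s")
```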

E. MEASURED PERFORMANCE
The real-time measurements of the digital SI cancellation are shown in Fig. 8. To create a reference for assessing the effect of the RF front-end on the cancellation, the measurements in Fig. 8(a) and (b) were made with a direct wired connection through a 50-ohm transmission line, with a power splitter used to combine the intended SOI, which uses QPSK modulation. In the absence of SOI ... A performance comparison with other digital cancellation methods is shown in Table 2. Comparing digital cancellation performance independently of the RF front-end is challenging, since different front-end cancellation techniques introduce different distortions to the received signal. Therefore, for this comparison, we selected published work that uses one of two simple measurement setups: directly wired loopback (Wired) or Tx-Rx antenna separation (OTA). The work [83] provides a reference for neural-network-based digital SI cancellation, which uses a dataset as the input waveforms.
The works compared in Table 2 operate in the 2.4 GHz band, except for [59], which operates in the 260 MHz band. An important point to note is that models such as Hammerstein, Wiener, and MP require the computation of high-order terms of the input signal, whereas the proposed method relies only on the input signal and its envelope. The performance of an SI canceller varies with the operating bandwidth, transmit power, and amount of SIC [61]. To compare our measured results with those published by other researchers, we define a unitless metric M = B · e^((P_T + SIC)/100) ≥ 0, where B is the bandwidth in MHz, P_T is the FD transmit power in dBm, and SIC is the best-case cancellation in dB. We defined our own metric because there does not seem to be a standard metric that takes into account bandwidth, transmit power, and SI cancellation. Although this metric enables comparison of different methods, it is not universally applicable, as the importance of each factor varies with the application. The metric M scales linearly with bandwidth while assigning exponentially increasing levels of difficulty (performance) to both the transmit power level and the SI cancellation, since these system parameters typically increase in dB increments in system design. Larger values of M indicate better performance.
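The metric is straightforward to evaluate. The sketch below is illustrative: the function name is hypothetical, and the example inputs (32 MHz, 20 dBm, 35 dB) are figures quoted elsewhere in the text, which may not match the exact entries used in Table 2.

```python
import math


def comparison_metric(bandwidth_mhz: float, tx_power_dbm: float, sic_db: float) -> float:
    """Unitless metric M = B * exp((P_T + SIC) / 100)."""
    return bandwidth_mhz * math.exp((tx_power_dbm + sic_db) / 100.0)


# Illustrative inputs taken from figures quoted in the text (an assumption,
# not necessarily the values tabulated by the authors).
m_example = comparison_metric(bandwidth_mhz=32.0, tx_power_dbm=20.0, sic_db=35.0)
print(f"M = {m_example:.2f}")

# Doubling the bandwidth doubles M; adding 10 dB of cancellation scales M by e^0.1.
m_wider = comparison_metric(64.0, 20.0, 35.0)
m_deeper = comparison_metric(32.0, 20.0, 45.0)
print(f"bandwidth x2 -> ratio {m_wider / m_example:.3f}, "
      f"+10 dB SIC -> ratio {m_deeper / m_example:.3f}")
```

The two ratios make the weighting explicit: bandwidth enters linearly, while each additional 10 dB of transmit power or cancellation raises M by roughly 10.5%.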

VIII. POWER VS. PERFORMANCE TRADEOFFS
The SI cancellation stages trade off power to achieve higher cancellation. In digital cancellation, the canceller increases the digital circuit complexity, which leads to an increase in power consumption. Our primary objective in this work is to demonstrate the in-band full-duplex communication capability in base stations, which do not have a stringent power budget limitation compared to mobile devices. Implementing SI cancellation in mobile devices is more challenging due to their stringent power budget.
For SI cancellers to be adopted in mobile devices, they should have relatively low complexity. Our ENN-based canceller (Section IV-A) was shown to achieve higher isolation with a shallow network (i.e., fewer parameters and thus lower complexity) compared to the TDNN-based canceller proposed in [71] (Section V). Also, the ENN is capable of capturing both linear and nonlinear components of the self-interference, which eliminates the requirement to implement two different canceller modules as proposed in [56]. However, optimizing the digital canceller for mobile devices requires further investigation because the coupling between the transmitter and receiver is quite different compared to base stations; this will be one focus for future work.

IX. CONCLUSION
In-band full-duplex technology is a promising candidate for improving the spectral efficiency of 5G NR and other future wireless systems. In this paper, it was proposed that an augmented envelope neural network with an activation function inspired by the PA transfer function provides practically relevant levels of SI cancellation in the digital baseband across both linear and non-linear ranges of PA operation. Experimental results from a real-time custom hardware processor realized on the AMD-Xilinx RF-SoC showed good performance, within about 2 dB of the theoretical best case of 37.76 dB. Further, over-the-air measurements in a real-world spectral environment (laboratory space with multiple 2.4 GHz WiFi routers) showed over 30 dB of SI cancellation. A remote QPSK transmitter operating at 20 dBm at a distance of about 2 m from the FD system was used to demonstrate the capability of the implemented system to receive while simultaneously transmitting in the same frequency band. The implementation uses the RF ADCs and the ARM A53 embedded processor of the AMD-Xilinx ZCU-111 RF-SoC platform. In the future, we will utilize RF and antenna cancellation to build an IBFD radio capable of wide-area IBFD communication with over 100 dB of SI cancellation. In that radio, the proposed augmented envelope NN-based digital canceller will be implemented at baseband in the digital software-defined radio backend to achieve up to 30-35 dB of cancellation in addition to multi-level SI cancellation in the electromagnetic and analog domains.

FIGURE 2. (a) Circuit diagram of a single transistor amplifier (b) Simulated gain and AM/AM curves of the circuit in (a).

FIGURE 4. Prediction error of the FIR interpolation (blue), memory polynomial (purple), augmented envelope model with hyperbolic tangent activation (green) and piece-wise linear activation (red) when the output power is 30 dBm (a) and 38 dBm (f). Predicted constellation points by the four methods are shown on the right when the output power is 30 dBm (b)(c)(d)(e), and 38 dBm (g)(h)(j)(k). In the constellation plots, the actual received SI is shown in black.

FIGURE 5. Normalized mean squared error (NMSE) plotted against the output power for digital FIR interpolation (blue), memory polynomial (purple), envelope model with hyperbolic tangent activation (gray), and augmented envelope model with hyperbolic tangent activation (green) and piecewise linear activation (red).

FIGURE (a) System on chip architecture for the IBFD radio with digital cancellation (b) Digital canceller design that performs neural-network-based inference (c) Vivado block diagram of the system.

FIGURE 7. (a) Diagram of the experimental setup for verifying the digital canceller performance (b) Measured antenna S-Parameters of the Tx and Rx Antennas (c) Photograph of the experimental setup.

FIGURE 8. Constellation diagrams of (a) a direct wired connection from Tx to Rx without the PA and LNA, (b) the signal of interest with QPSK modulation (at the remote transmitter) combined with the Tx signal using a wideband microwave/RF power splitter and fed to the Rx over a 50-ohm transmission-line-based wired connection, (c) over-the-air (OTA) tests using the setup shown in Fig. 7 without the QPSK transmitter, and (d) an OTA test with the QPSK transmitter operating at 20 dBm output power.

FIGURE 9. Measured digital SI cancellation for consecutive frames using wired loopback (blue) and OTA testing (red) in the absence of SOI.

TABLE 1. Resource utilization and critical path delay based on the high-level synthesis report produced by MATLAB HDL Coder.

TABLE 2. Performance comparison of this work with other published work.