Analog-Domain Self-Interference Cancellation for Practical Multi-Tap Full-Duplex System: Theory, Modeling, and Algorithm

Practical, in-band, full-duplex (IBFD) systems typically require more than 100 dB of self-interference cancellation (SIC). Digital processing alone is insufficient for achieving this target, which drives us towards supplementary analog mitigation techniques. We propose an analog-domain, self-interference cancellation circuit to enable pass-band, analog SIC in an IBFD system. Analog SIC is limited by several hardware constraints and design choices, including finite tap-delay resolution, non-negative tap constraints, and bit precision quantization. We characterize the performance impact of each of these limitations as a function of signal bandwidth, carrier frequency, bit precision, and other system design parameters. We further characterize the achievable system performance under all of these limitations combined. We simulate several realistic examples to illustrate the relationship between the achievable self-mitigation performance and various system design choices. We implement a simple constrained optimization algorithm informed by these results to optimize the tap-delay weights of the analog circuit under these system constraints. We simulate the achievable mitigation performance and demonstrate as much as 45 dB of analog-domain, self-interference mitigation of a wide-band signal with realistic system configurations.


I. INTRODUCTION
Modern wireless communications systems demand increasing access to the already crowded electromagnetic spectrum, which motivates a range of efficient spectrum management techniques. In-band, full-duplex (IBFD) systems are a popular choice for improving spectral efficiency, but this approach introduces its own set of challenges. While typical systems use some form of time-division duplex (TDD) or frequency-division duplex (FDD) to isolate the transmit and receive antennas, IBFD systems implement simultaneous transmit and receive (STAR) at the same time and frequency. To enable this capability, we must consider alternative approaches to isolating the transmit and receive antennas [1].
Fig. 1. Example self-interference mitigation strategy for a system that requires 120 dB of receiver isolation. To achieve this target, we cascade several techniques, including spatial antenna isolation, analog circulation, digital suppression, and analog suppression. While these individual techniques may perform better in certain circumstances, the numbers presented here are typical and achievable for traditional RF systems [4].
Successful IBFD technologies typically require some form of self-interference cancellation (SIC) to prevent the transmit channel from overwhelming the receive channel [2]. In typical systems, the receiver sensitivity is over 100 dB lower than the transmit power [3], so several approaches are used to isolate the transmitted energy, including spatial isolation, analog circulation, digital suppression, and analog suppression, as depicted in Fig. 1. In general, each of these approaches is insufficient on its own, so we employ a cascade of isolation techniques to achieve the target performance.
In this study, we propose a digitally-controlled, analog-domain, self-interference cancellation (SIC) approach to support standard IBFD systems. This class of suppression techniques typically focuses on code-based [5] and wideband photonic-based [6], [7] methods, but can also be extended with time-domain [8], frequency-domain [9], [10], digitally-assisted [11], [12], and single-antenna specific [13], [14] architectures. These techniques introduce a wide range of design challenges, including hardware limitations, dynamic range, and quantization. We investigate the performance impact of these design constraints and characterize the achievable performance of the proposed cancellation technique under realistic design configurations.

A. Background
The proposed digitally-controlled, analog mitigation circuit is subject to both digital and analog limitations, including signal quantization [15], sparsity of delay elements, dynamic range and quantization of coefficient weights, discrete model mismatch [16], amplifier nonlinearities [17], in-phase and quadrature (I/Q) mismatch [18], and timing precision [19]. In this study, we characterize these design limitations and quantify their impact on achievable self-interference cancellation.
1) Weighted Delay Elements: Time-domain cancellation circuits are typically implemented using a series of weighted delay elements, also referred to as a "tap-delay line" [2]. Early implementations of this approach used a single delay element (tap), but only performed well for a specific frequency, short delay spread, and precise hardware [20], [21], [22]. Multi-tap implementations significantly relax these constraints at the cost of additional computational complexity [23], [24]. The additional degrees of freedom introduced by multi-tap implementations also create several interesting optimization opportunities, most notably the spacing [24], [25] and sparsity [26], [27] of the delay elements in time.
2) Quantization: For the proposed digitally-controlled, analog tap-delay circuit, achievable performance is limited by the quantization of the tap-delay coefficients. A simple implementation may execute an efficient convex solution to optimize the tap-delay coefficients, then simply round the result to the corresponding hardware precision. Quantization is well-studied in this context [28], [29], [30], [31], which bounds achievable mitigation performance by $\tfrac{1}{12}n^2$, where $n$ is the interval range. To further improve performance, we can integrate the quantization constraints directly into the optimizer to mitigate this rounding penalty.
3) Nonnegative Tap Coefficients: In many cases, enforcing a nonnegativity constraint on the tap-delay coefficients can greatly simplify hardware design at the cost of greater computational complexity. While this constraint is unconventional for analog SIC, it has proven useful in a range of machine learning and nonnegative matrix factorization (NMF) applications [32]. Nonnegative Least Squares (NNLS) has demonstrated asymptotically favorable performance on high-dimensional data sets [33], [34] and has been successfully applied to several applications, including echolocation [35], MIMO radar [36], beamforming optimization [37], and activity detection in massive MIMO systems [38], [39]. In previous studies, we have explored several techniques for mitigating the performance penalty incurred by enforcing a nonnegativity constraint on the tap-delay coefficients [4], [40].
Our work directly characterizes the effects of the aforementioned hardware constraints on the optimal tap coefficients, which is novel and of broad interest. Previous work has examined tap spacing, but has not characterized the outcomes in the detail presented here. Additionally, the use of non-negative tap coefficients is novel in terms of theoretical feasibility for SIC, as is the practical implementation utilizing a hybrid tap filter. Our analysis yields a theoretical tool that accurately approximates performance from the hardware constraints. This guides early system design choices so that engineers can clearly understand which constraint is the limiting factor across the desired operating range.

B. Contributions
We propose a theoretical tool to guide design choices and evaluate trade-offs of multi-tap digitally-controlled analog circuits for SIC in IBFD systems. We introduce the models and analysis to theoretically characterize the effects of the aforementioned constraints. We illustrate that the independent effects of the constraints combine to approximate overall performance over the full operating frequency range. We further provide constrained optimization techniques to optimize the tap-delay coefficients considering the hardware constraints. We characterize the performance impact of these limitations in a MATLAB simulation platform and demonstrate as much as 45 dB of self mitigation under realistic system configurations.

C. Organization
This manuscript is organized as follows: in Section II, we describe the hardware architecture of the proposed circuit; in Section III, we formally define the problem statement and the mathematical model of the circuit; in Section IV, we discuss the hardware limitations and optimization of a single-tap implementation; in Section V, we extend this discussion to a multi-tap implementation; and in Section VI, we simulate the proposed circuit in a MATLAB simulation environment, characterize the achievable performance in several configurations, and discuss potential modifications to improve performance. We close by summarizing the results and providing concluding remarks.

II. HARDWARE IMPLEMENTATION
To achieve strong cancellation (35+ dB) over a wide bandwidth, operating from 100 to 500 MHz, we designed a hybrid filter that implements a large number of tap delays. The first part of the filter uses an 8-tap delay line composed of inductive-capacitive (LC) delay elements (7 LC elements and a non-delayed tap) [41]. The second part of the hybrid filter uses a switched capacitor (SC) bank to delay the reference signal by reusing the eight delays, eight times, to effectively produce 64 taps with distinct gain controls. This hybrid design results in 72 taps, while achieving a compact size and minimal loss.
The first 8 main taps are cascoded common-source field-effect transistors (FETs) to vary the gain. Specifically, the design uses a segmented amplifier sized by the bit precision $b$ to control $2^b$-sized amplifiers, as illustrated in Fig. 2. The maximum tap gain should be near unity and, as a result, input powers should exceed receive leakage levels by a few dB.
The SC taps act as weaker taps but cover a longer delay spread. Conceptually, the SC delay bank implements a circular buffer, constantly moving the write pointer to replace the oldest value in a 64-element capacitor array. Fig. 3 shows how this is realized by an efficient circuit. The reference signal voltage ($V_{\mathrm{ref}}$) is successively loaded onto a circular buffer implemented as a capacitor array. Amplifiers connected to the SC array cycle (mux) through the tap weights to shift past the data samples over time. The muxed weights are broken into A and B subsets to avoid glitching when writing values; otherwise, a single mux for all eight values would be appropriate. This process effectively convolves the reference signal voltage with the tap weights to produce the self-interference cancellation needed from the SC taps.
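The circular-buffer behavior described above can be illustrated with a short behavioral sketch. This is only a software analogy under stated assumptions: the function name `sc_tap_output`, the three example weights, and the sample-by-sample loop are hypothetical, and the A/B mux subsets and the 64-element capacitor array of the actual circuit are not modeled.

```python
def sc_tap_output(samples, weights):
    """Convolve samples with tap weights using a circular buffer (behavioral sketch)."""
    n_taps = len(weights)
    buffer = [0.0] * n_taps   # stands in for the capacitor array
    write = 0                 # write pointer: slot holding the oldest value
    out = []
    for v_ref in samples:
        buffer[write] = v_ref             # overwrite the oldest stored value
        acc = 0.0
        for d in range(n_taps):
            # weights[d] multiplies the sample written d steps ago
            acc += weights[d] * buffer[(write - d) % n_taps]
        out.append(acc)
        write = (write + 1) % n_taps      # advance the write pointer
    return out

# An impulse at the input reproduces the tap weights at the output.
impulse_response = sc_tap_output([1.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.125])
```

Feeding an impulse through the buffer returns the weights in order, which is the convolution property the text relies on.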
Notably, both the main and SC taps are non-negative. In [4] and [40], simulations demonstrate that non-negative taps can match signed taps for cancelling signals. Non-negative taps require half the area of differential taps, and the hardware proposed here validates this design.

III. PROBLEM STATEMENT
In this section, we define the problem statement, mathematical model of the proposed circuit, and the channel model used in our simulations. The variables used in this manuscript are summarized in Table I. The model of the proposed circuit is depicted in Fig. 4, where for our theoretical purposes we do not model the main taps and SC taps differently.

A. Signal Model
Throughout this discussion, we assume that the baseband transmit signal $s(t)$ is a simple band-limited waveform with bandwidth $B$. The proposed analog-domain cancellation techniques operate at pass-band, so this waveform is up-converted to a carrier frequency $f_c$, given by
$$x(t) = s(t)\cos(2\pi f_c t).$$

B. Transmission Model
Assume that the pass-band transmit signal $x(t)$ propagates through a multi-path self-interference channel. The self-interference $y_s(t)$ caused by each scattering path $s$ may be modeled as the original transmit signal with a scaling factor $h_s$ and a time delay $\tau_s$, such that
$$y_s(t) = h_s\, x(t - \tau_s).$$
The scaling factor $h_s$ may be any real number, and notably may be negative due to the physical properties of RF signals [42]. The overall self-interference $y(t)$ observed by the receiver is the sum of each of these scattering paths, such that
$$y(t) = \sum_{s} y_s(t) = \sum_{s} h_s\, x(t - \tau_s).$$

C. Mitigation Model
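The proposed analog canceller depicted in Fig. 4 enables self-interference cancellation by combining weighted and delayed versions of the reference signal $x(t)$ to cancel the multi-path reflections observed in $y(t)$. The proposed digital optimization techniques operate on discrete, sampled data vectors. For a sampling rate $f_s = T_s^{-1}$, we define the sampled transmit reference with $N$ real samples as
$$\mathbf{x} = \big[\, x(0),\; x(T_s),\; \ldots,\; x((N-1)T_s) \,\big].$$
Each delay element $k$ will delay this reference signal by $\nu_k$, so it is convenient to write each delayed reference vector $\mathbf{x}_{\nu_k}$ as
$$\mathbf{x}_{\nu_k} = \big[\, x(-\nu_k),\; x(T_s - \nu_k),\; \ldots,\; x((N-1)T_s - \nu_k) \,\big].$$
The discrete self-interference row vector is given by
$$\mathbf{y} = \big[\, y(0),\; y(T_s),\; \ldots,\; y((N-1)T_s) \,\big].$$
For $K$ total tap-delay elements, we construct the reference signal matrix $\mathbf{X}$ such that
$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_{\nu_1} \\ \mathbf{x}_{\nu_2} \\ \vdots \\ \mathbf{x}_{\nu_K} \end{bmatrix}, \tag{8}$$
and the tap-delay coefficient vector $\mathbf{w}$ is given by
$$\mathbf{w} = \big[\, w_1,\; w_2,\; \ldots,\; w_K \,\big].$$
To cancel the self-interference signal $\mathbf{y}$, we construct an estimate $\hat{\mathbf{y}}$ using the discrete tap-delay elements and weighting coefficients depicted in Fig. 4. This estimate is given by
$$\hat{\mathbf{y}} = \mathbf{w}\mathbf{X}.$$
After subtracting the estimate $\hat{\mathbf{y}}$ from the received self-interference signal $\mathbf{y}$, the remaining residual power $\Psi$ is given by
$$\Psi = (\mathbf{y} - \hat{\mathbf{y}})(\mathbf{y} - \hat{\mathbf{y}})^{\top} = \lVert \mathbf{y} - \mathbf{w}\mathbf{X} \rVert^2.$$
We assume that the delay elements have fixed time delays, so any subsequent optimization occurs over the tap-delay coefficients $\mathbf{w}$ using this residual as the objective function, namely
$$\mathbf{w}^{\star} = \arg\min_{\mathbf{w}} \Psi.$$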
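To make the discrete mitigation model concrete, the following sketch builds a small reference matrix from delayed copies of a reference, forms the estimate as the weighted sum of its rows, and evaluates the residual power. The test waveform (`passband`), the 1 GHz carrier, the 8 GHz sampling rate, and the three tap delays are assumptions chosen only for illustration; they are not values from this manuscript.

```python
import math

def passband(t, fc=1.0e9, bw=1.0e8):
    """Assumed test waveform: a slow cosine envelope on a cosine carrier."""
    return math.cos(math.pi * bw * t) * math.cos(2 * math.pi * fc * t)

def delayed_row(delay, n=200, fs=8.0e9):
    """Samples of x(t - delay) taken at t = i / fs, i = 0..n-1."""
    return [passband(i / fs - delay) for i in range(n)]

def estimate(w, X):
    """y_hat = w X for a row vector w (length K) and a K x N matrix X."""
    return [sum(wk * row[i] for wk, row in zip(w, X)) for i in range(len(X[0]))]

def residual_power(y, y_hat):
    """Psi = (y - y_hat)(y - y_hat)^T."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

delays = [0.0, 0.5e-9, 1.0e-9]                       # hypothetical tap delays
X = [delayed_row(d) for d in delays]
# Two scatterers that happen to align with the first and third taps.
y = [0.6 * a + 0.3 * b for a, b in zip(X[0], X[2])]
psi_matched = residual_power(y, estimate([0.6, 0.0, 0.3], X))
psi_no_cancel = residual_power(y, estimate([0.0, 0.0, 0.0], X))
```

When the channel delays coincide with the tap delays and the weights match the channel coefficients, the residual vanishes; the later sections quantify what happens when either condition fails.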

D. Channel Model
To simulate the performance of the proposed technique, we assume a standard Rician channel model [4], [40] on the magnitude and delay of the scattering paths $s$. We draw a random number of channel taps $\gamma_c \geq 1$ using a modified point-Poisson distribution parameterized by integer $\gamma$. To randomly generate $\gamma_c$, we sum the point-Poisson probability over integers $i$ until the sum exceeds a uniform probability $u \sim U(0, 1)$, with the result constrained so that $\gamma_c \geq 1$. Once the number of channel taps $\gamma_c$ is determined, the tap delays are uniformly distributed across the maximum delay spread of the channel $t_{ds}$, such that the scatterer delays are $\tau_s \sim U(0, t_{ds})$. The associated channel tap weights $h_s$ are scaled according to an exponential decay and randomized by a normal distribution. The tap weights are then rescaled to satisfy the Rician equality determined by the line-of-sight tap amplitude $h_{\mathrm{LoS}}$ and the K-factor $K$. The resulting channel model is visually summarized in Fig. 5.

IV. SINGLE-TAP HARDWARE LIMITATIONS
In this section, we characterize the performance impact of three primary hardware limitations on a single-tap implementation of the proposed circuit: a) a nonnegativity constraint on the tap-delay coefficients, b) misalignment between the scatterer delay and the hardware tap delay, and c) quantization of the digitally-controlled tap-delay coefficients. We simulate the achievable residual after interference cancellation for each of these limitations as a function of carrier frequency and bandwidth-to-carrier ratio (BCR).
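The channel draw described in Section III-D can be sketched as follows. Several details here are assumptions rather than the manuscript's definitions: the rule used to force at least one tap (`max(i, 1)`), the decay constant of the exponential profile (set equal to the delay spread), and the Rician rescaling (scattered power set to the squared line-of-sight amplitude divided by the K-factor). The function names are hypothetical.

```python
import math
import random

def draw_tap_count(gamma, rng):
    """Inverse-CDF Poisson draw, forced to be at least 1 (assumed rule)."""
    u = rng.random()
    i = 0
    pmf = math.exp(-gamma)        # Poisson probability of i = 0
    cdf = pmf
    while cdf <= u and i < 1000:  # accumulate probabilities until they exceed u
        i += 1
        pmf *= gamma / i
        cdf += pmf
    return max(i, 1)

def draw_channel(gamma, t_ds, h_los, k_factor, rng):
    """Return (delays, weights) for the scattered taps of one channel draw."""
    n_taps = draw_tap_count(gamma, rng)
    delays = [rng.uniform(0.0, t_ds) for _ in range(n_taps)]
    # Exponential decay with delay, randomized by a zero-mean normal draw
    # (the sign of each weight is therefore random as well).
    raw = [math.exp(-tau / t_ds) * rng.gauss(0.0, 1.0) for tau in delays]
    # Assumed Rician rescaling: scattered power = h_los^2 / K.
    scale = math.sqrt(h_los ** 2 / (k_factor * sum(h * h for h in raw)))
    return delays, [scale * h for h in raw]

delays, weights = draw_channel(gamma=4, t_ds=50e-9, h_los=1.0,
                               k_factor=10.0, rng=random.Random(7))
```

A fixed seed is used only so that the draw is repeatable; the parameter values are placeholders and do not correspond to Table III.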

A. Nonnegative Coefficient Constraint
In many cases, enforcing a nonnegativity constraint on the tap-delay coefficients can greatly simplify hardware design, in our case reducing the required area by half. While this approach is atypical for this application, previous work has shown that the coefficients can be appropriately optimized to minimize the performance penalty induced by this constraint [4], [40]. To gain a fundamental understanding of how non-negative tap weights limit performance, we study a single-tap implementation, which we extend to multiple taps in the next section.
Consider a single scatterer with a negative channel coefficient and zero delay such that the received signal $y(t)$ takes the form
$$y(t) = -x(t). \tag{22}$$
Since the tap-delay coefficients are nonnegative, we can approximate a negative coefficient by shifting the tap delay slightly in time. To shift the phase of the carrier signal by $180^\circ$ and thus approximate a negative coefficient, we delay the reference signal $x(t)$ by $\varepsilon = 1/(2f_c)$ such that, for a pass-band reference of the form $x(t) = s(t)\cos(2\pi f_c t)$,
$$x(t - \varepsilon) = s(t - \varepsilon)\cos(2\pi f_c t - \pi) = -s(t - \varepsilon)\cos(2\pi f_c t).$$
By moving the time delay of the hardware tap by $\varepsilon = 1/(2f_c)$, the residual signal $\psi(t) = y(t) - w\,x(t - \varepsilon)$ may be simplified as
$$\psi(t) = -\big[\, s(t) - w\, s(t - \varepsilon) \,\big]\cos(2\pi f_c t). \tag{24}$$
When the tap-delay coefficient $w = 1$, the magnitude of the residual signal is driven by the bandwidth-to-carrier ratio (BCR), i.e., $B/f_c$. By inspection, as the BCR $\to 0$ then $\psi(t) \to 0$ for any $t$, indicating that this approximation is most accurate for narrowband signals. Since most radio systems are interested in transmit signals with non-zero bandwidth, we can also optimize the tap-delay coefficient $w$ to further improve this approximation.
For the sampled receive and delayed-transmit vectors $\mathbf{y}$ and $\mathbf{x}_{\varepsilon}$, the optimal nonnegative tap weight $w^{\star}$ may be found by solving the simple minimization
$$w^{\star} = \arg\min_{w \geq 0} \lVert \mathbf{y} - w\,\mathbf{x}_{\varepsilon} \rVert^2.$$
Fig. 6. Normalized residual (left) and optimal weight (right) for a single tap implementing a non-negativity constraint using the time-shift approach.
We solve for $w^{\star}$ by taking the derivative with respect to $w$ and setting the result equal to 0, which is the standard least squares (LS) solution,
$$w^{\star} = (\mathbf{y}\,\mathbf{x}_{\varepsilon}^{\top})(\mathbf{x}_{\varepsilon}\mathbf{x}_{\varepsilon}^{\top})^{-1},$$
where $(\mathbf{x}_{\varepsilon}\mathbf{x}_{\varepsilon}^{\top})^{-1}$ is the inverse of the power of the delayed signal, and $(\mathbf{y}\,\mathbf{x}_{\varepsilon}^{\top})$ is the correlation between the reference and the reception. In Fig. 6, we plot the optimized, normalized residual $\Psi^{\star}$ and the corresponding optimal weight $w^{\star}$ for bandwidth-to-carrier ratios on the interval (0, 2), where BCR = 0 corresponds to a tone and BCR = 2 is the maximum possible spectral occupancy. As expected, the residual performance worsens with increasing BCR, which is consistent with our previous observations of (24).
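The time-shift approximation and its least-squares weight can be checked numerically with a small sketch. The envelope used here (five equal-amplitude tones spanning up to B/2), the normalized carrier frequency, the sampling rate, and the function names are assumptions made for illustration; the manuscript's own waveform is not reproduced.

```python
import math

def reference(t, bcr, fc=1.0, n_tones=5):
    """Assumed pass-band reference: multi-tone envelope times a cosine carrier."""
    bw = bcr * fc
    env = sum(math.cos(2 * math.pi * (m * bw / (2 * n_tones)) * t + m * m)
              for m in range(1, n_tones + 1))
    return env * math.cos(2 * math.pi * fc * t)

def single_tap_residual(bcr, fc=1.0, fs=16.0, n=6400):
    """Return (w_star, normalized residual) for y(t) = -x(t) and a tap at eps."""
    eps = 1.0 / (2.0 * fc)                       # half a carrier cycle
    times = [i / fs for i in range(n)]
    y = [-reference(t, bcr, fc) for t in times]  # negative scatterer, zero delay
    x_eps = [reference(t - eps, bcr, fc) for t in times]
    corr = sum(a * b for a, b in zip(y, x_eps))  # y x_eps^T
    power = sum(b * b for b in x_eps)            # x_eps x_eps^T
    w_star = max(0.0, corr / power)              # LS weight, kept nonnegative
    residual = sum((a - w_star * b) ** 2 for a, b in zip(y, x_eps))
    return w_star, residual / sum(a * a for a in y)

w_narrow, res_narrow = single_tap_residual(0.05)
w_wide, res_wide = single_tap_residual(1.0)
```

Under these assumptions the narrowband case leaves a small residual with a weight close to one, while the wideband case leaves most of the interference uncancelled, in line with the trend described for Fig. 6.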

B. Tap-Delay Misalignment
In general, the channel time delays $\tau_s$ will not always align with the circuit delay elements $\nu_k$, and this misalignment naturally incurs a performance penalty as the difference between the two increases. The Nyquist Sampling Theorem suggests we should sample at least twice as fast as the highest frequency component $f_{\mathrm{Nyq}}$, which in turn suggests that the spacing between pass-band delay elements is
$$\nu_{k+1} - \nu_k = \frac{1}{2 f_{\mathrm{Nyq}}}. \tag{28}$$
Given the simple signal model defined in the previous section, we will assume that this maximum frequency $f_{\mathrm{Nyq}}$ is chosen such that
$$f_{\mathrm{Nyq}} = f_c + \frac{B}{2},$$
where $f_c$ is the carrier frequency and $B$ is the strictly positive signal bandwidth. A channel delay $\tau_s$ can fall anywhere between two adjacent delay elements $\nu_k$ and, in the worst case, exactly between them, as depicted in Fig. 7. For a multi-tap implementation of delay elements spaced uniformly based on the Nyquist frequency $f_{\mathrm{Nyq}}$, the difference $\kappa_{s,k}$ (i.e., misalignment) between any channel delay $\tau_s$ and the nearest delay element $\nu_k$ is defined as
$$\kappa_{s,k} = \tau_s - \nu_k.$$
The worst-case misalignment $\kappa_{s,k}$ occurs when the channel delay falls half-way between two taps. Therefore, using the tap spacing defined in (28), the misalignment $\kappa_{s,k}$ is contained by
$$|\kappa_{s,k}| \leq \frac{1}{4 f_{\mathrm{Nyq}}} < \frac{1}{4 f_c},$$
which is also contained by the carrier frequency since the bandwidth is strictly positive. Consider a single channel delay $\tau_1$ drawn randomly from the interval $\left(-\tfrac{1}{4f_c}, \tfrac{1}{4f_c}\right)$ that has a positive, unit channel coefficient such that the received signal $y(t)$ is simply given by
$$y(t) = x(t - \tau_1).$$
Further consider a single delay element with a time delay of $\nu_1 = 0$. Clearly, the achievable mitigation performance will depend on the difference $\kappa_{1,1} = \tau_1 - \nu_1$. As in the previous example, we can find the optimal coefficient $w^{\star}$ by solving the optimization
$$w^{\star} = \arg\min_{w} \lVert \mathbf{y}_{\tau} - w\,\mathbf{x} \rVert^2, \tag{33}$$
where $\mathbf{y}_{\tau}$ is the sampled version of $y(t) = x(t - \tau_1)$. The solution to this optimization takes the same LS form
$$w^{\star}(\tau_1) = (\mathbf{y}_{\tau}\mathbf{x}^{\top})(\mathbf{x}\mathbf{x}^{\top})^{-1},$$
where $w^{\star}$ is now notably a function of $\tau_1$.
Substituting this solution back into (33), the residual can now be written as a function of $\tau_1$ in the form
$$\Psi^{\star}(\tau_1) = \mathrm{pow}(\mathbf{y}_{\tau}) - \frac{\mathrm{corr}(\mathbf{y}_{\tau}, \mathbf{x})^2}{\mathrm{pow}(\mathbf{x})},$$
where $\mathrm{pow}(\cdot)$ and $\mathrm{corr}(\cdot)$ are shorthand notation for power and correlation. In Fig. 8, we plot this normalized residual for several BCRs as a function of $\tau_1$ (and $\kappa_{1,1}$, by proxy, since $\nu_1 = 0$). As expected, the performance degrades as the channel delay moves away from the hardware delay element in either direction. The worst performance occurs when $\tau_1$ falls exactly at $\pm\tfrac{1}{4f_c}$, the midpoint between adjacent delay elements, which shifts the cosine of the channel signal by $\pm\pi/2$ and therefore makes it orthogonal to the reference signal's cosine. Notably, the performance degradation is dominated by the misalignment factor rather than the BCR.
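For intuition, the misalignment residual can be evaluated in the narrowband limit, where the reference is a pure tone and the normalized residual reduces to the squared sine of the carrier phase offset. The sketch below is an illustration under that assumption (BCR approaching zero, normalized carrier frequency, hypothetical function name), not a reproduction of the curves in Fig. 8.

```python
import math

def misalignment_residual(tau, fc=1.0, fs=16.0, n=1600):
    """Normalized LS residual for a pure tone delayed by tau against a tap at 0."""
    x = [math.cos(2 * math.pi * fc * i / fs) for i in range(n)]
    y_tau = [math.cos(2 * math.pi * fc * (i / fs - tau)) for i in range(n)]
    corr = sum(a * b for a, b in zip(y_tau, x))   # corr(y_tau, x)
    pow_x = sum(a * a for a in x)                 # pow(x)
    pow_y = sum(a * a for a in y_tau)             # pow(y_tau)
    # Residual after substituting the LS weight, normalized by pow(y_tau).
    return (pow_y - corr ** 2 / pow_x) / pow_y

aligned = misalignment_residual(0.0)          # tap sits on the channel delay
eighth_cycle = misalignment_residual(0.125)   # tau = 1/(8 fc)
quarter_cycle = misalignment_residual(0.25)   # tau = 1/(4 fc), worst case
```

At a quarter-cycle offset the delayed tone is orthogonal to the reference, so no single weight can reduce the residual, which matches the worst case described above.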

C. Coefficient Quantization
The proposed analog circuit is digitally controlled, so the tap-delay coefficients must be quantized according to some finite bit precision. This quantization is a nonlinear function and can be difficult to optimize directly, so we start by simply rounding each continuous weight $w_c$ to the nearest discretized option $w_d$. For a reference signal $\mathbf{x}$, the residual $\Psi_q$ introduced by this quantization is given by
$$\Psi_q = \lVert w_c\mathbf{x} - w_d\mathbf{x} \rVert^2 = (w_c - w_d)^2\, \mathbf{x}\mathbf{x}^{\top}.$$
Let the signal power $\rho = \mathbf{x}\mathbf{x}^{\top}$ and assume that $(w_c - w_d)$ is a random process. The expected value of $\Psi_q$ is then
$$\mathbb{E}[\Psi_q] = \rho\, \mathbb{E}\big[(w_c - w_d)^2\big]. \tag{38}$$
Assuming that $w_c$ is a uniform random variable over the dynamic range of the system, the quantity $(w_c - w_d)$ is simply a uniform random variable over the distance between two adjacent quantized test points. Given a dynamic range $D = \max(w_d) - \min(w_d)$ and number of bits $b$, the distance $\Delta$ between test points is set by $D$ and $b$. If the quantity $(w_c - w_d)$ is a uniform random variable over this range $\Delta$, then the variance is simply
$$\mathbb{E}\big[(w_c - w_d)^2\big] = \frac{\Delta^2}{12}. \tag{40}$$
From (40), we can interpret the effects of different bit precisions and dynamic ranges. For example, with nonnegative weights ranging over [0, 1], the dynamic range is $D = 1$.
Fig. 8. Normalized residual as a function of tap-delay misalignment (normalized to fractions of a carrier cycle) for three different BCRs. In this regime, performance is dominated by the misalignment factor rather than the BCR.
Using signed weights, the equivalent range becomes [−1, 1], so the dynamic range is D = 2, corresponding to a +6 dB quantization penalty. Equivalently, adding signed information to a finite precision value requires one additional bit, so if we fix the dynamic range, the signed weights would require one additional bit b + 1, which also results in the same +6 dB penalty.
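The quantization penalty can be checked numerically with a short sketch. It assumes the step size is the dynamic range divided by two to the power of the bit count, which is one common convention and is labeled here as an assumption; the function names, trial count, and seed are likewise hypothetical.

```python
import math
import random

def expected_quantization_db(dynamic_range, bits, rho=1.0):
    """Expected residual rho * step^2 / 12 in dB, with step = D / 2**bits (assumed)."""
    step = dynamic_range / 2 ** bits
    return 10.0 * math.log10(rho * step ** 2 / 12.0)

def simulated_rounding_power(dynamic_range, bits, trials=20000, seed=1):
    """Monte Carlo mean of (w_c - w_d)^2 for uniformly drawn continuous weights."""
    rng = random.Random(seed)
    step = dynamic_range / 2 ** bits
    total = 0.0
    for _ in range(trials):
        w_c = rng.uniform(0.0, dynamic_range)
        w_d = round(w_c / step) * step        # nearest discretized option
        total += (w_c - w_d) ** 2
    return total / trials

unsigned_db = expected_quantization_db(1.0, 8)   # weights in [0, 1]
signed_db = expected_quantization_db(2.0, 8)     # weights in [-1, 1]
penalty_db = signed_db - unsigned_db             # 20*log10(2), about 6.02 dB
measured = simulated_rounding_power(1.0, 8)
```

Under this assumed step size, doubling the dynamic range and removing one bit have the same effect on the expected residual, which is the 6 dB equivalence noted above.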

D. Aggregate Penalty of Hardware Limitations
The nonnegativity constraint, tap-delay misalignment, and coefficient quantization all simultaneously limit the performance of the self-interference canceller. To simultaneously visualize the penalty that each of these limitations imposes on the system, we plot their respective residuals as a function of carrier frequency and bandwidth-to-carrier ratio in Fig. 9. These results are consistent with the results shown thus far, but more clearly highlight which limitations dominate performance in different configurations.
Fig. 9. Normalized residual power for a single tap weight when the channel tap is misaligned (red), nonnegative (blue), and limited by the bit precision (green). Specifically, the nonnegative curves follow the same approximation as discussed in Section IV-A, while the red curves assume a worst-case scenario misalignment $\kappa_{s,k} = 1/(4f_{c,\max})$.
Define $f_{c,\max}$ as the maximum carrier frequency for a uniform tap spacing when the bandwidth $B = 0$,
$$f_{c,\max} = \frac{1}{2(\nu_2 - \nu_1)},$$
where $\nu_1$ and $\nu_2$ are the time delays of the first and second taps, respectively. The signal carrier frequency $f_c$ is then bounded, and we use $f_{c,\max}$ (and not $f_{\mathrm{Nyq}}$) as the upper bound to keep things consistent. Given values for $f_{c,\max}$ and $B$, the bandwidth-to-maximum-carrier ratio, $\mathrm{BCR}_{\max}$, is then
$$\mathrm{BCR}_{\max} = \frac{B}{f_{c,\max}}.$$
The nonnegativity curves follow the same optimization discussed in Section IV-A. The misalignment curves assume a worst-case scenario $\kappa_{s,k} = 1/(4f_{c,\max})$. As the carrier frequency increases, the misalignment incurs a larger penalty because $f_c$ approaches $f_{c,\max}$, while the nonnegativity constraint incurs a lesser penalty because the fixed-bandwidth signal becomes relatively more narrowband.
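These quantities follow directly from the tap spacing, as the short sketch below shows. The helper names are hypothetical, the 625 ps spacing is the value used later in the simulation study, and the 400 MHz bandwidth is an assumed example rather than a configuration from this manuscript.

```python
def max_carrier_hz(tap_spacing_s):
    """Maximum carrier frequency for a uniform tap spacing when B = 0."""
    return 1.0 / (2.0 * tap_spacing_s)

def bcr_max(bandwidth_hz, tap_spacing_s):
    """Bandwidth-to-maximum-carrier ratio."""
    return bandwidth_hz / max_carrier_hz(tap_spacing_s)

def worst_case_misalignment_s(tap_spacing_s):
    """Worst-case misalignment 1 / (4 f_c,max), i.e. half the tap spacing."""
    return 1.0 / (4.0 * max_carrier_hz(tap_spacing_s))

spacing = 625e-12                                  # 625 ps between adjacent taps
fc_max = max_carrier_hz(spacing)                   # 800 MHz
ratio = bcr_max(400e6, spacing)                    # assumed 400 MHz bandwidth
kappa_worst = worst_case_misalignment_s(spacing)   # 312.5 ps
```

The worst-case misalignment is always half the tap spacing, so tightening the spacing raises the maximum carrier frequency and shrinks the misalignment penalty together.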

V. MULTI-TAP EXTENSION
In this section, we examine the effects of the hardware design choices on multi-tap delay lines, extending the single-tap analysis from the previous section. The optimal tap weight vector $\mathbf{w}^{\star}$ is determined given a reference signal matrix $\mathbf{X}$ defined by (8) and a self-interference row vector $\mathbf{y}$, such that
$$\mathbf{w}^{\star} = \arg\min_{\mathbf{w}} \lVert \mathbf{y} - \mathbf{w}\mathbf{X} \rVert^2 = \mathbf{y}\mathbf{X}^{\top}(\mathbf{X}\mathbf{X}^{\top})^{-1}. \tag{44}$$
Fig. 10. Normalized residual power as a function of BCR for tap delay lines with multiple taps spaced at integer multiples of $\pm\tfrac{1}{2f_c}$. The tap weights are optimized to minimize the residual. Additional taps improve the performance at low BCRs, but the residual degrades to 0 dB as the BCR $\to 2$.

A. Multi-Tap Nonnegativity Extension
The analysis in this subsection concludes that more nonnegative taps improve the SIC for low BCRs. Similar to the single-tap results, the performance degrades quickly at high BCRs. To formalize this analysis, define the self-interference as a negative scatterer, the same way it is defined in (22). Then, assume that there are $K$ taps delayed by (positive and negative) integer multiples of the $180^\circ$ phase shift approximation; specifically, the tap delays are at $\varepsilon(k) = k/(2f_c)$ for $k = (\pm 1, \pm 2, \ldots, \pm K/2)$. Spacing the taps at integer multiples of the $180^\circ$ phase shift ensures the optimized tap weights are positive. The specified delays are used to define the reference signal matrix $\mathbf{X}$, so the optimal tap weights may be determined by (44).
The resulting optimal residual (16) due to non-negative tap weights is plotted in Fig. 10 for different tap quantities $K$ over the range 0 < BCR < 2. Predictably, as the quantity of taps increases, so does the overall performance. However, regardless of the tap quantity, the residual signal power increases rapidly at higher BCRs. When BCR ≈ 2, the residual approaches 0 dB, which is consistent with the single-tap results.

B. Multi-Tap Misalignment Extension
This subsection extends the single tap scatterer misalignment to a multi-tap delay line. The optimized results indicate that additional taps improve the performance at lower frequencies, but the residual degrades to 0 dB as the carrier frequency approaches Nyquist limits.
To examine the effect of a misaligned scatterer on a multi-tap delay line, define the reference signal matrix $\mathbf{X}$ with tap delays at $\nu_k = k/(2f_{c,\max})$, when k = (0, ±1, ±(K − 1)/2, K/2). The self-interference is a scatterer delayed so that it is misaligned directly between two taps, specifically $y(t) = x(t - 1/(4f_{c,\max}))$. The optimal tap weights can be determined from (44) and the optimal residual can be calculated from (16). As shown in Fig. 11, the residual is reduced as the number of taps increases, i.e., the SIC is improved. However, as the carrier frequency approaches the Nyquist limit, the residual degrades to 0 dB regardless of the number of taps, which is consistent with our single-tap analysis.
Fig. 11. Normalized residual power as a function of normalized frequency. The tap spacing is determined by $f_{c,\max}$ and the bandwidth is set so $\mathrm{BCR}_{\max} = 1/2$. The simulated carrier frequencies range over (42), to avoid aliasing and not exceed the Nyquist limit. Increasing the number of taps reduces the residual at lower carrier frequencies, but as $f_c/f_{c,\max} \to 1$ the residual degrades to 0 dB.

C. Multi-Tap Quantization Extension
Increasing the quantity of taps adds additional rounding error, which can reduce the SIC. The total expected residual due to rounding is the sum of the expected residual of each tap. Letting $(w_c - w_d)_k$ be the rounding error for tap $k$, the total expected residual for $K$ taps at delays $\nu_k$ (compared to a single tap in (38)) is
$$\mathbb{E}[\Psi_q] = \sum_{k=1}^{K} \mathbb{E}\big[(w_c - w_d)_k^2\; \mathbf{x}_{\nu_k}\mathbf{x}_{\nu_k}^{\top}\big].$$
It is possible to simplify this expression by noting that the reference signal power is constant regardless of the delay, $\rho = \mathbf{x}_{\nu_k}\mathbf{x}_{\nu_k}^{\top}\ \forall \nu_k$, and can be removed from the expectation. Additionally, if the rounding errors at each tap are independent and identically distributed (i.i.d.), the total expected residual is
$$\mathbb{E}[\Psi_q] = K\rho\, \mathbb{E}\big[(w_c - w_d)^2\big], \tag{46}$$
which is dependent on the bit precision and calculated by (40). So, while the non-negativity and finite-spacing constraints show improved performance with additional taps, the opposite is true when considering quantization. Table II summarizes the different design choices that affect the expected residual due to quantization.
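The step from a sum over taps to a factor of $K$ can be made explicit. The short derivation below is a sketch that adds one assumption not stated above, namely that the rounding errors $e_k = (w_c - w_d)_k$ are zero-mean, so that the cross terms between different taps vanish in expectation.

```latex
\begin{align}
\Psi_q &= \Big\lVert \sum_{k=1}^{K} e_k\, \mathbf{x}_{\nu_k} \Big\rVert^2
        = \sum_{j=1}^{K}\sum_{k=1}^{K} e_j e_k\, \mathbf{x}_{\nu_j}\mathbf{x}_{\nu_k}^{\top}, \\
\mathbb{E}[\Psi_q] &= \sum_{k=1}^{K} \mathbb{E}[e_k^2]\, \rho
        \quad \text{since } \mathbb{E}[e_j e_k] = 0 \text{ for } j \neq k, \\
        &= K \rho\, \frac{\Delta^2}{12}.
\end{align}
% Example: going from 1 tap to 64 taps raises the expected
% quantization residual by 10 log10(64), roughly 18.1 dB.
```

In other words, each doubling of the tap count costs about 3 dB of expected quantization residual, which is half the benefit of one additional bit of precision.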

D. Multi-Tap Outer Hull
The outer hull of the non-negative, finite spacing, and bit precision constraints approximates the performance of an analog full-duplex tap delay line. The outer hull shows the limiting constraint over the operating region. This subsection combines the constraints into a single simulation and compares the resulting residual to the outer hull.
Fig. 12. Simulated residual and theoretical outer hull for taps spaced at $1/(2f_{c,\max})$, with 18 taps at 8-bit precision. The self-interference is a misaligned negative scatterer, $y(t) = -x\big(t - \tfrac{1}{4f_{c,\max}}\big)$. As the carrier frequency is swept, the optimal residual after mitigating the negative misaligned scatterer closely follows the theoretical outer hull.
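The simulation uses a negative misaligned scatterer,
$$y(t) = -x\Big(t - \frac{1}{4f_{c,\max}}\Big), \tag{47}$$
which combines the limitations of both the non-negativity and the finite spacing of the taps. The taps are spaced at finite intervals of $\pm 1/(2f_{c,\max})$ (identical to the set-up from Section V-B), where $f_{c,\max}$ denotes the absolute maximum carrier frequency of the tap delay line. To ensure the tap weights are optimized as positive-only values not exceeding the dynamic range, the objective is reformulated in quadratic form with a non-negativity and upper-bound constraint, similar to [4],
$$\mathbf{w}^{\star} = \arg\min_{\mathbf{w}}\; \mathbf{w}\mathbf{X}\mathbf{X}^{\top}\mathbf{w}^{\top} - 2\,\mathbf{w}\mathbf{X}\mathbf{y}^{\top} + \lambda\, \mathbf{w}\vec{1} \quad \text{subject to} \quad \vec{0} \leq \mathbf{w}^{\top} \leq \vec{1}, \tag{48}$$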
where $\vec{0}$ and $\vec{1}$ are column vectors of $K$ entries. A quadratic optimizer, such as quadprog in MATLAB, ensures the tap weights are within the dynamic range and efficiently determines optimal tap weights when the objective is defined in this form. The L1 penalty term ($\lambda\,\mathbf{w}\vec{1}$) may be included in the objective without the traditional absolute value around the tap weights ($|\mathbf{w}|$) due to the non-negativity constraint ($\mathbf{w}^{\top} \geq \vec{0}$). The L1 penalty provides regularization and forces redundant taps off, but for the purpose of this section assume that $\lambda = 0$. After the taps are optimized with a quadratic optimizer, they are rounded to the nearest discrete value specified by the bit precision.
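The optimize-then-round procedure can be sketched without a dedicated quadratic programming solver. The example below substitutes a plain projected-gradient loop for quadprog, which is a different (and slower) technique used here only because it fits in a few lines; the function names, the iteration count, the step-size rule, the assumed quantization step of one over two to the power of the bit count, and the toy reference matrix are all assumptions for illustration.

```python
def solve_box_ls(X, y, lam=0.0, upper=1.0, iters=500):
    """Projected gradient for min ||y - wX||^2 + lam*sum(w) with 0 <= w <= upper."""
    K = len(X)
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(K)]
            for i in range(K)]                                   # X X^T
    c = [sum(a * b for a, b in zip(X[i], y)) for i in range(K)]  # X y^T
    # The trace bounds the largest eigenvalue of the Gram matrix,
    # so this step size keeps the iteration stable.
    step = 1.0 / (2.0 * sum(gram[i][i] for i in range(K)))
    w = [0.0] * K
    for _ in range(iters):
        grad = [2.0 * (sum(gram[i][j] * w[j] for j in range(K)) - c[i]) + lam
                for i in range(K)]
        # Gradient step followed by projection onto the box [0, upper].
        w = [min(upper, max(0.0, wi - step * gi)) for wi, gi in zip(w, grad)]
    return w

def quantize(w, bits=8, upper=1.0):
    """Round each weight to the nearest of 2**bits levels (assumed step upper/2**bits)."""
    step = upper / 2 ** bits
    return [min(2 ** bits - 1, round(wi / step)) * step for wi in w]

# Toy problem: the unconstrained LS weights would be (0.5, -0.3).
X = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
y = [0.5, -0.3, 0.5, -0.3]
w_box = solve_box_ls(X, y)               # second weight is clipped to zero
w_hw = quantize(w_box)                   # weights as loaded into hardware
w_sparse = solve_box_ls(X, y, lam=0.4)   # the L1 penalty shrinks the first weight
```

In this toy case the negative coefficient cannot be represented, so its tap is switched off, and a nonzero penalty pulls the remaining weight toward zero; a production implementation would more likely rely on an established solver.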
To generate the theoretical outer hull, the residuals due to non-negativity, misalignment, and quantization are determined independently of each other. The outer hull is compared to the simulation that combines all the constraints in Fig. 12. It is clear that the residual from the combined constraints closely follows the theoretical outer hull.
Note that the simulated residual is lower than the expected bit quantization at some points due to the randomness of rounding the tap weights to discrete values. The outer hull is not a hard limit of the performance, but provides an approximation and highlights the limiting constraint at any operating point.

VI. SIMULATION STUDY OF ANALOG SIC
This section applies the theoretical analysis developed in the previous sections to a realistic setting. Prior to applying SIC, the simulations generate a random band-limited signal and propagate it through a simulated channel. After applying constrained SIC, the average residual is compared against the equivalent theoretically generated residual curves, which comprise the outer hull. Additionally, we introduce signal processing and optimization techniques that improve the performance by breaking some of the assumptions required for the theoretical analysis.

A. Simulation Implementation
The simulation implements a random signal, a random channel, optimizes a constrained tap-delay line, and evaluates the residual after SIC to provide average results. Specifically, the waveform is a random message of $10^3$ samples modulated with Phase Shift Keying (PSK), passed through a cosine filter, and up-converted to the carrier frequency to simulate a classic transmit chain.
The random signal then propagates through a modified Rician channel we defined in Section III-D. The channel parameters are shown in the first two columns of Table III, where the LoS scatterer is negative and misaligned similar to (47). This generates the self-interference signal y(t) to be mitigated by the tap delay line.
The simulated tap delay line is configured to have 64 taps spaced at 625 ps delay intervals (at a frequency of 1.6 GHz which results in f c,max = 800 MHz). Our simulation can operate the main taps and the SC taps differently, but for the purposes of this experiment we do not differentiate between the two. The taps are non-negative and have a bit precision of 8, with a dynamic range from 0 to 1. The hardware parameters are shown in the last two columns of Table III, where the negative minimum delay (second-to-last row) helps cover the energy spread of the LoS scatterer.
The tap delay line is optimized using the objective defined by (48), and the tap coefficients are rounded to the nearest discrete tap weight defined by the bit precision. Effectively, the tap delay line generates the self-interference estimate, $\hat{y}(t) = wX$, and subtracts it from the self-interference, as in (15). The simulation results shown in this section are averaged over 250 iterations.
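The optimize-then-round procedure can be illustrated with a small sketch. It assumes a least-squares objective with an optional L1 weight `lam`, box constraints from 0 to 1, and a plain projected gradient solver; the exact form of (48) and the solver used in this work may differ. Here `X` is a hypothetical matrix whose columns are delayed copies of the reference signal, and the names `fit_taps`, `round_taps`, and `residual_db` are illustrative.

```python
import math


def fit_taps(X, y, lam=0.0, n_iter=500):
    """Projected gradient descent for the assumed objective
    0.5 * ||y - X w||^2 + lam * sum(w), subject to 0 <= w <= 1."""
    n_rows, n_taps = len(X), len(X[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz
    # constant, so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in X for v in row)
    w = [0.0] * n_taps
    for _ in range(n_iter):
        resid = [sum(X[i][j] * w[j] for j in range(n_taps)) - y[i]
                 for i in range(n_rows)]
        grad = [sum(X[i][j] * resid[i] for i in range(n_rows)) + lam
                for j in range(n_taps)]
        # Gradient step followed by projection onto the box [0, 1].
        w = [min(1.0, max(0.0, wj - step * gj)) for wj, gj in zip(w, grad)]
    return w


def round_taps(w, bits=8):
    """Round weights in [0, 1] to an assumed uniform grid of 2**bits levels."""
    levels = 2 ** bits - 1
    return [round(min(1.0, max(0.0, wj)) * levels) / levels for wj in w]


def residual_db(X, y, w):
    """Residual power after cancellation, relative to the input, in dB."""
    err = sum((yi - sum(x * wj for x, wj in zip(row, w))) ** 2
              for row, yi in zip(X, y))
    ref = sum(yi * yi for yi in y)
    return 10.0 * math.log10(max(err, 1e-300) / ref)
```

A typical use would be `w = round_taps(fit_taps(X, y))` followed by `residual_db(X, y, w)`, repeated over independent signal and channel draws and then averaged.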

B. Simulated Performance Compared to Theoretical Analysis
To verify that our theoretical approximations are accurate, we compare the average residual from the realistic simulation to the theoretical outer hull. At lower bandwidths, our tests confirm that the outer hull accurately predicts the performance, even though the reference signal is a random message and there are multi-path scatterers.
We compare the results of high and low bandwidth signals in Fig. 13. The lower bandwidth follows the outer hull closely over the entire frequency range. At the higher bandwidth, the multi-path scatterers have a greater effect, so the outer hull is a poorer approximation of the performance. These are predictable results since higher bandwidth signals are more difficult to mitigate.

C. Performance Improvement From Algorithmic Development
Signal processing and optimization techniques can improve the performance past the outer hull approximation by breaking some of the assumptions made by the theoretical analysis. For example, to estimate the expected quantization residual (46), we assume the rounding error is uniform over the ∆ range defined in (39). Including an L1 penalty with λ > 0 in the objective function (48) promotes a sparse solution, which in turn causes redundant taps to be turned off, w ≈ 0. This reduces the variance due to quantization of the redundant taps and improves the performance when the bit precision is the limiting constraint. We test this hypothesis by including an L1 penalty in the objective with λ = 0.0001. When the bit precision is the limiting constraint in Fig. 14a, the residual with the L1 penalty falls below the expected bit-precision residual. The performance improves because the L1 penalty encourages fewer active taps, as shown in Fig. 14. Therefore, including the L1 penalty in the objective results in a sparse solution, which improves the performance and does not increase the objective complexity. The only additional cost is tuning the λ parameter.
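One way to see why the penalty adds no complexity is to note that, for non-negative taps, the L1 norm is linear in the weights. The short derivation below assumes that (48) has a least-squares form with residual $y - Xw$; the symbols $X$, $y$, and $\mathbf{1}$ (the all-ones vector) are illustrative notation rather than the definitions used in (48).

```latex
\|w\|_1 = \sum_k |w_k| = \mathbf{1}^{\mathsf{T}} w
\quad \text{for } w \ge 0,
\qquad \text{so} \qquad
\min_{w \ge 0} \; \|y - Xw\|_2^2 + \lambda \|w\|_1
\;=\;
\min_{w \ge 0} \; \|y - Xw\|_2^2 + \lambda\, \mathbf{1}^{\mathsf{T}} w .
```

Under this assumption the penalized problem remains a quadratic program with the same constraints, and the penalty contributes only a constant offset λ to each component of the gradient.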
An additional assumption made to produce the theoretical outer hull is that the LoS self-interference is a negative and misaligned scatterer, as in (47). In a real setting it may be possible to control the relative delay and sign of the receive signal so that the LoS component of the self-interference has the same sign as the taps (positive) and coincides with a tap delay (aligned). This would negate the effects of the non-negative and finite spacing constraints on the LoS channel tap that generally dominates the self-interference. The results in Fig. 15 compare the residual when controlling the LoS sign and alignment. The best performance occurs when the LoS is positive and aligned, and the L1 penalty is included to reduce the residual even further.

VII. CONCLUSION
In this work, we analyzed the effects of hardware constraints of an analog time-domain SIC circuit and characterized the performance as a function of carrier frequency and bandwidth. Our analysis demonstrated that the independent effects of non-negative taps, finite tap spacing, and bit precision combine to limit the achievable performance of the system, which was matched by simulated realistic scenarios. Using the insights obtained from our analysis, we proposed optimization and signal processing techniques to improve the performance by overcoming key aspects of the hardware constraints.
With realistic system configurations, we demonstrated 45 dB of mitigation for a bandwidth (200 MHz) that is wide relative to the carrier frequency (200–700 MHz).
This work was developed to guide and confirm the early design choices on a time-domain analog SIC chip. Our analysis studied the dominant hardware effects in the chip to ensure they did not unpredictably limit the performance.
Additionally, we provide the theoretical rationale that non-negative tap weights can be effective for SIC. The chip has been designed, and we plan to validate the theoretical results against real hardware performance in future papers.