A Commutated-LC RF Broadband Delay Circuit

This article presents a commutated-inductor–capacitor (commutated-<inline-formula> <tex-math notation="LaTeX">$LC$ </tex-math></inline-formula>) or switched-<inline-formula> <tex-math notation="LaTeX">$LC$ </tex-math></inline-formula> circuit that acts as a radio frequency (RF) delay line. Thanks to its linear-periodically time-varying (LPTV) operation and fully passive implementation, it concurrently achieves long maximum delays, fine delay tuning steps, and wide instantaneous bandwidths while being low loss and highly linear. Unlike existing LPTV switched-capacitor broadband delays, the introduction of inductors in the proposed commutated-<inline-formula> <tex-math notation="LaTeX">$LC$ </tex-math></inline-formula> delay circuit provides a new degree of freedom, allowing it to operate at a much higher RF with a wider instantaneous bandwidth. A proof-of-concept prototype in a 65-nm CMOS process demonstrates a measured 1.3-GHz 3-dB bandwidth around a 4.3-GHz RF, i.e., a 30% fractional bandwidth, when clocked at 250 MHz. The measured maximum delay is 1.4 ns with a 23-dB loss and noise figure; this loss or noise is orders of magnitude lower compared with fully passive linear-time-invariant RF delay lines operating at a similar frequency with the same delay. The measured IIP3 is +16 dBm.

I. INTRODUCTION
AN INTEGRATED radio frequency (RF) delay line is a key building block for many existing and emerging broadband wireless circuits and systems, such as magnet-less circulators [1], broadband antenna arrays [2], [3], and full-duplex transceivers [4]-[6]. The fundamental challenge associated with an integrated RF delay line is to concurrently achieve a long delay, often on a nanosecond scale, a fine delay tuning step, and a wide instantaneous bandwidth while retaining high linearity, low noise, and reasonable dc power.
By creating quasi-electromagnetic-wave propagation on chip, an LC-based artificial transmission line is a common way of realizing integrated RF delay lines [2], [3]. However, to achieve programmable delays with high tuning resolution, these artificial transmission lines have to be divided into many unit cells, increasing chip area and insertion loss substantially. LC-based all-pass filter RF delay lines have also been reported for larger delay-bandwidth products and lower delay variation compared with artificial transmission lines [7], [8]. However, they still suffer from large chip areas and high insertion loss when both large delay spreads and small delay steps are needed. For area-efficient and low-loss RF delay generation and/or programming, active components have been included in LC-based delay circuits [3], [9]. However, the presence of active circuits results in degraded linearity performance when compared with an all-passive design (e.g., [7], [8]). Active $G_m$-C circuits also allow compact inductor-less RF delay lines [10], [11]. However, these inductor-less delay lines have limited dynamic range due to the presence of active devices and are often limited to a sub-3-GHz RF due to parasitic effects.
Besides linear time-invariant (LTI) delay elements, linear periodically time-varying (LPTV) N-path switched-capacitor circuits have been reported for delay generation, but they possess their own challenges. N-path RF filters with ≥10-ns delay have been demonstrated but have a very limited delay-bandwidth product, resulting in narrowband operation [5], [12], [13]. Compared with an N-path filter, an RF switched-capacitor sampler has a larger delay-bandwidth product, hence supporting broad instantaneous bandwidth [4], [14]-[18]. However, an RF switched-capacitor sampler has high insertion loss when used in RF systems with resistive (e.g., 50 Ω) interfaces; this high insertion loss is due to the fact that the RC time constant is much smaller than the switch on-time in a sampler. Recently, a broadband switched-capacitor delay has been reported in [1] and [19] with the RC time constant similar to the switch on-time. It demonstrates nanosecond-scale delays, fine delay tuning steps, broad fractional bandwidths, high linearity, and low noise at the same time. However, its maximum achievable delay is coupled with its RF, limiting its operation frequencies to low RF bands (e.g., below 1 GHz) when nanosecond delays are needed.
In this work, we present an N-path switched-inductor-capacitor (switched-LC) or commutated-LC RF delay circuit, as shown in Fig. 1. It allows all-passive, compact, and low-loss RF delay elements with gigahertz-wide instantaneous bandwidths, nanosecond-scale maximum delays, and clock-path-defined fine delay tuning steps, at an RF beyond 3 GHz.
Unlike switched-capacitor broadband delay lines (e.g., [1], [4], [19], [20]), our proposed commutated-LC delay circuit decouples the maximum achievable delay from its RF, since the introduction of inductors provides a new degree of freedom to existing LPTV circuits. This allows it to operate at a much higher RF while preserving key features of an LPTV circuit. Compared with all-passive LC-based delay lines (e.g., [7], [8]), our proposed LPTV commutated-LC RF delay circuit is compact, low loss, and has fine delay resolution thanks to the clock-path-defined delay and the non-reciprocal operation. Moreover, our proposed commutated-LC RF delay circuit allows clock-path-based bandwidth control in addition to the delay tuning.
Fig. 1. Eight-path commutated-LC circuit with its clock switching waveforms.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A qualitative comparison of the characteristics of the aforementioned tunable delays is given in Fig. 2. Our proposed commutated-LC delay concurrently achieves high linearity, long delay, and wide bandwidth at beyond-3-GHz RF with moderate loss and area.
We have designed and fabricated a 65-nm CMOS proof-of-concept prototype. In measurement, it demonstrates a 1.3-GHz 3-dB bandwidth around a 4.3-GHz RF, i.e., a 30% fractional bandwidth, when clocked at 250 MHz. The measured maximum delay is 1.4 ns with 23-dB loss and noise figure; this loss or noise is orders of magnitude lower compared with fully passive LTI RF delay lines operating at a similar frequency with the same delay [7], [8]. The measured IIP3 is +16 dBm.
This article is organized as follows. In Section II, we propose an approximated fundamental transfer function for commutated-LC circuits. In addition, we briefly review commutated-LC N-path filter circuits. Section III describes the proposed commutated-LC broadband RF delay circuit. Implementation details are discussed in Section IV. Finally, measurement results are presented in Section V, and Section VI concludes this article.

II. COMMUTATED-LC CIRCUITS: TRANSFER FUNCTION AND N-PATH FILTERS
In this section, we propose an approximated fundamental transfer function for commutated-LC circuits. In addition, we briefly review the case when a commutated-LC circuit operates as an N-path filter (e.g., [21]-[23]), showing its limited delay-bandwidth product and its inability to support constant-bandwidth delay tuning.
The N-path commutated-LC circuit in Fig. 1 acts as an N-path filter when the bandwidth of the RLC circuit, which consists of the LC load and the source resistance $R'_0 = R_0 + R_{ON}$, is narrow, that is, when $[2\pi (R'_0 \| R_L) C]^{-1}$ is small. We will quantify this condition shortly.
Similar to the behavior of a switched-capacitor N-path filter [12], an N-path commutated-LC filter creates many narrow passbands around $f_R + k f_C$, where $f_R$ is the LC resonance frequency, $f_C$ is the commutation or switching frequency, and $k$ is an integer [24], [25]. Intuitively, the periodic switching operation translates the narrow bandpass frequency response of the RLC load to the switching frequency harmonics [12].
Consider a design example as shown in Fig. 1 with $C = 2.5$ pF, $f_R = 1/(2\pi\sqrt{LC}) = 5$ GHz, $R_0 = 50$ Ω, $R_{ON} = 5$ Ω, $R_L \to \infty$, $f_C = 1.5$ GHz, and $T_d = T_C/2$. The simulated fundamental transfer function is given in Fig. 3; it shows narrow passbands around $f_R + k f_C$ and is similar to that of an N-path switched-capacitor filter [12].
Let us quantify this transfer function. Deriving a precise one using an LPTV analysis akin to that in [26] is involved due to the presence of additional inductors. Interestingly, we find that the transfer function can be approximated by the N-path switched-capacitor circuit transfer function, for example, the one given in [27, eq. (15)], with a frequency shift. Our proposed approximated transfer function is given in (1), where $N$ is the number of paths.
Unlike in a switched-capacitor circuit where the transfer function is exact, (1) is an approximated transfer function, and the approximation holds only when the RLC circuit is deeply underdamped; in other words, the RLC circuit satisfies
$$\alpha = \frac{1}{2(R'_0 \| R_L)C} \ll \omega_R \qquad (2)$$
where $\alpha$ is the damping factor [28]. As $\alpha/\omega_R$ increases and approaches 1, the peak of the frequency response of a commutated-LC circuit shifts to lower frequencies compared to $f_R$, reducing the accuracy of our model. In addition, as we will see in Section III-A, the maximum achievable delay and delay spread are inversely proportional to $\alpha$. Therefore, a small $\alpha$ is also needed for large delays.
The rationale behind (1) can be shown using a comparison between a commutated-LC and a commutated-capacitor circuit. Let us consider the internal voltage across one of the LC or capacitor loads, since the output equals a weighted sum of the internal node voltages, $V_{P2}(t) = \sum_{n=1}^{8} V_n(t) \cdot sw_{n,B}(t)$ (see Fig. 1). As in Fig. 4, only a single path is needed in this case given memory-less source and load impedance $R_0$ and ideal non-overlapping 12.5%-duty-cycle clocks [26]. Without loss of generality, the first path is considered.
For the switched-LC circuit in Fig. 4(a), the damping factor $\alpha = [2(R'_0 \| R_L)C]^{-1} = 1.8$ Grad/s is much less than $2\pi f_R = 31.4$ Grad/s; hence, the RLC circuit is underdamped when the input or output switch is closed.
Since it is underdamped and we have $\alpha \ll \omega_R$, the voltage across the LC load is proportional to $e^{-\alpha t}\sin(\omega_d t + \phi)$, where $\phi$ is a constant and $\omega_d = (\omega_R^2 - \alpha^2)^{1/2} \approx \omega_R$ is the damped natural angular frequency [28]. When both switches are open, the LC load holds the stored energy just like a capacitor, but in both the magnetic and electric fields. We will consider lossy inductors in Section III.
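The claim that the damped frequency is close to the resonance frequency can be checked numerically. The following is a minimal sketch that uses only the two values quoted above (1.8 Grad/s and 31.4 Grad/s, the latter taken as $2\pi \times 5$ GHz); the variable names are illustrative.

```python
import math

# Values quoted in the text for the switched-LC example.
alpha = 1.8e9                     # damping factor, rad/s
omega_r = 2.0 * math.pi * 5.0e9   # resonance frequency, rad/s (about 31.4 Grad/s)

# Damped natural angular frequency of an underdamped RLC circuit.
omega_d = math.sqrt(omega_r**2 - alpha**2)

# Relative downshift of the damped frequency with respect to resonance.
relative_shift = 1.0 - omega_d / omega_r

print(f"alpha / omega_R      = {alpha / omega_r:.3f}")
print(f"1 - omega_d/omega_R  = {relative_shift:.2e}")
```

With these numbers the downshift is well below 1%, which is why treating $\omega_d$ as $\omega_R$ is a reasonable simplification in the deeply underdamped case.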
Removing the inductor and doubling the capacitance, we arrive at the switched-capacitor circuit in Fig. 4(b). As can be seen in Fig. 4(d), the internal voltage $V_1$ here is essentially the envelope of that in the switched-LC circuit in Fig. 4(a). Now, we can say that (1) is based on the fact that the output voltage has the form of $e^{-\alpha t}\sin(\omega_R t + \phi)$, a product of a negative exponential term and a sine term, when any of the output switches in Fig. 1 is on.
The negative exponential $e^{-\alpha t}$ signifies a damped waveform with a time constant $1/\alpha = 2(R'_0 \| R_L)C$. This negative exponential is identical to the natural response of an RC circuit with a time constant $\tau_{RC} = 2R'_0 C = 1/(2\pi f_{RC})$. This effective time constant $\tau_{RC}$ corresponds to the effective RC frequency $f_{RC}$ in (1). The sine term signifies a frequency shift of $f_R$ in the transfer function given in (1).
The calculated transfer function of the aforementioned design example using (1) is plotted in Fig. 3, showing a good match with the simulation.
Let us define the ratio between the switch-on time and the time constant as $\Gamma$, similar to that in a switched-capacitor circuit [26]
$$\Gamma \equiv \frac{T_C/N}{\tau_{RC}} = \frac{\alpha T_C}{N} \qquad (3)$$
When $\Gamma$ is small and $\Gamma \ll \pi/2$, the commutated-LC circuit acts as an N-path filter. We will discuss the choice of $\pi/2$ in Section III. For example, in Fig. 3, we have $\Gamma = 0.3$.
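The quoted value of 0.3 for the Fig. 3 example can be reproduced with a short calculation. This is a sketch under stated assumptions: the switch-on time is taken as $T_C/N$ with $N = 8$ (the eight-path circuit of Fig. 1), and the time constant is taken as $2(R_0 + R_{ON})C$ with a lossless inductor. The function name `switch_on_ratio` is illustrative.

```python
def switch_on_ratio(f_c_hz, n_paths, r_series_ohm, c_farad):
    """Ratio of the switch-on time T_C/N to the time constant 2*R*C (R_L -> infinity)."""
    t_on = 1.0 / (n_paths * f_c_hz)          # switch-on time per path
    tau_rc = 2.0 * r_series_ohm * c_farad    # effective time constant, 1/alpha
    return t_on / tau_rc

# Section II design example: R_0 = 50 ohm, R_ON = 5 ohm, C = 2.5 pF, f_C = 1.5 GHz.
# N = 8 is an assumption taken from the eight-path circuit of Fig. 1.
gamma = switch_on_ratio(f_c_hz=1.5e9, n_paths=8, r_series_ohm=50.0 + 5.0, c_farad=2.5e-12)
print(f"switch-on ratio = {gamma:.2f}")
```

The result is far below $\pi/2$, consistent with the N-path-filter regime described in this section.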
Like in an N-path switched-capacitor filter [26], [27], it can be shown that the commutated-LC filter bandwidth and peak group delay around $f_R$ can be written as in (4) and (5), respectively.
Simulated commutated-LC filter close-in fundamental transfer functions are plotted in Fig. 5 with varying capacitance values. As $C$ increases from 1.25 to 2.5 pF and to 5 pF, the simulated bandwidth reduces from 0.62 to 0.3 GHz and to 0.14 GHz, and the simulated peak group delay increases from 0.6 to 1.2 ns and to 2.2 ns. These simulated bandwidths and peak group delays show a good match with those calculated using (4) and (5).
Equations (4) and (5) show that the delay-bandwidth product of a commutated-LC filter is limited to 1/π. Unsurprisingly, an N-path switched-capacitor filter also has a delay-bandwidth limit of 1/π [1], [29]. Moreover, in an N-path-filter-based delay circuit, constant-bandwidth delay-tuning is impossible as the delay-bandwidth product is always 1/π.
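As a quick sanity check, the simulated bandwidths and peak group delays quoted above can be multiplied and compared with the $1/\pi$ limit. The sketch below uses only the simulated numbers given in the text; the 20% tolerance mentioned in the comment is an arbitrary illustrative choice, not a figure from the article.

```python
import math

# (C in pF, simulated bandwidth in GHz, simulated peak group delay in ns), from the text.
simulated = [
    (1.25, 0.62, 0.6),
    (2.5, 0.30, 1.2),
    (5.0, 0.14, 2.2),
]

limit = 1.0 / math.pi  # delay-bandwidth limit of an N-path filter

# GHz * ns is dimensionless, so the product can be compared with 1/pi directly.
products = [bw_ghz * delay_ns for _, bw_ghz, delay_ns in simulated]

for (c_pf, _, _), product in zip(simulated, products):
    deviation = (product - limit) / limit
    # All three cases land within roughly 20% of 1/pi.
    print(f"C = {c_pf:4.2f} pF: delay-bandwidth = {product:.3f} ({deviation:+.0%} vs 1/pi)")
```

The product stays roughly constant while bandwidth and delay each change by about a factor of four, which illustrates why constant-bandwidth delay tuning is not available in this regime.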

III. COMMUTATED-LC BROADBAND DELAYS
The N-path commutated-LC circuit in Fig. 1 acts as a broadband delay when $\Gamma \approx \pi/2$. We show that this choice results in a maximum delay-bandwidth product of around 4 in an eight-path single-ended design, an order-of-magnitude improvement compared with that of an N-path filter.
As will be discussed momentarily, we choose $\Gamma = \pi/2$ as it corresponds to the case where the effective sampling frequency $N f_C$ is twice as large as the RLC tank 3-dB bandwidth. We also study the impact of on-chip lossy inductors. The proposed commutated-LC delay achieves its low loss and compactness by moving delay tuning from the signal path to the clock path, similar to switched-capacitor delay circuits [1], [15]-[17], and by allowing delay non-reciprocity.
Finally, we investigate the effects of switch parasitics and clock overlap on the performance of our commutated-LC broadband delay circuit.

A. Broadband Delay With Independent Clock-Path Bandwidth and Delay Tuning
The broadband delay operation of the N-path commutated-LC circuit can be intuitively explained in the time domain.
As discussed in Section II, a commutated-LC circuit acts like a switched-capacitor circuit but with a frequency shift $f_R$ and an effective time constant $\tau_{RC} = 2(R'_0 \| R_L)C = 1/(2\pi f_{RC})$. Unlike the existing broadband N-path switched-capacitor delay circuits that have frequency responses centered around dc [1], [4], [20], introducing an inductor to each path allows our proposed delay circuit to operate at a much higher frequency around $f_R$. Fundamentally, the introduction of magnetic fields provides a new degree of freedom, breaking the limits of switched-capacitor circuits.
When $\Gamma \approx \pi/2$, the time constant $\tau_{RC}$ is similar to the input switch-on duration $T_C/N$ based on (3). Therefore, the voltage envelope of each LC tank in the commutated-LC circuit tracks that of the input during each switching period (see Fig. 6), instead of storing the average input over many periods when $\Gamma \ll \pi/2$ as in an N-path filter. Together, the N paths in the commutated-LC circuit operate like a time-interleaved "slow" sampler that faintly tracks the input with an effective sampling frequency of $N f_C$. By storing the sampled energies at the LC tanks and releasing them to the output at a clock-defined time, a true time delay can be introduced to the signal (see Fig. 6). Based on the sampling theorem, the effective sampling frequency $N f_C$ is chosen to be twice as large as the RLC tank 3-dB bandwidth, resulting in $\Gamma = \pi/2$.
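The step from the sampling condition to the value $\pi/2$ can be made explicit with a short worked derivation. It is a sketch that assumes the tank 3-dB bandwidth is that of a parallel RLC circuit with damping factor $\alpha$, namely $\alpha/\pi$ in hertz, and that the ratio between the switch-on time and the time constant (written $\Gamma$ here) equals $\alpha T_C/N$.

```latex
\begin{align}
  N f_C &= 2 \cdot \frac{\alpha}{\pi}
    && \text{(effective sampling rate equals twice the tank 3-dB bandwidth)} \\
  \Gamma &= \frac{\alpha T_C}{N} = \frac{\alpha}{N f_C}
    = \frac{\alpha}{2\alpha/\pi} = \frac{\pi}{2}
    && \text{(using } T_C = 1/f_C \text{)}
\end{align}
```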
Quantitatively, the transfer function of the proposed broadband commutated-LC circuit can be evaluated using (1) with $\Gamma \approx \pi/2$. As we discussed in Section II, (1) describes a commutated-LC circuit as long as it is deeply underdamped, i.e., $\alpha \ll \omega_R$, regardless of $\Gamma$. That said, one should not solely rely on reducing the time constant $\tau_{RC} = 1/\alpha$ for a large $\Gamma = \pi/2$, as this would result in an elevated $\alpha$, reducing the maximum achievable delay and the delay spread based on (9).
Fig. 6. ... when input-side switch $sw_{1,F}$ is closed. By storing the sampled energies at the LC tanks and releasing them to the output at a clock-defined time, a delay can be introduced to the output signal $v_{P2}(t)$.
Simulation of the commutated-LC delay circuit (Fig. 1) with varying $\Gamma$ is shown in Fig. 7. As $\Gamma$ increases from $\pi/10$ to $\pi/2$, the transfer function magnitude and group delay no longer resemble the comb filter that is seen in N-path filters; instead, they are flattened across multiple local oscillator (LO) harmonics, supporting broadband operation centered around $f_R$. Calculated transfer function magnitudes using (1) are plotted in Fig. 7; the simulation and calculation results match well with each other regardless of the choice of $\Gamma$.
Evaluating the magnitude of the transfer function given in (1) at discrete frequencies $f = f_R + k f_C$, with $k$ being an integer, results in (6). Based on (6), we know the transfer function magnitude peaks at $(1/\Gamma) \cdot (R_0/R'_0)$ when $k = 0$. Evaluating (6) numerically with $\Gamma = \pi/2$, the magnitude drops by 3 dB when $|k| > N/(2\pi)$; this gives us the lower bound on the 3-dB bandwidth. In addition, since our commutated-LC circuit acts as a sampler with a sampling frequency of $N f_C$, the one-sided signal bandwidth is bounded by $(N/2) f_C$ based on the Nyquist-Shannon sampling theorem. Given that the RF bandwidth corresponds to the two-sided baseband (BB) bandwidth, the RF 3-dB bandwidth upper bound is $N f_C$. Therefore, the delay line RF bandwidth is bounded as given in (7).
Similar to switched-capacitor RF broadband delay circuits [1], [20], the delay is clock-path defined and
$$\tau_{delay,21} \approx T_d, \qquad \tau_{delay,12} \approx T_C - T_d \qquad (8)$$
where $\tau_{delay,21}$ is the delay when the signal travels from port 1 to port 2, $T_d$ is the clock-path delay as defined in Fig. 1, and $\tau_{delay,12}$ is the delay when the signal travels in the opposite direction. The magnitude response in (6) applies to signals traveling in either direction. This is because the transfer function magnitude is independent of delay assuming lossless LC tanks and no clock overlapping. The signal-path delays $\tau_{delay,21}$ and $\tau_{delay,12}$ are approximately the same as, but smaller than, the clock-path delays $T_d$ and $T_C - T_d$, respectively; the difference between the signal- and clock-path delays is due to the fact that the RLC circuit, i.e., the signal delaying component in our design, is dispersive. As shown in [1], replacing each LC tank with a dispersion-less ideal transmission line makes the signal-path delay constant and exactly the same as the clock-path delay; however, on-chip transmission lines are very bulky and lossy, and hence, capacitors are used instead in [1].
Equation (8) makes sense intuitively as we discussed earlier: a time delay can be introduced to the signal (see Fig. 6) by storing the sampled energies at the LC tanks and releasing them to the output at a clock-defined time. Equation (8) also indicates the non-reciprocal operation of the LPTV delay line.
The maximum achievable delay is set by the switching period $T_C$ due to the periodic nature of the clocks. For example, looking at the clock waveforms in Fig. 1, a clock-path delay of $1.5T_C$ cannot be distinguished from a $0.5T_C$ delay. In addition, since clock overlapping between the two ports of the delay line is avoided, the maximum and minimum achievable delays are
$$\tau_{delay,max} \approx T_C - \frac{T_C}{N} = (N-1)\frac{\Gamma}{\alpha}, \qquad \tau_{delay,min} \approx \frac{T_C}{N} = \frac{\Gamma}{\alpha} \qquad (9)$$
where we have substituted $T_C/N$ with $\Gamma/\alpha$ based on (3). Therefore, given a fixed $\Gamma \approx \pi/2$, a small $\alpha$ or a deeply underdamped LC circuit results in a large maximum delay $\tau_{delay,max}$ or delay spread $\tau_{delay,max} - \tau_{delay,min}$.
Simulation results of the commutated-LC delay circuit with varying delays $T_d = T_C/2$ and $T_d = 7T_C/8$ are given in Fig. 8. The simulated 3-dB bandwidth is 2.4 GHz regardless of $T_d$ and is larger than $8/2 \times 0.5 = 2$ GHz, as expected based on (7). When $T_d = T_C/2 = 1$ ns, the simulated average group delays in both directions are 0.9 ns with around a ±0.1-ns ripple. When $T_d = 7T_C/8 = 1.75$ ns, the simulated average group delay is 1.7 ns in the forward direction and 0.2 ns in the other; the delay ripple is around ±0.1 ns in both directions. These delay simulations show a good match with those predicted by (8) and (9).
Fig. 8. Commutated-LC delay circuit simulation with varying delays.
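The delay range and bandwidth bounds for this example can be tabulated with a few lines of code. This is a sketch under assumptions: the delay limits are taken as one switch-on slot $T_C/N$ away from 0 and $T_C$ (consistent with the $7T_C/8$ maximum setting used above), the lower bandwidth bound is taken as $(N/2) f_C$ following the "8/2 × 0.5 = 2 GHz" arithmetic in the text, and the upper bound is $N f_C$. Function names are illustrative.

```python
def delay_range(t_c, n_paths):
    """Minimum and maximum clock-path delays when port clocks must not overlap."""
    t_slot = t_c / n_paths            # one switch-on slot, T_C / N
    return t_slot, t_c - t_slot

def bandwidth_bounds(f_c, n_paths):
    """Assumed lower bound (N/2)*f_C and Nyquist-based upper bound N*f_C."""
    return 0.5 * n_paths * f_c, n_paths * f_c

T_C = 2e-9   # switching period, s (f_C = 0.5 GHz)
N = 8        # number of paths

tau_min, tau_max = delay_range(T_C, N)
bw_lo, bw_hi = bandwidth_bounds(1.0 / T_C, N)
delay_bw_product = tau_max * bw_lo

print(f"delay range: {tau_min * 1e9:.2f} ns to {tau_max * 1e9:.2f} ns")
print(f"bandwidth bounds: {bw_lo / 1e9:.1f} GHz to {bw_hi / 1e9:.1f} GHz")
print(f"max delay x lower bandwidth bound = {delay_bw_product:.2f}")
```

The simulated 2.4-GHz bandwidth falls between the two bounds, and the product of the maximum delay and the lower bound is in line with the "around 4" figure quoted for an eight-path design.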
Despite sharing the same schematic, the proposed commutated-LC broadband delay is fundamentally different from the N-path-filter-based delay element that we discussed in Section II, considering (4), (5), and (7)-(9). First, the maximum delay-bandwidth product is around 4 in an eight-path design, corresponding to an order-of-magnitude improvement. Second, the bandwidth and delay tunings are both relocated from the signal path to the clock path, akin to that in a switched-capacitor sampler [1], [15]-[17]. Third, the bandwidth and delay tunings are decoupled and controlled by the clock frequency and clock-path delay, respectively, allowing constant-bandwidth delay tuning. Finally, the proposed commutated-LC broadband delay exhibits delay non-reciprocity. It should be noted that while an N-path filter has clock-path-defined phase tuning [29], it relies on changing the signal-path RC time constant for delay control based on (5). In addition, an N-path filter is non-reciprocal in its phase but not in its delay [30].
When compared with LTI all-passive LC-based delay circuits (e.g., [7], [8]), the clock-path delay control, similar to that in switched-capacitor delay circuits [1], [14], [20], enables compact, low-loss, and high-resolution delay tuning as it avoids having many series LC sections that are needed to achieve fine delay resolution. The delay non-reciprocity also helps to further reduce the delay line size by half compared with its LTI reciprocal counterpart as discussed under the context of a switched-capacitor delay line design in [1]. Intuitively, this is because the total delay from both propagation directions in an LPTV non-reciprocal delay line is half of that in an LTI reciprocal counterpart. In many applications, such as self-interference cancellation and beam-forming systems, only a one-way delay is needed at a time.

B. Lossy Inductors
Here, we study the impact of lossy on-chip inductors on delay line performance, including insertion loss, noise, bandwidth, and delay generation.
As shown in Fig. 1, we model the inductor loss via an equivalent shunt resistor $R_L$. Assuming $R_L \gg R'_0$, the capacitance can be calculated using (3) as $C = T_C/(2N\Gamma R'_0)$. Then, based on $f_R = 1/(2\pi\sqrt{LC})$, the inductance is
$$L = \frac{1}{(2\pi f_R)^2 C} = \frac{2N\Gamma R'_0}{(2\pi f_R)^2\, T_C} \qquad (10)$$
Given $\Gamma = \pi/2$, $f_R = 5$ GHz, $N = 8$, $T_C = 2$ ns, and $R'_0 = 55$ Ω, $L$ is calculated to be 0.7 nH using (10). In addition, $R_L = L \cdot Q_L \omega_R$ is 219 Ω assuming $Q_L = 10$ at 5 GHz. In this case, $R_L$ is only four times larger than $R'_0$. Hence, $R_L \| R'_0$ is noticeably smaller than $R'_0$, resulting in a higher $\Gamma = 1.9$ based on (3). Nevertheless, we find that the delay circuit performance largely remains unchanged so long as $\Gamma \approx \pi/2$, and we keep $\Gamma = 1.9$ in our example design here.
Fig. 10. Envelope of the internal voltage $v_1(t)$ given in (11) and (12).
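The sizing steps above can be followed numerically. The sketch below assumes the ratio between the switch-on time and the time constant is $T_C/(2NRC)$, with $R$ the total shunt resistance seen by the tank, and takes 55 Ω as the combined source and switch resistance; the function `size_tank` and its argument names are illustrative, not taken from the article.

```python
import math

def size_tank(ratio, f_r, n_paths, t_c, r_src, q_l):
    """Size C and L for a target switch-on ratio, then account for inductor loss."""
    c = t_c / (2.0 * n_paths * ratio * r_src)        # capacitance, assuming R_L >> r_src
    omega_r = 2.0 * math.pi * f_r
    l = 1.0 / (omega_r**2 * c)                       # inductance from the resonance condition
    r_l = q_l * omega_r * l                          # equivalent shunt loss resistance
    r_par = r_src * r_l / (r_src + r_l)              # r_src in parallel with R_L
    ratio_lossy = t_c / (2.0 * n_paths * r_par * c)  # ratio once R_L is included
    return c, l, r_l, ratio_lossy

c, l, r_l, ratio_lossy = size_tank(
    ratio=math.pi / 2, f_r=5e9, n_paths=8, t_c=2e-9, r_src=55.0, q_l=10.0
)
print(f"C = {c * 1e12:.2f} pF, L = {l * 1e9:.2f} nH")
print(f"R_L = {r_l:.0f} ohm, ratio with loss = {ratio_lossy:.2f}")
```

The sketch gives a capacitance near 1.45 pF and an inductance near 0.7 nH, with a loss resistance of roughly 220 Ω and a lossy ratio slightly below 2, close to the 219 Ω and 1.9 quoted in the text; the small differences are rounding.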
First, let us find the in-band input impedance and insertion loss with lossy inductors. Given $T_d = T_C/2$ and $Q_L = 10$, the simulated steady-state internal node voltage $v_1(t)$ of the commutated-LC circuit (Fig. 1) is plotted in Fig. 9. Since $\alpha^2 \ll \omega_R^2$, we can write the internal node voltage of the $i$th path $v_i(t)$ in Fig. 1 as in (11), where $i$ is the path index, i.e., an integer from 1 to 8, and $e_i(t)$ is a periodic function that captures the changing envelope. Each $e_i(t)$ period can be divided into four phases, corresponding to four windowing functions, $w_{i,a}$, $w_{i,b}$, $w_{i,c}$, and $w_{i,d}$, as shown in Fig. 10. The first phase starts when the input-side switch $sw_{i,F}$ changes from open to closed; the RLC circuit begins to faintly track the source signal during $w_{i,a}(t) = sw_{i,F}(t)$. In the second phase $w_{i,b}$, both input- and output-side switches are open, and the voltage across the RLC circuit starts to damp out with a smaller damping factor $\alpha_L = 1/(2R_L C)$ compared with $\alpha = 1/[2(R'_0 \| R_L)C]$. During the third phase $w_{i,c}(t) = sw_{i,B}(t)$, the output-side switch $sw_{i,B}$ is closed, and the energy stored in the RLC circuit is slowly released to the output with the damping factor $\alpha$. Finally, the RLC circuit continues its damping in $w_{i,d}$ with $\alpha_L$.
Assuming the remaining energy at the end of each $e_i(t)$ period is negligible given lossy on-chip inductors and long delays, the periodic envelope function $e_i(t)$ can be approximated as in (12).
Now, given that $v_i(t)$ is known based on (11) and (12), the commutated-LC delay input impedance and insertion loss with lossy on-chip inductors at $f_R$ can be readily calculated.
Input voltage $v_{P1}(t)$ in Fig. 1 periodically rotates among all the internal voltages and, hence, can be expressed as in (13). The input impedance is the ratio between the input voltage and the input current, as given in (14). Similarly, output voltage $v_{P2}(t)$ in Fig. 1 can be expressed as in (15). Based on (15), the gain from $V_S$ to $V_{P2}$ at $f_R$ is given in (16), where $\alpha_L = 1/(2R_L C)$ and $H_0(f)$ is the transfer function in (1). Similarly, if the source voltage is moved from port 1 to port 2 in Fig. 1, and based on (8), the gain becomes that in (17).
Regarding the transfer function across frequencies, we expect that it should be similar to that with lossless inductors given in (1). This is because the output always connects to one of the LC tanks, and the RLC circuit is nearly the same with or without inductor loss so long as $R_L$ is significantly larger than $R'_0$. Therefore, a generalized transfer function that considers inductor losses can be written as in (18). From (18), we see that $H_{0,Gen,21}(f)$ or $H_{0,Gen,12}(f)$ is now $T_d$ dependent, unlike the transfer function (1) assuming lossless inductors. This makes sense intuitively. When input- and output-side switches are off, the energy held by a lossless LC tank never changes. However, for a lossy LC tank, more energy is damped out with a longer delay.
Since the sum of delays in both directions is $T_C$ regardless of $T_d$ [see (8)], the sum of the transfer functions in both directions is independent of $T_d$. In addition, the generated delays are governed by (8) and (9). Finally, the noise figure of the delay line is dominated by the inductor loss, and hence, the in-band noise factor at $f_R$ can be approximated as $F_{21} \approx G_{21}^{-1}$ and $F_{12} \approx G_{12}^{-1}$. Simulated delay line performance with lossy inductors across different $T_d$ using $Q_L = 10$, $R'_0 = 55$ Ω, $f_C = 0.5$ GHz, $C = 1.45$ pF, $f_R = 1/(2\pi\sqrt{LC}) = 5$ GHz, and $\Gamma = 1.9$ is given in Fig. 11. The corresponding calculated results using (8), (14), (16), and (18) are also plotted using dashed lines across frequencies or markers at $f_R$, showing a reasonable match. The delay error, i.e., the difference between $T_d$ and the $S_{21}$ delay, is due to the dispersive LC delay units as discussed in Section III-A [see (8)].

C. Switch Parasitic
Switch on-resistance has been modeled as $R_{ON}$ as in Fig. 1. It introduces excess loss through the resistive division term $R_0/(R_0 + R_{ON})$ in (1). While a small $R_{ON}$ reduces this loss, an overly small on-resistance results in a large transistor size and, hence, not only power-hungry clock buffers but also additional shunting parasitic capacitance. Switch shunting parasitic capacitance introduces extra loss and frequency shift.
We model switch parasitic capacitance as a shunting capacitance $C_P$, as shown in Fig. 12. Similar to that in an N-path filter circuit [31], $C_P$ causes the peak of the magnitude response to shift toward lower frequencies due to harmonic effects. We expect the amount of frequency shift to depend on the commutation frequency $f_C$, as $f_C$ determines the harmonic frequencies. $C_P$ also introduces additional loss as the parasitic capacitance shunts harmonic signals to ground [32]. As in Fig. 13, compared with the design example shown in Fig. 11 with purely resistive source impedance, adding a $C_P$ of 0.6 pF, which corresponds to the extracted switch parasitic capacitance in our implementation as discussed in Section IV, causes the peak to shift from 5 to 4.2 GHz and an additional loss of 1 dB.
Fig. 12. Commutated-LC circuit with switch parasitics $C_P$, source inductance $L_W$, and clock overlaps $\Delta D$.
Fig. 13. Simulation of the commutated-LC circuit in Fig. 11 with $T_d = (7/8)T_C$ and the $C_P$, $L_W$ in Fig. 12. Switch parasitic $C_P$ causes excess loss and shifts the peak of the magnitude response toward lower frequencies. A series inductor $L_W$ resonating with $C_P$ only marginally restores the performance.
An additional inductor $L_W$ may be introduced to resonate with $C_P$. However, unlike in an LTI network, the shunting effect of $C_P$ in an LPTV circuit includes signals across multiple harmonic frequencies [32]. Since an inductor can only resonate with $C_P$ at one frequency, the shunting effect remains at other harmonic frequencies. Therefore, the performance can be only marginally improved when including an inductor $L_W$. As seen in Fig. 13, adding an inductor $L_W$ slightly changes the peak frequency and the in-band loss. Note that it has been shown in [31] that the introduction of the additional inductor can restore the peak frequency to $f_R$. However, the N-path mixers and filters studied in [31] operate in the large-time-constant regime with a small $\Gamma$; in the frequency domain, a small $\Gamma$ means that the passband bandwidth is much smaller than the commutation frequency. Analysis of the effect of source inductance on a commutated-LC broadband delay with a moderate $\Gamma$, i.e., when the passband bandwidth is significantly larger than the commutation frequency, could be an interesting future research topic.
Given the noticeable peak-frequency shift due to switch parasitic capacitance $C_P$, a natural question that arises is: up to what frequency can the parasitic-capacitance-free analyses in Sections III-A and III-B provide useful insights? Similar to [32], which studies small-$\Gamma$ passive mixers, we define a cutoff frequency $\omega_T$ for our moderate-$\Gamma$ commutated-LC broadband delay circuit. We assume that the RF-port parasitic capacitance can be ignored for simplicity if the RF is below $\omega_T$, given as
$$\omega_T = \frac{1}{(R_0/2)\,C_P} = \frac{2R_{ON}}{N R_0 \tau_{SW}} \qquad (19)$$
where $C_P = N C_{SW}$, $C_{SW}$ is the parasitic capacitance for each switch, $N$ is the number of paths, and $\tau_{SW} = R_{ON} C_{SW}$ is a technology constant that reduces with CMOS scaling. $\omega_T$ defines the frequency beyond which the delay circuit loses more signal to the parasitic capacitance $C_P$ than it passes through the commutating switches. We have assumed that the input impedance looking into the delay circuit from $C_P$ is $R_0$. This input impedance $R_0$ is in shunt with the source resistance $R_0$, resulting in the factor of 1/2 in (19). Given a $\tau_{SW}$ of 375 fs in 65-nm CMOS, $N = 8$, $R_0 = 50$ Ω, and $R_{ON} = 5$ Ω, the cutoff frequency is calculated as 10 GHz based on (19).
The $C_P$-induced frequency shift can be compensated by adjusting the LC resonance frequency $f_R$. As our simulation in Fig. 14 shows, in the presence of $C_P$ (no $L_W$), the peak frequency can be restored to 5 GHz by reducing the inductance, i.e., increasing $f_R = 1/(2\pi\sqrt{LC})$ to 6 GHz. The capacitance remains nearly the same for a fixed $\Gamma$. The added loss is dominated by the inductor, as its parallel resistance $R_L = Q_L (L/C)^{1/2}$ reduces and shunts away more signal power with a smaller $L$.
Fig. 14. Compensating the $C_P$-induced frequency shift by adjusting the LC resonance frequency $f_R$: simulation of the commutated-LC circuit in Fig. 11 with different $f_R$ using $T_d = (7/8)T_C$, $L_W = 0$, and $C_P = 0.6$ pF. The LC tank capacitance remains nearly the same for a fixed $\Gamma$ across different $f_R$.
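The quoted 10-GHz cutoff can be cross-checked from the stated technology numbers. The sketch below assumes the cutoff is the frequency at which the reactance of $C_P$ equals $R_0/2$, i.e., $\omega_T = 2/(R_0 C_P)$; this is one reading of the factor of 1/2 described above, and the variable names are illustrative.

```python
import math

tau_sw = 375e-15   # switch time constant R_ON * C_SW in 65-nm CMOS, s
n_paths = 8        # number of paths
r_0 = 50.0         # source resistance, ohm
r_on = 5.0         # switch on-resistance, ohm

c_sw = tau_sw / r_on       # parasitic capacitance per switch
c_p = n_paths * c_sw       # total RF-port parasitic capacitance

# Assumed form: C_P sees the source resistance in parallel with an equal input
# impedance, i.e., R_0 / 2.
omega_t = 1.0 / ((r_0 / 2.0) * c_p)
f_t = omega_t / (2.0 * math.pi)

print(f"C_SW = {c_sw * 1e15:.0f} fF, C_P = {c_p * 1e12:.2f} pF")
print(f"cutoff frequency = {f_t / 1e9:.1f} GHz")
```

Under this assumption the total parasitic capacitance comes out at 0.6 pF, matching the extracted value used in the simulations, and the cutoff lands slightly above 10 GHz.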

D. Clock Overlap
Similar to N-path filters and mixer-first receivers [32], [33], our proposed commutated-LC delay circuit is prone to clock overlaps. Clock overlaps increase insertion loss and reduce delay spread. The loss increase and $S_{11}$/$S_{22}$ degradation due to overlaps have been studied within the context of mixer-first receivers and N-path filters [33]. The overlaps can be modeled as an additional shunting impedance in an LTI model [32]. Given a normalized clock overlap $\Delta D$ (see Fig. 12), the change in the maximum and minimum achievable delays can be expressed as
$$\Delta\tau_{delay,max} = -\Delta D \cdot T_C, \qquad \Delta\tau_{delay,min} = +\Delta D \cdot T_C \qquad (20)$$
From (20), we see that the delay spread is changed by $\Delta\tau_{delay,max} - \Delta\tau_{delay,min} = -2\Delta D \cdot T_C$, while the sum of delays in both directions remains the same as $\Delta\tau_{delay,max} + \Delta\tau_{delay,min} = 0$.
Simulation results of the commutated-LC circuit with clock overlaps are shown in Fig. 15, using the same design parameters as those in Fig. 11. When configured in the maximum delay setting $T_d = (7/8)T_C$, clock overlaps result in direct feed-through from the input to the output, causing significant ripples in the delay line frequency responses and unexpected delay reduction. After modifying the maximum delay with $\Delta\tau_{delay,max}$, i.e., using $T_d = (7/8 - \Delta D)T_C$, the large ripples disappear and the delay is restored to the expected value. Besides reducing the delay spread by $2\Delta D \cdot T_C$, the overlaps lead to 4-dB higher loss. Small rise and fall times and duty-cycle control [34], [35] may be used to reduce the clock overlaps.
Fig. 15. Simulation results of the commutated-LC circuit with clock overlaps using the same design parameters as those in Fig. 11. Clock overlaps of $\Delta D = 1.8\%$ increase insertion loss and reduce delay spread.
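To put numbers on the overlap penalty, the following sketch applies a 1.8% normalized overlap to the example design with a 2-ns switching period and eight paths. It assumes the nominal delay limits are $T_C/N$ and $T_C - T_C/N$ and that the overlap moves each limit inward by the overlap fraction times $T_C$; variable names are illustrative.

```python
T_C = 2e-9        # switching period, s (f_C = 0.5 GHz)
N = 8             # number of paths
overlap = 0.018   # normalized clock overlap (1.8%)

# Nominal limits without overlap (assumed: one switch-on slot from 0 and T_C).
tau_min_nominal = T_C / N
tau_max_nominal = T_C - T_C / N

# Each limit moves inward by overlap * T_C.
shift = overlap * T_C
tau_max = tau_max_nominal - shift
tau_min = tau_min_nominal + shift

spread_change = (tau_max - tau_min) - (tau_max_nominal - tau_min_nominal)
sum_change = (tau_max + tau_min) - (tau_max_nominal + tau_min_nominal)

print(f"shift per limit   = {shift * 1e12:.0f} ps")
print(f"usable delay range = {tau_min * 1e9:.3f} ns to {tau_max * 1e9:.3f} ns")
print(f"spread change      = {spread_change * 1e12:.0f} ps")
```

For this example the delay spread shrinks by a few tens of picoseconds out of 1.5 ns, so the more significant cost of overlap is the added insertion loss rather than the lost tuning range.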

E. Summary
The N-path commutated-LC circuit in Fig. 1 acts as a broadband delay when ≈ π/2. The damping factor α given in (2) is designed to be much smaller than the resonance frequency ω R for a large delay spread as in (9).
Considering inductor losses and assuming the entire commutated-LC circuit has no memory component except for its RLC loads and its clocks have no overlap as in Fig. 1, the generalized transfer functions are given in (18). The in-band noise factors in both directions are $F_{21} \approx G_{21}^{-1}$ and $F_{12} \approx G_{12}^{-1}$, where $G_{21}$ and $G_{12}$ are given in (16) and (17), respectively. The signal-path delay can be approximated by (8). Finally, the delay line bandwidth is given in (7), while the maximum and minimum achievable delays are calculated using (9). Transistor parasitic resistance and capacitance result in additional loss and cause a frequency downshift in the delay circuit transfer function. We have defined a cutoff frequency in (19) for our commutated-LC broadband delay circuit below which the RF-port switch parasitic capacitance may be ignored for simplicity.
As discussed in Section III-D, clock overlaps increase insertion loss and reduce delay spread by $2D \cdot T_C$, where $D$ is the normalized clock overlap.

IV. IMPLEMENTATION DETAILS
We devised a proof-of-concept commutated-LC broadband RF delay circuit in a standard 65-nm CMOS process. The schematics are shown in Figs. 16 and 17.
A differential implementation is utilized instead of a single-ended realization. This way, we reduce the number of inductors from 8 to 4, saving chip area and, more importantly, allowing a layout where the RF and clock signals can be easily routed to the switches at the center (see Fig. 18). The differential RF ports 1 and 2 both have an impedance of 100 Ω obtained from a wideband 180° hybrid coupler (KRYTAR 4020180). Compared with its single-ended equivalent, the differential delay line has the same performance, including insertion loss, bandwidth, and clock-path-defined time delay. The area and layout benefits of a differential implementation come at the expense of halving the maximum achievable delay. Intuitively, this is because each LC tank connects to the source or load twice as frequently as it does in the single-ended implementation. For example, the first LC tank in the single-ended implementation (see Fig. 1) connects to the source only when $sw_{1,F}$ is on; in the differential version (see Fig. 16), the first LC tank sees the source when either $sw_{1,F}$ or $sw_{5,F}$ is on. This effectively doubles the switching frequency, or halves $T_C$, and hence halves the maximum achievable delay based on (9). Thus, the expected maximum delay-bandwidth product is reduced from 4 to around 2, but it is still nearly seven times larger than that of an N-path-filter-based delay line.
Delay line switches are realized using nMOS transistors that are designed to have an on-resistance of 5 Ω, as shown in Fig. 16. While a smaller $R_{ON}$ leads to a lower loss and noise figure, it requires a larger switch size, which introduces additional harmonic losses [32] and requires higher clock-path dc power.
The inductor is realized using the top thick metal and has an electromagnetic-simulated inductance of 684 pH and a quality factor of 12 at 5 GHz. This corresponds to an inductor equivalent shunt resistance $R_L$ of 260 Ω. A metal-insulator-metal (MIM) capacitor is used at each LC tank and has a capacitance of 1.5 pF to resonate with the inductor at 5 GHz.
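The quoted tank values can be cross-checked against the expressions used in Section III, $f_R = 1/(2\pi\sqrt{LC})$ and $R_L = Q_L (L/C)^{1/2}$. The short sketch below does this with the numbers from the text; the variable names are illustrative only.

```python
import math

L_TANK = 684e-12   # simulated inductance, 684 pH
C_TANK = 1.5e-12   # MIM capacitance, 1.5 pF
Q_L = 12           # inductor quality factor at 5 GHz

# LC resonance frequency, f_R = 1 / (2 * pi * sqrt(L * C))
f_r = 1.0 / (2.0 * math.pi * math.sqrt(L_TANK * C_TANK))

# Equivalent shunt resistance of the inductor, R_L = Q_L * sqrt(L / C)
r_l = Q_L * math.sqrt(L_TANK / C_TANK)

print(f"f_R = {f_r / 1e9:.2f} GHz, R_L = {r_l:.0f} ohm")
```

The results, roughly 4.97 GHz and 256 Ω, agree with the rounded 5 GHz and 260 Ω stated above.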
Two on-chip eight-phase clock (LO) generation circuits, akin to that used in [36], produce 12.5% non-overlapping pulses that drive the input- and output-side switches, respectively. Each LO generator has a differential input at four times the commutation frequency, $4 \times f_C$ (see Fig. 16). As shown in Fig. 17, each input port has a 50-Ω interface and an inverter chain that converts the input sinusoid into a 50% duty-cycle clock. Then, a four-stage differential Johnson counter using the high-speed latch in [37] acts as a divide-by-4 frequency divider, generating eight-phase 50% clocks at $f_C$. Finally, each 12.5% non-overlapping LO pulse is generated using a NOR gate and an AND gate from two of the eight-phase 50% clocks and one further buffered input clock.
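To illustrate how 12.5% non-overlapping pulses can be derived from eight-phase 50% clocks, the sketch below uses a simplified behavioral model. It assumes ideal square-wave phases and forms each pulse as one phase ANDed with the inverse of the next phase; the retiming by the buffered input clock and the exact NOR/AND gate arrangement of Fig. 17 are not modeled, and all names are hypothetical.

```python
N_PHASES = 8
STEPS = 64  # time samples per commutation period T_C

def phase_clock(k, t):
    """Ideal 50% duty-cycle clock at f_C, delayed by k/8 of a period."""
    return (t - k * STEPS // N_PHASES) % STEPS < STEPS // 2

def lo_pulse(k, t):
    """Pulse k is high only while phase k is high and phase k+1 is still low."""
    return phase_clock(k, t) and not phase_clock((k + 1) % N_PHASES, t)

# Duty cycle of one pulse over a full period
duty = sum(lo_pulse(0, t) for t in range(STEPS)) / STEPS

# Number of pulses that are high at each time sample
active = [sum(lo_pulse(k, t) for k in range(N_PHASES)) for t in range(STEPS)]

print(f"duty cycle = {duty:.3f}, max simultaneous pulses = {max(active)}")
```

In this idealized model each pulse is high for one eighth of the period and exactly one pulse is high at any time, which is the non-overlapping behavior the switches require.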
The clock path delay is generated off-chip in our proof-of-concept prototype, but it can be implemented on-chip (e.g., [29], [38]). In [38], a digitally controlled delay with a resolution of 1 ps has been demonstrated.

V. EXPERIMENTAL RESULTS
The proposed commutated-LC broadband RF delay circuit has been fabricated in a 65-nm CMOS process. An annotated chip photograph is shown in Fig. 18, with a core area of 1.2 mm².
The chip is assembled in a 4-mm 20-pin quad-flat no-lead (QFN) package and mounted on an FR4 printed-circuit board. Measurement results in this article are referred to the QFN package input and output.
Given the all-passive implementation, the delay line signal path draws zero dc power. The two CMOS-logic-based LO generation circuits together consume 26 mW of dc power from a 1.2-V supply when $f_C$ = 250 MHz.

A. Fixed Commutation Frequency f C
With a fixed commutation frequency f C = 250 MHz, can be calculated as = 1/N/ f C /(2R 0 ||R L C) ≈ π/2, and the measured delay line S-parameters in forward and backward directions are plotted in Fig. 19(a) and (b), respectively, across different LO-path delay T d between the maximum and minimum achievable delays.
In both directions, the delay line frequency responses are centered around 4.3 GHz. This is lower than the 5-GHz LC resonance frequency due to the parasitic effects that we discussed in Section III-C. The delay line here is a proof-of-concept demonstration of broadband long delays beyond 3-GHz RF and is not designed for any specific wireless standard. When used in a specific wireless system, the delay line center frequency can be adjusted by changing the LC resonance frequency $f_R$, as detailed in Section III.
The measured 3-dB transmission bandwidth is 1.3 GHz, from 3.6 to 4.9 GHz, as is the −10-dB reflection bandwidth. As expected, this bandwidth is between $4 \times f_C$ = 1 GHz and $8 \times f_C$ = 2 GHz based on (7). The average in-band transmission group delay varies from 0.5 to 1.35 ns, resulting in a maximum delay-bandwidth product of 1.8. The measured delay spread of 0.85 ns is smaller than the expected value of $T_C/4$ = 1 ns calculated using (9). As we discussed in Section III-D, LO overlaps result in this reduced delay spread. Based on (20), a normalized clock overlap of 1.9% could reduce the 1-ns delay spread to the 0.85 ns seen in our measurement.
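The arithmetic in this paragraph can be reproduced in a few lines. The sketch below is only a consistency check on the quoted numbers; it takes the $4 f_C$ to $8 f_C$ bandwidth bounds and the $2D \cdot T_C$ spread reduction from the text as given, and the variable names are illustrative.

```python
F_C = 250e6          # commutation frequency
T_C = 1.0 / F_C      # commutation period, 4 ns

# Bandwidth bounds quoted from (7) and the measured 3-dB bandwidth
bw_bounds = (4 * F_C, 8 * F_C)
bw_meas = 1.3e9

# Maximum delay-bandwidth product from the largest measured delay
dbp = 1.35e-9 * bw_meas

# Normalized overlap implied by the shortfall in delay spread (spread drops by 2 * D * T_C)
expected_spread = T_C / 4
measured_spread = 1.35e-9 - 0.5e-9
d_implied = (expected_spread - measured_spread) / (2 * T_C)

print(f"delay-bandwidth product = {dbp:.2f}")
print(f"implied overlap D = {d_implied * 100:.1f}%")
```

This gives a delay-bandwidth product of about 1.76 and an implied overlap of about 1.9%, matching the rounded values above.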
The measured transmission loss is 10 dB at a 0.5-ns delay and increases to 23 dB when the delay becomes 1.35 ns. The loss difference between the 0.5- and 1.35-ns delays can be calculated based on (16) as $e^{-\alpha_L T_d}$ = 12 dB, in good agreement with the measurement.
Based on our discussion in Sections III and IV, regardless of the $T_d$ setting, the sum of the transmissions in both directions should be a constant, and the sum of their group delays should be close to and slightly smaller than $T_C/2$ = 2 ns. The plots in Fig. 19(c) validate our theory. In addition, the simulated results are plotted as dashed lines.
The measured in-band delay variation and normalized variation of the delay line across in-band average group delays are plotted in Fig. 20. Given a $T_d$ setting, the delay variation is calculated as the absolute difference between the maximum or minimum in-band delay and the average in-band delay. Dividing the delay variation by the corresponding in-band average group delay gives the normalized delay variation. The measured variation here is similar to that in a switched-capacitor broadband delay circuit (e.g., [1]).
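The variation metric described above can be written as a small helper. This is a sketch under one reading of the definition: it takes the larger of the two deviations (maximum above average, minimum below average) as the delay variation. The sample values are hypothetical and are not measured data.

```python
def delay_variation(in_band_delays):
    """Return (average delay, delay variation, normalized variation)."""
    avg = sum(in_band_delays) / len(in_band_delays)
    # Largest absolute deviation of the in-band delay from its average (assumed reading)
    variation = max(max(in_band_delays) - avg, avg - min(in_band_delays))
    return avg, variation, variation / avg

# Hypothetical in-band group-delay samples in nanoseconds, for illustration only
samples_ns = [1.30, 1.33, 1.35, 1.38, 1.39]
avg_ns, var_ns, norm_var = delay_variation(samples_ns)

print(f"average = {avg_ns:.2f} ns, variation = {var_ns:.2f} ns, normalized = {norm_var:.3f}")
```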

B. Varying Commutation Frequency f C
Measured delay line S-parameters with varying f C are plotted in Fig. 21.
As shown in Fig. 21(a), when $f_C$ = 400 MHz, the 3-dB transmission bandwidth is about 1.9 GHz, from 3.1 to 5 GHz, across all measured $T_d$ settings. The measured average in-band delay ranges from 0.28 to 0.9 ns. Compared with those obtained using $f_C$ = 250 MHz, the bandwidth and delays are scaled up and down, respectively, by a factor of about 400/250, as expected based on (7), (8), and the discussion in Section IV.
Similarly, when $f_C$ = 125 MHz, we expect the bandwidth to scale down and the delays to scale up, each by a factor of about 250/125. As shown in Fig. 21(b), the measured bandwidth of 0.7 GHz (3.8 to 4.5 GHz) and delay of 0.55 to 2.15 ns meet our expectations.
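The first-order scaling with $f_C$ can be sketched as follows. The example assumes ideal proportional scaling from the 250-MHz measurements (bandwidth proportional to $f_C$, delay inversely proportional to $f_C$); measured values need not match these first-order numbers exactly, and the function name is hypothetical.

```python
REF_F_C = 250e6                  # reference commutation frequency
REF_BW = 1.3e9                   # measured 3-dB bandwidth at the reference
REF_DELAYS = (0.5e-9, 1.35e-9)   # measured min and max average delay at the reference

def scale_from_reference(f_c):
    """Scale bandwidth up and delays down by f_c / REF_F_C (ideal first-order model)."""
    k = f_c / REF_F_C
    return REF_BW * k, tuple(d / k for d in REF_DELAYS)

for f_c in (400e6, 125e6):
    bw, (d_min, d_max) = scale_from_reference(f_c)
    print(f"f_C = {f_c / 1e6:.0f} MHz: BW = {bw / 1e9:.2f} GHz, "
          f"delay = {d_min * 1e9:.2f} to {d_max * 1e9:.2f} ns")
```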
To further illustrate the impact of varying $f_C$ on delay line performance, the sums of the measured two-direction transmission magnitudes and group delays are plotted in Fig. 21(c) and (d), respectively. As we can see, the bandwidth increases with $f_C$ while the delay reduces, maintaining a two-direction delay-bandwidth product of around 2.4 (2.2 to 2.5).
The measured delay line in-band noise figures at 4-GHz RF across T d and f C are plotted in Fig. 22. When using the same T d and f C , the noise figure is almost identical to the insertion loss, as we discussed in Section III-B.

C. Spurious Tones
As shown in Fig. 23, the spurious-tone responses of the proposed commutated-LC broadband delay circuit are measured with a −18-dBm sinusoidal input at $f_S$ = 4 GHz. The delay line is clocked at $f_C$ = 250 MHz, and the measurements are performed with $T_d$ ≈ 0.5 and 1.4 ns.
In each case, we see that the 4-GHz output has the expected strength based on the measured S-parameters in Fig. 19. Across a wide frequency range of 0.1 to 6.5 GHz, the strongest spurious tones are located at $f_S \pm N f_C$ and are 10 to 16 dB lower than the desired output at $f_S$. These spurious tones at $f_S \pm N f_C$ are due to the time-interleaved sampling operation of the delay line with a sampling frequency of $N f_C$, as we discussed in Section III. Finally, it should be noted that these spurious tones at $f_S \pm N f_C$ are outside the delay line bandwidth (3.6-4.9 GHz) and hence can be filtered later.
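The location of the dominant spurious tones follows directly from $f_S \pm N f_C$. The sketch below evaluates them for the measurement conditions above, taking $N$ = 8 from the eight-phase clocking, and checks them against the measured 3.6-4.9-GHz passband; the function name is illustrative.

```python
def spur_frequencies(f_s, n_paths, f_c):
    """Dominant sampling spurs at f_S - N*f_C and f_S + N*f_C."""
    return f_s - n_paths * f_c, f_s + n_paths * f_c

BAND = (3.6e9, 4.9e9)   # measured 3-dB bandwidth of the delay line

low, high = spur_frequencies(f_s=4e9, n_paths=8, f_c=250e6)
in_band = [f for f in (low, high) if BAND[0] <= f <= BAND[1]]

print(f"spurs at {low / 1e9:.1f} GHz and {high / 1e9:.1f} GHz, in-band: {in_band}")
```

Both tones, at 2 GHz and 6 GHz, fall outside the passband, consistent with the statement that they can be filtered.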
Similar to switched-capacitor circuits (e.g., [1], [20]), the imperfection of LO signals leads to some in-band spurious tones. However, these are at least 30 dB below the desired signal, as shown in Fig. 23. LO leakages at the fundamental frequency $f_C$ and its harmonics are around −60 dBm.

D. Linearity
The measured linearity performance is plotted in Fig. 24. The delay line is clocked at $f_C$ = 250 MHz with the testing input signal at around 4 GHz. The measured input-referred 1-dB gain compression point (IP1dB) is +1 dBm, which is similar to other LPTV circuits implemented using 65-nm CMOS processes (e.g., [24]). In addition, this gain compression point is independent of the delay setting $T_d$, as the linearity is input-side limited. The measured IIP3 is +16 dBm.

E. Performance Summary and Comparison
A summary of our measurement results is given in Table I together with those from state-of-the-art integrated RF delay line designs. They are divided into three categories: active LTI, passive LTI, and passive LPTV.
Compared with the LPTV switched-capacitor-based RF and BB delay lines, the proposed commutated-LC delay line has a much wider instantaneous RF bandwidth and operates at a center frequency that is more than ten times higher. This is achieved by introducing inductors to the LPTV operation. The use of inductors leads to noticeable increases in chip area and noise figure (or power loss) when normalized to the delay. The development of low-loss and compact commutated-LC broadband delay circuits could be an important future research topic for large antenna arrays and self-interference cancellation, which have stringent loss and area constraints.
However, given the same delay, the commutated-LC delay has orders-of-magnitude lower normalized loss or noise and a much smaller area per delay compared with fully passive LTI RF delay lines ([7], [8]) that also use inductors. This is due to the elimination of signal-path delay tuning and to the non-reciprocal delay operation. When implemented using similar CMOS technology, the reconfigurable passive LTI RF delay lines handle stronger signals, as their static switches are more linear than the dynamic switches used in LPTV designs.
TABLE I. MEASUREMENT SUMMARY AND COMPARISON WITH STATE-OF-THE-ART INTEGRATED RF DELAY LINES
The proposed commutated-LC delay has an 11-to-29-dB higher gain compression point compared with active LTI delay lines in [3], [10], and [11] thanks to the absence of active transistors.
The 31% fractional bandwidth of our proposed delay line is relatively small compared with other works in Table I. However, it is still about an order of magnitude larger than that in a typical phase-shifter-based narrowband RF frontend. In addition, the fractional bandwidth can be increased if needed by increasing the switching frequency f C as demonstrated in Fig. 21(c) at the expense of reduced delay spread.
In sum, our proposed commutated-LC delay concurrently achieves nanosecond-scale long delay, gigahertz-wide instantaneous bandwidth at beyond-3-GHz RF with high linearity and moderate loss and area.

VI. CONCLUSION
We have shown that introducing inductors, essentially adding a new degree of freedom, to an LPTV broadband delay circuit allows it to operate at a much higher frequency with a wider instantaneous bandwidth. We have derived the insertion loss, bandwidth, group delay, and noise performance of a commutated-LC delay circuit, providing various design insights and guidelines. We have devised a 65-nm CMOS proof-of-concept prototype, demonstrating a 1.3-GHz 3-dB bandwidth around a 4.3-GHz RF, 1.4-ns maximum delay, 23-dB loss, and an IIP3 of +16 dBm.