A Single Ring-Oscillator-Based Test Structure for Timing Characterization of Dynamic Circuit

Dynamic circuits are extensively used where high speed is required, such as in the critical paths of digital and analog/mixed-signal circuits, yet they have not been successfully characterized and integrated into digital design flows because their complex timing requirements make characterization difficult. In this article, we first propose a step-by-step definition of the timing parameters required for the timing characterization of a dynamic circuit. We then propose a single ring-oscillator-based test structure for the on-chip measurement of all these defined timing parameters. From the single output of a ring oscillator, this test structure can efficiently extract the timing parameters, including delay and setup/hold times under different conditions of input signal transition time and output load capacitance, which is sufficient for the construction of liberty files in a standard-cell library. With the proposed test structure, the setup/hold constraints are examined repeatedly by the ring under the actual operating environment. Thus, the worst case result can be obtained and the error rate of the circuit can be estimated.


I. INTRODUCTION
DYNAMIC circuit, proposed several decades ago [1], aims at higher speed and smaller area by utilizing two operation phases, precharge and evaluation, to realize combinational logic. The structure of a typical dynamic AOI gate and its timing diagram are shown in Fig. 1 as an example. Compared with the static CMOS circuit, the dynamic circuit replaces the area-occupying pull-up network with a single pMOS pull-up transistor, whose gate is connected to a clock signal, to first precharge the dynamic node in Fig. 1 to the high level and then wait for evaluation. In the evaluation phase, the pull-down evaluation network will discharge the dynamic node or leave it unchanged depending on the input logic values.
Haoming Zhang and Shuowei Li are with the Department of Electrical Engineering and Information Systems, School of Engineering, The University of Tokyo, Tokyo 113-0032, Japan (e-mail: zhm@g.ecc.u-tokyo.ac.jp; lishuowei@silicon.u-tokyo.ac.jp).
Tetsuya Iizuka is with the Systems Design Laboratory, School of Engineering, The University of Tokyo, Tokyo 113-0032, Japan (e-mail: iizuka@vdec.u-tokyo.ac.jp).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2024.3370862.
Digital Object Identifier 10.1109/TVLSI.2024.3370862
Having lower input capacitance and no contention between pull-up and pull-down networks during switching, the dynamic circuit is the fastest among commonly used circuit families [2].
In recent years, dual-mode logic (DML), as a hybrid version of static and dynamic circuits, has been proposed in [3] to capitalize on the advantages of both static (robustness and low power) and dynamic (high speed) CMOS circuits and has proved to be efficient in plenty of state-of-the-art digital circuits [4], [5], [6].
With the advent of dynamic circuits, there have been significant attempts to provide a timing model and verification method for this circuit family. Timing constraints for dynamic circuits and also DML were proposed in [7], [8], [9], [10], [11], and [12]. These works have provided valuable insight into the timing characterization of dynamic circuits and motivate their integration into the digital design flow. However, none of them has provided a clear and detailed definition of dynamic circuits' timing parameters that is both necessary and sufficient. Due to the extra clock signal and the dual operation phases, dynamic circuits have complex timing, and thus, the definition of their timing parameters, especially setup/hold times, can be confusing and has remained ambiguous in prior works. Moreover, these timing parameters are typically obtained from simulation, which lacks reliability and robustness considering the unpredictable nonidealities of the chip fabrication process.
Recently, we have proposed a timing verification strategy for the dynamic circuit in [13]. In [13], with custom-defined timing parameters, we first established exclusive timing requirements for a dynamic circuit and then modeled it in the commonly used liberty format. Subsequently, we successfully executed static timing analysis for three cascaded dynamic circuits with the constructed liberty files, demonstrating the effectiveness of these timing parameters in predicting and preventing timing failures.
In this article, expanding upon the fundamental setup/hold time concepts for a D-flip-flop (D-FF) and a latch, we first provide an intuitive explanation for the definition of the dynamic circuit's timing parameters. These timing parameters have proven to be required in the timing verification of the dynamic circuit but were not explicitly defined in previous works. The step-by-step deductive process also provides insights into how to concisely characterize digital circuits with complex and stringent timing requirements. We then introduce a single ring-oscillator-based test structure to characterize the on-chip values of these defined timing parameters, including delay and setup/hold time, whose preliminary results are also presented in [13]. In comparison to simulation, the proposed test structure is capable of extracting the actual timing parameters from silicon, revealing the dynamic circuit's raw performance after fabrication and giving confidence in constructing a trustworthy liberty file for practical characterization. Leveraging the proposed structure, we then present a novel approach to estimate the error rate stemming from setup/hold violations. In addition, we present detailed discussions on the accuracy of the delay and setup/hold measurement. Finally, we capitalize on the advantages of the proposed architecture and provide some extensions of the current structure.
The rest of this article is organized as follows. In Section II, timing parameters for the dynamic circuit are first explicitly defined. Section III then proposes our test structure accompanied by a detailed explanation of its operation. Section IV provides the chip implementation and a breakdown of the data extraction procedure as well as the error rate estimation approach, and Section V provides the accuracy analysis, extends the structure with minimum pulsewidth measurement, and broadens the applicability of the proposed architecture beyond dynamic circuits for general purpose. Finally, Section VI concludes this article.

II. TIMING PARAMETER DEFINITION
Since the existence of the clock and dual-phase operation complicates the timing parameter definitions of the dynamic circuit, we first recap the setup/hold definition of the conventional D-FF and latch and then extend it to the dynamic circuit.

A. Constraint Definition of D-FF
Fig. 2 shows a timing diagram for the constraint definition of a conventional rising-edge-triggered D-FF with a data input D and an output Q. There are four cases for this D-FF related to the sampling edge of its clock (CLK).
Cases 1 and 2 show the two cases with a rising edge at the data input near the sampling edge as follows.
1) Case 1: If the output is originally 0, then there will be a setup time constraint. If the input violates the setup time (the rising edge arrives too late), then the output will not rise from 0 to 1.
2) Case 2: If the output is originally 1, then there will be a hold time constraint. If the input violates the hold time (the rising edge arrives too early), then the output will not fall from 1 to 0.
Cases 3 and 4 show the two cases with a falling edge at the data input near the sampling edge as follows.
1) Case 3: If the output is originally 1, then there will be a setup time constraint. If the input violates the setup time (the falling edge arrives too late), then the output will not fall from 1 to 0.
2) Case 4: If the output is originally 0, then there will be a hold time constraint. If the input violates the hold time (the falling edge arrives too early), then the output will not rise from 0 to 1.
In summary, in the vicinity of the sampling edge of the D-FF, if the voltage levels of the input and output are initially the same (Cases 1 and 3), there will be a setup time constraint, requiring the input to arrive sufficiently early. If the input and output voltage levels are originally different (Cases 2 and 4), there will be a hold time constraint, requiring the input to arrive sufficiently late.
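The four-case rule summarized above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dff_constraint` and the `CASES` table are hypothetical and simply encode the statement that equal initial input/output levels give a setup constraint and different levels give a hold constraint.

```python
def dff_constraint(data_edge: str, initial_output: int) -> str:
    """Return 'setup' or 'hold' for a rising-edge-triggered D-FF.

    data_edge: 'rise' or 'fall', the data transition near the sampling edge.
    initial_output: 0 or 1, the level of Q before the sampling edge.
    """
    if data_edge not in ("rise", "fall"):
        raise ValueError("data_edge must be 'rise' or 'fall'")
    if initial_output not in (0, 1):
        raise ValueError("initial_output must be 0 or 1")
    # A rising data edge starts from 0; a falling data edge starts from 1.
    initial_input = 0 if data_edge == "rise" else 1
    # Same initial levels -> setup constraint; different levels -> hold.
    return "setup" if initial_input == initial_output else "hold"


# Case numbering follows Fig. 2 as described in the text.
CASES = {1: ("rise", 0), 2: ("rise", 1), 3: ("fall", 1), 4: ("fall", 0)}

for case, (edge, q0) in CASES.items():
    print(f"Case {case}: {dff_constraint(edge, q0)} constraint")
```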

B. Constraint Definition of Latch
To define timing constraints for a latch, we only care about the clock edge at the end of the transparent phase. For example, for a positive level-sensitive latch, the setup/hold time definition is only required at the falling edge of its clock, as shown in Fig. 3. Compared to the constraint definition of a D-FF illustrated in Fig. 2, Cases 2 and 4 no longer exist. The reason is that, near the end of the transparent phase, the output voltage level cannot differ from the input because of the transparent characteristic. As a result, the latch actually has no real hold time. It only has two setup times at the end of the transparent phase, one for rising input and one for falling input (Cases 1 and 3). Sometimes, two virtual hold times are defined as the negatives of the setup times [2], but they have no specific meaning.

C. Constraint Definition of Dynamic Circuit
For the dynamic circuit, we separately analyze the two cases where there is a rising edge input and a falling edge input, as shown in the timing diagram on the right of Fig. 1.
First, if we consider the case when there is a rising data input as shown in the top right of Fig. 1, Cases 1 and 2 in Figs. 2 and 3 are of concern. For the rising edge data input, if we regard its evaluation phase as the transparent phase of a latch, the dynamic circuit behaves identically to a latch that has the setup time only at the end of the transparent phase, so the dynamic circuit also has the setup time only at the falling edge of its clock, the end of its evaluation phase. Similar to a latch, there is no hold time at this edge, so only Case 1 in Fig. 3 is left and the others are excluded.
Second, if we consider the case when there is a falling data input as shown in the bottom right of Fig. 1, Cases 3 and 4 in Figs. 2 and 3 are of concern. Since the falling edge of the clock will force the output to 0 regardless of the arrival time of the falling data input, only the rising edge of the clock, the start of the evaluation phase, needs timing constraints. Also, since the output of the dynamic circuit is fixed to 0 in the precharge phase, Case 3 is impossible because the output cannot be initially 1. Thus, only Case 4 in Fig. 2 is left.
Therefore, as summarized in Fig. 4, two constraints are defined for the dynamic circuit between the data and clock inputs. The first one, $t_{\text{stu-drcf}}$, refers to a setup time constraint for the rising data input at the falling edge of the clock (corresponding to Case 1 in Fig. 3). The second one, $t_{\text{hld-dfcr}}$, refers to a hold time constraint for the falling data input at the rising edge of the clock (corresponding to Case 4 in Fig. 2).
In addition, there are setup time constraints between two DATA inputs in series, for example, A and B in Fig. 1. In some cases, one of the inputs rises within an evaluation phase, while the other is fixed to 1. In other cases, an input can have a falling transition even in the evaluation phase. In the latter case, a setup time constraint for the rising edge of A at the falling edge of B, $t_{\text{stu-arbf}}$, is needed to constrain the arrival time of the two inputs, as shown in the bottom left of Fig. 4. For example, supposing that input A will rise, while input B will fall in the evaluation phase, the rising edge of A must arrive sufficiently earlier than the falling edge of B. Otherwise, the output cannot rise to 1 in this evaluation phase. The other constraint, $t_{\text{stu-afbr}}$, is defined simply by switching the positions of A and B in $t_{\text{stu-arbf}}$.
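As a minimal sketch of how the four constraints defined in this subsection could be checked against edge arrival times, the following Python snippet expresses each of them as "one edge must lead another by a margin." The function names, the dictionary keys, and all numeric values are hypothetical and chosen only for illustration; they are not measured data.

```python
def leads_by(early_ps: float, late_ps: float, margin_ps: float) -> bool:
    """True if the edge at early_ps precedes the edge at late_ps by >= margin_ps."""
    return late_ps - early_ps >= margin_ps


# Each constraint is written as (edge that must come first, edge that must come later).
CONSTRAINT_EDGES = {
    "t_stu_drcf": ("data_rise", "clk_fall"),  # rising data before falling clock
    "t_hld_dfcr": ("clk_rise", "data_fall"),  # falling data after rising clock
    "t_stu_arbf": ("a_rise", "b_fall"),       # rising A before falling B
    "t_stu_afbr": ("b_rise", "a_fall"),       # rising B before falling A
}


def check_dynamic_cell(edges_ps: dict, limits_ps: dict) -> dict:
    """Check every constraint whose two edges are both present in edges_ps."""
    results = {}
    for name, (early, late) in CONSTRAINT_EDGES.items():
        if early in edges_ps and late in edges_ps:
            results[name] = leads_by(edges_ps[early], edges_ps[late], limits_ps[name])
    return results


# Hypothetical limits and edge times in picoseconds.
limits = {name: 100.0 for name in CONSTRAINT_EDGES}
edges = {"clk_rise": 0.0, "data_fall": 50.0, "data_rise": 500.0, "clk_fall": 1000.0}
print(check_dynamic_cell(edges, limits))
```

In this hypothetical example, the rising data edge satisfies the setup constraint at the falling clock edge, while the falling data edge arrives too soon after the rising clock edge and violates the hold constraint.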

D. Delay Definition of Dynamic Circuit
Fig. 5 summarizes the delay definitions for the dynamic circuit. As only the rising transition of the DATA inputs (A, B, and C) can propagate to the output in the evaluation phase, while both the rising and falling edges of the clock can change the state of the output, it has three types of delay: $t_{\text{d2q}}$ (the delay from data input to output), $t_{\text{clkr2q}}$ (the delay from clock rising to output rising), and $t_{\text{clkf2q}}$ (the delay from clock falling to output falling). Note that there are three $t_{\text{d2q}}$ for the circuit in Fig. 1 because it has three data inputs.

III. POST-SILICON CHARACTERIZATION
Given the explicitly defined timing parameters in Section II, we have been able to effectively model the timing of a dynamic circuit and examine it using a prevalent timing analyzer as demonstrated in [13]. One way to obtain the values of these timing parameters is by simulation, which is relatively flexible because designers can freely set the simulation conditions and observe the waveform at any node inside the circuit. However, simulation suffers from low reliability since the actual delay and setup/hold times may deviate from the simulation results owing to the imperfect chip manufacturing process and the nonideal operating environment. Taking this into account, post-silicon validation is generally required for digital circuits to evaluate their raw performance and silicon-to-model correlation [16], [17]. On-chip timing characterization circuits for IPs such as memory have also been proposed in recent works [18].
In order to characterize dynamic circuits with more reliability and validity, we propose a ring-oscillator-based test structure that can extract all the required timing parameters, including delays and setup/hold times after fabrication, and a detailed explanation of its structure and operation principles will be presented in this section.

A. Prior Works of Ring-Oscillator-Based Measurement
Ring-oscillator-based methods have been widely used for delay and process variability measurement [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. The delay of static combinational logic gates can be easily obtained by simply connecting them into a ring oscillator and measuring the oscillation period. However, the situation becomes more complicated for dynamic circuits because they have dual phases of operation. In [29], a dynamic-cell-based ring oscillator was proposed, in which each dynamic cell composing the ring is guaranteed to operate correctly by being clocked by the feedback signal from the next stage. This structure makes it possible to measure the on-chip data-to-output delay values of the dynamic cell. However, it cannot measure the clock-to-output delay and the setup/hold times. Moreover, it can only measure the delay value under fixed input signal transition/load capacitance conditions, which is insufficient for practical characterization purposes.

B. Proposed Architecture
Fig. 6 shows the proposed test structure, a ring oscillator composed of five delay measurement blocks and two setup/hold time measurement blocks, whose detailed structure is explained as follows. The timing of the ring oscillator is shown in Fig. 7. When measuring a delay, the two setup/hold measurement blocks can be regarded simply as two buffers, each providing a fixed delay ($t_{\text{sh1}}$ and $t_{\text{sh2}}$), while each of the five delay measurement blocks is composed of four delay measurement units in series and contributes an overall delay of $t_{\text{dly}}$. Fig. 8 shows the block diagram of one delay measurement block, in which four series delay measurement units share the same clock signal and behave as four series dynamic gates in terms of timing. The delay measurement unit can only propagate the rising signal and can only be reset to 0 by the clock. The phase generator shown at the top of Fig. 6 provides a trigger and clocks for all the blocks to ensure correct timing.
As shown in Fig. 7, in each cycle, the clock of each delay measurement block will be set to the evaluation phase before receiving the signal. As the signal propagates through all four delay measurement units within their evaluation phases in the Nth delay measurement block, its output rises and the (N + 1)th block will be triggered. As the output of the (N + 1)th block rises, it will turn the clock of the Nth block to the precharge phase, so the output of all the units in the Nth block will be reset to zero. Similarly, after a while, the (N + 2)th block will reset the output of the (N + 1)th block, which will subsequently turn the clock of the Nth block back to the evaluation phase. While each block operates following the same sequence, an oscillation is sustained inside the loop. As Section III-D explains, by checking the oscillation of the output of the ring and measuring the period, all the timing parameters, including delay and setup/hold time under different conditions of input transition time/load capacitance, can be obtained, with which we can construct a lookup table (LUT) used for the liberty file in a practical characterization.

C. Propagation Delay Measurement
The detailed structure of the delay measurement unit is shown in Fig. 9. In this test structure, the dynamic cell under test in Fig. 9 uses the typical domino AOI gate in Fig. 1 as an example circuit. To construct a liberty file for the dynamic circuit as a standard cell, an LUT for the delay with input transition time and load capacitance as the independent variables is required. Thus, two capacitance tuning blocks, shown by the yellow blocks in Fig. 9, are inserted in front of and behind the dynamic cell under test to change the input transition time and load capacitance. We also aggregate the measurements of all the delays, including the three $t_{\text{d2q}}$, $t_{\text{clkr2q}}$, and $t_{\text{clkf2q}}$, into a single ring. This approach avoids the need to construct several separate rings for measuring each of the multiple types of delay and avoids the unwanted impact of location-dependent process variation on the measurement results. As shown by the blue block in Fig. 9, input switches are added to select the input under test and to control the other idle inputs.
1) Measurement of $t_{\text{d2q}}$: For example, when measuring the delay from data input $C_{\text{dyn}}$ to the output of the dynamic cell under test ($t_{\text{c2q}}$), the idle data inputs $A_{\text{dyn}}$ and $B_{\text{dyn}}$ will be connected to $A_{\text{ext}}$ and $B_{\text{ext}}$, whose voltage levels can be set manually with external control codes. By choosing proper high/low cases for $A_{\text{ext}}$ and $B_{\text{ext}}$ as shown in Fig. 10, the output frequency of the ring will vary since the delay of the dynamic cell under test changes due to the charge-sharing effect, which can be clearly observed and evaluated. In the meantime, the clock of the dynamic cell under test, $\text{CLK}_{\text{dyn}}$, will be connected to pin $\text{CLK}_{\text{ext}}$. As explained in Section III-B, the $\text{CLK}_{\text{ext}}$ of the four series delay measurement units in each delay measurement block is shared as a single clock signal. The clock signals of each delay measurement block correspond to the red signal in Fig. 8 and also to clk<0:4> in Fig. 6.
2) Measurement of $t_{\text{clkr2q}}$ and $t_{\text{clkf2q}}$: When measuring the delay from the clock to the output, as shown in Fig. 11, we first connect all the data inputs $A_{\text{dyn}}$, $B_{\text{dyn}}$, and $C_{\text{dyn}}$ to $A_{\text{ext}}$, $B_{\text{ext}}$, and $C_{\text{ext}}$, respectively, and set their voltage levels so that the pull-down network is always activated. In this case, the output of the dynamic cell will be determined solely by its clock, so it can be regarded as a static CMOS buffer with a single input CLK and a single output OUT. We then switch the clock of the dynamic cell under test, $\text{CLK}_{\text{dyn}}$, into the signal propagating path by connecting it to pin IN. Therefore, the ring oscillates like a typical inverter-based ring oscillator propagating both rising and falling edges. Since the ring is now equivalently composed only of buffers with positive polarity, an inverter is inserted into only the last delay measurement unit (the 3rd delay measurement unit in the 4th delay measurement block) after pin IN, which is shown by the blue dashed block in Fig. 11.
Unlike static CMOS combinational cells, which are typically designed to have equivalent pull-up and pull-down strength, the dynamic circuit has asymmetric pull-up and pull-down networks. Thus, the rising and falling delays from the clock input to the output, $t_{\text{clkr2q}}$ and $t_{\text{clkf2q}}$, differ considerably. Therefore, instead of measuring the oscillation period, we must measure the duty cycle of the output of the oscillator and extract the rising and falling delays separately from the positive and negative pulsewidths. The detailed extraction procedure will be explained in Section IV-A.
3) Differential Measurement: Since we add an input switch block before the dynamic cell under test, the delay of the dynamic cell cannot be directly calculated from the oscillation period because the loop now contains components other than the dynamic cells under test. To exclude the delays of these additional blocks, we use a differential measurement method, which is realized by the output switches shown by the green block in Fig. 9. The output switches can work in two modes, Mode T (through) and Mode P (pass). As indicated by the red wires in Fig. 9, the input signals to the dynamic cell, $A_{\text{dyn}}$, $B_{\text{dyn}}$, $C_{\text{dyn}}$, and $\text{CLK}_{\text{dyn}}$, can be directly chosen by the output switches to bypass the cell. In Mode T, the propagating signal goes through the dynamic cell under test, while in Mode P, it is bypassed. By subtracting the two periods measured in these two modes, only the delay of the dynamic cell under test will be left, and the delay of the other components in the loop will be canceled.
This approach shows advantages, especially when designing the layout. With the ring oscillator, the measured delay is an average value of the delay of all the cells. Thus, to guarantee the accuracy of the calculated results, the layout of the test ring would otherwise need a dedicated design so that all the dynamic cells under test are provided with an equal load environment and have a truly identical delay. The differential measurement eliminates this concern because all the dynamic cells are surrounded by the same input and output switch blocks, so their delay will not be affected significantly by their different physical locations.
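The differential extraction can be illustrated with a short Python sketch. It assumes, as in the data-to-output measurement described in Section IV-A, that each of the 20 delay measurement units contributes the cell delay once per oscillation period; the function name `cell_delay_from_periods` is hypothetical, and the periods used in the example are the Mode T and Mode P values reported in Section IV-A.

```python
def cell_delay_from_periods(period_through_s: float,
                            period_bypass_s: float,
                            n_cells: int) -> float:
    """Average delay of one cell under test from the Mode T / Mode P periods.

    Everything in the loop other than the cells under test contributes equally
    to both periods, so it cancels in the subtraction.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return (period_through_s - period_bypass_s) / n_cells


# Periods reported in Section IV-A: Mode T = 14.3 ns, Mode P = 10.0 ns, 20 units.
t_d2q_s = cell_delay_from_periods(14.3e-9, 10.0e-9, 20)
print(f"t_d2q = {t_d2q_s * 1e12:.1f} ps")  # prints "t_d2q = 215.0 ps"
```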

4) Dummy Dynamic Cell:
To guarantee an oscillation with domino logic cells with positive polarity in a loop, we add another dummy dynamic cell, which is shown by the red block in Fig. 9, in each delay measurement unit to provide the reset function in Mode P. The dummy dynamic cell has the same structure as the measured dynamic cell, and it will also contribute an additional delay in Mode T, but this delay is identical in the two modes and will be canceled by subtracting the two measured periods. When measuring $t_{\text{clkr2q}}$ and $t_{\text{clkf2q}}$, as explained above, the dynamic cell under test behaves as a static gate, so the ring oscillates like a normal inverter-based ring oscillator and each stage does not need to be reset; thus, the switch in the red block will select the top path for the signal to propagate and the dummy dynamic cell will be bypassed.

D. Setup/Hold Time Measurement
A typical way of measuring setup/hold time for sequential circuits is to design a pair of tunable delay paths and tune the skew between CLK and DATA to the point where the circuit just fails to operate [30], [31], [32]. However, this kind of method is not reliable enough because each setup/hold time can only be measured once. In our test structure, we insert two setup/hold measurement blocks into the ring in Fig. 6. Thanks to the differential measurement method, the positions of the two setup/hold measurement blocks have no impact on the measurement results. However, for the convenience of layout design, we have placed them before and after the 2nd delay measurement block in Fig. 6.
The block diagrams of the two setup/hold measurement blocks are shown in Fig. 12. They are composed of a dynamic cell under test and a skew generator, which provides two signals with a tunable timing skew. Specifically, Block1 measures both $t_{\text{stu-drcf}}$ and $t_{\text{hld-dfcr}}$, while Block2 measures $t_{\text{stu-arbf}}$ and $t_{\text{stu-afbr}}$. In Block1 shown in Fig. 12(a), we can select the measured parameter ($t_{\text{stu-drcf}}$ or $t_{\text{hld-dfcr}}$) by switching between the yellow and blue paths, which have inverse polarity for data and clock paths. For example, when $t_{\text{stu-drcf}}$ is being measured, both the upper and lower yellow paths in Fig. 12(a) are selected for data and clock signals. The rising signal coming in from the pin IN will first enter the skew generator and split into the two yellow paths. The upper path for the data signal maintains its polarity since it goes through a transmission gate, while the lower signal for the clock will be converted to a falling edge by the inverter. To reduce the error of the tuned skew, the sizes of the transmission gate and inverter are carefully tuned so that their propagation delays are approximately equal according to simulation results. Similar to the delay measurement unit in Fig. 9, input switches are also added before the data input to select the measured input and change the status of other inputs that are not under test, as shown in Fig. 12(a). In order not to affect the tuned skew, dummy switches are added before the clock to compensate for the delay introduced by the input switches. The selection of $t_{\text{stu-arbf}}$ and $t_{\text{stu-afbr}}$ is achieved in the same way in setup/hold measurement Block2 as shown in Fig. 12(b), in which the timing parameters to be measured are the setup times between the two series data inputs, so it is the skew between the inputs A and B of the dynamic cell under test that is tuned by the skew generator, while its clock signal is provided by the inverter after its output.
The timing diagram of the setup/hold time measurement Block1 when measuring $t_{\text{stu-drcf}}$ is shown in Fig. 13 as an example. Given a safe skew between the clock and data input as shown in Fig. 13(a), the measured dynamic cell will operate correctly as a part of the ring, and the whole ring oscillator will generate a stable oscillating signal. As we tune the skew smaller as shown in Fig. 13(b), the dynamic cell under test will violate the setup/hold time and fail to propagate the signal. Thus, the ring will fail to oscillate. We can record the skew between the clock and data at this moment, which represents the setup/hold time. This method allows us to easily observe the oscillating waveform for a sufficiently long duration, for example, more than 10 s. As the ring repeatedly examines the setup/hold violation a very large number of times under actual operating conditions, the measured setup/hold time will be more trustworthy than ideal simulation results.
The skew generator in Fig. 12 is implemented as shown in Fig. 14. One skew generator has two delay-tunable paths with the same structure, each of which has a coarse-tuning block and a fine-tuning block [30]. The coarse tuning is realized by changing the number of buffers in the signal propagating path, so the coarse-tuning step will be the delay of one buffer. Fine-tuning is achieved by changing the load capacitance of the buffer. Thus, the fine-tuning step will be the delay difference of one buffer between a small and a large load capacitance. To achieve an optimal resolution and to prevent any gaps within the expected tuning range of the skew, simulations have been run to ensure sufficient overlap between the coarse-tuning step and the fine-tuning range. The on-chip values of the coarse-tuning and fine-tuning steps are obtained by constructing two extra ring oscillators and measuring their output periods, as shown in Fig. 15. To compose a ring oscillator, each ring has an additional NAND gate to provide a negative polarity and an enable signal. The impact of the NAND gate delay is negligible when the delay step is calculated from the oscillation period.
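The overlap requirement between the coarse-tuning step and the fine-tuning range can be made concrete with a small Python sketch. It assumes a simple linear model in which every fine code adds one identical fine step, which the text does not state explicitly; the function names are hypothetical, and the step values are the on-chip numbers reported in Section IV-B.

```python
import math


def min_fine_steps(t_coarse_ps: float, t_fine_ps: float) -> int:
    """Smallest number of fine steps whose total range spans one coarse step."""
    if t_coarse_ps <= 0 or t_fine_ps <= 0:
        raise ValueError("step sizes must be positive")
    return math.ceil(t_coarse_ps / t_fine_ps)


def covers_without_gap(t_coarse_ps: float, t_fine_ps: float, n_fine_steps: int) -> bool:
    """True if the fine-tuning range is at least one coarse step (no skew gap)."""
    return n_fine_steps * t_fine_ps >= t_coarse_ps


# Step sizes reported in Section IV-B: 77.4 ps (coarse) and 4.5 ps (fine).
print(min_fine_steps(77.4, 4.5))          # prints 18
print(covers_without_gap(77.4, 4.5, 17))  # prints False
```

Under this assumed model, a fine-tuning block with fewer than 18 steps would leave part of each coarse step unreachable, which is the kind of gap the simulations mentioned above are meant to rule out.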

IV. ON-CHIP MEASUREMENT
Two rings with identical structures but with differently sized dynamic cells under test are implemented for comparison. The chip, whose die photograph is shown in Fig. 16, is fabricated in a TSMC 65-nm process. To implement the circuit with the standard-cell-based design flow and to simplify layout design, the layout of the dynamic cell follows standard-cell conventions, e.g., cell height and power/ground rail locations. All the other components in this test structure are designed fully based on standard cells. This attribute enhances the area efficiency of the layout and makes the proposed architecture readily applicable to other circuits across various technology nodes, thanks to the high portability of standard cells. Fig. 17 shows the measurement setup. The power source supplies the onboard LDOs, which provide the supplies to the chip. The SPI controls the operation mode of the chip. The function generator provides the chip with a trigger signal. The output of the ring is connected to an oscilloscope, which captures the oscillation waveform and measures the oscillation period.

A. Delay Extraction From the Measured Waveform
1) $t_{\text{d2q}}$: Fig. 18 shows the output waveforms of the test structure when measuring the delay from the DATA input to the output. As the measured waveform shows, when the signal goes through the dynamic cell under test in Mode T, the output period of the ring $T_T$ is 14.3 ns. When the signal bypasses the cell in Mode P, the output period of the ring $T_P$ is 10.0 ns.
To derive the relationship between $t_{\text{d2q}}$ and the measured period, we first define the overall delay of the components excluding the five delay measurement blocks in Fig. 6 as $t_{\text{exc-dlb}}$, which includes the delay of the two setup/hold measurement blocks and the OR gate. In each delay measurement unit shown in Fig. 9, we then define the overall delay excluding the dynamic cell under test as $t_{\text{exc-dyn}}$, which includes the delay of the input and output switches as well as the red dummy dynamic cell block. Since there are 20 delay measurement units in the ring, the relationship between the measured period in the two modes and the defined delays is given by
$$T_T = t_{\text{exc-dlb}} + 20 \times (t_{\text{exc-dyn}} + t_{\text{d2q}}) = 14.3\ \text{ns} \tag{1}$$
$$T_P = t_{\text{exc-dlb}} + 20 \times t_{\text{exc-dyn}} = 10.0\ \text{ns}. \tag{2}$$
By subtracting $T_P$ from $T_T$, the delay of one dynamic cell under test is given by
$$t_{\text{d2q}} = \frac{T_T - T_P}{20} = 215\ \text{ps}. \tag{3}$$
2) $t_{\text{clk2q}}$: Fig. 19 shows the output waveform when measuring the delay from clock input to output. In Mode T, the positive and negative widths of the output, $W_{T+}$ and $W_{T-}$, are 12.8 and 12.9 ns, respectively, while in Mode P, the positive and negative widths of the output, $W_{P+}$ and $W_{P-}$, are 9.2 and 10.6 ns, respectively.
As explained in Section III-C and illustrated in Fig. 11, an inverter is inserted into the 20th delay measurement unit in the ring, which is the 3rd unit in the 4th delay measurement block in Fig. 6. At the start of one positive pulse, a rising edge departs from the output of the ring and first travels through 19 delay measurement units; it is then inverted before the 20th dynamic cell under test and goes through that cell as a falling edge. Finally, it reaches the output and ends this positive pulse. Adopting the definitions used in the $t_{\text{d2q}}$ calculation and defining the inverter delay as $t_{\text{inv}}$, the relationship between the measured positive pulsewidths in the two modes and the clock-to-output delay is given by
$$W_{T+} = t_{\text{exc-dlb}} + 20 \times t_{\text{exc-dyn}} + t_{\text{inv}} + 19 \times t_{\text{clkr2q}} + t_{\text{clkf2q}} = 12.8\ \text{ns} \tag{4}$$
$$W_{P+} = t_{\text{exc-dlb}} + 20 \times t_{\text{exc-dyn}} + t_{\text{inv}} = 9.2\ \text{ns}. \tag{5}$$
By subtracting $W_{P+}$ from $W_{T+}$, the relationship between the clock-to-output delays of one dynamic cell under test, $t_{\text{clkr2q}}$ and $t_{\text{clkf2q}}$, and the measured positive pulsewidth is given by
$$W_{T+} - W_{P+} = 19 \times t_{\text{clkr2q}} + t_{\text{clkf2q}}. \tag{6}$$
Similarly, the relationship related to the negative pulsewidth is given by
$$W_{T-} - W_{P-} = t_{\text{clkr2q}} + 19 \times t_{\text{clkf2q}}. \tag{7}$$
Using (6) and (7) and the measured $W_{T+}$, $W_{T-}$, $W_{P+}$, and $W_{P-}$, the rising and falling clock-to-output delays of the dynamic cell under test, $t_{\text{clkr2q}}$ and $t_{\text{clkf2q}}$, are calculated to be 183.6 and 111.4 ps, respectively.
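The clock-to-output extraction amounts to solving two linear equations in two unknowns. The Python sketch below assumes that the pulsewidth differences between Mode T and Mode P satisfy 19·t_clkr2q + t_clkf2q for the positive pulse and t_clkr2q + 19·t_clkf2q for the negative pulse, which follows from the edge propagation described above and reproduces the reported delays; the function name `solve_clk2q` is hypothetical.

```python
def solve_clk2q(w_t_pos: float, w_p_pos: float,
                w_t_neg: float, w_p_neg: float,
                n_units: int = 20) -> tuple:
    """Solve for (t_clkr2q, t_clkf2q) from Mode T / Mode P pulsewidths.

    Assumed model, with k = n_units - 1:
        w_t_pos - w_p_pos = k * t_clkr2q + t_clkf2q
        w_t_neg - w_p_neg = t_clkr2q + k * t_clkf2q
    """
    k = n_units - 1
    det = k * k - 1  # determinant of [[k, 1], [1, k]]
    if det == 0:
        raise ValueError("rising and falling delays cannot be separated")
    d_pos = w_t_pos - w_p_pos
    d_neg = w_t_neg - w_p_neg
    t_clkr2q = (k * d_pos - d_neg) / det
    t_clkf2q = (k * d_neg - d_pos) / det
    return t_clkr2q, t_clkf2q


# Measured pulsewidths in nanoseconds, as quoted in the text.
rise_ns, fall_ns = solve_clk2q(12.8, 9.2, 12.9, 10.6)
print(f"t_clkr2q = {rise_ns * 1e3:.1f} ps, t_clkf2q = {fall_ns * 1e3:.1f} ps")
# prints "t_clkr2q = 183.6 ps, t_clkf2q = 111.4 ps"
```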

B. Setup/Hold Extraction From the Measured Waveform
Fig. 20 shows the output waveforms when measuring $t_{\text{stu-drcf}}$ of the large-size dynamic cell. In these figures, the yellow waveform is the trigger signal to the ring oscillator, and the green one is the output of the test structure. In the first case in Fig. 20 (Coarse Code: 4 and Fine Code: 0), the skew between the data and clock signals is the largest among all cases, and the test structure outputs a stable oscillating waveform. When the skew is decreased by one fine-tuning step, the oscillation lasts for only 100 ms. This shows that the skew is now approaching the setup time of the dynamic cell under test, so the ring oscillator becomes prone to disturbance and tends to stop oscillating. As the skew is decreased further, the oscillation duration keeps shrinking, from around 780 µs and 5 µs to only 40 ns, and finally the ring fails to oscillate at all. In this research, for more reliable results, we uniformly choose the smallest skew with which the ring can output a stable oscillation for a long enough duration, e.g., more than 10 s, as the measured result for the setup/hold time. For example, among all the cases in Fig. 20, we use the skew-tuning code in the first case (Coarse Code: 4 and Fine Code: 0) to calculate the setup/hold time. According to the measurement results obtained from the ring oscillator shown in Fig. 15, the coarse- and fine-tuning steps $t_{\text{coarse}}$ and $t_{\text{fine}}$ are 77.4 and 4.5 ps, respectively, for the skew generator in Fig. 14. The coarse- and fine-tuning codes in one of the two paths in the skew generator are fixed to 8 in this design. The setup time is then simply the delay difference between the two paths, which follows from the code differences between the two paths and the tuning steps $t_{\text{coarse}}$ and $t_{\text{fine}}$.
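As a rough illustration of how a tuning code could be converted into a skew, the sketch below assumes that each path's delay is linear in its codes and that the skew shrinks as the codes of the tuned path increase toward the fixed reference code of 8. This linear model, its sign convention, and the name `skew_ps` are assumptions made for illustration; only the step sizes and the reference code come from the text.

```python
T_COARSE_PS = 77.4  # measured coarse-tuning step (from the text)
T_FINE_PS = 4.5     # measured fine-tuning step (from the text)
REF_CODE = 8        # coarse and fine codes of the fixed path (from the text)


def skew_ps(coarse_code, fine_code):
    """Skew between the two paths under the assumed linear model.

    Assumption: skew = (ref - coarse) * t_coarse + (ref - fine) * t_fine,
    so a larger fine code gives a smaller skew, as observed in Fig. 20.
    """
    return ((REF_CODE - coarse_code) * T_COARSE_PS
            + (REF_CODE - fine_code) * T_FINE_PS)


# First case in Fig. 20: Coarse Code 4, Fine Code 0.
print(f"{skew_ps(4, 0):.1f} ps")
```

Under this assumed model, the first case in Fig. 20 would correspond to a skew of about 345.6 ps, and each fine-tuning step would reduce it by 4.5 ps.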

C. Error Rate Estimation
The measured oscillation duration can also be used to estimate the error rate due to setup/hold time violation. Given a fixed skew between the data and clock inputs, if the ring keeps examining the setup/hold constraint for a short duration and finally fails to oscillate, the setup/hold time is satisfied in every cycle except the last one. Thus, the error rate of the setup/hold time violation can be represented by the ratio of the oscillation period to the oscillation duration. The black line in Fig. 21 shows the error rate calculated from Fig. 20. It shows that as the skew between the data and clock signals becomes smaller (the fine-tuning code changes from 0 to 5), the error rate of $t_{\text{stu-drcf}}$ violation increases exponentially from below $10^{-8}$ to 1. This function of the proposed ring-oscillator-based architecture applies equally to other sequential circuits, such as D-FFs, allowing the error rate of setup/hold time violation to be estimated thoroughly in terms of not only the skew between the data and clock inputs but also temperature and voltage fluctuations.
Similarly, the error rate graphs of both large- and small-size dynamic cells for the three types of constraints ($t_{\text{stu-drcf}}$, $t_{\text{hld-dfcr}}$, and $t_{\text{stu-arbf}}$) are shown in Figs. 21-23, respectively. In these example results, the large-size cell has smaller setup/hold time requirements. Oscillation durations longer than 10 s are regarded as 10 s for simplicity when calculating the error rate in these cases. Since the oscillation frequency in the setup/hold time measurement is always in the range of tens of megahertz, an oscillation duration of 10 s corresponds approximately to an error rate of $10^{-8}$, so the minimum skew that achieves an error rate smaller than $10^{-8}$ is recorded as the setup/hold time result in our measurement.
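The error-rate estimate described in this subsection can be sketched as the ratio of the oscillation period to the oscillation duration, with durations capped at 10 s. The 100 ns period used below is an assumed value chosen to match the "tens of megahertz" range, and the function name `error_rate` is hypothetical; the durations are those quoted for Fig. 20.

```python
def error_rate(period_s, duration_s, max_duration_s=10.0):
    """Estimate the setup/hold violation rate as period / duration.

    Durations longer than max_duration_s are treated as max_duration_s,
    and a ring that never oscillates is assigned an error rate of 1.
    """
    if duration_s <= 0.0:
        return 1.0
    duration_s = min(duration_s, max_duration_s)
    return min(1.0, period_s / duration_s)


PERIOD_S = 100e-9  # assumed oscillation period (10 MHz), not a measured value

# Oscillation durations quoted for fine-tuning codes 0 to 5 in Fig. 20;
# 20 s stands in for "stable for more than 10 s".
durations_s = [20.0, 100e-3, 780e-6, 5e-6, 40e-9, 0.0]
for code, duration in enumerate(durations_s):
    print(f"fine code {code}: error rate ~ {error_rate(PERIOD_S, duration):.2e}")
```

Capping the ratio at 1 handles the cases where the ring stops within a single period or never starts oscillating.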

D. Measurement Results
Fig. 24 shows a 3-D plot of the measured and simulated results for the delays of both large- and small-size dynamic cells from input B to output, which shows that the difference is less than 16%. It also shows that the delay of the large-size dynamic cell is approximately half that of the small-size dynamic cell.
Fig. 25 shows the measurement result of the delay from input C to output for large-size dynamic cells under different statuses of the idle pins. It shows that the delay is the largest when A is on and B is off. This is reasonable because, in this case, both the parasitic capacitances C1 and C2 in Fig. 1 are initially charged, so more time is needed to discharge them when the input C changes from zero to one. The other two cases have similar delay values because, as long as A is off, C2 is separated from the discharging path; thus, the ON/OFF state of B has little impact on the delay. Fig. 26 shows the measurement result of $t_{\text{clkr2q}}$ for small-size dynamic cells under different statuses of the idle pins. The delay is significantly smaller when A, B, and C are all 1, because in this situation both pull-down paths in Fig. 1 are on (the first path is composed of two transistors, A and B, in series, and the second path has a single transistor C), so the pull-down current is the largest. In contrast, the largest delay appears when only C is 0 and A and B are 1, since in this case only the path of series transistors A and B is on.
Fig. 27 shows the significant deviation between the measured and simulated setup time of the dynamic cell. In this graph, the setup time shown is $t_{\text{stu-drcf}}$ for input A of the large-size cell. The measured setup time is significantly larger than the result obtained by simulation. In the test structure, unlike the delay measurement result, which is an average of the delays of all the dynamic cells under test in the ring oscillator, the setup/hold time is checked repeatedly by the ring countless times, as explained in Section IV-B, so that all disturbances, such as voltage and temperature variations, are taken into account, giving a worst case result. This also confirms the importance of this test structure for practical timing characterization of dynamic circuits under the actual on-chip operating condition.
Regarding the on-chip duty cycle measurement required by the clock-to-output delay extraction as explained in Section IV-A, one possible concern is that the accuracy of the duty cycle measurement might be influenced by the output buffers and I/Os and by the impact of their process variation. Based on post-layout simulation results for both small- and large-size dynamic cells, we observed that the clock-to-output delays derived from the simulated duty cycles exhibit ∼10% errors from the simulated delays of a single cell in different process corners. To enhance the accuracy of the duty cycle measurement, the clock-to-output delay can also be measured with dedicated on-chip duty cycle measurement techniques [33], [34] for even more precise results.

V. DISCUSSION OF MEASUREMENT ACCURACY AND FUNCTIONAL EXTENSION
In this section, we first examine the accuracy of the proposed test structure, providing a quantitative analysis for the design consideration of the ring stage number as well as the required number of measured periods. Subsequently, we discuss extensions of the proposed architecture, enhancing its functionality with minimum pulsewidth measurement and adapting it for use with other circuits.

A. Accuracy of the Measurement
Although the differential delay measurement demonstrates advantages in the layout design as explained in Section III-C3, it also imposes a limitation on the accuracy of the delay measurement. As explained in Section III-C1, in Fig. 9 for example, when measuring the delay from input A to the output, we first let the signal go through and then bypass the dynamic cell under test and use the difference of the two measured periods to calculate the cell's delay. In the two measurements, even though the two input paths of the output switch are designed to be symmetric, the delay through the switch will not be identical in the two modes due to the different slew rates at its two input nodes $\text{OUT}_{\text{dyn}}$ and $A_{\text{dyn}}$. Thus, in Section IV-A, $t_{\text{exc-dlb}}$ cannot be completely canceled in (3), introducing an error when calculating the delay. Although this error can be compensated by carefully tuning the load capacitance at the two nodes $\text{OUT}_{\text{dyn}}$ and $A_{\text{dyn}}$ such that they have a fixed relationship to ensure an identical slew rate, our test structure, aiming to provide a 4 × 4 LUT of the delay for the liberty file, has to vary the capacitance of the two nodes independently by the yellow block in Fig. 9. This means that this error is inevitable in the test structure, resulting in a tradeoff between the range of the independent variables of the LUT and the range of the error. Fig. 28 summarizes the errors between the actual delay and the calculated delay according to the simulation results in the constructed delay LUTs with a small and a large range of the input transition time and output load capacitance. This suggests that errors are minimal around the diagonal from top left to bottom right of the LUT, where the load capacitances at nodes $\text{OUT}_{\text{dyn}}$ and $A_{\text{dyn}}$ are relatively balanced. Conversely, errors are most significant at the top-right and bottom-left corners of the LUT, where the slew rates at the two nodes deviate the most. Furthermore, as the required ranges of the two independent variables increase, the two capacitances become more unbalanced, resulting in a larger range and larger peak values of the error.
Besides the inherent limitations of the structure itself, both random variations and noise can also influence the measurement accuracy. The random variations may result in deviations in the delay of each measured gate, while the noise originating from the ring oscillator will additionally introduce deviations in each measured period. These two factors independently contribute to errors in the final measurement results, with their impacts closely tied to the stage number of the ring oscillator and the number of measured periods.
The error caused by the random variation can be mitigated by increasing the stage number. To quantify this effect, we first treat the infinitely many possible delay values of a single dynamic cell as a population $X_d$ with a normal distribution, defining its standard deviation due to random variation as $\sigma_d$. The value of $\sigma_d$ can be estimated through Monte Carlo simulation during the circuit design. The real delay of the dynamic cell, excluding the random variation, is naturally denoted by the population mean $\mu_d$. Subsequently, the actual delays of each dynamic cell under test in the ring are considered as finite samples of the population $X_d$. Since the measured delay using the ring oscillator is an averaged value for the cells under test, we define this measured delay as $\bar{x}_d$, representing the sample mean, and the stage number as $n_d$, standing for the number of samples. Based on these definitions, the accuracy of the delay measurement can then be quantified using the error bound for a population mean (EBM) for the confidence interval of $\mu_d$, represented by $\bar{x}_d$, $\sigma_d$, and $n_d$. Considering an example confidence level of 95% and the corresponding standard score $z$ (= 1.96), a 95% confidence interval for $\mu_d$ can be interpreted as follows [35]:

$$\bar{x}_d - \mathrm{EBM}_1 \le \mu_d \le \bar{x}_d + \mathrm{EBM}_1, \quad \mathrm{EBM}_1 = z \frac{\sigma_d}{\sqrt{n_d}}. \tag{9}$$

Hence, given a specified target value of $\mathrm{EBM}_1$ and a simulated value of $\sigma_d$, the minimum required stage number $n_d$ can be determined using the aforementioned equation, ensuring that the sample mean $\bar{x}_d$ can be deemed a sufficiently accurate estimate of the population mean $\mu_d$. In our design, for example, considering a $z$-value of 1.96 that corresponds to a 95% confidence level and assuming a simulated $\sigma_d$ of 2 ps, the calculated $n_d$ based on (9) is 15.37 for a target $\mathrm{EBM}_1$ below 1 ps for $\mu_d$. This suggests that the stage number of the ring oscillator should be at least 16 to alleviate the influence of random variation, while our design uses 20 stages to have some design margin.
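The stage-number requirement can be checked numerically by inverting the error bound, i.e., $n_d = (z \sigma_d / \mathrm{EBM}_1)^2$, and rounding up. The following is a minimal sketch using the example values from the text; the function name `min_stage_number` is illustrative only.

```python
import math


def min_stage_number(z, sigma_d_ps, ebm_ps):
    """Return (exact n, rounded-up n) so that z * sigma / sqrt(n) <= ebm."""
    n_exact = (z * sigma_d_ps / ebm_ps) ** 2
    return n_exact, math.ceil(n_exact)


# Example values from the text: 95% confidence, sigma_d = 2 ps, EBM_1 = 1 ps.
n_exact, n_min = min_stage_number(z=1.96, sigma_d_ps=2.0, ebm_ps=1.0)
print(f"n_d = {n_exact:.2f} -> at least {n_min} stages")
# prints "n_d = 15.37 -> at least 16 stages"
```

Because $n_d$ grows with the square of $\sigma_d / \mathrm{EBM}_1$, halving the target error bound would roughly quadruple the required number of stages.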
To mitigate the impact of noise-induced variation in the oscillator's period, it is advisable to measure and average a sufficiently large number of periods. Likewise, the measured periods in Modes T and P, with averaged values of $\bar{T}_T$ and $\bar{T}_P$, can be treated as samples from the two populations $X_T$ and $X_P$ encompassing all possible period values. The two populations have respective population means $T_T$ and $T_P$, representing the real periods in the two modes, and population standard deviations $\sigma_T$ and $\sigma_P$ attributed to noise. For simplicity of the analysis, we assume that the number of measured periods in both Modes T and P is $n_p$ and that the standard deviations in the two modes ($\sigma_T$ and $\sigma_P$) share an equal value $\sigma_p$, measurable by the oscilloscope. Consequently, a 95% confidence interval for the actual periods $T_T$ and $T_P$ is given by

$$\bar{T}_T - z \frac{\sigma_p}{\sqrt{n_p}} \le T_T \le \bar{T}_T + z \frac{\sigma_p}{\sqrt{n_p}}, \quad \bar{T}_P - z \frac{\sigma_p}{\sqrt{n_p}} \le T_P \le \bar{T}_P + z \frac{\sigma_p}{\sqrt{n_p}}. \tag{10}$$

Given the relationship in (3), the confidence interval of the calculated delay $t_{\text{d2q}}$ is then interpreted as

$$\frac{\bar{T}_T - \bar{T}_P}{20} - \mathrm{EBM}_2 \le t_{\text{d2q}} \le \frac{\bar{T}_T - \bar{T}_P}{20} + \mathrm{EBM}_2, \quad \mathrm{EBM}_2 = \frac{2}{20} \times z \frac{\sigma_p}{\sqrt{n_p}}. \tag{11}$$

A calculation example of (11) is provided next. With a $z$-value of 1.96 corresponding to a confidence level of 95% and assuming a measured $\sigma_p$ of 25 ps, if we aim for an $\mathrm{EBM}_2$ below 1 ps for $t_{\text{d2q}}$, $n_p$ is calculated as 24.01, indicating that it is recommended to measure at least 25 periods to obtain a sufficiently averaged result. This requirement is easily attainable using current oscilloscope technology.

On the other hand, regarding the precision of the setup/hold measurement for the proposed test structure, the resolution is limited by the fine-tuning step of the skew generator shown in Fig. 14. Based on the explanation at the end of Section III, the fine-tuning step can be made smaller by reducing the size of the load capacitance to achieve a higher resolution for the setup/hold measurement. As a tradeoff, however, the number of buffers in the fine-tuning path should be increased to ensure that the fine-tuning range remains larger than the coarse-tuning step, preventing any gaps within the expected covering range. This will of course consume more area on the chip. Therefore, simulations regarding the step and range of the fine and coarse tuning are important to verify this requirement and to help balance resolution against area efficiency.
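Returning to the period-averaging requirement in (11), the number of periods can be checked in the same way as the stage number. The sketch below assumes that the interval half-widths of the two averaged periods add linearly before being divided by the 20 units, which is the form that reproduces the quoted value of 24.01; the function name `min_measured_periods` is illustrative only.

```python
import math


def min_measured_periods(z, sigma_p_ps, ebm_ps, n_units=20):
    """Return (exact n_p, rounded-up n_p) for a target error bound on t_d2q.

    Assumed bound: EBM_2 = 2 * z * sigma_p / (n_units * sqrt(n_p)),
    i.e., the two period intervals are combined in the worst case.
    """
    n_exact = (2.0 * z * sigma_p_ps / (n_units * ebm_ps)) ** 2
    return n_exact, math.ceil(n_exact)


# Example values from the text: 95% confidence, sigma_p = 25 ps, EBM_2 = 1 ps.
n_p_exact, n_p_min = min_measured_periods(z=1.96, sigma_p_ps=25.0, ebm_ps=1.0)
print(f"n_p = {n_p_exact:.2f} -> at least {n_p_min} periods")
# prints "n_p = 24.01 -> at least 25 periods"
```

Because the period difference is divided by the number of units, a longer ring relaxes the averaging requirement for the same noise level.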

B. Measurement of Minimum Pulsewidth
Similar to other sequential circuits, the dynamic circuit may also have a minimum pulsewidth requirement for its clock signal in certain cases. For instance, the dynamic cell illustrated in Fig. 1 may fail to precharge successfully if its clock pulse is not wide enough. Since the requirements on the minimum pulsewidth of the clock signal are generally not as stringent as those for delay and setup/hold time, and their definitions do not become as complicated for dynamic circuits, we did not implement the corresponding function in the test structure or discuss aspects related to minimum pulsewidth in Section II. However, the functionality of minimum clock pulsewidth measurement can be readily incorporated to enhance the current architecture. This can be achieved by adding a minimum pulsewidth measurement block into the delay measurement ring in Fig. 6. Fig. 29 introduces the block diagram of the minimum pulsewidth measurement block, which operates on a similar principle to the setup/hold measurement introduced in Section III-D. The two switches inside this block decide whether the positive or the negative pulsewidth is measured. The simulation results of the minimum pulsewidth measurement block inserted into the test structure are shown in Fig. 30. As explained in Section III-C, the dynamic cell operates as a simple static CMOS buffer with a single CLK input when its pull-down network is always activated. With a skew generator generating a tunable time gap, given a sufficiently wide clock pulse, the dynamic cell under test will also produce a pulse on its output wide enough to trigger the next stage. As the clock pulsewidth is tuned smaller, the dynamic cell will violate the minimum pulsewidth requirement and fail to provide the signal transition, stopping the ring from oscillating. The measured minimum pulsewidth can then be read from the time gap of the skew generator in the same way as in the setup/hold measurement.

C. Adaptation to Other Circuits
While the architecture presented in this article utilizes a domino AOI gate as the circuit under test, the same structure can be adjusted for the characterization of any other domino logic with a different number of data inputs, such as an AND gate or an OR gate, simply by changing the number of switches in each delay measurement unit illustrated in Fig. 9. Moreover, the distinctive characteristic of the proposed ring oscillator also offers insights into the timing characterization of other circuits.
A comparison between a typical inverter-based ring oscillator and a simplified version of the proposed dynamic-cell-based ring oscillator is depicted in Fig. 31. Compared to the original test structure in Fig. 6, in Fig. 31(b), we omit the two setup/hold measurement blocks as well as the enable signal, replacing the NOR gates in the phase generator with just inverters for easier understanding. As mentioned in Section III-C, the conventional inverter-based ring oscillator [Fig. 31(a)] propagates both rising and falling edges because each gate senses rising and falling inputs alternately. In contrast, the proposed oscillator only propagates the rising edge. Each dynamic cell can only propagate the rising transition on its data input to the next stage, whose output is then fed back to perform the reset function.
This form of edge propagation can be applied to the delay measurement of other circuits that were previously impossible to cascade as a ring oscillator, such as a D-FF, which only reacts to a single type of edge on its clock. For instance, to measure the rising delay of a rising-edge-triggered D-FF, a ring structure as shown in Fig. 32 becomes an effective solution. In each cycle, each D-FF under test exhibits a single form of operation: its output rises to 1 and is then reset by the subsequent stage after a while. The delay of each D-FF can be calculated in the same way from the oscillation period, as explained in Section IV. Based on the structure depicted in Fig. 32, we also implement a D-FF-based ring oscillator, the simulation result of which is shown in Fig. 33, demonstrating the same oscillation pattern as that of the dynamic-cell-based ring oscillator in Fig. 7.
In addition, as one of our main proposals presented in Section IV-C, utilizing the iterative examination by the ring oscillator, we also aim to obtain the error rate of the circuit stemming from a certain type of setup/hold violation from the ratio of the oscillation period to the oscillation duration. Throughout the iterative process of measuring the same type of timing constraint, the setup/hold measurement block naturally operates in the same manner, which brings a crucial prerequisite: there should consistently be a single type of edge (either rising or falling) feeding into the setup/hold measurement block, rather than alternating rising and falling edges. This implies that a conventional ring oscillator is unsuitable for this purpose. In order to measure the setup/hold time of the D-FF, a similar architecture can be reused as shown in Fig. 34, wherein a setup/hold checking block is inserted into the D-FF-based ring in Fig. 32. The setup/hold time can then be measured by checking the oscillation of the ring oscillator as introduced in Section IV-B.

VI. CONCLUSION
In this article, we first proposed clear definitions of the timing parameters for dynamic circuits, which are necessary and sufficient for the liberty file construction and static timing analysis. To extract all these required timing parameters with actual measurement, we then proposed a ring-oscillator-based test structure, which can also be used for the on-chip characterization of other similar types of dynamic circuits with dual operational phases. Subsequently, we delineated the measurement procedure, analyzed the measured results, and proposed a method for estimating error rates. Finally, we provided a detailed analysis of the accuracy of the proposed test structure and discussed potential extensions of the current architecture.
Manuscript received 20 October 2023; revised 5 January 2024 and 29 January 2024; accepted 19 February 2024. Date of publication 4 March 2024; date of current version 26 April 2024. This work was supported in part by the Japan Science and Technology Agency of the Adaptable and Seamless Technology Transfer Program through Target-Driven R and D under Grant JPMJTR201C; in part by the Japan Society for the Promotion of Science KAKENHI under Grant JP21H03406; and in part by The University of Tokyo through the activities of VLSI Design and Education Center, in collaboration with Cadence Design Systems Inc. (Corresponding author: Haoming Zhang.)

Fig. 1. Circuit diagram of the example dynamic AOI gate (left) and its timing diagram (right) when there is a rising (top) or a falling (bottom) data input.

Fig. 4. Setup/hold time definition of the dynamic circuit.

Fig. 6. Block diagram of the proposed test structure.

Fig. 7. Timing diagram of the proposed test structure.

Fig. 14. Structure of one of the two delay tunable paths in the skew generator.

Fig. 18. Output signal of $t_{\text{d2q}}$ measurement in Mode T (top) and Mode P (bottom).

Fig. 19. Output signal of $t_{\text{clk2q}}$ measurement in Mode T (top) and Mode P (bottom).

Fig. 20. Output signal of the ring oscillator in the large-size dynamic cell's $t_{\text{stu-drcf}}$ measurement when the coarse-tuning code is 4 and the fine-tuning code changes from 0 to 5.

Fig. 25. Measured $t_{\text{c2q}}$ for large-size dynamic cell in all conditions.

Fig. 26. Measured $t_{\text{clkr2q}}$ for small-size dynamic cell in all conditions.

Fig. 28. Error of the calculated delay (ps) with a smaller (left) or a larger (right) range of output load capacitance and input transition time.

Fig. 30. Simulation results of clock minimum pulse measurement block when (a) not violating or (b) violating for positive pulse and (c) not violating or (d) violating for negative pulse.

Fig. 34. D-FF-based ring oscillator with a setup time measurement block.