Standard Cell-Based Ultra-Compact DACs in 40-nm CMOS

In this paper, very compact, standard cell-based digital-to-analog converters (DACs) based on Dyadic Digital Pulse Modulation (DDPM) are presented. As a fundamental contribution, an optimal sampling condition is analytically derived to enhance DDPM conversion with inherent suppression of spurious harmonics. Operation under this optimal condition is experimentally demonstrated to assure resolution up to 16 bits, with 9.4–239X area reduction compared to prior art. The digital nature of the circuits also allows extremely low design effort on the order of 10 man-hours, portability across CMOS generations, and operation at the lowest supply voltage reported to date. The limitations of DDPM converters, the benefits of the optimal sampling condition and digital calibration were explored through the optimized design and the experimental characterization of two DACs with moderate and high resolution. The first is a general-purpose DAC for baseband signals achieving 12-bit (11.6 ENOB) resolution at 110 kS/s sample rate and consuming 50.8 µW; the second is a DAC for DC calibration achieving 16-bit resolution with 3.1-LSB INL, 2.5-LSB DNL, and 45 µW power, at only 530 µm² area.


I. INTRODUCTION
Although digital circuits have benefitted tremendously from technology scaling, the design of analog and mixed-signal blocks has become increasingly challenging. This is due to several factors, such as lower supply voltages, poor scaling of the analog properties of transistors, very limited area shrinkage across technology generations, and significantly higher design effort. These limitations have led to recent efforts to introduce architectures of analog/mixed-signal blocks that are mostly or completely based on digital standard cells, to meet the stringent area, cost and design effort requirements of nodes for the Internet of Things (IoT) [1]-[10]. Indeed, this allows their operation to be specified through a behavioral description in a hardware description language (HDL), and their implementation to be carried out through fully-automated design flows. This drastically reduces the design effort, and brings the advantages of digital circuits, such as design and technology portability, low-voltage operation, and effective area shrinkage at more advanced technology generations.
This paper focuses on digital-to-analog converters (DACs), which are key building blocks for sensor readout, on-chip tuning/calibration, reference generation, audio processing and threshold generation for event detection [11]-[15]. Conventional single-bit sigma-delta (ΣΔ) DACs and pulse-width modulation (PWM) DACs are fully digital, but demand high-order ΣΔ modulators and digital interpolators at high clock rates [16], which make them unattractive in tightly area- and power-constrained systems. Also, PWM DACs require large, high-order reconstruction filters to suppress image frequencies [14]-[15].
In view of the limits of single-bit ΣΔ and PWM fully-digital DACs, state-of-the-art low-frequency DACs are mostly based on hybrid architectures, including a high-order multi-bit ΣΔ noise shaper with low (e.g., 32-64X) oversampling ratio and an analog DAC (e.g., current-steering, resistive string) [11]-[13]. Compared to fully-digital DACs, the presence of the analog sub-DAC brings the limitations of analog designs. As a result, the minimum supply voltage $V_{DD,\min}$ of DACs from prior art is in the 1.8-3.3 V range, with very few exceptions at 1.2 V [17] and 0.8 V [18].
To address the above challenges, Dyadic Digital Pulse Modulation (DDPM) was recently proposed in [19]. DDPM moves most of the energy of the image spectral components to much higher frequencies than PWM, reducing the area of the reconstruction filter roughly by 2 , being N the resolution [19]. Also, unlike ΣΔ DACs, DDPM does not require area- and power-hungry interpolation, and has no stability issue thanks to its open-loop architecture.
In this paper, standard cell-based Nyquist-rate DDPM DACs are explored in terms of achievable resolution and novel techniques to improve it. From the spectral analysis of the DDPM modulated signal, an optimal sampling condition is analytically derived to suppress spurious harmonics. Suitable digital calibration techniques and a dynamic resolution-sampling rate tradeoff are also discussed and experimentally demonstrated. A testchip with two DDPM DAC designs in 40 nm is experimentally characterized to evaluate the effectiveness of such techniques, and to demonstrate the versatility of the DDPM approach up to relatively high resolutions. The first design is a 12-bit, 110 kS/s general-purpose converter (DAC_12) occupying an area of only 270 µm² and consuming 50.8 µW. The second design is a 16-bit DAC (DAC_16) for static signal generation, which targets the typical requirements of on-chip calibration and high-resolution on-chip DC voltage generation for analog and mixed-signal integrated systems. Such DACs are extensively required in several applications, including high-frequency A/D and D/A converter calibration [20]-[21], RF transceiver calibration [22], on-chip filter tuning/reconfiguration [23]-[24], beamforming [24], reconfigurable/digitally-assisted analog circuits, and reconfigurable reference voltage generation [25]-[28]. The DAC_16 design achieves 16-bit static resolution at ±3.1 LSB integral nonlinearity (INL) and ±2.5 LSB differential nonlinearity (DNL), at 530 µm² area and 45 µW power. This work shows that DDPM DACs can actually be very competitive in terms of resolution, in spite of their very compact area (9.4-239X lower than prior art).
The paper is structured as follows. In Section II, the basic spectral properties and an optimal sampling condition for DDPM D/A conversion are derived. In Section III, the architecture of the proposed DACs is described, along with an off-line calibration strategy for resolution enhancement. In Section IV, measurement results are discussed. Section V concludes the paper.

II. D/A CONVERSION VIA DDPM MODULATION AND OPTIMAL SAMPLING CONDITION
In DDPM D/A conversion [19], the $N$-bit integer binary input $n$ to be converted is expressed in terms of its binary representation $(b_{N-1}\, b_{N-2} \dots b_0)$ as
$$n = \sum_{i=0}^{N-1} b_i\, 2^{i} \qquad (1)$$
and is associated to a digital DDPM output stream given by
$$\Sigma_n(t) = \sum_{i=0}^{N-1} b_i\, S_i(t). \qquad (2)$$
The DDPM stream in (2) consists of the superposition of the dyadic basis signals $S_i(t)$ for $i = 0 \dots N-1$, as defined by [19]
$$S_i(t) = \sum_{h=-\infty}^{+\infty} \Pi\!\left(\frac{t}{T_{clk}} - h\,2^{N-i} - 2^{N-i-1} + 1\right). \qquad (3)$$
In (3), $T_{clk}$ is the clock period and $\Pi(x)$ is the ideal digital pulse signal defined as
$$\Pi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$
As shown in Fig. 1, the generic basis signal $S_i(t)$ is a digital pattern consisting of a pulse equal to 1 in clock cycle $2^{N-i-1}$, followed by $2^{N-i}-1$ zeros, and then periodically repeating with a period of $2^{N-i}$ cycles [19]. As an example with $N=4$, the first pulse equal to 1 in $S_3(t)$ occurs in the first cycle. This is then followed by one zero, and the resulting pattern is then repeated every two cycles. In $S_2(t)$, the first pulse equal to 1 starts in the second cycle, it is followed by three zeros, and the pattern is then repeated every four cycles. Combining $S_i(t)$ for $i = 0 \dots N-1$ as in (2), the modulated DDPM output is periodic with a fundamental frequency $f_0 = 1/(2^N T_{clk})$, and is obtained by merging the pulses equal to 1 associated with the input digits $b_i$. Each input digit $b_i$ results in a pulse train with pulse density $2^{i}/2^{N}$ (i.e., the fraction of the period $1/f_0$ in which the pulse train is at 1) equal to the corresponding weight, as shown in Fig. 1a. From an implementation viewpoint, the DDPM modulated digital signal $\Sigma_n(t)$ in (2) can be generated by a simple priority multiplexer [19], whose selection signals are provided by a free-running binary counter (see Fig. 1b).
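The pulse pattern described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of Fig. 1b: the function name `ddpm_stream` and the convention of numbering clock cycles from 1 are assumptions made here, and the trailing-zero count of the cycle number stands in for the priority multiplexer driven by the counter.

```python
def ddpm_stream(n, n_bits):
    """Return one period (2**n_bits clock cycles) of the DDPM stream for code n.

    Illustrative sketch: cycles are numbered 1..2**n_bits, and the number of
    trailing zeros of the cycle number selects which input digit is emitted
    (this plays the role of the priority multiplexer).
    """
    if not 0 <= n < 2 ** n_bits:
        raise ValueError("input code out of range")
    stream = []
    for cycle in range(1, 2 ** n_bits + 1):
        trailing_zeros = (cycle & -cycle).bit_length() - 1
        i = n_bits - 1 - trailing_zeros  # digit b_i owns this clock cycle
        # The last cycle of the period (i < 0) is never assigned to any digit.
        stream.append((n >> i) & 1 if i >= 0 else 0)
    return stream


if __name__ == "__main__":
    N = 4
    print(ddpm_stream(0b1000, N))  # only b_3 set: a pulse every two cycles
    print(ddpm_stream(0b0100, N))  # only b_2 set: a pulse every four cycles
```

In this model the number of ones in a period equals the input code, so the average value of the stream is $n/2^N$, in line with the pulse densities discussed above.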
The Fourier series expansion¹ of $\Sigma_n(t)$ in (2) is readily found to be of the form
$$\Sigma_n(t) = \frac{n}{2^N} + \sum_{k=1}^{+\infty} c_{n,k}\cos(2\pi k f_0 t), \qquad (5)$$
where $f_0 = 1/(2^N T_{clk})$ is the fundamental frequency, $T_{clk}$ is the clock period, and the coefficients $c_{n,k}$ are real and depend on the input code $n$ and on the harmonic index $k$. The DC component $n/2^N$ is proportional to the input code, and can be extracted via a first-order RC low-pass filter as in Fig. 1b. Having a voltage swing of $V_{DD}$, the RC filtered output $V_{\Sigma,\text{filtered}}$ corresponds to the outcome of the D/A conversion of the input $n$. The harmonics in (5) are spurious components to be filtered out.
¹ Compared to [19], the Fourier series expression has been obtained by shifting the time origin by $T_{clk}/2$, for convenience.
From the above spectral analysis, DDPM modulation is shown in the following to enable inherent and guaranteed suppression of most of the spurious harmonics under a proper choice of the sampling period. In turn, this vastly relaxes the output filter specifications. Indeed, (5) reveals that the phase of all the harmonics in $\Sigma_n(t)$ is independent of $n$, and can be either 0 or 180° (as dictated by the sign of $c_{n,k}$). The first-order RC filter in Fig. 1b, with frequency response $H(f)$, introduces a further phase shift $\angle H(k f_0) \approx -\pi/2$ for the harmonics at frequency $k f_0$ in (5) lying well above the filter cutoff frequency $f_c = 1/(2\pi RC)$ (e.g., one decade above). Such $k$-th harmonics above the filter cutoff frequency contribute to the filter output through an additive term that is equal to $\pm |H(k f_0)| \cdot |c_{n,k}| \cdot \sin(2\pi k f_0 t)$ from (5). In turn, such contribution is equal to zero at $t = t_{opt}$, with $t_{opt}$ defined as
$$t_{opt} = m \cdot 2^N T_{clk} = \frac{m}{f_0}, \quad m \text{ integer.} \qquad (6)$$
In other words, all harmonics lying at least one decade above the filter cutoff frequency give zero contribution to the filtered output at $t = t_{opt}$, irrespective of the specific DC input code being converted, and of the magnitude of the filter frequency response. Thus, the DAC output sampled at $t = 2^N T_{clk}$ (or any integer multiple) is unaffected by harmonics above $10 f_c$. Interestingly, such harmonics represent the vast majority of the overall energy of the spurious components above the DC component, as will be shown below. From the above considerations, the choice of the sampling period $2^N T_{clk}$ introduces inherent suppression of the dominant contribution of spurious harmonics in DDPM modulation, drastically relaxing the filter cutoff frequency requirement. In contrast, such spectral property of DDPM modulation does not apply to binary streams originated by ΣΔ modulators (e.g., of first or second order). Indeed, the latter are well known to have a complex and input-dependent phase in the harmonic components, as exemplified in Fig. 2a. In this figure, the magnitude and the phase spectra of the output stream are plotted for a DDPM, a first-order and a second-order ΣΔ modulator, under the same DC input code $n=5363$. Accordingly, in ΣΔ modulators it is not possible to derive an input-independent optimal sampling time at which the contribution of nearly all harmonics is zero, thus requiring more stringent filter specifications. Quantitatively, Fig. 2b shows that sampling the output of a first- and second-order ΣΔ modulator with the same filter and sampling time as the DDPM DAC leads to an error of several LSBs (e.g., up to five in the example of Fig. 2b). It is worth noting that the input-independent optimal sampling condition in (6) rigorously holds for DC signals, and is hence certainly well suited for resolution enhancement for calibration/tuning purposes.
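The phase property that underlies the optimal sampling condition can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the analysis of the paper: it models one DDPM period as a discrete sequence (cycles numbered from 1, as assumed here), places the time origin at the centre of a clock cycle, and verifies that all discrete Fourier coefficients are real for every input code. The helper names are hypothetical. For rectangular pulses, the continuous-time coefficients differ from these only by a real factor, so their phase is likewise 0 or 180°.

```python
import cmath


def ddpm_stream(n, n_bits):
    """One DDPM period; digit b_i is emitted in cycles with N-1-i trailing zeros."""
    out = []
    for cycle in range(1, 2 ** n_bits + 1):
        i = n_bits - 1 - ((cycle & -cycle).bit_length() - 1)
        out.append((n >> i) & 1 if i >= 0 else 0)
    return out


def harmonic_coefficients(n, n_bits):
    """DFT of one DDPM period with the time origin moved to a cycle centre.

    The sample of (1-indexed) cycle p is placed at time p, which corresponds to
    shifting the origin by half a clock period with respect to the cycle edges.
    """
    length = 2 ** n_bits
    x = ddpm_stream(n, n_bits)
    coeffs = []
    for k in range(length):
        acc = 0j
        for p in range(1, length + 1):
            acc += x[p - 1] * cmath.exp(-2j * cmath.pi * k * p / length)
        coeffs.append(acc / length)
    return coeffs


def max_imaginary_part(n_bits):
    """Largest |Im| over all codes and harmonics; ~0 means phases are 0 or 180 deg."""
    return max(
        abs(c.imag)
        for n in range(2 ** n_bits)
        for c in harmonic_coefficients(n, n_bits)
    )


if __name__ == "__main__":
    print(max_imaginary_part(5))  # numerically zero, up to rounding
```

Since every harmonic is a pure cosine with this time origin, a further phase shift of about $-\pi/2$ turns each of them into a sine term, and all such terms vanish together at integer multiples of the period.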

III. STANDARD CELL-BASED DESIGN AND CALIBRATION OF DDPM DACs
The potential limitations of DDPM converters, the benefits of the optimal sampling condition in Section II, and the implications in terms of calibration were explored through the optimized design and the experimental characterization of two DACs with moderate (12 bit, named DAC_12) and high resolution (16 bit, named DAC_16). The designs are part of the 40nm testchip in Fig. 3. Both DACs were designed with a fully-automated digital design flow, with the first-order filter being implemented by simply instantiating the passive components in the form of p-cells, as commonly available from commercial design kits (i.e., they were implemented with simple scripting). The overall design was completed in less than a day, confirming that DDPM converters entail an extremely low design effort.

A. DDPM DACs and Design Considerations
In the DAC_12 design, the first-order reconstruction filter in Fig. 1b was designed by using a 5-pF metal-insulator-metal (MiM) on-chip capacitor and a high-resistivity poly resistor with a resistance of 300 kΩ. The DDPM modulator is very compact, as expected from its digital nature and the intrinsic simplicity of the architecture in Fig. 1b. The micrograph of the testchip in Fig. 3 shows that it occupies only 270 µm², i.e., approximately a square with only 15 µm width. Being based on a fully standard cell-based approach, digital-like shrinking is also achieved when using CMOS technologies with finer minimum feature size. At the nominal 1-V power supply voltage, the DAC_12 circuit operates at a clock frequency up to $f_{max}$ = 900 MHz. Since the best performance in terms of linearity and power-resolution tradeoff is achieved at $f_{clk}$ = 450 MHz, the latter will be considered as the nominal clock frequency in the following. Thanks to its digital nature, the DAC_12 circuit is able to properly operate down to a 665-mV (575-mV) power supply voltage at $f_{clk}$ = 450 MHz ($f_{clk}$ = 112.5 MHz). Under $f_{clk}$ = 450 MHz, the sample rate at the nominal 12-bit resolution is $f_{clk}/2^N$ = 110 kS/s.
A similar architecture was also implemented in DAC_16 to explore the potential of DDPM converters and their resolution limit beyond moderate resolutions of 10-12 bits. Since the plain architecture used for DAC_12 is not able to achieve higher resolution, various techniques were introduced to approach the targeted range of 16 bits. As a first consideration, differential operation was adopted to improve the robustness against substrate and supply noise, as well as to double the output voltage swing to further improve the signal-to-noise ratio. To this aim, the DDPM output digital stream and its complementary stream $\overline{\text{DDPM}}$ are generated.
Such outputs are then fed to a differential first-order RC reconstruction filter, which comprises two matched 250-kΩ poly resistors and a 5-metal 20-pF metal-insulator-metal (MiM) capacitor (both automatically instantiated, placed and routed), as in Fig. 4a. This allows the capacitance, and hence the related area, to be halved compared to two single-ended RC circuits. Regarding the 16-bit DDPM modulator, the nominal clock frequency is 225 MHz at 1-V supply. The digital input is sampled by the modulator at the frequency $f_s = f_{clk}/2^N$ = 3.4 kS/s, which is derived directly from the clock within the modulator.
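As a quick check of the figures quoted above, the following Python sketch recomputes the sample rates from the clock frequency and resolution. It also evaluates the cutoff frequency of an ideal first-order RC filter for the nominal DAC_12 component values; this last number is an estimate that assumes ideal components and ignores parasitics and driver resistance, and it is not a value reported in the text. The function names are chosen here for illustration.

```python
import math


def sample_rate(f_clk_hz, n_bits):
    """Conversion rate of a DDPM DAC: one sample per 2**n_bits clock cycles."""
    return f_clk_hz / 2 ** n_bits


def rc_cutoff(r_ohm, c_farad):
    """Cutoff frequency of an ideal first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


if __name__ == "__main__":
    print(f"DAC_12 sample rate: {sample_rate(450e6, 12) / 1e3:.1f} kS/s")
    print(f"DAC_16 sample rate: {sample_rate(225e6, 16) / 1e3:.2f} kS/s")
    # Ideal estimate from the nominal 300 kOhm / 5 pF values (assumption).
    print(f"DAC_12 ideal RC cutoff: {rc_cutoff(300e3, 5e-12) / 1e3:.0f} kHz")
```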
As shown in Fig. 3, the overall silicon area of DAC_16 is only 4,730 µm² and is dominated by the filter area (4,200 µm²), which could be further halved by using the entire 10-metal stack. To achieve higher resolution without significant area penalty, the filter cutoff frequency was set to keep the output voltage error at the optimal sampling instant in (6) lower than ±1/2 LSB for all input codes. The cutoff frequency target was obtained via circuit simulations, leveraging the monotonic reduction in the output error when the filter cutoff frequency is reduced (i.e., more effective harmonics suppression). At the nominal 225 MHz clock frequency, the required cutoff frequency was found to be 12 kHz, which is 8X higher than the requirement in [19] to reduce the peak amplitude of all DDPM harmonics below the quantization error level. Such 8X increase in the cutoff frequency is enabled by the intrinsic suppression due to optimal sampling as in (6). In turn, such 8X cutoff frequency increase translates into an approximately 8X smaller area of the capacitor and resistor in the reconstruction filter, which are also the dominant contribution as discussed above. In other words, the optimal sampling condition in Section II enables significant area reduction, in addition to the more obvious suppression of spurious harmonics and hence better output accuracy.

B. Digital Calibration
As in any DAC architecture, DDPM-based converters are affected by pulse shape non-idealities and inter-symbol interference (ISI). In particular, the INL error in DDPM DACs is mainly due to the asymmetric rise/fall transitions and inter-symbol interference, and has a piecewise-linear shape, as illustrated in Fig. 3b.
Indeed, for $n \le 2^{N-1}$ (i.e., $b_{N-1}=0$), an increase of the input code by an LSB introduces a new pulse and hence an additional rising-falling edge pair, resulting in nearly the same incremental error at each input code increase, and hence a gain error. However, for $n > 2^{N-1}$ (i.e., $b_{N-1}=1$), the increase of the input by an LSB actually reduces the number of rising-falling edge pairs by one, thus leading to a different gain error. This determines a double-slope nonlinearity error, i.e., a piecewise-linear DAC characteristic. Moreover, based on the analysis in [19], ISI and power supply noise at the harmonics of the sampling frequency also result in a piecewise-linear characteristic affected by different gain and offset errors over different input code segments.
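The edge-count argument above can be illustrated with a short Python sketch. It is a behavioral model only, built on the assumption (made here) that clock cycles are numbered from 1 and that each digit is emitted in the cycles selected by the trailing-zero count; the function names are hypothetical. The sketch counts the rising edges in one period of the stream for each input code.

```python
def ddpm_stream(n, n_bits):
    """One DDPM period; digit b_i is emitted in cycles with N-1-i trailing zeros."""
    out = []
    for cycle in range(1, 2 ** n_bits + 1):
        i = n_bits - 1 - ((cycle & -cycle).bit_length() - 1)
        out.append((n >> i) & 1 if i >= 0 else 0)
    return out


def rising_edges(n, n_bits):
    """Count 0->1 transitions in one period, treating the stream as periodic."""
    s = ddpm_stream(n, n_bits)
    previous = s[-1:] + s[:-1]  # stream delayed by one cycle (circularly)
    return sum(1 for prev, cur in zip(previous, s) if prev == 0 and cur == 1)


if __name__ == "__main__":
    N = 4
    for code in range(2 ** N):
        print(code, rising_edges(code, N))
```

In this model the number of edge pairs grows by one per LSB up to mid-scale and shrinks by one per LSB above it, which is the mechanism behind the two different slopes of the error.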
This suggests that simple piecewise-linear calibration is sufficient for DDPM converters. In turn, piecewise-linear calibration is easy to implement in a fully digital multi-segment form, thus preserving the fully-digital standard cell-based approach that is distinctive of DDPM DACs. In multi-segment calibration, the dynamic range is divided into $2^M$ segments, and a different gain and offset correction are applied to the digital input in each segment, as shown in Fig. 4c. At higher (lower) resolution targets, a higher (lower) calibration accuracy is needed and the required number of segments is hence expected to increase (decrease).
For the DAC_16, transistor-level simulations showed that an 8-segment calibration scheme is sufficient to keep INL within ±1/2 LSB at 16-bit resolution, as illustrated in Fig. 4b. This calibration scheme can be simply implemented with two 8:1 MUXes, each being driven by the three most significant bits of the input $n_{in,16:14}$, whose value selects the corresponding segment among the eight available as in Fig. 4a. The selected compensation basically inverts the INL curve in Fig. 4c, making the local error within the segment close to zero within the targeted accuracy. In particular, the MUXes select the desired gain $G_k$ (offset $O_k$) to compensate the local gain (offset) error in the $k$-th segment, for $k = 0 \dots 7$. Then, a multiplier and an adder simply generate the calibrated DDPM input $n_{in,cal}$ based on the actual input $n_{in}$ as follows
$$n_{in,cal} = G_k \cdot n_{in} + O_k \quad \text{if } 2^{13} k \le n_{in} < 2^{13}(k+1) \qquad (7)$$
as shown in Fig. 4a. In practical cases, (7) is often directly evaluated by the processor or DSP driving the DAC, thus not requiring any extra area. The values $G_k$ and $O_k$ of the calibration coefficients can be obtained via foreground calibration, measuring the slopes of the DAC static transfer curve, similar to [19]. Interestingly, the calibration coefficients were found to be nearly unaffected by supply and temperature variations, and are weakly sensitive to process variations. Thus, in cost-sensitive applications, the additional testing time for traditional die-specific calibration can be eliminated at the cost of moderate resolution degradation, adopting a one-time offline calibration that is equal for all dice. Alternatively, full resolution is reached by applying a die-specific calibration at testing time.
The same calibration network in Fig. 4a was also adopted for the DAC_12 circuit, although its lower resolution requires only a simpler two-segment calibration, thus further simplifying the calibration process and implementation.
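A software version of the multi-segment correction can be sketched as follows, for the case in which the calibration is evaluated by the processor driving the DAC. This is a minimal sketch under assumptions: the function name, the rounding to the nearest integer, and the clamping to the valid code range are choices made here, and the coefficient values in the example are hypothetical placeholders rather than measured data.

```python
def calibrate(code, gains, offsets, n_bits=16):
    """Piecewise-linear calibration: the MSBs of the code select gain and offset.

    gains/offsets hold one entry per segment; the number of segments must be a
    power of two so that the segment index is given by the top bits of the code.
    """
    n_seg = len(gains)
    seg_bits = n_seg.bit_length() - 1
    if n_seg != 1 << seg_bits or len(offsets) != n_seg:
        raise ValueError("need 2**M gains and as many offsets")
    if not 0 <= code < 1 << n_bits:
        raise ValueError("input code out of range")
    k = code >> (n_bits - seg_bits)  # segment index from the top M bits
    calibrated = int(round(gains[k] * code + offsets[k]))
    # Clamping keeps the result a valid DAC code (assumption of this sketch).
    return max(0, min(calibrated, (1 << n_bits) - 1))


if __name__ == "__main__":
    # Hypothetical coefficients: identity everywhere except segment 3.
    gains = [1.0] * 8
    offsets = [0.0] * 8
    gains[3], offsets[3] = 0.5, 100.0
    print(calibrate(3 * 8192, gains, offsets))  # first code of segment 3
```

With two-entry coefficient lists, the same function covers the two-segment case mentioned for DAC_12.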

IV. EXPERIMENTAL RESULTS
The 40 nm DAC_12 and DAC_16 designs in the testchip of Fig. 3 were characterized under nominal operating conditions, i.e., at 25°C temperature, 1-V supply, $f_{CLK}$ = 450 MHz for the DAC_12 and $f_{CLK}$ = 225 MHz for the DAC_16. The accuracy was tested over process, supply and temperature variations, as discussed below.
The DAC_12 converter was found to consume 50.8 µW. Based on the results of the dynamic characterization reported in Figs. 6a-b, DAC_12 achieves an SNDR of 72 dB at low frequency, which corresponds to an ENOB of 11.6 bits. Moreover, both SFDR and THD exceed 85 dB at low frequency. Compared with the DDPM DAC at the same resolution proposed in [29], the DAC_12 presented in this paper achieves 2X higher sample rate at half the area and 10% less power. The improvement is due to the avoidance of the overhead associated with the specific technique used in [29] to achieve graceful degradation. This choice is appropriate to highlight the true potential of DDPM DACs, as opposed to aiming to relax system-level design by introducing graceful degradation against uncertain frequency and supply voltage.
This results in a 7 dB higher (i.e., better) power efficiency FOM [16], where the FOM is defined as
$$\text{FOM} = \text{SNDR} + 10\log_{10}\!\left(\frac{BW}{P}\right)$$
with $BW$ being the bandwidth and $P$ the power consumption. Compared with state-of-the-art DACs with comparable bandwidth and/or resolution ranges in Table I, DAC_12 exhibits 52-5,180X lower area than [13]-[18]. For the sake of fairness, the comparison excludes the RC reconstruction filter, as it is not reported in prior art. Such area advantage is due to the simple architecture in Fig. 1, which avoids the need for the area-hungry interpolator, arithmetic and active analog circuitry needed by ΣΔ DACs. This area advantage further increases at finer technologies thanks to its digital architecture, which scales substantially faster than analog counterparts. Also, the avoidance of active analog circuitry makes the design effort minimal, i.e., on the order of 10 man-hours, as opposed to more analog-intensive designs that typically require several hundreds of man-hours or more.
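The reported DAC_12 figures can be cross-checked with a short calculation. The Python sketch below assumes that the FOM has the common form SNDR + 10·log10(BW/P) and that the bandwidth equals half of the 110 kS/s sample rate; both are assumptions of this sketch. The ENOB conversion uses the standard relation ENOB = (SNDR − 1.76)/6.02, and the function names are chosen here for illustration.

```python
import math


def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR, using the standard sine-wave relation."""
    return (sndr_db - 1.76) / 6.02


def power_fom_db(sndr_db, bandwidth_hz, power_w):
    """Power-efficiency figure of merit, assumed as SNDR + 10*log10(BW/P)."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


if __name__ == "__main__":
    sndr = 72.0              # dB, low-frequency SNDR of DAC_12
    bandwidth = 110e3 / 2.0  # Hz, assumed equal to half the sample rate
    power = 50.8e-6          # W
    print(f"ENOB = {enob_from_sndr(sndr):.2f} bits")
    print(f"FOM  = {power_fom_db(sndr, bandwidth, power):.1f} dB")
```

Under these assumptions the result falls within the 160-163 dB range quoted in the conclusion.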
Regarding the DAC_16 design, its power consumption at the nominal frequency $f_{CLK}$ = 225 MHz was measured to be 45 µW. The results of its static characterization are reported in Fig. 7, based on the eight-segment calibration in Section III-B. The RMS INL and DNL are 0.63 LSB and 0.52 LSB, respectively. Except for a very limited number of outliers (less than 20, i.e., 0.06% of input codes) exceeding ±2 LSB and always within ±9 LSB, the measured maximum INL is 3.15 LSB and the maximum DNL is 2.5 LSB.
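For reference, INL and DNL curves of this kind are commonly extracted from a measured static transfer curve as sketched below in Python. The end-point definition used here (LSB size taken from the first and last measured levels) is an assumption of this sketch, since the text does not state which definition was adopted, and the function names and sample data are hypothetical.

```python
import math


def dnl_inl(levels):
    """End-point DNL and INL (in LSB) from measured output levels, one per code."""
    n_codes = len(levels)
    if n_codes < 2:
        raise ValueError("need at least two measured levels")
    lsb = (levels[-1] - levels[0]) / (n_codes - 1)  # average step size
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0 for k in range(n_codes - 1)]
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(n_codes)]
    return dnl, inl


def rms(values):
    """Root-mean-square value of a list of errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    measured = [0.0, 1.5, 2.0, 3.0, 4.0]  # hypothetical levels with one shifted code
    dnl, inl = dnl_inl(measured)
    print("DNL:", dnl)
    print("INL:", inl)
    print("RMS INL:", rms(inl))
```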
The dynamic characterization of DAC_16 in Fig. 8 was performed on the same die under a sinewave input at 90% of full-scale amplitude, with frequency in the 5-75 Hz range. From this figure, the measured SFDR and THD are above 95 dB, whereas SNR and SNDR are both 87.5 dB at 5-Hz input, corresponding to 14.5 ENOB. A 20 dB/dec ENOB degradation is shown at larger frequencies, as expected. For completeness, the DAC_16 circuit was also tested in the presence of process, voltage and temperature (PVT) variations. Under die-specific calibration derived at 1 V (i.e., at the cost of increased testing time), the measured static characteristics at supply voltages in the 0.9-1.1 V range are reported in Fig. 9a. The maximum INL (DNL) deviation from the nominal supply voltage is 2.6 LSB (1 LSB), and the maximum INL (DNL) deviation from nominal temperature is 2.5 LSB. A consistent 2.5X INL/DNL ratio is also observable over temperature, compared to room temperature, which indicates a very similar impact on INL and DNL.
To experimentally quantify the impact of die-to-die variations, the static characterization was repeated over three dice. Conventional die-specific re-calibration of each die was confirmed to completely recover the nominal INL and DNL performance in all cases (results are hence omitted, as they are basically the same as Figs. 7-8). To quantify the resolution degradation due to the adoption of a simple offline calibration, Fig. 9c plots the static characterization in the three considered dice, using the same calibration coefficients obtained for die #1. In other words, the elimination of the testing time required by die-specific calibration results in an INL ranging from 0.9 to 11 LSB (average is 4 LSB), and a DNL ranging from 0.5 to 0.9 LSB (average is 0.7 LSB). The resulting linearity of the proposed DAC_16 under an offline calibration is still above 12 bits.
Compared with the DDPM DAC proposed by the same authors in [29], the introduction of the optimal sampling condition in Section II and of die-specific piecewise-linear eight-segment calibration achieves 3.2 bit higher ENOB at only 6% increased area, 20% lower power consumption, and 30X reduced bandwidth. This results in an overall increase in the FOM by +10 dB. At the lower 12-bit resolution of DAC_12, the impact of process, voltage and temperature variations was found to be insignificant, hence the related results are omitted (they are basically the same as Figs. 5-6). State-of-the-art DACs from the recent literature are summarized in Table I. Compared to partially- and fully-digital DACs with comparable bandwidth and/or resolution, the proposed DAC_16 achieves 300X lower area compared to [32], 2,720X lower than [18], and 18,190X compared to [30]. The proposed DAC_16 has 19X lower power consumption compared to [32], 58X lower than [18], and 1,870X compared to [30]. Such reductions in area and power are achieved at the expense of a 12X reduction in the sample rate compared to [18] and [32], and 526X compared to [31], which is not an issue in DACs for on-chip calibration, their output being a DC signal. The favorable area-energy efficiency-performance tradeoff of the proposed DACs is quantified by the area FOM
$$\text{FOM}_A = \text{FOM} + 10\log_{10}\!\left(\frac{10^{6}}{A}\right)$$
where $A$ is the feature size-normalized area. The resulting area FOM is lower than [31] and [33] only, and it is only 3-4 dB less than the highest reported in [31].
Notes to Table I: a) In [18], only the digital sub-system is considered; b) area normalized to F² (F = process minimum feature size) is relatively constant across CMOS generations in digital architectures, and increases by slightly less than 2X in analog architectures. Hence, the area of [13] ported to 40 nm is expected to translate into substantially larger area than this work, even though its normalized area is lower; c) A-weighted; d) based on text and figures; e) analog power only; f) twice the signal bandwidth for oversampled DACs.
From the above comparison with the state of the art of DAC_16 and DAC_12, DDPM DACs are very well suited for cost-sensitive low-power systems with very low design effort, either for baseband signals at moderate resolutions (e.g., 12 bit), or for calibration purposes at high resolutions (e.g., 16 bit).

V. CONCLUSION
In this paper, standard cell-based Nyquist-rate DDPM DACs have been explored in terms of their limits and potential for high resolution, while assuring very low area and design effort. To this aim, techniques to improve resolution have been introduced, including an optimal sampling condition to suppress spurious harmonics. Digital calibration has also been explored, showing that piecewise-linear techniques are sufficient to reach resolutions on the order of 16 bits.
To evaluate the effectiveness of these techniques, two DAC designs in 40 nm CMOS have been demonstrated and experimentally characterized, targeting moderate (12 bit) to relatively high resolution (16 bit). Both circuits were designed with a fully automated digital design flow based on standard cells, at a design effort on the order of only 10 man-hours (i.e., more than an order of magnitude lower than typical DAC designs). Their area was shown to be 370-5,333X smaller than prior partially-digital DAC architectures, and expectedly further smaller than conventional analog designs. Such area efficiency over partially-digital ΣΔ DACs is achieved thanks to the avoidance of interpolation, arithmetic and active analog circuitry. The power consumption of 45-50.8 µW is equivalent to the lowest reported to date, and 2-3 orders of magnitude lower than other solutions. The power efficiency FOM of 160-163 dB is in the middle of the range covered by prior art (i.e., between 140 and 189 dB). Such performance is achieved while not requiring any passive element matching or static DC bias circuitry, as opposed to other state-of-the-art DACs.
Overall, this work shows that the introduction of simple techniques, such as an optimal sampling condition and lightweight digital calibration, makes DDPM DACs very competitive in terms of area efficiency, power consumption and low design effort for a wide range of resolutions, as required by cost-sensitive applications and low-power constraints.