# Design and Optimization of a 71 Gb/s Injection-Locked CDR

## Abstract

High-data-rate wireline systems face increasing complexity and design difficulty because of stringent system specifications and circuit and technology challenges. A methodology is therefore needed that addresses circuit and system challenges effectively while accounting for the extensive coupling between these two domains. In this work we address the system and circuit design of a 71 Gb/s clock and data recovery (CDR) circuit in a 180 nm SiGe process. To provide power-efficient and robust clock recovery (CR) for this system, an injection-locked CR block has been implemented, which reduces the number of circuit components and the power consumption compared with conventional CDRs. The design methodology is iterative, alternating between circuit-level and system-level design optimization. The core of the circuit consumes 136 mW from a 3.3 V supply. The total circuit consumes 514 mW, including 60 mW for the limiting amplifiers.

## I. Introduction

Keeping pace with advances in process technology and the ever-increasing demand for bandwidth, wireline data communication systems have moved to successively higher data rates, beyond 10 Gb/s to the 20 and 40 Gb/s nodes. This research is motivated by the need to achieve these rates with low-power, cost-effective, and robust circuits.

Evolving to such data rates requires departing from some conventional circuit design approaches. As data rates continue to increase, conventional PLL-based CDR architectures scale poorly and consume considerable power because of their inherently complex circuitry. Open-loop CDR architectures based on injection locking were developed to provide scalability and low power consumption [1]. In this paper we develop and investigate a methodology for designing complex high-speed components using a top-down/bottom-up approach that alternates between circuit design and system modeling to study and quantify the performance tradeoffs in such systems. This work shows how modern system design tools can be used to design highly efficient, complex circuits. The paper is organized as follows. Section II describes the system architecture and gives an overview of its components. Section III examines the circuit implementation of the basic components. Section IV describes the system modeling used for circuit-level optimization. Section V presents simulation results and a performance comparison with other reported CDRs, and Section VI summarizes the work.

## II. System Architecture

The system architecture of the wireline receiver is shown in Fig. 1. The front end consists of a wireline detector/demodulator (not shown) followed by a set of limiting amplifiers that operate over a wide dynamic range of input voltages and produce a constant output swing of 200 mVp-p.

Fig. 1. Block diagram of the system.

The topology consists of a clock recovery path and two data retiming and deserializing paths (Fig. 1). The 71 Gb/s data stream is fed into both the data and the clock recovery branches. The phase-synchronized clock is recovered in two stages. In the first stage, a harmonic generator creates spectral energy at the desired clock frequency. Optimizing this harmonic generator maximizes the power available to steer the injection system toward the frequency and phase of the incoming signal. At its output, a fundamental clock component and its higher harmonics are available for precise locking of the injection system. The injection system in Fig. 1 consists of a VCO driven by the harmonic generator output. This signal is injected into the frequency-determining core of the VCO, which acts as an injection-locked divider, so that its free-running frequency locks in phase and frequency to the input data. Once the VCO is locked, a synchronized 35.5 GHz clock is available at its output. The VCO drives a differential clock buffer that produces a twin-phase clock. The clock samples the data on both its rising and falling edges, achieving 1:2 demultiplexing of the 71 Gb/s data into two 35.5 Gb/s channels. For the proof-of-concept demonstration, only one demultiplexing channel has been implemented. The data recovery path consists of a single D flip-flop (D-FF) with buffers at its input and output.
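The 1:2 demultiplexing by a half-rate clock can be illustrated with an idealized timing model. This is a sketch that assumes a perfectly locked clock whose rising edge falls at the center of the even bits; the function names `sample_nrz` and `half_rate_demux` and the parameter `phase_ui` are hypothetical, and jitter, setup/hold times, and the clock and data buffers are ignored.

```python
import numpy as np

BIT_RATE = 71e9          # input data rate from the text (b/s)
UI = 1.0 / BIT_RATE      # unit interval, about 14.1 ps
F_CLK = BIT_RATE / 2.0   # half-rate recovered clock, 35.5 GHz


def sample_nrz(bits, t):
    """Return the NRZ bit value present at each sampling time t (seconds)."""
    return bits[np.floor(t / UI).astype(int)]


def half_rate_demux(bits, phase_ui=0.5):
    """Ideal 1:2 demux: sample on the rising and falling edges of a half-rate clock.

    phase_ui places the rising edge relative to the start of each even bit;
    0.5 samples at the center of the eye (hypothetical, ideally locked clock).
    """
    t_clk = 1.0 / F_CLK                      # one clock period spans 2 UI
    k = np.arange(len(bits) // 2)
    t_rise = k * t_clk + phase_ui * UI       # rising edges hit the even bits
    t_fall = t_rise + t_clk / 2.0            # falling edges are half a period later
    return sample_nrz(bits, t_rise), sample_nrz(bits, t_fall)


rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=64)
ch_a, ch_b = half_rate_demux(bits)
print(ch_a[:8], bits[0:16:2])   # channel A carries the even bits
print(ch_b[:8], bits[1:16:2])   # channel B carries the odd bits
```

Each output channel therefore runs at 35.5 Gb/s. Moving `phase_ui` toward 0 or 1 places the sampling edges at the bit boundaries, which in a real circuit would leave no timing margin.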

## III. Circuit Design and Implementation

Shunt peaking is used in the design to improve bandwidth. The peaking inductors are implemented as short shunt stubs, characterized in Ansoft HFSS.
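A stub can replace a lumped peaking inductor because a transmission line that is short-circuited at its far end and shorter than a quarter wavelength presents an inductive input impedance, Z_in = jZ0 tan(βl). The sketch below computes the equivalent inductance under the assumptions of a lossless quasi-TEM line with a shorted termination; the function name and the example line parameters (50 Ω, 100 µm, effective permittivity 4) are hypothetical, not the dimensions used in this design.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def shorted_stub_inductance(z0_ohm, length_m, eps_eff, freq_hz):
    """Equivalent inductance of a lossless short-circuited stub (hypothetical helper).

    Z_in = j * Z0 * tan(beta * l), so L_eq = Z0 * tan(beta * l) / omega.
    The stub is inductive only while beta * l < pi / 2 (shorter than lambda / 4).
    """
    omega = 2 * math.pi * freq_hz
    beta = omega * math.sqrt(eps_eff) / C0          # phase constant (rad/m)
    if beta * length_m >= math.pi / 2:
        raise ValueError("stub is not inductive at this frequency")
    return z0_ohm * math.tan(beta * length_m) / omega


# Hypothetical line: 50 ohm, 100 um long, eps_eff = 4, evaluated at 35.5 GHz
l_eq = shorted_stub_inductance(50.0, 100e-6, 4.0, 35.5e9)
print(f"L_eq = {l_eq * 1e12:.1f} pH")
```

For electrically short stubs, tan(βl) is close to βl and the result reduces to Z0·l/v_p, which is nearly frequency independent; full-wave characterization (as done here in HFSS) captures the discontinuities and losses that this sketch ignores.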

### A. Limiting Amplifiers

The limiting amplifier (LA) is the first block of the receiver chain; it provides a constant eye opening over a wide input dynamic range for data rates up to 71 Gb/s. The LA consists of three stages, each a modified Cherry-Hooper (C-H) amplifier. The C-H amplifier consists of two cascaded stages: a transadmittance stage (TAS) and a transimpedance stage (TIS). The large impedance mismatch between alternating TAS and TIS stages helps each stage maintain its bandwidth: because of the high reflection coefficient at each interface, a stage is only lightly loaded by the stage that follows it. In addition, this architecture makes all circuit nodes low-impedance, which further contributes to its large bandwidth [2]. The LA uses a modified version of the classic C-H amplifier, shown in Fig. 2. Instead of purely resistive feedback from the TIS to the TAS, an emitter follower is included in the feedback path [3]. The emitter followers (Qf) increase the bandwidth of the circuit and relieve the headroom limitation of the TAS in the traditional C-H topology. They also substantially reduce feed-forward, further improving the bandwidth. The collector resistance is split into R1 and R2; it has been shown [3] that this boosts the gain of the cell by a factor of (1 + R2/R1) without significantly affecting its bandwidth, provided 0 < R2/R1 < 2.5. The LA consumes 60 mW from a 3.3 V supply.
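The gain boost from splitting the collector resistance is simple arithmetic on the quoted factor (1 + R2/R1). The sketch below converts it to decibels and, assuming for illustration that all three cascaded C-H cells use the same ratio, shows how the boost accumulates across the LA; the function name and the example ratios are hypothetical.

```python
import math


def gain_boost_db(r2_over_r1):
    """Gain boost (1 + R2/R1) of the split-collector-resistor C-H cell, in dB.

    The text quotes this boost as bandwidth-neutral only for 0 < R2/R1 < 2.5.
    """
    if not 0 < r2_over_r1 < 2.5:
        raise ValueError("R2/R1 outside the range quoted in [3]")
    return 20 * math.log10(1 + r2_over_r1)


for ratio in (0.5, 1.0, 2.0):
    per_cell = gain_boost_db(ratio)
    # Assumption: three identical cells in cascade, so the dB boost adds three times
    print(f"R2/R1 = {ratio}: {per_cell:.2f} dB per cell, {3 * per_cell:.2f} dB for three cells")
```

At the upper end of the quoted range the boost approaches 20·log10(3.5), roughly 10.9 dB per cell.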

Fig. 2. Schematic diagram of the modified C-H amplifier.

### B. Harmonic Generator

An NRZ-formatted PRBS data stream carries no power at the clock frequency; its spectrum has a null there. A harmonic generator is therefore needed to create spectral energy at the clock frequency for injection into the VCO. A digital harmonic generator based on an XOR gate is unsuitable for very high-speed applications [1]. Since the proposed CDR targets 71 Gb/s, a purely analog block is used here: a differentiator generates pulses at the data edges, and a wired-OR circuit rectifies them (Fig. 3). This block is critical, because the power it generates at the harmonics determines the locking range; it has therefore been carefully modeled and optimized, as described in Section IV. The circuit consumes 56 mW from a 3.3 V supply.
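Why differentiation followed by rectification creates a clock line can be checked numerically. This sketch assumes an ideal differentiator (a first difference of the sampled waveform) and an ideal full-wave rectifier (`abs()`), standing in for the wired-OR, which passes the larger of the two differential outputs; the PRBS7 polynomial, the oversampling factor, and the helper names are hypothetical choices, not taken from the paper.

```python
import numpy as np


def prbs7(n_bits, seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR; period 127 bits."""
    state, out = seed, np.empty(n_bits)
    for i in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out[i] = new_bit
    return out


def clock_line_power(x, samples_per_bit):
    """Normalized power of x at the bit-rate frequency (1/samples_per_bit cycles/sample)."""
    n = np.arange(len(x))
    return np.abs(np.dot(x, np.exp(-2j * np.pi * n / samples_per_bit))) ** 2 / len(x) ** 2


SPB = 16                                          # samples per bit (hypothetical)
nrz = np.repeat(2.0 * prbs7(8 * 127) - 1.0, SPB)  # +/-1 NRZ waveform
edges = np.diff(nrz, prepend=nrz[0])              # ideal differentiator: +/-2 at each transition
rectified = np.abs(edges)                         # abs() stands in for the wired-OR rectifier

p_nrz = clock_line_power(nrz, SPB)
p_gen = clock_line_power(rectified, SPB)
print(f"NRZ input at clock frequency:       {p_nrz:.3e}")
print(f"Rectified edges at clock frequency: {p_gen:.3e}")
```

Every full NRZ bit integrates to zero against the bit-rate phasor, so the input has a spectral null at the clock frequency. After rectification, every transition becomes a pulse of the same sign aligned to the bit grid, and these pulses add coherently into a discrete clock line.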

Fig. 3. Schematic diagram of harmonic generator.

### C. VCO

A cross-coupled oscillator is designed with a center frequency of 35.5 GHz and a tuning range of 6.8 GHz (Fig. 4); its phase noise is −90 dBc/Hz at 1 MHz offset. The VCO uses a capacitively emitter-degenerated cross-coupled topology, with a tank implemented using transmission lines and metal-insulator-metal (MIM) capacitors. Capacitive degeneration improves the tuning range and phase noise [4]. The signal is injected through the tail current transistor of the VCO. This arrangement acts as a divide-by-two, because a component at twice the oscillation frequency (71 GHz) is naturally present at the emitter-coupled node of the VCO. The injected signal therefore pulls the VCO into phase alignment, and the VCO output is the desired phase-synchronized clock.
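The pulling behavior described above is commonly modeled, to first order, by Adler's equation, dφ/dt = Δω − ω_L·sin φ. This is the classic textbook model, not an analysis from this paper: here φ is assumed to be the phase error between the VCO and half the injected phase (the divide-by-two case), Δω the detuning, and ω_L the lock range, which grows with the injected power. All values are hypothetical and normalized, and the function name is illustrative.

```python
import math


def adler_phase(delta_w, w_lock, phi0=0.0, dt=1e-3, t_end=50.0):
    """Forward-Euler integration of Adler's equation dphi/dt = delta_w - w_lock*sin(phi).

    Returns the phase error at t_end (normalized units, hypothetical values).
    """
    phi = phi0
    for _ in range(int(round(t_end / dt))):
        phi += dt * (delta_w - w_lock * math.sin(phi))
    return phi


# Inside the lock range the phase error settles to asin(delta_w / w_lock) ...
print(adler_phase(0.5, 1.0), math.asin(0.5))
# ... outside it the phase slips continuously and the VCO is only pulled.
print(adler_phase(1.5, 1.0))
```

The model makes the link to Section III-B explicit: more injected power at 71 GHz widens ω_L and therefore the range of detuning the CDR can tolerate.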

Fig. 4. Schematic diagram of injection-locked VCO.

### D. Latch

CML-based high-speed latches have a high headroom requirement because the clock and data levels are stacked (series gating of clock and data). Moreover, the clock and data signals sit at different levels, which requires additional level-shifting circuitry. The low-voltage logic (LVL) latch [5] eliminates this problem. It gates data and clock in parallel, so the clock and data can use the same signal levels. Because of the reduced transistor stacking, these circuits operate at supply voltages as low as 2.0 V, providing reliable operation even if the supply drops below 3.3 V. The latch outputs use switched emitter followers, so no separate emitter followers are needed to drive subsequent latches or buffers. The latch consumes 44 mW from the dc supply.
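At the logic level, the D-FF in the data path is two such latches in a master-slave arrangement. The sketch below is a behavioral model only, with hypothetical class names; it captures the transparent/hold behavior that retimes the data, not the parallel gating, signal levels, or timing of the LVL circuit.

```python
class DLatch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Positive-edge D flip-flop built from two latches clocked in antiphase."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, not clk)  # master tracks D while the clock is low
        return self.slave.update(m, clk)    # slave passes the master output while the clock is high


ff = MasterSlaveDFF()
trace = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]  # (d, clk) pairs
outputs = [ff.update(d, clk) for d, clk in trace]
print(outputs)  # [0, 1, 1, 1, 0]
```

The output changes only on the low-to-high clock transitions, even though D changes while the clock is high in the third step.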

## IV. Modeling and System Optimization

Circuit simulation shows that the power in the clock component is a very sensitive function of the differentiator transfer function. The effectiveness of injection locking depends directly on the power injected into the VCO, so the phase-lock quality of the system depends on how effectively the input signal power is converted into spectral power at the clock frequency. Because this conversion is nonlinear, it does not lend itself to a precise analytical expression. To understand the circuit tradeoffs, a system-level simulation is therefore needed to provide optimal values for circuit parameters such as transistor lengths and passive component values. A block diagram of the simulation model is shown in Fig. 5. In this model, the pseudo-random data signal is passed through the differentiator transfer function to yield a sequence of alternating-polarity pulses. These pulses are rectified and fed to a periodogram for spectral analysis. The periodogram output can be written as

$$M(f) = W\left(\lvert X(f)\,H(f)\rvert\right) \tag{1}$$

where W(·) is the Kaiser window function used to define and extract finite spectral energy, H(·) is the transfer function of the composite difference amplifier and second-order filter, and X is the input signal spectrum. The signal M is then processed by the operator F, which extracts the component at $f = f_{\mathrm{clk}}$. The output of this block is used to form an optimality criterion for the CDR system, defined as

$$C(H) = \frac{\sum_f W\left(X(f)\right)}{F(M)} = \frac{P_{\mathrm{signal}}}{P_{\mathrm{clock}}} \tag{2}$$

such that

$$F(M) = M(f)\big|_{f = f_{\mathrm{clk}}} \tag{3}$$

where C is the optimality criterion, defined as the ratio of the power in the signal to the power at the clock frequency. The limits of the window function W(·) can be chosen arbitrarily, depending on the spectral purity required by the VCO injection and phase noise response.
The optimality criterion C(H) is thus a function of the filter transfer function, which is the tunable parameter of the entire system model. Having defined the criterion, the first task was to form the spectral line at the desired clock frequency and study the relative power spectrum. The result appears in Fig. 6, which overlays the signal and clock spectra; the clock component (in red) contains finite power, which is optimized in the next step. An optimizer was then used to run the model in Simulink over a range of differentiator component values. The results are plotted in Fig. 7, which shows the optimum differentiator component values with respect to the cost function of Eq. (2).
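The optimization loop of Figs. 5 to 7 can be sketched outside Simulink. This is not the authors' model. The sketch makes the following hypothetical assumptions:

- The differentiator is a first-order high-pass with time constant `tau`, which serves as the tuned component value.
- The "second-order filter" is a low-pass with its corner at twice the bit rate and Q = 0.707.
- The rectifier is an ideal `abs()`.
- The Kaiser window uses β = 6.
- A brute-force sweep replaces the optimizer.

The processing order follows the block diagram (filter, rectify, periodogram) rather than the literal form of Eq. (1), which places the magnitude on the spectrum.

```python
import numpy as np

SPB = 16  # samples per bit (hypothetical oversampling factor)


def prbs7(n_bits, seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR; period 127 bits."""
    state, out = seed, np.empty(n_bits)
    for i in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out[i] = new_bit
    return out


def harmonic_generator(x, tau, f_lp=2.0, q=0.707):
    """Filter x with H(f) = first-order differentiator x second-order low-pass, then rectify.

    Frequencies are in units of the bit rate and tau is in bit periods. Filtering is
    done by FFT, i.e. circularly, which is exact here because x holds whole PRBS periods.
    """
    f = np.fft.rfftfreq(len(x), d=1.0 / SPB)
    s = 2j * np.pi * f
    h_diff = s * tau / (1.0 + s * tau)             # differentiator (high-pass)
    u = f / f_lp
    h_lp = 1.0 / (1.0 - u**2 + 1j * u / q)         # second-order low-pass
    y = np.fft.irfft(np.fft.rfft(x) * h_diff * h_lp, n=len(x))
    return np.abs(y)                               # ideal rectifier in place of the wired-OR


def kaiser_periodogram(v, beta=6.0):
    """One-sided periodogram of v using a Kaiser window."""
    w = np.kaiser(len(v), beta)
    return np.abs(np.fft.rfft(w * v)) ** 2 / np.sum(w**2)


def criterion(x, tau, n_bits):
    """C = P(signal) / P(clock) as in Eq. (2); the bit-rate line falls in bin n_bits."""
    p_signal = np.sum(kaiser_periodogram(x))
    p_clock = kaiser_periodogram(harmonic_generator(x, tau))[n_bits]
    return p_signal / p_clock


n_bits = 4 * 127
x = np.repeat(2.0 * prbs7(n_bits) - 1.0, SPB)
taus = np.logspace(-2, 1, 25)                      # sweep of the differentiator time constant
costs = np.array([criterion(x, t, n_bits) for t in taus])
best = taus[np.argmin(costs)]
print(f"lowest C = {costs.min():.1f} at tau = {best:.3f} bit periods")
```

Because the numerator of Eq. (2) depends only on the input, minimizing C is equivalent to maximizing the power at the clock line. A very short time constant produces pulses too small to carry much clock energy, so the sweep traces out a cost curve analogous to Fig. 7.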

Fig. 5. Block Diagram of the Optimization System.
Fig. 6. Clock recovery illustration and performance.
Fig. 7. Optimization Criterion vs Optimized Parameters.

## V. Simulation Results

To validate the proposed approach to designing ultra-high-speed CDRs, extensive simulations were run in Cadence (Spectre). The circuit remained functional down to a minimum input of 28 mVp-p (−18 dBm). The reported results, however, are for a 100 mVp-p (−7 dBm) input, since the transimpedance amplifier at the front end of the wireline transceiver is assumed to be able to supply this level. The test input data rate is 71 Gb/s. The spectrum of the input data, shown in Fig. 8(a), has a null at 71 GHz (the clock frequency). The harmonic generator differentiates and then rectifies the data to generate power at the clock frequency, 71 GHz. This signal is then fed into the VCO through tail-current injection.
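The text does not state how the input levels were converted to dBm. Both quoted pairs are consistent with P = Vpp²/R into R = 50 Ω. The sketch below assumes that convention, which is an inference rather than something the paper specifies. The helper name is hypothetical.

```python
import math


def vpp_to_dbm(v_pp, r_ohm=50.0):
    """Convert a peak-to-peak voltage to dBm, assuming P = Vpp**2 / R (inferred convention)."""
    return 10 * math.log10((v_pp ** 2 / r_ohm) / 1e-3)


for v in (0.028, 0.100):
    print(f"{v * 1e3:.0f} mVp-p -> {vpp_to_dbm(v):.1f} dBm")
```

Under this convention, the 28 mVp-p sensitivity limit sits about 11 dB below the 100 mVp-p level used for the reported results.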

Fig. 8. (a) Data Spectrum for NRZ PRBS at 71 Gb/s (b) Injection Locked Clock Spectrum.

The VCO locks onto the injected signal, and its output frequency is half the injected frequency, 35.5 GHz, as shown in Fig. 8(b). The VCO output is used as a half-rate clock that samples the data in the D-FF, which acts as one channel of a 1:2 demultiplexer. The flip-flop output is therefore at exactly half the input data rate, 35.5 Gb/s. Fig. 9 shows an eye opening of about 200 mVp-p single-ended (400 mVp-p differential) at exactly 35.5 Gb/s. This indicates that the clock and data are synchronized; otherwise the bit duration at the output would vary and the eye would close.

Fig. 9. Re-timed, De-serialized eye at 35.5 Gb/s.

## VI. Summary

This paper incorporates ideas from low-voltage logic, injection locking, broadband amplifier design and system level optimization to successfully demonstrate that higher level system modeling can be used to perform component level optimization of circuits for extremely high performance.

TABLE I CDR Parameters
TABLE II Comparison With Other CDR Works

## Footnotes

Tonmoy S. Mukherjee, Mohammad Omer, Jihwan Kim, and Kevin T. Kornegay are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA.

## References

1. J. Zhan, J. Duster, and K. T. Kornegay, "A full-rate injection locked 10.3 Gb/s clock and data recovery circuit in a 45 GHz fT SiGe process," in Proc. CICC, Sep. 2005, pp. 557–560.
2. H.-M. Rein and M. Moller, "Design considerations for very-high-speed Si-bipolar IC's operating up to 50 Gb/s," IEEE J. Solid-State Circuits, vol. 31, no. 8, pp. 1076–1090, Aug. 1996.
3. Y. M. Greshishchev and P. Schvan, "A 60-dB gain, 55-dB dynamic range, 10-Gb/s broad-band SiGe HBT limiting amplifier," IEEE J. Solid-State Circuits, vol. 34, no. 12, pp. 1914–1920, 1999.
4. J. Zhan, J. Duster, and K. T. Kornegay, "A high fosc/fT ratio VCO in SiGe BiCMOS technology," IEEE Microw. Wireless Compon. Lett., vol. 15, no. 3, pp. 156–158, 2005.
5. D. Kucharski and K. T. Kornegay, "2.5 V 43–45 Gb/s CDR circuit and 55 Gb/s PRBS generator in SiGe using a low-voltage logic family," IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2154–2165, Sep. 2006.
6. J. Lee and M. Liu, "A 20 Gb/s burst-mode CDR circuit using injection-locking technique," in Proc. ISSCC, Feb. 2007, p. 586.
7. K. Murata and Y. Yamane, "40 Gbit/s fully monolithic clock recovery IC using InAlAs/InGaAs/InP HEMTs," Electron. Lett., vol. 36, no. 19, pp. 1617–1618, Sep. 2000.
8. H. Kamitsuna, "A 43-Gb/s clock and data recovery OEIC integrating an InP-InGaAs HPT oscillator with an HBT decision circuit," IEEE J. Sel. Top. Quantum Electron., vol. 10, no. 4, pp. 673–678, Jul./Aug. 2004.

This paper appears in: International Symposium on Circuits and Systems, 2009, pp. 177–180. Print ISBN: 978-1-4244-3827-3. INSPEC Accession Number: 10760375. Digital Object Identifier: 10.1109/ISCAS.2009.5117714. Date of Current Version: 26 Jun. 2009.
