A 16/32 Gb/s Dual-Mode NRZ/PAM4 Voltage-Mode Transmitter With 2-Tap FFE

This paper describes a voltage-mode transmitter that supports both non-return-to-zero (NRZ) and four-level pulse amplitude modulation (PAM4) signal transmission. A 2-tap feed-forward equalizer (FFE) is implemented with PAM4 equalization logic consisting of three stages of logic gates and two separated voltage-mode drivers for multi-level generation. Equalization is performed in both the NRZ and PAM4 signal transmission modes. A prototype chip is fabricated using a 28-nm CMOS process. When transmitting 500-mVp-p signals using the proposed transmitter, 41.13 mW power is consumed from a 1-V power supply during the 32-Gb/s PAM4 transmission.


I. INTRODUCTION
Global internet protocol (IP) traffic is increasing steadily, and annual global IP traffic is expected to reach 4.8 zettabytes by 2022. Smartphone traffic recently exceeded personal-computer (PC) traffic, and wireless and mobile device traffic is expected to occupy 71 % of total IP traffic in 2022 [1]. As the fifth generation (5G) of cellular mobile communications approaches, the hardware performance of each device needs to improve to match the faster network communication speed. Therefore, the demand for high bandwidth and energy efficiency in high-speed serial I/O will increase further.
To overcome the electrical channel limitations of non-return-to-zero (NRZ) signaling and improve data transmission efficiency, research on four-level pulse amplitude modulation (PAM4) has been conducted since the early 2000s. However, high power consumption and design complexity have delayed the active adoption of PAM4 technology. In recent years, research on PAM4 has resumed vigorously with the scaling of the complementary metal oxide semiconductor (CMOS) process, which can reduce digital-circuit power consumption. As a result, PAM4 has been a key enabler for high-bandwidth optical communications [2], [3] supporting data-center interconnect and has been adopted in many applications such as Ethernet [4] and chip-to-chip interfaces of large-capacity storage systems [5]. Nonetheless, because PAM4 has the drawback of large power consumption, research is still being conducted to improve its energy efficiency. The main focus of this paper is to increase the energy efficiency of PAM4 serial links. In particular, we try to minimize the power overhead of the voltage-mode PAM4 transmitter when it is equipped with equalization and supports both PAM4 and NRZ transmissions.
The associate editor coordinating the review of this manuscript and approving it for publication was Adao Silva.
To support PAM4 reception, mixed-signal-based PAM4 receivers [6] and analog-to-digital converter (ADC)-based PAM4 receivers [7] have been used. Compared to mixed-signal-based receivers, which are widely used in NRZ signaling, ADC-based receivers can perform more complex equalization and symbol detection using a digital signal processor (DSP). The ADC-based PAM4 receiver has become an attractive solution because the power of the DSP can be reduced by scaling the CMOS process. To support PAM4 transmission, voltage-mode transmitters [8], current-mode transmitters [9], [10], [11], [12], and a hybrid transmitter [13] that combines a voltage-mode driver with an auxiliary current-injection driver have been studied. Voltage-mode transmitters are predominantly used in low-power mobile applications owing to their low static current consumption. However, voltage-mode drivers, which are also referred to as source-series-terminated (SST) drivers, have difficulty adjusting the output swing because the swing is determined by the power supply. To overcome this difficulty, an on-chip regulator is used as an internal power supply in voltage-mode drivers. In current-mode transmitters, the static current of the driver is four times larger than that of voltage-mode transmitters; their advantage is that the output swing can be controlled easily through the bias current.
Although PAM4 has many advantages over NRZ signaling, it is not the best modulation option for all systems when the power budget, signal-integrity requirements, channel conditions, and similar factors are considered. Moreover, in most industries there is inertia to maintain the existing environment, and thus it would be difficult to create a new system for PAM4 alone. For example, the PCIe (Peripheral Component Interconnect Express) standard continues to support the existing NRZ interface while the addition of a PAM4 interface specification is in progress [14]. Therefore, to migrate from the existing NRZ interface to PAM4, the transmitter and the receiver need to support both NRZ and PAM4. Fig. 1 shows an exemplary dual-mode NRZ/PAM4 transceiver architecture. In the ADC-based receiver, the mode change between NRZ and PAM4 can be done in the DSP without reconfiguring the analog front-end (AFE) circuits and ADC. However, for the mode change of the transmitter, analog and digital circuits such as the serializer and the digital-to-analog converter (DAC) driver need to be reconfigured.
In a high-speed link, an equalizer is necessary to overcome the inter-symbol interference (ISI) caused by channel loss. In an ADC-based receiver, the decision-feedback equalizer (DFE) or feed-forward equalizer (FFE) is implemented in the DSP. The number of taps and the tap weights of the equalizer are related to the resolution of the ADC. As the ADC resolution increases, the power consumption of the ADC and DSP increases, thereby increasing the power consumption of the entire link. When an FFE is implemented on the transmitter side [15], [16], [17], the eye-height is increased and the resolution of the ADC can be reduced; thus, the energy efficiency of the entire link can be improved. In this study, the transmitter design focuses on minimizing the power of the data path and maximizing power efficiency while removing the main ISI with a fixed first post-tap coefficient.
In our behavioral simulation, the 2-tap TX FFE on the −10 dB insertion loss channel reduces the required ADC resolution from 6 to 5 bits. As a result, the gate count of the DSP that follows the ADC and performs equalization (decision-feedback equalization and feed-forward equalization) and clock and data recovery (CDR) can be reduced by about 21 %. This paper presents an energy-efficient voltage-mode transmitter that supports both 16-Gb/s NRZ and 32-Gb/s PAM4 transmissions with a 2-tap FFE. Section II explains the FFE concept and describes the proposed dual-mode transmitter architecture. Section III shows the transmitter circuit in detail. Section IV presents the measurement results of a prototype chip fabricated using a 28-nm CMOS process, followed by the conclusions in Section V.

II. PROPOSED ARCHITECTURE
Fig. 2 shows the conceptual signal transition diagrams of NRZ and PAM4 signaling. In NRZ signaling, there are two transition cases, from 0 to 1 and from 1 to 0, and only one amplitude exists for the signal transition. In the case of PAM4, there are 12 transition cases and three transition amplitudes: major, intermediate, and minor. Because of this discrepancy between the NRZ and PAM4 signals, the NRZ equalization method cannot be applied directly to PAM4.
The step responses of the 2-tap FFE for NRZ and PAM4 are shown in Fig. 3. When the FFE is applied to NRZ, the static signal level is reduced by one tap-weight coefficient α. This is called de-emphasis. On the other hand, the de-emphasized PAM4 signal requires three tap-weight magnitudes that are proportional to the transition amplitudes. In other words, three tap-weight magnitudes (3β, 2β, and β) should be applied for the major, intermediate, and minor transitions, respectively.
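The proportionality between tap-weight magnitude and transition amplitude can be checked with a small behavioral sketch. It assumes normalized PAM4 levels of −3, −1, +1, and +3 and a generic linear 2-tap filter y[n] = (1 − α)·x[n] − α·x[n−1]; the function names and this normalization are illustrative assumptions and are not taken from the prototype, whose de-emphasis is produced by segmented drivers rather than by arithmetic.

```python
from itertools import permutations

# Normalized PAM4 levels (an assumption made for this illustration).
LEVELS = (-3, -1, 1, 3)


def classify_transitions(levels=LEVELS):
    """Count level-to-level transitions by size, in units of one level spacing."""
    step = levels[1] - levels[0]
    counts = {}
    for a, b in permutations(levels, 2):
        size = abs(b - a) // step
        counts[size] = counts.get(size, 0) + 1
    return counts


def ffe_2tap(symbols, alpha):
    """Generic 2-tap de-emphasis: y[n] = (1 - alpha) * x[n] - alpha * x[n-1].

    The symbol before the first one is taken to equal the first symbol.
    """
    out = []
    prev = symbols[0]
    for x in symbols:
        out.append((1 - alpha) * x - alpha * prev)
        prev = x
    return out


def overshoot(a, b, alpha):
    """Extra amplitude in the first UI after a transition a -> b,
    relative to the settled (de-emphasized) level of b."""
    y = ffe_2tap([a, a, b, b], alpha)
    return y[2] - y[3]


if __name__ == "__main__":
    print(classify_transitions())      # minor, intermediate, major counts
    for target in (-1, 1, 3):
        print(-3, "->", target, overshoot(-3, target, alpha=0.1))
```

With this filter the overshoot equals α·(b − a), so the minor, intermediate, and major transitions are boosted in the ratio 1:2:3, which corresponds to the β, 2β, and 3β magnitudes described above.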
The architecture of the proposed dual-mode NRZ/PAM4 voltage-mode transmitter with a 2-tap FFE is shown in Fig. 4. The driver consists of two independent sub-drivers. A high-swing driver (HSD) generates the D00 and D10 levels, and a low-swing driver (LSD) generates the D01 and D11 levels for PAM4 signaling [18]. The HSD and LSD are segmented into 36 and 12 units, respectively, for multi-level generation. The operation mode (NRZ or PAM4) and the activation of equalization are determined by the mode selector. The serializer generates 2-bit data for PAM4 or 1-bit data for NRZ. The PAM4 equalization logic generates data for the PAM4 FFE and passes them to the mode selector. The pre-driver for driving the NRZ/PAM4 DAC driver consists of CMOS buffers. An internal phase-locked loop (PLL) or an external clock source provides the required high-frequency clock for high-speed serial data generation. The maximum target frequency of the clock is 8 GHz. The parallel pseudo-random bit sequence (PRBS) generator and the impedance controller of the driver are programmed using the inter-integrated circuit (I²C) protocol, and the related digital circuits are synthesized. In addition to the output driver, the entire data path from the serializer to the pre-driver is configured as a voltage-mode circuit instead of a current-mode circuit to increase power efficiency.
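As a quick arithmetic check on why one 8 GHz clock can serve both modes, the sketch below derives the symbol rate, unit interval, and half-rate clock frequency from the bit rate and the number of bits per symbol. It assumes half-rate clocking (one clock period spans two symbols); the function name and the returned keys are illustrative.

```python
def link_rates(bit_rate_gbps, bits_per_symbol):
    """Return symbol rate, unit interval, and half-rate clock for a serial link."""
    symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol
    return {
        "symbol_rate_gbaud": symbol_rate_gbaud,
        "ui_ps": 1000.0 / symbol_rate_gbaud,          # 1 / GBaud, in picoseconds
        "half_rate_clock_ghz": symbol_rate_gbaud / 2,  # two symbols per clock period
    }


if __name__ == "__main__":
    print("NRZ :", link_rates(16, 1))   # 16 Gb/s, 1 bit per symbol
    print("PAM4:", link_rates(32, 2))   # 32 Gb/s, 2 bits per symbol
```

Both modes run at 16 GBaud with a 62.5 ps unit interval, so the serializer clocking does not need to change when the modulation is switched.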

III. CIRCUIT IMPLEMENTATION
A. SERIALIZER
The half-rate serializer in Fig. 5 serializes parallel data to 1-bit NRZ data and 2-bit PAM4 data, using the 8 GHz high-speed clock and its divided clocks. Fig. 5 shows a 2-to-1 multiplexer (MUX) with a 1 unit interval (UI) delay, Z⁻¹, for the 2-tap FFE; the one-hot encoder; and the entire serializer structure, in which the data paths of the 32-to-1 NRZ and 64-to-2 PAM4 serializers are separated. This separated structure occupies a large area and consumes a large amount of power in the clock distribution. The one-hot encoder at the last stage of the PAM4 data path is composed of NOR gates whose delays vary with the pattern of the input data. Therefore, data-dependent jitter (DDJ) of approximately 10 ps is generated at the output of the one-hot encoder. To remove this DDJ, a re-timer with a 16 GHz full-rate clock is utilized. However, it is difficult to generate and distribute a full-rate clock in a power-efficient manner. Fig. 6 shows a modified serializer in which the one-hot encoder is placed before the 2-to-1 MUXs. The output signals of the one-hot encoder are retimed using an 8 GHz half-rate clock. This modification requires two additional 2-to-1 MUXs with a Z⁻¹ delay. Compared with the previous structure in Fig. 5, the DDJ of the serializer output is reduced by 90 %, from 10 ps to 1 ps. By sharing a 64-to-4 serializer between the NRZ and PAM4 paths, we can reduce the power consumption and area of the clock distribution.
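The role of the one-hot encoder can be illustrated with a behavioral sketch of a generic 2-to-4 decoder built from NOR gates. The bit ordering, the output naming after the D00/D01/D10/D11 levels, and the helper `active_driver` are assumptions for illustration; the actual gate-level netlist is not given in the text.

```python
def nor(a, b):
    """2-input NOR on 0/1 values."""
    return int(not (a or b))


def one_hot_encode(msb, lsb):
    """Map a 2-bit PAM4 symbol to (d00, d01, d10, d11); exactly one output is 1."""
    msb_n, lsb_n = 1 - msb, 1 - lsb
    return (
        nor(msb, lsb),      # high only for symbol 00
        nor(msb, lsb_n),    # high only for symbol 01
        nor(msb_n, lsb),    # high only for symbol 10
        nor(msb_n, lsb_n),  # high only for symbol 11
    )


def active_driver(one_hot):
    """HSD produces the D00 and D10 levels, LSD the D01 and D11 levels."""
    d00, d01, d10, d11 = one_hot
    return "HSD" if (d00 or d10) else "LSD"


if __name__ == "__main__":
    for msb in (0, 1):
        for lsb in (0, 1):
            code = one_hot_encode(msb, lsb)
            print(msb, lsb, code, active_driver(code))
```

Because only one of the four outputs is high for any symbol, only one sub-driver is selected at a time, which is the property the serializer preserves when the encoder is moved ahead of the final 2-to-1 MUXs.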

B. NRZ/PAM4 DAC OUTPUT DRIVER
In our previous study [18], a voltage-mode transmitter for PAM4 that used two independent drivers (HSD and LSD) and regulators was introduced. Controlled by the one-hot encoder, the high-swing driver (HSD) and the low-swing driver (LSD) did not operate simultaneously. Therefore, unlike other PAM4 voltage-mode drivers with multiple segmentation [19], [20], the power consumption does not increase significantly even though the data path is divided into two. In this study, the driver is segmented to generate multiple voltage levels for equalization while retaining the low-power feature of the PAM4 voltage-mode driver structure in [18]. Fig. 7 shows the NRZ/PAM4 DAC output driver architecture. To support multi-level generation and the 2-tap FFE, the HSD and LSD are segmented into 36 and 12 units, respectively. The output voltages of VREG0, VREG1, and VREG2 are set to 500 mV, 333 mV, and 167 mV, respectively. T-coils are added to overcome the bandwidth limitation caused by the ESD protection diodes and package loading. As shown in Fig. 8, 16 output voltage levels are generated. In the HSD, the segmented drivers are arranged in four groups: x29 (main driver), x4, x2, and x1. The turn-on resistance of the unit (x1) HSD driver is 1800 Ω. The LSD is arranged in three groups: x9 (main driver), x2, and x1. The target resistance of the unit LSD driver is 600 Ω. The total resistance can be calibrated through the I²C interface for impedance matching. The impedance control step is approximately 2 Ω, and the impedance range of the controllable 4-bit driver covers the entire process, voltage, and temperature variation.
When equalization is not applied, the 4 output levels of the driver are D00_0, D01_1, D11_1, and D10_0. The additional 12 output levels for equalization are created according to the input conditions of the main driver and the other groups.
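The segment counts and unit resistances quoted for the two sub-drivers can be cross-checked with a few lines of arithmetic. The sketch takes the unit resistances to be in ohms and treats identical units as resistors in parallel; the constant and function names are illustrative.

```python
# Segment weights quoted for each sub-driver (main driver first).
HSD_GROUPS = (29, 4, 2, 1)
LSD_GROUPS = (9, 2, 1)

# Unit (x1) turn-on resistances, taken to be in ohms.
HSD_UNIT_OHMS = 1800.0
LSD_UNIT_OHMS = 600.0


def parallel_resistance(unit_ohms, n_units):
    """Resistance of n identical units connected in parallel."""
    return unit_ohms / n_units


if __name__ == "__main__":
    hsd_units = sum(HSD_GROUPS)
    lsd_units = sum(LSD_GROUPS)
    print("HSD:", hsd_units, "units,", parallel_resistance(HSD_UNIT_OHMS, hsd_units), "ohm")
    print("LSD:", lsd_units, "units,", parallel_resistance(LSD_UNIT_OHMS, lsd_units), "ohm")
```

Both sub-drivers come out at the same resistance when all of their units are on, which would match a conventional 50 Ω single-ended termination, although the text does not state the target value explicitly.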

C. PAM4 EQUALIZATION LOGIC
To remove the ISI properly using the 2-tap de-emphasis FFE, the tap-weight magnitudes should be proportional to the three transition amplitudes of the PAM4 signal. Fig. 9 describes the output levels of the DAC output driver according to data transitions. When de-emphasis is applied, the output levels of the output driver are determined by the PAM4
The schematic of the PAM4 equalization logic is described in Fig. 10. The PAM4 equalization logic consists of 14 identical logic units. Each logic unit has three stages with four AND and three OR gates. If the output delay of the PAM4 equalization logic is changed according to the input data patterns, the resultant skew of the data path can cause DDJ. Therefore, the implemented equalization logic is based on identical units, to minimize the skew among the 14 outputs.

D. MODE SELECTOR
The mode selector determines the operation mode of the output driver. Fig. 11 depicts the mode selector. The inputs of the mode selector are the PAM4 data (HS0P
Fig. 12(a) shows the output network model of the driver. An ESD clamp diode is added to withstand a 2-kV human body model (HBM) event, and its parasitic capacitance (C_ESD) is approximately 300 fF. A T-coil [23] is inserted to overcome the bandwidth problem caused by the parasitic capacitance. The inductances of L1 and L2 are 300 pH and the coupling coefficient (k) is approximately 0.5. The layout of the T-coil is shown in Fig. 12(b). The inductors L1 and L2 are implemented with the two thick upper metal layers. The simulation model of the T-coil is extracted using EMX of Integrand Software, Inc. [24]. Fig. 13 shows the 32-Gb/s PAM4 eye-diagrams with and without the T-coil. The T-coil improves the eye-height and eye-width by 20 mV and 0.1 UI, respectively. With the T-coil, the insertion loss (S21) is improved by 0.4 dB (from −1.2 dB to −0.8 dB) and the return loss (S11) is suppressed below −10 dB at 8 GHz.

IV. EXPERIMENTAL RESULTS
A 16/32 Gb/s dual-mode NRZ/PAM4 voltage-mode transmitter with a 2-tap FFE was fabricated using a 28-nm low-power CMOS process. As shown in the micro-photograph presented in Fig. 14, the total area of the transmitter, including the power-decoupling capacitor and T-coil, was 0.16 mm². The total test chip area was 3.56 mm², including the I²C block and I/Os. The test chip was mounted on a printed circuit board (PCB) using the chip-on-board (COB) method. The wire bond length of the COB was approximately 0.2 cm and the FR4 PCB trace from the bonding pads to the SMA connector was approximately 3 cm. Fig. 15 shows the test board and test setup. The test chip was calibrated using an I²C interface. A Keysight J-BERT N4903B supplied a high-speed clock of 8 GHz and measured the bit-error rates. The output eye-diagrams of the transmitter were obtained using a Tektronix DSA71254C instrument and P7313SMA differential probes. The termination voltage was set by the P7313SMA probe [25]. The driver impedance was manually controlled through the I²C interface.
The eye-diagrams of the 16-Gb/s NRZ operation mode are shown in Fig. 16. Without equalization, the eye-height and eye-width were measured to be 312 mV and 0.68 UI, respectively. With equalization, the eye-height and eye-width improved slightly, to 322 mV and 0.78 UI, respectively. Fig. 17 shows the eye-diagrams of the 32-Gb/s PAM4 operation mode. Without equalization, the eye was completely closed. With equalization, the eye-height and eye-width improved to 50 mV and 0.32 UI, respectively, when 7,000 UIs were accumulated. The eye opening at a bit error rate (BER) of 10⁻¹² is 0.115 UI, as confirmed from the measured bathtub curve. The measurement of the level separation mismatch ratio (R_LM) of PAM4 is shown in Fig. 18. The measured waveform confirmed an R_LM of 0.96.
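For reference, the level separation mismatch ratio can be computed from the four mean PAM4 levels using the definition commonly found in IEEE 802.3 PAM4 specifications. The sketch below follows that definition; the example voltages are hypothetical and are not the measured levels of the prototype.

```python
def level_mismatch_ratio(v0, v1, v2, v3):
    """R_LM from the four mean PAM4 levels, ordered from lowest to highest.

    v_mid is the midpoint of the outer levels; es1 and es2 are the normalized
    positions of the two inner levels (1/3 each for ideal spacing).
    """
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)
    es2 = (v2 - v_mid) / (v3 - v_mid)
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


if __name__ == "__main__":
    # Ideal, equally spaced levels.
    print(level_mismatch_ratio(-3, -1, 1, 3))
    # Hypothetical unequally spaced levels in millivolts (illustration only).
    print(level_mismatch_ratio(-250.0, -90.0, 80.0, 250.0))
```

Equally spaced levels give a ratio of 1, and any compression or expansion of an inner eye lowers the value, so the ratio summarizes how evenly the three PAM4 eyes are separated.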
The power breakdown of the transmitter in the various operation modes is shown in Table 1. The clock-distribution circuit accounts for the largest part of the power consumption. Without equalization, the total power in the NRZ and PAM4 modes was 26.79 mW and 41.13 mW, respectively. When equalization was applied, the transmitter power in the NRZ and PAM4 modes increased by approximately 0.5 mW and 5.7 mW, respectively.
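The relation between these power figures and per-bit energy is a unit conversion: milliwatts divided by gigabits per second gives picojoules per bit. The sketch below applies it to the PAM4 numbers quoted above and also expresses the equalization overhead as a fraction of the non-equalized power; the function names are illustrative.

```python
def energy_per_bit_pj(power_mw, bit_rate_gbps):
    """Energy per bit in pJ: 1 mW / (1 Gb/s) = 1 pJ/bit."""
    return power_mw / bit_rate_gbps


def overhead_fraction(extra_mw, base_mw):
    """Added power relative to the power without equalization."""
    return extra_mw / base_mw


if __name__ == "__main__":
    pam4_pj = energy_per_bit_pj(41.13, 32)        # PAM4, without equalization
    pam4_eq = overhead_fraction(5.7, 41.13)       # PAM4 equalization overhead
    nrz_eq = overhead_fraction(0.5, 26.79)        # NRZ equalization overhead
    print(f"PAM4: {pam4_pj:.3f} pJ/bit, EQ overhead {pam4_eq:.1%}")
    print(f"NRZ : EQ overhead {nrz_eq:.1%}")
```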

V. CONCLUSION
A voltage-mode transmitter with 2-tap FFE that supported both NRZ and PAM4 signal transmission was designed and implemented using a 28-nm CMOS low-power process. The power-efficient 2-tap FFE for the PAM4 transmitter was realized by limiting the flexibility of the output signal levels and minimizing the overhead for the equalization. The proposed PAM4 data path operated by the one-hot encoder had low power characteristics due to the low toggle ratio. In addition, even if equalization was applied, the current consumption was increased by only 14%. In the NRZ and PAM4 modes, the entire transmitter exhibited energy efficiencies of 1.69 and 1.28 pJ/bit, respectively.

ACKNOWLEDGMENT
The EDA tool was supported by the IC Design Education Center (IDEC), South Korea.