Optimal Optical Receivers in Nanoscale CMOS: A Tutorial

The integration of optical receivers in nanoscale CMOS technologies is challenging due to less intrinsic gain and more noise compared to SiGe BiCMOS technologies. Recent research revealed that low-noise, high-gain, and low-power CMOS optical receivers can be designed by limiting the bandwidth of the front-end followed by equalization techniques that benefit from good switching characteristics offered by CMOS technologies. In this tutorial brief, the operation of decision-feedback equalization, feed-forward equalization, and continuous-time linear equalization is reviewed in the context of high baud-rate 2-PAM and 4-PAM modulation. Recent advances and techniques in 4-PAM optical receivers are reviewed and compared in terms of speed, sensitivity, bandwidth, and efficiency.

unscalable. CMOS optical receivers integrated with the SerDes obviate this problem and reduce size and cost.
Whereas SiGe BiCMOS offers high intrinsic gain, bandwidth, and low noise, nanoscale CMOS offers good switching circuits, including some recently reported equalizer circuits, but suffers from less gain and more noise. Therefore, there is a need to rearchitect receivers' analog front-ends to leverage nanoscale CMOS technologies' strengths.
The conventional way of supporting higher data rates in optical receivers is to extend the front-end bandwidth. However, this generally implies lower transimpedance [4], [5]. To break this trade-off, the bandwidth of the TIA can be intentionally limited below the conventional target of 0.5 × baud rate, allowing for higher gain at the cost of intersymbol interference (ISI). ISI can be corrected using equalization techniques suited to nanoscale CMOS implementation. This optimization is well studied for 2-PAM modulation [6], and many prototypes leveraging different equalization techniques have been developed [7]-[12]. However, 4-PAM modulation is more susceptible to bandwidth limitations because ISI is three times larger (relative to eye height) than in 2-PAM. As a result, it is important to study this optimization in the context of 4-PAM.
Section II of this tutorial covers optical receivers based on continuous-time linear equalization (CTLE), feed-forward equalization (FFE), and decision-feedback equalization (DFE). Section III compares these equalization techniques. Section IV reviews recent advances in optical receiver design, with emphasis on 4-PAM optical receivers, and looks at design trends. Finally, Section V concludes the tutorial.
II. FRONT-END OPTIMIZATION
Shunt-feedback transimpedance amplifiers (SFTIAs), particularly inverter-based ones, have been the most popular nanoscale CMOS TIAs in recent years [3], [12]-[16]. Inverters offer high linearity, high transconductance per unit bias current, self-biasing in a feedback configuration, and high swing. We consider the TIA in Fig. 1 (a) with the small-signal model in Fig. 1 (b).
In this model, the input capacitance, $C_{IN}$, is the sum of the photodetector capacitance, $C_{PD}$, the pad capacitance, $C_{PAD}$, and the inverters' gate-to-source capacitances, $C_{gs}$. $C_a$ is the capacitance of the following stage, and $R_a$ is the output resistance of the TIA. The combined transconductance of the NMOS and the PMOS devices is $g_m$, and $R_f$ is the feedback resistor. Finally, the model includes the gate-to-drain capacitance, $C_{gd}$, which is important due to the Miller effect. Specifically, $1/(R_f C_{gd})$ could become the dominant pole for large transistor sizes and feedback resistances. The TIA transfer function obtained from this model is referred to as (1) in the rest of this brief. Some parameters in this model are coupled. Namely, $g_m$, $C_{gs}$, and $C_{gd}$ are coupled through the technology transition frequency, $f_t$, and $R_a$ and $g_m$ are coupled because the gain of the inverter, $A = g_m R_a$, is the transistors' intrinsic gain. Table I summarizes the numerical values that will be used in this brief alongside the relationships between coupled parameters. We will use $g_m$ and $R_f$ as our design parameters. Model values are based on [12] and [3], which report a 64 Gb/s 2-PAM and a 100 Gb/s 4-PAM optical receiver, respectively. For our simulations, we target a bit rate of 64 Gb/s in the 2-PAM case and 64 Gbaud (128 Gb/s) in the 4-PAM case.
In the following subsections, we take the following approach: 1) describe and illustrate the equalization technique used, 2) calculate the worst-case eye opening assuming 2-PAM signaling using peak distortion analysis, 3) calculate the output-referred noise from the model, 4) calculate the worst-case signal-to-noise ratio, $\mathrm{SNR}_{WC}$, at the output of the receiver, and finally 5) extend the conclusions to 4-PAM.

A. CTLE-Based Optical Receivers
We first consider an SFTIA followed by a CTLE stage that recovers a part of the bandwidth and reduces ISI. As such, the TIA in (1) can be redesigned to have a bandwidth that is χ times lower than that of an unequalized (UE) implementation. An ideal unity-gain CTLE stage that recovers the full bandwidth has the transfer function given in (4), where $f_{TIA}$ is the 3-dB bandwidth of the TIA preceding the CTLE, and $Q$ is the quality factor of the CTLE, taken here as $1/\sqrt{2}$. The zeros of the CTLE perfectly cancel the poles of the TIA in (1), and the pole frequencies of the CTLE are χ times higher than those of the preceding TIA. It should be noted that a practical CTLE stage has more poles than zeros.
The transfer function from the input to the output is the product of (1) and (4). Thus, R f can be increased, reducing the bandwidth of the TIA while the CTLE stage recovers that bandwidth. Practically, the value of χ cannot be too large because it leads to: 1) excessive peaking in the CTLE stages leading to gain and group delay variations; 2) decreased tunability and increased susceptibility to PVT variations [18].
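The pole-zero cancellation described above can be checked numerically. The following is a minimal sketch, not the authors' model: it assumes the TIA is a normalized second-order low-pass with $Q = 1/\sqrt{2}$ (so that its 3-dB bandwidth equals its natural frequency) and that the ideal CTLE has the form $H_{CTLE}(s) = \frac{s^2/\omega_0^2 + s/(Q\omega_0) + 1}{s^2/(\chi\omega_0)^2 + s/(Q\chi\omega_0) + 1}$ with $\omega_0 = 2\pi f_{TIA}$. The function names and the 10 GHz example value are hypothetical.

```python
import numpy as np

Q_BUTTERWORTH = 1 / np.sqrt(2)

def second_order(f, f0, q=Q_BUTTERWORTH):
    """Unity-DC-gain second-order low-pass response evaluated at frequency f (Hz)."""
    s = 1j * np.asarray(f, dtype=float) / f0  # normalized complex frequency
    return 1.0 / (s**2 + s / q + 1.0)

def ideal_ctle(f, f_tia, chi, q=Q_BUTTERWORTH):
    """Assumed ideal CTLE: zeros at the TIA poles, poles chi times higher."""
    return second_order(f, chi * f_tia, q) / second_order(f, f_tia, q)

f_tia, chi = 10e9, 2.0                     # hypothetical example values
f = np.array([0.0, f_tia, chi * f_tia])    # DC, TIA bandwidth, equalized bandwidth
h_total = second_order(f, f_tia) * ideal_ctle(f, f_tia, chi)
gain_db = 20 * np.log10(np.abs(h_total))   # overall TIA/CTLE response in dB
```

Under these assumptions the cascade is flat at DC and reaches its 3-dB point at χ × $f_{TIA}$, which is the bandwidth recovery the text describes.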
As the total bandwidth of the TIA/CTLE ($\chi f_{TIA}$) becomes smaller, i.e., below 0.5 × baud rate, the signal will not have sufficient time to settle, leading to a degradation of gain. Moreover, this leads to ISI, further reducing the eye opening. This is illustrated in the pulse responses shown in Fig. 2 (a) for various $\chi f_{TIA}$, where precursors and postcursors are introduced when the bandwidth is far below the baud rate. From this, for 2-PAM, the worst-case eye opening, $V_{ISI}$, is calculated in (5) from the main cursor, $V_{A,0}$, and the $i$-th pre/postcursors, $V_{A,i}$. This method for finding the eye opening is peak distortion analysis (PDA), and it is extensible to 4-PAM [19].
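As an illustration of peak distortion analysis, the sketch below assumes the standard form $V_{ISI} = V_{A,0} - \sum_{i \neq 0} |V_{A,i}|$ for 2-PAM. The 4-PAM variant assumes equally spaced levels over the same swing, so each of the three eyes starts from one third of the main cursor while the worst-case ISI is unchanged. Both function names and the cursor values are hypothetical, not taken from the brief.

```python
def pda_eye_pam2(cursors, main_index):
    """Worst-case 2-PAM eye opening: main cursor minus the sum of all |ISI| terms."""
    main = cursors[main_index]
    isi = sum(abs(v) for i, v in enumerate(cursors) if i != main_index)
    return main - isi

def pda_eye_pam4(cursors, main_index):
    """Worst-case 4-PAM eye opening, assuming equally spaced levels (eye = main/3)."""
    main = cursors[main_index]
    isi = sum(abs(v) for i, v in enumerate(cursors) if i != main_index)
    return main / 3.0 - isi

# Hypothetical sampled pulse response: one precursor, main cursor, two postcursors.
cursors = [0.02, 1.0, 0.10, 0.03]
eye2 = pda_eye_pam2(cursors, main_index=1)
eye4 = pda_eye_pam4(cursors, main_index=1)
```

With these example cursors the same 0.15 of total ISI closes 15% of the 2-PAM eye but 45% of each 4-PAM eye, matching the factor of three quoted in the introduction.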
To understand the benefit of a CTLE, we next define the worst-case signal-to-noise ratio ($\mathrm{SNR}_{WC}$) as a function of $f_{3dB}/f_{baud}$ and χ. We will use $f_{3dB}$ to refer to the overall 3-dB bandwidth of the TIA/CTLE in CTLE-based receivers, and to the 3-dB bandwidth of the TIA in the FFE- and DFE-based receivers. We begin by considering the noise sources in the SFTIA: the channel thermal noise, $\overline{I_{n,g_m}^2} = 4kT\gamma g_m$, and the thermal noise of the feedback resistor, $\overline{I_{n,R_f}^2} = 4kT/R_f$. The calculation of the noise at the output of the TIA can be simplified by splitting $\overline{I_{n,R_f}^2}$ as in Fig. 1 [20]. The resulting TIA output power spectral density, $S_{out}$, is given by (6), where $Z_o$ is the output impedance of the TIA. We later use (6) in FFE and DFE noise calculations. Here, we are interested in the power spectral density at the output of the CTLE stage, $S_{CE}(s)$. Finally, the worst-case signal-to-noise ratio ($\mathrm{SNR}_{WC}$) is defined in (10) as the ratio of the eye opening found from PDA to the RMS noise.
We plot $\mathrm{SNR}_{WC}$ as a function of χ and $f_{3dB}/f_{baud}$ as shown in Fig. 2 (b) and (c), respectively. In constructing these plots, we sweep the values of $g_m$ and $R_f$ and pick the best achievable $\mathrm{SNR}_{WC}$ for a given $f_{3dB}/f_{baud}$ or χ.
From Fig. 2 (b), we observe that $\mathrm{SNR}_{WC}$ improves as χ increases. However, this improvement is more pronounced when going from χ = 1 to χ = 1.5 than when going from χ = 1.5 to χ = 2. This is because, while employing a CTLE with a reduced-bandwidth TIA helps in suppressing white noise, the colored noise is unaffected [6], [18], and using large values of χ provides only marginal improvement because the colored noise component dominates.
The worst-case SNR, $\mathrm{SNR}_{WC}$, is plotted as a function of $f_{3dB}/f_{baud}$ in Fig. 2 (c) for a UE TIA and for a TIA followed by a CTLE with χ = 2. For 2-PAM signaling, the optimal $f_{3dB}$ in the UE case is 0.3 × $f_{baud}$, and it increases to 0.39 × $f_{baud}$ in the CTLE-based receiver. A lower $f_{3dB}$ results in ISI that degrades $\mathrm{SNR}_{WC}$, while an $f_{3dB}$ larger than 0.3 × $f_{baud}$ increases the output-referred integrated noise voltage, also degrading $\mathrm{SNR}_{WC}$. The CTLE implementation has a 3 dB better $\mathrm{SNR}_{WC}$ than the UE implementation. For 4-PAM modulation, the optimal $f_{3dB}$ in the CTLE-based receiver is around 0.53 × $f_{baud}$ compared to 0.45 × $f_{baud}$ in the UE implementation, and the CTLE provides around 4.7 dB of $\mathrm{SNR}_{WC}$ improvement. We note that the optimal $f_{3dB}$ for 4-PAM is 1.38× higher (relative to baud rate) than for 2-PAM, significantly less than the 2× increase in data rate afforded by 4-PAM. We also note that the bandwidth of the TIA itself in the CTLE-based implementation ($f_{3dB}$/χ, i.e., roughly 0.2 × $f_{baud}$ for 2-PAM with χ = 2) is less than that of the UE TIA.

B. Feed-Forward Equalization
A feed-forward equalizer (FFE)-based optical receiver can be modeled as shown in Fig. 3 (a). Each FFE tap produces a delayed, scaled version of the input pulse. By adding time-shifted and scaled versions of the signal, pre- and postcursors can be reduced. This operation is demonstrated in Fig. 3 (b) for a three-tap FFE. Once tap weights are set, the worst-case vertical eye opening is calculated from the equalized pulse response using (5).
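The tap operation can be sketched as a discrete convolution of the baud-spaced pulse response with the tap weights, followed by peak distortion analysis on the result. This is an illustrative sketch only: the cursor values and the three tap weights are hypothetical and were chosen to cancel the nearest cursors, not to maximize the worst-case SNR.

```python
import numpy as np

def ffe_equalize(cursors, taps):
    """Equalized pulse response: sum of delayed, scaled copies of the input pulse."""
    return np.convolve(cursors, taps)

def worst_case_eye(pulse):
    """2-PAM peak distortion analysis, taking the largest sample as the main cursor."""
    magnitudes = np.abs(np.asarray(pulse, dtype=float))
    main = magnitudes.max()
    return main - (magnitudes.sum() - main)

cursors = [0.1, 1.0, 0.3]    # hypothetical precursor, main cursor, postcursor
taps = [-0.1, 1.0, -0.3]     # hypothetical 3-tap FFE (pre, main, post)

eye_raw = worst_case_eye(cursors)
eye_eq = worst_case_eye(ffe_equalize(cursors, taps))
```

In this example the adjacent cursors are cancelled exactly, but small residual cursors appear two symbol intervals away, which is why a finite FFE reduces ISI without fully removing it.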
When selecting tap weights in FFE-based receivers, the noise enhancement of the FFE should be considered. The FFE filter sums scaled and delayed versions of the same signal, and because the noise at the output of the TIA is colored, the noise samples present in these signals are correlated. This should be considered when calculating the output noise power. To calculate the output noise power, we begin by calculating the autocorrelation of the noise at the output of the TIA from its power spectral density in (6). The output-referred RMS noise voltage at the output of an N-tap FFE, $V_{n,FFE}$, can then be calculated as in (12), where $\alpha_i$ and $\alpha_k$ are the $i$-th and $k$-th tap coefficients, respectively. Tap coefficients appear in both the $V_{ISI}$ and $V_{n,FFE}$ calculations. Therefore, the optimal coefficients maximize $\mathrm{SNR}_{WC}$ as opposed to minimizing ISI or noise. Tap weights can be calculated using adaptive algorithms that minimize the error between the output of the FFE and a training sequence. Alternatively, they can be calculated from the pulse response and noise autocorrelation function [21].
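The effect of correlated noise samples can be sketched with the usual quadratic form for the variance of a weighted sum, $V_{n,FFE}^2 = \sum_i \sum_k \alpha_i \alpha_k R_n((i-k)T)$. The symbol $R_n$ for the noise autocorrelation and $T$ for the tap spacing are assumed names, and the autocorrelation samples below are hypothetical rather than derived from the TIA model.

```python
import numpy as np

def ffe_noise_rms(taps, autocorr):
    """RMS noise after an N-tap FFE.

    taps     -- tap coefficients alpha_0 .. alpha_{N-1}
    autocorr -- noise autocorrelation samples R_n(m*T) for m = 0 .. N-1
    """
    taps = np.asarray(taps, dtype=float)
    n = len(taps)
    # Matrix of |i - k| lags, used to look up R_n((i - k) * T) for every tap pair.
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    r_matrix = np.asarray(autocorr, dtype=float)[lags]
    return float(np.sqrt(taps @ r_matrix @ taps))

taps = [-0.1, 1.0, -0.3]                              # hypothetical tap weights
rms_white = ffe_noise_rms(taps, [1.0, 0.0, 0.0])      # uncorrelated samples
rms_colored = ffe_noise_rms(taps, [1.0, 0.5, 0.1])    # correlated samples
```

For uncorrelated noise the result reduces to the root-sum-square of the taps, which exceeds the input noise whenever extra taps are active. With correlated noise the cross terms matter, so the tap weights that minimize ISI are generally not the ones that maximize the worst-case SNR.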
Finally, $\mathrm{SNR}_{WC}$ can be calculated using (10). Fig. 3 (c) plots $\mathrm{SNR}_{WC}$ versus $f_{3dB}/f_{baud}$ for both 2-PAM and 4-PAM receivers. The optimal bandwidth of a 3-tap FFE-based receiver for the 2-PAM case is around 0.13 × $f_{baud}$, and it offers 3.4 dB of $\mathrm{SNR}_{WC}$ improvement. In the case of 4-PAM, the optimal bandwidth is 0.25 × $f_{baud}$ with 4.5 dB of $\mathrm{SNR}_{WC}$ improvement.

C. Decision Feedback Equalization (DFE)
Typical DFE-based optical receivers have a finite impulse response (FIR) feedback loop, as shown in the block diagram in Fig. 4 (a). For an M-tap FIR DFE-based receiver, each tap is designed to eliminate the corresponding postcursor. This operation is illustrated in Fig. 4 (b) using two taps. The 2-PAM worst-case vertical eye opening is then calculated as in (5), but with the first M postcursors removed. For an infinite-length DFE, all postcursors are removed, and the precursors limit the vertical eye opening. An infinite-length DFE can be approximated with either an analog feedback filter or a long digital FIR feedback filter. One challenge in DFE design is the feedback loop's timing requirement: the slicer output must propagate through the feedback filter to the slicer input within one baud interval. Digital DFE implementations address this with parallelism, implying a complexity and power consumption that increase exponentially with the number of taps [12]. Recently, however, novel DFE architectures have broken this difficult trade-off, allowing for the pipelining of DFE logic [22]-[24].
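The effect of the number of DFE taps on the eye opening can be sketched by removing the first M postcursors before applying peak distortion analysis. This is a simplified illustration assuming ideal, error-free cancellation; the function name and the cursor values are hypothetical.

```python
def dfe_eye_pam2(cursors, main_index, num_taps=None):
    """Worst-case 2-PAM eye opening after an ideal DFE.

    num_taps -- number of postcursors cancelled; None models an infinite-length DFE.
    """
    main = cursors[main_index]
    precursor_isi = sum(abs(v) for v in cursors[:main_index])
    if num_taps is None:
        residual_post = 0.0  # every postcursor is cancelled
    else:
        first_uncancelled = main_index + 1 + num_taps
        residual_post = sum(abs(v) for v in cursors[first_uncancelled:])
    return main - precursor_isi - residual_post

# Hypothetical low-bandwidth pulse response: one precursor and three postcursors.
cursors = [0.05, 1.0, 0.4, 0.2, 0.1]
eye_no_dfe = dfe_eye_pam2(cursors, 1, num_taps=0)
eye_two_tap = dfe_eye_pam2(cursors, 1, num_taps=2)
eye_infinite = dfe_eye_pam2(cursors, 1)
```

With these example cursors, two taps recover most of the eye, and the infinite-length case is limited only by the precursor, consistent with the behavior described above.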
Feedback signals are produced from the noiseless signal at the output of the slicer, so the DFE feedback signal can be noiseless. The noise voltage at the input of the decision circuit, assuming a noiseless feedback loop, is therefore the RMS noise at the output of the TIA, obtained from (6). Unlike FFE-based optical receivers that enhance noise (see (12)), the DFE loop has no impact on the output-referred noise of the TIA. The $\mathrm{SNR}_{WC}$ is calculated using (10) and plotted in Fig. 4 (c) versus $f_{3dB}/f_{baud}$ for both a 2-tap and an infinite-length DFE. The optimal bandwidth for 2-PAM signals is around 0.18 × the data rate (which equals the baud rate for 2-PAM), while it is 0.22 × the baud rate for 4-PAM signals. An ideal infinite-length DFE allows for a bandwidth reduction down to 0.04 × the baud rate before the impact of the precursors starts limiting $\mathrm{SNR}_{WC}$. A two-tap DFE improves $\mathrm{SNR}_{WC}$ by around 4 dB in the case of 2-PAM and by around 5.5 dB in the case of 4-PAM. As the number of taps increases, the DFE curves approach the infinite-length curve.
III. COMPARISON
An overlay of the $\mathrm{SNR}_{WC}$ curves for all types of receivers is shown in Fig. 5. As seen, a 2-tap DFE-based receiver exhibits the highest $\mathrm{SNR}_{WC}$. CTLE- and 3-tap FFE-based optical receivers exhibit similar $\mathrm{SNR}_{WC}$ improvement. FFE-based receivers exhibit less $\mathrm{SNR}_{WC}$ improvement than DFE-based receivers because of noise enhancement. Meanwhile, CTLE-based receivers provide less $\mathrm{SNR}_{WC}$ improvement because, while they provide significant improvement in white noise suppression, they do not have any impact on colored noise. Finally, we note that $\mathrm{SNR}_{WC}$ scales in proportion to the input current, $I_{pp}$, without affecting the optimal bandwidth in each case.

IV. STATE-OF-THE-ART
Table II summarizes some of the most recently published high-speed 2-PAM and 4-PAM receivers. The receiver in [3] uses a 2-tap FFE and a 2-tap DFE and lowers the bandwidth to optimize the sensitivity. It was found that the optimal bandwidth for 4-PAM receivers is higher (relative to baud rate but not bit rate) than for 2-PAM receivers for a given DFE size, especially when the effect of input jitter is included. This combination of 4-PAM modulation and input jitter amplification by the lower-bandwidth front-end [28] led to the choice of a 20 GHz front-end, which is 0.4 × $f_{baud}$. The number of taps is limited to two, as more taps lead to increased power consumption while only providing marginal improvement in SNR. A high data rate of 100 Gb/s was achieved despite the low bandwidth.
Reference [13] describes a full-bandwidth 4-PAM receiver where dc-coupled CMOS inverters are used in the entire signal path. A bandwidth of 27 GHz (0.51 × $f_{baud}$) is achieved by using series inductive peaking at the input TIA stage and shunt inductive peaking between stages. It achieves a data rate of 106.25 Gb/s. Reference [14] describes a low-power SFTIA with shunt inductive peaking. A record speed of 128 Gb/s is achieved. However, only electrical measurements are reported, and the DC gain is 59.3 dB, which is low compared to other receivers. Reference [25] describes a full-bandwidth SFTIA that uses both shunt and series peaking to achieve a high bandwidth of 60 GHz to support 112 Gb/s 4-PAM modulation. Reference [27] describes a 50 Gb/s receiver that uses T-coils in the TIA stage along with a CTLE stage to achieve a bandwidth of 30 GHz. All four receivers have an $f_{3dB}/f_{baud}$ > 0.5.
The receiver described in [26] optimizes SNR performance at the slicer input by limiting the bandwidth of the SFTIA to 0.3 × baud rate and uses a 2-tap DFE to eliminate the resulting ISI. This receiver achieves a data rate of 32 Gb/s while using a front-end bandwidth of only 4.8 GHz. Similarly, [15] is an optimized 64 Gb/s receiver that limits the bandwidth of the front-end to 12 GHz (0.375 × baud rate) and eliminates ISI by using a 3-tap DFE.
Reference [12] is a 64 Gb/s low-bandwidth 2-PAM receiver in which the bandwidth of the TIA is only 15 GHz, followed by a 1-tap DFE to remove the first postcursor. According to [12], the number of taps is limited to one, as more taps resulted in only minor SNR improvement. In the 4-PAM receiver in [3], the ratio of bandwidth to baud rate (0.4) is almost twice that of this 2-PAM receiver (15 GHz at 64 Gbaud, or about 0.23), which is in line with our findings.
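The bandwidth-to-baud-rate ratios quoted in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the bandwidths and bit rates stated in the text; the 4-PAM entries for [15] and [26] are an inference from the quoted ratios (0.375 and 0.3), not something the text states directly, and the function name is hypothetical.

```python
import math

def bandwidth_to_baud_ratio(bandwidth_ghz, bit_rate_gbps, pam_levels):
    """Front-end bandwidth divided by baud rate, with baud = bit rate / log2(levels)."""
    baud_rate_gbaud = bit_rate_gbps / math.log2(pam_levels)
    return bandwidth_ghz / baud_rate_gbaud

# (bandwidth in GHz, bit rate in Gb/s, number of PAM levels)
receivers = {
    "[3]": (20.0, 100.0, 4),
    "[12]": (15.0, 64.0, 2),
    "[13]": (27.0, 106.25, 4),
    "[15]": (12.0, 64.0, 4),   # 4-PAM inferred from the quoted 0.375 ratio
    "[26]": (4.8, 32.0, 4),    # 4-PAM inferred from the quoted 0.3 ratio
}

ratios = {ref: bandwidth_to_baud_ratio(*spec) for ref, spec in receivers.items()}
```

The resulting ratio for [3] is about 1.7 times that of [12], which is the comparison between the 4-PAM and 2-PAM low-bandwidth designs made above.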
It follows from Table II and this discussion that both full-bandwidth receivers that employ inductive peaking and limited-bandwidth equalized receivers are still in use in optical receivers. With the development of high-speed analog-to-digital converters and ADC-based front-ends that allow for sophisticated equalization, especially in sub-10 nm CMOS, we anticipate that low-bandwidth front-ends may see even more use in the future.
V. CONCLUSION
This tutorial brief covered the optimization of the front-end of optical receivers. We looked into different optimization techniques used and quantified the optimal bandwidths for 2-PAM and 4-PAM signaling. We found that the optimal bandwidth relative to the baud rate is higher in the case of 4-PAM modulation, but it is, in fact, lower relative to the bit rate. This is because of the 2× bit rate increase offered by 4-PAM. A review of the state-of-the-art optical receivers was presented. The ongoing trends are the implementation of bandwidth extension techniques and equalization techniques to enable the design of 4-PAM receivers capable of achieving the data rates required by the 400G Ethernet standard and emerging 800G and 1.6T standards. We anticipate that low-bandwidth techniques will see use in ADC-based nanoscale CMOS optical front-ends.