CMOS-Driven VCSEL-Based Photonic Links: an Exploration of the Power-Sensitivity Trade-Off

This article explores the power-sensitivity trade-off in optical receivers with the aim of improving the energy-efficiency of the overall link. Optical receivers with field-effect transistor (FET) front-ends (FEs) are usually designed for optimal noise performance by matching the circuit's input capacitance ($C_{I}$) to the total input parasitic capacitance ($C_{D}$). However, the receiver's power dissipation is also proportional to the input capacitance $C_{I}$. Therefore, this paper studies the feasibility of the capacitive matching rule in the context of minimizing the power dissipation of the overall link. For that purpose, design trade-offs for the receiver, the transmitter, and the overall link are presented. Comparisons are made to determine how much the receiver can be downsized, sacrificing optimal noise performance, before its power reduction is offset by the increase in transmitter power. Simulation results show that energy-efficient links require low-power receivers with an input capacitance much smaller than that required for noise-optimum performance. As an example, for 25 Gb/s operation, an optical loss budget of 12.6 dB, and a receiver designed in 65 nm CMOS technology with a $C_{D}$ of 200 fF, the overall link dissipates 2.55 pJ/bit when the receiver's noise is minimized, which leads to a receiver with $C_{I}/C_{D}=1.29$. When optimized for overall link efficiency, the receiver size is significantly reduced to $C_{I}/C_{D}=0.38$, and the link's energy-efficiency improves to 1.41 pJ/bit.
If the link budget or knowledge of the transmitter side is incomplete, our analysis indicates that maximizing the gain, which corresponds to $C_{I}/C_{D}=0.5$, is a reasonable choice.


I. INTRODUCTION
In recent years, the increasing demand for bandwidth-intensive services such as social networks, online high-definition video streaming, video conferencing, online games, mobile internet, and cloud-based storage has caused exponential growth in internet traffic. The Cisco Global Cloud Index predicted that more than 20 zettabytes of data would be transferred in 2021, as shown in FIGURE 1(a) [1]. The figure also shows that the traffic has nearly tripled over the last five years. This growth is expected to continue, necessitating a corresponding increase in the number of hyperscale data centers that include thousands of high-speed interconnects.
The associate editor coordinating the review of this manuscript and approving it for publication was Harikrishnan Ramiah.
FIGURE 1 (b) shows that the total traffic is dominated by data communication that takes place within the data center. This in turn drives the development of robust, high-speed, and energy-efficient interconnects to transfer data around the data center. Electrical links are usually deployed for relatively short distances. To extend their reach, sophisticated equalization techniques can be deployed to compensate for channel losses [2]. This solution considerably increases design complexity and consumes more power and silicon area. Alternatively, optical links provide lower high-frequency losses, better immunity to interference, and higher capacity than their electrical counterparts. Therefore, optical links are widely used to communicate data between data centers, or within data centers for distances beyond 1 m. IEEE Ethernet standards [3] have been drafted to specify the performance of optical interconnects. Because hyperscale data centers include thousands of high-speed interconnects, recent research suggests that optical interconnects must achieve an efficiency better than 1 pJ/bit at 25 Gb/s to keep the total power dissipation reasonable [4], [5]. In addition to being energy-efficient, optical links must be low-cost, with costs below tens of cents per Gb/s [4], [6]. Most short-reach optical links in data centers are based on vertical-cavity surface-emitting lasers (VCSELs) operating at 850 nm over multimode optical fiber (MMF) [7]. MMF provides a cost-efficient solution for short-reach optical links up to 300 m. Compared to its single-mode fiber (SMF) counterpart, MMF has a larger core diameter, which enables the use of optical connectors with relaxed tolerances and inexpensive optical components. CMOS-driven VCSEL-based optical links have recently been demonstrated for NRZ and PAM-4 operation in [8], [9], [10], [11], and [12].
FIGURE 2 shows the system-level diagram of a VCSEL-based MMF optical link typically used for short-reach (up to a few hundred meters) communications. The link operates as follows: high-speed serial data are fed to a laser diode driver (LDD) circuit that directly modulates the current flowing through the VCSEL. The modulated light emitted from the VCSEL is transmitted through an MMF to a photodiode (PD). The current generated by the PD is converted to a voltage by a transimpedance amplifier (TIA) and further amplified by a main amplifier (MA). Finally, a clock and data recovery (CDR) unit synchronizes an internal clock to the incoming data and uses it to sample and regenerate the data.
In short-reach photonic links, the transmitted optical modulation amplitude (OMA) must be sufficiently large that despite coupling and fiber losses, the received optical power exceeds the receiver's sensitivity. Better sensitivity reduces transmitter power dissipation. However, improving the sensitivity can incur significant power overhead in the receiver. Therefore, the power-sensitivity trade-off in optical receivers needs to be optimized to minimize the link's total power dissipation.
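The link-budget condition above reduces to decibel arithmetic. The following Python sketch computes the smallest transmitted OMA for a given receiver OMA sensitivity and optical loss budget. The helper names and the −10 dBm sensitivity in the example are hypothetical; only the 12.6 dB loss budget is taken from the abstract.

```python
import math


def to_dbm(p_watt: float) -> float:
    """Convert an optical power (or OMA) in watts to dBm."""
    return 10.0 * math.log10(p_watt / 1e-3)


def to_watt(p_dbm: float) -> float:
    """Convert a level in dBm back to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def required_tx_oma_dbm(rx_sensitivity_dbm: float, loss_budget_db: float,
                        margin_db: float = 0.0) -> float:
    """Smallest transmitted OMA (dBm) whose received OMA still meets the sensitivity.

    Coupling, connector, and fiber losses are lumped into loss_budget_db;
    margin_db is an optional, hypothetical engineering margin.
    """
    return rx_sensitivity_dbm + loss_budget_db + margin_db


if __name__ == "__main__":
    # Hypothetical receiver OMA sensitivity of -10 dBm with a 12.6 dB loss budget.
    tx_dbm = required_tx_oma_dbm(-10.0, 12.6)
    print(f"required TX OMA: {tx_dbm:.1f} dBm = {to_watt(tx_dbm) * 1e3:.2f} mW")
```

Each decibel of receiver sensitivity removes one decibel from the required transmitted OMA, which is why receiver noise and transmitter power are coupled.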
Sensitivity is a function of both the input-referred noise current of the receiver's analog front-end (TIA/MA) and the voltage amplitude requirements of the CDR driven by the front-end [13]. The input-referred noise of optical receivers with a FET front-end is usually minimized by choosing the receiver's input capacitance ($C_{I}$) equal to the total parasitic capacitance from the PD, pad, and wiring ($C_{D}$) [14]. The receiver's power dissipation is proportional to its transistor size and, hence, to its input capacitance. Therefore, maintaining the capacitive matching rule for high values of $C_{D}$ leads to a significant power overhead in the receiver for a marginal improvement in the input-referred noise. The increased total input capacitance ($C_{T} = C_{D} + C_{I}$) also restricts the TIA's maximum achievable gain for a targeted bandwidth [13]. This in turn necessitates cascading more MA stages to mitigate the power penalty incurred by the swing requirements of the CDR, further increasing power dissipation.
The observed noise-power trade-off raises a question about the practicality of the capacitive matching rule. In [15], it is shown that near-optimal noise performance can be obtained while drastically shrinking $C_{I}$ to one-fifth of $C_{D}$. In addition to reducing possible instability, this reduces the power dissipation of the TIA. This observation is supported by a more recent design in [16], where the TIA has a $C_{I}$ of only 20% of $C_{D}$ to reduce its power dissipation at the expense of a minor degradation in receiver sensitivity (only 0.3 dB). In [16], however, all analyses are performed under the capacitive matching rule, with no clear justification for the reduced $C_{I}$ in the implemented circuit (i.e., it is not shown why 0.3 dB is an acceptable degradation in receiver sensitivity). The TIA's transistor size not only sets the power dissipation and sensitivity of the receiver but also sets the required transmitted OMA. Thus, the transmitter power dissipation must be accounted for accurately when considering a noisier yet lower-power receiver. Co-optimization of the transmitter and the receiver is essential to achieve optimum energy-efficiency for the overall link.
The main challenges of transceiver co-optimization are discussed intuitively in [17]. In [5], [8], and [18], co-optimization is performed on actual links by changing supply voltages and/or bias currents to achieve the best link energy-efficiency at a given data rate and bit-error rate (BER). In [19], the trade-offs that limit the receiver sensitivity are analyzed, and the energy-efficiency of the link is then calculated using state-of-the-art photonic devices and laser drivers.
The end-to-end link modeling in [20] optimizes receiver sensitivity and power by studying their dependence on front-end design as well as on the requirements of the follow-on digital sampler. The experimental on-bench optimization in [5] and [18] is the most accurate methodology; however, the input capacitance is not adjustable post-fabrication. The equation-based approaches in [19] and [20] tend to rely on idealized approximations and assumptions in developing the models, which introduce modeling inaccuracies.
This work presents a link-aware receiver sensitivity optimization to minimize the power dissipation of the overall link. We show that energy-efficient links require low-power receivers with an input capacitance much smaller than that required for noise-optimum performance. The presented design framework uses numerical simulations based on extracted parameters to select the optimum FET size, the number of MA stages, and the transmitted OMA for minimum link power dissipation. Compared to prior work in link modeling [19], [20] and the blind receiver-side noise optimization in [14], the presented framework considers both frequency- and time-domain representations to accurately model the impact of design parameters on signal integrity. Transistor-level Spectre simulations confirm the accuracy of the framework. An initial version of this work by the author can be found in [21].
The rest of this paper is organized as follows: Section II discusses receiver modeling and revisits the analysis of the inverter-based TIA. Section III investigates the power-sensitivity trade-off for various receiver architectures, showing that maintaining the capacitive matching rule increases power dissipation for only a marginal improvement in sensitivity. Section IV models the transmitter side of the optical link and discusses the link budget. The optimization procedure is presented in Section V and then used to study how small, albeit noisy, the receiver should become to minimize the link's total power dissipation. Section VI discusses the impact of technology advances, bondwire inductance, alternative TIA topologies, and higher-order pulse amplitude modulation schemes on the power-sensitivity trade-off. Finally, Section VII concludes the work.

II. OPTICAL RECEIVER MODELLING
A. TRANSIMPEDANCE AMPLIFIER
The inverter-based TIA (Inv-TIA) in FIGURE 3 (a) is chosen for its superior noise performance and its moderate power dissipation, owing to current reuse between the PMOS and NMOS transistors. Unlike the common-gate TIA, the Inv-TIA is self-biased, which decouples the gain element from the transconductance of the input transistor and allows for optimization without being limited by DC bias constraints. The Inv-TIA is extensively used in recent research either as a wideband pre-amplifier followed by a multi-stage MA [2], [18], [22], [23] or as a limited-bandwidth pre-amplifier followed by an equalizer [9], [16], [24], [25].

B. SMALL-SIGNAL MODEL
The small-signal model of the Inv-TIA is depicted in FIGURE 3 (b). The CMOS inverter is modeled by its total transconductance $g_{m}$ and equivalent output resistance $r_{ds}$. $C_{D}$ includes the photodiode, wiring, and pad capacitance. $C_{gs}$ and $C_{gd}$ are the total gate-to-source and gate-to-drain capacitances, respectively. The capacitance $C_{o}$ includes the total drain-to-bulk capacitance $C_{db}$ and the loading capacitance of the subsequent stage, $C_{next}$. Therefore, the open-loop transfer function of the voltage amplifier can be written as $A(s) = A_{0}/(1 + sT_{A})$, where $A_{0} = g_{m}r_{ds}$ is the low-frequency voltage gain of the core amplifier and $T_{A} = r_{ds}C_{o}$ is the time constant at the output node. For a particular technology, $A_{0}$ is constant for a given supply voltage and $W_{p}$-to-$W_{n}$ ratio. Considering this model, the Inv-TIA exhibits a second-order transfer function given by

$$Z_{TIA}(s) = \frac{v_{out}}{i_{in}} = -\frac{A_{0}R_{F,TIA} - r_{ds} - sR_{F,TIA}r_{ds}C_{gd}}{(1 + A_{0}) + s\left[R_{F,TIA}C_{i} + r_{ds}(C_{i} + C_{o}) + (1 + A_{0})R_{F,TIA}C_{gd}\right] + s^{2}R_{F,TIA}r_{ds}\left[C_{i}C_{o} + C_{gd}(C_{i} + C_{o})\right]} \quad (1)$$

where $C_{i} = C_{D} + C_{gs}$ and $R_{F,TIA}$ is the feedback resistor. Therefore, the low-frequency transimpedance gain is given by

$$R_{T,0} = \left|Z_{TIA}(0)\right| = \frac{A_{0}R_{F,TIA} - r_{ds}}{1 + A_{0}} \quad (2)$$

Comparing the denominator of (1) with the standard transfer function of a second-order system, the natural frequency $\omega_{n}$ and the pole quality factor $Q$ can be calculated. The TIA's 3-dB bandwidth ($f_{TIA}$) is calculated as $f_{TIA} = \rho(Q)\,\omega_{n}/2\pi$, where $\rho$ is a function of the pole quality factor that converts the natural frequency to the corresponding 3-dB bandwidth based on the shape of the TIA's amplitude response [14]. Due to the pole-splitting effect introduced by the feedback capacitor $C_{gd}$, the TIA's effective input and output capacitances differ from $C_{gs}$ and $C_{o}$. The effective input capacitance is $C_{I} = C_{gs} + (1 + A_{0})C_{gd}$, and the effective output capacitance is denoted $C_{L}$. This means that the input capacitance $C_{I}$ is much larger than $C_{gs}$ due to the Miller effect, whereas $C_{L}$ is smaller than $C_{o}$. It is worth mentioning that both transistors contribute to the Miller capacitance; ignoring $C_{gd}$ may lead to inaccurate outcomes [26].
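As a numerical check of the model, the Python sketch below evaluates the second-order transimpedance obtained from nodal analysis of the small-signal circuit in FIGURE 3 (b) (numerator $A_{0}R_{F,TIA} - r_{ds} - sR_{F,TIA}r_{ds}C_{gd}$, denominator coefficients as in the second-order form above), extracts $\omega_{n}$ and $Q$ from the denominator, and locates the 3-dB bandwidth by bisection on $|Z_{TIA}(j2\pi f)|$ rather than through $\rho(Q)$. The helper names are hypothetical, and all element values in the example are placeholders, not the extracted values of Table 1.

```python
import math


def tia_coeffs(gm, rds, c_i, c_gd, c_o, r_f):
    """Numerator (n0, n1) and denominator (d0, d1, d2) coefficients; inverting sign dropped."""
    a0 = gm * rds
    n0, n1 = a0 * r_f - rds, -r_f * rds * c_gd          # right-half-plane zero near gm / C_gd
    d0 = 1.0 + a0
    d1 = r_f * c_i + rds * (c_i + c_o) + (1.0 + a0) * r_f * c_gd
    d2 = r_f * rds * (c_i * c_o + c_gd * (c_i + c_o))
    return (n0, n1), (d0, d1, d2)


def z_tia(f, *elems):
    """|Z_TIA(j*2*pi*f)| in ohms."""
    (n0, n1), (d0, d1, d2) = tia_coeffs(*elems)
    s = 2j * math.pi * f
    return abs((n0 + n1 * s) / (d0 + d1 * s + d2 * s * s))


def pole_params(*elems):
    """Natural frequency omega_n (rad/s) and pole quality factor Q of the denominator."""
    _, (d0, d1, d2) = tia_coeffs(*elems)
    return math.sqrt(d0 / d2), math.sqrt(d0 * d2) / d1


def f_3db(*elems, f_lo=1e6, f_hi=1e12):
    """3-dB bandwidth by log-domain bisection (assumes a single -3 dB crossing)."""
    target = z_tia(0.0, *elems) / math.sqrt(2.0)
    for _ in range(100):
        f_mid = math.sqrt(f_lo * f_hi)
        if z_tia(f_mid, *elems) > target:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return f_lo


if __name__ == "__main__":
    # Hypothetical: gm = 30 mS, rds = 200 ohm (A0 = 6), C_i = C_D + C_gs = 200 + 60 fF,
    # C_gd = 20 fF, C_o = 250 fF, R_F,TIA = 400 ohm.
    elems = (30e-3, 200.0, 260e-15, 20e-15, 250e-15, 400.0)
    wn, q = pole_params(*elems)
    print(f"R_T0 = {z_tia(0.0, *elems):.1f} ohm, f_n = {wn / 2 / math.pi / 1e9:.2f} GHz, "
          f"Q = {q:.2f}, f_TIA = {f_3db(*elems) / 1e9:.2f} GHz")
```

Bisecting on the magnitude avoids tabulating $\rho(Q)$ and also captures the small effect of the right-half-plane zero.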
Although the model includes many variables, the parasitic capacitances $C_{gs}$, $C_{db}$, and $C_{gd}$, the transconductance $g_{m}$, and the output conductance $r_{ds}^{-1}$ are all proportional to the transistor width ($W$). Therefore, the TIA's design space is defined by only three variables: $R_{F,TIA}$, $C_{D}$, and $W$. The number of variables can be further reduced by fixing $C_{D}$ at 200 fF. The effect of changing $C_{D}$ is studied in Section VI.
The parameters of a CMOS inverter with $C_{next} = C_{I}$ are extracted through simulation using Cadence Spectre and listed in Table 1. The circuit is simulated in TSMC 65 nm technology using a 1 V supply and biased at $V_{IN} = V_{OUT} = 0.44$ V. The bias point is slightly below $V_{DD}/2$ because the PMOS and NMOS transistors have equal widths, $W_{p} = W_{n} = 1\,\mu\text{m} \times N_{finger}$, where $N_{finger}$ is the number of fingers. The equal-sizing strategy maximizes the total transconductance for a given total width $W = W_{p} + W_{n}$ [27]. It is also confirmed that the per-finger current in Table 1 is low enough that the design has no electromigration issues in the layout. Using $N_{finger}$ as a proxy for the parasitic capacitances, transconductance, and output resistance allows the TIA's bandwidth, sensitivity, and power dissipation to be calculated. The parameters in Table 1 are then used with $R_{F,TIA}$ to calculate the bandwidth using (1). Points with amplitude peaking ($Q > 0.707$) are indicated by hollow markers. For a given $N_{finger}$, the bandwidth decreases toward larger $R_{F,TIA}$ due to the direct trade-off between bandwidth and gain. For a targeted bandwidth, $R_{F,TIA}$ needs to be reduced for both too large and too small values of $N_{finger}$, indicating that there is an optimum $N_{finger}$ that maximizes the gain for a fixed $f_{TIA}$. For example, in FIGURE 5 (b) the required $R_{F,TIA}$ and the resulting pole $Q$ are plotted as a function of $C_{I}/C_{D}$. For a very narrow front-end ($C_{I} \ll C_{D}$), the total output capacitance $C_{L}$ is much smaller than $C_{D}$, while the total input capacitance $C_{T}$ is dominated by the parasitic capacitance $C_{D}$. This gives the Inv-TIA two real poles (i.e., $Q < 0.5$), with the input pole at the lower frequency. As the transistor width increases, $C_{L}$ increases while $C_{T}$ is still dominated by $C_{D}$. As a result, the TIA exhibits an underdamped response with $Q > 0.5$. The increased $Q$ allows the TIA to employ a higher $R_{F,TIA}$ for a fixed $f_{TIA}$.
As the width continues to increase, the self-loading from $C_{f}$ forces the pole $Q$ to drop, which necessitates reducing $R_{F,TIA}$ to maintain the targeted bandwidth [26]. The gain from (2) is also plotted in FIGURE 5 (b).
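The sizing sweep described above (for each width, find the largest $R_{F,TIA}$ that still meets a target bandwidth, then evaluate the low-frequency gain $(A_{0}R_{F,TIA} - r_{ds})/(1 + A_{0})$) can be sketched in Python as follows. The per-finger element values are hypothetical placeholders rather than Table 1 data, $C_{next} = C_{I}$ follows the text, the 8 GHz target matches the TIA-only example of Section II.D, and the bisection on $R_{F,TIA}$ assumes that the bandwidth falls as $R_{F,TIA}$ grows.

```python
import math

# Hypothetical per-finger values (W_p = W_n = 1 um); placeholders, NOT Table 1.
GM1, RDS1 = 1.0e-3, 6.0e3                       # S, ohm -> A0 = 6 for every size
CGS1, CGD1, CDB1 = 1.2e-15, 0.4e-15, 1.0e-15    # F
C_D = 200e-15                                   # PD + pad + wiring capacitance


def elements(n_finger):
    """Scale the small-signal elements with the finger count."""
    gm, rds = GM1 * n_finger, RDS1 / n_finger
    c_gs, c_gd = CGS1 * n_finger, CGD1 * n_finger
    c_in = c_gs + (1 + gm * rds) * c_gd          # Miller input capacitance C_I
    c_o = CDB1 * n_finger + c_in                 # C_db plus C_next = C_I
    return gm, rds, C_D + c_gs, c_gd, c_o, c_in


def z_mag(f, gm, rds, c_i, c_gd, c_o, r_f):
    """|Z_TIA| of the second-order shunt-feedback model."""
    a0, s = gm * rds, 2j * math.pi * f
    num = a0 * r_f - rds - s * r_f * rds * c_gd
    den = ((1 + a0) + s * (r_f * c_i + rds * (c_i + c_o) + (1 + a0) * r_f * c_gd)
           + s * s * r_f * rds * (c_i * c_o + c_gd * (c_i + c_o)))
    return abs(num / den)


def bandwidth(el, r_f):
    """3-dB bandwidth by log-domain bisection."""
    args = (*el[:5], r_f)
    target, lo, hi = z_mag(0.0, *args) / math.sqrt(2), 1e6, 1e12
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        lo, hi = (mid, hi) if z_mag(mid, *args) > target else (lo, mid)
    return lo


def rf_for_bandwidth(el, f_target, r_hi=5e3):
    """Largest R_F,TIA meeting f_target (assumes bandwidth falls as R_F grows)."""
    r_lo = max(50.0, 2.0 / el[0])                # keeps A0*R_F - r_ds positive
    if bandwidth(el, r_lo) < f_target:
        return None                              # target unreachable at this size
    if bandwidth(el, r_hi) >= f_target:
        return r_hi
    for _ in range(50):
        mid = 0.5 * (r_lo + r_hi)
        r_lo, r_hi = (mid, r_hi) if bandwidth(el, mid) >= f_target else (r_lo, mid)
    return r_lo


def max_gain_size(f_target=8e9, fingers=range(5, 155, 5)):
    """Return (N_finger, gain in ohm, R_F,TIA, C_I/C_D) with the highest gain at f_target."""
    best = None
    for n in fingers:
        el = elements(n)
        r_f = rf_for_bandwidth(el, f_target)
        if r_f is None:
            continue
        a0 = el[0] * el[1]
        gain = (a0 * r_f - el[1]) / (1 + a0)     # low-frequency transimpedance
        if best is None or gain > best[1]:
            best = (n, gain, r_f, el[5] / C_D)
    return best


if __name__ == "__main__":
    print(max_gain_size())
```

Small sizes fail to reach the target at all, because the output resistance loads the input node; this mirrors the existence of a gain-maximizing size, although the location of the optimum here depends entirely on the placeholder values.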

D. INPUT-REFERRED NOISE CURRENT
In short-reach links where no optical amplification is employed, the noise of the receiver's analog front-end dominates the noise from the PD. Furthermore, flicker (1/f) noise is not considered, since its corner frequency (a few hundred kHz) is negligible compared to the targeted bandwidth [13]. The main noise contributors in the Inv-TIA are the thermal noise of the transistors and of the feedback resistor, depicted in FIGURE 3 (b) as $I^{2}_{n,ch}$ and $I^{2}_{n,RF}$, respectively. The total integrated input-referred noise power $\overline{i^{2}_{n}}$ is determined by [14], where $k$ is the Boltzmann constant, $T$ is the absolute temperature in kelvin, and $\gamma$ is the excess noise factor. $BW_{n0} = \pi Q f_{TIA}/2\rho$ and $BW^{3}_{n2} = 3\pi Q f^{3}_{TIA}/2\rho^{3}$ are the noise bandwidths for white and colored noise, respectively [14]. $C^{*}_{T}$ is the total input capacitance excluding the Miller term (i.e., $C^{*}_{T} = C_{D} + C_{gs} + C_{gd}$) [14]. The root-mean-square input-referred noise current $i_{n,rms}$ is the square root of (3). FIGURE 6 (a) shows $i_{n,rms}$ as a function of $C_{I}/C_{D}$ for a TIA bandwidth of 8 GHz, where $C_{I}$ is the circuit's input capacitance including the Miller term. Setting $\gamma = 0.75$ achieves the best match between model-generated and circuit-simulated noise. The bold marker in FIGURE 6 (a) indicates the location of the minimum noise (MN) point. The noise current reaches a minimum of 0.91 µA (rms) at $R_{F,TIA} = 397$ Ω and $C_{I}/C_{D} = 1$, showing good agreement with the capacitive matching rule. However, simulation results show that the noise-optimum size depends on the 3-dB bandwidth. For example, at $f_{TIA} = 12.5$ GHz, the noise-optimum size is $C_{I} = 1.25C_{D}$. The capacitive matching rule in [2] is reached under the assumptions of constant $R_{F,TIA}$ and constant pole $Q$, which can be approximated as

$$Q \approx \frac{\sqrt{(1 + A_{0})R_{F,TIA}C_{T}T_{A}}}{R_{F,TIA}C_{T} + T_{A}}$$

When the TIA is sized up, a large $R_{F,TIA}$ makes $R_{F,TIA}C_{T} \gg T_{A}$, so that $Q \approx \sqrt{(1 + A_{0})T_{A}/(R_{F,TIA}C_{T})}$. Therefore, maintaining a constant $Q$ requires both $A_{0}$ and $T_{A}$ to increase.
Practically, this is not feasible, since the voltage gain of a single-stage CMOS inverter is constant for a given bias and its maximum value is limited by the technology node. In this work, when the TIA is sized up, $R_{F,TIA}$ is chosen to satisfy the required bandwidth under a constant-$A_{0}$ constraint. This makes both the resulting $Q$ and the noise-optimum size depend on the bandwidth.
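The noise bandwidths defined above can be combined in the two-term form commonly used for FET front-ends, $\overline{i^{2}_{n}} = \frac{4kT}{R_{F,TIA}}BW_{n0} + \frac{4kT\gamma}{g_{m}}\frac{(2\pi C^{*}_{T})^{2}}{3}BW^{3}_{n2}$. This form is an assumption of the sketch below and is not reproduced from (3). The values of $g_{m}$, $C^{*}_{T}$, $Q$, and $\rho$ in the example are hypothetical; only $R_{F,TIA} = 397$ Ω, $f_{TIA} = 8$ GHz, and $\gamma = 0.75$ come from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def noise_bandwidths(f_tia, q, rho):
    """White (BW_n0) and colored (BW_n2^3) noise bandwidths as defined in the text."""
    bw_n0 = math.pi * q * f_tia / (2.0 * rho)
    bw_n2_cubed = 3.0 * math.pi * q * f_tia ** 3 / (2.0 * rho ** 3)
    return bw_n0, bw_n2_cubed


def input_noise_rms(r_f, g_m, c_t_star, f_tia, q, rho, gamma=0.75, temp=300.0):
    """RMS input-referred noise current (A) under the assumed two-term model."""
    bw_n0, bw_n2_cubed = noise_bandwidths(f_tia, q, rho)
    resistor = 4.0 * K_B * temp / r_f * bw_n0                        # R_F thermal noise
    channel = (4.0 * K_B * temp * gamma / g_m
               * (2.0 * math.pi * c_t_star) ** 2 / 3.0 * bw_n2_cubed)  # FET channel noise
    return math.sqrt(resistor + channel)


if __name__ == "__main__":
    # Hypothetical g_m = 60 mS, C_T* = 300 fF, Q = 0.707, rho = 1 (Butterworth-like).
    i_n = input_noise_rms(397.0, 60e-3, 300e-15, 8e9, 0.707, 1.0)
    print(f"i_n,rms = {i_n * 1e6:.2f} uA")
```

The channel term grows with $(C^{*}_{T})^{2}/g_{m}$, so enlarging the transistor raises $g_{m}$ but also $C^{*}_{T}$; this competition is what produces a noise minimum versus size.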

III. RECEIVER SENSITIVITY-POWER TRADE-OFF
A. POWER PENALTY DUE TO THE SWING REQUIREMENTS OF THE CDR
A noise-limited input signal produces a peak-to-peak voltage $V^{PP}_{O,min}$ at the output of the receiver's analog front-end (FE), given by $V^{PP}_{O,min} = \text{SNR} \cdot i_{n,rms} \cdot Z_{FE,0}$, where SNR is the required peak-to-peak signal-to-noise ratio for a given BER. It equals 14.07 (in linear units, i.e., twice the Q-factor of 7.03) for a BER of $10^{-12}$. $Z_{FE,0}$ is the mid-band gain of the overall FE. $V^{PP}_{O,min}$ is sufficient to drive an ideal CDR circuit to achieve the desired BER. However, the decision circuit in a realistic CDR has a finite sensitivity and requires a minimum peak-to-peak input voltage swing $V^{PP}_{S}$ to function properly. Therefore, the FE's output voltage needs to be increased by $V^{PP}_{S}$ to attain the same BER as with the ideal CDR. The receiver OMA sensitivity (in linear units) is then calculated as

$$OMA_{sens} = \frac{\text{SNR} \cdot i_{n,rms}}{R_{PD}}\left[1 + \frac{V^{PP}_{S}}{V^{PP}_{O,min}}\right]$$

where $R_{PD}$ is the responsivity of the photodiode in A/W. Unless mentioned otherwise, $R_{PD}$ is fixed at 0.55 A/W. The term in brackets represents the power penalty (PP) incurred by the swing requirements of the CDR; the PP becomes larger for smaller $V^{PP}_{O,min}$. In FIGURE 6 (b), the sensitivity is plotted as a function of $C_{I}/C_{D}$ for a front-end that includes only a TIA. In this simulation, $f_{TIA}$ and $V^{PP}_{S}$ are fixed at 8 GHz and 50 mVpp, respectively. The maximum gain (MG), minimum noise (MN), and best overall sensitivity (BS) points are indicated by bold markers, and the performance at these points is summarized in Table 2. With no MA, the gain is limited, and the overall sensitivity is dominated by the swing requirements. As a result, the BS and MG points are almost identical. Moving from MN to MG improves the transimpedance gain by a factor of 1.16 but worsens the input-referred noise by a factor of 1.12. This reduces the PP due to the CDR requirements by 1.04 dB while worsening the noise-based sensitivity by 0.48 dB, for a net sensitivity improvement of 0.56 dB. Moreover, the higher TIA gain helps suppress the noise contribution from downstream circuits. In addition, the DC power dissipation drops from 4.9 mW to 2.35 mW, further motivating a reduced TIA input capacitance.
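The sensitivity expression and its CDR power penalty can be evaluated directly. In the sketch below, SNR = 14.07, $R_{PD} = 0.55$ A/W, $i_{n,rms} = 0.91$ µA, and $V^{PP}_{S} = 50$ mV follow the text, whereas the 300 Ω front-end gain is a hypothetical placeholder; the function names are illustrative only.

```python
import math

SNR_PP = 14.07  # peak-to-peak SNR for BER = 1e-12 (2 x 7.03), as quoted in the text


def oma_sensitivity(i_n_rms, z_fe0, v_pp_s, r_pd=0.55, snr=SNR_PP):
    """Return (OMA sensitivity in W, CDR swing power penalty in dB)."""
    v_pp_o_min = snr * i_n_rms * z_fe0          # noise-limited FE output swing
    penalty = 1.0 + v_pp_s / v_pp_o_min         # bracketed term of the sensitivity equation
    return snr * i_n_rms / r_pd * penalty, 10.0 * math.log10(penalty)


def to_dbm(p_watt):
    """Convert watts to dBm."""
    return 10.0 * math.log10(p_watt / 1e-3)


if __name__ == "__main__":
    oma, pp_db = oma_sensitivity(0.91e-6, 300.0, 50e-3)   # Z_FE,0 = 300 ohm is hypothetical
    print(f"sensitivity = {to_dbm(oma):.2f} dBm OMA, CDR penalty = {pp_db:.2f} dB")
    # A 1.12x noise increase alone costs 10*log10(1.12) dB of noise-limited sensitivity.
    print(f"noise-only degradation: {10 * math.log10(1.12):.2f} dB")
```

With a TIA-only gain of a few hundred ohms, the penalty in this example is roughly 11 dB, which illustrates why the swing requirement, rather than the noise, dominates the TIA-only sensitivity.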

B. MAIN AMPLIFIER
To alleviate the PP incurred by the swing requirements of the CDR, the TIA is followed by an $n$-stage inverter-based Cherry-Hooper (Inv-CH) main amplifier (MA). The schematic of the Inv-CH is shown in FIGURE 7. Inv1 acts as a transconductance (voltage-to-current) stage, while Inv2 together with $R_{F,CH}$ implements a transimpedance transfer function. This topology is widely adopted for various data rates and technologies [18], [22], [26], [28]. Similar to Section II, the transfer function of the Inv-CH amplifier is derived taking into account the output resistance and Miller capacitance of both inverters. The voltage gain of Inv1 is reduced by the low input impedance of the transimpedance stage formed by Inv2 and $R_{F,CH}$. This in turn reduces the Miller multiplication of $C_{gd}$ at the input of Inv1, minimizing the loading capacitance presented to the preceding stage.
Cascaded MA stages can have equal device dimensions [22], scaled-up dimensions [29] (Section 5.1.2), or inversely scaled dimensions [30] relative to the TIA's inverter, depending on the ratio of the total output capacitance to the total input capacitance. Once the scaling factor is fixed, the receiver's design space is defined by only three variables, $W$, $R_{F,TIA}$, and $R_{F,CH}$, assuming that $C_{D}$ is still fixed at 200 fF. In this work, the input capacitance of each MA stage is matched to that of the TIA. Thus, as $C_{I}/C_{D}$ is varied, the widths of the transistors in every inverter are varied together.
The sensitivity is plotted in FIGURE 8 as a function of $C_{I}/C_{D}$ for a $V^{PP}_{S}$ of 50 mVpp (this assumption is justified in Section V.A), a data rate ($f_{bit}$) of 16 Gb/s, and various numbers of MA stages, $n$. To calculate the sensitivity for a given $N_{finger}$ and receiver architecture, $R_{F,CH}$ is first chosen to set the bandwidth of the MA ($f_{MA}$) to the targeted $f_{bit}$. Then, $R_{F,TIA}$ is chosen to achieve an overall receiver bandwidth ($f_{FE}$) of $0.5f_{bit}$. To avoid signal distortion due to circuit nonlinearities, a constraint is imposed on the maximum peak-to-peak voltage amplitude at the output of the MA: whenever this voltage exceeds 600 mVpp, the MA's gain is reduced to keep the output voltage within the permitted range. The input-referred noise current is calculated taking into consideration all noise sources in the TIA and the MA, as well as the different transfer functions through which these noise sources pass.
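A first-order approximation makes this bandwidth allocation concrete. If the TIA and each of the $n$ MA stages are treated as single real poles (a simplification; the actual stages are second order and may peak), and each MA stage is assumed to have a bandwidth $f_{MA} = f_{bit}$, the TIA bandwidth that yields $f_{FE} = 0.5f_{bit}$ follows from $(1 + (f_{FE}/f_{TIA})^{2})(1 + (f_{FE}/f_{MA})^{2})^{n} = 2$. The function name below is hypothetical.

```python
import math


def tia_bandwidth_for_target(f_bit, n_stages, f_fe_ratio=0.5, f_ma_ratio=1.0):
    """TIA 3-dB bandwidth giving f_FE = f_fe_ratio * f_bit with n single-pole MA stages.

    Every stage is modeled as one real pole (hypothetical simplification), each MA
    stage at f_MA = f_ma_ratio * f_bit. Returns None if the target is infeasible.
    """
    f_fe, f_ma = f_fe_ratio * f_bit, f_ma_ratio * f_bit
    ma_loss = (1.0 + (f_fe / f_ma) ** 2) ** n_stages   # 1 / |H_MA(f_FE)|^2
    if ma_loss >= 2.0:
        return None            # the MA alone already drops 3 dB before f_FE
    return f_fe / math.sqrt(2.0 / ma_loss - 1.0)


if __name__ == "__main__":
    for n in range(5):
        f_tia = tia_bandwidth_for_target(16e9, n)
        print(n, None if f_tia is None else round(f_tia / 1e9, 2), "GHz")
```

With no MA the TIA bandwidth reduces to $0.5f_{bit}$ (8 GHz at 16 Gb/s), while each added single-pole stage forces a wider TIA; the peaked second-order stages of the actual design relax this constraint in ways this sketch does not capture.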
In FIGURE 8, both the MG and MN points are set by the TIA and stay relatively constant as the number of MA stages increases. However, more gain stages reduce the CDR's PP, which in turn moves the receiver's overall sensitivity minimum (BS) toward the noise-optimum size (MN). Therefore, the power dissipation of a sensitivity-optimized receiver increases due to the increase in both the number of stages and the per-stage power dissipation.

C. RECEIVER POWER DISSIPATION
At a fixed $V_{DD}$, and hence a fixed current density, the power dissipation of a CMOS inverter increases linearly with its input capacitance. The receiver's front-end employs one inverter for the TIA and two inverters for each MA stage. Defining the power dissipation of an inverter with $W_{p} = W_{n} = 1$ µm as $P_{DC,1\mu m}$, the receiver power dissipation is calculated as

$$P_{DC,RX} = \left(1 + 2n \cdot \text{MASF}\right) N_{finger} P_{DC,1\mu m}$$

where the MA's scaling factor (MASF) indicates the size of the MA relative to the TIA. In the following simulations, MASF is fixed at 1, so that all inverters are identical in device dimensions, which is typical for low photodiode capacitance [22]. The impact of changing the MASF is studied in Section VII. Given the simulated value of $P_{DC,1\mu m}$ in Table 1, $P_{DC,RX}$ can be calculated as a function of the TIA size $N_{finger}$ and the number of MA stages $n$. Since the inverters in the MA are equal in size (and hence in power dissipation) to the inverter in the TIA, the power grows in proportion to $1 + 2n$ (there are two inverters in each MA stage). As a result, as the number of gain stages increases to improve the sensitivity, the energy-efficiency becomes inadequate to meet the targeted link efficiency of 1 pJ/bit at data rates of at least 25 Gb/s [5].
For example, Table 2 shows that the energy efficiency of a noise-optimized receiver with a single-stage and a three-stage MA is 0.86 pJ/bit and 2.1 pJ/bit, respectively. Even at the best overall sensitivity point, the energy efficiency is 0.59 pJ/bit and 1.88 pJ/bit for $n = 1$ and 3, respectively. On the other hand, for $n \geq 1$, the shallowness of the overall sensitivity curves around their minima motivates reducing the power dissipation of the receiver. The shallow part of each sensitivity curve is bounded by the MN and MG points. Table 2 indicates that the power ratio between these two points for a given $n$ is about 2:1, since the MN design point is approximately $C_{I} = C_{D}$ and the MG design point is approximately $C_{I} = 0.5C_{D}$. FIGURE 8 shows that the MG point is an attractive design point, since it is located toward the lower end of the shallow part of the sensitivity curve. Table 2 shows that for $n = 3$, for example, reducing the transistor dimensions such that $C_{I}/C_{D}$ drops from 0.89 (BS) to 0.5 (MG) decreases the power dissipation from 30.15 to 17.13 mW, while the sensitivity is degraded by only 0.3 dB. However, determining exactly how small the receiver can become before its power reduction is offset by the transmitter's increase in power requires appropriate calculations of the transmitter power dissipation as well as of the link budget.
89336 VOLUME 10, 2022
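The energy figures quoted above can be reproduced with a few lines of Python. The sketch assumes the linear-scaling receiver power model $P_{DC,RX} = (1 + 2n\,\text{MASF})N_{finger}P_{DC,1\mu m}$ (one TIA inverter plus two inverters per MA stage). The 30.15 mW and 17.13 mW values at 16 Gb/s come from the text; the 0.1 mW per-finger power in the second example is a hypothetical placeholder, not the value in Table 1.

```python
def rx_power(n_finger, n_stages, p_dc_1um, masf=1.0):
    """P_DC,RX = (1 + 2*n*MASF) * N_finger * P_DC,1um, in watts."""
    return (1.0 + 2.0 * n_stages * masf) * n_finger * p_dc_1um


def energy_per_bit_pj(p_watt, bit_rate):
    """Energy efficiency in pJ/bit."""
    return p_watt / bit_rate * 1e12


if __name__ == "__main__":
    # Values quoted for n = 3 at 16 Gb/s.
    for label, p_mw in (("BS", 30.15), ("MG", 17.13)):
        print(label, round(energy_per_bit_pj(p_mw * 1e-3, 16e9), 2), "pJ/bit")
    # Hypothetical P_DC,1um = 0.1 mW: power scales with (1 + 2n) at a fixed size.
    print([round(rx_power(50, n, 0.1e-3) * 1e3, 1) for n in range(4)], "mW")
```

The BS figure evaluates to about 1.88 pJ/bit, consistent with the value quoted above, and the second line shows how each added MA stage adds two inverters' worth of power at a fixed size.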

IV. OPTICAL TRANSMITTER AND LINK BUDGET
A. LASER DIODE
Most short-reach optical links in data centers are based on VCSELs operating at 850 nm over MMF [7]. The VCSEL is an electro-optical converter that emits an optical power ($P_{out}$) proportional to its current ($I_{v}$), as shown in FIGURE 9 (a), approximated as $P_{out} = \eta(I_{v} - I_{th})$, where $\eta$ is the slope efficiency in W/A and $I_{th}$ is the threshold current. $I_{bias}$ is the VCSEL's bias current, which is supplied by the laser driver to transmit a binary ''0''. The modulation current ($I_{mod}$) is the current added on top of the bias current to transmit a binary ''1''. The peak-to-peak value of the VCSEL current is therefore $I_{mod}$, giving an OMA of $\eta I_{mod}$. The output power shows diminishing returns at a current $I_{v,max}$, which must not be exceeded to avoid spending electrical power that is not converted into optical power. On the other hand, the lower limit of the VCSEL current is set by the threshold current. The further the VCSEL is biased above its threshold current, the faster it becomes. The diode-like V-I characteristic of the VCSEL is illustrated in FIGURE 9 (b). It can be approximated as

$$V_{v} = V_{th} + R_{v}I_{v}$$

where $V_{v}$, $V_{th}$, and $R_{v}$ are the forward voltage, the threshold voltage, and the differential resistance, respectively. The V-I curve can be used to find the voltages $V_{v,min}$ and $V_{v,max}$ across the VCSEL terminals when its current is set to $I_{bias}$ or $I_{bias} + I_{mod}$, respectively.
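The static P-I and V-I approximations translate directly into an operating-point check. In the sketch below, $I_{bias} = 4$ mA and $I_{mod} = 1$ mA follow the eye-diagram example of this section, and $\eta = 0.78$ W/A is chosen so that $\eta I_{mod}$ matches the 0.78 mW quoted there; $I_{th}$, $V_{th}$, $R_{v}$, and $I_{v,max}$ are hypothetical, as is the function name.

```python
def vcsel_operating_point(i_bias, i_mod, eta, i_th, v_th, r_v, i_v_max):
    """Static OMA and terminal voltages from P_out = eta*(I_v - I_th), V_v = V_th + R_v*I_v."""
    if i_bias <= i_th:
        raise ValueError("I_bias must exceed the threshold current")
    if i_bias + i_mod > i_v_max:
        raise ValueError("I_bias + I_mod exceeds I_v,max")
    oma = eta * i_mod                           # P_out(I_bias + I_mod) - P_out(I_bias)
    v_v_min = v_th + r_v * i_bias               # terminal voltage while sending '0'
    v_v_max = v_th + r_v * (i_bias + i_mod)     # terminal voltage while sending '1'
    return oma, v_v_min, v_v_max


if __name__ == "__main__":
    # Hypothetical I_th = 0.5 mA, V_th = 1.6 V, R_v = 100 ohm, I_v,max = 10 mA.
    print(vcsel_operating_point(4e-3, 1e-3, 0.78, 0.5e-3, 1.6, 100.0, 10e-3))
```

The two guards encode the limits discussed above: the bias must stay above threshold, and the peak current must not exceed $I_{v,max}$.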
The static characteristics in FIGURE 9 provide an intuitive understanding of the VCSEL's operation but are not sufficient to describe its dynamic behavior and inherent nonlinearity. Therefore, a more accurate model of the VCSEL, driver, and packaging parasitics is considered later in this section.

B. LASER DIODE DRIVER
The laser diode driver (LDD) consists of two stages: the pre-driver and the driver to which the VCSEL is connected. The pre-driver decouples the large input capacitance of the driver from the signal source and provides broadband matching to the 50 Ω environment. The main task of the driver is to provide the required current to the VCSEL. The current-steering circuit in FIGURE 10 (a) is a common implementation [18]. The circuit is a differential amplifier with one side wire-bonded to the VCSEL, while the other side is terminated by an on-chip dummy load. The driver is powered by $V_{DD\_D}$.
The VCSEL is biased by $V_{DD\_V}$, and its DC bias current is tuned by $I_{bias}$. The pre-driver is usually operated in limiting mode; therefore, the driver's differential input voltage $V_{IN}$ is large enough to switch the tail current $I_{0}$ fully to either the left or the right transistor, as explained using the current-switch model in FIGURE 10. To transmit a binary ''0'', the tail current is steered to the left transistor. Since no current then flows through the load resistor of the right transistor, the DC voltage of the cathode terminal of the laser diode is fixed at $V_{DD\_D}$, and therefore its anode must be raised to

$$V_{DD\_V} = V_{DD\_D} + V_{v,min} = V_{DD\_D} + V_{th} + R_{v}I_{bias} \quad (6)$$

To transmit a binary ''1'', FIGURE 10 (c), the tail current is switched to the right transistor, drawing current $I_{0}$ from the parallel combination of $R_{D}$ and $R_{v}$. The required tail current can be calculated from the modulation current as

$$I_{0} = I_{mod}\,\frac{R_{D} + R_{v}}{R_{D}}$$

A small driver output resistance is required to damp any undesired ringing that can result from the supply and signal package parasitic inductance [31]. However, too small an $R_{D}$ increases the driver's power dissipation [31]. Considering this trade-off, $R_{D}$ is chosen to be equal to the VCSEL's differential resistance $R_{v}$ [8]. Therefore, the tail current is split equally between the two resistors (i.e., $I_{0} = 2I_{mod}$). The maximum modulation current that can be supplied by the driver depends on the permitted output voltage range. Too large an output voltage may break down the transistors, but too small an output voltage may push the transistors into the triode region, which in turn produces pulse-width distortion and jitter [13]. The output voltage changes from $V_{DD\_D}$ when transmitting a logic ''0'' to $V_{DD\_D} - I_{mod}R_{v}$ when transmitting a logic ''1''.
If the output voltage is allowed to change by $0.5V_{DD\_D}$ between the two cases, the maximum modulation current is

$$I_{mod,max} = \frac{V_{DD\_D}}{2R_{v}}$$

Although other, more power-efficient approaches to driving a VCSEL are possible [31], we consider this conventional implementation so that we pessimistically estimate the transmitter power and the possible increase in transmitter power dissipation introduced when we design a receiver with slightly worse sensitivity but significantly reduced power dissipation.
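The driver relations for the current-steering stage can be collected into one helper. The sketch assumes $V_{DD\_V} = V_{DD\_D} + V_{th} + R_{v}I_{bias}$, $I_{0} = I_{mod}(R_{D} + R_{v})/R_{D}$, and $I_{mod,max} = V_{DD\_D}/(2R_{v})$, with $R_{D} = R_{v}$ by default as chosen in the text. The function name, the 1 V driver supply, and the VCSEL values in the example are hypothetical.

```python
def driver_design(i_mod, i_bias, v_dd_d, v_th, r_v, r_d=None):
    """Tail current, VCSEL supply V_DD_V, and I_mod,max for the current-steering driver."""
    r_d = r_v if r_d is None else r_d            # default R_D = R_v, as chosen in the text
    i_0 = i_mod * (r_d + r_v) / r_d              # tail current split between R_D and R_v
    v_dd_v = v_dd_d + v_th + r_v * i_bias        # cathode sits at V_DD_D while sending '0'
    i_mod_max = 0.5 * v_dd_d / r_v               # output swing I_mod*R_v limited to V_DD_D/2
    if i_mod > i_mod_max:
        raise ValueError("requested I_mod exceeds the driver's output-swing limit")
    return i_0, v_dd_v, i_mod_max


if __name__ == "__main__":
    # Hypothetical V_DD_D = 1 V, V_th = 1.6 V, R_v = 100 ohm; I_bias = 4 mA, I_mod = 1 mA.
    print(driver_design(1e-3, 4e-3, 1.0, 1.6, 100.0))
```

Passing a smaller `r_d` shows the trade-off discussed above: the same $I_{mod}$ then requires a larger tail current.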

C. TRANSMITTER POWER DISSIPATION
For DC-balanced non-return-to-zero (NRZ) data, the DC power dissipation of the transmitter, including both the driver and the VCSEL, can be calculated as

$$P_{DC,TX} = \frac{P_{DC,0} + P_{DC,1}}{2} \quad (8)$$

where $P_{DC,0} = 2I_{mod}V_{DD\_D} + I_{bias}V_{DD\_V}$ and $P_{DC,1} = I_{mod}V_{DD\_D} + (I_{bias} + I_{mod})V_{DD\_V}$ are the DC powers required to transmit a logic ''0'' and a logic ''1'', respectively, and $V_{DD\_V}$ is calculated by (6). As $V_{DD\_D}$ is set by the nominal supply voltage of the CMOS technology, (8) reveals that the transmitter power increases with higher data rates, poorer receiver sensitivity, and less efficient optical devices, since each of these requires a larger $I_{bias}$ and/or $I_{mod}$.
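Equation (8) is straightforward to evaluate. The sketch below assumes $R_{D} = R_{v}$, so that the tail current is $2I_{mod}$; the currents follow the 4 mA / 1 mA example of this section, whereas the supply voltages are hypothetical, as is the function name.

```python
def tx_power(i_mod, i_bias, v_dd_d, v_dd_v):
    """Average DC power of driver + VCSEL for DC-balanced NRZ data, assuming R_D = R_v."""
    p_dc_0 = 2.0 * i_mod * v_dd_d + i_bias * v_dd_v          # '0': tail current in dummy load
    p_dc_1 = i_mod * v_dd_d + (i_bias + i_mod) * v_dd_v      # '1': tail current split R_D / VCSEL
    return 0.5 * (p_dc_0 + p_dc_1)


if __name__ == "__main__":
    # Hypothetical supplies V_DD_D = 1 V and V_DD_V = 3 V; I_mod = 1 mA, I_bias = 4 mA.
    p = tx_power(1e-3, 4e-3, 1.0, 3.0)
    print(f"P_DC,TX = {p * 1e3:.1f} mW -> {p / 25e9 * 1e12:.2f} pJ/bit at 25 Gb/s")
```

Because $I_{bias}$ is multiplied by the larger VCSEL supply, the bias current, rather than the modulation current, tends to dominate the transmitter power in this example.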

D. VCSEL AND DRIVER MODELING
The dynamic behavior of the VCSEL is described by a second-order transfer function (9), obtained by solving the rate equations [32], where $f_{r}$ and $\gamma_{v}$ are the relaxation frequency and damping factor of the VCSEL, and $D_{v}$ and $K_{v}$ are the D-factor and the K-factor, respectively. The VCSEL bandwidth can be increased by increasing the VCSEL current until it becomes limited by the increasing damping factor. As $I_{v}$ changes from $I_{bias}$ (to transmit a binary 0) to $I_{bias} + I_{mod}$ (to transmit a binary 1), the bandwidth also changes. This inherent nonlinearity of the VCSEL is modeled in [32] as shown in FIGURE 11. The description and values of the different model parameters are summarized in Table 3. The model consists of an electrical part that accounts for electrical parasitics and an optical part that accounts for the VCSEL's nonlinear optical dynamics. The optical part of the model is a second-order RLC circuit with a signal-dependent oscillation frequency and damping factor, driven by a current-dependent voltage source. The emitted power $P_{out}$ is represented by the voltage across the capacitor $C_{V}$. Therefore, comparing the transfer function from the voltage source to the output with (9), while arbitrarily fixing $C_{V}$ at 100 fF, allows $R_{V}$ and $L_{V}$ to be calculated as functions of the current flowing through the VCSEL's junction ($R_{j}$), as given in Table 3.
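To make the RLC mapping concrete, the sketch below assumes the widely used small-signal form $H(s) = \omega_{r}^{2}/(s^{2} + \gamma_{v}s + \omega_{r}^{2})$ with $\omega_{r} = 2\pi f_{r}$, $f_{r} = D_{v}\sqrt{I_{v} - I_{th}}$, and $\gamma_{v} = K_{v}f_{r}^{2}$ (damping offset neglected), which may differ in detail from (9) and from Table 3. Matching it to a series RLC whose output is the capacitor voltage, $1/(s^{2}L_{V}C_{V} + sR_{V}C_{V} + 1)$, gives $L_{V} = 1/(\omega_{r}^{2}C_{V})$ and $R_{V} = \gamma_{v}L_{V}$. $C_{V} = 100$ fF follows the text; the D- and K-factor values and the function names are hypothetical.

```python
import math

C_V = 100e-15  # arbitrarily fixed capacitor, as in the text


def fr_and_damping(i_v_ma, i_th_ma, d_ghz_per_sqrt_ma, k_ns):
    """Relaxation frequency (Hz) and damping factor (1/s) from D- and K-factors."""
    f_r = d_ghz_per_sqrt_ma * 1e9 * math.sqrt(max(i_v_ma - i_th_ma, 0.0))
    gamma = k_ns * 1e-9 * f_r ** 2            # gamma = K * f_r^2 (offset neglected)
    return f_r, gamma


def rlc_values(f_r, gamma, c_v=C_V):
    """R_V and L_V that reproduce the assumed second-order response with the given C_V."""
    w_r = 2.0 * math.pi * f_r
    l_v = 1.0 / (w_r ** 2 * c_v)
    return gamma * l_v, l_v


def h_rlc(f, r_v, l_v, c_v=C_V):
    """Voltage across C_V per unit source voltage."""
    s = 2j * math.pi * f
    return 1.0 / (s * s * l_v * c_v + s * r_v * c_v + 1.0)


if __name__ == "__main__":
    # Hypothetical D = 7 GHz/sqrt(mA), K = 0.3 ns, I_th = 0.5 mA.
    for i_v in (4.0, 5.0):   # '0' and '1' current levels of the 4 mA / 1 mA example
        f_r, gamma = fr_and_damping(i_v, 0.5, 7.0, 0.3)
        r, l = rlc_values(f_r, gamma)
        print(f"I_v = {i_v} mA: f_r = {f_r / 1e9:.1f} GHz, R_V = {r:.1f} ohm, L_V = {l * 1e9:.3f} nH")
```

Re-evaluating `fr_and_damping` and `rlc_values` for the instantaneous current is the essence of the signal-dependent update that the Verilog-A model performs at each time step.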
For accurate modeling of the VCSEL, the P-I characteristic, the relation between the resonance frequency and the square root of the bias current above threshold, and the relation between the damping factor and the squared resonance frequency are extracted from the measured performance in [33] as polynomial functions. These functions are then used to calculate the model's optical parameters. Verilog-A code implements the optical part of the model, so the values of the current-dependent voltage source, $R_V$, and $L_V$ are updated at each simulation time step to account for the VCSEL's signal-dependent behavior. FIGURE 11 also shows the model of the driver's output impedance ($R_o$ and $C_o$) and the packaging inductance ($L_{pkg1}$ and $L_{pkg2}$) between the driver and the VCSEL chip. The model-generated P-I characteristic and modulation response at various values of the VCSEL current are shown in FIGURE 12 (a) and (b), respectively, excluding the effect of the driver impedance and packaging inductance. Both figures are in good agreement with the measured performance in [33], which validates the accuracy of the VCSEL model. The work in [33] is used because it provides the most complete set of measurements that allows for accurate modeling of the VCSEL.
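The per-time-step update can be sketched outside Verilog-A as follows. This is a minimal Python illustration that assumes (i) the optical part is a series $R_V$, $L_V$, $C_V$ circuit whose output is the voltage across $C_V$, so its transfer function is $\omega_r^2/(s^2 + \gamma_v s + \omega_r^2)$ with $\omega_r^2 = 1/(L_V C_V)$ and $\gamma_v = R_V/L_V$, and (ii) hypothetical polynomial fits for $f_r(\sqrt{I_v - I_{th}})$ and $\gamma_v(f_r^2)$ in place of the fits extracted from [33]. Only $C_V$ = 100 fF is taken from the text.

```python
import math

C_V = 100e-15  # F, fixed arbitrarily (value from the text)
I_TH = 0.5     # mA, hypothetical threshold current

# Hypothetical polynomial fits in ascending order (c0 + c1*x + c2*x^2 + ...).
# These are placeholders, NOT the coefficients extracted from [33].
FR_FIT = [0.0, 6.0]      # f_r [GHz] versus x = sqrt(I_v - I_th) [sqrt(mA)]
GAMMA_FIT = [5.0, 0.3]   # gamma_v [1/ns] versus y = f_r^2 [GHz^2]


def polyval(coeffs, x):
    """Evaluate a polynomial with ascending-order coefficients (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def optical_rlc(i_v_ma):
    """Return (R_V in ohm, L_V in H) for the present junction current in mA."""
    if i_v_ma <= I_TH:
        raise ValueError("model is only defined above threshold")
    f_r = polyval(FR_FIT, math.sqrt(i_v_ma - I_TH)) * 1e9   # Hz
    gamma_v = polyval(GAMMA_FIT, (f_r / 1e9) ** 2) * 1e9    # 1/s
    omega_r = 2 * math.pi * f_r
    # Series RLC, output across C_V: H(s) = (1/(L C)) / (s^2 + (R/L) s + 1/(L C))
    l_v = 1.0 / (omega_r ** 2 * C_V)   # match 1/(L_V C_V) = omega_r^2
    r_v = gamma_v * l_v                # match R_V / L_V = gamma_v
    return r_v, l_v


# In a transient simulator this would be re-evaluated at every time step.
for i_v in (4.0, 5.0):  # e.g., I_bias and I_bias + I_mod from the FIGURE 13 example
    r_v, l_v = optical_rlc(i_v)
    print(f"I_v = {i_v} mA: R_V = {r_v:.1f} ohm, L_V = {l_v * 1e9:.3f} nH")
```

Fixing $C_V$ leaves exactly two unknowns, so matching the two coefficients of the denominator determines $L_V$ and $R_V$ uniquely at each current.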
The main objective of modeling the transmitter is to choose the bias and modulation conditions of the VCSEL considering all parameters that could degrade the transmitted signal quality. This allows the power dissipation of the transmitter to be accurately calculated. To do so, I mod and I bias are chosen based on eye diagram simulations at the output of the transmitter. For example, FIGURE 13 shows the simulation results for the eye diagrams at the transmitter output for data rates of 16 Gb/s and 25 Gb/s, a bias current of 4 mA, and a modulation current of 1 mA.
The OMA is measured as the internal vertical eye opening, which is less than $\eta_{max} I_{mod} = 0.78$ mW. This calculation of the OMA accurately accounts for the impact of ringing and inter-symbol interference on the quality of the transmitted signal.
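To make the eye-based OMA concrete, the hypothetical helper below estimates the internal vertical eye opening from a sampled waveform and its known bit sequence: at each sampling phase it takes the lowest "1" level minus the highest "0" level, and keeps the best phase. The synthetic waveform (a 0.2 mW "0" level, a 0.78 mW ideal swing, and an assumed 0.05 mW of ISI after every "1") is illustrative only and is not the paper's simulation setup.

```python
def internal_eye_opening(waveform, bits, samples_per_ui):
    """Inner vertical eye opening: best phase of (min '1' level - max '0' level).

    waveform: samples (e.g., optical power in mW), samples_per_ui per bit.
    bits: transmitted bit sequence aligned with the waveform; it must contain
    at least one 0 and one 1. A negative result means the eye is closed.
    """
    best = float("-inf")
    for phase in range(samples_per_ui):
        ones = [waveform[k * samples_per_ui + phase] for k, b in enumerate(bits) if b == 1]
        zeros = [waveform[k * samples_per_ui + phase] for k, b in enumerate(bits) if b == 0]
        best = max(best, min(ones) - max(zeros))
    return best


# Synthetic example (assumed values): 0.2 mW "0" level, 0.78 mW ideal swing,
# plus 0.05 mW of ISI whenever the previous bit was a "1".
bits = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
spu = 4
samples = []
prev = 0
for b in bits:
    level = 0.2 + 0.78 * b + 0.05 * prev
    samples.extend([level] * spu)
    prev = b
print(f"OMA (IVEO) = {internal_eye_opening(samples, bits, spu):.2f} mW")
```

With ISI, the opening (0.73 mW) falls below the ideal swing of 0.78 mW, which is the degradation that measuring OMA from the eye captures.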

E. LINK BUDGET
The OMA emitted by the laser must be large enough that, despite link losses and penalties, the received optical power exceeds the receiver's sensitivity limit. An example of a link budget for a short-reach optical link is given in [9]. In the worst case, losses and penalties can add up to 10.6 dB, including 1 dB of aging and end-of-life penalty. A margin of 2 dB above the receiver sensitivity limit at a BER of $10^{-12}$ is also considered to ensure that the BER is achieved even with process, voltage, or temperature (PVT) variations, or in case some of the losses or penalties were underestimated. Therefore, the link budget totals 12.6 dB, meaning that the launched OMA must be 12.6 dB larger than the receiver sensitivity limit at a BER of $10^{-12}$.
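The budget arithmetic can be sketched as follows. The 10.6 dB of losses and penalties and the 2 dB margin come from the text, and 0.55 A/W is the photodiode responsivity used in Section VI. The 20 µApp photocurrent sensitivity is a hypothetical placeholder, and $\eta_{max}$ = 0.78 W/A is an optimistic upper bound on the effective slope efficiency, which the text instead extracts from eye diagrams.

```python
LOSSES_AND_PENALTIES_DB = 10.6  # worst case, incl. 1 dB aging/end-of-life [9]
MARGIN_DB = 2.0                 # margin above sensitivity at BER = 1e-12
BUDGET_DB = LOSSES_AND_PENALTIES_DB + MARGIN_DB   # 12.6 dB


def db_to_ratio(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (db / 10)


def required_launch_oma_mw(sensitivity_oma_mw, budget_db=BUDGET_DB):
    """Launched OMA (mW) needed so the received OMA still meets the sensitivity."""
    return sensitivity_oma_mw * db_to_ratio(budget_db)


def required_i_mod_ma(launch_oma_mw, slope_eff_w_per_a):
    """Modulation current (mA) for a given effective slope efficiency (W/A = mW/mA)."""
    return launch_oma_mw / slope_eff_w_per_a


# Hypothetical receiver: 20 uApp photocurrent sensitivity, 0.55 A/W photodiode
sens_oma_mw = 20e-3 / 0.55      # 0.020 mA / (A/W) = mW of received OMA
oma_tx = required_launch_oma_mw(sens_oma_mw)
print(f"budget = {BUDGET_DB:.1f} dB, launch OMA = {oma_tx:.3f} mW, "
      f"I_mod >= {required_i_mod_ma(oma_tx, 0.78):.2f} mA (eta_max = 0.78 W/A)")
```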

V. OPTIMIZATION PROCEDURE AND LINK EVALUATION
At this point, we can calculate the DC power dissipation of all active parts of the link (TIA, MA, VCSEL, and LDD) for a given data rate and optical channel (PD, MMF, and VCSEL). Table 4 shows the optimization procedure and the values and bounds of its parameters. FIGURE 14 shows the calculated efficiency as a function of $C_I/C_D$ for a data rate of 16 Gb/s, a swing requirement of 50 mVpp (to attain a single-ended receiver output voltage ≥ 100 mVpp [8]), and receiver architectures with a single-stage and a three-stage main amplifier. The vertical lines indicate the locations of the receiver's minimum noise (MN), best sensitivity (BS), and maximum gain (MG) points obtained in Section III. The bold markers indicate the minimum of each curve.
The TX energy dissipation naturally reaches a minimum at the receiver size that achieves the best receiver sensitivity, since this size minimizes the modulation current of the VCSEL and hence the TX's power dissipation. Note that the VCSEL's bias current depends on the VCSEL diode and the data rate but not on the receiver's sensitivity. More importantly, the overall link's energy dissipation reaches a minimum at a smaller receiver size than that required to minimize the TX energy dissipation. This can be explained as follows: as the receiver's width increases, its power dissipation quickly dominates the link's energy dissipation. On the other hand, the TX energy curves vary less with the receiver size because the sensitivity curves in FIGURE 8 are shallow. This allows the receiver size to be shrunk significantly before its power reduction is offset by the transmitter's increase in power due to increased modulation current requirements.
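The mechanism can be illustrated with a deliberately simplified toy model, which is not the Table 4 procedure. It assumes that the input-referred noise, and hence the required OMA and the modulation-related TX power, scales as $(C_D + C_I)/\sqrt{C_I}$ (the classic form that yields capacitive matching, $C_I = C_D$), while the receiver power grows linearly with $C_I$. In this toy model the sensitivity optimum coincides with the noise optimum, and the coefficients `P_TX_FIXED`, `K_TX`, and `K_RX` are hypothetical.

```python
import math

# Hypothetical coefficients (toy model, not the paper's Table 4 values)
P_TX_FIXED = 10.0  # mW, bias-related TX power independent of receiver size
K_TX = 2.0         # mW per unit of normalized noise (modulation-current power)
K_RX = 10.0        # mW of receiver power per unit of C_I / C_D
BIT_RATE = 16e9    # b/s


def normalized_noise(x):
    """Input-referred noise ~ (C_D + C_I) / sqrt(C_I), normalized with x = C_I / C_D."""
    return (1 + x) / math.sqrt(x)


def link_energy_pj(x):
    """Total link energy per bit (pJ/bit) in the toy model."""
    p_tx = P_TX_FIXED + K_TX * normalized_noise(x)  # TX power tracks required OMA
    p_rx = K_RX * x                                  # RX power tracks input capacitance
    return (p_tx + p_rx) * 1e-3 / BIT_RATE * 1e12    # mW -> pJ/bit


grid = [i / 100 for i in range(5, 201)]   # x = 0.05 ... 2.00
x_noise = min(grid, key=normalized_noise)
x_link = min(grid, key=link_energy_pj)
print(f"noise-optimum C_I/C_D = {x_noise:.2f}, link-optimum C_I/C_D = {x_link:.2f}")
print(f"E at noise optimum = {link_energy_pj(x_noise):.2f} pJ/bit, "
      f"E at link optimum = {link_energy_pj(x_link):.2f} pJ/bit")
```

Because the noise curve is flat near $x = 1$ while the receiver power keeps growing linearly, the link optimum lands well below the capacitive-matching point, which is the qualitative trend reported above.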
Due to the moderate data rate and swing requirements, a single MA stage is sufficient to optimize the performance. For n = 1, Table 5 and FIGURE 14 indicate that the link achieves an efficiency of 1.51 pJ/bit and 1.79 pJ/bit when the receiver is optimized for sensitivity ($C_I/C_D = 0.65$) and noise ($C_I/C_D = 0.95$), respectively. Downsizing the receiver to $C_I = 0.28C_D$ improves the efficiency to 1.24 pJ/bit. This clearly implies that energy-efficient links require low-power receivers with transistors smaller than those required for optimized sensitivity or noise performance. Table 5 also shows that as n increases, the receiver must employ smaller transistors to compensate for the increased power of the additional stages. For n = 3, the link achieves an optimum efficiency of 1.38 pJ/bit at $C_I/C_D = 0.2$, which is 1.54 pJ/bit better than the efficiency achieved when the receiver's noise is optimized at $C_I/C_D = 0.99$.

B. LINK EVALUATION FOR HIGH DATA RATE AND SWING REQUIREMENTS
The optimization of the link is repeated for a data rate of 25 Gb/s and a swing of 100 mVpp (to attain a single-ended receiver output voltage ≥ 100 mVpp [8]), as shown in FIGURE 15. The hollow markers in the figure indicate the points where the required OMA exceeds the transmitter capability, which is limited by the maximum modulation current that the LDD can provide. Therefore, in FIGURE 15 (a), $V_{DD\_D}$ is increased to 1.2 V to increase $I_{mod,max}$ to 7.1 mApp. At this high data rate, the bandwidth requirements of the receiver's front-end (TIA/MA) become more difficult to meet in the given CMOS process, which limits its gain. This, together with the increased swing requirement, moves the receiver's BS point toward the MG point, and three MA stages are required to optimize the link performance. Table 5 and FIGURE 15 (b) show that the link with n = 3 achieves an efficiency of 1.90 pJ/bit and 2.55 pJ/bit when the receiver is optimized for sensitivity ($C_I/C_D = 0.83$) and noise ($C_I/C_D = 1.29$), respectively. The efficiency improves to 1.41 pJ/bit when the receiver is downsized to $C_I = 0.38C_D$, confirming that a transistor size much smaller than the noise-optimum size, and even smaller than that required for optimized sensitivity, is needed for optimal energy efficiency. Table 5 also indicates that a larger number of gain stages in the receiver reduces the modulation current requirements, which is desirable for the long-term reliability of the VCSEL.

VI. VALIDATION OF MODEL ACCURACY
To validate the accuracy of the presented model and optimization procedure, receivers with a single-stage and a three-stage MA are designed and simulated in Cadence Spectre. The circuit parameters ($N_{finger}$, $R_{F,TIA}$, and $R_{F,CH}$) required to achieve the best energy efficiency of the overall link are obtained from the Matlab code and then used in circuit simulations.

A. AC SIMULATIONS
FIGURE 16 shows the Spectre-simulated frequency responses of the TIA, MA, and overall FE for various data rates and receiver architectures. The simulated and modeled bandwidth, gain, and input-referred noise of the overall FE are in good agreement for all comparison scenarios, with maximum errors of less than 1 GHz, 2 dB, and 0.12 µArms, respectively. Further, the bandwidths of the TIA, MA, and overall FE are approximately $0.5f_{bit}$, $f_{bit}$, and $0.4f_{bit}$, respectively, in good agreement with the guidelines presented in [34] for designing full-bandwidth optical receivers.

B. TRANSIENT SIMULATIONS
The TX model in FIGURE 11 is used with the designed receivers to simulate the eye diagrams at the receiver outputs, as shown in FIGURE 17. The output power of the TX (the voltage across $C_V$) is converted to a current by an ideal voltage-controlled current source (VCCS) and then fed to the RX input. The VCCS has a gain of 30.225 mA/V to account for the link budget (12.6 dB) and the photodiode responsivity (0.55 A/W). The internal vertical eye opening (IVEO) is more than 88% and 80% of the peak-to-peak output $V_{out,pp}$ required for a BER of $10^{-12}$ at 16 Gb/s and 25 Gb/s, respectively. $V_{out,pp}$ is calculated from circuit simulations as $V_{out,pp} = \mathrm{SNR} \cdot V_{n,rms} + V_{PP}^{S}$, where $V_{n,rms}$ is the simulated rms output-referred noise voltage. The close agreement between the IVEO and $V_{out,pp}$ validates the accuracy of the presented optimization procedure.
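The quoted VCCS gain follows from the photodiode responsivity attenuated by the 12.6 dB link budget, assuming the voltage across $C_V$ represents optical power on a scale of 1 V per 1 W (an assumption inferred from the units, not stated in the text). A quick check, with a hypothetical helper name:

```python
def vccs_gain_ma_per_v(responsivity_a_per_w, link_budget_db):
    """Responsivity attenuated by the link budget, in mA/V (1 V <-> 1 W assumed)."""
    attenuation = 10 ** (-link_budget_db / 10)   # 12.6 dB -> about 0.0550
    return responsivity_a_per_w * attenuation * 1e3


print(f"{vccs_gain_ma_per_v(0.55, 12.6):.3f} mA/V")   # prints: 30.225 mA/V
```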
The absence of amplitude peaking and the sufficiently wide bandwidth observed in the frequency domain (FIGURE 16) translate into a lack of ringing and negligible inter-symbol interference (ISI) in the time-domain simulations. As a result, the top eye diagrams in FIGURE 17 show wide horizontal openings and consequently low deterministic jitter. The closure in the bottom eye diagrams is mainly caused by distortion in the transmitted signal, as evident from FIGURE 13 (b). At 25 Gb/s, a VCSEL driver would often employ equalization; in this work, equalization was excluded to constrain the problem. Finally, FIGURE 17 shows that the single-ended output voltage ranges from 96 mVpp to 450 mVpp, depending on the data rate and receiver architecture. This means that our assumptions for the swing requirement $V_{PP}^{S}$ led to results similar to, or even more conservative than, those of [8].

C. PROCESS AND TEMPERATURE VARIATIONS
The overall gain and bandwidth of a receiver with n = 3 are simulated across process corners for various temperatures, as shown in FIGURE 18. In FIGURE 18 (a) and (b), the receiver is sized with $C_I/C_D = 0.38$ and $C_I/C_D = 0.83$ to minimize the power dissipation of the link and to achieve the best receiver sensitivity, respectively. Comparing the results in FIGURE 18 (a) and (b) indicates that downsizing the receiver does not change how the circuit behaves under process and temperature variations. To overcome these variations, the feedback resistors in the TIA and the MA can be made tunable. Note that we have included a margin of 2 dB above the receiver sensitivity limit at a BER of $10^{-12}$ to ensure that the BER is achieved even if the nominal performance is not fully restored after tuning the circuit parameters.

VII. DISCUSSION
The initial values in Table 4 greatly impact the link energy efficiency. This section investigates the impact of several parameters, such as scaling of the MA, technology advances, and higher pulse amplitude modulation, on the receiver power-sensitivity trade-off. The performance of the link across a broad range of technologies and data rates is summarized in Table 6. For these values of the MASF, the best receiver sensitivity is achieved at $C_I/C_D$ of 0.9, 0.83, and 0.8, respectively. This indicates that the receiver size that achieves the best energy efficiency of the overall link is well below that required to achieve the best receiver sensitivity for all values of the MASF.
FIGURE 17. Simulation results for the eye diagrams at the receiver output for various data rates and receiver architectures. The circuit parameters and the required peak-to-peak output voltage are also listed for each eye.

B. ADVANCES IN PHOTONIC AND INTERCONNECT TECHNOLOGIES
Advanced photonic and interconnect technologies are assumed, in which the photodiode and pad capacitance and the photodiode responsivity are changed to 120 fF and 0.8 A/W, respectively. The link budget is reduced to 8.6 dB, and signal degradation due to package inductance is ignored. The VCSEL is assumed to have sufficient bandwidth, so its slope efficiency is taken as its maximum value of 0.78 W/A instead of being extracted from eye-diagram simulations as in Section V (see FIGURE 12). This advanced platform is used with the extracted parameters for the CMOS inverter in Table 1 to evaluate the link performance for various data rates and swing requirements, as shown in FIGURE 19 (a). These factors significantly improve the link's energy efficiency and allow the receiver power to be reduced further. For example, at 25 Gb/s, the energy dissipation of the link in FIGURE 19 (a) reaches a minimum for n = 1 and $C_I/C_D = 0.4$, compared to n = 3 and $C_I/C_D = 0.38$ for the link in FIGURE 15, where a typical photonic platform is used, as shown in Table 6. The table also shows that at lower data rates, the optimum energy efficiency of the overall link is achieved by drastically undersizing the receiver, far from the capacitive matching rule. Downsizing the receiver improves the efficiency of the overall link by 0.27 pJ/bit and 0.52 pJ/bit at 25 Gb/s and 10 Gb/s, respectively.

C. ADVANCES IN CMOS TECHNOLOGY
As CMOS technology scales, the peak transit frequency improves. Further, FinFET processes overcome the low intrinsic gain of scaled CMOS technologies and offer an improved transconductance-to-drain-current ratio [35]. To capture these effects, the parasitic capacitances in Table 1 are scaled by a factor of 0.5 while the transconductance and the output resistance are unchanged. This doubles the transit frequency at the biasing point to $f_T = 114$ GHz while keeping the DC gain of the inverter fixed at $A_0 = 6.2$ V/V. Further, the supply voltage, $P_{DC,1\mu m}$, and the excess noise factor are assumed to be 0.8 V, 0.058 mW/µm, and 2, respectively. This hypothetical CMOS technology is used with the typical photonic platform in Table 4 to evaluate the link performance for various data rates and swing requirements, as shown in FIGURE 19 (b). Advances in CMOS technology improve the sensitivity of the receiver and reduce the DC power dissipation of both the receiver and the transmitter. This in turn improves the link's energy efficiency and allows the receiver to be shrunk further below its noise-optimum size. As a result, at 25 Gb/s, the energy dissipation of the link in FIGURE 19 (b) reaches a minimum for a receiver with n = 1 and $C_I/C_D = 0.27$, compared to n = 3 and $C_I/C_D = 0.38$ for the link in FIGURE 15, where 65 nm CMOS technology is used. Table 6 shows that selecting $C_I/C_D$ based on link efficiency rather than noise optimization improves the energy efficiency by 0.55 pJ/bit and 1.14 pJ/bit at 25 Gb/s and 10 Gb/s, respectively. As expected, the improvement is larger than for FIGURE 19 (a) because of the higher $C_D$ of the typical photonic platform.
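The effect of the 0.5× capacitance scaling can be checked with the first-order relations $f_T \approx g_m/(2\pi C_{in})$ and $A_0 = g_m r_o$. The per-device values below are hypothetical: $g_m$ and $r_o$ are chosen only so that $A_0$ = 6.2 V/V, and $C_{in}$ so that the baseline $f_T$ is 57 GHz, i.e., half of the 114 GHz stated for the scaled technology.

```python
import math


def transit_frequency(gm, c_in):
    """First-order transit frequency (Hz): gm / (2*pi*C_in)."""
    return gm / (2 * math.pi * c_in)


def intrinsic_gain(gm, r_out):
    """Intrinsic (DC) gain A0 = gm * r_out."""
    return gm * r_out


# Hypothetical device values: gm * r_out = 6.2 V/V, C_in chosen for f_T = 57 GHz.
gm = 1e-3                             # S
r_out = 6.2e3                         # ohm
c_in = gm / (2 * math.pi * 57e9)      # F

for scale in (1.0, 0.5):  # capacitances scaled; gm and r_out left unchanged
    f_t = transit_frequency(gm, c_in * scale)
    print(f"C scale {scale}: f_T = {f_t / 1e9:.0f} GHz, "
          f"A0 = {intrinsic_gain(gm, r_out):.1f} V/V")
```

Since only the capacitance changes, $f_T$ doubles while $A_0$ is untouched, which is the combination the hypothetical technology is meant to represent.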

D. BONDWIRE INDUCTANCE AND MULTI-STAGE INV-TIA
The input-referred noise current of shunt-feedback TIAs as a function of the circuit's input capacitance for a fixed parasitic capacitance, including the impact of bondwire inductance, is studied in [15]. That work concluded that for $0.2C_D < C_I < 2C_D$ the noise is very close to its optimum value. The input device was sized so that its capacitance is one-fifth of the photodiode capacitance, reducing the power dissipation while maintaining near-optimal noise performance. This conclusion coincides with our findings.
The Inv-TIA can be implemented by cascading three inverters within the feedback loop to achieve a high DC gain $A_0$. This large $A_0$ allows the TIA to employ a much larger feedback resistor, which reduces its noise contribution. The need to design this TIA with an input capacitance far below the capacitive matching rule is recognized in [12], where the utilized TIA has a $C_I$ of only 20% of $C_D$ to reduce the TIA power dissipation at the expense of a minor degradation in sensitivity (0.3 dB). Hence, our findings are consistent with those for the TIA with a multi-stage feedforward amplifier.
TABLE 5. Performance comparison between the receiver's best-sensitivity and the link's best-energy-efficiency design points.
FIGURE 19. Link performance at various data rates and swing requirements (a) using 65 nm CMOS technology and advanced photonic and interconnect technologies and (b) using advanced CMOS technology and typical photonic and interconnect technologies. A receiver with a single-stage MA is used for both simulations.

E. HIGHER PULSE AMPLITUDE MODULATION
Higher-order pulse amplitude modulation (PAM) is emerging as a more bandwidth-efficient modulation scheme. PAM-4, for example, encodes two bits of information per symbol, allowing links to double the throughput at the same symbol rate as PAM-2. However, the need to resolve closely spaced levels at the receiver makes receiver sensitivity more critical. Thus, PAM-4 receivers favor a larger $C_I$ than their PAM-2 counterparts. Nevertheless, simulation results show that the $C_I$ that minimizes the overall power dissipation of a PAM-4 link is still smaller than the noise-optimum size. In PAM-4 links, VCSEL bandwidth and linearity are also important considerations. A nonlinear equalization scheme is proposed in [32] to allow the VCSEL to be driven at a low bias current to improve its bandwidth efficiency while maintaining linear operation.
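A back-of-the-envelope estimate of why PAM-4 favors better sensitivity: at the same OMA, an ideal PAM-M signal stacks $M-1$ eyes, so each PAM-4 eye is one third as tall as the NRZ eye. The sketch below computes this ideal optical-power penalty under the assumption of equal, level-independent noise; it ignores symbol-to-bit error mapping and laser nonlinearity, and it is a textbook estimate rather than a result from this paper.

```python
import math


def pam_eye_penalty_db(levels):
    """Ideal optical-power penalty (dB) of PAM-M versus NRZ at equal OMA and noise."""
    eye_fraction = 1 / (levels - 1)   # each of the (M - 1) eyes gets 1/(M - 1) of the OMA
    return -10 * math.log10(eye_fraction)


print(f"PAM-4 penalty ~ {pam_eye_penalty_db(4):.2f} dB")
```

Roughly 4.8 dB of additional OMA must therefore be provided somewhere in the link, which is why a PAM-4 receiver tolerates a larger $C_I$ before downsizing stops paying off.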

VIII. CONCLUSION
The sensitivity-power trade-off in optical receivers is analyzed to minimize the energy-per-bit dissipation of the overall link. The sensitivity is calculated as a function of the receiver's input capacitance relative to the detector capacitance for various receiver architectures, data rates, and swing requirements. The observed shallowness of the sensitivity curves around their minima suggests that maintaining the capacitive matching rule to optimize the noise performance leads to a significant degradation in the energy efficiency of the receiver for only a minor improvement in sensitivity. This observation motivated the investigation of how small the receiver can become, sacrificing its optimal noise performance, before its power reduction is offset by the transmitter's increase in power. For that purpose, accurate models of the transmitter and the link budget are presented. As summarized in Table 6, simulation results across a broad range of technologies and data rates show that the optimum energy efficiency of the overall link is achieved by drastically undersizing the receiver, far from its noise-optimum size.
In links that deploy PAM-4 or poorer photonic devices, receiver sensitivity becomes a crucial parameter. As a result, these links may favor receivers with a larger $C_I$. In these links, receivers can be operated at the lower end of the shallow part of their sensitivity curves, defined by the maximum gain (MG) point. The MG point is also a reasonable choice if designers do not have complete knowledge of the transmitter side and/or the link budget.