A 112-Gb/s −8.2-dBm Sensitivity 4-PAM Linear TIA in 16-nm CMOS With Co-Packaged Photodiodes

A flip-chip co-packaged linear transimpedance amplifier (TIA) in 16-nm fin field-effect transistor (FinFET) CMOS demonstrating 112-Gb/s four-level pulse-amplitude modulation (4-PAM) with −8.2-dBm sensitivity is presented in support of the optical receivers required in next-generation intra-data center links. The proposed three-stage TIA comprises a shunt-feedback stage followed by digitally programmable continuous-time linear equalizers (CTLEs) and a variable gain amplifier (VGA). A broadband low-noise design is achieved by giving the first stage a much lower bandwidth (BW) and following it with the proposed BW-recovering CTLEs. A low-power design is supported by the inverter-based single-ended architecture with a single-ended-to-pseudo-differential conversion in the last stage. The TIA's BW extension is further supported by optimizing the photodiode-to-receiver (PD-to-RX) interconnect and utilizing several inductive peaking techniques. It achieves 63-dBΩ gain, 32-GHz BW, and an average input-referred current noise density of 16.9 pA/$\sqrt{\text{Hz}}$ while operating at 0.9-V supply and consuming 47-mW power. Opto-electrical measurements are performed on a co-packaged prototype comprising identical proposed TIAs in CMOS combined with various commercial PDs and PD-to-RX interconnect lengths, confirming 112-Gb/s 4-PAM reception that meets the pre-forward error correction (FEC) symbol error rate (SER) of $4.8 \times 10^{-4}$ without any post-equalization.

data centers with faster, lower cost, and energy-efficient solutions [1]. The majority of this demand is borne by the intra-data center links, as they carry a much higher portion of overall data center traffic. Standardization bodies are working toward taking the throughput capacity of such links beyond 100 GBd/s for inter-rack and intra-rack links covering distances from 1 m up to 2 km [2], [3], [4]. Optical links have been the most favorable medium for this range of distances, as optical channels have negligible frequency-dependent loss [5] compared to electrical links, which suffer from frequency-dependent loss beyond 20 GHz [6].
Considering the simplicity of signal modulation, energy efficiency, and cost efficiency, intensity-modulation and direct-detection (IM/DD) systems are being pushed for the emerging 400G-DR4/FR4/LR4 and 800G-DR8 Ethernet standards, where +100 Gb/s/λ is targeted [3], [7]. Although optical channels have a negligible loss, the opto-electrical components in the signal path are typically bandwidth (BW) limited. Therefore, given the significant expenditure in optical components and packaging, these standards are adopting four-level pulse-amplitude-modulation (4-PAM) signaling instead of conventional binary non-return-to-zero (NRZ) signaling to double the data rate for a given system BW. However, adopting 4-PAM signaling comes at the cost of a 9.5-dB reduction in signal level spacing and enforced linearity constraints in both optical and electrical components [8]. Moreover, this also entails the extensive use of digital signal processing (DSP), including forward error correction (FEC), adding link latency and power [9], [10], [11].
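The 9.5-dB figure can be checked with a short calculation. The sketch below assumes equally spaced levels and the same peak-to-peak swing as NRZ, and expresses the penalty in electrical (voltage) decibels; the function name is illustrative only.

```python
import math

def pam_level_spacing_penalty_db(levels: int) -> float:
    """Eye-height penalty of M-level PAM versus NRZ at equal peak-to-peak swing."""
    # M equally spaced levels split the full swing into (M - 1) stacked eyes,
    # so each eye is 1/(M - 1) of the NRZ eye height.
    return 20.0 * math.log10(levels - 1)

penalty_pam4 = pam_level_spacing_penalty_db(4)
print(f"4-PAM level-spacing penalty: {penalty_pam4:.2f} dB")  # 9.54 dB
```

With three stacked eyes the spacing shrinks by a factor of 3, which is where the roughly 9.5 dB comes from.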
Intra-data center links have relied heavily on pluggable optical transceiver modules connected at the edge of the switch board, which is several tens of centimeters away from the switch application-specific integrated circuit (ASIC). Thus far, these pluggable modules have scaled up in data rate and channel count to meet the throughput demand. However, they are becoming a bottleneck [12], [13] due to the heavy cost and power associated with frequency-dependent losses in printed circuit board (PCB) traces and multiple discrete components in re-timer and buffer circuitry [14], [15], [16]. To combat this, several efforts aim to reduce the number of components while keeping the integration dense, reliable, and cost effective, aided by advances in packaging technologies that bring the fiber-optic cables as close as possible to the switch ASIC, reducing the length of electrical channels [17], [18]. This has opened the door to co-packaged optics (CPO) or first-level package integration [11], [15], [16], [19], [20], [21], [22], [23], [24], [25] as well as integration of silicon photonics (SiP) along with the CMOS switch ASIC [26], [27], [28], [29], [30], [31], [32], [33], [34].
This work, focusing on the optical receiver front end, attempts to fill the following research gaps. First, it proposes a CMOS-suitable single-ended inverter-based multi-stage linear TIA comprising continuous-time linear equalizers (CTLEs) and a variable gain amplifier (VGA). The TIA is carefully designed with several combinations of low-power, low-noise, and BW extension design choices. Second, it addresses the co-optimization of the photodiode-to-receiver (PD-to-RX) interconnect between the flip-attached PD and the CMOS integrated circuit (IC) hosting the TIA (see Fig. 2). Finally, it provides comprehensive prototype measurement results with multiple PD-to-RX interconnect lengths and multiple commercial PDs achieving 112-Gb/s 4-PAM. This article, which is an extension of our recent work [53], is organized as follows. Section II provides the TIA architectural design considerations. PD-to-RX co-packaged interconnect optimization is described in Section III. Section IV presents the proposed TIA circuits. Section V describes the co-packaged prototype, followed by detailed measurements and comparison with the state of the art in Section VI. Finally, the work is summarized in Section VII.

A. Co-Packaged Flip-Chip Integration
TIAs hosted in CMOS along with other DSP blocks require widely accepted flip-chip packaging rather than traditional wire-bonding solutions to support high I/O density, solid power/ground integrity, and low parasitics. This work focuses on the receiver-side integration of a flip-chip co-packaged TIA in fin field-effect transistor (FinFET) CMOS with a discrete commercial PD, as shown in Fig. 2. This heterogeneous integration offers the flexibility of choosing the most suitable technology for both the TIA and the PD to achieve superior overall performance. For the best performance, the PD-to-RX interconnect is optimized in this work, as described in Section III.

B. Low-Power Design Choices
1) Inverter as a Fundamental Block: To achieve a low-power design, circuits that support low-voltage operation must be selected. A CMOS inverter is considered a foundational block for the proposed multi-stage TIA design. The inverter is an excellent power-efficient analog amplifier providing $2\times g_m$ for the same drain current [54], [55], [56]. It supports low-voltage operation and provides a higher linear swing for a given supply, supporting 4-PAM signaling. In advanced FinFET CMOS nodes, the equal drive strength of equally sized PMOS and NMOS transistors self-biases the inverter at mid-rail in the shunt-feedback configuration. Hence, no separate biasing or tail current circuitry is required. Furthermore, this also allows the layout to be symmetric about the horizontal axis, and the absence of other internal parasitic poles makes layout iteration for maximizing the BW much easier. Importantly, it is compatible with the Cherry-Hooper style configuration supporting multi-stage energy-efficient high-BW broadband amplification [29], [32], [52], [57].
2) Single-Ended TIA With Single-to-Differential Block at the End: Since the PD output in IM/DD systems is single-ended, TIAs from prior work utilize a replica-based single-ended-to-pseudo-differential (S2D) block from the first TIA stage [58] or have an inverter-based self-referenced S2D block in the first stage [45] or in the second stage [32]. In this work, a single-ended TIA architecture with an inverter-based S2D block in the last stage is chosen, as shown in Fig. 3. This has numerous benefits and some disadvantages. A single-ended architecture reduces power consumption by up to half and thermal noise by up to a factor of $\sqrt{2}$ compared to replica-based TIA architectures. Compared to other architecture variants, it saves significant active silicon area. It also removes the significant design overhead of dealing with amplitude and phase mismatches in differential paths, especially when the targeted symbol rate has a UI < 20 ps [32]. However, a single-ended inverter-based architecture is sensitive to power supply noise, which can elevate the power supply induced jitter (PSIJ) [59], [60]. Although this is not required in this work, it can be mitigated either by giving the TIA circuits their own isolated supply voltage or by using a dedicated on-chip low-dropout regulator (LDO) [48], [61], [62]. For example, on-chip LDOs can offer a power supply rejection ratio (PSRR) of >20 dB up to several hundreds of megahertz [63], [64], [65].

C. Broadband Low-Noise Design
A low-noise design approach from [36] and [58] is considered, as shown in Fig. 4. It consists of an inverter-based TIA having a first transimpedance stage (TIS) with a significantly lower BW, followed by BW-recovering CTLEs, targeting an overall post-layout transimpedance gain of >60 dB and BW of >35 GHz. Achieving such high-BW performance almost always requires passive elements, such as inductors and/or T-coils, which occupy a significant silicon area. Reducing the count of such passive elements with the help of active circuit design techniques further supports compact integration, which can enable multi-channel integration on the same die [45]. With the selected low-noise broadband approach, this work also attempts to minimize the number of passive inductors for a compact design.
The first TIS stage, having an inverter-based single-stage core amplifier with a fixed voltage supply (constrained by the CMOS process), exhibits a nearly second-order response and offers a slower roll-off [66]. This makes it much easier for the following CTLEs to recover the desired BW. To keep the in-band ringing and group delay variations well under control, two cascaded CTLEs are considered, dividing the BW-recovering task between them. Note that a single-stage core amplifier with a shunt-feedback architecture is chosen for a low-noise design instead of the multi-stage core amplifier with a shunt-feedback architecture implemented in [34] and [67]. This is because the latter approach is only efficient when the ratio of $f_T$ to the desired TIA BW is much higher (i.e., >10), which cannot be satisfied in this case [66]. The latter approach also entails significant design effort associated with meeting sufficient phase margin while dealing with multiple complex poles [66], [68].
The noise reduction insight of the considered approach can be explained as follows. An inverter with resistive shunt feedback ($R_F$) typically results in a second-order system. Assuming the first TIS has maximally flat (Butterworth) second-order characteristics with its core amplifier's (an inverter) dc gain $A_0 \gg 1$, the input-referred current noise spectrum of the TIA, $I_n^2(f)$, is given by (1) [36], [58], where $g_m$ is the transconductance of an inverter in the TIS, $k$ is Boltzmann's constant, $T$ is the absolute temperature, $\gamma$ is the MOSFET thermal noise factor, $C_T$ is the total input capacitance, $V_{n\text{-CTLE}}^2$ is the thermal noise of the CTLEs referred to their input, and $f_{3\text{dB,TIS}}$ is the −3-dB BW of the TIS stage. Noise terms are grouped by the input-referred current noise contribution from the TIS as $I_{n\text{-TIS}}^2(f)$ and from the CTLEs as $I_{n\text{-CTLEs}}^2(f)$.
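Equation (1) itself is not reproduced in this copy of the text. As a sketch only, one form consistent with the symbols defined here and with the term-by-term discussion of (1) in this section is obtained by referring the CTLE noise through a Butterworth second-order transimpedance, $|Z_T(f)|^2 = R_F^2/(1 + f^4/f_{3\text{dB,TIS}}^4)$. The exact coefficients and grouping used in [36] and [58] may differ.

```latex
I_n^2(f) =
\underbrace{\frac{4kT}{R_F} + \frac{4kT\gamma}{g_m R_F^2}
  + \frac{4kT\gamma\,(2\pi C_T)^2}{g_m}\, f^2}_{I_{n\text{-TIS}}^2(f)}
+ \underbrace{\frac{V_{n\text{-CTLE}}^2}{R_F^2}
  \left(1 + \frac{f^4}{f_{3\text{dB,TIS}}^4}\right)}_{I_{n\text{-CTLEs}}^2(f)}
```

In this assumed form, the frequency-independent terms all shrink as $R_F$ grows, the $f^2$ term depends only on $g_m$ and $C_T$, and the $f^4$ term carries $R_F^2 f_{3\text{dB,TIS}}^4$ in its denominator.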
In an attempt to reduce $I_n^2(f)$, a fixed power expenditure and supply voltage imply that $g_m$ and $C_T$ remain constant, whereas the selection of $R_F$ and $f_{3\text{dB,TIS}}$ becomes a critical design choice. Importantly, $R_F$ and $f_{3\text{dB,TIS}}$ in second-order transimpedance systems have an upper bound, analyzed in [68] and given by (2), whose remaining parameters are constant for a fixed supply voltage. Equation (2) signifies that $R_F$ and $f_{3\text{dB,TIS}}$ can be traded with an inverse-quadratic relationship. If $R_F$ is increased, then all the white noise terms in (1) (terms independent of $f$) are reduced. The $f^2$ colored noise term remains unchanged as it only contains constant parameters. The $f^4$ colored noise term also remains unchanged because the product $R_F^2 \cdot f_{3\text{dB,TIS}}^4$ in its denominator remains unchanged when $R_F$ and $f_{3\text{dB,TIS}}$ are traded with the inverse-quadratic relationship given by (2). Increasing $R_F$, and thereby reducing $f_{3\text{dB,TIS}}$, implies that higher peaking from the CTLEs is required to recover the overall targeted TIA BW. This can elevate the CTLE noise, $V_{n\text{-CTLE}}^2$. However, its impact on $I_n^2(f)$ is not significant, as the CTLEs' noise is suppressed by $R_F^2$ as dictated by (1). An illustration of the noise reduction is shown in Fig. 5, where $f_{3\text{dB,TIS}}$ is scaled down by a factor of $n$, resulting in $R_F$ scaling up by a factor of $n^2$. Nevertheless, design iterations are required to find the right balance between the choice of $R_F$ and the colored noise contribution while considering the extent to which the CTLEs are capable of recovering the desired BW. Further details on the transfer characteristics and the input-referred noise contribution of individual blocks in the proposed TIA are reported in Section IV.
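The scaling argument can be illustrated numerically. The sketch below assumes a noise expression with white, $f^2$, and $f^4$ terms, derived for a Butterworth second-order TIS followed by a CTLE. This is an assumed form, not necessarily the exact (1) of [36] and [58]. All component values (`g_m`, `c_t`, `gamma`, `v2n_ctle`, the 100-Ω and 20-GHz starting point) are hypothetical placeholders chosen only to show the trend.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def noise_terms(f_hz, r_f, f3db_hz, g_m=0.05, c_t=200e-15,
                gamma=1.0, v2n_ctle=1e-18, temp_k=300.0):
    """Return (white, f^2, f^4) parts of the assumed noise PSD in A^2/Hz."""
    four_kt = 4.0 * K_B * temp_k
    # Frequency-independent terms: R_F thermal noise, device noise, CTLE noise.
    white = four_kt / r_f + four_kt * gamma / (g_m * r_f**2) + v2n_ctle / r_f**2
    # f^2 term: device noise shaped by the total input capacitance.
    f2_term = four_kt * gamma * (2.0 * math.pi * c_t * f_hz)**2 / g_m
    # f^4 term: CTLE noise referred through the second-order TIS roll-off.
    f4_term = v2n_ctle * f_hz**4 / (r_f**2 * f3db_hz**4)
    return white, f2_term, f4_term

n = 2.0          # bandwidth scaling factor (illustrative)
f_eval = 20e9    # frequency at which the terms are compared
base = noise_terms(f_eval, r_f=100.0, f3db_hz=20e9)
scaled = noise_terms(f_eval, r_f=100.0 * n**2, f3db_hz=20e9 / n)

for name, b, s in zip(("white", "f^2", "f^4"), base, scaled):
    print(f"{name:5s}: base={b:.3e}  scaled={s:.3e}  ratio={s / b:.3f}")
```

Scaling the bandwidth down by `n` and $R_F$ up by `n**2` lowers only the white part; the two colored parts keep a ratio of 1, matching the qualitative argument in the text.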

III. CO-PACKAGED OPTIMIZATION OF PD-TO-RX INTERCONNECT
The PD-to-RX interconnect (shown in Fig. 2) is in the high-speed signal path, and its design impacts the overall TIA BW. The PD output and the TIA input typically have large capacitances ranging from a few tens to a couple of hundred femtofarads. Extending the BW by inserting an appropriate passive network between the PD and the TIA is well recognized: on-chip inductors are often inserted in between [69], or the inductive property of a well-modeled bond wire is exploited during co-optimization with the TIA [40], [47], [58], [70], [71], [72]. In this work, having both the PD and the CMOS TIA flip-chip mounted to a common package substrate affords the opportunity for an optimized micro-strip interconnect to extend the BW.
Assuming, for simplicity, an interconnect that behaves as an ideal transmission line, the characteristic impedance, $Z_0$, in terms of the inductance per unit length, $L$, and the capacitance per unit length, $C$, is given by [73]
$$Z_0 = \sqrt{L/C}.$$
It can be seen that simply increasing $Z_0$ of the interconnect (i.e., reducing the micro-strip width) decreases its $C$ and increases its $L$. Hence, exploiting the inductive property of the PD-to-RX interconnect, an optimum $Z_0$ is selected to achieve passive front-end BW extension in this work. Fig. 6 shows the test bench model used for optimizing the PD-to-RX interconnect. A simplified first-order TIA input impedance model with $R_{in}$ = 26 Ω and $C_{in}$ = 100 fF is extracted from the proposed TIA. An electrostatic discharge (ESD) diode with a post-layout extracted capacitance of 80 fF is placed at the input for higher reliability and protection of >1-kV human body model (HBM) and >250-V charged device model (CDM). The total bump pad capacitance at the RX input is extracted to be 100 fF, which includes the metal-pad-to-substrate capacitance of 70 fF and the pad-to-pad capacitance with neighboring supply/ground pads of 30 fF. To reduce the effective capacitance imposed by the ESD diode and the bump pad, a multi-layer T-coil occupying a 20 μm × 20 μm area is inserted, which helps increase the BW by 2×, as shown in Fig. 6 [29], [74], [75], [76], [77], [78].
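To make the trade between $Z_0$, $L$, and $C$ concrete, the following sketch computes the total series inductance and shunt capacitance of a lossless line from $Z_0 = \sqrt{L/C}$ and the propagation velocity $1/\sqrt{LC}$. The effective permittivity `eps_eff=3.0` is an assumed placeholder, not a value from this work, and the function name is illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def line_lc(z0_ohm, length_m, eps_eff=3.0):
    """Total inductance (H) and capacitance (F) of a lossless line segment."""
    # One-way propagation delay for the assumed effective permittivity.
    delay_s = length_m * math.sqrt(eps_eff) / C0
    # From Z0 = sqrt(L/C) and delay = sqrt(L*C): L = Z0*delay, C = delay/Z0.
    return z0_ohm * delay_s, delay_s / z0_ohm

for z0 in (50.0, 80.0):
    l_tot, c_tot = line_lc(z0, 250e-6)
    print(f"Z0 = {z0:.0f} ohm: L = {l_tot * 1e12:.1f} pH, C = {c_tot * 1e15:.1f} fF")
```

For a fixed length, a higher $Z_0$ gives more series inductance and less shunt capacitance, which is the property the interconnect optimization exploits.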
The T-coil lumped model is extracted using the EMX tool. The bump model with lumped elements is obtained from [79] with the values of physical parameters provided by the fabrication and assembly vendors. Proprietary PD model parameters are obtained from the PD vendor. The PD-to-RX micro-strip interconnect with the desired length and $Z_0$ is modeled using the ADS EM tool, which provides layout-extracted S-parameters.
Optimization of the PD-to-RX interconnect is performed for two different interconnect lengths: 250 and 500 μm. The optimum $Z_0$ for a given interconnect length is the one that provides the flattest possible dc gain with the highest BW. The transfer characteristic from the optical input to the TIA input is simulated across various values of $Z_0$, as shown in Fig. 7. It shows that $Z_0$ = 80 Ω is the optimum choice for L = 250 μm [Fig. 7(a)], whereas $Z_0$ = 50 Ω is the optimum choice for L = 500 μm [Fig. 7(b)]. In both cases, the selected $Z_0$ results in passive front-end BW extension up to 60 GHz. Due to manufacturing limitations, $Z_0$ = 75 Ω for L = 250 μm is fabricated in the prototype presented in Section V. It is also verified that having no interconnect between the PD and the RX (i.e., L = 0 μm) results in the lowest BW simply because there is then no inductive element in between. Furthermore, the choice of optimum $Z_0$ is confirmed by the simulated eye diagrams at the TIA input at 112-Gb/s PAM-4 shown in Fig. 8 (for L = 250 μm) and Fig. 9 (for L = 500 μm). The selected $Z_0$ in both cases results in the maximum eye opening, also assuring insignificant group delay variations. Importantly, this passive input network BW extension due to the optimized PD-to-RX interconnect is achieved without any additional cost in noise or power. Since the TIA input impedance model and the selected T-coil depend on the TIA design, the overall co-design entails iterative optimization.

IV. PROPOSED TIA CIRCUIT IMPLEMENTATION
Fig. 10 shows the proposed three-stage inverter-based TIA operating at 0.9-V supply. Stage-1 comprises a shunt-feedback inverter designed with 10-GHz BW (roughly one-quarter of the overall BW), allowing a higher value of $R_F$ to maximize the dc gain and lower the input-referred current noise, as discussed in Section II-C. Although a higher $R_F$ value is favorable for noise reduction, its value is constrained by linearity. For example, in the pursuit of reducing overall noise, a much higher $R_F$ value such as 650 Ω results in a TIS1 BW of 7 GHz, which can still be recovered by the CTLEs. However, such high gain in TIS1 could induce non-linearity even before the signal BW is recovered by the subsequent CTLEs. Hence, the final value of $R_F$ = 324 Ω with 10-GHz BW in TIS1 is carefully selected based on the trade-off between noise and linearity.
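The two TIS1 design points quoted for Stage-1 (324 Ω at 10 GHz and 650 Ω at 7 GHz) can be cross-checked against the inverse-quadratic trade-off between $R_F$ and bandwidth. The sketch below assumes that TIS1 stays on the transimpedance limit, so that the product $R_F \cdot f_{3\text{dB}}^2$ is constant; the function name is illustrative.

```python
import math

def scaled_bandwidth(f_ref_hz, r_ref_ohm, r_new_ohm):
    """Bandwidth after changing R_F while holding R_F * f^2 constant."""
    return f_ref_hz * math.sqrt(r_ref_ohm / r_new_ohm)

f_650 = scaled_bandwidth(10e9, 324.0, 650.0)
print(f"Predicted TIS1 BW at 650 ohm: {f_650 / 1e9:.2f} GHz")  # 7.06 GHz
```

The prediction of about 7.06 GHz agrees with the 7-GHz figure stated for $R_F$ = 650 Ω.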
Stage-2 is a Cherry-Hooper style stage forming a digitally programmable CTLE. The CTLE comprises a transconductor formed by an inverter for low-frequency gain in parallel with a CR-based high-pass filtered inverter. Resistors $R_{E1}$ and $R_{E1}$ are digitally tuned to adjust the high-pass filter cutoff frequency. The PMOS and NMOS transconductances, biased ($V_B$) with a diode-connected inverter, are separately high-pass filtered to provide more programmability. This also helps tune out any ringing in the frequency response arising from process variation and packaging-related parasitics. The CTLE transconductors' output current is converted back to a voltage by another TIS (TIS2).
Stage-3 comprises a digitally programmable inverter-based VGA, with another CTLE similar to the one in Stage-2 in parallel for further equalization. The last portion of Stage-3 has a large TIS with $L_1$ in series and $L_2$ in shunt feedback, providing further BW extension [80]. Overall, Stage-3 provides around 7 dB of boost at 30 GHz.
The sizing of each TIA stage is performed as follows. The inverter in TIS1 is sized up as much as possible to increase its $g_m$, lowering its device noise. However, its sizing is limited by the dominance of self-loading, which increases the capacitance at the TIA input ($C_T$). The sizing of the subsequent transconductors (i.e., CTLE1) in Stage-2 is kept relatively low compared to TIS1 to avoid further loading on TIS1. This allows the inverters in TIS1 to operate at the maximum possible gain-BW product ($g_m/2\pi C_L$) supported by the technology. The TIS2 stage, with an $R_F$ of 47 Ω allowing a much higher transimpedance BW, is sized much larger to drive the input capacitance of CTLE2 + VGA. Finally, the TIS3 (S2D) is sized the largest to sufficiently drive the subsequent buffers without impacting the BW performance.
Post-layout simulations of the CTLEs + VGA (Stage-2 + Stage-3 combined) response across maximum-to-minimum code settings are shown in Fig. 11. They highlight that the dc gain and the CTLE peaking frequency can be changed independently. The transient pulse response simulation (post-layout) of the S2D block implemented in this work at 56 GBd is shown in Fig. 12. It indicates that the resulting pseudo-differential signal $D(s)$ has a roughly 15% larger swing than its single-ended output $D_P(s)$. This can be shown by expressing $D_P(s)$ and $D(s)$ in terms of $I(s)$, $Z_F(s)$, and the inverter gain $A(s)$, where $I(s)$ is the input current to the S2D block from the previous TIA stage and $Z_F(s)$ is the equivalent impedance of the shunt feedback. The result is that $D(s)$ has a slight gain of $D(s)/D_P(s) = (A(s) + 1)/A(s)$ compared to $D_P(s)$.
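The intermediate expressions are not reproduced in this copy of the text. As a hedged sketch, the ratio can be recovered under the following assumptions, none of which are confirmed by the source: the S2D block is an inverter of gain $-A(s)$ with shunt feedback $Z_F(s)$, the pseudo-differential pair is taken as the inverter output $D_P(s)$ and its input node, here called $D_N(s)$ (a hypothetical label), and the input node capacitance is neglected.

```latex
\begin{aligned}
D_N(s) - D_P(s) &= I(s)\,Z_F(s), \qquad D_P(s) = -A(s)\,D_N(s) \\
\Rightarrow\; D_N(s) &= \frac{I(s)\,Z_F(s)}{A(s)+1}, \qquad
D_P(s) = -\frac{A(s)}{A(s)+1}\,I(s)\,Z_F(s) \\
D(s) &= D_P(s) - D_N(s) = -I(s)\,Z_F(s) \\
\frac{D(s)}{D_P(s)} &= \frac{A(s)+1}{A(s)}
\end{aligned}
```

Under these assumptions, the roughly 15% larger swing corresponds to $(A+1)/A \approx 1.15$, i.e., an inverter gain of about $A \approx 1/0.15 \approx 6.7$ at the frequencies of interest.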
To subtract the dc current from the PD, a dc offset compensation (DCOC) feedback loop is formed with a 1.3-MΩ resistor in series with an inverter-based Miller capacitor (an inverter with a 9.3-pF capacitor in shunt). The DCOC low-pass filter in closed loop provides a lower cutoff frequency of around 1 kHz.
The PD cathode is biased at 4 V through the on-chip RC filter shown in Fig. 10 to decouple noise to the chip ground and to dampen any series resonance due to packaging inductance. The RC filter is formed with a metal resistor of 40 Ω and a MOM capacitor of 80 pF.
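As a rough orientation, the corner frequency of the cathode bias filter can be estimated by treating it as an isolated first-order RC low-pass. This is a simplifying assumption that ignores the PD, bump, and package impedances; the function name is illustrative.

```python
import math

def rc_corner_hz(r_ohm, c_farad):
    """-3-dB corner of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

f_bias = rc_corner_hz(40.0, 80e-12)
print(f"Bias filter corner: {f_bias / 1e6:.1f} MHz")  # 49.7 MHz
```

Under this simplification, supply noise above roughly 50 MHz is attenuated before it reaches the PD cathode.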
Current-mode logic (CML) buffers in this work are chosen specifically for testing the TIA circuits. Although CML buffers require higher static power and supply voltage than the CMOS inverter-based buffers used in [32], they offer inherent differential operation with a higher common-mode rejection ratio (CMRR), making them well suited to driving high-speed signals off-chip while coping with ground/supply noise [29], [44], [45], [46], [81]. Three cascaded linear CML buffers equipped with shunt-inductive peaking and operating at 1.2 V (see Fig. 13) are followed by the required T-coil and ESD diodes to drive the 50-Ω load of the test equipment. They are designed to achieve 0-dB gain and 45-GHz BW. The tail current devices in the CML buffers are designed with a 96-nm gate length (6× the minimum gate length of 16 nm), increasing their output resistance to help improve the CMRR. They provide (simulated) >30-dB CMRR up to the −3-dB TIA BW, converting the pseudo-differential output of the TIS3 to fully differential. All inductors and T-coils are designed with an extracted self-resonant frequency of >80 GHz and a low quality factor of around 3.5 to support broadband operation.
To achieve the maximum possible performance, a careful device layout is considered as follows. Double-sided gate contacts are used to reduce the gate resistance, hence minimizing the noise [82]. A maximum of four fins per finger is used to minimize the self-heating effect on the transistor performance [83]. Gate-to-drain capacitance ($C_{gd}$) and drain-to-source capacitance ($C_{ds}$) are minimized by bringing up the gate, drain, and source connections to higher metals in a staggered, staircase pattern. A minimum of three dummy fingers on both sides of the device is used to minimize the impact of process variations.
The simulated input-referred mean-square current noise contribution of various blocks in the TIA is shown in Fig. 14. It highlights that Stage-1 (TIS1) contributes the majority (55.2%) of the noise. The total integrated input-referred current noise from $R_F$ is 0.9 μA$_{\text{rms}}$, whereas it is 2.5 μA$_{\text{rms}}$ from the device thermal noise of TIS1. Stage-2 and Stage-3 make up 25.5% and 6.6% of the total noise, respectively. The TIS3 (S2D) block in Stage-3 is responsible for only 1.2% of the total noise. The DCOC circuits account for 2.2% of the total noise. Note that the input T-coils also contribute 6.8% due to their parasitic resistance. The CML buffers, being last in the signal path, contribute only 3.7% of the total noise. Fig. 15 shows the simulated power breakdown of each TIA stage, with a total TIA power of 51 mW.
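The quoted breakdown can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the percentages are shares of the integrated mean-square noise, that the $R_F$ and device contributions are uncorrelated, and that together they make up all of the TIS1 share. The implied total is therefore an estimate under those assumptions, not a figure reported in this work.

```python
import math

# Reported shares of total input-referred noise (percent).
shares_pct = {
    "Stage-1 (TIS1)": 55.2,
    "Stage-2": 25.5,
    "Stage-3": 6.6,       # includes the 1.2% from TIS3 (S2D)
    "DCOC": 2.2,
    "Input T-coils": 6.8,
    "CML buffers": 3.7,
}
total_pct = sum(shares_pct.values())

# Uncorrelated rms contributions add in the mean-square sense.
tis1_rms_ua = math.hypot(0.9, 2.5)
# If TIS1 holds 55.2% of the mean-square noise, the total rms follows.
implied_total_rms_ua = tis1_rms_ua / math.sqrt(shares_pct["Stage-1 (TIS1)"] / 100.0)

print(f"Shares sum to {total_pct:.1f}%")
print(f"TIS1 noise: {tis1_rms_ua:.2f} uA rms")
print(f"Implied total: {implied_total_rms_ua:.2f} uA rms")
```

The shares add up to 100%, and the two TIS1 contributions combine to about 2.66 μA rms.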
Considering the optimized PD-to-RX response, the total transimpedance response ($Z_T$) in the post-layout simulation at the TT corner and 25 °C is shown in Fig. 16. The Stage-1 response (from PD output to Stage-1 output) achieves a low BW of 10 GHz, while the following CTLEs extend the total transimpedance BW up to 39 GHz. Note that although Stage-1 has a much lower BW, the BW extension support up to 60 GHz from the input network shown in Figs. 6 and 7 (i.e., optimized T-coil and PD-to-RX interconnect) provides a much slower roll-off compared to a second- or third-order response. As a result of the slower roll-off from Stage-1, equalizing the gain at frequencies between 10 and 39 GHz makes the task of the two CTLEs easier, requiring <7-dB boost at Nyquist per CTLE. This relaxed requirement on the CTLEs ultimately helps keep the group delay variation under control, preserving the eye quality.
[Figure caption: TIA response in post-layout simulation with optimized PD-to-RX interconnect. This includes the optimized PD-to-RX interconnect, CML buffers, and 50-Ω loads.]
Also, note that the noise contribution of Stage-1 alone accounts for 55.2% of the total noise while having only one-quarter of the overall BW. On the other hand, Stage-2 + Stage-3, accounting for only ∼32% of the overall noise, help recover the targeted BW, owing to the broadband low-noise design approach. Fig. 17 shows the resulting 112-Gb/s PAM-4 eye diagrams at the output of the PD, the low-BW TIS1, the CTLE1 transconductor, the VGA/CTLE2 transconductor, and, finally, the TIS3 output.

V. PROTOTYPE
A co-packaged prototype housing four identical proposed TIAs in 16-nm FinFET CMOS, exercised with multiple commercial PDs labeled PD-[A/B/C] and PD-to-RX interconnect lengths (250 μm with $Z_0$ = 75 Ω and 500 μm with $Z_0$ = 50 Ω), is assembled. The overview of the prototype with PDs and PD-to-RX interconnect specifications is shown in Fig. 18. Fig. 19(a) shows the assembled unit comprising two commercial back-illuminated PD ICs flip-attached onto a package […] is mounted on the PCB giving access to dc supplies and digital control signals. PD-C is a singlet, whereas PD-A and PD-B are two from an array of four PDs. The RX slice with dimensions is shown in Fig. 19(b), where the TIA (including ESD diodes and input T-coil) + DCOC blocks occupy an area of 0.0165 mm$^2$.

A. Electrical Measurements
Electrical characterization is executed on RX4, where S-parameter measurements are performed using a Keysight N5227B PNA. The S-parameter-inferred transimpedance for low, mid, and maximum gain settings is shown in Fig. 20(a). The measurements reveal a maximum dc transimpedance gain of 63 dB and a 3-dB BW of 32 GHz. Digital tuning of the VGA between maximum and minimum codes reveals a TIA dynamic range of 9 dB. The group delay measurements are shown in Fig. 20(b), confirming a group delay variation of <±5 ps up to 32 GHz.
Single-ended total harmonic distortion (THD) measurements, shown in Fig. 20(c) and (d), are obtained with a Rohde & Schwarz FSW-26 spectrum analyzer using a 1-GHz tone. They demonstrate that, with 8% THD, up to 670 μA$_{\text{pp}}$ of input PD current can be handled. They also show that the 1-dB compression point at maximum gain occurs at 320 μA$_{\text{pp}}$.
Noise measurements are performed on RX1 with PD-A attached and with the laser source turned off, i.e., no input optical signal is applied. The single-ended output voltage noise distribution is measured using a Keysight DCA-X […] (6) or, equivalently, the average input-referred current noise density of 3.0 μA$_{\text{rms}}$/(32 GHz)$^{1/2}$ = 16.9 pA/$\sqrt{\text{Hz}}$.
[…] an AWG results in 4-PAM eyes with RLM > 0.95 and outer ER > 3 dB at the MZM output. An optical probe with a lensed fiber tip is used to free-space couple light onto a PD. Differential outputs are probed through on-package pads and measured using the Keysight 86118A module. The co-packaged prototype mounted on the probe station with test equipment is shown in Fig. 22(b).
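The conversion from integrated noise to average noise density quoted for the noise measurement is a single division by the square root of the bandwidth. The sketch below reproduces it with the rounded values given in the text; the function name is illustrative.

```python
import math

def avg_noise_density(i_rms_a, bw_hz):
    """Average input-referred current noise density in A/sqrt(Hz)."""
    return i_rms_a / math.sqrt(bw_hz)

density = avg_noise_density(3.0e-6, 32e9)
print(f"Average density: {density * 1e12:.1f} pA/sqrt(Hz)")  # 16.8

# Integrated noise that corresponds exactly to 16.9 pA/sqrt(Hz) over 32 GHz.
i_rms_for_16p9_ua = 16.9e-12 * math.sqrt(32e9) * 1e6
print(f"Integrated noise for 16.9 pA/sqrt(Hz): {i_rms_for_16p9_ua:.2f} uA rms")
```

The rounded 3.0 μA rms gives about 16.8 pA/√Hz; the stated 16.9 pA/√Hz corresponds to roughly 3.02 μA rms, so the small difference is consistent with rounding of the integrated value.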

B. Optical Measurements
The 112-Gb/s 4-PAM optical measurements are performed for all three RX[1:3] individually with −6.1-dBm optical modulation amplitude (OMA). Differential output eye diagrams satisfying the minimum pre-FEC symbol error rate (SER) limit of $4.8 \times 10^{-4}$ (indicated by the eye contours) without any on-scope equalization are shown in the top of Fig. 23. It is observed that both RX1 and RX2, having the same PD but a 2× difference in their PD-to-RX interconnect length, achieve similar eye quality, owing to the optimized $Z_0$ chosen for their respective PD-to-RX interconnect lengths. Also, RX3, with PD-C having 14% higher responsivity than PD-A/B of RX1/2, achieves a slightly improved eye opening compared to RX1/2. Eye quality is further improved after applying an on-scope four-tap feed-forward equalizer (FFE) and four-tap FFE + four-tap decision feedback equalizer (DFE) equalization, as shown in the bottom of Fig. 23. Operating at the maximum gain setting, the TIA consumes 47 mW, while the CML buffers consume 30 mW of power.
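The power figures translate directly into energy per bit at 112 Gb/s. The sketch below is plain unit arithmetic on the numbers quoted above; the resulting pJ/b values are derived here and are not figures reported in the text.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy efficiency in pJ/b; mW divided by Gb/s is pJ/b."""
    return power_mw / rate_gbps

tia_pj = energy_per_bit_pj(47.0, 112.0)
tia_plus_buffers_pj = energy_per_bit_pj(47.0 + 30.0, 112.0)
print(f"TIA only:          {tia_pj:.2f} pJ/b")               # 0.42
print(f"TIA + CML buffers: {tia_plus_buffers_pj:.2f} pJ/b")  # 0.69
```

The CML buffers are included separately because they are test-only circuits in this prototype.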
The 100-Gb/s 4-PAM output eye diagram without on-scope equalization with −4.1-dBm input OMA on RX1 is shown in Fig. 24(a). To further show the potential of the TIA, 150-Gb/s 4-PAM at −3.6-dBm input OMA after on-scope 16-tap FFE + 2-tap DFE equalization is measured and shown in Fig. 24(a). Fig. 25(a) shows the 112-Gb/s 4-PAM SER across the input OMA, achieving a sensitivity of −8.2-dBm OMA at the pre-FEC SER limit of $4.8 \times 10^{-4}$ without on-scope equalization. Enabling an on-scope four-tap FFE, a four-tap DFE, or a combination of both further reduces the SER, demonstrating the suitability of the proposed TIA for the front end of DSP-based optical receivers. Similar measurement results are obtained for 100-Gb/s 4-PAM [Fig. 25(b)], achieving a sensitivity of −12.5-dBm OMA without on-scope equalization. Note that due to linearity limitations, the SER does not reduce beyond −3.3- and −4-dBm input OMA at 112- and 100-Gb/s 4-PAM, respectively. Considering the input […]
To demonstrate the support for low-latency links, optical measurements with an NRZ PRBS13 test pattern are performed at 72/64/56 Gb/s, with bit error rate (BER) measurements across input OMA shown in Fig. 26(a). Eye diagrams of 72-Gb/s NRZ achieving a BER of less than $1 \times 10^{-12}$ without and with (four-tap FFE) on-scope equalization at −5.6-dBm OMA are shown in Fig. 26(b) and (c), respectively. The sensitivity measurements are summarized in Table I across different data rates.
Table II shows the comparison with state-of-the-art works implemented in CMOS. Compared to [44], this work, with 19% higher BW, is capable of an NRZ data rate of up to 72 Gb/s at a sensitivity similar to that of [44], but at the cost of 26% higher power consumption in the TIA. The work of [46] (electrical measurements) offers an impressive BW of 60 GHz with a gain of 65 dB but at the cost of higher power and noise compared to this work. The work of [32], with an inverter-based TIA also implemented in 16-nm FinFET, achieves a superior gain of 78 dB, but at the cost of 18% lower BW, while offering noise performance similar to this work. The work of [43], capable of 100 Gb/s, offers 3 dB higher gain but with 37% lower BW than this work. To the best of the authors' knowledge, even with approximately 40% lower PD responsivity and higher PD + ESD capacitance, this work offers the highest opto-electrically measured data rate and the best sensitivity at equivalent data rates compared with [32] and [43].

VII. CONCLUSION
A 112-Gb/s 4-PAM linear TIA in 16-nm FinFET CMOS co-packaged with various PDs and optimized PD-to-RX interconnect lengths is presented. An inverter-based single-ended TIA operating at 0.9 V achieves 63-dB gain, 32-GHz BW, and an input-referred current noise density of 16.9 pA/$\sqrt{\text{Hz}}$ while consuming 47 mW. The PD-to-RX interconnect is co-optimized to maximize the passive front-end BW. Optical measurements at 112-Gb/s 4-PAM reveal a sensitivity of −8.2 dBm without any on-scope equalization, meeting the pre-FEC SER of $4.8 \times 10^{-4}$. The presented TIA with the considered co-packaged architecture demonstrates strong potential for future high-density, low-energy, and low-cost +100-Gb/s class optical receivers required by the emerging 400-G/800-G/1.6-T Ethernet standards.