Optimizing the Photodetector/Analog Front-End Interface in Optical Communication Receivers

This article addresses the optimization of the interface between the photodetector (PD) and the analog front-end in high-speed, high-density optical communication receivers. Specifically, the article focuses on optimizing design elements in the interface, including the interconnecting transmission line, the T-coil, the transimpedance amplifier (TIA), and digital equalization tap weights. To optimize the optical link, we use a combination of analytical models, electromagnetic simulations, and machine learning techniques to describe different interface elements as most appropriate for each. Finally, we use the genetic algorithm to obtain optimal design parameters. The proposed optimization approach leads to a quick design time and reveals insights into some of the best design practices. As an example, we use the proposed method to investigate the relationship between optimal transmission line width and the amount of equalization available on the receiver. These conclusions are further supported by measurements taken on an assembled prototype with various PD-to-TIA interconnect lengths.


I. INTRODUCTION
To support the demand for the current 400 G and emerging 800 G and 1.6 T Ethernet standards in data centers, both the per-lane data rate and the number of lanes have to be increased. Higher order modulation formats, such as PAM-6 and PAM-8, are under active research to improve the per-lane data rate. Moreover, because the limited bandwidth of the analog front-end (AFE) causes increasingly detrimental intersymbol interference (ISI) for higher order modulation schemes, equalization techniques are used to compensate for the limited bandwidth. On the other hand, increasing the number of lanes presents packaging challenges on the receiver side.
Many integrated CMOS optical receivers have been developed that allow the AFE and the SerDes circuits to coexist on one chip, such as the 100 Gb/s 4-PAM optical receiver in [1], the linear transimpedance amplifier (TIA) in 16 nm FinFET in [2], the linear TIA in 28 nm CMOS in [3], and the linear TIA copackaged with the photodiode in [4]. However, the photodetector (PD) remains a discrete component. Since silicon-based CMOS technologies are not optimized for efficient light absorption [5], PDs are typically made from germanium or compound semiconductors (e.g., InGaAs) that offer better sensitivity and responsivity to light [6]. Such PDs can be designed and optimized independently or in an array to achieve a desired combination of high responsivity, low noise, and wide bandwidth. Alternatively, they may be integrated into a silicon photonic platform alongside other optical components. In either case, the PD is generally not monolithically integrated with a DSP-based equalizing front-end, which implies package-level heterogeneous integration of the PD and the front-end. As the number of data lanes increases in the near future, the spacing between the discrete PDs and their corresponding front-ends will inevitably increase, as shown in Fig. 1. Moreover, this leads to different interconnect lengths between the PD and the AFE. Consequently, more parasitics will be present at the optical receiver's input. Signal integrity impairments, such as reflections, will manifest at the interface between the PD and the AFE. To mitigate these impairments and the impact of the added parasitics on AFE performance, the package and the AFE should be co-designed for optimal performance. Moreover, the optimal AFE design is different for various interconnect lengths. This necessitates an automated and fast optimization flow that takes interconnect design into account.
System-level high-speed data link modeling and optimization have been studied intensively in recent years. Prior work, such as [7], [8], and [9], primarily focused on modeling equalizers without detailed consideration of the preceding AFE. In particular, the authors in [8] and [9] do not take noise into account. Manukovsky et al. [10] used machine learning (ML) techniques to model SerDes systems but provided little design insight. Yang et al. [11] presented the results of an IBIS-AMI holistic model, but without describing many key implementation details.
This article studies the modeling and optimization of the packaging interface and the AFE of an optical communication receiver holistically. In particular, the following contributions distinguish our work from others. First, we discuss in detail how each AFE block is modeled and release the models as open source (see footnote 1) so that readers can reuse the information provided in this work. We use foundry-provided models to accurately capture their impact on the design. Second, we take both jitter and noise into account so that the degradation they cause in channel performance can be investigated. Third, we include the T-coil in our link model and apply novel ML techniques to accelerate its modeling.
The rest of this article is organized as follows. Section II describes the modeling of the parts of the interface under consideration. Section III discusses the optimization procedure. Section IV presents the optimization results and discussion. Section V presents the experimental validation. Finally, Section VI concludes this article.

II. MODELING THE INTERFACE
This article considers 4-PAM modulation with a baud rate of 64 Gbaud (128 Gb/s). The interface is shown in Fig. 2, along with corresponding models, comprising a discrete PD connected to the optical receiver through some packaging interconnect. A low-cost organic substrate is assumed here. Electrostatic discharge (ESD) protection circuits are required to prevent damage during manufacturing, assembly, or use of the components. The ESD circuits introduce parasitic capacitances that could harm performance. To ameliorate this, a bridged T-coil circuit is introduced to extend bandwidth. The TIA is then followed by additional variable-gain amplification (VGA) stages and an analog-to-digital converter (ADC). In the following optimization, we assume the noise and impairments of the VGA and ADC are negligible compared with the noise and bandwidth limitations of the TIA. Finally, digital equalization is used to remove ISI at the output of the TIA. A model of the complete front-end is formed by combining analytical (two-port linear) models, electromagnetic (EM) simulations, and ML techniques for each element in the model as appropriate. Each component in Fig. 2 is described in detail in the following sections.

Footnote 1: Source code: https://github.com/ChrisZonghaoLi/optical_receiver_optimization

Fig. 3. Stack of the organic substrate used for transmission line simulations.

A. Modeling the PD and the Interconnect
The PD is modeled as an ideal current source in parallel with a junction capacitance, C_PD, of 10.7 fF and a series resistance, R_PD, of 87 Ω. These values are based on the GlobalFoundries (GF) 45SPCLO CMOS (silicon photonics) process but are also consistent with standalone germanium PDs and other silicon photonics [12]. The rise and fall times of the signal are assumed to be 6 ps, corresponding to around 0.4 unit interval (UI) at 64 Gbaud. We also assume that the PD generates a peak-to-peak current, I_pp, of 100 μA. The PD source impedance is modeled analytically with an ABCD matrix that is a function of the complex frequency, s.
In terms of packaging, we assume flip-chip packaging of the PD die and the AFE die onto an organic substrate [13]. The solder bump introduces parasitics and a discontinuity. It is modeled as a 10 fF shunt capacitance, C_bump, and a 20 pH series inductance, L_bump [14], which together define the ABCD matrix of the bump on the PD side, T_bump.
As the distance between PDs and AFEs increases, the trace interconnecting the two should be designed as a transmission line to alleviate reflections and signal degradation. Thus, here we consider a transmission line connecting the PD to the AFE. We consider transmission line lengths, TL, of 250 and 500 μm, which are typical distances between the PD and the AFE. We also consider a hypothetical transmission line length of 5 mm. Such a long transmission line may be needed to support future high-density interconnects where a large number of PDs are arrayed around and connected to the receiver IC. For a given length TL, the width of the transmission line, TW, is a design parameter, and it is assumed to be bounded between 15 and 100 μm with 5 μm steps. The lower limit reflects the minimum trace width that can be manufactured on the organic substrate, while the upper limit is assumed in order to permit high interconnect density. The width of the transmission line controls the characteristic impedance. The microstrip transmission line was simulated using Ansys HFSS over the design space to obtain its ABCD matrix, T_TL, as a function of frequency. We note that the EM simulations consider losses in the transmission line. The organic substrate stack shown in Fig. 3 was used in these EM simulations. The model assumes an epoxy-based substrate dielectric material developed for high-speed and low-dielectric-loss applications [15]. It has a relative dielectric constant of 3.3, a dielectric loss tangent of 0.0044 at 5.8 GHz, and a surface roughness of 200 nm. With a 15 μm wide transmission line, this results in 0.1 dB loss for 250 μm, 0.3 dB for 500 μm, and 1.1 dB for 5 mm at the Nyquist frequency of 32 GHz.
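As a quick check of the timing figures used in this section, the unit interval, bit rate, Nyquist frequency, and normalized edge time follow directly from the 64 Gbaud symbol rate and the two bits carried by each 4-PAM symbol. The symbols T_UI, R_b, f_Nyq, and t_r are introduced here only for this illustration.

```latex
\begin{align*}
T_{\mathrm{UI}} &= \frac{1}{64\ \mathrm{Gbaud}} = 15.625\ \mathrm{ps}, \\
R_b &= 64\ \mathrm{Gbaud} \times \log_2 4 = 128\ \mathrm{Gb/s}, \\
f_{\mathrm{Nyq}} &= \frac{64\ \mathrm{GHz}}{2} = 32\ \mathrm{GHz}, \\
\frac{t_r}{T_{\mathrm{UI}}} &= \frac{6\ \mathrm{ps}}{15.625\ \mathrm{ps}} = 0.384 \approx 0.4\ \mathrm{UI}.
\end{align*}
```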
Similar to the solder bump on the PD side, there is a solder bump connecting the transmission line to the receiver IC, whose ABCD matrix is denoted T_{bump,rx}. The pad on the receiver side introduces a relatively large capacitance that creates a discontinuity at the interface and introduces a pole at the input of the AFE, limiting bandwidth. Here, we assume a fixed pad size regardless of the transmission line width, because a large pad is necessary for bonding. A typical pad capacitance of C_PAD = 100 fF is assumed and is modeled as a shunt element with its own ABCD matrix.
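The lumped elements in this subsection can be cascaded numerically by multiplying their ABCD matrices. The following is a minimal sketch using the element values quoted in the text. The ordering of the shunt and series elements inside the PD and bump models is an assumption, the helper names are hypothetical, and the EM-simulated transmission line is omitted.

```python
import numpy as np

def series_z(z):
    """ABCD matrix of a series impedance z."""
    return np.array([[1.0, z], [0.0, 1.0]], dtype=complex)

def shunt_y(y):
    """ABCD matrix of a shunt admittance y."""
    return np.array([[1.0, 0.0], [y, 1.0]], dtype=complex)

def cascade(*blocks):
    """ABCD matrix of two-ports connected in cascade, source side first."""
    total = np.eye(2, dtype=complex)
    for block in blocks:
        total = total @ block
    return total

# Element values quoted in the text.
C_PD, R_PD = 10.7e-15, 87.0
C_BUMP, L_BUMP = 10e-15, 20e-12
C_PAD = 100e-15

def passive_input_network(f_hz):
    """PD, PD-side bump, receiver-side bump, and pad at one frequency.

    Assumed ordering (not confirmed by the text): the shunt capacitance
    comes before the series element in the PD and in each bump.
    """
    s = 2j * np.pi * f_hz
    t_pd = cascade(shunt_y(s * C_PD), series_z(R_PD))
    t_bump = cascade(shunt_y(s * C_BUMP), series_z(s * L_BUMP))
    t_pad = shunt_y(s * C_PAD)
    return cascade(t_pd, t_bump, t_bump, t_pad)

t_nyquist = passive_input_network(32e9)
```

Because every block is reciprocal, the determinant of the cascaded matrix stays at one, which is a convenient sanity check when more elements are added.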

B. ML Model for T-Coil S-Parameters Predictions
A bridged T-coil is often incorporated at the input of the AFE to offset the impact of the ESD capacitance, C_esd, which is assumed to be 80 fF at the receiver's input, as shown in Fig. 2. The capacitor C_esd is necessary to protect the circuit from ESD. However, C_esd introduces a low-frequency pole, which decreases the front-end bandwidth. The T-coil ameliorates the impact of C_esd. Intuitively, the T-coil introduces inductance on either side of C_esd, creating an artificial LC transmission line that increases the front-end bandwidth while introducing a small delay [16]. The T-coil can be modeled as two mutually coupled inductors with a bridge capacitance [17]. Modeling the T-coil while sweeping circuit element parameters [such as R, C, k, and C_br in Fig. 4(a)] may lead to an unrealistic T-coil design because these values depend on the physical geometry of the T-coil. This makes it challenging to perform optimization by sweeping design variables. An alternative approach is to use EM simulators to model the T-coil while sweeping its geometric parameters. However, EM simulations are time consuming, especially for a large design space where many T-coil designs need to be considered. To resolve these challenges, we leverage the neural network (NN) proposed in [18] to quickly predict each T-coil's S-parameters over a wide frequency range. The NN takes the T-coil's geometric parameters as inputs and quickly predicts the S-parameters, allowing for accelerated optimization iterations. The geometric design parameters, as shown in Fig. 4(b), are the T-coil length, L, width, W, metal spacing, S, inner number of turns, N_in, and outer number of turns, N_out.
To demonstrate the idea of the proposed NN (see footnote 2) and its feasibility, we used a GF 22 nm FD-SOI CMOS process as the targeted technology node since its design kit has built-in T-coil layouts. However, designers can apply this NN to any other technology node. The T-coil geometric parameter inputs are given in Table I. The NN outputs the real and imaginary parts of the T-coil S-parameters as a function of frequency. Since the number of input geometric parameters is significantly smaller than the number of output S-parameters, a series of upsampling layers is required in the NN. A single-input-multichannel deconvolutional layer [DeConv, shown in Fig. 5(a)] and an upsampling convolutional NN [UpCNN, shown in Fig. 5(b)] are employed to achieve this objective. The upsampling layer can use different upsampling algorithms, such as nearest neighbor and linear interpolation. For simplicity, this work applies the former. Fig. 6 shows the entire structure of the UpCNN. The T-coil's geometric parameters are first mapped to a high-level abstract representation through a multilayer perceptron, which is then passed to the DeConv and a series of UpCNNs. The predicted T-coil S-parameters are the final output of the NN. These S-parameters are then converted to ABCD parameters, T_Tcoil, to represent the T-coil network.

Footnote 2: Source code: https://github.com/ChrisZonghaoLi/upcnn

Fig. 6. Structure of the proposed NN for predicting T-coil S-parameters [18].

The NN is trained with S-parameters (dc to 256 GHz) from 2920 T-coils. They are simulated with Cadence EMX using 32 cores of an Intel Xeon Gold 6242R CPU. It takes about 30 h to prepare these training data. Training the NN on an NVIDIA RTX A4000 GPU required approximately 10 min. Note that these were one-time efforts for this technology. We used K-fold cross-validation to evaluate the NN performance. The loss function used to train and test the proposed model is a modified mean squared error, as in [19], computed over the N elements of the training set and the K frequency points, where S_{n,k} is the true S-parameter (obtained by EM simulation) at frequency point k for the nth T-coil, and Ŝ_{n,k} is the corresponding prediction. This loss function trains the model to minimize the error across all frequencies. One way to evaluate the accuracy is through the error vector magnitude between the NN output and the true EM-simulated S-parameters, where EVM_{n,k} is the error vector magnitude of the S-parameters at frequency point k for the nth T-coil. Fig. 7 shows the mean S-parameter EVM of the NN output over 584 test cases. It can be seen that the error increases with frequency. However, the Nyquist frequency of the 4-PAM signal here is about 32 GHz. According to Fig. 7, over the 584 test cases the mean EVM below the Nyquist frequency is at most 0.01 to 0.02, which corresponds to an error of about −30 to −40 dB.
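To make the accuracy figures concrete, the sketch below computes an error vector magnitude and converts it to decibels. It assumes that the EVM is the unnormalized magnitude of the complex difference between predicted and true S-parameters, which is one plausible reading and not necessarily the exact definition used by the authors. The data are hypothetical. The last line notes that 584 is one fifth of 2920, which is what a five-way split would give, although the number of folds is not stated in the text.

```python
import numpy as np

def evm(s_true, s_pred):
    """Magnitude of the complex error between predicted and true S-parameters.

    Assumption: unnormalized |S_pred - S_true| per entry.
    Arrays have shape (n_tcoils, n_freq).
    """
    return np.abs(np.asarray(s_pred) - np.asarray(s_true))

def to_db(x):
    """Convert a linear magnitude to decibels."""
    return 20.0 * np.log10(x)

# Hypothetical data: two T-coils, three frequency points.
s_true = np.array([[0.50 + 0.10j, 0.40 - 0.20j, 0.10 + 0.30j],
                   [0.20 + 0.00j, 0.10 + 0.10j, 0.05 - 0.20j]])
s_pred = s_true + 0.01  # a constant error of 0.01 on the real part

mean_evm = evm(s_true, s_pred).mean(axis=0)  # mean over T-coils, per frequency
print(to_db(mean_evm))   # about -40 dB at every frequency point
print(to_db(0.02))       # about -34 dB
print(2920 // 5)         # 584, one fold of a five-way split of 2920 samples
```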
We have also investigated the performance of our proposed NN in the time domain by examining its derived pulse response. We terminate the middle tap of the T-coil with C_esd = 80 fF and terminate both the input and the output with a 50 Ω resistor. We convolve its impulse response with a current pulse of amplitude I_pp to generate the pulse response, which is then compared with the one generated from the EMX simulation. For example, the optimal T-coils for TL = 250 and 500 μm have been examined, with their geometric parameters given in Table III in Section IV. Note that these T-coils are not necessarily matched to 50 Ω. Figs. 8 and 9 show the pulse response comparison for these two T-coils. Our model's predictions of the main cursor are reasonably accurate but over-optimistic for the post-cursor reflections, possibly due to the larger EVM at high frequencies, as shown in Fig. 7. The output of the NN is the S-parameters, and the cost function during training evaluates only the accuracy of the predicted S-parameters, not the time-domain pulse responses [20]. This is acceptable here since the motivation of our ML model is to replace the EM simulation by quickly predicting the S-parameters of a given T-coil so that the design space can be quickly narrowed down [18].
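Before the predicted S-parameters can be cascaded with the other elements of the link, they have to be converted to ABCD parameters. A minimal sketch of the standard two-port conversion follows. It assumes the T-coil has already been reduced to a two-port (for example, with the middle tap terminated) and that the reference impedance is 50 Ω. The function name is hypothetical.

```python
import numpy as np

def s_to_abcd(s, z0=50.0):
    """Convert two-port S-parameters, shape (..., 2, 2), to ABCD parameters."""
    s = np.asarray(s, dtype=complex)
    s11, s12 = s[..., 0, 0], s[..., 0, 1]
    s21, s22 = s[..., 1, 0], s[..., 1, 1]
    den = 2.0 * s21
    a = ((1 + s11) * (1 - s22) + s12 * s21) / den
    b = z0 * ((1 + s11) * (1 + s22) - s12 * s21) / den
    c = ((1 - s11) * (1 - s22) - s12 * s21) / (z0 * den)
    d = ((1 - s11) * (1 + s22) + s12 * s21) / den
    # Assemble rows [A, B] and [C, D] along the last two axes.
    return np.stack([np.stack([a, b], axis=-1),
                     np.stack([c, d], axis=-1)], axis=-2)

# Check with a known network: a 50-ohm series resistor in a 50-ohm system
# has S11 = S22 = 1/3 and S21 = S12 = 2/3.
s_series = np.array([[1 / 3, 2 / 3], [2 / 3, 1 / 3]])
abcd_series = s_to_abcd(s_series)
```

The function also accepts a stack of matrices with a leading frequency axis, so a whole frequency sweep can be converted in one call.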

C. Modeling the TIA
The input of the AFE is a TIA that follows the T-coil and converts the input photocurrent into a voltage. A commonly used TIA architecture is inverter-based shunt feedback, such as the 128 Gb/s PAM-4 linear TIA in [21], the 64 Gb/s PAM-4 TIA in [22], the 53 Gb/s TIA in [23], and the 64 Gb/s NRZ TIA in [24]. Thus, we use it here. Inverter-based TIAs consist of an inverter with a shunt-feedback resistor that converts the input current to an output voltage, as shown in Fig. 2. Inverter-based TIAs are simple to implement in CMOS technology, allowing them to be integrated alongside DSP equalizers, and have been used in optical receivers at 100 Gb/s and beyond (e.g., [1]). The small-signal model is shown in Fig. 2. The design parameters of the TIA (which we assume to have been designed in 14 nm CMOS FinFET [24]) are the transistor widths and the feedback resistance. In the small-signal model, several parameters scale with transistor width and are therefore coupled, namely, the transconductance, g_m, the gate-to-source capacitance, C_gs, the gate-to-drain capacitance, C_gd, and the equivalent output resistance of the TIA, R_a. The value of g_m is related to the gate capacitance, C_g, by the cutoff frequency of the technology node. We assume that the transistors are in deep inversion and that the ratio C_gs/C_gd = 2 (i.e., C_gs = 2/3 C_g and C_gd = 1/3 C_g), based on [24]. We make this assumption because V_t (threshold voltage) values are usually significantly below V_gs = 0.5 V in our inverter-based TIA feedback configuration. However, adjusting the C_gs/C_gd ratio is important if V_t approaches V_gs. Table II gives the numerical values used in this study alongside the relationships between coupled parameters. We note that C_a represents the combination of the output capacitance of the TIA and the input capacitance of the following stage. For the purposes of this study, we take g_m to be the design variable proportional to transistor width, while other parameters scale with it according to Table II. Moreover, considering the limited input current swing we assumed, and considering that a one-stage inverter-based TIA is characterized by good linearity, we ignore nonlinear nonidealities.
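The coupling between g_m and the gate capacitances, and the resulting two-port description of the TIA, can be sketched as follows. This is an illustration only: the cutoff frequency, g_m, R_F, R_a, and C_a values are hypothetical placeholders for the entries of Table II, the relation f_T = g_m/(2πC_g) is the usual textbook definition, and the ABCD parameters are obtained from the Y-parameters of a generic shunt-feedback small-signal model, which may differ in detail from the authors' expressions.

```python
import math

def gate_capacitances(g_m, f_t):
    """Return (C_g, C_gs, C_gd) using f_T = g_m / (2*pi*C_g) and a 2:1 split."""
    c_g = g_m / (2.0 * math.pi * f_t)
    return c_g, 2.0 * c_g / 3.0, c_g / 3.0

def tia_abcd(f_hz, g_m, r_f, r_a, c_gs, c_gd, c_a):
    """ABCD parameters of a shunt-feedback TIA at one frequency."""
    s = 2j * math.pi * f_hz
    y_f = 1.0 / r_f + s * c_gd   # feedback branch: R_F in parallel with C_gd
    y_a = 1.0 / r_a + s * c_a    # output branch: R_a in parallel with C_a
    y11, y12 = s * c_gs + y_f, -y_f
    y21, y22 = g_m - y_f, y_f + y_a
    det = y11 * y22 - y12 * y21
    # Standard Y-to-ABCD conversion.
    return -y22 / y21, -1.0 / y21, -det / y21, -y11 / y21

# Hypothetical values for illustration only.
G_M, F_T, R_F, R_A, C_A = 50e-3, 300e9, 500.0, 100.0, 20e-15
c_g, c_gs, c_gd = gate_capacitances(G_M, F_T)
a, b, c, d = tia_abcd(0.0, G_M, R_F, R_A, c_gs, c_gd, C_A)
dc_transimpedance = (1.0 / c).real  # -R_a (g_m R_F - 1) / (1 + g_m R_a)
```

With these placeholder values the low-frequency transimpedance evaluates to −400 Ω, which matches the closed-form expression in the final comment.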
The ABCD matrix of the TIA, T_TIA, is expressed in terms of the small-signal parameters introduced above. The ABCD matrix of the cascade of all the elements from the PD to the output of the TIA, T_link, is the product of the ABCD matrices of the individual elements in the order in which they appear in the link. From T_link, we are interested in the transimpedance from the PD current to the voltage output of the TIA. This transfer function is given by H(f) = 1/C_link, where C_link is the C parameter of the T_link matrix. The impulse response, h, is obtained by taking the inverse Fourier transform of H. Finally, the pulse response, h_pulse, is obtained by convolving the impulse response with an input current pulse of 1 UI in duration with 6 ps rise and fall times.
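The relation H(f) = 1/C_link follows from the ABCD definition: with the output open-circuited, the input current equals C times the output voltage. The sketch below walks through the remaining steps (inverse FFT, then convolution with a trapezoidal current pulse) for a stand-in link consisting of a 400 Ω resistor in parallel with 50 fF. The stand-in network, the grid sizes, and the helper names are assumptions made for illustration. The 64 Gbaud rate, 6 ps edges, and 100 μA amplitude come from the text.

```python
import numpy as np

def shunt_y(y):
    """ABCD parameters (A, B, C, D) of a shunt admittance, per frequency."""
    one = np.ones_like(y)
    return one, np.zeros_like(y), y, one

def transimpedance(abcd):
    """Open-circuit output voltage per unit input current, H = 1/C."""
    _, _, c, _ = abcd
    return 1.0 / c

def trapezoid(n_ui, n_edge, amplitude):
    """One-UI pulse with linear edges: a rectangle smoothed by a short boxcar."""
    return amplitude * np.convolve(np.ones(n_ui), np.ones(n_edge) / n_edge)

# Time and frequency grids: 16 samples per UI at 64 Gbaud.
F_BAUD, OSR, N = 64e9, 16, 2048
f_s = F_BAUD * OSR
freq = np.fft.rfftfreq(N, d=1.0 / f_s)

# Hypothetical stand-in for T_link: 400 ohm in parallel with 50 fF.
s = 2j * np.pi * freq
H = transimpedance(shunt_y(1.0 / 400.0 + s * 50e-15))

h = np.fft.irfft(H, n=N)                     # discrete impulse response, sums to H(0)
pulse = trapezoid(OSR, round(6e-12 * f_s), 100e-6)
h_pulse = np.convolve(h, pulse)              # pulse response in volts
```

In a full model, the stand-in admittance would be replaced by the C parameter of the cascaded link matrix evaluated on the same frequency grid.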

D. Modeling Receiver Noise
The pulse response captures the time-domain behavior of the system, including reflections. However, it does not consider other signal impairments, such as noise or jitter. We describe how jitter is taken into account in Section III. The noise contributions arise from the feedback resistance, R_F, with current noise density I_{n,R_F}^2, and from the MOS channel thermal noise of the TIA transistors, with current noise density I_{n,g_m}^2. The noise variances at the output of the TIA from each of these noise sources are given by (12) and (13), respectively. In these expressions, Z_in refers to the impedance looking into the T-coil including C_gs of the TIA (Z_in in Fig. 2), Z_f is the parallel combination of the feedback resistor, R_F, and the gate-to-drain capacitance, C_gd (Z_f in Fig. 2), and Z_a is the equivalent output impedance of the TIA, i.e., the parallel combination of R_a and C_a (Z_a in Fig. 2). Finally, I_{n,R_F}^2 = 4kT/R_F and I_{n,g_m}^2 = 4kTγg_m, where k is Boltzmann's constant, T = 300 K is the temperature, and γ = 2.
The total noise variance at the output of the TIA, σ_{n,TIA}^2, is the sum of (12) and (13).
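The two source densities quoted above are easy to evaluate numerically. The sketch below does so for hypothetical values of R_F and g_m (the study's actual values are in Table II). The shaping by Z_in, Z_f, and Z_a that refers the densities to the TIA output is not included.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def resistor_noise_psd(r_f, temp=300.0):
    """Current-noise density of the feedback resistor, 4kT/R_F, in A^2/Hz."""
    return 4.0 * K_B * temp / r_f

def channel_noise_psd(g_m, gamma=2.0, temp=300.0):
    """Channel thermal-noise density, 4kT*gamma*g_m, in A^2/Hz."""
    return 4.0 * K_B * temp * gamma * g_m

# Hypothetical values for illustration only.
R_F, G_M = 500.0, 50e-3
psd_rf = resistor_noise_psd(R_F)
psd_gm = channel_noise_psd(G_M)
ratio = psd_gm / psd_rf  # equals gamma * g_m * R_F
```

For these placeholder values the ratio is γ·g_m·R_F = 50, which shows how strongly the two densities can differ before they are referred to the output.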

E. Feed-Forward Equalizer (FFE) and Decision-Feedback Equalizer (DFE)
An FFE and a DFE follow the AFE. In this model, the output of the TIA is connected directly to an equalizer. This simplification is made for the purpose of studying the achievable signal-to-noise ratio (SNR). In a real system, however, a variable-gain amplifier follows the TIA to achieve higher gain and to condition the signal for sampling by the ADC or the DFE slicers. Moreover, in our model, we have assumed that there is an ideal ADC after the TIA and before the FFE. This allows the use of an FFE with many taps. Here, we assume that the number of taps is given, but the tap weights are free parameters. While it is possible to include the tap weights in the global optimization, the optimization can become intractable for a typically large number of FFE tap weights (over 10). Instead, we select the minimum-mean-square-error (MMSE) FFE and DFE tap weights based on the pulse response and noise under consideration.
Specifically, the MMSE FFE tap weights, Φ, follow [25] and are computed from the following quantities: Y_{des,Δ} is the desired pulse response having a delay of Δ UI, H is the channel pulse response matrix, P is the diagonal matrix whose diagonal is ones except for the K (number of DFE taps) entries after the main cursor that are set to zero, and I is the identity matrix.
The MMSE DFE taps are then calculated using J, a vector of zeros with the same length as Y_{des,Δ} except for the K entries after the main cursor, which are equal to one. The delay, Δ, controls the number of FFE precursor taps and should be selected for optimal performance. To find the optimal number of FFE precursor taps, we sweep Δ to maximize the unbiased MMSE SNR defined in [25].
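The role of the matrices H and P and the delay Δ can be illustrated with a small regularized least-squares design. This is a sketch of a common textbook MMSE FFE/DFE formulation under a white-noise assumption, and it is not claimed to be the exact expression from [25]. The function name and the toy channel are hypothetical.

```python
import numpy as np

def mmse_ffe_dfe(h, n_ffe, n_dfe, delay, noise_var):
    """FFE taps and DFE taps for a baud-rate-sampled pulse response.

    h         : samples of the channel pulse response
    n_ffe     : number of FFE taps
    n_dfe     : number of post-cursors cancelled by the DFE
    delay     : index of the main cursor in the equalized response
    noise_var : white-noise variance relative to unit symbol power
    """
    h = np.asarray(h, dtype=float)
    n_out = len(h) + n_ffe - 1
    # Convolution matrix: H @ w is the equalized pulse response.
    H = np.zeros((n_out, n_ffe))
    for j in range(n_ffe):
        H[j:j + len(h), j] = h
    # Desired response: a single unit sample at the chosen delay.
    y_des = np.zeros(n_out)
    y_des[delay] = 1.0
    # Mask out the post-cursors that the DFE will cancel.
    mask = np.ones(n_out)
    post = slice(delay + 1, delay + 1 + n_dfe)
    mask[post] = 0.0
    P = np.diag(mask)
    w = np.linalg.solve(H.T @ P @ H + noise_var * np.eye(n_ffe), H.T @ y_des)
    dfe = (H @ w)[post]  # what remains at the masked positions is fed back
    return w, dfe

# Toy channel with one post-cursor.
w1, dfe1 = mmse_ffe_dfe([1.0, 0.5], n_ffe=1, n_dfe=1, delay=0, noise_var=1e-9)
w3, dfe3 = mmse_ffe_dfe([1.0, 0.5], n_ffe=3, n_dfe=0, delay=0, noise_var=1e-9)
```

With one DFE tap the single FFE tap stays near unity and the DFE absorbs the 0.5 post-cursor. Without a DFE, the three FFE taps approximate the inverse of the channel instead.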

III. OPTIMIZATION PROCEDURE
To optimize the interface, a signal integrity criterion has to be defined that reflects the quality of the signal at the receiver's output and takes impairments, such as reflections, jitter, and noise, into account. Therefore, we define a signal integrity figure of merit (FoM) that can be calculated statistically and correlates with bit error rate (BER). Statistical analysis of high-speed serial links provides an efficient way to evaluate performance since it relies on the pulse response of the channel rather than on a large number of randomly generated bit patterns. A statistically calculated FoM can therefore be evaluated rapidly within the proposed optimization approach, allowing faster convergence on the optimal design. Specifically, the FoM is defined in terms of the signal amplitude, A_signal; σ_ISI, which represents the residual ISI at the output of the equalizer; σ_{n,output}, the rms voltage noise at the output of the equalizer; and σ_jitter, which represents the rms jitter-to-amplitude voltage conversion. The term A_signal is calculated from the pulse response as follows: assuming the equalized pulse response, h_{pulse,eq}, is baud-rate sampled with O samples, and that the index of the main (max) cursor is zero, then A_signal = 1/3 × h_{pulse,eq}(0). The peak of the pulse response, from which A_signal is derived, represents the peak-to-peak amplitude of the modulated and equalized signal. This FoM is a form of SNR, and a higher FoM corresponds to a better BER. The residual ISI power, σ_ISI^2, is also calculated from the pulse response, with a factor of 5/9 that takes into account the differing ISI contributed by different 4-PAM symbol amplitudes [26]. The noise variance at the output of the equalizer, given by (19), is calculated from the M FFE tap weights, w_0, w_1, ..., w_M, and the autocorrelation of the noise at the output of the TIA, R, where the noise autocorrelation is calculated by taking the inverse Fourier transform of the noise power spectral density obtained by adding the operands of (12) and (13) [25].
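The two pulse-response-based terms can be sketched as follows. The sketch assumes that the residual ISI power is 5/9 times the sum of the squared non-cursor samples, and that the equalized noise variance is the quadratic form of the FFE taps with the noise autocorrelation. These are plausible readings of the description rather than the exact expressions of the source, and the function names are hypothetical.

```python
import numpy as np

def residual_isi_var(h_eq, main):
    """Residual ISI power: 5/9 times the energy of all non-cursor samples."""
    h_eq = np.asarray(h_eq, dtype=float)
    isi = np.delete(h_eq, main)
    return 5.0 / 9.0 * np.sum(isi ** 2)

def equalized_noise_var(w, r):
    """Noise variance after the FFE: sum over i, j of w_i * w_j * R(i - j).

    r[k] is the noise autocorrelation at a lag of k symbols, k >= 0,
    and must contain at least len(w) lags.
    """
    w = np.asarray(w, dtype=float)
    idx = np.arange(len(w))
    lags = np.abs(idx[:, None] - idx[None, :])
    R = np.asarray(r, dtype=float)[lags]  # Toeplitz autocorrelation matrix
    return float(w @ R @ w)

# Hypothetical numbers for illustration only.
var_isi = residual_isi_var([0.1, 1.0, -0.2], main=1)
var_noise = equalized_noise_var([1.0, -0.5, 0.25], r=[2.0, 0.0, 0.0])
```

For white noise the autocorrelation matrix is diagonal, so the output variance reduces to the input variance times the sum of the squared tap weights, which is why long FFEs with large taps enhance noise.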
Jitter causes the eye height to fluctuate around the sampling point. Thus, jitter translates into amplitude noise, reducing signal integrity. In other words, when the signal is jittery, the location of the peak of the signal changes with respect to the sampling time. This means that, if the sampling phase is fixed, the signal is sampled off-peak when there is jitter. This leads to eye height degradation. The jitter-to-amplitude conversion variance, σ_jitter^2, follows [26] and depends on σ_j, the rms jitter, and μ, the slope of the equalized pulse response at the sampling points. The value of σ_j is assumed to be 0.015 UI. In this expression, the factor 5/9 accounts for the density of 4-PAM transitions. The amount of eye height variation equals the amount of time variation times the slope. The quantities are squared to make them power quantities.
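A compact way to see how the three impairments combine is sketched below. Both functions are assumptions: the jitter term is taken as 5/9 times the squared product of slope and rms jitter, as the description suggests, and the FoM is taken to be A_signal divided by the root-sum-square of the impairments, which is consistent with the statement that the FoM is a form of SNR but is not confirmed by the text. All numerical inputs are hypothetical.

```python
import math

def jitter_var(slope_v_per_ui, sigma_j_ui=0.015):
    """Jitter-to-amplitude variance, assumed to be 5/9 * (slope * sigma_j)^2."""
    return 5.0 / 9.0 * (slope_v_per_ui * sigma_j_ui) ** 2

def figure_of_merit(h_main, var_isi, var_noise, var_jitter):
    """Assumed SNR-like FoM: A_signal over the rms of all impairments."""
    a_signal = h_main / 3.0
    return a_signal / math.sqrt(var_isi + var_noise + var_jitter)

# Hypothetical numbers for illustration only.
var_j = jitter_var(slope_v_per_ui=1.0)
fom = figure_of_merit(h_main=0.3, var_isi=4e-6, var_noise=4e-6, var_jitter=1e-6)
```

Because the impairments add as powers, the largest of the three terms dominates the FoM, which is useful to keep in mind when deciding where additional design effort pays off.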
For this optimization, we use a genetic algorithm (GA) with the flow shown in Fig. 10. In this flow, an initial population of 200 sets of design parameters (TW, W, L, N_in, N_out, g_m, and R_F) is randomly generated. Pulse responses (h_pulse) and noise variances at the output of the TIA (σ_{n,TIA}^2) are calculated for each set of design parameters as described in Section II. Using this information, the MMSE tap weights are calculated. The equalized pulse responses can then be calculated using the tap weights. Noise variances are also referred to the output of the equalizer through (19). The FoM for each design parameter set is calculated. A new generation is created through selection, crossover, and mutation of the best current design parameter sets. The process repeats until 100 consecutive generations produce no improvement in FoM. We also used the same seed for optimization to ensure repeatable results. Finally, the parameter set corresponding to the best achieved FoM is selected as the optimal design. In this optimization scheme, the power consumption can be controlled by limiting the range of g_m and the number of equalizer taps. To ensure the practical utility of the optimizer, it is necessary to model each component in the link accurately. Foundry-provided models are used for the PD and to train the T-coil modeling agent. The overall optimization process takes around 29 min on a computer with the following specification: Intel Core i7-8750H @ 2.20 GHz CPU, 2666 MHz 16 GB SDRAM, and a 256 GB PCIe SSD.
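The overall loop (random initial population, selection, crossover, mutation, and a stop after a fixed number of generations without improvement) can be sketched with the standard library alone. This is a generic GA written for illustration, not the authors' implementation. The selection scheme, mutation rate, and toy fitness function are assumptions, and in the real flow the fitness call would evaluate the FoM of a design parameter set.

```python
import random

def genetic_optimize(fitness, bounds, pop_size=200, patience=100, seed=0,
                     mutation_rate=0.2, max_generations=10_000):
    """Maximize fitness over box-bounded real parameters with a simple GA.

    Returns (best_parameters, best_fitness, history_of_best_fitness).
    """
    rng = random.Random(seed)  # fixed seed for repeatable runs
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    best_fit = fitness(best)
    history = [best_fit]
    stale = 0
    for _ in range(max_generations):
        if stale >= patience:
            break
        # Selection: keep the better half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [best[:]]  # elitism: the best design always survives
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Uniform crossover followed by per-gene Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > best_fit:
            best, best_fit, stale = candidate, fitness(candidate), 0
        else:
            stale += 1
        history.append(best_fit)
    return best, best_fit, history

def toy_fitness(p):
    """Hypothetical stand-in for the FoM, with its maximum at (3, -1)."""
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)

BOUNDS = [(-10.0, 10.0), (-10.0, 10.0)]
result = genetic_optimize(toy_fitness, BOUNDS, pop_size=40, patience=20, seed=1)
```

Elitism guarantees that the best FoM never decreases from one generation to the next, and the fixed seed makes repeated runs return identical results.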

IV. OPTIMIZATION RESULTS AND DISCUSSION
The design was optimized for three transmission line lengths: 250 μm, 500 μm, and 5 mm. We note that while the optimal transistor sizes in Table III were the largest permitted in our analysis for all three cases considered here (corresponding to the largest g_m), this was not the case in trials with fewer taps of equalization (Tables IV and V). The likely reason why the largest g_m is optimal is that when g_m is high, the value of R_a is low, resulting in a high-frequency output pole and allowing for a high R_F, which results in a higher gain. Although a large g_m results in a larger C_g, which lowers the input pole, the T-coil offsets this negative impact. Thus, a large g_m is more favorable overall. Fig. 11 shows the eye diagrams obtained with this optimization (see Table III), including impairments. As these figures show, there is a good eye opening in all three cases. A contour corresponding to a BER = 2.4 × 10^−4 is also shown on the eye diagram. This validates that the proposed optimization approach converges on designs with good eye opening and low BER.
Fig. 12 shows a plot of the number of FFE taps versus the value of FoM for all three lengths of transmission lines.As can be seen, FoM increases steadily with the number of FFE taps.Moreover, we notice that the first few taps result in a significant improvement in FoM with diminishing returns as the number of FFE taps increases beyond about six.This is particularly true for the long 5 mm transmission line, which benefits significantly from a few equalization taps.With sufficient equalization, the FoM for the 5 mm transmission line is on par with the 250 and 500 μm transmission lines.
In addition to optimizing designs, the proposed approach can be used to gain insight into optimal design guidelines.Here, we explore the relationship between the optimal transmission line width, TW, which controls the characteristic impedance of the transmission line, and the amount of available equalization in the case of the long 5 mm transmission line that can exhibit strong reflections.
We use the optimization platform to obtain optimal design values for various numbers of FFE taps with no DFE taps.Fig. 13 shows the optimal transmission line width versus the number of FFE taps.We see that with little or no equalization, the optimal transmission line width is wide and tends to narrow with the increasing number of FFE taps.To explain this behavior, we look at h pulse , shown in Fig. 14, in two cases: with no equalization and 16 FFE taps.In both cases, we use optimal design values for each.With no equalization, the pulse response shown in Fig. 14(a) is obtained.This pulse response shows little to no reflections.In this case, a wide transmission line is preferred to avoid reflections that manifest as ISI.In other words, the optimizer chooses to achieve impedance matching between the transmission line and the input of the TIA to avoid reflections.To prove this, we inspected the input impedance of the TIA and compared it with the transmission line's characteristic impedance.The value of the transmission line's characteristic impedance is around 37 Ω, while the value of the input impedance is 32 Ω, confirming the close matching.
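The quality of the match reported above can be quantified with the reflection coefficient between the line and the TIA input. The sketch below treats the 32 Ω input impedance as purely real, which is a simplification, since the actual input impedance is frequency dependent and complex.

```python
import math

def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient seen by a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)

gamma = reflection_coefficient(z_load=32.0, z0=37.0)
return_loss_db = -20.0 * math.log10(abs(gamma))
print(round(gamma, 3), round(return_loss_db, 1))  # -0.072 22.8
```

Under this simplification, roughly 7% of the incident voltage wave is reflected, which is consistent with the nearly reflection-free pulse response observed for the unequalized case.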
On the other hand, a narrow transmission line is preferred in the case of 16 FFE taps.To explain this, we look at the pulse response at the output of the TIA shown in Fig. 14(b).
Here we see a lot of reflections due to a large impedance mismatch between the characteristic impedance of the transmission line and the input of the TIA.However, when inspecting the pulse response at the output of the FFE [see Fig. 14(c)], we see that reflections are significantly reduced, particularly at the sampling points.This makes it unnecessary to do the impedance matching since the FFE is taking care of the reflections.The optimizer chooses a narrow transmission line, likely to reduce its introduced capacitance at the input of the chip.
Based on the preceding analysis, narrow transmission lines are preferred with sufficient equalization, along with a lower bandwidth, lower noise, and higher gain front-end. Such a design affords lower power consumption in the AFE, but higher power in the DSP equalizer. With less equalization, a wider transmission line is preferable to ensure smaller reflections and better signal integrity. Note that the pitch of neighboring receiver lanes is typically limited by practical considerations, such as the pitch of mating fiber arrays, typically hundreds of μm, and is unaffected by trace width optimizations. Of course, the optimization could be constrained to accommodate especially narrow channel pitches, if and as required. Therefore, we conclude the following design guideline: when there is sufficient equalization to cancel reflections, the optimal design uses a narrow transmission line; otherwise, impedance matching is needed, which requires a wider transmission line. These simulations highlight the importance of equalization in counteracting reflections.

V. EXPERIMENTAL VALIDATION
This section presents measurement results that illustrate the trends and tradeoffs elucidated by the automated optimization approach. Measurements were performed on a TIA prototype fabricated in 16 nm FinFET CMOS and flip-chip copackaged along with commercial PDs. Two copackaging arrangements were optimized for the same TIA, as shown in Fig. 15, with TL = 250 μm and TL = 500 μm. Details of the complete front-end design are presented in [13] and [27]. Although the prototype TIA's design parameters differ somewhat from the simulation model shown in Fig. 2, the same trends and tradeoffs are evident in the measured results. As predicted by the ML-assisted genetic optimizer in this work, the optimized interconnect is wider, with TW = 60 μm and a characteristic impedance Z_0 = 50 Ω, for the longer trace, and narrower, with TW = 22 μm and a higher characteristic impedance Z_0 = 75 Ω, for the shorter trace. This allows both copackaging arrangements to maintain comparable 4-PAM signal integrity up to 100 Gb/s, as shown by the unequalized TIA output eye diagrams in Fig. 16.
Furthermore, as in the ML-assisted genetic optimization, we see a dramatic improvement in signal integrity (quantified by the vertical eye opening after equalization, measured on the oscilloscope) once the span of the equalizers is sufficient to compensate for reflections and ringing induced in the package. Results incorporating FFE and DFE equalizers and varying the number of taps are shown in Fig. 17 at 140 Gb/s. An 8-tap FFE with one precursor tap equalizes the combination of package-induced ISI and TIA bandwidth limitations, with additional taps providing little benefit. The inclusion of a 2-tap DFE provides a noticeable improvement, with little benefit from increasing the DFE length to 10 taps. The TIA has 32 GHz bandwidth, 45% of the baud rate in these experiments, comparable to the ML-assisted genetic optimization results.
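The jump in eye opening once the equalizer span reaches the reflection can be illustrated with a simple peak-distortion estimate. The sketch below assumes an ideal DFE that cancels its covered post-cursors exactly and ignores noise, error propagation, and the FFE; the pulse-response samples are hypothetical and are not taken from the measurements reported here.

```python
def pam_eye_height(cursor, isi, levels=4):
    """Worst-case (peak-distortion) height of one PAM sub-eye.

    Symbols are normalized to [-1, +1], so adjacent levels are
    2 * cursor / (levels - 1) apart, and each ISI term can shift the
    sample by at most its magnitude in either direction.
    """
    spacing = 2.0 * cursor / (levels - 1)
    return spacing - 2.0 * sum(abs(h) for h in isi)


def residual_after_dfe(post_cursors, n_taps):
    """Ideal DFE: the first n_taps post-cursors are cancelled exactly."""
    return post_cursors[n_taps:]


# Hypothetical sampled pulse response: main cursor followed by post-cursors.
# The bump at the third post-cursor stands in for a package reflection.
main_cursor = 1.0
post_cursors = [0.10, 0.02, 0.15, 0.01]

for n_taps in (0, 2, 3):
    isi = residual_after_dfe(post_cursors, n_taps)
    height = pam_eye_height(main_cursor, isi)
    print(f"{n_taps} DFE taps: worst-case 4-PAM eye height = {height:.3f}")
```

In this toy case, most of the gain appears when the span grows from two to three taps, because the third tap is the first one that reaches the reflection. A negative result from `pam_eye_height` would indicate a closed worst-case eye.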

VI. CONCLUSION
This article presented the optimization of the interface between the PD and the AFE in high-speed, high-density optical receivers. We used the proposed framework to optimize the transmission line width, the geometry of the T-coil, the inverter-based TIA, and the FFE and DFE tap weights. We applied a hybrid modeling methodology, consisting of analytical models, EM simulation, and an NN model, to describe the interface and optimize its parameters efficiently. The framework was also used to draw insight into optimal design practices. For example, we showed trends relating the amount of equalization to the width of the transmission line: narrow transmission lines are favored when there is enough equalization. However, this can lead to high power consumption because of the increased number of taps required to counteract reflections. Therefore, a wider transmission line may be favored in power-efficient designs with limited equalization. These trends are further validated with measurements performed on a fabricated and assembled TIA prototype with various PD-to-TIA interface lengths at 100 Gb/s.

Fig. 1. Illustration of increased interconnect density leading to longer and potentially different interconnect lengths.

Fig. 2. Interface packaging (top) and the corresponding models used for the optimization (bottom). Colors delineate which model corresponds to which component. Design parameters are annotated in red: transmission line width TW; T-coil geometric parameters L, W, S, N_in, and N_out; TIA modeling parameters R_F, C_gs, C_gd, and g_m; FFE tap weights w; and DFE tap weights b.

Fig. 4(a) shows the T-coil-enhanced ESD circuit, and Fig. 4(b) shows the layout of a T-coil.

Fig. 5. (a) Structure of DeConv layer. (b) Structure of UpCNN, which consists of a pure upsampling layer and a convolutional layer. The upsampling algorithm used here is nearest neighbor [18].

Fig. 13. Optimal TW versus the number of FFE taps for TL = 5 mm.

Fig. 14. Pulse responses for TL = 5 mm. (a) Pulse response obtained with optimal design values assuming no equalization. (b) Pulse response at the output of the TIA, obtained with optimal design values assuming a 16-tap FFE. (c) Pulse response at the output of the FFE equalizer, obtained with optimal design values assuming a 16-tap FFE.

TABLE I GEOMETRIC PARAMETERS OF T-COIL IN GF 22 NM FD-SOI

TABLE II TIA PARAMETERS USED FOR SIMULATIONS

Table gives optimal design values, assuming 32 FFE taps and 4 DFE taps, and the corresponding FoM. Table IV gives the results assuming 6 FFE taps and 2 DFE taps, while Table V gives results with no equalization.