A Monolithic O-Band Coherent Optical Receiver for Energy-Efficient Links

A 1310-nm (O-band) coherent optical link is demonstrated for short-range optical interconnects. It operates at symbol rates (SRs) up to 56 GBd (112 Gbps) with a BER below the FEC limit. The coherent optical receiver (CORX) leverages a monolithic 45-nm CMOS SOI photonics-enabled process to realize energy-efficient quadrature phase shift keying (QPSK) demodulation. Co-design of the optical and electronic circuit elements supports high-speed operation and low power consumption. The link is demonstrated with an optical transmitter photonic IC (PIC) that is fabricated in a silicon photonic (SiPh) process with laser diodes and wirebonded to a 90-nm SiGe driver RFIC. The transmitter operates at an energy efficiency (EE) of 5.9 pJ/bit. The receiver achieves 0.73 pJ/bit which, to our knowledge, is the best EE reported for a coherent optical receiver.


I. INTRODUCTION
More than 77% of total data center traffic is attributed to short-range (<1 km) data center interconnects (DCIs). These links demand continual scaling in spectral efficiency and bandwidth (BW) [1]. To address this growth, DCIs will operate at bit rates (BRs) above 200 Gbps per wavelength with high EE. EE is defined as the total power consumption $P_{total}$ divided by the BR, i.e., $\mathrm{EE} = P_{total}/\mathrm{BR}$.
The simplicity and low cost of intensity-modulation direct-detection (IMDD), illustrated in Fig. 1(a), have made it the prevailing choice for short-range fiber-optic links despite its low tolerance to optical impairments. For example, Fig. 1(a) illustrates a 400-Gbps transceiver that uses 4 × 100-Gbps four-level pulse amplitude modulation (PAM4) toward next-generation 1.6-Tbps Ethernet [2]. Nevertheless, scaling IMDD links requires heavy equalization, which consumes significant power. A 200-Gbps/lane PAM4 link has been shown to operate over 400 m using 71 feedforward equalizer (FFE) taps and 15 decision feedback equalizer (DFE) taps. This link reaches a pre-forward error correction (FEC) bit-error rate (BER) limit of $2\times10^{-2}$, but only with more than 7-dBm received optical power, which demands significant output power from the transmitter (TX) source laser [3]. Moreover, scaling beyond PAM4 further increases the linearity requirements and power consumption of the transmitter and receiver.
Coherent detection, shown in Fig. 1(b), is an alternative to IMDD [4]. Quadrature amplitude modulation (QAM) offers additional scalability to more levels by modulating both the phase and the amplitude of the signal. Detailed comparisons of IMDD and coherent detection conclude that coherent links require lower laser power at comparable ASIC power [2], [5]. 1.6-Tbps coherent links have been demonstrated with discrete components [6]. For short-range use, however, strict power consumption requirements must be met through reduced equalization and more energy-efficient demodulation, for example, using an analog coherent optical receiver (CORX) [4], [7], [8], [9].
Coherent optical signal processing places requirements on both photonic and electronic circuits. Heterogeneously integrated, energy-efficient dual-polarization (DP) coherent optical links operating at 224 Gbps/wavelength have been demonstrated [10], [11]. Monolithic optical transmitters and receivers reduce the parasitics between the photonic and RF integrated circuit components. A monolithic C-band CORX operating with 3.2-pJ/bit EE has been demonstrated in a 0.25-µm SiGe photonic BiCMOS technology [12]. However, RF CMOS circuit techniques that complement silicon photonic (SiPh) devices enable further improvement in EE [13]. The GlobalFoundries 45-nm CMOS SOI technology (45CLO) offers nMOS devices with $f_T$ = 290 GHz. It also supports a process development kit (PDK) that includes optical structures for waveguides, photodetectors (PDs), fiber coupling, and polarization control, as well as ring and Mach-Zehnder modulators (MZMs) [14]. The technology has recently been used for 112-Gbps IMDD links [15].
Previous work demonstrated an electronic-photonic CORX in 45CLO [16]. This article complements the initial measurements with an analysis of the design approach that minimizes power consumption. It also adds co-simulation of the photonic and electronic circuits at 56 GBd. The optical quadrature phase shift keying (QPSK) link is demonstrated for short-range DCIs operating at up to 112 Gbps on a single polarization. The CORX, including its photonic tuning elements, achieves a record EE of 0.73 pJ/bit. Section II reviews the energy-efficient design of the CORX. Section III describes the circuit implementation of the CORX, and Section IV presents the transmitter design and characterization. Section V presents the receiver and link measurements for QPSK modulation.

II. COHERENT RECEIVER EE
The CORX consists of an optical 90° hybrid that mixes the local oscillator (LO) and signal electric fields, $E_{LO}$ and $E_{RX}$ respectively, which then impinge on the PDs as illustrated in Fig. 1(b). The fields can be expressed in terms of the LO power $P_{LO}$, the received optical power $P_{RX}$, and the frequency and phase of each field. For example, $\omega_{LO}$ is the LO frequency and $\phi_{LO}$ is the LO phase. The electric fields are then given by (1) and (2), respectively. The transmitted optical power follows from the received optical power by accounting for the transmitter loss $L_{TX}$, i.e., $P_{LAS} = P_{RX} L_{TX}$, where $P_{LAS}$ is the optical power at the transmitter input. Channel loss is neglected because, in short-range DCIs, it is much smaller than $L_{TX}$. The field incident on each PD of the quadrature differential pairs is found by applying (1) and (2), as given in (3). Assume the signal and LO are locked in phase and frequency, i.e., $\omega_{LO} = \omega_{RX}$ and $\phi_{LO} = \phi_{RX}$. The current at each PD then results from converting the incident optical power into electrical current through the PD responsivity $R_{PD}$ [17], as given in (4). Usually $P_{LO}$ is much larger than $P_{RX}$, so the first term generates a dc current at the PD while the second term generates the modulated current. This second term sets the peak-to-peak current swing $I_{PP}$ at each PD.
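One standard way to write (1) to (4) is sketched below. It assumes fields normalized so that $|E|^2$ is optical power, and an ideal lossless 90° hybrid with PDs 1/2 on the I arm and PDs 3/4 on the Q arm. It also assumes $\Delta\omega = \omega_{RX} - \omega_{LO}$ and $\Delta\phi = \phi_{RX} - \phi_{LO}$. The paper's exact normalization may differ.

```latex
E_{LO}(t) = \sqrt{P_{LO}}\,e^{j(\omega_{LO}t + \phi_{LO})}, \qquad
E_{RX}(t) = \sqrt{P_{RX}}\,e^{j(\omega_{RX}t + \phi_{RX})}

E_{1,2} = \tfrac{1}{2}\left(E_{RX} \pm E_{LO}\right), \qquad
E_{3,4} = \tfrac{1}{2}\left(E_{RX} \pm jE_{LO}\right)

I_{1,2} = R_{PD}\lvert E_{1,2}\rvert^{2}
        = \frac{R_{PD}}{4}\left[P_{LO} + P_{RX} \pm 2\sqrt{P_{LO}P_{RX}}\cos(\Delta\omega t + \Delta\phi)\right]

I_{3,4} = R_{PD}\lvert E_{3,4}\rvert^{2}
        = \frac{R_{PD}}{4}\left[P_{LO} + P_{RX} \pm 2\sqrt{P_{LO}P_{RX}}\sin(\Delta\omega t + \Delta\phi)\right]
```

When the frequency is locked and the data phase is carried in $\Delta\phi$, the dc term is about $R_{PD}P_{LO}/4$. The beat term has amplitude $(R_{PD}/2)\sqrt{P_{LO}P_{RX}}$. For a 0-dBm LO, a −15-dBm signal, and $R_{PD}$ = 0.9 A/W, these give roughly 225 µA and 80 µA. Both are close to the 218-µA dc current and 70-µA swing simulated in Section III-A. Balanced detection, $I_1 - I_2 = R_{PD}\sqrt{P_{LO}P_{RX}}\cos(\Delta\omega t + \Delta\phi)$, removes the dc term.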

A. Laser Power Requirements
The minimum peak-to-peak current at each PD that achieves the desired BER is $I_{PP,min} = 2Q\,i_{n,rms}$ [18]. Here Q is a constant set by the target BER and $i_{n,rms}$ is the rms input-referred noise current (IRNC). Expressed in terms of the IRNC, this sets the minimum required transmit laser power. The total dc laser power consumption is the sum of the LO and transmit optical powers divided by $\eta_{LAS}$. This wall-plug efficiency is the laser's efficiency in converting electrical dc power into optical power, and it is assumed to be the same for the LO and transmit lasers. Trading LO power in the receiver against TX power yields an optimum, minimum dc laser power consumption. The total dc laser power is therefore closely tied to the IRNC of the electronic receiver and to the losses of the transmitter. The laser efficiency and PD responsivity also contribute to the power consumption but are beyond the scope of this work.
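One way to obtain (6) to (8) is sketched below. It assumes the usable swing is the beat-term amplitude at one PD, $I_{PP} = (R_{PD}/2)\sqrt{P_{LO}P_{RX}}$, with powers referred to the hybrid inputs so that each PD receives one quarter of each input. This normalization is an assumption. It was chosen because it reproduces the 44-mW dc laser power quoted in Section II-B.

```latex
I_{PP,min} = 2Q\,i_{n,rms}, \qquad
I_{PP} \approx \frac{R_{PD}}{2}\sqrt{P_{LO}P_{RX}}
\;\Rightarrow\;
P_{RX,min} = \frac{1}{P_{LO}}\left(\frac{4Q\,i_{n,rms}}{R_{PD}}\right)^{2},
\qquad P_{LAS} = L_{TX}\,P_{RX,min}

P_{dc,LAS} = \frac{P_{LO} + P_{LAS}}{\eta_{LAS}}
           = \frac{1}{\eta_{LAS}}\left[P_{LO} + \frac{L_{TX}}{P_{LO}}\left(\frac{4Q\,i_{n,rms}}{R_{PD}}\right)^{2}\right]

\frac{\partial P_{dc,LAS}}{\partial P_{LO}} = 0
\;\Rightarrow\;
P_{LO,opt} = P_{LAS,opt} = \sqrt{L_{TX}}\,\frac{4Q\,i_{n,rms}}{R_{PD}},
\qquad
P_{dc,LAS,min} = \frac{8Q\sqrt{L_{TX}}}{\eta_{LAS}R_{PD}}\,i_{n,rms}
```

In this sketch, the optimum splits the laser power equally between the LO and the transmitter. The minimum dc laser power then grows linearly with the IRNC.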

B. Receiver Power Requirements
Calculating the power consumption required to reach a given IRNC requires some details of the process technology. The PD current is amplified by a transimpedance amplifier (TIA) to generate a voltage for sampling. The overall link efficiency depends on two contributions. The first is the dc power required to amplify the PD current to a minimum sampling voltage. The second is the optical power consumed to generate the minimum current detectable by the TIA. To detect a peak voltage $V_O$ at the RX output, the required transimpedance is $Z_T = 2V_O/I_{PP}$. Substituting $I_{PP}$ from (5) gives $Z_T = V_O/(Q\,i_{n,rms})$. Assume a technology-dependent coefficient $K_Z$ that relates the desired $Z_T$ to power consumption. The dc power dissipation of a single channel is then $P_{dc,RX} = K_Z Z_T$. The total power consumption of a dual-channel I/Q receiver, excluding the transmitter electronic driver, is the sum of the dc laser power and the dc power of the two receiver channels. The first term is found from (8), while the second term is the electronic receiver contribution. The optical power consumption decreases as the IRNC decreases, while the receiver power increases. The total power therefore has a minimum at an optimum IRNC, where the two contributions are equal. Applying this condition to the total power consumption gives the minimum required total power for a dual-channel I/Q receiver. Consequently, the minimum power depends on how efficiently the transistors produce transimpedance gain per unit dc power ($K_Z$). It also depends on how far the required sampling voltage $V_O$ can be reduced.
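The sketch below gives one explicit form of (10) to (12). It carries the receiver term through and takes the laser term as the minimum dc laser power under two assumptions: a beat-amplitude swing $I_{PP} = (R_{PD}/2)\sqrt{P_{LO}P_{RX}}$, and an optimum LO/TX split. It is a hedged sketch, not the paper's printed equations.

```latex
Z_T = \frac{2V_O}{I_{PP,min}} = \frac{V_O}{Q\,i_{n,rms}}, \qquad
P_{dc,RX}^{I+Q} = 2K_Z Z_T = \frac{2K_Z V_O}{Q\,i_{n,rms}}

P_{TOT}(i_{n,rms}) = \frac{8Q\sqrt{L_{TX}}}{\eta_{LAS}R_{PD}}\,i_{n,rms}
                   + \frac{2K_Z V_O}{Q\,i_{n,rms}}

\frac{dP_{TOT}}{di_{n,rms}} = 0
\;\Rightarrow\;
i_{n,rms,opt} = \frac{1}{2Q}\sqrt{\frac{K_Z V_O\,\eta_{LAS}R_{PD}}{\sqrt{L_{TX}}}},
\qquad
P_{TOT,MIN} = 8\sqrt{\frac{K_Z V_O\sqrt{L_{TX}}}{\eta_{LAS}R_{PD}}}
```

Q cancels in $P_{TOT,MIN}$, which depends only on $K_Z$, $V_O$, $L_{TX}$, $\eta_{LAS}$, and $R_{PD}$. In particular, $P_{TOT,MIN} \propto 1/\sqrt{\eta_{LAS}}$, consistent with the laser-efficiency discussion later in this subsection.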
The TX losses also feature prominently in (12). For amplitude modulation, the MZM is biased at quadrature, where the applied voltage produces the maximal optical power variation. For phase modulation, the MZM is biased at the null of the optical power. The optical carrier then undergoes a 180° phase shift as the modulating signal swings around the null bias point. As the signal swings, both the electric field and the optical power vary according to the modulator phase efficiency. This efficiency is characterized by $V_\pi$, the voltage required to generate a π phase shift [4]. Fig. 2 plots the optical power loss as a function of the modulating voltage normalized to the MZM's $V_\pi$. For SiPh processes, a typical $V_\pi L$ of 2 V·cm is expected, where L is the modulator length. A longer modulator lowers $V_\pi$ but reduces the modulation bandwidth. Trade-offs between driver and laser power, as well as the optimum swing for minimum TX power consumption, can be found in [19]. For this analysis, a typical 20-dB optical loss due to limited modulation swing is assumed.
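The loss curve in Fig. 2 can be approximated with the textbook transfer function of a null-biased MZM. The sketch below assumes that the field transmission at null bias is $\sin(\pi V/2V_\pi)$ for a peak drive V, so full transmission requires a peak drive of $V_\pi$. The 3-mm modulator length in the usage line is hypothetical.

```python
import math


def null_bias_loss_db(v_peak_over_vpi: float) -> float:
    """Power loss (dB) at the constellation point of a null-biased MZM.

    Assumes field transmission sin(pi*V/(2*V_pi)) for peak drive V (push-pull,
    lossless arms), so 0 dB is reached only when V = V_pi.
    """
    t_field = math.sin(math.pi * v_peak_over_vpi / 2)
    return -20 * math.log10(abs(t_field))


def drive_for_loss(loss_db: float) -> float:
    """Inverse of null_bias_loss_db: peak drive as a fraction of V_pi."""
    return 2 / math.pi * math.asin(10 ** (-loss_db / 20))


def v_pi_from_vpil(vpil_v_cm: float, length_cm: float) -> float:
    """V_pi from the V_pi*L figure of merit (about 2 V*cm for SiPh)."""
    return vpil_v_cm / length_cm


v_pi = v_pi_from_vpil(2.0, 0.3)      # hypothetical 3-mm modulator
x = drive_for_loss(20.0)             # drive giving the assumed 20-dB loss
print(f"V_pi = {v_pi:.2f} V, drive = {x:.3f} V_pi = {x * v_pi:.2f} V")
```

With $V_\pi L$ = 2 V·cm and a 3-mm modulator, $V_\pi$ ≈ 6.7 V. In this model, the 20-dB loss assumed above corresponds to a peak drive of about 6.4% of $V_\pi$, or roughly 0.43 V.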
Moreover, assume 5-dB coupling loss for each of the input and output couplers at the TX and for the input coupler at the receiver. $L_{TX}$ is then at least 35 dB. The laser efficiency $\eta_{LAS}$ depends strongly on the linewidth requirements and device technology, and it might be relatively low for an integrated SiPh tunable laser. For instance, [20] shows a heterogeneously integrated III-V/silicon interferometric widely tunable laser with 17% peak efficiency. Moreover, the optical-versus-electrical power curve is typically nonlinear, so the efficiency is expected to drop at higher output optical powers. For an external cavity laser (ECL), however, $\eta_{LAS}$ could be as high as 50%. For this analysis, $\eta_{LAS}$ = 25% is assumed as an estimate for the ECL used in the measurements. The remaining assumptions are a minimum 50-mV peak swing, $K_Z$ = 0.01 mW/Ω, $R_{PD}$ = 0.9 A/W, and Q = 7 for a BER of about $10^{-12}$. With these values, the minimum dc laser power consumption is 16.4 dBm (about 44 mW), and the dual-channel receiver consumes 44 mW. This minimum power consumption requires an IRNC of 3.2 µA. Meeting this IRNC is challenging, however, because noise grows with BW and SRs above 50 GBd require high BW. Here SR = BW/0.7 for each channel and BR = 2BW/0.7 for the dual-channel I/Q receiver. The BW that provides the minimum current therefore suggests the optimal SR. The transimpedance required to amplify the minimum detectable current to a 50-mV peak swing is 66 dBΩ. $P_{dc,TOT,MIN}$ is proportional to $1/\sqrt{\eta_{LAS}}$, so a laser with twice the efficiency reduces the total dc power by a factor of about 1.4. It also relaxes the minimum noise requirement (i.e., raises the optimum IRNC) by the same factor. In general, the exact optimum depends on several link components and could be refined through further study.
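The numbers above can be checked with a short script. It is a sketch that rests on three assumptions, none of which are the paper's printed equations:

- a beat-amplitude swing $I_{PP} = (R_{PD}/2)\sqrt{P_{LO}P_{RX}}$,
- an optimum LO/TX split,
- receiver power $2K_Z Z_T$ for the I and Q channels.

```python
import math


def db_to_linear(x_db: float) -> float:
    """Power ratio from dB."""
    return 10 ** (x_db / 10)


def watts_to_dbm(p_w: float) -> float:
    return 10 * math.log10(p_w / 1e-3)


def optimum_link_power(v_o, k_z, r_pd, q, eta_las, l_tx_db):
    """Noise-optimal operating point of the Section II power model (sketch).

    Assumptions (illustrative):
      * usable PD swing I_PP = (R_PD/2)*sqrt(P_LO*P_RX) and I_PP,min = 2*Q*i_n
      * optimum LO/TX split, so P_dc,LAS = 8*Q*sqrt(L_TX)*i_n / (eta*R_PD)
      * dual-channel RX power 2*K_Z*Z_T with Z_T = V_O/(Q*i_n)
    Returns (i_n_opt [A], laser dc power [W], RX dc power [W], Z_T [ohm]).
    """
    l_tx = db_to_linear(l_tx_db)
    laser_slope = 8 * q * math.sqrt(l_tx) / (eta_las * r_pd)  # W per A of IRNC
    rx_const = 2 * k_z * v_o / q                              # W*A
    i_n = math.sqrt(rx_const / laser_slope)                   # d(P_TOT)/d(i_n) = 0
    return i_n, laser_slope * i_n, rx_const / i_n, v_o / (q * i_n)


# Section II values: 50 mV, 0.01 mW/ohm, 0.9 A/W, Q = 7, 25 %, 35 dB
i_n, p_las, p_rx, z_t = optimum_link_power(0.05, 1e-5, 0.9, 7, 0.25, 35)
print(f"IRNC {i_n * 1e6:.2f} uA, laser {p_las * 1e3:.1f} mW ({watts_to_dbm(p_las):.1f} dBm), "
      f"RX {p_rx * 1e3:.1f} mW, Z_T {20 * math.log10(z_t):.1f} dBohm")
```

The script gives about 3.2 µA, about 44.7 mW (16.5 dBm) for each of the laser and receiver terms, and about 67 dBΩ. These are in line with the 3.2 µA, 44 mW (16.4 dBm), and 66 dBΩ quoted above. Rerunning with `eta_las=0.5` lowers the total by $\sqrt{2}$ ≈ 1.4 and raises the optimum IRNC by the same factor.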

C. Noise and Bandwidth
To evaluate the optimal power consumption against BW requirements, the analysis assumes the shunt-feedback TIA shown in Fig. 3, which uses a feedback resistance $R_F$. The transimpedance is given by (13), where $C_{IN}$ is the total input capacitance due to the PD and the transistor. The damping factor of this second-order transfer function must equal $\sqrt{2}/2$ to ensure a well-behaved response. This forces the pole frequency of the core amplifier to be $\omega_0 = 2A_0/(R_F C_{IN})$. The gain-BW product is limited by the technology, which leads to the transimpedance-BW limit [22], $Z_T \approx R_F \le A_0 f_0/(2\pi C_{IN}\mathrm{BW}^2)$, where $f_0 = \omega_0/2\pi$. With enough transimpedance, the noise added by the following stages can be neglected. Hence, neglecting the shot noise of the PDs, the IRNC of the shunt-feedback TIA is calculated following [18]. It includes the thermal noise of $R_F$ at the input and the channel noise at the output. In terms of the Boltzmann constant k and the temperature T, the rms current is given by (16). The noise BW in (16) is scaled using the Personick coefficients $p_2$ and $p_3$, which are roughly 1.11 and 3.3 for a Butterworth response. Applying the noise-minimizing approximation $C_{PD} = C_{GS}$ to (15) and (16) [18] yields the IRNC in (17). For the core amplifier, an inverter produces high gain from the composite $G_m$ of both nMOS and pMOS devices. It achieves high intrinsic gain at low dc current and thus minimizes power consumption for a given BW [23]. The composite transconductance of the inverter is $G_m = g_{m,n} + g_{m,p}$, with pMOS and nMOS widths related by $W_p = 1.2 W_n$. The dc intrinsic gain is $A_0 = G_m R_{DS}$, where $R_{DS} = r_{ds,n} \| r_{ds,p}$. For the 45CLO technology, $f_T$ and intrinsic gain are plotted as a function of current in Fig. 4. The cell maintains $A_0 = G_m R_{DS} \approx 4.8$ over a wide current range.
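To see how an IRNC like (17) behaves, the sketch below uses a common textbook form of shunt-feedback TIA noise. It adds the thermal noise of $R_F$ to the FET channel noise, with the Personick-style bandwidth factors $p_2$ = 1.11 and $p_3$ = 3.3. Several elements are illustrative assumptions: taking the core-amplifier gain-BW product as $f_T$, the channel-noise factor `gamma`, and the 20-fF photodiode capacitance. Only the trends should be read from it.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def shunt_tia_irnc(symbol_rate, f_t, c_pd, temp=300.0, gamma=1.0, p2=1.11, p3=3.3):
    """Approximate IRNC (A rms) of a shunt-feedback TIA at its transimpedance-BW limit.

    Illustrative assumptions, not the paper's exact (17):
      * TIA 3-dB BW = 0.7 * symbol rate
      * noise-matched input C_GS = C_PD, so C_IN = 2 * C_PD
      * g_m = 2*pi*f_T*C_GS, and the core-amp gain-BW product A0*f0 is taken as f_T
      * R_F from Z_T = A0*f0 / (2*pi*C_IN*BW**2)
      * gamma (channel-noise factor) is a hypothetical placeholder
    """
    bw = 0.7 * symbol_rate
    c_gs = c_pd
    c_in = c_pd + c_gs
    g_m = 2 * math.pi * f_t * c_gs
    r_f = f_t / (2 * math.pi * c_in * bw ** 2)
    thermal = 4 * K_B * temp / r_f * p2 * bw                    # R_F thermal noise
    channel = (4 * K_B * temp * gamma * (2 * math.pi * c_in) ** 2
               / (3 * g_m) * p3 * bw ** 3)                    # FET channel noise
    return math.sqrt(thermal + channel)


# Hypothetical 20-fF photodiode: IRNC (uA) at 28 and 56 GBd for two f_T values
for f_t in (200e9, 300e9):
    print([round(shunt_tia_irnc(sr, f_t, 20e-15) * 1e6, 2) for sr in (28e9, 56e9)])
```

In this simplified model, both noise terms scale as $\mathrm{BW}^3/f_T$. The IRNC therefore grows as $\mathrm{SR}^{3/2}$ and falls as $f_T^{-1/2}$, the same qualitative trend as the contours in Fig. 5(a).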
Based on (17), the IRNC is plotted as a function of SR and $f_T$ in Fig. 5(a). The IRNC contours indicate that, for a given SR, a higher $f_T$ reduces the IRNC and should therefore allow lower dc power.
Extending the argument to power consumption based on (10) and (17), with the earlier assumptions of 50 mV for $V_O$ and 35 dB for $L_{TX}$, gives the total power consumption of the coherent link plotted in Fig. 5(b). Higher transmit laser power is required to achieve a higher SR. However, increasing $f_T$ can reduce the required laser power by improving receiver sensitivity. Nevertheless, biasing the device at a higher $f_T$ requires more current and increases receiver power dissipation. As a result, the total power consumption in Fig. 5(b) exhibits a trade-off between speed and $f_T$.
The EE is calculated as $\mathrm{EE} = P_{TOT}/\mathrm{BR}$, where the BR of the dual-channel receiver is twice the SR ($2\,\mathrm{BW}/0.7$). The EE is plotted in Fig. 5(c) and determines the minimum EE for a desired SR. For instance, with an $f_T$ of 300 GHz, a maximum SR of 60 GBd can be achieved at an EE of 0.75 pJ/bit. This operating point requires 3.2-µA IRNC. It consumes 44 mW of dc power in the optical sources and 46 mW in the dual-channel RX. In practice, device-level analysis should be conducted to capture the exact BW for a given $f_T$. Section III reviews the implementation challenges and the trade-offs among achievable BW, noise, and EE in CMOS circuits.
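The EE bookkeeping is simple arithmetic. The helper below applies $\mathrm{EE} = P_{TOT}/\mathrm{BR}$ with BR = 2·SR for single-polarization QPSK, together with the SR = BW/0.7 rule of thumb from this section. The function names are illustrative.

```python
def symbol_rate_from_bw(bw_hz: float) -> float:
    """SR = BW/0.7, the rule of thumb used in Section II."""
    return bw_hz / 0.7


def energy_efficiency_pj(p_total_w: float, symbol_rate: float, bits_per_symbol: int = 2) -> float:
    """EE = P_TOT/BR in pJ/bit, with BR = bits_per_symbol * SR (2 for single-pol QPSK)."""
    return p_total_w / (bits_per_symbol * symbol_rate) * 1e12


print(energy_efficiency_pj(44e-3 + 46e-3, 60e9))   # Section II optimum at 60 GBd
print(energy_efficiency_pj(82.2e-3, 56e9))         # measured CORX data path at 56 GBd
```

The first line reproduces the 0.75 pJ/bit quoted above. The second uses the 82.2-mW data-path power reported in Section V and gives about 0.73 pJ/bit.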

III. RECEIVER CIRCUIT DESIGN AND ELECTRONIC/PHOTONIC CO-SIMULATION
The optical and electronic receive circuitry implemented in the 45CLO process is illustrated in Fig. 6.

A. Optical Front-End
The received optical signal and LO are coupled into the chip through waveguide grating couplers, each with an anticipated loss of 4 dB. The optical 90° hybrid comprises 3-dB directional couplers and a thermal phase shifter (PS) biased at 90° to generate the quadrature fields expressed in (3). Each directional coupler splits the power and applies a 90° phase shift to the opposite output arm. A pn-type PS is also included, as shown in Fig. 6, to provide a large tuning range for adjusting the LO phase.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Fig. 6. Optical receiver implemented in a 45-nm RF/photonic integrated circuit process, consisting of an optical hybrid, TIA, LA, 50-Ω output buffer (OB), and a Costas PFD. Detailed design parameters can also be found in [16].
Fig. 7 shows the simulated wavelength dependence of the power splitting ratio, which indicates less than 0.1-dB penalty across the band. Fig. 7 also shows the wavelength dependence of the PDK PD responsivity. To illustrate the ability of the optical hybrid to tune the quadrature phase relationship, the QPSK constellation is simulated optically using the PDK directional coupler and thermal PS components. A 56-GBd constellation is plotted for two heater bias conditions in Fig. 8(a) and (b). A heater bias error produces a quadrature phase mismatch that distorts the constellation. The simulation uses 0-dBm laser power and −15-dBm modulated signal power incident on the hybrid. This generates an average dc current of 218 µA and a 70-µA peak current swing at each PD, both close to the values predicted by (4). These constellations do not capture the BW limitation of the receiver.
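The effect of a heater bias error can be reproduced with a minimal field-level model of an ideal, lossless 90° hybrid. In the sketch below:

- fields are normalized so that $|E|^2$ is power,
- $R_{PD}$ = 0.9 A/W as in Section II,
- the Q-arm LO is rotated by 90° plus a hypothetical phase error,
- the 10° error is an illustrative value, not a measured one.

```python
import cmath
import math


def hybrid_photocurrents(e_rx, e_lo, r_pd=0.9, phase_err_deg=0.0):
    """Balanced I/Q currents of an ideal 90-degree hybrid (sketch).

    Fields are normalized so |E|**2 is optical power in W. Outputs 1/2 mix the
    signal with +/-E_LO; outputs 3/4 mix it with +/-jE_LO rotated by a
    hypothetical heater phase error phase_err_deg.
    Returns (I_I, I_Q, per-PD currents).
    """
    lo_q = 1j * e_lo * cmath.exp(1j * math.radians(phase_err_deg))
    fields = ((e_rx + e_lo) / 2, (e_rx - e_lo) / 2, (e_rx + lo_q) / 2, (e_rx - lo_q) / 2)
    i1, i2, i3, i4 = (r_pd * abs(e) ** 2 for e in fields)
    return i1 - i2, i3 - i4, (i1, i2, i3, i4)


# 0-dBm LO and -15-dBm QPSK signal at the hybrid inputs (values from the text)
p_lo, p_rx = 1e-3, 1e-3 * 10 ** (-15 / 10)
e_lo = math.sqrt(p_lo)
for err in (0.0, 10.0):
    points = []
    for k in range(4):
        e_rx = math.sqrt(p_rx) * cmath.exp(1j * (math.pi / 4 + k * math.pi / 2))
        i_i, i_q, _ = hybrid_photocurrents(e_rx, e_lo, phase_err_deg=err)
        points.append((round(i_i * 1e6, 1), round(i_q * 1e6, 1)))
    print(f"phase error {err:4.1f} deg -> I/Q points (uA): {points}")
```

With zero error, the four points form a square ($|I_I| = |I_Q|$). A phase error $\delta$ turns the Q current into $R_{PD}\sqrt{P_{LO}P_{RX}}\sin(\theta - \delta)$, which shears the square into a parallelogram. This is the distortion that the heater bias must remove.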

B. Electronic Front-End
In this design, the differential PD current is amplified by a low-power pseudo-differential push-pull shunt-feedback TIA. The PD dc current calculated in (4) is sunk by a variable current source, shown in Fig. 6. The current source is adjusted manually according to the incident optical power so that the dc current does not flow through the feedback resistor. There it would shift the inverters' bias and gain and cause the outputs to rail. An automatic dc current sink could be implemented as in [24]. Compared with a single-ended design, a fully differential TIA offers high common-mode rejection and immunity to environmental noise, such as supply noise, at the cost of a power penalty. To evaluate the common-mode rejection of the inverter cell, the output voltage variation caused by a noisy $V_{DD}$ is found from the inverter model in Fig. 3. Neglecting channel-length modulation, $-V_{DD} = V_{gs,p} - V_{gs,n}$, which provides roughly 6-dB isolation between the output and $V_{DD}$. In practice, channel-length modulation further limits this ideal isolation, and simulation shows 4.6-dB supply noise rejection for a single-ended inverter TIA. The pseudo-differential outputs of the differential PDs improve the common-mode noise rejection. Assume the differential nMOS and pMOS devices have mismatches $\delta_n$ and $\delta_p$. The supply noise rejection then becomes a function of these mismatches. In the limit of zero mismatch the rejection is infinite, whereas any mismatch produces finite rejection.
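A small-signal sketch of the supply-rejection argument follows. It assumes that, at low frequency, the feedback resistor carries no signal current, so the gate and output of each inverter move together. It also neglects $r_{ds}$. Mismatch is modeled as fractional transconductance errors $\delta_n$ and $\delta_p$ on one half of the pseudo-differential pair. This is an illustration, not the paper's exact expression.

```python
import math


def supply_gain(g_mn: float, g_mp: float) -> float:
    """Low-frequency dV_out/dV_DD of a shunt-feedback inverter (sketch).

    Assumes no signal current in R_F at low frequency (gate and output move
    together) and neglects r_ds, so the output follows g_mp/(g_mn + g_mp).
    """
    return g_mp / (g_mn + g_mp)


def single_ended_rejection_db(g_mn: float, g_mp: float) -> float:
    """Supply rejection of one inverter TIA half, in dB."""
    return -20 * math.log10(supply_gain(g_mn, g_mp))


def pseudo_diff_rejection_db(g_mn, g_mp, delta_n=0.0, delta_p=0.0) -> float:
    """V_DD rejection of the differential output when one half's gm is
    mismatched by fractions delta_n (nMOS) and delta_p (pMOS)."""
    a_pos = supply_gain(g_mn, g_mp)
    a_neg = supply_gain(g_mn * (1 + delta_n), g_mp * (1 + delta_p))
    residual = abs(a_pos - a_neg)
    return math.inf if residual == 0 else -20 * math.log10(residual)


print(f"single-ended: {single_ended_rejection_db(1e-3, 1e-3):.2f} dB")
print(f"5% nMOS mismatch: {pseudo_diff_rejection_db(1e-3, 1e-3, 0.05):.1f} dB")
```

Equal transconductances give 6.02 dB, matching the roughly 6-dB isolation stated above. For equal $g_m$, the residual supply gain is about $|\delta_p - \delta_n|/4$, so a 5% nMOS-only mismatch gives about 38 dB. In this model, identical fractional mismatch on both devices cancels entirely.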
Section II calculated the minimum EE that can be achieved in a given technology. In practice, layout and packaging parasitics further limit the available $f_T$ and the achievable EE. Section II assumes the maximum SR based on the available $f_T$. However, the shunt-feedback TIA can use inductive peaking to maximize $Z_T$ for a given power consumption while achieving the desired BW [25], [26].
For further analysis, the device-level parameters illustrated in Fig. 3 are included in the equations. The 3-dB BW of the core cell, $\omega_0$, is estimated from these device parameters. In this estimate, $C_{DS} = C_{gd,n} + C_{gd,p} + C_{db,n} + C_{db,p}$ is the total capacitance at the drains of the nMOS and pMOS transistors. $C_{db,n/p}$ is the drain-to-substrate capacitance and $C_{gd,n/p}$ is the gate-to-drain capacitance. As described in Section II-C, the damping factor of the second-order transfer function of the shunt-feedback TIA in (13) must equal $\sqrt{2}/2$ to ensure a well-behaved response. This forces the pole frequency of the core amplifier to be $\omega_0 = 2A_0/(R_F C_{IN})$ and results in a TIA 3-dB BW of $\mathrm{BW} = \sqrt{2}A_0/(2\pi R_F C_{IN})$. Consequently, the BW of the core amplifier, $\omega_0/2\pi$, modeled as a first-order amplifier, should be chosen $\sqrt{2}$ times higher than the desired BW of the TIA.
The IRNC in (16) is recalculated as a function of the transistor parameters. The total inverter input capacitance is $C_{in,i} = C_{gs,n} + C_{gs,p} + (1 - A)(C_{gd,n} + C_{gd,p})$. The gate-source capacitances $C_{gs,n/p}$, $G_m$, and $C_{DS}$ are all proportional to the device widths $W_n$ and $W_p$. A larger device improves noise performance and reduces the required transmit power at the expense of higher receiver power and lower BW. These trade-offs among noise, BW, and power consumption yield an optimum device width. Fig. 9 plots the BW and noise trade-off as a function of device width. It indicates that, for data rates exceeding 50 Gb/s, the device width should ideally be under 10 µm.
Fig. 10 shows the EE of the receiver for the link parameters used in Section II, based on the total power consumption calculated from (10). The figure also shows the receiver dc power and the minimum transmit laser power requirement. Introducing $K_Z$ allows the power consumption of a multistage receive chain to be estimated from the desired gain, assuming the TIA dominates the IRNC. The multistage design will further limit the calculated BW of the TIA, but series inductive peaking allows the BW to be adjusted. For an 8-µm device, the total power consumption is minimized to 90 mW, with 45 mW of receiver dc power. The resulting BW is 32 GHz for the desired SR of 56 GBd. Passive inductive peaking can extend the frequency response to 40 GHz without extra power in the receiver chain. However, this peaked response slightly increases the IRNC and reduces the maximum $R_F$ to 375 Ω. Consequently, the TIA illustrated in Fig. 6 uses M1 = 8.25 µm and M2 = 10 µm, scaling the pMOS slightly according to the relative velocity. The TIA stage consumes around 2.4 mW with 47-dBΩ transimpedance. This corresponds to a transimpedance power efficiency of $K_Z$ = 0.01 mW/Ω, as estimated in Section II. The calculations also predict 3.2-µA IRNC and a desired transimpedance of 66 dBΩ for the receive chain. To achieve this higher transimpedance, a cascade of inverter cells with scaled transistors and inductive peaking follows the TIA to minimize loading effects and BW reduction.
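The $K_Z$ estimate and the resulting chain power follow from a few lines of arithmetic. The sketch below assumes, as in Section II, that dc power scales linearly with transimpedance across the multistage chain. The helper names are illustrative.

```python
def ohms_from_dbohm(z_dbohm: float) -> float:
    """Convert transimpedance in dBohm to ohms."""
    return 10 ** (z_dbohm / 20)


def k_z_from_stage(p_dc_w: float, z_dbohm: float) -> float:
    """Transimpedance power coefficient K_Z = P_dc / Z_T (W/ohm)."""
    return p_dc_w / ohms_from_dbohm(z_dbohm)


def dual_channel_rx_power(k_z: float, z_target_dbohm: float) -> float:
    """I + Q chain power, assuming dc power scales linearly with Z_T (Section II model)."""
    return 2 * k_z * ohms_from_dbohm(z_target_dbohm)


k_z = k_z_from_stage(2.4e-3, 47.0)          # TIA stage: 2.4 mW at 47 dBohm
p_rx = dual_channel_rx_power(k_z, 66.0)     # chain target of 66 dBohm
print(f"K_Z = {k_z * 1e3:.4f} mW/ohm, dual-channel RX = {p_rx * 1e3:.1f} mW")
```

This gives $K_Z$ ≈ 0.0107 mW/Ω and about 43 mW for the dual-channel chain at 66 dBΩ. These values are of the same order as the 45-mW estimate above and the 46.2 mW measured in Section V.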
Fig. 11 plots the simulated receiver frequency response assuming a 400-pH output wirebond inductance from the driver to the printed circuit board (PCB) assembly. The post-layout simulations indicate that a 40-GHz 3-dB BW is achievable. Measured S-parameters of an electrical test structure with the wirebond assembly are cascaded with the PD model to determine the transimpedance of the packaged receiver. The measured result, which includes a 4-in cable connection to the VNA and the PCB traces, is also shown in Fig. 11. The slight discrepancy between simulation and measurement may be attributed to the PCB packaging and the connection to the measurement equipment. Both are also present in all time-domain measurements.
Fig. 12 plots the simulated output voltage noise for the PD operating in the dark and with 0-dBm unmodulated optical power. Higher dc current in the PD increases the shot noise current at the input. The IRNC integrated across twice the BW increases from 3.9 µA for the dark current to 5.6 µA for a 0.93-mA dc current in the PD.
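A first-order check of this increase adds the PD shot noise $2qI_{dc}B$ in quadrature to the dark-condition IRNC. The sketch treats the shot noise as white over a brick-wall noise bandwidth B, which ignores how the receiver response shapes it. The 40-GHz and 80-GHz values used for B are hypothetical bounds, equal to one and two times a 40-GHz BW.

```python
import math

Q_E = 1.602176634e-19  # elementary charge, C


def irnc_with_shot_noise(i_n_dark: float, i_dc: float, noise_bw: float) -> float:
    """Add PD shot noise 2*q*I_dc*B in quadrature to a dark-condition IRNC (A rms).

    Treats the shot noise as white over a brick-wall noise bandwidth noise_bw,
    which ignores how the receiver response shapes it.
    """
    return math.sqrt(i_n_dark ** 2 + 2 * Q_E * i_dc * noise_bw)


for bw in (40e9, 80e9):  # hypothetical effective noise bandwidths
    print(f"{bw / 1e9:.0f} GHz: {irnc_with_shot_noise(3.9e-6, 0.93e-3, bw) * 1e6:.2f} uA")
```

The two bounds give about 5.2 µA and 6.2 µA, which bracket the simulated 5.6 µA.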
The co-simulation of the photonic and electronic circuits is shown in Fig. 13. As shown in Fig. 6, the design also includes a Costas phase/frequency detector (PFD), analyzed in detail in [27] and [28]. The PFD enables analog phase recovery of the LO, similar to the implementation in [29].

IV. COHERENT TRANSMITTER

A. Optical Front-End
The TX PIC was fabricated in Intel's SiPh process to leverage its integrated laser. It includes DP-IQ traveling-wave MZMs with more than 30-GHz electro-optic (EO) BW. Previous work reported the design and performance of the SiPh DP-IQ MZM, including detailed DP transmitter measurements [30].

B. Electrical Front-End
Fig. 14(a) provides a schematic of the MZM driver EIC fabricated in a 90-nm GlobalFoundries SiGe BiCMOS process (9HP). The output-stage load resistor $R_L$ is 200 Ω. This value reduces the total current required to drive the 30-Ω MZM termination while suppressing backward reflections. The dual-channel driver consumes 250 mW (2.2 pJ/bit/channel). The driver also includes a continuous-time linear equalizer (CTLE) in the output stage to peak the frequency response. The CTLE is realized with emitter degeneration, as shown in Fig. 14(a). As the operating frequency increases, the emitter degeneration impedance decreases. The driver gain therefore increases, producing a peak in the frequency response. As shown in Fig. 15, the CTLE generates 11 dB of peaking at 36 GHz to compensate for BW degradation in the silicon modulator. The simulated driver exhibits 66-GHz BW and can provide a 2-V peak-to-peak swing, excluding packaging and parasitic components. Trade-offs between driver and laser power, as well as the optimum drive swing for TX EE, can be found in [19].
The measured TX 56-GBd constellations, captured with a 70-GHz reference PD, are shown in Fig. 16(a) and (b). Moreover, Fig. 16(c) shows the BER measurement with −2-dBm LO power per PD. These constellations and the baseline BER provide a reference for the QPSK constellations measured with the CORX in Section V.

V. RECEIVER MEASUREMENT RESULTS
Micrographs of the TX and RX chips and of the chip-on-board assembly are shown in Figs. 14(b) and 17. The TX chip measures 3.4 mm × 8.25 mm. The entire RX monolithic electronic/photonic integrated circuit (MEPIC) is contained within 2.6 mm × 1.1 mm, and the LO PS requires a significant part of this area. The optical hybrid and the electronics occupy roughly equal areas. The die is wirebonded to a high-speed test PCB.
The measurement setup is illustrated in Fig. 18. For testing, the output of a 1310-nm ECL is split into the LO and signal paths. 25% of the ECL power goes to the LO and 75% goes to the transmitter. In the signal path, the coherent TX is driven with a 500-mV PRBS-15 signal from a bit pattern generator (BPG) (SHF 12105A).
The signal path also includes an O-band fiber amplifier (PDFA) that compensates for the high coupling loss in both the transmitter and receiver. It also includes an attenuator for sensitivity measurements. The ECL output power is set to 20 dBm at 320-mA input current, providing 14-dBm LO power and 18.7-dBm input power to the TX. The signal power at the output of the attenuator with [...] only to introduce the optical fibers to the waveguide grating couplers on the receiver chip. The excessive optical losses in the link measurement are due to nonideal coupling and to other components in the setup that significantly attenuate the modulated signal. As described in Section II, the modulators are biased at minimum transmission, which results in very low signal power. They are also driven with less than the full $V_\pi$ swing, which further limits the modulation factor of the QPSK signal. Two factors limited the modulated signal power: optical loss higher than the initial estimate, and a 50/50 split of the laser power. As a result, the modulated signal generated less than 1 µA of current. In theory, the LO can compensate for a low signal swing, as shown in (4). In practice, however, the transmitter is not ideal and its noise degrades the received signal. As the optical SNR (OSNR) decreases, the signal becomes undetectable even with an ideal noiseless receiver amplifier. Moreover, boosting the modulated current with a very high LO power increases the dc current in (4), which scales with $P_{LO}$, faster than the ac signal, which scales with $\sqrt{P_{LO}}$. The result is higher shot noise, which the analysis neglected by assuming that receiver thermal and channel noise dominate. To compensate for the high $L_{TX}$ and generate a detectable signal at the receiver, a larger portion of the optical power was split into the transmitter.
The receiver outputs are connected through high-speed 2.4-mm connectors to a real-time oscilloscope (RTO). The receiver I/Q channels are captured by a 70-GHz RTO (Keysight UXR0702A) with a 0.875-µs acquisition time at 256 GSa/s to record the received QPSK signal. The differential, dual-channel electrical circuit draws 42 mA from a 1.1-V supply, or 46.2 mW. The adder used in the Costas loop draws 5.4 mA from a 1.5-V supply, consuming 8 mW. The thermal PS inside the optical hybrid consumes 36 mW for quadrature bias. The data path therefore consumes 82.2 mW of dc power, plus an additional 8 mW for the Costas implementation. A significant portion of the total receiver power is thus consumed by the optical tuning elements. Fig. 19 details the simulated power breakdown compared with the total measured power consumption of the CORX. Fig. 22 shows the BER as a function of the signal power incident at each PD. At lower data rates, errors are mainly due to noise, whereas at higher data rates ISI degrades the BER and sensitivity. The receiver IRNC can be estimated at 28 GBd. At this rate, the minimum signal power that achieves a BER below the FEC limit of $3.8\times10^{-3}$ is −35 dBm. With −4.2-dBm LO power incident at each PD, this corresponds to a sensitivity current of 19.7 µA, calculated from (4) assuming $R_{PD}$ = 0.9 A/W. The corresponding effective optical power $\sqrt{P_{LO}P_{RX}}$ is −19.6 dBm. The IRNC can then be estimated from (5) to be roughly 3.6 µA. The LO power is held constant for all BRs, yielding a sensitivity of −14.6 dBm at 40 GBd and −13 dBm at 56 GBd. The observed degradation above 28 GBd is worse than expected and was not predicted by RX simulations. It may be attributed to other frequency-dependent nonidealities, including group delay dispersion and power supply sensitivity. Based on the electrical eye openings, more BER degradation is expected when the data rate increases from 40 to 56 GBd than from 28 to 40 GBd. In the full optical link measurements, the received signal has limited eye opening and worse OSNR at higher data rates, which could further limit the sensitivity above 28 GBd. Fiber amplifiers are used to increase the optical swing, but they also add optical noise at higher baud rates. Note that all measured data include the packaging and cable connections to the measurement equipment. These are not included in the simulation, so their effect is not predicted.
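The measured power budget and the sensitivity-based IRNC estimate can be reproduced with a short script. It rests on three assumptions:

- Gaussian noise, so that BER = ½·erfc(Q/√2),
- $I_{PP,min} = 2Q\,i_{n,rms}$,
- a sensitivity current of $2R_{PD}\sqrt{P_{LO}P_{RX}}$ with both powers referred to each PD.

The factor of 2 in the last assumption is inferred because it reproduces the quoted 19.7 µA.

```python
import math


def q_from_ber(ber: float) -> float:
    """Invert BER = 0.5*erfc(Q/sqrt(2)) by bisection (Gaussian noise assumed)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > ber:
            lo = mid  # BER still too high -> need a larger Q
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dbm_to_watts(p_dbm: float) -> float:
    return 1e-3 * 10 ** (p_dbm / 10)


# Measured CORX dc power (Section V)
p_electronics = 42e-3 * 1.1          # I/Q receive chains: 46.2 mW
p_heater = 36e-3                     # thermal PS in the 90-degree hybrid
p_datapath = p_electronics + p_heater
p_costas = 5.4e-3 * 1.5              # Costas-loop adder, about 8 mW
ee_pj_per_bit = p_datapath / 112e9 * 1e12

# IRNC from the 28-GBd sensitivity (powers referred to each PD; the factor 2
# is inferred because it reproduces the quoted 19.7 uA)
p_lo_pd, p_rx_pd = dbm_to_watts(-4.2), dbm_to_watts(-35.0)
i_sens = 2 * 0.9 * math.sqrt(p_lo_pd * p_rx_pd)
q = q_from_ber(3.8e-3)
i_n = i_sens / (2 * q)               # I_PP,min = 2*Q*i_n
print(f"{p_datapath * 1e3:.1f} mW, {ee_pj_per_bit:.3f} pJ/bit, "
      f"{i_sens * 1e6:.1f} uA, Q = {q:.2f}, IRNC = {i_n * 1e6:.2f} uA")
```

The script gives 82.2 mW, about 0.73 pJ/bit at 112 Gbps, Q ≈ 2.67, and an IRNC of about 3.7 µA. The IRNC is consistent with the roughly 3.6 µA quoted above.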
The Costas loop performance was investigated in an electrical test structure by applying quadrature beat tones at the I/Q inputs with 80-mV swing, corresponding to 600-µA current. The measured output voltage is shown in Fig. 23. The ideal Costas PFD response is analyzed in [28] and given in (23), where $V_{PFD} = Z_{TIA} G_{mix} I_{PD}$. This expression assumes that the limiting amplifier (LA) stage fully limits the signal and that the addition is ideal and perfectly linear. $G_{mix} = 2/\pi$ is the gain of the passive mixer stage. Fig. 23 also shows the ideal PFD response as a baseline for performance analysis, considering $Z_{TIA}$ = 223 Ω and $I_{PD}$ = 600 µA. The finite LA stage gain limits the PFD swing in the measurement. The response also suffers from amplitude imbalance and zero-crossing offsets. These may be due to a slight gain mismatch between the I and Q channels and to dc offsets in the mixing stage. For a more symmetrical response, gain control and dc offset compensation circuitry should be added to the design.
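The amplitude of the ideal baseline in Fig. 23 is simple arithmetic. The sketch below evaluates $V_{PFD} = Z_{TIA}G_{mix}I_{PD}$ with the values from the text. It covers only the amplitude scale factor, not the phase dependence of (23), which is not reproduced here.

```python
import math


def ideal_pfd_swing(z_tia_ohm: float, i_pd_a: float, g_mix: float = 2 / math.pi) -> float:
    """Ideal Costas PFD output scale V_PFD = Z_TIA * G_mix * I_PD.

    Assumes a fully limiting LA and an ideal, linear adder, as in the text;
    the phase dependence of the full PFD characteristic is not modeled here.
    """
    return z_tia_ohm * g_mix * i_pd_a


v = ideal_pfd_swing(223.0, 600e-6)   # baseline values used for Fig. 23
print(f"ideal PFD swing: {v * 1e3:.1f} mV")
```

For 223 Ω, $G_{mix} = 2/\pi$, and 600 µA, the ideal amplitude is about 85 mV.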
Network switching would ideally be much faster than the dynamics of a phase-locked loop for carrier recovery. Two approaches can eliminate this bottleneck. The first is the self-homodyne approach discussed here, in which the clock is sent along with the data. This removes the need to track frequency and leaves only rapid phase adjustment. The second applies if a network architecture benefits from generating the LO locally. In that case, techniques for rapidly acquiring the LO might use nonlinear adaptation schemes that change the loop filter dynamically. Such schemes allow a fast acquisition period, followed by a longer time constant to improve phase noise rejection.
The linear Costas loop follows a linear phase model, in which the Costas PFD provides an error voltage based on the phase error between the signal and LO. A loop filter should follow the Costas PFD to remove the high-frequency components of its output and provide high dc gain. The filter output drives the optical phase tuner. An integrator is an ideal choice for the loop filter. The optical phase tuner can be modeled as a voltage-controlled delay line (VCDL) that provides a variable time delay, or phase shift, as a function of the applied voltage. Assume that the VCDL has a linear phase response and can be modeled as $\phi_{out} = \phi_{in} + K_{VCDL} V_{cont}$. Using this linear model, the phase around the loop obeys (24), and the phase error $\phi_e = \phi_{in} - \phi_{out}$ consequently follows (25). For this relation to hold at all frequencies, $\phi_e$ should approach 0. In the time domain, if the input phase fluctuates slowly, the output phase follows the input phase with a time delay, $\phi_e(t) = \phi_{e0} e^{-t/\tau_L}$, i.e.,
$\phi_{out}(t) = \phi_{in} - \phi_{e0}\, e^{-t/\tau_L}$ (26)
where $\phi_{e0}$ is the initial phase error and $\tau_L = 1/(K_{LF} K_{PFD} K_{VCDL})$ is the loop time constant. At $t = 7\tau_L$, the phase error is reduced to about $10^{-3}$ of its initial value ($e^{-7} \approx 9\times10^{-4}$). Hence, $K_{PFD}$ and the loop filter should be designed so that the loop tracks the phase faster than the phase varies.
The PFD response depends strongly on optical power, because the PFD swing is proportional to $R_{PD}\sqrt{P_{LO} P_{RX}}$. For instance, the PFD swing for a 600-µA current is 100 mV. Assuming linear attenuation, the PFD swing is expected to drop to 7.5 mV at the receiver sensitivity of −13 dBm at 56 GBd. Also assume that the optical phase tuner has $K_{VCDL}$ = 0.5 rad/V and that the integrator has a time constant $1/K_{LF}$ of 10 ps. Under these assumptions, the loop takes 15 ns to reduce the phase error to $10^{-3}$ of its initial value.
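The numbers above can be chained into a quick estimate, which rests on several assumptions. It assumes that the 600-µA figure is the PD photocurrent and that the swing scales linearly with photocurrent. It also takes $R_{PD}$ ≈ 0.9 A/W, a hypothetical value close to the 0.93 mA at 0 dBm quoted in Fig. 12. Finally, it uses the 7.5-mV swing directly as $K_{PFD}$ in V/rad. Under these assumptions, the estimate reproduces the 7.5-mV swing and gives $\tau_L$ ≈ 2.7 ns, i.e., roughly 18 ns to reach $10^{-3}$. That is the same order as the 15 ns quoted. The exact value depends on how $K_{PFD}$ is extracted from the PFD characteristic in Fig. 23.

```python
import math

# Values quoted in the text
I_REF = 600e-6        # A, current at which the PFD swing is 100 mV
SWING_REF = 100e-3    # V
P_SENS_DBM = -13.0    # dBm, receiver sensitivity at 56 GBd
K_VCDL = 0.5          # rad/V, optical phase tuner gain
K_LF = 1 / 10e-12     # 1/s, integrator with a 10-ps time constant

# ASSUMED, not stated in the text
R_PD = 0.9            # A/W, hypothetical PD responsivity

def dbm_to_watts(p_dbm):
    """Convert optical power from dBm to W."""
    return 1e-3 * 10 ** (p_dbm / 10)

i_sens = R_PD * dbm_to_watts(P_SENS_DBM)   # photocurrent at sensitivity
swing = SWING_REF * i_sens / I_REF         # linear attenuation with current
k_pfd = swing                              # ASSUMED: swing used as K_PFD in V/rad
tau_l = 1 / (K_LF * k_pfd * K_VCDL)        # loop time constant
t_settle = math.log(1e3) * tau_l           # time for phi_e to fall to 1e-3 of phi_e0

print(f"photocurrent at {P_SENS_DBM} dBm: {i_sens * 1e6:.1f} uA")
print(f"PFD swing: {swing * 1e3:.2f} mV")
print(f"tau_L = {tau_l * 1e9:.2f} ns, time to 1e-3: {t_settle * 1e9:.1f} ns")
```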
A performance summary for this design is provided in Table I, with a comparison against recent work at similar data rates. Notably, this result is fully integrated and was measured on a PCB assembly rather than probed electrically. The FinFET CMOS design shows excellent power consumption. However, that process does not support SiPh integration, and its measured results are not for a full optical link assembly. Compared with the prior monolithic O-band coherent design, this design achieves a sixfold improvement in EE at similar data rates. Compared with the monolithic IMDD design at the same data rate, our energy per bit is almost half. A PAM4 receiver requires a higher SNR than a QPSK receiver to reach the same BER. Even so, the low IRNC in [25] allows a sensitivity of −12 dBm at the FEC-limit BER. Among coherent designs, this work achieves the best sensitivity except for [33], which has a much lower BW and hence lower integrated noise. This design also has the highest FOM among coherent receivers, with the FOM defined as $Z_T \cdot BW/P_{dc}$.
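EE is defined as $P_{total}/BR$, so the quoted efficiency can be converted back to an implied receiver power. This sketch assumes 2 bits/symbol for QPSK (56 GBd → 112 Gb/s, as stated in the abstract). It expresses the FOM in Ω·GHz/mW, a unit choice that is an assumption here, and the helper names are hypothetical. At 0.73 pJ/bit and 112 Gb/s, the implied total receiver power is about 82 mW.

```python
def energy_efficiency_pj_per_bit(p_total_w, bit_rate_bps):
    """EE = P_total / BR, returned in pJ/bit (lower is better)."""
    return p_total_w / bit_rate_bps * 1e12

def fom(z_t_ohm, bw_ghz, p_dc_mw):
    """FOM = Z_T * BW / P_dc in ohm*GHz/mW (unit choice is an assumption)."""
    return z_t_ohm * bw_ghz / p_dc_mw

SYMBOL_RATE = 56e9                 # Bd
BITS_PER_SYMBOL = 2                # QPSK
bit_rate = SYMBOL_RATE * BITS_PER_SYMBOL

p_rx = 0.73e-12 * bit_rate         # W, power implied by 0.73 pJ/bit
print(f"bit rate: {bit_rate / 1e9:.0f} Gb/s")
print(f"implied receiver power: {p_rx * 1e3:.1f} mW")
print(f"EE check: {energy_efficiency_pj_per_bit(p_rx, bit_rate):.2f} pJ/bit")
```

The same `fom` helper could be applied to each row of Table I, provided every entry reports $Z_T$, BW, and $P_{dc}$ in consistent units.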

VI. CONCLUSION
This article describes a coherent optical receiver, fabricated in a 45-nm CMOS SiPh technology, that achieves 0.73 pJ/bit at 56 GBd. An analysis of the trade-offs between device speed and EE identifies the optimal input-referred noise current. Optimized transistor sizing, together with the high-$f_T$ monolithic technology, maximizes the FOM. It also yields the best EE among coherent designs at the same data rates. Measured constellations and sensitivity curves show bit error rates below the FEC limit of $3.8 \times 10^{-3}$.

Fig. 2. Optical power loss as a function of modulated voltage normalized to the MZMs V π .

Fig. 3. Shunt feedback TIA block diagram with an inverter cell for the core amplifier.

Fig. 4. $f_T$ and intrinsic gain $A_0 = G_m \cdot R_{DS} = 4.8$ for an inverter cell in 45CLO technology.

Fig. 5. (a) IRNC. (b) Total power consumption. (c) EE as a function of DR and $f_T$. Cross lines estimate the maximum expected data rate for a given $f_T$.

Fig. 7. Power splitting ratio inside directional couplers and PD responsivity $R_{PD}$ as a function of wavelength.

Fig. 8. Simulated 56-GBd QPSK constellations at the output of the hybrid using 0-dBm laser power and −15-dBm modulated signal power, with the phase tuner biased at (a) 17 mW and (b) 13.6 mW (amplitudes normalized).

Fig. 9. Simulation of the IRNC and BW for an inverter TIA as a function of device width assuming a PD capacitance of 50 fF.

Fig. 10. Receiver dc power consumption $P_{RX}$, transmit laser power requirement $P_{TX}$, and EE of the design as a function of device width.

Fig. 11.Comparison of simulated transimpedance for the RX channel and measurements based on an electrical test structure.
(a) and (b). The QPSK constellation at 40 and 56 GBd is demodulated using the post-layout CORX circuitry. No noise is added to the transient simulation, so the impairments in the eye indicate slight inter-symbol interference. The simulated error vector magnitude (EVM) is −10.9 dB for the constellation shown at 56 GBd and −14.5 dB for the constellation shown at 40 GBd.
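EVM figures like these are commonly computed as the rms error between received and ideal symbols, relative to the average power of the ideal constellation. This sketch assumes that normalization, since the text does not state which one was used. It also assumes a known reference symbol for each received sample, and the helper names are hypothetical. Under this convention, the quoted values correspond to about 28.5% rms EVM at 56 GBd and 18.8% at 40 GBd.

```python
import numpy as np

def evm_db(received, reference):
    """RMS EVM in dB, normalized to the average reference-constellation power."""
    received = np.asarray(received, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    err_power = np.mean(np.abs(received - reference) ** 2)
    ref_power = np.mean(np.abs(reference) ** 2)
    return 10 * np.log10(err_power / ref_power)

def evm_percent(evm_db_value):
    """Convert an EVM in dB (amplitude ratio) to percent rms."""
    return 100 * 10 ** (evm_db_value / 20)

# Ideal QPSK reference points
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
print(f"10% amplitude error -> {evm_db(1.1 * qpsk, qpsk):.1f} dB")
print(f"-10.9 dB -> {evm_percent(-10.9):.1f}%, -14.5 dB -> {evm_percent(-14.5):.1f}%")
```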

Fig. 12. Power spectral density of the output noise voltage with the PD in the dark (dark current only) and under 0-dBm optical power, which corresponds to a 0.93-mA PD current.

Fig. 14. (a) Coherent optical transmitter including the differential driver with CTLE and I/Q MZMs. (b) Transmitter assembly used for testing.

Fig. 15.Simulated S21 of the driver showing 11 dB of peaking at 36 GHz and 66 GHz of 3-dB BW.

Fig. 17.Chip micrograph and PCB assembly for the coherent optical receiver chip and assembly.

Fig. 18.Self-homodyne test setup for link testing of the coherent optical receiver.

Fig. 19.Power consumption from the current drawn from V DD , V DD,buffer , and optical tuning element.

Fig. 20(a)-(d) plot the measured QPSK constellations at 28, 40, 56, and 60 GBd based on the I/Q electrical outputs of the receiver. The constellations on the left show the transitions between symbols, while the constellations on the right are

Fig. 22. BER curves at different BRs, indicating the power penalty at higher data rates relative to the FEC limit.

Fig. 23.PFD output as a function of phase error.
Manuscript received 23 August 2023; revised 1 November 2023; accepted 23 November 2023. Date of publication 13 December 2023; date of current version 25 April 2024. This article was approved by Associate Editor Kenichi Okada. This work was supported in part by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy under Award DE-AR0000848. (Corresponding author: Ghazal Movaghar.)
Ghazal Movaghar, Viviana Arrunategui, Junqian Liu, Stephen Misak, Xinhong Du, Clint L. Schow, and James F. Buckwalter are with the Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106 USA (e-mail: ghazalmovaghar@ucsb.edu).
Aaron Maharry is with the Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106 USA. He is now with Lucidean, Inc., Newark, CA 94560 USA.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2023.3339494.