Design and Analysis of a Nanosecond Burst-Mode CDR Using MATLAB/Simulink and Opti-System Co-Simulation

Optical packet switching (OPS) networks are a promising way to accommodate growing traffic and reduce power consumption in data center communications. OPS networks with nanosecond switching times require nanosecond clock and data recovery (CDR) circuits. Nanosecond CDR can be achieved by using a global frequency-synchronized reference clock for both transmitters (TX) and receivers (RX) and adopting a phase compensation scheme, which makes frequency and phase predictable and manageable. However, the CDR still needs to be comprehensively evaluated under various interferences. We extend the analysis by developing a novel optoelectronic co-simulation system that combines Opti-System and MATLAB/Simulink. Using this system, we set up a simple OPS network equipped with the CDR architecture. We first validate the feasibility of the CDR mechanism and then characterize several interferences to evaluate the CDR's stability, including variation in the location of the reference clock source, channel jitter, and carrier power variation.

have emerged as a promising solution due to their high bandwidth capacity and elimination of electrical-optical conversions [7], [8], [9]. Given that many applications in data centers produce short traffic packets, OPS networks with nanosecond configuration time are required, which also necessitates the development of CDR with nanosecond locking time [10]. Existing burst-mode CDR circuits, such as all-digital CDR [11], [12], gated VCO CDR [13], and over-sampling CDR [14], [15], [16], either lack practical integration in OPS transceivers or can only achieve microsecond-level data recovery [17], [18]. Therefore, there is an urgent need to develop efficient nanosecond CDRs to meet the demands of nanosecond OPS networks.
For packet-based optical switching networks, the variability in clock frequency and phase from packet to packet contributes to the lengthy locking time of CDR circuits. To address this issue, references [19], [20], [21] proposed a sub-nanosecond CDR architecture specifically designed for data center OPS networks by taking a network-level perspective. Briefly, frequency-synchronized reference clocks are used for both transmitters (TX) and receivers (RX) in the OPS network, so that only the phase needs to be determined in the RX. Leveraging the observation that the phase offsets between specific pairs of transceivers remain relatively constant in a stable environment, the group first measures the phase in the RX and then applies phase compensation in the TX. This approach aligns the phases in the RX, thereby accelerating the locking process. The proposed CDR architecture demonstrated impressive performance in 25.6 Gbps burst-mode data transmission, achieving a locking time of 625 ps. However, various interferences can introduce noise into the phase offsets, potentially causing the phase compensation process to fail. The paper [20] thoroughly investigated the impact of temperature variation on phase offsets, considering it the primary concern. Nonetheless, it is crucial to further characterize the impact of other interferences, such as reference clock variations, channel jitter, and carrier power variations.
To provide more insight into the nanosecond CDR, we design and analyze a 25.6 Gbps OPS network equipped with the proposed CDR architecture by developing a novel optoelectronic co-simulation system. The overall structure of our simulation is depicted in Fig. 1. In MATLAB/Simulink, we model one TX node and one RX node for the OPS network. The TX and RX are interconnected by an optical channel, which is modeled using Opti-System. Two optical carriers are used and switched alternately to mimic the function of packet switching. We validated the feasibility of the modeled nanosecond OPS network using the simulation test bench. We then investigated the influence of reference clock variations on the overall system performance. Additionally, we assessed the stability of the nanosecond OPS network under channel jitter and optical carrier power fluctuations. It is worth noting that the simulation system proposed in this paper can also be used to validate other electrical-optical co-designed modules.

A. Overview of the Simulation Model
The overall structure of the simulated 25.6 Gbps OPS network is illustrated in Fig. 1. Inside Opti-System, an 800 MHz optical reference clock is generated and transmitted to the TX and RX through two optical fibers. In MATLAB/Simulink, two phase-locked loops (PLLs) are modeled to multiply the 800 MHz clock to 12.8 GHz separately in the TX and RX. This ensures frequency synchronization of the reference clocks in both the TX and RX. The details of the PLL are shown in Fig. S5 (supplementary materials). The 12.8 GHz reference clock is adjusted with appropriate phase shifts to generate the required single-ended 25.6 GHz clock. Further details on this process are given in Section II-B. The TX then sends data clocked by the 25.6 GHz reference clock. Two optical carriers are used and switched by an optical switch at a fixed switching time (100 ns). The optical signals are transmitted over an optical fiber modeled in Opti-System. The RX module includes a modified bang-bang phase detector (BBPD) and a finite state machine (FSM) for phase determination, and a PLL-based CDR for clock recovery and jitter tracking.
To achieve a short locking time for the PLL-based CDR, the phases of the two wavelengths need to be aligned in the RX. In our simulation, the RX initially measures the phase offsets for the two optical carriers. These phase offsets, represented as 6-bit PI codes, are then transmitted back to the TX. Subsequently, the TX performs phase compensation for the corresponding wavelength on a per-packet basis, aligning the data delay in the RX for fast CDR locking.

B. TX Node Model
The TX transmits data at 25.6 Gbps and can adjust the phase of the reference clock with a resolution of 1/64 × 2π. With proper phase compensation, the phases of the two wavelengths can be aligned at the RX. The details of the phase interpolators are depicted in Fig. 2. Assume that the PI codes of the two optical carriers have already been sent from the RX. The two most significant bits (MSB_2bit) of the 6-bit code select 2 clocks from the 8 reference clocks according to the correspondence specified in Table I. These selected clocks are labeled clock_s1 and clock_s3. For instance, if MSB_2bit is '00', the clocks with phases of 0° and 180° are chosen. Clock_s1 and clock_s3 then serve as differential reference clocks. The four least significant bits (LSB_4bit) simultaneously delay clock_s1 and clock_s3 by preset phase shift values, as listed in Table II. The delayed clocks are labeled clock_1 and clock_3. Clock_1 and clock_3 are each delayed by a further 90° to obtain clock_2 and clock_4.
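The relation between a 6-bit PI code, its two fields, and the resulting phase shift can be illustrated with a minimal sketch. Two assumptions are made here that the text does not confirm: the code is taken to map linearly onto phase (each MSB_2bit increment worth 16 fine steps), and the 1/64 × 2π step is referred to one unit interval of the 25.6 GHz clock. The actual clock selection and delay values are those of Tables I and II, which are not reproduced; the function names are hypothetical.

```python
import math

BIT_RATE_HZ = 25.6e9          # line rate stated in the text
UI_PS = 1e12 / BIT_RATE_HZ    # one unit interval in ps (39.0625 ps)
PI_STEPS = 64                 # a 6-bit PI code gives 64 phase steps


def split_pi_code(code):
    """Split a 6-bit PI code into (MSB_2bit, LSB_4bit)."""
    if not 0 <= code < PI_STEPS:
        raise ValueError("PI code must be in 0..63")
    return code >> 4, code & 0x0F


def pi_code_to_phase(code):
    """Return (phase in rad, delay in ps) for a PI code.

    ASSUMPTION: linear code-to-phase mapping, with the phase referred to
    one UI of the 25.6 GHz clock. Tables I and II of the paper define the
    real mapping and may differ.
    """
    msb, lsb = split_pi_code(code)
    steps = msb * 16 + lsb    # recombines to the original code
    phase_rad = steps * 2 * math.pi / PI_STEPS
    delay_ps = steps * UI_PS / PI_STEPS
    return phase_rad, delay_ps
```

Under these assumptions, one PI step corresponds to about 0.61 ps, and code 32 corresponds to a shift of half a unit interval.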
Clock_1, clock_2, clock_3, and clock_4 are combined to generate a 25.6 GHz clock, as depicted in Fig. 3(a). Two D flip-flops are used, where clock_1 and clock_3 serve as the trigger clocks, while clock_4 and clock_2 serve as the reset clocks. Consequently, clock signals Q1 and Q2 are obtained, as shown in Fig. 3(b). Q1 and Q2 are then combined by an OR gate to produce the 25.6 GHz reference clock, which is used to send data. The 25.6 GHz reference clock carries the desired phase shift.
In the simulation, we use repeated 16-bit sequences (0000,1111,0110,0101) to serve as the transmission data. This sequence is repeated 128 times to form a data packet of 2048 bits. Given that the transmission rate is set as 25.6 Gbps, it takes 80 ns to transmit a single data packet. An inter-packet gap of 20 ns is set to allow for tasks such as optical carrier tuning and phase compensation.
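The packet arithmetic above can be checked with a short sketch that builds the packet from the stated 16-bit pattern and derives the packet duration and the per-packet time slot. The constant and function names are illustrative and not taken from the simulation model.

```python
PATTERN = "0000" "1111" "0110" "0101"   # 16-bit pattern from the text
REPEATS = 128                            # repetitions per packet
BIT_RATE_BPS = 25.6e9                    # transmission rate
GAP_NS = 20.0                            # inter-packet gap


def build_packet():
    """Return the packet as a list of 0/1 integers."""
    return [int(bit) for bit in PATTERN * REPEATS]


def packet_timing():
    """Return (bits per packet, packet duration in ns, slot length in ns)."""
    n_bits = len(PATTERN) * REPEATS
    duration_ns = n_bits / BIT_RATE_BPS * 1e9
    return n_bits, duration_ns, duration_ns + GAP_NS
```

The resulting slot length (packet plus gap) equals the 100 ns switching time of the optical switch.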

C. RX Node Model
The RX has a BBPD and an FSM for phase determination and a PLL-based CDR for clock recovery. The phase determination mechanism in the RX is shown in Fig. 4. Optical signals from the channel are distributed to 4 D flip-flops and sampled by clock_1, clock_2, clock_3, and clock_4. These four clocks are generated in the same manner as in the TX. Based on the four-phase clock, dedicated logic determines whether the clock is leading or lagging the data. As shown in Fig. 5(a), clock_1, clock_2, and clock_3 are first used to sample the data to obtain signals S1, S2, and S3. The pairs S1 and S2, and S2 and S3, are then passed through two XNOR logic modules, whose outputs are sampled by clock_4 (red circles and arrows in Fig. 5(a)). If S1 XNOR S2 = 1 and S2 XNOR S3 = 0, the clock is leading the data, and total_lead = total_lead + 1. Conversely, if S1 XNOR S2 = 0 and S2 XNOR S3 = 1, the clock is lagging behind the data, and total_lag = total_lag + 1. A similar process is applied to S3, S4, and S1, whose XNOR results are sampled by clock_2 (blue circles and arrows).
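The lead/lag rule described above can be written as a small truth-table sketch. It only restates the XNOR conditions from the text for three consecutive samples; the function names are hypothetical, and the sampling by clock_4 or clock_2 is not modeled.

```python
def xnor(a, b):
    """Return 1 when the two sampled bits are equal, else 0."""
    return 1 if a == b else 0


def classify(s_a, s_b, s_c):
    """Classify three consecutive samples as 'lead', 'lag', or 'none'."""
    first, second = xnor(s_a, s_b), xnor(s_b, s_c)
    if first == 1 and second == 0:
        return "lead"    # clock is leading the data
    if first == 0 and second == 1:
        return "lag"     # clock is lagging behind the data
    return "none"        # no transition, or an ambiguous pattern


def count_votes(triples):
    """Accumulate (total_lead, total_lag) over a sequence of sample triples."""
    total_lead = total_lag = 0
    for s_a, s_b, s_c in triples:
        verdict = classify(s_a, s_b, s_c)
        if verdict == "lead":
            total_lead += 1
        elif verdict == "lag":
            total_lag += 1
    return total_lead, total_lag
```

Triples without a data transition, such as (1, 1, 1), contribute to neither counter.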
The decision on the relative timing of the clock and data signals is made every 64 bits of data by comparing the total numbers of leads and lags, as depicted in Fig. 5(b). Within each 64-bit data segment, the lead and lag judgments are made 32 times and accumulated by two adders. If, after these 32 iterations, the total number of leads equals the total number of lags, the PI code remains unchanged. If the total number of leads is greater than the total number of lags, the PI code is decremented by 1. Conversely, if the total number of leads is less than the total number of lags, the PI code is incremented by 1. The updated PI codes are then used to generate a new set of four-phase clocks. The adders are reset to 0 and the lead/lag decisions are made again. The pseudocode of the FSM is presented in Fig. 6.
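The update rule can be sketched as follows. This is not the FSM of Fig. 6, only a minimal restatement of the rule in the paragraph above; the modulo-64 wrap-around of the PI code is an assumption that the text does not state, and the function names are hypothetical.

```python
def update_pi_code(pi_code, total_lead, total_lag, steps=64):
    """Apply one 64-bit-segment decision to a 6-bit PI code.

    ASSUMPTION: the code wraps modulo 64 at the ends of its range.
    """
    if total_lead > total_lag:
        pi_code -= 1      # more leads than lags: decrement the code
    elif total_lead < total_lag:
        pi_code += 1      # more lags than leads: increment the code
    return pi_code % steps


def run_fsm(initial_code, votes_per_segment):
    """Apply a sequence of (total_lead, total_lag) pairs, one per segment.

    The adders are implicitly reset because each pair is independent.
    """
    code = initial_code
    history = []
    for total_lead, total_lag in votes_per_segment:
        code = update_pi_code(code, total_lead, total_lag)
        history.append(code)
    return history
```

For example, a segment with 28 lead votes and 4 lag votes lowers the code by one step, while a tied segment leaves it unchanged.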
6-bit PI codes are used for precise phase control, resulting in a phase shifting precision of 1/64 × 2π. The range of phase adjustment during a single packet is 2 × 32 × 1/64 × 2π = 2π, covering all possible conditions. Finally, the stable PI codes obtained through this iterative process are sent back to the transmitter (TX) for phase compensation, ensuring accurate phase alignment for the two wavelengths.

D. Optical Fiber Channel
The optical channel is modeled in Opti-System. As shown in Fig. S4 (supplementary materials), the data packet generated by the TX in Simulink is converted into NRZ electrical pulses by an NRZ pulse generator in Opti-System. Two continuous-wave (CW) carriers with wavelengths of 1550 nm and 1450 nm are generated by the laser modules to serve as optical carriers. These optical carriers are then routed through an optical switch, which alternates between the two carriers every 100 ns. The powers of the two CW laser modules vary during transmission (see Fig. S6 in supplementary materials). The modulated optical carrier is transmitted through an optical fiber modeled in Opti-System. The fiber parameters, such as length (1 km), reference wavelength (1500 nm), attenuation (0.2 dB/km), group velocity dispersion (17 ps/nm/km), and effective area (80 μm²), are set to mimic the characteristics of Corning SMF-28 optical fiber. To represent temperature-induced phase variation, an 80 ps delay module is included. A photodetector module with a sensitivity of 1 A/W converts the optical signal back into an electrical signal. A low-pass Gaussian filter is applied to eliminate high-frequency noise, with a cutoff frequency set at 30 GHz and an

A. A Typical Transmission Process
Repeated packets were used to validate the CDR. The first two packets in each cycle were used to measure the phase delays of the optical carriers, and the resulting phase shifts (PI codes) were sent back to the TX. Fig. 7 illustrates a typical phase determination process. Fig. 7(a) displays the RX signals of the first two data packets. Using the BBPD and FSM, the initial lead vote is observed to be 28, while the initial lag vote is 4 (see Fig. 7(b) and (c)). Based on these votes, the PI code is decremented by 1 and the reference clock is delayed by 1/64 × 2π. The phase detection process is then repeated. After 5 decisions, the lag vote and lead vote become equal, indicating that a stable PI code has been reached (see Fig. 7(d)). Note that slight fluctuations (±1/64 × 2π) may occur after the lag and lead votes become equal. However, these fluctuations resemble the locking behavior of practical PI CDR circuits implemented on FPGAs and do not affect the data recovery process.
During the gaps of the subsequent 999998 packets, the TX periodically adjusts the phase of the reference clock to compensate for the phase offsets, allowing the PLL-based CDR to lock rapidly onto the correct phase of the data packets for both wavelengths. The PI codes are updated every 1000000 packets (100 ms). The bit error ratio (BER) is calculated after a 100-second transmission test using a BER module modeled in MATLAB/Simulink (see Fig. S1). The BER module compares the recovered data with preset local data. With the phase compensation process, the received data are error-free except for the first two packets, demonstrating instant phase locking for the subsequent packets.

B. Transceiver Performance Under Interference
Temperature variations are a significant factor affecting phase delays, as extensively studied in [20]. However, other factors such as reference clock synchronization, channel jitter, and optical carrier power variations also affect the phases. To assess the stability of the OPS network for different reference clock source locations, we characterized the distribution of the 800 MHz optical reference clock. By introducing two optical fiber modules with varying lengths, we designed five cases to examine different scenarios (see Fig. S7 in supplementary materials). The PI codes varied among the cases, indicating changes in the detected phase offsets due to the different distances traveled by the optical reference clock. After the corresponding phase compensation was performed in the first two data packets, the recorded BERs for subsequent data packets were consistently 0 across all cases. This implies that the phase-compensation CDR is insensitive to variations in the location of the optical reference clock source.
We further introduced deterministic jitter into the channel. Sinusoidal jitter with peak-to-peak amplitudes (SJ_pkpk) of 0.1 UI, 0.3 UI, and 0.5 UI was added separately at a frequency of 1 MHz. The eye diagrams of the received data are shown in Fig. 8. The PI codes produced by the FSM remained unchanged for all three deterministic jitter amplitudes. This suggests that deterministic channel jitter of up to 0.5 UI SJ_pkpk has a negligible influence on the phase variation. The BER after 1000 cycles of transmission tests was 0.
Next, we introduced random jitter into the system with different root mean square (J_RMS) values: 0 UI, 0.05 UI, 0.1 UI, 0.15 UI, 0.20 UI, and 0.25 UI. The resulting eye diagrams are displayed in Fig. 9. The BER values, shown in Fig. 10, demonstrate that when the RMS of the random jitter was below 0.1 UI, the recorded BER remained at 0. However, once J_RMS exceeded 0.1 UI, the BER increased exponentially with the RMS amplitude. The standard deviations were calculated from fifty sets of repeated experiments.
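The two jitter types can be illustrated with a simple edge-displacement model. The sketch below is not the jitter source used in the co-simulation. It assumes that sinusoidal jitter displaces each bit edge by (SJ_pkpk/2)·sin(2πft) and that random jitter is zero-mean Gaussian with standard deviation J_RMS, both expressed in UI; all names are hypothetical.

```python
import math
import random

UI_S = 1 / 25.6e9   # one unit interval at 25.6 Gbps, in seconds


def edge_jitter_ui(n_bits, sj_pkpk_ui=0.0, sj_freq_hz=1e6,
                   rj_rms_ui=0.0, seed=0):
    """Return the timing offset (in UI) applied to each of n_bits edges.

    ASSUMPTION: sinusoidal jitter is a sine of peak-to-peak amplitude
    sj_pkpk_ui; random jitter is Gaussian with RMS value rj_rms_ui.
    """
    rng = random.Random(seed)   # seeded for repeatable experiments
    offsets = []
    for k in range(n_bits):
        t = k * UI_S
        sj = 0.5 * sj_pkpk_ui * math.sin(2 * math.pi * sj_freq_hz * t)
        rj = rng.gauss(0.0, rj_rms_ui) if rj_rms_ui > 0 else 0.0
        offsets.append(sj + rj)
    return offsets
```

At 1 MHz, one sinusoidal jitter period spans 25600 bits, which is far longer than a single 2048-bit packet.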

C. Influence of Carrier Power Variation
In a real data center environment, the carrier power may vary because of mismatches between different light sources and semiconductor optical amplifiers. To evaluate the influence of optical carrier power variations, the optical carrier power is first set to 0 dBm. Then, after the first 500000 packets have been transmitted, the powers of both optical carriers are switched to 2 dBm for the following packets (Fig. S6 in supplementary materials). The eye diagram of the optical signals with no jitter added is shown in Fig. 11(a). Fig. 11(b) shows the two signals after passing through the LA. The 0 dBm signals suffer from an obvious reduction in duty cycle, which shrinks the data sampling interval. However, no additional phase offset is observed in Fig. 11(b). The measured BER is 0 in all tests.
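To put the 0 dBm and 2 dBm settings in linear terms, the following sketch converts carrier power to milliwatts and estimates the resulting photocurrent. It is a rough budget only: it assumes the 1 km fiber attenuation (0.2 dB) is the sole loss, treats the 1 A/W photodetector figure as a responsivity, and ignores modulation and switch losses. The function names are hypothetical.

```python
def dbm_to_mw(power_dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def photocurrent_ma(power_dbm, fiber_loss_db=0.2, responsivity_a_per_w=1.0):
    """Estimate the photocurrent in mA for a given launch power.

    ASSUMPTION: fiber attenuation is the only loss between laser and
    photodetector (1 km at 0.2 dB/km).
    """
    received_mw = dbm_to_mw(power_dbm - fiber_loss_db)
    return received_mw * responsivity_a_per_w   # mW x A/W = mA
```

A 2 dB increase corresponds to roughly 1.58 times the optical power, and under these assumptions to the same ratio in photocurrent.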
We further evaluated the performance of the CDR under the simultaneous influence of channel jitter and carrier power variations. As shown in Fig. 11(c), random jitter of 0.1 UI RMS was added to the channel, and the optical signals suffered from obvious disturbances. After the signals pass through the LA, a narrow eye diagram is observed in Fig. 11(d). The BERs of the two carrier powers (0 dBm and 2 dBm) with the same random jitter, as recorded by the BER module, are 6.2 × 10⁻¹⁰ and 2.7 × 10⁻¹², respectively. The BER can be further decreased by adding error-correcting codes to the data packets.

IV. CONCLUSION
In this study, we have presented a novel optoelectronic co-simulation method that combines Opti-System for optical simulations and MATLAB/Simulink for electrical simulations. Using this simulation method, we have designed and analyzed a nanosecond optical packet switching (OPS) network equipped with a nanosecond phase caching CDR. To provide more insight into the CDR, we investigated the impact of reference clock location variations and found that they do not disrupt CDR locking. Furthermore, we have examined the stability of the CDR by introducing channel jitter and optical carrier power variations into the simulation system. Our results demonstrate that the nanosecond CDR can tolerate deterministic jitter up to 0.5 UI and random jitter up to 0.1 UI (RMS). However, when the random jitter exceeds an RMS value of 0.1 UI, the bit error rate increases exponentially. Additionally, the CDR can handle optical carrier power variations from 0 dBm to 2 dBm. Based on this evaluation and analysis, we conclude that the phase-compensation CDR is promising for supporting nanosecond OPS networks in future data center communications, and that our simulation can assist in the design of application-specific integrated circuits (ASICs) tailored to this technology.