Training Free In-Service Chromatic Dispersion Estimation Using Power Spectrum Analysis for MZM Based IM/DD Optical Communication Systems

In high-speed optical fiber communications, chromatic dispersion (CD) estimation is essential for the monitoring and compensation of CD-induced impairments. Compared with coherent detection, in short-reach optical communication systems, intensity modulation/direct detection (IM/DD) is more widely used, and the existing dispersion estimation techniques usually require training signals or a large amount of data for curve fitting or machine learning. However, these methods cannot provide in-service estimation. This paper presents a novel technique for in-service CD estimation in IM/DD short-reach optical communication systems based on our derived analytical model. With the analytical model, the in-service CD estimation is realized via the direct power spectrum analysis of the received signal. Unlike previous studies that only work with single-frequency sinusoidal signals, the practical random data with the rectangular type of base waveform is considered in our model. Therefore, our method can be applied to the dispersion estimation of various pulse amplitude modulation (PAM) based systems. Our proposed training free in-service CD estimation method is verified and investigated, where results show that better than 0.5% accuracy in dispersion estimation can be achieved in optical communication systems with up to 40 GBaud data rate and 100 km transmission distance.


I. INTRODUCTION
C HROMATIC dispersion (CD) has been one of the main limiting factors in optical fiber communication systems and networks [1], [2]. CD causes light waves with different wavelengths to propagate in the fiber at different velocities, and hence, results in the relative delay in arrival time and pulse broadening. Therefore, CD generates inter-symbol interference (ISI) distortion and detection errors [3]. This limitation becomes more important with the recent trend of reconfigurable optical networks since the dispersion accumulation becomes a common issue due to the path length switching [4]. To ensure the quality Manuscript  of signal transmissions, CD needs to be compensated at the receiving end, where accurate estimation is the first step. Hence, there is an urgent need for an accurate, fast and flexible dispersion estimation and monitoring method. Several CD estimation methods have been proposed in previous studies, such as the time domain method [5], [6], [7], the frequency domain method [8], [9], [10], and the machine learning (ML) method [11], [12]. However, these methods are based on the assumption of coherent optical communications, which are typically used for long-range transmissions.
In recent years, short-reach optical communication systems, such as inter-or intra-data-center communications, optical access networks, and mobile front-haul / back-haul, have also attracted intensive interest from both industry and academia. Unlike long-haul systems, these short-reach applications are cost-sensitive due to their large deployment scale [13]. Hence, together with low-cost optical transceivers, the incoherent intensity modulation/direct detection scheme has been widely utilized. With the wide deployment of 5 G technology, the Radio over Fiber (ROF) and Radio over Passive Optical Networks (RoPON) principles have been widely used, which leads to even stronger demand for an accurate dispersion estimation method with a wider estimation range that can be adapted to high speed IM/DD channels [8], [14]. In addition, the optical communication system and network have transformed from the traditional static network to the dynamic network, with no predefined spectrum grid or modulation format, and with reconfigurable network topologies, such as the elastic optical networks (EON). Both the wavelength and the distance of signal transmission may vary, which results in the dispersion at the receiver side not being a fixed value. Therefore, it is challenging for the conventional physical dispersion compensation fiber-based CD compensation method. Therefore, flexible CD compensation using signal processing is needed, where accurate dispersion estimation is the prerequisite.
To cope with the dispersion estimation requirement in IM/DD-based short-reach optical communication systems, a number of approaches have been proposed and studied. Modulation Phase Shift (MPS) method analyzes the phase variance of the sinusoidal optical signal to determine the relative group delay by using a Kerr phase-interrogator [15]. However, the interrogator is expensive and requires a long processing time.
The interferometric method has also been proposed, and it utilizes an interferometer to measure the wavelength-dependent delay between the test sample and a reference channel, which is usually a single-mode fiber with a known group delay [16], [17]. However, the measurement range of this method is limited, and it is time-consuming [18]. The Downhill-Simplex fitting method has also been proposed, which is based on measuring the channel frequency response and fitting the result to an existing model until a set of parameters with the largest degree of similarity are found [8]. However, this method is limited to single-tone sinusoidal signals, which requires training sequences for estimation.
In the literature, several analytical models with the dispersion effect included have also been proposed, and in principle, these analytical models can be used to realize dispersion estimation [19], [20], [21]. The study in [19] builds an analytical model for the dispersive optical fiber communication link, which transmits a single frequency RF sinusoidal signal. The phase change caused by dispersion was considered, which leads to the periodic change of the received RF power. Based on the power fading of the fundamental and the second-order harmonic of the sinusoidal signal due to dispersive group delay, the dispersion can be estimated. Another analytical model has also been established, and the effect of the fiber CD and the phase noise of lasers is analyzed [20]. In previous analytical models, the transmitted signals are assumed to be single frequency sinusoidal signals, and the impact of dispersion is then analyzed in the frequency domain by using Bessel functions, which significantly simplifies the analytical expression of the received signal. However, the assumption of sinusoidal signals being transmitted in the system is not practical, and hence, it cannot provide accurate dispersion estimation, or it requires a particular training signal. Therefore, to realize training-free in-service dispersion estimation, a complete analytical model that can handle practical data streams is needed. However, to the best of our knowledge, such a method is not presented in the literature.
To fill this important gap, this paper presents a novel approach for in-service dispersion estimation based on the frequencydomain power spectrum analysis of short-reach optical communication systems employing IM/DD. The proposed method does not require a single frequency sinusoidal signal to be transmitted and can be applied to general PAM signals, which are widely used and standardized in short-reach optical communication systems. This approach is based on an accurate analytical model that we derive here, which analyzes the influence of CD on the power spectrum of transmitted random amplitude-modulated signals. By cycling part of the received random signal and analyzing the Fourier series, the complicated random signal spectrum can be simplified to a periodic signal spectrum composed of various harmonics, and our model provides an accurate closed-form expression for these harmonics. Based on the characteristic of the obtained spectrum, where dispersion leads to the power fading of odd order harmonic frequency components, we can calculate the accumulated CD, and hence, the dispersion can be estimated accurately.
By applying the theoretical model described above, the signal arriving at the receiver side can be used directly for in-service dispersion monitoring without requiring additional training symbols, which can affect the signal transmission and the system throughput. In addition, since the rectangular wave is considered as the base waveform in the equations derived from our analytical model, the proposed dispersion estimation method can be applied to different orders of PAM formats, which are widely used in IM/DD based short-reach optical communication systems. The proposed training free in-service dispersion estimation method is verified. Results show that the average estimation error is lower than 0.5%, and similar accuracy is achieved over a large range of fiber length (1-100 km), data rate (up to 40 GBaud) and laser central wavelength (1540-1560 nm).
The remainder of this paper is organized as follows. In Section II, the analytical model of short-reach optical communications, is described, and the closed-form expressions of the power spectrum of the received signal are derived, which form the theoretical foundation of the dispersion estimation. In Section III, the dispersion estimation method is introduced, where the closed-form expressions of the power spectrum of the reconstructed periodic signal and dispersion estimation are provided. In Section IV, theoretical calculation, numerical simulation, and experimental verification results are presented and discussed. Finally, Section V summarizes this paper.

A. General Architecture of Short-Reach Optical Communications
As a technology widely used in short-reach communications, such as access networks and inter-or intra-data-center networks, IM/DD saves the system's expenditure due to its simple structure. This type of system generally consists of the data source, the laser transmitter, the intensity modulator, the optical fiber, the photodetector, and the electronic receiver circuit. Fig. 1 presents the architecture of a typical short-reach optical communication system, where the data to be transmitted is modulated to the light wave generated by a laser diode (LD) with a dual-drive Mach-Zehnder modulator (DD-MZM). Here we consider the DD-MZM as other MZM can be considered as a special driving case of DD-MZM. For example, the DD-MZM is equivalent to the single-drive MZM when the modulating voltage is applied to one arm with the other arm connected to the ground. The output signal from the MZM is then transmitted via the optical fiber, and here we consider the single-mode fiber. After fiber transmission, the data-carrying optical signal is then detected by a photodetector, typically based on PIN photodiode (PD), and the corresponding photocurrent is generated. In this type of short-reach system, optical amplifiers are normally not used due to the cost consideration, and hence, the main impairments that limit the performance are the phase noise and relative intensity noise (RIN) generated by the laser diode, the CD accumulated during fiber transmission, and the shot and thermal noise generated by the PD. All of these impairments will be incorporated into our model.
In the system, we use the push-pull operation mode of the DD-MZM, where out-of-phase signals are applied to the upper and lower arms. After passing through the modulator, the electricfield of the optical signal at the output of MZM can be expressed as: where A i = √ P 0 defines the electric-field amplitude of the light radiated by the LD, and P 0 is the laser output power. The laser phase noise Φ ld (t) follows a Wiener process, which is the result of the integral of a white Gaussian noise process with zero mean and γ ld = 2πΔv variance, where Δv is the laser linewidth [13]. δ i (t) is the amplitude noise from LD, whose standard derivation is ρ. v π is the half-wavelength voltage of the modulator, θ i is the bias phase shift, and v(t) is the input signal. The amplitude noise of LD is typically considered as the relative intensity noise (RIN), and for a standard semiconductor laser, the relationship between RIN and laser amplitude noise is [22]: where δ i (ω) = F F T (δ i (t)).

B. Analytical Model of Short-Reach Optical Communications
In short-reach optical communication systems, the most widely used modulation format is non-return to zero (NRZ) PAM, such as the basic PAM-2. Higher-order PAM (e.g., PAM-4 and PAM-8) is also popular, which uses the spectrum more efficiently to achieve a higher data rate [14]. Therefore, we consider the PAM modulated signal that is transmitted in the optical fiber in our analytical model. For simplicity, the following analysis is based on simple PAM-2. However, the same analysis approach can be applied to general PAM, which will be discussed later.
In order to use Fourier series to analyze random signal, we set the input signal v(t) to a periodic signal composed of multiple copies of random binary sequences, whose period is T = NT b . Hence, the expression of v(t) is: where v 1 and v 2 are PAM modulating signal amplitudes, N is the number of transmitted symbols, T b is the time duration of one symbol, S i (n) is the random binary data to be transmitted (n is a positive integer), and S i (n) is the bit-wise not of S i (n).
Combining (1) and (3), the electric-field of the modulated optical signal E T can be expressed as: For simplicity, we let . Considering the rectangular pulse function in (4), the Fourier series of E T can be expressed with coefficients C 0 and C n : where C 0 and C n are coefficients of the DC component and the n th order of the Fourier series, and ω s is the fundamental angular frequency. C 0 and C n can be expressed as: For simplicity, according to Euler's formula, we converted C 0 and C n into the exponential form, where A 0 and θ 0 are the modulus and argument of C 0 , respectively, and A n /n and θ n are modulus and argument of C n , respectively. Substituting (6) and (7) into (4), the electric-field of the modulated optical signal E T can be expressed as: The phase-shift due to dispersion can be expressed as τ 0 = −β 2 Lω s [21], where β 2 is the group velocity dispersion (GVD) parameter. Considering the effect of dispersion, then the electricfield of the optical signal after fiber transmission at the PD side can be expressed as: where A i is the electric-field amplitude after optical fiber transmission, and The PD is a square law device, and hence, the photocurrent I out generated by the PD can be written as: (11) where denotes the responsivity of the PD, I DC defines the DC component of the generated photocurrent, I sig (t) defines the photocurrent generated by the received signal, and I beat (t) is the beat harmonics component of the photocurrent generated by dispersion, i th is the noise current generated by thermal noise and i sh is the noise current generated by shot noise. The expression of the DC component is: where The general expression of photocurrent generated by the received signal is: where the N th order component, which we refer as the N th order signal harmonic, is: In addition, the beat harmonic component of the photocurent caused by dispersion I beat (t) can be expressed as: where the N th order component, which we refer as the N th order beat harmonic, is: Based on (14), the power of the N th order signal harmonics is: where R is the resistance and for simplicity, and we assume that the equivalent resistance here is 1 Ω. Since this is just a constant that does not affect subsequent results, it can be ignored.
In (17), exp(−γ ld Nτ 0 ) represents the power fading caused by laser phase noise together with dispersion [20], where τ 0 defines the group delay for the fundamental angular frequency ω s , and it can also be expressed as: Since in high speed short-reach optical communications, the fiber length L is typically within 100 km and the linewidth Δv of a semiconductor communication laser is small than 10 MHz [23], √ 2πΔv Nω s . Hence, the signal phase change caused by the finite linewidth of LD φ ld is much smaller than the dispersion induced phase change 1 2 β 2 L(Nω s ) 2 . Therefore, the signal phase change due to laser linewidth, which is the φ ld term in (10), can be ignored here. In addition, since the laser intensity noise δ 0 has a standard derivation ρ 1, ρ 4 is much smaller than ρ 2 , and it can also be ignored. With these approximations, (17) becomes: Similar with the N th order signal harmonic, using (16), the power of the N th order beat harmonic in can be expressed as: So far, we have proposed an accurate analytical model incorporating the influence of CD on the power spectrum of transmitted amplitude-modulated signals. In our model, accurate expressions of frequency components generated by the signal and beat harmonics caused by dispersion have been derived, which can be used for CD estimation as discussed in the next section. In addition, RIN and laser phase noise have also been incorporated into our model, and hence, the influence of RIN and laser phase noise on the received signal and the system performance can also be obtained for further analysis.

III. DISPERSION ESTIMATION METHOD
In the previous section, the accurate analytical model of the received signal in short-reach IM/DD optical communication systems has been built. However, due to the randomness of the transmitted signal, realizing dispersion estimation is still challenging. To solve this issue, in this section, we propose a dispersion estimation method based on the analytical model derived whilst greatly reducing the complexity. Using this method, we don't need to insert any training signal and can realize the estimation directly from the transmitted data signal. This avoids reducing the transmission capacity and realizes the in-service estimation.
The proposed low-complexity dispersion estimation method includes two major steps: data selection and dispersion calculation. We first select and slice a portion of the received signal satisfying the 50% duty cycle requirement (i.e., the same duration for high and low power levels), which we refer to the selected signal segment, and we treat this signal segment as a symbol of a periodic signal. As will be shown later, to achieve the dispersion estimation, we only need the fundamental and the 3 rd order harmonics of the selected signal segment. Theoretically, the number of symbols in the selected signal segment can be arbitrary as long as the 50% duty cycle requirement is met. However, in practical applications, the sampling rate and the bandwidth aspects also need to be considered. Assume the bit rate of the originally transmitted signal is B (bit/s), the spectral efficiency of the modulation format is η se (bit/s/Hz), and the number of bits in the selected signal segment forming a periodic signal is n, since the 3 rd order harmonic of the periodic signal is needed for the dispersion estimation, the bandwidth B w and sampling rate F s required in the proposed approach are: To avoid adding extra requirements to the bandwidth and the sampling rate of signal detection, B w and F s need to be equal to or smaller than the bandwidth and the sampling rate required for normal signal detection, which equal to B/η se and 2B/η se , respectively. Hence, at least 4η se bits are needed in the selected signal segment. More detailed discussion on the selected signal segment will be presented in Section IV.
After the data selection process, we can estimate the dispersion using the theoretical framework developed in Section II. Based on (6) and (7), the Fourier series of this selected signal segment with 50% duty cycle can be simplified to: where n is a positive odd number. For simplicity, all terms except 1/n of C n are converted into the exponential form, where A b is the modulus and θ b is the argument. Similarly, (9), which describes the electric-field of the selected signal segment with 50% duty cycle with the impact of dispersion, becomes: In this case, the corresponding power of the N th order signal harmonic shown by (19) can also be simplified to: Since the power of RIN is proportional to the signal bandwidth, at a single frequency point, the power of RIN is very small. Hence, we can ignore ρ in the following process of dispersion estimation. According to (18), the power fading caused by laser phase noise with dispersion is exp(−γ ld Nτ 0 ) ≈ 1. In this case, (26) can be simplified to: term is the dispersion induced power fading, and N is an odd integer.
In addition to the signal part, the power of the M th order beat harmonics shown by (20) can also be simplified to: where M is an even integer, and k and m are odd integers.
From the spectrum of the selected signal segment with 50% duty cycle, the signal harmonics as expressed by (26) and the beat harmonics as expressed by (28) generated by dispersion are easy to be separated. This is because, without dispersion, the power spectrum of the reconstructed data sequence only consists of odd-order signal harmonics. On the other hand, even-order beat harmonics are also generated in the power spectrum of the reconstructed data sequence with the impact of dispersion. Therefore, using the observations discussed above, by calculating the dispersion induced power fading on the signal harmonics (odd-order), the accurate dispersion can be estimated. Using this principle, the estimation process is as follow: first, based on (27), the power ratio of the a th order signal harmonic and the b th order signal harmonic can be expressed as: where a and b are odd positive integers. Since P a,sig and P b,sig can be obtained directly from the power spectrum, the product of β 2 L can be calculated. Hence, in practice, we can then calculate the accumulated CD without knowing the exact distance and the dispersion coefficient β 2 by: where c is the speed of light, and λ is the laser's central wavelength. Furthermore, the CD induced differential group delay is also related with β 2 L [24], and the relationship between them is: where f sig is the frequency of the periodic signal formed by the selected signal segment, which equals to the symbol rate divided by the number of symbols sliced from the original received signal. Therefore, based on the method discussed in this section, we can significantly reduce the complexity of the random signal analytical model by selecting a signal segment with 50% duty cycle and analyzing the corresponding much simpler power spectrum. Based on the power spectrum analysis, accurate CD estimation can be realized.

A. General Setting and Verification of Analytical Model
The equations derived in the theoretical model discussed in Section III were verified using numerical simulations. The purpose of the numerical simulation is to verify the results of different harmonics described by (26) and (28) in the theoretical model, where the power spectrum of the received signal in the simulation was obtained by FFT. In the simulation, we considered a typical short-reach optical communication system, which follows the same system settings as shown in Fig. 1.
The key system parameters used in the simulation are listed in Table I. We considered RIN, laser phase noise, shot noise and thermal noise, which are the typical noise sources in IM/DD systems [13]. The amplified spontaneous emission (ASE) noise was not included, as optical amplifiers are normally not used in short-reach systems due to the cost consideration. The simulation was conducted using the Monte-Carlo type method. Since the various types of noises are random and conform to the normal distribution, we performed fifty times of simulations and then averaged the results.
In the theoretical calculations, we started with selecting the signal segment satisfying the 50% duty cycle requirement, and then we calculated the argument of C 0 and C n (i.e., A 0 , θ 0 , A b and θ b ) using (23) and (24). The results were then substituted Here, as discussed in Section III, the bandwidth and sampling rate required for normal signal detection were considered, and thus, the proposed dispersion estimation method did not require higher bandwidth or sampling rate. First, we show the results of a 10 GBaud PAM-2 signal. The selected signal segment is "0011" and it is found by using the FFT based spectrum scanning method that will be detailed in the next section. The bandwidth and sampling rate used are 10 GHz and 20 GSa/s, respectively, and the results are shown in Fig. 2(a). We can see that the calculation results using the analytical model agree with the Monte-Carlo simulation results, and the average variation is about 0.2%. We have also considered higher data rates and the higher order PAM-4 format. The results obtained when the symbol rate increases to 20 GBaud with PAM-4 modulation are shown in Fig. 2(b). The bandwidth is set at 20 GHz and the sampling rate is 40 GSa/s, and we use the selected signal segment of '0033,' which will be discussed in detail later as well. It can be seen that similar to the previous scenario, there is also a good match between the theoretical calculation results and the simulation results, and the average variation is about 0.25%. Hence, comparable results can be obtained under practical bandwidth and sampling rate limitations, verifying the feasibility of our proposed approach and the accuracy of our theoretical model.

B. Data Selection and Dispersion Estimation
With the accuracy of the derived theoretical model verified, in this section, we further investigate the in-service dispersion estimation method based on the analytical model. As discussed in Section III, the first step of dispersion estimation is the data selection, and the key requirement of the data selection process is selecting a part of received signal satisfying the 50% duty cycle requirement. Whilst a wide range of signal segments can be selected in principle, in practice we also need to consider the constraints of bandwidth and sampling rate. For example, the signal segment satisfying the 50% duty cycle constraint in a 10 GBaud PAM-2 system can be '01' (or equivalent '10'), '1100' (or equivalent '0011') or '111000' (or equivalent '000111'), etc. If we consider the selected signal segment as a 'symbol,' as shown in 22, the bandwidth of the corresponding periodic signal constructed from these signal segments is 5 GHz, 2.5 GHz and 1.67 GHz, respectively. Considering the need for the 3 rd order harmonics, the required bandwidth is then 15 GHz, 7.5 GHz and 5 GHz, respectively. Since 15 GHz (corresponding to the signal segment of '01') is beyond the normal detection bandwidth requirement of 10 GHz, and the probability of finding a '111000' signal segment is much lower than the other two cases, we select '1100' (or equivalent '0011') as the signal segment for further processing, which has a fundamental frequency of 2.5 GHz. Even considering the need for the 3 rd order harmonic of the selected signal segment, the bandwidth requirement, in this case, is still lower than 10 GHz (the receiver bandwidth required by the original random signal), and the sampling rate requirement is lower than 20 GSa/s (the sampling rate required by the original random signal satisfying the Nyquist criteria).
After deciding the selected signal segment, the next step is searching for such signal segment in the received random signal. To do so, we divide the received signal into blocks and use spectrum scanning to find the part of the received random signal that matches the frequency characteristic of the target signal segment (e.g., '1100' or equivalent '0011' in the 10 GBaud PAM-2 system example above). The length of the signal block scanned is 1024 symbols, which is typically sufficient to find a signal part matching the selected signal segment discussed above. The frequency characteristic of the target signal segment that we use here is that such a data sequence has a much higher power density at the fundamental frequency than other possible data sequences.
We also would like to highlight that whilst the selected signal segment with a wide range of lengths discussed above works for the proposed dispersion estimation method, it does not provide the optimal performance. This is because the dispersion imposed onto the symbols adjacent to the selected signal segment may affect the spectrum. Instead, having two extra opposite bits both before and after the selected signal segment can solve this problem and provide optimal performance. For example, in the 10 GBaud PAM-2 signal example above, instead of using '1100,' having two extra opposite bits '00' before and two extra opposite bits '11' after the selected signal segment can achieve the optimal performance (resulting in a search target of '00110011'). Table II summarizes the selected signal segment and the search target of various data rates and modulation formats considered in this paper to achieve the optimal performance. The corresponding fundamental and 3 rd order harmonic frequencies of the selected signal segment are also summarized.
The overall data selection and dispersion estimation process is shown in Fig. 3, which consists of three key steps. In step (i) and (ii), since the length of the search target is 8 symbols, which correspond to 16 sampling points (satisfying the Nyquist sampling criteria), we take each sampling point and the 15 sampling points after it to form a 16-points data segment. The power spectrum of the data segment is obtained using FFT. For example, in the 10 Gb/s PAM-2 example above, since the fundamental frequency of the selected signal segment (i.e., '0011' or '1100') is 2.5 GHz, we target at the power density at 2.5 GHz in the power spectrum. For a symbol length of 1024 symbols (2048 sampling points), we have 2033 such data segments (i.e., 2048-15) and the corresponding power densities at 2.5 GHz, as shown in Fig. 4(b). In step (iii), since the selected signal segment has the highest power density at the 2.5 GHz fundamental frequency, we can find the data segment that matches the search target using this power spectrum characteristic. In step (iv), the middle 8 sampling points of the 16-points data segment found in step (1) (i.e., '0011' or '1100') are sliced, and these are the final selected signal segment that we use for dispersion estimation. Afterward, in step (v), we calculate the power spectrum of the 8-points selected signal segment, and use the power of the fundamental frequency and the 3 rd order harmonics for the dispersion estimation using (29) and (30). We then select the 16-points data block with the higher power spectral density from the two scans and use it for the subsequent dispersion calculation.
Using the proposed dispersion estimation method described above, the key results are shown in Fig. 4(a) and (b), which shows the received 10 GBaud PAM-2 signal after PD detection with 20 GSa/s sampling rate (time domain signal) and the corresponding 16 sampling points FFT scan, respectively. Based on this method and using the parameters listed in Table I (the PD bandwidth was set at 10 GHz), the accumulated dispersion as described by (30) was estimated to be 1012.5 ps/nm. Compared with the actual accumulated CD of 1012.1 ps/nm, which is obtained using the actual GVD of standard single-mode optical fiber at 1550 nm (β 2 = −21.5 ps 2 /km [13]), the estimation error is about 0.4 ps/nm (0.04%).
The previous verification is based on the use of the PAM-2 signal, and the proposed dispersion estimation method is also applicable to higher-order PAM signals. Here we also demonstrate the proposed method for the 20 GBaud PAM-4 signal by using the same procedure shown in Fig. 3 and the same parameters in Table I (the PD bandwidth was set at 20 GHz). Similarly, for the 20 GBaud PAM-4 signal, we select '3300' (or equivalent '0033') as the signal segment for further processing, which has a fundamental frequency of 5 GHz. Compared with other data sequences, such as '3311,' '2200' and '1122,' the fundamental frequency (i.e., 5 GHz) of the '3300' data sequence has a higher power density in the power spectrum. Therefore, using this important characteristic, we can find the selected signal segment that will be used for subsequent calculations. Similar to the PAM-2 example, in the 20 GBaud PAM-4 signal example, instead of using '3300,' having two extra opposite bits '00' (resulting in the PAM-4 symbol of '0') before and two extra opposite bits '11' (resulting in the PAM-4 symbol of '3') after the selected signal segment can achieve the optimal performance, which leads to a search target of '033003,' as shown in Table II. Here we select the length of the signal block to be scanned at 4096 symbols to find a signal part matching the target signal segment.
The process of dispersion estimation is the same as that of the PAM-2 case. We also obtained the power of the 1st order and the 3rd order signal harmonics from the power spectrum and then used (29) and (30) for the dispersion estimation. This process with PAM-4 signal is also shown in Fig. 4(c) and (d). Using this method, the accumulated dispersion was calculated to

C. Simulation Results and Discussion
To better demonstrate the capability of the analytical model and the proposed dispersion estimation method, we conducted more studies by changing the signal transmission distance. All other parameters were kept the same as those shown in Table I, whilst the fiber lengths were changed between 100 m and 100 km. The dispersion was then theoretically estimated using the derived equations and the proposed method. We compared the estimated results with the real values (the actual dispersion value of fiber). The real values are calculated by standard ITU G.652 fiber CD coefficient D λ = 17 ps/(nm-km) using (30) and (31). The results are shown in Fig. 5.
In the estimation, the PAM-2 format was considered, and the data rate was either 10 GBaud or 40 GBaud. It can be seen that the dispersion differential group delay accumulates as the transmission distance increases, and the accumulation speed is faster when the data rate is higher. However, regardless of the data rate or the fiber length, accurate dispersion estimation can always be achieved using our proposed method, where the average error is about 0.43%. The average estimation error increased from 0.23% to 0.46% when the fiber transmission distance changed from 10 km to 100 km. Therefore, the estimation accuracy dependence on the fiber length is insignificant in the proposed method. The decrease in estimation accuracy is mainly due to the larger signal attenuation and the larger accumulated dispersion. We also considered a bandwidth-limited case with an 8 GHz receiver, which is lower than the 10 GBaud data rate. The average estimation error is shown to be 0.47%, which is similar to the 10 GHz case. In addition, as discussed in the previous section, the proposed dispersion estimation method can be applied to other higher orders of PAM signals as well. Therefore, the proposed dispersion estimation method can be applied to different data rates with different fiber transmission distances in IM/DD-based short-reach optical communication systems.
The CD estimation range of our model in the investigation is shown to be 1.7 ps/nm -1700 ps/nm (corresponding to the dispersion of 100 m -100 km standard single mode optical fiber). For the upper limit, the proposed method is capable of even larger dispersion estimation, whilst we limited it due to practical considerations, where the distance of a short-reach communication system is usually no longer than 100 km.
In addition, the proposed dispersion estimation method was also investigated when the signal was transmitted using different wavelengths, where the laser central wavelength changes between 1540 nm and 1560 nm. The results are shown in Fig. 6. In the estimation, the PAM-4 format was considered, the data rate was 10 GBaud, and the fiber length was 20 km, 40 km, or 80 km. It can be seen that with different fiber lengths and laser central wavelengths, our estimated results maintain high accuracy. The average dispersion estimation error is 0.25%, 0.35% and 0.45% for 20 km, 40 km and 80 km fiber transmissions, respectively. In addition, the estimation accuracy is almost the same across all wavelengths. Therefore, our proposed approach can be applied to a wide range of wavelengths, and it has the potential to be used in the wavelength-division multiplexing (WDM) system dispersion estimation.
Moreover, the capability of the proposed dispersion estimation method was also demonstrated when the RIN varies between -110 dB/Hz and -150 dB/Hz together with a large laser phase noise of 10 MHz linewidth, since the typical linewidth of semiconductor lasers used in high-speed short-reach optical communication systems is smaller than 10 MHz [23]. The results are shown in Fig. 7. In the investigation, the PAM-2 format was considered, the central wavelength was 1550 nm, and the data rate was either 10 GBaud or 40 GBaud. It can be seen that for both 10 GBaud and 40 GBaud data rates when the RIN increases, the average error rate maintains at around 0.4%. In addition, when the data increases, the average accuracy doesn't change much. Therefore, our proposed method is robust to these impairments inherent to the light source.
In addition, the estimation accuracy of the proposed dispersion estimation method was also investigated when the SNR is changed from 25.3 dB to 19.7 dB. We fixed the transmission distance at 30 km, the modulation format at PAM-2 and the data rate at 10 GBaud/s in the simulation. As shown in Fig. 8, the estimation error increases slightly when the SNR of the received  signal decreases. This is mainly because the power spectrum density at the target frequency has larger fluctuation when the SNR decreases, leading to a slightly larger estimation error. As shown by the result, the estimation error is still small even when the SNR decreases by 5.6 dB, showing the robustness of the proposed method.

D. Experimental Verification
The experimental platform setup is similar to the system setting shown in Fig. 1. A DFB laser operated at 1550 nm wavelength was used to generate the optical signal, and the optical wave was then modulated by 223-1 PRBS data via a 10 GHz MZM in the PAM-2 format. After transmission through the standard single mode fiber (SSMF), a variable optical attenuator (VOA) was added before the PIN PD to adjust the received optical power. After being converted to the electrical domain, the received signal was captured by a sampling oscilloscope for offline dispersion estimation. The laser used in the experiment had a RIN of −145 dB/Hz and its output power was fixed at 2 dBm. The data rate was 10 GBaud/s, and the SSMF had a CD coefficient of 17 ps/(nm-km). Five different transmission distances of 10 km, 20 km, 30 km, 40 km, and 60 km were tested. At the receiver side, the PD had a bandwidth of 8 GHz,  9. Experimental results on the estimated CD v.s. the real CD. Data rate = 10 GBaud/s, sampling rate = 20 GSa/s, and modulation format = PAM-2. and the sampling rate was fixed at 20 GSa/s. The experimental results of dispersion estimation are shown in Fig. 9. The average dispersion estimation error is 0.22%, 0.32%, 0.24%, 0.26%, 0.35%, and 0.48% for the five fiber transmission distances, respectively, verifying the accuracy of our proposed method. In addition, experimental results also agree well with the simulation results.
Table III compares our estimation method with other existing methods. It can be seen that the accuracy of our method is comparable with the methods implemented in both coherent and incoherent systems. However, those coherent detection schemes are applied in long-haul systems, and they are not suitable for incoherent IM/DD systems. Compared with other previously reported schemes for incoherent IM/DD systems [8], [26], we considered various types of noise in our model and proved our method remains accurate in a RIN, laser phase noise, thermal noise, and shot noise limited system. In addition, no training sequences are required in our proposed method, and hence, in-service dispersion estimation without affecting the data transmission can be achieved.

V. CONCLUSION
In this paper, we have proposed a training-free in-service dispersion estimation method for short-reach IM/DD-based optical communication systems. The proposed method has been based on the complete and accurate theoretical model of the IM/DD system that we derived. It has been shown that this method does not need training, and it can estimate the dispersion without interrupting the data transmission by using the frequency-domain power spectrum analysis of the received signal. The derived analytical model has been verified using numerical simulations, where the theoretical calculation results are consistent with system simulation results with a variation of 0.2%. In addition, the proposed in-service dispersion estimation method using the derived equations has also been verified. Results have shown that even under high laser phase and intensity noise, the dispersion estimation error does not exceed 0.5%. In addition, it has been shown that the proposed method can be used for general PAMbased short-reach IM/DD optical communication systems over a large range of transmission distances, data rates, and signal wavelengths.
In this paper, the derived analytical model has been verified by extensive simulations, with the simulation parameters selected with reference to practical systems. The limitations of sampling rate, bandwidth, and various noises in practical applications have also been included. To further validate the proposed method, we have performed experimental verification on our theoretical model. Results show that an estimation error better than 0.5% can always be achieved for up to 60 km fiber transmission distance in the experiment, which agrees well with simulations and verifies the accuracy of our proposed method.
In this work, we have considered the optical transmitters based on external optical modulators, particularly the Mach-Zehnder modulators (MZMs), since they have been widely used in shortreach IM-DD systems [27], [28], [29]. While our work considered the dual-drive Mach-Zehnder modulator (DD-MZM), it can be considered as equivalent to a single-drive MZM when setting the lower arm input voltage v 2 in (4) to 0. Hence, our theoretical model is also applicable to the widely used single-drive MZMs and the corresponding IM-DD systems. On the other hand, in short-range IM-DD systems, DML and EML are also widely used. Whilst the proposed method is still valid, the exact theoretical expression of the signals in DML and EML systems is different from that of the MZM system. Hence, in our future work, we will extend the theoretical model and the proposed estimation method further to EML and DML-based systems.