Feed-Forward Frequency Domain Equalizer for Fast SOP Transient Tracking in Coherent Optical Modems

We propose a feed-forward frequency domain adaptive linear equalizer that can track fast state-of-polarization (SOP) transients in the fiber. In the proposed method, we rely on the pilot symbols that are inserted into the DSP frame to estimate the channel response and then compensate for it. Unlike the decision-directed least mean squares (DD-LMS) algorithm, the proposed circuit is designed in a feed-forward fashion and avoids the latencies involved in DD-LMS feedback loop in order to achieve the best possible SOP tracking capabilities. The proposed algorithm is shown to be robust against high order differential group delay (DGD) and polarization dependent loss (PDL) and offers a significant tracking improvement compared to DD-LMS.


I. INTRODUCTION
C OHERENT optical communication has paved the way for more flexible and more efficient higher data rates over the last few years. Digital signal processing (DSP) algorithms are an essential part of coherent technology. Advanced modulations, probabilistic pulse shaping, and forward error correction (FEC) are all part of the signal processing chip that powers the coherent transmission. Adaptive filtering and carrier recovery are widely used to mitigate linear fiber impairments and to track time-varying channel changes.
Multiple studies have shown a high correlation between mechanical vibrations and lightning strikes to fast SOP transients [1], [2], [3]. The SOP angular velocity can reach 5.1 MRad/s in the case of a lightning strike around an optical ground wire (OPGW) link [4]. Therefore, it is important that the adaptive equalizer can track fast SOP transients.
Constant modulus algorithm (CMA) is a blind equalization algorithm that exploits the amplitude characteristic of the signal while least mean squares (LMS) utilizes either the known pilot data or detected symbols as the reference signal. CMA and LMS are both based on stochastic gradient descent algorithm and they both suffer from slow convergence [5], [6]. While the CMA algorithm is robust to the carrier frequency and phase offset [7] and consequently allows decoupling of carrier-recovery (CR) module from the equalizer, the DD-LMS requires the compensation of frequency and phase offset [8], which significantly increases the delay of the error signal. The DD-LMS error feedback delay along with the coefficient calculation latency results in slow tracking capability of the conventional LMS algorithm. The authors in [9] proposed a time domain feed-forward (FF) LMS algorithm based on the phase locking of X and Y polarization pilot symbols. The proposed architecture is not sensitive to phase and frequency offset and therefore, eliminates the feedback loop. The location of the pilot symbols is estimated at the receiver and then extracted prior to equalization. The paper does not discuss how the performance is affected against linear channel impairments which may lead to inaccurate estimation of the pilot symbols location. In [10], frequency domain channel estimation and equalization are proposed. Timing and frequency offset compensations are first applied to the received signal and then, the pilot symbols are extracted for channel estimation. The authors did not investigate the performance of the proposed algorithm in the presence of PDL and laser phase noise. The proposed structure may not be able to estimate the carrier frequency offset in the presence of fast SOP changes. A dual-stage equalizer is investigated in [11], where the first stage involves chromatic dispersion compensation in the frequency domain. The pilot symbols are then extracted in the time domain and the channel estimation and equalization are performed in the frequency domain in the second stage. To operate with linear fiber impairments, a guard interval is added to the pilot symbols, therefore increasing the overhead ratio. A recent study tackles the fast SOP transient tracking in the presence of PDL by combining least-squares estimation and a decision-directed sliding window technique [12].
Kalman filtering has become a popular approach for channel estimation and equalization in recent years. The authors in [13] propose a radius-directed linear Kalman filter for tracking fast SOP changes in the fiber. The proposed algorithm works on a symbol-by-symbol basis and is insensitive to phase noise and carrier frequency offset. However, the proposed solution does not support fast SOP tracking in the presence of polarization This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ mode dispersion (PMD) and PDL. An extended Kalman filter for SOP tracking is also presented in [14] where PMD and PDL are not taken into account. A window-split frequency domain non-linear Kalman filtering algorithm is presented in [15] for compensation of fast SOP transients and large PMD.
In this study, we propose a pilot-aided feed-forward frequency domain equalizer, where channel estimation and adaptive equalization are performed directly in the frequency domain. This is achieved by cross-correlating the received signal with the transmitted pilot sequence (PS). A frequency domain smoothing approach is proposed for noise suppression. An inter-FFT-block filtering is then applied to the estimated channel response for further improvement of the estimation accuracy. The response is used to calculate a minimum mean square error (MMSE) or zero-forcing (ZF) filter followed by frequency domain interpolation to full frequency resolution. Finally, the received signal is equalized by the calculated filter response. The proposed architecture is shown to achieve excellent tracking of polarization drift in the presence of PMD and PDL. We also discuss how to achieve optimal channel estimation with various pilot sequence mappings in the DSP frame. It is worth mentioning that pilotbased channel estimation is studied extensively in the wireless domain, e.g. [16], and it is not a new concept. Although the proposed feedforward equalizer employs the well-known least squares (LS) channel estimation, we also present many other contributions to make it suitable for coherent optical modems.
Notation: Matrices and scalars are represented by bold and non-bold italic letters, respectively. The n × n identity matrix is expressed by I n×n . An m × n matrix of all ones and all zeros are denoted by 1 m×n and 0 m×n , respectively. (·) H , (·) T , (·) * , (·) −1 indicate complex conjugate transpose, transpose, complex conjugate, and inverse of a matrix, respectively. For a matrix V of size m × n, V ij denotes the entry in the i-th row and j-th column where i ∈ {0, ..., m − 1} and j ∈ {0, ..., n − 1}. Phase of a complex number is denoted by ∠· Linear convolution is represented by * , circular convolution by , and circular cross-correlation by . Symbol E[·] is used for statistical expectation operation, tr{·} is the trace of a matrix, | · | represents the determinant of a matrix, and || · || 2 is the l2 norm. Lower and upper case letters are used for time domain and frequency domain representations of signals, respectively.

A. Fiber Model
The basic linear model of an optical fiber consists of group delay (GD), chromatic dispersion (CMD), PMD and PDL, plus amplified spontaneous emission (ASE) noise modeled as additive white Gaussian noise [17]. The fiber second-order chromatic dispersion frequency response is expressed by where ω is the angular frequency, D in ps/nm is the chromatic dispersion, λ in nm is the operating wavelength, c in nm/ps is the speed of light, and j = √ −1. The group delay transfer function is given by where τ in s is the group delay. The Jones polarization rotation matrix is represented as where θ, φ, and ψ are the rotation angles in Stokes space. When simulating random polarization rotations, the angles θ, φ, and ψ are chosen according to the method proposed in [18,Appendix A.2] to ensure uniform distribution of Stokes vectors over the Poincaré sphere. The transfer function of a single PDL element, with attenuation loss of α dB in units of decibels, is expressed as where k = 10 −α dB /10 , H Jones (θ, φ, ψ) and H Jones (θ, φ, ψ ) are the polarization rotation matrices at the input and output of the PDL, and θ , φ , and ψ are the rotation angles in Stokes space. The PMD matrix H DGD is unitary and is formulated as where τ d in s is the DGD. We consider a distributed PMD/PDL model where N dgd independent DGD and PDL elements are concatenated together. The overall transfer function is given by [19], [20] The average PDL of the link at angular frequency ω is defined as [21]ᾱ where λ max and λ min are singular values of H PMD/PDL (α dB , τ d , ω, N dgd ) and λ max ≥ λ min . The total PDL of the link in units of decibels is then given bȳ Fig. 1 shows the fiber channel model with the addition of carrier frequency offset and laser linewidth impairments. The distributed PMD/PDL is split into two segments so that a polarization rotation element can be placed in the middle. Two other rotation elements are also considered; one close to the transmitter (TX) and another one close to the receiver (RX).

B. Channel Estimation
The dual-polarization received signal is formulated as where t  (9), the frequency domain representation of the received signal is given by where T x (ω) and T y (ω) represent the FT of the transmitted signals for X and Y pol, respectively.
The discrete frequency domain representation of the received signal can be derived by applying an FFT operation to a block of symbols. Employing overlap-and-save or overlap-and-add filtering in the frequency domain, the N FFT -point FFT of the windowed sequence of the received signal is given by (11) Note that the discrete output frequency is defined as where S x [f i ] and S y [f i ] are the FFT of the windowed transmitted signals which only contain pilot symbols with payload symbols set to zero, and M x [f i ] and M y [f i ] are the FFT of the transmitted signals with the pilot symbols set to zero. We assume that the pilot symbols are distributed uniformly in a frame (Fig. 2). If the frame size is N f , and the number of pilot symbols is N p , then there would be P − 1 payload symbols between two pilot symbols where P = N f /N p . The time domain pilot signal for X pol is given by where p x [n] is the PS for X pol and δ[n] is the Kronecker delta function. Since the pilot symbols are upsampled by a factor of P , the FFT of where Since the length of the FFT used to process the incoming signal is longer than the fiber channel response, after compensating for chromatic dispersion and group delay, we assume that the channel response is relatively constant over a narrow range of N c frequency bins. Consequently, (11) can be expanded to .
where H is a 2 × 2 matrix representing the channel response for frequency bins f 1 to f N c , namely, (15) can be written in matrix form as The least-squares estimation of channel response based on the PS is obtained aŝ where S † = (S H S) −1 S H is the pseudo-inverse of S. The estimation error residual is derived as The channel estimation mean square error (MSE) can be calculated as The fiber channel matrix is unitary in the presence of group delay, polarization rotation, chromatic dispersion, and PMD. For the purpose of the analysis in this section, we assume there is no PDL in the channel. Consequently, where w[n] is the mask applied to the transmitted sequence to filter out the payload symbols defined as (22) and t[n] and σ 2 t are the time domain representation and variance of transmitted symbols, respectively. Note that we assume the transmitted symbols are independent, i.e., (20) can be simplified to One can calculate matrix S H S as where σ 2 S x and σ 2 S y are the average energy of X and Y pilot symbols across the sub-band of interest. Finally, MSE is given by It is well known that the minimum MSE is achieved when σ S x S y and σ S y S x are 0 and [22]. This means the X and Y pol PS should be orthogonal and have the same average energy per symbol over the estimation sub-band, i.e., Consequently, the optimal MSE is given by Finally, channel estimation SNR for each sub-band is defined as the ratio of mean channel response power to estimation MSE, i.e., In order to improve the SNR, one can increase the number of transmitted pilot symbols in a frame, which also increases the overhead. The net loss in throughput as a result of transmitting pilot symbols is Increasing the power of the pilot symbols can also enhance the performance. Given the optimal pilot symbols, the channel estimation solution in (18) can be simplified tô Note that (30)   In order to improve the SNR of channel estimation, frequency domain smoothing and inter-FFT-block averaging is needed. The system block diagram of the proposed equalizer is illustrated in Fig. 3. The digital coherent receiver used in this study is based on the architecture proposed in [23] where the received symbols first go through a static filter that compensates for the fiber chromatic dispersion. An adaptive equalizer is then utilized to track the channel variations. The FF delay in the figure is to compensate for the processing delay of the parallel path which calculates the feed-forward equalizer (FFEQ) coefficients.

C. Pilot Symbols Location
The analysis for channel estimation in Section II-B assumed uniformly distributed pilot symbols in a frame. In order to achieve the optimal least-squares channel estimation, the FFT of PS cross-correlation must be a scaled identity matrix over the estimation sub-band. Any two cyclic orthogonal sequences which have the same total energy satisfy this condition for the case of uniform distribution. Due to upsampling of pilot symbols, the spectrum repeats every N f /N p frequency bins. Therefore, as long as N c is an integer multiple of N f /N p , the optimality condition is satisfied. This property does not necessarily hold when a different positioning of pilot symbols is considered.
Constant amplitude zero auto-correlation (CAZAC) sequences are well-known for their ideal periodic autocorrelation [24], [25], [26]. As proved in Appendix for a CAZAC sequence of square length, certain PS location mappings can maintain the ideal periodic auto-correlation property. Therefore, they satisfy the optimality condition for the least-squares channel estimation. Assuming N p = M 2 , where M is a positive integer number, the first M pilot symbols can be located anywhere in the first N f /M symbols of the frame. However, the rest of the pilot sequence should follow the same positioning pattern as the first M PS throughout the frame. As a result, the maximum number of consecutive pilot symbols would be M . As an example, let's assume N f = 128 and N p = 4. The DSP frame is divided into 2 parts of 64 symbols. The first 2 pilot symbols can be arbitrarily placed in the first part. However, the second pair of training symbols must occupy the same locations in the second part of DSP frame. For instance, if the first 2 training symbols are at index 1 and 2, the second set must be at index 65 and 66 of DSP frame to satisfy optimality.
Complementary sequences [27], which are pairs of sequences for which the sum of their out-of-phase aperiodic autocorrelation is equal to zero, can also be used. For the optimality condition to hold, the PS can be placed sequentially in a frame, i.e., in one cluster. In this case, the TX will be sending frames with alternating PS. At the RX, both pilot sequences are needed to be stored and they are cross-correlated alternatively with the received signal. Another approach is to use complementary sequences of length N p /2, and place them in two halves of the frame with maximum distance. Note that this does not result in ideal auto-correlation. However, with a proper smoothing function, the non-zero part of the auto-correlation can be filtered.

D. Pilot Sequence Filtering
Note that although payload symbols are independent of PS and their cross-correlation with PS pattern is expected to vanish by doing enough filtering across many FFT blocks, they still impose performance impact through degrading SNR of channel estimator, as the channel estimator aims to provide a fast tracking of channel variation. We propose applying a windowing mask to the RX signal to filter out the payload symbols to the greatest extent. However, due to channel linear impairments such as DGD, the pilot symbols are distorted and may not appear in the exact location. As a result, we apply a wide enough sinc pulse, defined as sinc[W n] = sin [πW n] πW n where W is a constant, to the received symbols. Notice that a sinc windowing function is favorable from a complexity point of view, as its equivalent frequency domain smoothing filter translates to a rectangular response. The windowing function has the maximum magnitude at the PS locations and slowly decreases as it reaches the payload symbols in the middle of two pilot symbols. The windowing function can be formulated as It can be shown that the discrete Fourier transform (DFT) of w s [n] is given by where f s is the sampling frequency. For the case of P = 32 and W = 0.1, this is simplified to Consequently, the filtering can be implemented by adders in the frequency domain. The output of PS filtering is then given by where R[f i ] is the DFT of the received signal. Note that the circular convolution would cause a slight distortion on the edge of the spectrum. To avoid this, one could perform a linear convolution instead, at the cost of higher number of frequency bins throughout the next blocks. Also, note that a DC offset can be added to the time domain PS filter to ensure the coefficients are non-negative. Alternatively, we can use sinc 2 [n] for windowing. However, these cases did not lead to a performance improvement in our simulations.

E. Carrier Frequency Offset
If the TX and RX lasers are not perfectly aligned, there would be a carrier frequency offset (CFO) Δ present in the received signal, i.e., The FT of the received signal is then given by We assume that accurate CFO estimation is available from the carrier-recovery/frequency-estimation block. One approach to compensate for CFO is to take the IFFT of the cross-correlation results, apply the conjugate of CFO phase ramp in the time domain and then take the FFT for further processing. However, this will be quite expensive in the hardware as it requires FFT and IFFT of length N FFT . A simpler approach would be to apply the phase ramp to S x [f i ] and S y [f i ] prior to cross-correlation. Note that we can split the frequency offset Δ into two parts: where Δ int = round(ΔN FFT /f s ) and Δ frac = Δ − Δ int . In order to apply Δ int to S x [f i ], we can simply do a circular shift, but the fractional part requires frequency domain interpolation. The interpolation can be avoided if a higher resolution FFT of the pilot template is calculated offline. In order to have a higher resolution FFT of the pilot template, we can zero-pad the PS in the time domain by an integer factor L. In this way, we obtain an L-times up-sampled version of the pilot template   Note that if the pilot symbols are not uniformly distributed across the frame, an LN -point FFT would be expensive to store for a large L. A linear interpolation can be done in the hardware in order to reduce the size of the required FFT. Fig. 5 depicts the SNDR for L = 2, 4, and 8 with interpolation. In this case, L = 4 provides a suitable minimum SNDR of 30 dB.

F. Frequency Domain Smoothing
In order to further diminish the contribution of additive noise to the estimated channel response, one can apply a time domain window to the estimated channel response. This can be accomplished by taking the IFFT of the estimated frequency The final output can be derived as Note that the total number of frequency bins for the decimated

G. Common Phase Estimation
Due to laser linewidth or CFO estimation error, H d will have a common phase across frequency bins prior to inter-FFT-block filtering. Such a common phase needs to be tracked and canceled out prior to filtering the learned channel response across FFT blocks. Otherwise, the channel response absolute value diminishes at the output of the filter due to being averaged across samples with different random phase values. The common phase can be estimated by calculating the angle of the determinant of the matrix H d [k]. However, such a common phase estimate would be noisy. To improve the accuracy of the estimated common phase, the estimated phase value can be averaged over multiple frequency bins. To calculate the common phase across all frequency bins, one can sum all the determinants and calculate the phase of the final result. However, in presence of a linear phase response due to effects such as common delay, the magnitude of the estimated phase vanishes. Here, we rely on the output of inter-FFT-block smoothing filter H avg corresponding to previous FFT blocks as a high-quality reference in order to derive the frequency-dependent common phase of frequency response. For this purpose, we de-rotate the derived phase of the determinant of each frequency bin k with the corresponding common phase of the learned smoothing filter output from previous FFT blocks. Then, the results are summed together across frequencies and the final phase is calculated by adding back the common phase of H avg , i.e., The edge frequency bins are dropped in (40), since the phase information is inaccurate due to lower SNR when raised cosine (RC) or root raised cosine (RRC) pulse shaping is used. We estimate the common phase of H avg using a different method that is robust against channel linear impairments. To calculate the common phase of H avg , the phase of frequency bin k is computed as Then, the unwrapped phase, designated with φ H avg ,u , is calculated as The overall common phase of H avg is derived by averaging over all frequency bins, i.e., There is a π ambiguity in the calculated phase φ H d as a result of the determinant calculation. The phase needs to be unwrapped with respect to the previous FFT block estimate using where φ Δ is the phase jump from FFT block i − 1 to i due to CFO. Finally, the common phase is removed from H d using where H φ [k] is the final estimated channel matrix.

H. Inter-FFT-Block Filtering
We propose three filtering methods for further noise suppression.
I. Moving average: A moving average (MA) over N avg consecutive FFT blocks is calculated for each element of H φ matrix. Note that the SNR improvement is proportional to N avg . The tracking bandwidth of the filter drops as 1 N avg , dropping the SOP transient tracking capability of the circuit as the channel coherence time becomes smaller than N avg FFT blocks. Let H avg [i, k] represent the time-averaged channel estimate of frequency bin k at FFT block i. We write

II. Zeroth-order Fixed-lag Kalman Smoother:
The standard Kalman filter model with no input is given by [28], [29] where X is the state vector, Φ is the state transition matrix, and W represents the process noise. The measurement at time index i is formulated as where H is the observation matrix and V is the observation noise. The process noise and observation noise are assumed to have zero-mean Gaussian distribution with a variance of Q and R, respectively. For the purpose of this study, we only consider the steady-state Kalman filter where the Kalman gain has stabilized to a steady-state value [30]. Furthermore, we assume the channel response deviations in time are purely additive Gaussian with a fixed standard deviation. Hence, the state transition matrix and observation matrix are simplified to be identity matrices, i.e., Φ = 1, Consequently, the standard Kalman filter equation at time index i can be simplified as where K is the steady-state gain and X (kf) [i] is the Kalman filter state. The Kalman error term is calculated as Note that the steady-state Kalman gain can be computed by solving the Riccati differential equation [29]. The fixed-lag smoother provides the state estimate at time i − L, given measurements up to time i. Using Rauch-Tung-Striebel smoother [31], one can derive the smoothed state at time i − L as whereẊ[i] denotes the derivative of channel response with respect to FFT block i. Therefore, the state transition and observation matrices are calculated as and The fixed-lag smoother corresponding to the first-order Kalman model can be computed as [30] where and P is the steady-state covariance matrix. One can solve the Riccati differential equation for the Kalman filter to calculate P, and then compute K and A. The number of FIR taps needed for the implementation of this method is 4 L per frequency bin.

I. Channel Equalization
LetĜ and V denote the final channel estimation and the estimation error, respectively. We assume V has a Gaussian distribution with a mean of zero and a variance of σ 2 V , and is independent of the transmitted symbols T with a variance of σ 2 T , the channel matrix G, and the white Gaussian noise N with a variance of σ 2 N . The received symbols are formulated as Here, we aim to find the optimum frequency-dependent 2 × 2 channel equalizer matrix W such that the power of error with respect to the transmitted sequence T is minimized. More specifically, denoting the output of equalizer byT, we havê where the optimal MMSE equalizer matrix W is given by [32] Thus, Assuming unitary channel matrix H, the SNR of MMSE filter output is given by The variance of channel estimation error is derived in (28). The frequency domain smoothing and inter-FFT-block averaging improve the MSE by a factor of roughly 4 and N avg , respectively. Therefore, Alternative choice of a channel equalizer is to ignore the effect of channel noise and derive a zero-forcing (pseudo-inverse) solution of the channel response as which is also known as the least-squares solution. Note that the MMSE equalizer in (60) converges to the LS solution for high SNRs.

J. Linear Interpolation
The equalizer matrix W is now a 2 × 2 matrix across N FFT 4N c + 1 frequency bins. However, it needs to be re-calculated for all frequency bins prior to received signal equalization. It would be significantly cheaper to perform the calculation of W mmse or W zf in the decimated frequency domain and then, linearly interpolate the response. Our simulations have also shown that there is no performance degradation in this case. We use piece-wise linear interpolation to calculate the full resolution frequency response. It can be shown that the frequency centers of the decimated frequency response are f k = − N FFT 2 − 1 2 + kN d , where N d = 4N c is the number of bins between every two points and 0 ≤ k ≤ N FFT N d . Since we are interested in frequency bins − N FFT 2 to N FFT 2 − 1, there is no need for extrapolation and all the points lie between the two endpoints of decimated response. The interpolated response can be calculated as

III. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed FFEQ versus SOP rotation angular velocity through numerical simulations. The simulation setup is discussed in the first part. Then, the performance of the FFEQ system is characterized and the algorithm parameters are optimized. Next, the polarization drift tolerance of the FFEQ is compared with three benchmarks. Finally, the computational complexity of the FFEQ is discussed.

A. Simulation Setup
The system parameters are summarized in Table I. We consider a QPSK modulation with QPSK pilot symbols. The pilot symbols used on X and Y polarization are orthogonal and have the same energy per symbol as the payload symbols. We apply RRC pulse shaping at the TX and the same pulse shaping is also applied at the RX, prior to cross-correlation calculation. The RRC roll-off factor is set to 0.25, resulting in an upsampling rate of 1.25 sample/symbol. We assume that the group delay and chromatic dispersion are estimated during start-up and therefore, largely compensated. Although any linear equalizer such as CMA or recursive least squares (RLS) algorithm can be employed in combination with the proposed FFEQ, we consider the LMS algorithm. At start-up, the LMS equalizer works in training mode where it uses the pilot symbols embedded in the transmitted signal. Once the equalizer is converged, it moves into decision-directed (DD) mode where the detected payload data is used as the reference. The LMS feedback loop delay is assumed to be 32 FFT blocks. The chosen feedback loop delay is based on an estimate of the amount of processing needed for equalization of the incoming signal, IFFT, carrier recovery, symbol detection, as well as LMS coefficients calculation. A small LMS gain was used in order to minimize the impact of LMS on SOP tracking. The carrier recovery algorithm proposed in [33] is used for characterizing the performance of the FFEQ against frequency offset and laser linewidth. For all simulation scenarios, we consider a laser linewidth of 1 MHz. The distributed PMD/PDL fiber model consists of 20 first-order PDL and DGD segments with random Jones rotations at the input and output of each element. The SOP drift is simulated by a linear increase of rotation angles of the Jones matrix. The polarization rotation angular velocity shown in the figures is normalized by the baud rate and is in units of rad/sym. The SOP rotation element is located in the middle of the link and the channel SNR in most cases is set to 5dB-7 dB depending on the amount of PDL on the link, corresponding to an ultra-long-haul transmission over a distance of 8,000 km.  to the moving average method. As N avg grows, we observe a better performance at low-speed SOP rotation region, at the cost of lower performance at higher rotation rates. As N avg increases, the channel estimation SNR improves for a fairly static channel. However, when the channel coherence time becomes considerably smaller than N avg FFT blocks, the channel estimator is not able to fully capture the transient response. For the rest of this study, we consider N avg = 48, as it achieves a good trade-off between tracking capability and performance. Fig. 7 plots the system SNR versus time, where a 10 −4 rad/sym transient occurs from 0.1 ms to 0.4 ms. We considered two cases:

B. FFEQ Parameter Optimization
r No frequency offset for the solid blue curve. r A large CFO ramp is applied from 10% to 90% of the simulation for the dotted green curve. As evident in the figure, there is no significant penalty associated with the CFO ramp case, which demonstrates the robust tracking performance against laser CFO. Fig. 8 compares the methods proposed for inter-FFT-block filtering of estimated channel response, as a function of the FF lag of the system. The channel SNR is 5 dB and an SOP transient of 2.10 −5 rad/sym is simulated. The y−axis is the required SNR (RSNR) penalty of the system where the penalty is calculated with respect to the case where the channel is known and there is no transient. For the moving average case, the number of FFT blocks that channel response is averaged over is set to two times the lag plus one, i.e., the calculated average is applied to the FFT block in the center of the MA window. For the fixed-lag Kalman smoother schemes, the lag parameter of (52) and (56) is swept. As evident in the figure, the moving average performance improves when the window length increases. However, beyond a certain value, we observe a slight performance degradation. We also observe a significant penalty, when the MA window length is less than 20. On the other hand, the Kalman smoother methods provide reasonable performance in the low-latency region. As the delay increases, the performance is improved and then it saturates. The first-order Kalman smoother provides superior performance compared to the zeroth-order. However, for the case of the causal filter, where the delay is 0, we observe the same performance for zeroth-and first-order Kalman smoother. Therefore, it seems the first-order smoother is only beneficial if the system is non-causal. It is worth mentioning that the causal zeroth-order Kalman smoother is equivalent to the LMS filter. Fig. 9 compares the performance of MMSE equalizer with ZF approach along with the three choices for the inter-FFT block averaging. The plot shows around 0.2 dB improved RSNR for the MMSE filter in low SOP rotation rate regime. The improvement gap increases at higher rotation speeds. The first-order smoother outperforms both the moving average and zeroth-order smoother for a significant range of SOP rates. Moving average is the worstperforming approach among the three. Fig. 10 illustrates the performance impact of the pilot symbols location. Let n denote the number of pilot symbols clusters uniformly distributed in a frame, and m denote the cluster size, i.e., m = N p /n. The following n × m distributions satisfy the optimality condition when N p = 16: 16 × 1, 8 × 2, and 4 × 4. As the number of clusters decreases and more pilot symbols are placed sequentially, we observe a slightly improved performance. This is due to the PS filtering algorithm proposed in Section II-D, which can filter out payload symbols more efficiently. Without PS filtering, all three cases would exhibit similar performances. Fig. 11 illustrates the impact of SOP transient location on the performance of the FFEQ. In general, the transient can occur  anywhere on the fiber. We consider three scenarios where the transient is applied: 1) In the middle of fiber, i.e., there are 10 segments of PMD/PDL before the transient and 10 segments after the transient. 2) Close to the receiver, i.e., after all PMD/PDL elements.
3) Close to the transmitter, i.e., before all PMD/PDL elements. As evident in the figure, the worst case is when the transient occurs before the PMD segments, and the best case is when SOP drift takes place after all the PMD elements. When the transient occurs at last, the FFEQ can compensate for the SOP transient, and any residual PMD response can be compensated through the slower equalizer. However, when a PMD element exists following the fast SOP transient, the FFEQ has to compensate for the combined response of PMD and polarization state change.

C. Performance Comparison With Benchmarks
The polarization drift tolerance of the FFEQ is compared against three benchmarks, the DD-LMS, the sliding window least squares (SW-LS) algorithm proposed in [12], and the radius-directed Kalman (RD-Kalman) filtering scheme in [13]. The step size of the DD-LMS is set to 2 −4 which is the maximum stable gain corresponding to the highest tracking capability. The DD-LMS is implemented in the frequency domain with the same FFT and IFFT sizes outlined in Sec. III-A. For the SW-LS algorithm, the sliding window size L and sliding stride ν are set to 128 and 32, respectively. The value of window size L is chosen such that the BER of the system is similar to other algorithms for a static channel. A lower value will improve the tracking at the cost of a higher penalty at slow transient region. For the RD-Kalman system, the process noise covariance matrix Q and measurement noise covariance matrix R are set to 10 −5 I 4×4 and 10 −1 I 2×2 , respectively. Since none of the benchmarks support PMD, to provide a fair comparison among the schemes, parameters W in (32) and N avg are set to 1 and 3, respectively, for the FFEQ. All the algorithms other than the Kalman filter work on a block basis whereas the RD-Kalman scheme operates in a per-symbol manner. Fig. 12 illustrates the polarization drift tolerance of the four tracking schemes for PDL = 2, 4, and 6 dB. Although the RD-Kalman lacks the tracking capability with PDL theoretically, it still shows good performance for the case of PDL = 2 dB. The Kalman filter struggles to converge when the PDL becomes significant. The FFEQ and SW-LS exhibit similar tolerances in all PDL cases whilst the FFEQ is slightly advantageous at higher values of PDL. The DD-LMS algorithm has the lowest tolerance of all four algorithms.
The achievable BER as a function of channel SNR is depicted in Fig. 13 for PDL = 4 dB and several values of SOP transient rate. The FFEQ and SW-LS demonstrate excellent equalization performance at all SNR values. Although the DD-LMS algorithm provides good performance at SOP = 10 −6 in low SNR regime, its effectiveness degrades at higher SNRs. Consistent with the results in Fig. 12, the RD-Kalman algorithm suffers from divergence in the presence of significant PDL.
The effect of distributed PMD on the efficiency of the proposed equalizer is examined in Fig. 14, where the RSNR penalty is plotted against SOP transient rate for PMD = 30 ps, 60 ps, and 90 ps. The DD-LMS algorithm is the only scheme from the benchmarks that supports simultaneous equalization of polarization drift and PMD. Nevertheless, the results for the SW-LS and RD-Kalman algorithms are also presented for comparison. The FFEQ shows good performance up to 60 ps of PMD and provides over 10 times drift tolerance improvement over the DD-LMS. As evident in the figure, both SW-LS and RD-Kalman schemes fail at operating under PMD.

D. Computational Complexity Comparison
Since the SW-LS algorithm has the closest tracking performance to the FFEQ, the computation complexities of these two designs are summarized in Table II. Note that only a high-level   comparison is presented taking into account the required number of real multiplications and real additions per dual polarization symbol. The FFEQ hardware complexity is less than half of the SW-LS.

IV. CONCLUSION
We proposed a frequency domain feed-forward adaptive equalizer that is capable of tracking very fast SOP rotation transients. Simulation results show excellent polarization drift equalization capability while being robust against fiber impairments such as high order DGD, PDL, residual CMD, carrier frequency offset estimation error, and laser phase noise. The effectiveness of the proposed algorithm is compared to several benchmarks, demonstrating an improvement by an order of magnitude against DD-LMS.

APPENDIX APPENDIX PILOT SYMBOLS LOCATION
The elements of Frank-Zadoff-Heimiller polyphase sequence [26]  We re-write the proof given by [24], [26] and then, extend it to prove that a zero sequence of length M can be interleaved into the original sequence without impacting the ideal circular auto-correlation. where α = e j2π/M . The original sequence D is constructed by writing down the matrix column-wise from top to bottom and then from left to right. Let sequence D i represent i-th row of the aforementioned matrix, where 0 ≤ i ≤ M − 1. The author of [24] proved that where is the circular cross-correlation operator. Assume a new row of all zeros is inserted into the matrix at row q. Given that D q D i = 0 1×M , (68) remains valid for the modified sequence. The circular auto-correlation of sequence D can be expressed as circular cross-and auto-correlation of the interleaved subsequences D i [24]. If D D = (r 0 , r 1 , . . ., r M (M +1)−1 ), the sub-sequence R i = (r i , r i+M +1 , r i+2M +2 , . . . , r i+M 2 +M ) of length M can be computed as Therefore, the modified sequence, where a zero is inserted every M elements, does not modify the ideal periodic auto-correlation property. Extending this, one can insert many rows of zeros into the sequence to reach a desired sequence length and mapping of non-zero elements. Note that this approach is not limited to the Frank-Zadoff-Heimiller sequence and may be applied to other perfect polyphase sequences of square length.