Carrier FOE Scheme Based on FSTS in Spatial Diversity PM Coherent FSO Communication

The efficient and cost-effective suppression of atmospheric turbulence effects in free-space optical (FSO) communication poses a significant challenge. To address this challenge, a carrier frequency offset estimation (FOE) scheme based on a frame-synchronous training sequence (FSTS) is proposed, which can be applied to spatial diversity polarization multiplexing (PM) coherent FSO communication systems. The proposed scheme integrates the functions of frame synchronization (FS) and FOE by designing distinct training sequences for the X and Y polarizations. Specifically, the FOE scheme achieves high accuracy, wide range, low complexity, and multi-format versatility while maintaining FS accuracy. Simulation results for PM 4/16-quadrature amplitude modulation (QAM) demonstrate the feasibility and generality of the proposed scheme. Moreover, when combined with spatial diversity reception under strong and weak turbulence conditions, the proposed algorithm exhibits higher receiver sensitivity than the conventional training sequence (TS)-based FOE algorithm. The advantages of this scheme make it highly promising for high-speed coherent FSO communications across multiple modulation formats.


I. INTRODUCTION
FREE-SPACE optical (FSO) communication systems offer various advantages such as high bandwidth, low power consumption, high security, and tariff-free bandwidth allocation [1]. However, their performance is heavily reliant on the atmospheric channel's link conditions. Coherent detection and digital signal processing (DSP) combined with M-ary quadrature amplitude modulation (M-QAM) is a potent solution for high-speed FSO communications compared to intensity modulation direct detection (IM/DD) [2], [3]. Additionally, polarization multiplexing (PM) technology with coherent detection can effectively achieve large-capacity and high-speed information transmission in FSO communications due to the small or even negligible effect of atmospheric turbulence on laser polarization [4]. However, ensuring the power budget of PM coherent optical communication systems remains a significant challenge due to the effects of optical intensity scintillations, phase fluctuations, and fiber coupling efficiency fluctuations caused by atmospheric turbulence [5], [6], [7]. Spatial diversity [8], [9] can effectively suppress the effects of atmospheric turbulence, but it also increases system complexity. The combination of digital coherent receivers (DCRs) with spatial diversity further increases system complexity, especially when the DSP algorithm complexity is high. The carrier frequency offset estimation (FOE) algorithm is a critical component of the DSP module and its estimation performance directly impacts the communication quality of the system. Implementing it in a real-time FSO communication system is challenging if its complexity is relatively high. The carrier FOE algorithms can be classified into two main categories, namely blind estimation and insertion of a training sequence (TS).
Blind estimation [10], [11], [12], [13], [14], [15], [16], [17] is mainly based on the feedforward approach for FOE, which includes the time-domain 4th power estimation algorithm and the frequency-domain maximum spectral line search algorithm. Although the time-domain 4th power estimation algorithm [10] is easy to implement, it is less compatible and generally cannot be applied to high-order QAM systems. The phase difference-based 4th power FOE algorithm was modified in [15] to estimate the frequency offset of 16-QAM using quadrature phase-shift keying partition, referred to as QPSK partition. However, the estimation accuracy is low, and the tracking speed is slow because only consecutive QPSK partition symbols can be utilized. The frequency-domain maximum spectral line search algorithm [11] based on the fourth-power fast Fourier transform (4th-FFT) can be applied to a variety of modulation formats and has high estimation accuracy, but it also has high hardware complexity due to the FFT involved [16]. Additionally, the algorithm has limited spectral resolution when the number of symbols used for estimation is small [17], which results in poor stability. Moreover, the above algorithms use the 4th power to eliminate the modulated phase of the received signal, resulting in a smaller estimation range and further increasing the complexity of the algorithm.
The TS-based FOE algorithm [18], [19] uses the TS to remove the modulated phase of the received signal and avoids the 4th power operation that is commonly employed in blind estimation algorithms. As a result, its complexity is lower and the estimation range is wider. Specifically, the theoretical estimation range reaches $[-1/(2T_s), +1/(2T_s)]$, where $T_s$ is the sampling period. Moreover, this algorithm can be extended to support multiple modulation formats. However, the estimation accuracy of this approach depends on the length of the TS. A long TS increases the algorithm complexity and the system overhead, which negatively impacts spectrum utilization. To improve system performance, some studies [20], [21] propose combining the TS with either the fractional Fourier transform (FrFT) or the FFT, but this increases the algorithm complexity. Another critical issue in the TS-based approach is accurately locating the starting point of the TS to estimate the frequency offset correctly. Although an algorithm for joint frame and frequency synchronization has been proposed in [19], it does not perform well when only a small number of training symbols is available for FOE. Therefore, developing a generalized FOE scheme that offers high performance, low complexity, and support for multiple formats remains a challenging task.
To solve the above problems, a carrier FOE scheme based on a frame-synchronous training sequence (FSTS) is proposed in this paper and applied to a spatial diversity PM coherent FSO communication system. The proposed scheme uses different TSs on the X and Y polarizations, making it suitable for both FS and FOE. First, the TS is used for FS, both to locate the starting point of the TS and to align the diversity branches for merging. Second, since the same local oscillator (LO) is shared, the signal impairments of all diversity branches can be compensated jointly after maximum ratio combining (MRC) without the need for separate DSP for each branch, which further reduces system complexity. Finally, the TS is used again to complete the FOE. The validity and reliability of the FSTS scheme are verified on a PM 4/16-QAM FSO spatial diversity reception simulation platform. The rest of this paper is organized as follows. Section II describes the TS design and the principles of FS and FOE, and analyzes the complexity of the proposed algorithms for comparison. Section III describes the simulation platform setup. Section IV presents comprehensive numerical simulation results and discussion. Finally, the conclusions of this study are given in Section V.

A. TS Design
In PM coherent optical communication systems, the frequency offsets on the two polarizations are theoretically identical. After going through IQ imbalance compensation, FS, and polarization demultiplexing, the training sequences on both polarizations can be used to jointly estimate the frequency offset and compensate the estimated frequency offset value to both polarizations. To accurately estimate the frequency offset using TS, the starting point of the TS needs to be located precisely. Moreover, the FS of different branches plays a crucial role in the merging effect of a spatial diversity reception FSO communication system. By designing different TS structures on X and Y polarizations, they can work together for FS and FOE.
The structure of the training sequences for FS and FOE is shown in Fig. 1 and consists of $T_{X\_1st}$, $T_{X\_2nd}$, $T_{Y\_1st}$, and $T_{Y\_2nd}$, with the front and back parts of each polarization being mutually conjugate symmetric sequences. Each part of the TS is divided into $B_N/2$ blocks, with each block containing $B_L$ symbols generated by a pseudo-random bit sequence (PRBS). Specifically, the $i$-th block in $T_{X\_1st}$ is identical to the $(i+1)$-th block in $T_{Y\_1st}$, and the $i$-th block in $T_{Y\_1st}$ is identical to the $(i+1)$-th block in $T_{X\_1st}$ ($i = 1, 3, 5, \ldots, B_N/2 - 1$, where $B_N/2$ is an even number). Furthermore, the symbols within a block follow a similar arrangement: the $i$-th symbol in $T_A$ is identical to the $(i+1)$-th symbol in $T_B$, and the $i$-th symbol in $T_B$ is identical to the $(i+1)$-th symbol in $T_A$ ($i = 1, 3, 5, \ldots, B_L - 1$, where $B_L$ is an even number).

B. Principles of FS and FOE
The TS shown in Fig. 1 is employed in two DSP modules. The first module is FS, which removes the relative delay between branches prior to merging and determines the starting point of the TS. The proposed TS design is based on the FS algorithm initially introduced by Park et al. [22], which accurately determines the starting symbol of the TS by identifying the peak of the timing metric curve. The timing metric function of the $m$th branch is denoted $M_m(d)$. It is formed from $C_m(d)$, the summation of the product pairs of the training symbols of the $m$th branch, and $P_m(d)$, the received energy within the first half of the TS.
Because the TS is used for both FS and FOE, its designed length will not be very short, so the complexity of FS can be reduced by setting a threshold value while still ensuring accurate localization. As shown in Fig. 3, the Park algorithm [22] calculates $M_m(d)$ by sliding a window over the entire training cycle and then locates the maximum value to determine the correct starting point. The complexity of this sliding-window calculation can be greatly reduced by setting a reasonable threshold and declaring the starting point as soon as $M_m(d)$ exceeds it.
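The peak search and the threshold shortcut described above can be illustrated with a minimal sketch. It assumes the standard Park-style metric $M(d) = |C(d)|^2 / P(d)^2$ for a single branch and a single polarization, and a TS whose second half is the reversed conjugate of its first half. The function names, the QPSK test signal, the frequency offset, and the noise level are illustrative assumptions, not the exact formulation of this paper.

```python
import numpy as np

def timing_metric(r, half_len):
    """Park-style timing metric M(d) = |C(d)|^2 / P(d)^2 (assumed form)."""
    n = len(r)
    metric = np.zeros(n)
    for d in range(half_len, n - half_len + 1):
        back = r[d - half_len:d][::-1]   # r[d-1], r[d-2], ..., r[d-half_len]
        fwd = r[d:d + half_len]          # r[d], r[d+1], ..., r[d+half_len-1]
        c = np.sum(back * fwd)           # product pairs mirrored about d
        p = np.sum(np.abs(fwd) ** 2)     # energy of one half window
        if p > 0:
            metric[d] = np.abs(c) ** 2 / p ** 2
    return metric

def first_crossing(metric, threshold):
    """Index of the first sample whose metric exceeds the threshold, or None."""
    hits = np.flatnonzero(metric > threshold)
    return int(hits[0]) if hits.size else None

def make_demo_signal(half_len=64, pad=100, seed=0):
    """Hypothetical test signal: random QPSK payload around a conjugate-symmetric TS."""
    rng = np.random.default_rng(seed)

    def qpsk(n):
        return np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))

    first_half = qpsk(half_len)
    ts = np.concatenate([first_half, np.conj(first_half[::-1])])
    tx = np.concatenate([qpsk(pad), ts, qpsk(pad)])
    k = np.arange(len(tx))
    rx = tx * np.exp(2j * np.pi * 0.05 * k)  # assumed offset: 0.05 cycles per sample
    rx = rx + 0.05 * (rng.standard_normal(len(tx)) + 1j * rng.standard_normal(len(tx)))
    return rx, pad + half_len                # second value: index of the TS midpoint

rx, ts_center = make_demo_signal()
metric = timing_metric(rx, 64)
print("argmax:", int(np.argmax(metric)), "first crossing:", first_crossing(metric, 0.3))
```

In this sketch the full search evaluates the metric at every offset before taking the maximum, whereas the threshold variant could stop at the first crossing; the loop above computes the whole curve only so that both results can be compared.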
The second module, FOE, processes the signal after MRC and polarization demultiplexing. At this point, the signals on the X and Y polarizations are mainly influenced by the frequency offset and linewidth of the lasers, as well as by atmospheric turbulence. In the corresponding signal model, $\gamma$ denotes the photodiode responsivity. $\phi$, $\eta$, and $I$ represent the phase fluctuation, coupling efficiency fluctuation, and optical intensity scintillation caused by atmospheric turbulence, respectively, all of which vary slowly relative to the GHz-level symbol rate [23]. $P_{LO}$ represents the LO laser power, $S_X(k)$ and $S_Y(k)$ denote the modulated signals at the $k$th sampling moment on the X and Y polarizations, respectively, $T_s$ is the sampling period, $\Delta f$ denotes the deviation of the transmitter laser from the center frequency of the LO laser, $\theta_k$ represents the phase noise caused by the linewidth of the laser, $N_x(k)$ and $N_y(k)$ denote the Gaussian noise introduced by the coherent receiver, and $\theta_x$ and $\theta_y$ are the overall phase shifts introduced by the tap coefficient setting in the polarization demultiplexing algorithm [24], [25]. The initial values of the equalizer tap coefficients are usually set as follows: the middle coefficient is 1 and the other coefficients are 0. In that case, the overall phase shift introduced by the polarization demultiplexing algorithm can be ignored. When only the phase is considered, the phases of the $k$th sampled symbols on the X and Y polarizations contain $\theta_{sx,k}$ and $\theta_{sy,k}$, the modulation phases of the $k$th sample on the X and Y polarizations, and $\theta_{nx,k}$ and $\theta_{ny,k}$, the Gaussian noise phases introduced by the receiver or amplifier [26]. The conventional TS-based FOE algorithm may not completely eliminate Gaussian phase noise when the training symbols are insufficient, thus reducing the accuracy of the estimation.
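For reference, one phase-only model that is consistent with the symbol definitions above can be written as follows. It is a reconstruction from those definitions (an assumption, not a verbatim equation from this paper); the left-hand symbols $\theta_{x,k}$ and $\theta_{y,k}$ are introduced here for illustration, and the equalizer phase shifts $\theta_x$ and $\theta_y$ are neglected as stated.

```latex
\begin{aligned}
\theta_{x,k} &= \theta_{sx,k} + 2\pi \Delta f\, k T_s + \theta_k + \phi + \theta_{nx,k},\\
\theta_{y,k} &= \theta_{sy,k} + 2\pi \Delta f\, k T_s + \theta_k + \phi + \theta_{ny,k}.
\end{aligned}
```

Under this model, the phase difference between two samples that carry the same training symbol $B_L$ positions apart cancels the modulation phase and the slowly varying terms, leaving $2\pi B_L \Delta f T_s$ plus a noise phase difference.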
To enhance the accuracy of FOE, a two-stage FOE approach is proposed. The flow of the two-stage FOE is illustrated by the black dashed line in Fig. 2. The first stage performs the coarse FOE, which yields the estimate $\Delta f_1$. As can be seen from Fig. 1, the $2i$-th symbol on the X (Y) polarization is the same as the $(2i-1)$-th symbol on the Y (X) polarization ($i = 1, 2, 3, \ldots, B_L \cdot B_N/2$), so the phase of the modulated signal is eliminated by conjugate multiplication. In addition, the phase noise introduced by the linewidth of the laser and by atmospheric turbulence varies slowly relative to the symbol rate, so its effect on adjacent symbols can be regarded as the same and it is also removed in this step. The effect of Gaussian noise is then reduced by summation, and the angle is taken to obtain the estimated frequency offset. With one sample per symbol, the estimation range can theoretically reach $[-R_s/2, +R_s/2]$, where $R_s$ is the symbol rate and $T_s = 1/R_s$. Here $B_L$ denotes the TS block length and $B_N$ the number of TS blocks.
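The coarse stage can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the exact estimator of this paper: the Y-polarization TS is taken to be the X-polarization TS with every adjacent symbol pair swapped, the channel applies only a common frequency offset and additive Gaussian noise, and the helper names (`pair_swap`, `coarse_foe`, `simulate_ts`) are hypothetical.

```python
import numpy as np

def pair_swap(seq):
    """Swap every adjacent symbol pair: (s0, s1, s2, s3) -> (s1, s0, s3, s2)."""
    out = np.empty_like(seq)
    out[0::2] = seq[1::2]
    out[1::2] = seq[0::2]
    return out

def coarse_foe(rx, ry, symbol_period):
    """Lag-1 cross-polarization estimator (assumed form of the coarse stage)."""
    # rx[2i+1] and ry[2i] carry the same training symbol, as do ry[2i+1] and rx[2i],
    # so each conjugate product keeps only the one-symbol phase rotation.
    acc = np.sum(rx[1::2] * np.conj(ry[0::2])) + np.sum(ry[1::2] * np.conj(rx[0::2]))
    return np.angle(acc) / (2 * np.pi * symbol_period)

def simulate_ts(n_sym, offset_hz, symbol_period, noise_std=0.02, seed=1):
    """Hypothetical received TS pair: QPSK symbols, common offset, Gaussian noise."""
    rng = np.random.default_rng(seed)
    tx_x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n_sym)))
    tx_y = pair_swap(tx_x)
    k = np.arange(n_sym)
    rot = np.exp(2j * np.pi * offset_hz * k * symbol_period)
    noise_x = noise_std * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))
    noise_y = noise_std * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))
    return tx_x * rot + noise_x, tx_y * rot + noise_y

T_S = 1e-10  # 10 GBaud, one sample per symbol
rx, ry = simulate_ts(320, 0.8e9, T_S)
print("coarse estimate [Hz]:", coarse_foe(rx, ry, T_S))
```

Because the products are taken between adjacent symbols, the phase being measured is $2\pi \Delta f T_s$, which is what gives this stage its $\pm R_s/2$ range.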
The second stage refines the residual frequency offset that was not fully estimated in the first stage. Specifically, the frequency offset estimated in the first stage is compensated on the TS, and the resulting TS is used in the second stage to estimate the residual frequency offset $\Delta f_2 = \Delta f - \Delta f_1$. Owing to the design of the TS structure, the first-stage FOE of the proposed algorithm uses fewer symbols for estimation, which gives it lower complexity but also lower estimation accuracy than the conventional TS-based algorithm. Conjugate multiplication of identical symbols spaced $B_L$ apart on the two polarizations, which does not increase the training overhead, can also improve the noise tolerance [17], [26]. In the second stage of FOE, the estimated value $2\pi B_L \Delta f_2 T_s + \Delta\theta_n$ is divided by $B_L$, which reduces the effect of the residual Gaussian phase noise $\Delta\theta_n$ by a factor of $B_L$ [26]. This stage uses the same number of symbols for estimation as the first stage, but it has a higher noise tolerance and therefore improves the accuracy of estimation. However, its estimation range is reduced by a factor of $B_L$, so it is only applicable to the estimation of fine frequency offsets, and the size of $B_L$ needs to be chosen reasonably. Moreover, since the frequency offset drifts slowly in practice, the coarse FOE from the first stage can be saved in a buffer for continued use and updated periodically, which further reduces computational complexity. The estimated frequency offset values of the two stages are summed to obtain the final estimate, $\Delta f_{est} = \Delta f_1 + \Delta f_2$, which is then compensated over the whole training cycle sequence.
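A minimal sketch of the second stage follows, again under stated assumptions rather than as the exact estimator of this paper. The TS is built from block pairs so that each odd-numbered block on X equals the next block on Y and vice versa; the symbol swapping inside blocks is omitted for brevity, the coarse estimate is passed in as a given number, and the helper names are hypothetical.

```python
import numpy as np

def fine_foe(rx, ry, coarse_hz, block_len, symbol_period):
    """Remove the coarse offset, then estimate the residual at lag block_len."""
    k = np.arange(len(rx))
    derot = np.exp(-2j * np.pi * coarse_hz * k * symbol_period)
    bx = (rx * derot).reshape(-1, block_len)   # one row per TS block
    by = (ry * derot).reshape(-1, block_len)
    # X block b+1 repeats Y block b, and Y block b+1 repeats X block b (b = 0, 2, 4, ...).
    acc = np.sum(bx[1::2] * np.conj(by[0::2])) + np.sum(by[1::2] * np.conj(bx[0::2]))
    residual_hz = np.angle(acc) / (2 * np.pi * block_len * symbol_period)
    return coarse_hz + residual_hz             # sum of the two stage estimates

def simulate_block_ts(num_blocks, block_len, offset_hz, symbol_period,
                      noise_std=0.02, seed=2):
    """Hypothetical received TS pair with block-swapped X and Y sequences."""
    rng = np.random.default_rng(seed)
    shape = (num_blocks // 2, block_len)
    p = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, shape)))
    q = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, shape)))
    tx_x = np.stack([p, q], axis=1).reshape(-1)  # P1 Q1 P2 Q2 ...
    tx_y = np.stack([q, p], axis=1).reshape(-1)  # Q1 P1 Q2 P2 ...
    n = tx_x.size
    rot = np.exp(2j * np.pi * offset_hz * np.arange(n) * symbol_period)
    noise_x = noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    noise_y = noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return tx_x * rot + noise_x, tx_y * rot + noise_y

T_S = 1e-10                       # 10 GBaud, one sample per symbol
true_hz = 0.8e9
coarse_hz = true_hz + 15e6        # assumed coarse estimate with a 15 MHz error
rx, ry = simulate_block_ts(16, 20, true_hz, T_S)
print("refined estimate [Hz]:", fine_foe(rx, ry, coarse_hz, 20, T_S))
```

With $B_L = 20$ at 10 GBaud, the unambiguous range of this stage is $\pm R_s/(2B_L) = \pm 250$ MHz, so the sketch only works when the coarse estimate is already within that distance of the true offset.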

C. Complexity Analysis and Comparison
The conventional 4th-FFT FOE algorithm provides high estimation accuracy but has high hardware complexity. Specifically, for each FOE on each polarization, the algorithm requires $2N\log_2 N + 10N + 2$ real multiplications and $3N\log_2 N + 5N$ real additions when the symbol length used for FOE is $N$ [27], [28]. However, since 4th-FFT and other blind FOE algorithms do not require frame synchronization and training overhead, we only compare the hardware complexity of the conventional TS algorithm with that of the proposed algorithm, as shown in Table I. The length of the TS on each polarization is represented as $N$, and the complexity is calculated for both polarizations for all algorithms [29]. As shown in Table I, in the worst-case scenario, i.e., when two stages of FOE are required, the complexity of the proposed algorithm is similar to that of the conventional TS algorithm. However, the proposed algorithm achieves its best-case complexity when the coarse FOE is obtained from the buffer in the first stage, with a complexity reduction of about 25% compared to the worst case (considering only real multiplications).
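To give a feel for the 4th-FFT operation counts quoted above, they can be evaluated for a few block lengths. The helper below simply evaluates the two expressions per FOE and per polarization; its name is hypothetical, and it makes no attempt to reproduce the entries of Table I.

```python
import math

def fourth_power_fft_ops(n):
    """Real multiplications and additions of 4th-FFT FOE for n symbols."""
    log_term = n * math.log2(n)
    mults = 2 * log_term + 10 * n + 2
    adds = 3 * log_term + 5 * n
    return mults, adds

for n in (320, 960, 1024):
    mults, adds = fourth_power_fft_ops(n)
    print(f"N={n}: {mults:.0f} real multiplications, {adds:.0f} real additions")
```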
The proposed algorithm is compared with the TS-based joint processing algorithm in [19] in terms of the overall complexity of FS and FOE in Table II. In this case, the complexity of the FS algorithm considers only the computation of $C_m(d)$ and assumes that the window slides $M$ times (without a threshold, $M$ is generally the length of the entire training cycle in symbols). As shown in Table II, the proposed algorithm has lower overall complexity than the algorithm in [19]. Furthermore, unlike the conventional TS-based algorithm, the proposed algorithm does not require a backup copy of the TS at the receiver side.

III. SIMULATION SETUP
Fig. 4 depicts the simulation platform for spatial diversity reception of the 10 GBaud PM 4/16-QAM FSO system. Initially, the output of the external cavity laser (ECL), which has a linewidth of 50 kHz, is divided into two orthogonally polarized optical signals via a polarization beam splitter (PBS). Subsequently, 4/16-QAM data symbols and training symbols are generated from PRBS, where the structure of the training symbols on the two polarizations is illustrated in Fig. 1. The four discrete data signals generated correspond to the real and imaginary parts of the X and Y polarizations. These four signals are converted to electrical signals by a digital-to-analog converter (DAC), which drives a dual-polarization IQ modulator to modulate the two orthogonally polarized optical signals. The signals are then combined into a single beam using a polarization beam combiner (PBC) and emitted into a computer-simulated free-space turbulence channel.
The Fourier transform-based phase screen model was utilized to simulate the free-space turbulent channel [30]. The simulated propagation distance through atmospheric turbulence is $z = 10$ km. It is assumed that the outer scale of atmospheric turbulence tends to infinity while the inner scale tends to zero. Inset (a) of Fig. 4 shows the light field distribution before passing through the phase screen, whereas insets (b) and (c) depict the light field distribution after passing through the phase screen with different turbulence intensities, expressed by the refractive index structure parameter [31] as $C_n^2 = 1 \times 10^{-16}\,\mathrm{m}^{-2/3}$ and $C_n^2 = 1 \times 10^{-14}\,\mathrm{m}^{-2/3}$, respectively. The energy of the light field distribution fluctuations increases with the turbulence intensity.
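A Fourier transform-based phase screen of the kind mentioned above can be sketched as follows. This is a generic single-screen Kolmogorov generator (infinite outer scale, zero inner scale), not the simulation code of this paper. The 1550 nm wavelength, the grid size and spacing, the plane-wave Fried parameter, and the use of one screen for the whole path are all assumptions made for illustration.

```python
import numpy as np

def fried_parameter(cn2, path_len, wavelength):
    """Plane-wave Fried parameter r0 [m] for constant Cn^2 along the path."""
    k = 2 * np.pi / wavelength
    return (0.423 * k ** 2 * cn2 * path_len) ** (-3.0 / 5.0)

def kolmogorov_phase_screen(n, dx, r0, rng):
    """Random n-by-n phase screen [rad] with a Kolmogorov spectrum, built by FFT."""
    df = 1.0 / (n * dx)                    # spatial-frequency grid spacing [1/m]
    fx = np.fft.fftfreq(n, d=dx)           # spatial frequencies [cycles/m]
    fxx, fyy = np.meshgrid(fx, fx)
    f = np.hypot(fxx, fyy)
    f[0, 0] = np.inf                       # forces the undefined DC (piston) term to zero
    psd = 0.023 * r0 ** (-5.0 / 3.0) * f ** (-11.0 / 3.0)
    coeff = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    coeff = coeff * np.sqrt(psd) * df      # random Fourier coefficients
    return np.real(np.fft.ifft2(coeff)) * n * n  # undo the 1/n^2 scaling of ifft2

WAVELENGTH = 1550e-9   # assumed; the wavelength is not stated in this section
PATH_LEN = 10e3        # z = 10 km
for cn2 in (1e-16, 1e-14):
    r0 = fried_parameter(cn2, PATH_LEN, WAVELENGTH)
    screen = kolmogorov_phase_screen(128, 0.005, r0, np.random.default_rng(3))
    print(f"Cn2={cn2:.0e}: r0={r0:.4f} m, phase std={screen.std():.2f} rad")
```

A plain FFT screen such as this one under-represents the lowest spatial frequencies, so practical simulations often add subharmonic corrections and split the path into several screens.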
At the receiver end, multiple receiving telescopes receive independent, fading optical signals and couple them into single-mode fibers, with the aperture of each receiving telescope set to 0.2 m. The average coupling efficiency under weak and strong turbulence conditions is 67.3012% ($C_n^2 = 1 \times 10^{-16}\,\mathrm{m}^{-2/3}$) and 4.8395% ($C_n^2 = 1 \times 10^{-14}\,\mathrm{m}^{-2/3}$), respectively. In each coherent receiver, the received signal light is mixed with the LO laser, and the photoelectric conversion process is completed using balanced photodiodes (BPDs). All receiver branches share one LO laser, whose linewidth is set to 50 kHz with an output power of 15 dBm. The photodiode responsivity is 0.8 A/W. Shot noise and thermal noise are also considered here. Finally, the analog-to-digital converter (ADC) digitizes the signals, which are then processed in an offline DSP module. The module includes I/Q imbalance recovery, FS, diversity branch phase correction [23], MRC, polarization demultiplexing, FOE, phase noise estimation, decision-directed least-mean-square (DD-LMS) equalization, QAM de-mapping, and BER counting.
When the TS length is 48 symbols, the peak of the timing metric curve is not sharp enough. The peak decreases as the received power decreases and cannot even be found when the power is low. Increasing the length of the TS to 320 symbols solves this problem. Although the complexity increases with the TS length, it can be greatly reduced while ensuring accuracy by setting the threshold value. Fig. 6 shows the FS accuracy of four-branch MRC reception under strong turbulence ($C_n^2 = 1 \times 10^{-14}\,\mathrm{m}^{-2/3}$) at different received optical powers. It also compares the FS performance with different threshold settings. Each FS accuracy point is simulated 6400 times and then averaged. The higher the received optical power, the higher the FS accuracy.
At different received optical powers, the FS accuracy increases with the TS length but gradually levels off. For FS sequence lengths greater than 128 symbols, setting a reasonable threshold value can ensure accuracy while reducing complexity. Generally speaking, the lower the threshold value, the lower the computational complexity may be, but a lower threshold often requires a longer TS. Combining the TS length and FS accuracy, a threshold value of 0.2 or 0.3 is appropriate. Fig. 7 shows a plot of the TS length versus receiver sensitivity for a BER of 2e-2 in the strong turbulence condition ($C_n^2 = 1 \times 10^{-14}\,\mathrm{m}^{-2/3}$). It also compares the performance of FS with a set threshold of 0.2 or 0.3. As the TS length increases, the receiver sensitivity is improved. For 4-QAM and 16-QAM, the receiver sensitivity is improved by 3.47 dB and 2.49 dB, respectively, when the TS length increases from 48 symbols to 320 symbols, and by only 0.48 dB and 0.23 dB when it increases from 320 symbols to 800 symbols. Taking into account the TS overhead, threshold setting, and FS performance, the shortest TS length can be set to 320 symbols.

B. Performance Analysis and Discussion of FOE
The second role of the TS is to accomplish FOE. To evaluate the FOE performance of the proposed scheme, a simulation analysis is first performed using a 10 GBaud PM 4/16-QAM optical back-to-back (B2B) platform. Fig. 8 shows the relationship between BER and normalized mean-square-error (MSE) for different received optical powers. The normalized MSE is defined as $E[|\Delta f_{est} \cdot T_s - \Delta f \cdot T_s|^2]$, which represents the estimation accuracy and also corresponds to the residual frequency offset after frequency offset compensation. A normalized MSE of 6.25e-8 (16-QAM) or 2.5e-7 (4-QAM) can be considered as the threshold value at which BER starts to increase. When the normalized MSE is less than this threshold value, the residual frequency offset can be fully compensated by the subsequent phase noise estimation algorithm. A normalized MSE of 2.25e-6 (16-QAM) or 6.25e-6 (4-QAM) can be considered as the threshold value at which BER deteriorates severely. Fig. 9 shows the relationship between the normalized MSE and the (training) symbol length for different received optical powers for the 4th power [10] (QPSK partition [15]), 4th-FFT [11], TS [18], and the proposed algorithm. The block length $B_L$ of the TS in the proposed algorithm is 20 symbols. The frequency offset used for the simulation is a random value in (−1.1 GHz, 1.1 GHz) to ensure that it is within the estimation range of all algorithms. The estimation accuracy of the four algorithms improves with an increase in the length of (training) symbols, with the 4th power algorithm having the worst estimation accuracy. When the number of symbols used for estimation is small, the lower spectral resolution of the 4th-FFT algorithm leads to poor estimation accuracy. Increasing the number of symbols gives good performance, but the complexity also becomes large.
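The normalized MSE thresholds quoted above are easier to interpret as residual frequency offsets. Since the normalized error is the frequency error multiplied by $T_s = 1/R_s$, the RMS residual offset is $\sqrt{\mathrm{MSE}} \cdot R_s$. The short helper below (a hypothetical name, assuming the 10 GBaud symbol rate of the simulation) performs this conversion.

```python
import math

def rms_residual_offset_hz(normalized_mse, symbol_rate_hz):
    """RMS residual frequency offset implied by a normalized MSE."""
    return math.sqrt(normalized_mse) * symbol_rate_hz

SYMBOL_RATE = 10e9  # 10 GBaud
for label, mse in [("16-QAM, BER starts to rise", 6.25e-8),
                   ("4-QAM, BER starts to rise", 2.5e-7),
                   ("16-QAM, severe degradation", 2.25e-6),
                   ("4-QAM, severe degradation", 6.25e-6)]:
    offset_mhz = rms_residual_offset_hz(mse, SYMBOL_RATE) / 1e6
    print(f"{label}: about {offset_mhz:.1f} MHz RMS")
```

At 10 GBaud, the four thresholds correspond to RMS residual offsets of about 2.5 MHz, 5 MHz, 15 MHz, and 25 MHz, respectively.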
When the length of training symbols is greater than or equal to 160 symbols, the proposed algorithm has better estimation accuracy at different received powers and outperforms the other three algorithms in 16-QAM. In 4-QAM, its accuracy is slightly lower than that of the 4th-FFT algorithm when the number of symbols used for estimation is larger, but according to Fig. 8, the residual frequency offset in this case does not affect the BER performance. Compared with the conventional TS algorithm ($10^{-6}$ to $10^{-7}$), the proposed algorithm ($10^{-7}$ to $10^{-9}$) can improve the estimation accuracy by more than one order of magnitude. Considering the performance-complexity tradeoff, and combining the previous simulations of FS and the effect of residual frequency offset on BER, a TS length of 320 symbols can obtain excellent FOE performance while ensuring FS accuracy. Fig. 10 illustrates the estimation accuracy for different TS structures with fixed TS lengths. The horizontal axis represents the TS block length $B_L$. As $B_L$ increases, the estimation accuracy of the conventional TS algorithm remains fixed, while the estimation accuracy of the proposed algorithm initially increases and may later decrease after reaching a certain value, since the size of $B_L$ impacts both the estimation accuracy and the estimation range in the second stage of the FOE algorithm. Moreover, the design of the TS structure is also affected by the modulation format, the received power, and the total length of the TS. Specifically, when the total length of the TS is 320, the TS structure is designed as $B_N = 16$ and $B_L = 20$ for 4-QAM, and $B_N = 8$ and $B_L = 40$ for 16-QAM, as depicted in Fig. 10(a) and (b). Similarly, when the total length of the TS is 960, the TS structure is designed as $B_N = 24$ and $B_L = 40$ for 4-QAM, and $B_N = 16$ and $B_L = 60$ for 16-QAM, as shown in Fig. 10(c) and (d).
Fig. 11 illustrates the normalized MSE versus frequency offset and received optical power for different TS lengths in 4-QAM. Specifically, Fig. 11(a) and (c) present the normalized MSE versus frequency offset for the four algorithms with different TS lengths at a received optical power of −43 dBm. The FOE algorithm based on the TS expands the estimation range by approximately 4 times compared with the blind FOE algorithm by avoiding the 4th power operation. Moreover, the proposed algorithm exhibits higher FOE accuracy and stronger robustness compared to the other three algorithms. When 320 symbols are used for estimation, the 4th-FFT algorithm displays severe fluctuations due to the limited spectral resolution. Compared with the conventional TS algorithm ($10^{-7}$), the proposed algorithm ($10^{-8}$ to $10^{-9}$) enhances the estimation accuracy by more than one order of magnitude. Fig. 11(b) and (d) exhibit the normalized MSE versus average received optical power for the four different algorithms. The proposed algorithm may exhibit lower estimation accuracy than the other algorithms when the received optical power is low. However, the residual frequency offset of the other algorithms is also large at this point and cannot be compensated by the subsequent phase noise estimation algorithm, leading to a high BER. As the received optical power increases, the estimation accuracy of the proposed algorithm improves significantly, stabilizes, and outperforms the other three algorithms. Fig. 12 depicts the normalized MSE versus both frequency offset and received optical power for different TS lengths in 16-QAM. Fig. 12(a) and (c) indicate that at a received optical power of −37 dBm, the proposed algorithm ($10^{-8}$ to $10^{-9}$) enhances the estimation accuracy by more than one order of magnitude compared to the conventional TS algorithm ($10^{-7}$), which aligns with the simulation results of 4-QAM. Additionally, Figs. 11 and 12 demonstrate that the estimation performance of the four algorithms improves as the length of the (training) symbols increases.
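The factor of about 4 in estimation range follows from the phase ambiguity of each estimator and can be checked with a few lines. The sketch below uses the usual textbook ranges, $\pm R_s/(2M)$ for an $M$th-power estimator and $\pm R_s/2$ for a lag-1 TS estimator, together with $\pm R_s/(2B_L)$ for the fine stage; the function names are hypothetical.

```python
def mth_power_range_hz(symbol_rate_hz, power=4):
    """One-sided range of an Mth-power blind estimator: Rs / (2M)."""
    return symbol_rate_hz / (2 * power)

def ts_coarse_range_hz(symbol_rate_hz):
    """One-sided range of a lag-1 TS-based estimator: Rs / 2."""
    return symbol_rate_hz / 2

def ts_fine_range_hz(symbol_rate_hz, block_len):
    """One-sided range of the lag-B_L fine stage: Rs / (2 * B_L)."""
    return symbol_rate_hz / (2 * block_len)

RS = 10e9  # 10 GBaud
print("4th power:", mth_power_range_hz(RS) / 1e9, "GHz")
print("TS coarse:", ts_coarse_range_hz(RS) / 1e9, "GHz")
print("TS fine (B_L = 20):", ts_fine_range_hz(RS, 20) / 1e6, "MHz")
```

At 10 GBaud the 4th-power range is ±1.25 GHz, which is why offsets drawn from (−1.1 GHz, 1.1 GHz) stay within the range of all four algorithms.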
In the above simulation, 800 simulations are averaged for each frequency offset test point to obtain a stable and reliable MSE. Moreover, the frequency offsets are randomly selected from (−1.1 GHz, 1.1 GHz) for the same received optical power and averaged after 800 simulations. Fig. 13 shows the curves of BER versus received optical power for the four algorithms in the PM 4/16-QAM optical B2B system. The BER performance of the proposed algorithm is better than that of the other algorithms. The 4th power algorithm exhibits poor performance with a symbol length of 320 symbols, and the performance improves as the symbol length increases to 960 symbols, but is still inferior to the other three algorithms. The performance of the 4th-FFT algorithm is worse than that of the proposed algorithm when the symbol length is 320 symbols, and when the number of symbols used for estimation increases to 960, the performance is similar to that of the proposed algorithm, but the complexity is also greater. Fig. 13(a) and (c) show that the receiver sensitivity is improved by 0.98 dB and 1.72 dB for PM 4-QAM and PM 16-QAM, respectively, compared to the conventional TS algorithm at the forward error correction (FEC) threshold of $3.8 \times 10^{-3}$ for a (training) symbol length of 320. Fig. 13(b) and (d) show that the receiver sensitivity of the proposed algorithm is improved by 0.25 dB and 0.39 dB for PM 4-QAM and PM 16-QAM, respectively, at the FEC threshold of $3.8 \times 10^{-3}$ for a (training) symbol length of 960, compared to the conventional TS algorithm.
To further evaluate the performance of the proposed scheme, we conducted simulation analysis using a 10 GBaud PM 4/16-QAM FSO spatial diversity reception platform. The frequency offset was set as a random value in the range of (−1.1 GHz, 1.1 GHz), and the FS threshold was set to 0.2. Figs. 14 and 15 depict the BER versus the average received optical power for single-branch reception under weak and strong turbulence, respectively. We only compared the proposed algorithm with the 4th-FFT and the conventional TS algorithm. The 4th-FFT algorithm exhibited poor performance when the symbol length was 320 symbols, and the BER performance deteriorated significantly, particularly in PM 16-QAM. However, when the number of symbols used for estimation increased to 960, the performance was similar to that of the proposed algorithm. At the FEC threshold of $3.8 \times 10^{-3}$, compared with the conventional TS algorithm, the receiver sensitivity of the proposed algorithm for PM 4-QAM improves by 0.46 dB and 1.12 dB under weak turbulence and by 1.03 dB and 1.17 dB under strong turbulence, for TS lengths of 960 and 320 symbols, respectively. Similarly, the receiver sensitivity of the proposed algorithm for PM 16-QAM is improved by 0.73 dB and 2.23 dB under weak turbulence and by 1.24 dB and 3.11 dB under strong turbulence. Additionally, under turbulent conditions, the proposed algorithm using only 320 training symbols achieves BER performance similar or superior to that of the conventional TS algorithm using 960 training symbols. Furthermore, relative to the conventional TS algorithm with 960 training symbols, the best-case complexity of the proposed algorithm with 320 training symbols is reduced by approximately 75%, considering only real multiplications.
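The figure of approximately 75% can be cross-checked against the earlier complexity statement. The check below assumes that the number of real multiplications grows in proportion to the TS length $N$, and that the worst case of the proposed algorithm costs about the same as the conventional TS algorithm for equal $N$, with the best case about 25% lower. The proportionality is an assumption made here, since Table I is not reproduced in this section.

```latex
\frac{\text{proposed, best case, } N = 320}{\text{conventional TS, } N = 960}
\approx \frac{320}{960} \times (1 - 0.25) = 0.25
\quad\Longrightarrow\quad
\text{reduction} \approx 1 - 0.25 = 75\%.
```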
Figs. 16 and 17 depict the BER versus average received optical power curves for four-branch MRC reception under weak and strong turbulence. At the FEC threshold of $3.8 \times 10^{-3}$, the proposed algorithm improves the receiver sensitivity of PM 4/16-QAM under both weak and strong turbulence for TS lengths of 960 and 320 symbols. Moreover, the BER performance of the proposed algorithm using 320 training symbols is similar to or better than that of the conventional TS algorithm using 960 training symbols. Meanwhile, the performance of the 4th-FFT algorithm using 320 estimated symbols deteriorates significantly in PM 16-QAM, which leads to the same conclusion as in the case of single-branch reception. Furthermore, as shown in Fig. 8, the subsequent phase noise estimation algorithm can fully compensate for the residual frequency offset when the estimation accuracy reaches a certain threshold. Thus, the impact on the BER performance becomes less significant as the length of the TS increases from 320 to 960 symbols, and the BER performance of the proposed algorithm may be similar when using 320 and 960 training symbols. Fig. 18 shows the BER versus average received optical power curves for MRC with different numbers of diversity branches under weak and strong turbulence. In this case, the proposed algorithm is only compared with the conventional TS algorithm with a TS length of 320 symbols. At the FEC threshold of $3.8 \times 10^{-3}$, with 1, 2, 4, and 6 diversity branches, the proposed algorithm improves the receiver sensitivity of PM 4-QAM by 1.12 dB, 1.43 dB, 1.14 dB, and 0.8 dB under weak turbulence and by 1.17 dB, 2.09 dB, 0.7 dB, and 1.72 dB under strong turbulence, respectively. For PM 16-QAM, the proposed algorithm improves the receiver sensitivity by 2.23 dB, 2.54 dB, 2.45 dB, and 1.93 dB under weak turbulence and by 3.11 dB, 3.41 dB, 2.14 dB, and 2.39 dB under strong turbulence, respectively. The sensitivity gain is more significant for PM 16-QAM than for PM 4-QAM.
Simulation results demonstrate that the proposed FSTS-based FOE algorithm can be applied to spatial diversity coherent FSO communication with good performance in both strong and weak turbulence conditions and low hardware complexity.

V. CONCLUSION
This paper proposes a carrier FOE scheme based on FSTS for spatial diversity PM coherent FSO communication systems. Simulation results under turbulence show that the proposed algorithm using 320 training symbols has similar or better BER performance than the conventional TS algorithm using 960 training symbols, while the optimal complexity is reduced by about 75%. When the TS length of both algorithms is 320 symbols, at the FEC threshold of $3.8 \times 10^{-3}$, for PM 4-QAM and PM 16-QAM, the receiver sensitivity of the proposed scheme's two-branch MRC is improved by 1.43 dB and 2.54 dB in weak turbulence and by 2.09 dB and 3.41 dB in strong turbulence, respectively. The simulation results demonstrate that the proposed FOE scheme can obtain high estimation accuracy and a large estimation range with guaranteed FS accuracy, while having the advantages of low complexity and multi-format versatility. The results indicate the potential application of the proposed scheme for high-speed coherent FSO communication with multiple modulation formats.