Toward Practical Digital Phase Alignment for Coherent Beam Combining in Multi-Aperture Free Space Coherent Optical Receivers

In free space optical communication systems, multi-aperture coherent optical receivers based on digital coherent beam combining (D-CBC) technique can provide exceptionally high sensitivity and are more robust to the atmosphere turbulence compared with the single aperture receivers with the same collection area. D-CBC relies on the digital phase alignment algorithm (PAA) to align the different versions of signals in phase. However, due to the limited working frequency and space of the digital signal processing (DSP) circuits, the main obstacle to realizing real-time phase alignment of multiple high-speed optical signals is the computation complexity. Therefore, we need to minimize the computation complexity while guaranteeing a satisfactory performance. In this paper, we investigate the relationship between the computation complexity and the combining loss (CL) for both maximum ratio combining (MRC) and equal gain combining (EGC) based D-CBC. Universal analytical expressions are deduced that allow easy minimization of the computation complexity for both MRC and EGC based receivers according to the prescribed CL and input optical signal-to-noise ratios (OSNRs). The analytical expressions are validated by extensive numerical simulations. It is demonstrated that the computation complexity is mainly determined by the quality of the signal with a larger OSNR in MRC, while it is determined by the overall quality of the signals in EGC. When EGC is replaced with MRC, the computation complexity can be reduced by more than 55% at the same CL when the OSNR difference between the signals to be combined is above 10dB. The maximum computation complexity increases exponentially with decreasing input OSNR lower limit and the smaller the CL, the steeper the slope. Furthermore, when the prescribed CL is relaxed from 0.1 to 0.5dB, the maximum computation complexity can be reduced by about 80%. The results provide useful guidelines toward practical phase alignment on a real-time platform.


I. INTRODUCTION
Compared with microwave communications, free space optical communications (FSOC) can exploit the unregulated spectrum in the near-infrared band, and can provide terminals with greatly reduced size, weight and power (SWaP) profile, thanks to the higher beam-directionality [1]- [6]. For the long reach FSOC systems, such as the satellite-to-ground link, the ground terminals often require a large aperture optical antenna to collect sufficient optical power. However, such The associate editor coordinating the review of this manuscript and approving it for publication was Jiafeng Xie.
large optical antennas are difficult to build and very expensive [5]. Furthermore, due to wave-front distortion induced by the atmospheric turbulence, the light collected by such large antennas is difficult to be efficiently coupled to a single-mode fiber or detector, unless complex adaptive optics is employed [7], [8].
To solve these problems, multi-aperture coherent optical receivers based on coherent beam combining (CBC) techniques have been proposed recently [9]- [11]. They can expand the collection area by coherent combination of multiple optical signals received by an array of small apertures. Compared with the single large aperture with the same VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ collection area, the cost of the small aperture array is much lower, while the performance achieved with CBC can be nearly the same [9]. Furthermore, when the aperture diameter is reduced to below 5cm, adaptive optics correction beyond tip and tilt is rarely needed for a diffraction-limited singlemode receiver [9]. In addition, spatial diversity can be applied to mitigate the atmospheric turbulence effect if the small apertures in the array are spaced apart by an interval much larger than the atmosphere phase correlation length [9]- [11]. According to the phase alignment method, the CBC techniques proposed by now can be classified into two kinds [9]. The first kind of techniques relies on analog methods [12]- [16]. They use complex optical phase-locked loops (OPLL) [12]- [15] or feed-back controlled accurate fiber variable phase delays [16] to compensate for the optical path mismatch with an accuracy as high as a small fraction of a wavelength. As the optical complexity is high and analog component insertion loss is significant, the scalability to a large number of signals is difficult. The second kind of methods utilizes powerful digital signal processors (DSP) in the digital coherent receivers (DCR) [9]- [11]. In the DCR, the fields of optical signals have been recovered in the digital domain, so they can be readily added coherently after the time and phase alignment algorithms (PAA) are performed [9], [17]- [19]. Thanks to the powerful algorithms, the digital CBC (D-CBC) technique is more robust to noise and optical path mismatch and can deal with a larger number of optical signals. Recently, D-CBC of optical signals with power as low as photons per bit (PPB) = −15dB and optical path mismatch as large as 24cm has been realized successfully in experiments [9], [11]. However, by now, D-CBC is only carried out off-line. The main obstacle for the real-time practical D-CBC is the algorithm complexity because even the cutting-edge DSP circuits face a great challenge in processing multiple branches of high-speed optical signals.
As low computation complexity is important for practical real-time processing, it is desirous to find a method to minimize the PAA computation complexity while guaranteeing an expected combining gain. We note that, due to space limitation, an ideal time alignment is assumed and we only focus on the PAA. The interested reader is referred to Refs. [9], [18] and [19] for further details of the time alignment algorithms. In previous work, we only studied the equal gain combining (EGC) based digital phase alignment [17]. In this paper, we extend the analysis to both EGC and maximum ratio combining (MRC) based digital phase alignment. Generic analytical expressions are deduced to describe the relationship between the computation complexity and combining loss (CL) for both MRC and EGC based receivers. With these expressions, the computation complexity can be easily minimized for free space optical signals with a large optical signal-to-noise ratio (OSNR) dynamic range according to the prescribed CL. The analytical predications are validated by extensive numerical simulations. Some useful guidelines are deduced for the realization of practical real-time digital phase alignment. The paper is organized as follows. In section 2, we investigate the relationship between the computation complexity and the CL and deduce a set of universal analytical expressions to describe it for both MRC and EGC. In Section 3, we analyze the different cases when the input OSNRs vary by 33dB and are as low as −20dB. The results obtained by analytical expressions and numerical simulations are presented. The computation complexities are compared for different combining techniques, prescribed CLs, and input OSNR lower limits. Finally, conclusions are drawn in section 4.

II. RELATION BETWEEN THE COMPUTATION COMPLEXITY AND COMBINING LOSS
The schematic setup of the multi-aperture coherent optical receiver and the major algorithms performed by the embedded DSP are shown in Fig. 1. The free-space optical signal is received by an array of small aperture receivers. In each branch, the weak optical signal is first coupled into a single-mode fiber and then amplified by a low-noise preamplifier to compensate for the large free-space transmission loss. Optical band-pass filters (OBPFs) with bandwidth close to the optical signal bandwidth are used to mitigate the outof-band amplified spontaneous emission (ASE) noise generated by the optical amplifier. In the parallel digital coherent receivers, the different versions of the free-space optical signal are recovered linearly in the digital domain. To reduce the cost, the parallel coherent receivers use a common narrow linewidth local oscillator (LO). In the DSP, after front-end distortion compensation, the optical signals are aligned in time and phase and then added coherently. After the D-CBC is completed, the standard carrier recovery algorithm can be applied to recover the transmitted symbols [17]. Fig. 1 shows the detailed steps of the phase alignment and coherent combining process. In this process, the sampling rate is equal to the signal baud rate. To obtained a higher electrical signalto-noise ratio (SNR), the different versions of the optical signal are combined one by one in the order of their SNR [9]. The optical phase offset (OPO) (ϕ n ) between the two signals (b n , b n+1 ) to be combined can be estimated by the following equations Here, b n [m] and b n+1 [m] stand for the m-th symbols of the two signals to be coherently combined, respectively. The upper bar of the symbol in Eq. (1) indicates conjugate. M represents the number of symbols required by the estimation. ϕ n and ϕ n stand for the estimated OPO and the corresponding estimation error, respectively. N represents the total number of signals to be combined. As showed in Fig. 1, phase alignment can be realized by multiplying the signal b n+1 with exp −j ϕ n to mitigate the OPO. In practice, ϕ n is not equal to zero, and thus, the practical combining gain is smaller than the ideal one when ϕ n = 0. The difference between the practical and ideal combining gain (in dB) is called CL [9], [17].
According to Eqs. (1) and (2), to balance the two competing requirements of low computation complexity and high estimation accuracy, M is the key parameter that should be chosen appropriately. To investigate the relation between ϕ n and M , we first investigate the D-CBC of two optical signals. The root-mean-square error (RMSE) defined as follows is used as the metric of the OPO estimation accuracy.
Here, · represents the expectation of the operand. When N = 2, the RMSE of the OPO estimation is approximately equal to (see Appendix) Here, A 1 and A 2 are the amplitudes of the pair of signals to be combined, respectively. σ 2 1 and σ 2 2 are the variances of the real and imaginary parts of the ASE noise appearing in the two signals, respectively, which follow Gaussian distribution N (0, σ 2 i ) (i = 1, 2). The variance σ 2 i is related to the SNR i and OSNR i by the following equation [20] Here, γ is a calibration factor related to the signal bandwidth and modulation format. When ϕ 1 = 0, the amplitude of the signal U 2 obtained after combining can be written as follows Here, k 2,1 = A 2 A 1 , w 1 is the weighting factor. We note that The corresponding total noise power appearing in U 2 is equal to According to Eqs. (5), (6) and (7), the practical SNR (OSNR) obtained with D-CBC has the following form Thus, the corresponding CL is given by Eq. (9) shows that CL is a function of ϕ 1 . Let's assume that P ϕ 1 is the probability distribution function of ϕ 1 . The averaged CL can thus be written as In summary, with Eqs. (4) and (10) we can predict ϕ rms 1 and the averaged CL according to the value of M when two optical signals are combined with MRC or EGC. For the (n-1)-th combining, the analytical expression of ϕ rms n−1 can be obtained by using the above method recursively. After some straight-forward algebra and simplification, the expression of ϕ rms n−1 can be written as ϕ rms Here, w i , k i,1 and ϕ rms i are defined as Here, A i is the amplitude of the i-th signal to be combined, while w i is the weighting factor used in the i-th combination as showed in Fig. 1. In Eq. (14) θ U p represents the angle between the vector U p and U p−1 given by Thus i p=1 θ U p represents the angle between vector U i and the x-axis (coinciding with U 1 ). The amplitude and noise appearing in the signal obtained after the (p-1)-th D-CBC can be written as By using Eq. (16), (17) recursively, the averaged CL for the combination of N optical signals can be obtained by Therefore, with Eqs. (11)-(18) we can predict ϕ rms n−1 and the averaged CL according to the value of M when N branches of optical signals are coherently combined with MRC or EGC.
As the effort for a complex multiplier is much higher than for an adder, the number of complex multiplications required is used as the figure of merit to measure the complexity of the PAA in this paper. Eq. (1) shows it is equal to M . We note that to remove the OPO, one additional complex multiplication is required for each symbol, but it is neglectable compared with the total complex multiplications per symbol required by the adaptive compensation and demodulation algorithms, and thus is not considered.

III. NUMERICAL SIMULATIONS
In this section, we will investigate the performance of D-CBC in various scenarios and compare the results obtained by analytical and numerical methods. Without loss of generality, we carried out the numerical simulations in a 10Gbps non-return to zero binary phase-shift keying (NRZ-BPSK) transmission system. The receiver setup is similar to that showed in Fig.1. To emulate the optical phase mismatch between different versions of optical signals, a random OPO between −π and π is introduced in each branch of optical signals by an optical delay line. Because the optical signal powers received by different apertures with intervals larger than the atmosphere phase correlation length may include more than 30dB fades due to atmospheric turbulence [11], the dynamic range of the input OSNR is set to be 33dB via ASE noise loading before the receiver. The minimum OSNR is as low as −20dB. Although such low OSNR is unlikely to encounter in optical fiber communication systems, but is of particular interest for the multi-aperture receivers, because with the D-CBC technique the signal may still be correctly demodulated [11]. To mitigate the out-of-band ASE noise, a fourth-order Gauss OBPF with 20GHz 3-dB bandwidth is used before the DCR. To overcome the thermal noise, a LO with 10dBm power is used and the laser frequency offset (LFO) and total laser linewidth are assumed to be 500MHz and 10kHz, respectively.
We first investigate the cases when N = 2 and the prescribed CLs are 0.1 and 0.5 dB, respectively. Fig. 2(a) and (b) show the variations of the allowable OPO estimation RMSE ( ϕ rms 1 ) for CL = 0.1dB obtained by Eq. (10) as a function of the OSNRs (in dB) of the two input signals when EGC and MRC are adopted, respectively. As we can see, for both EGC and MRC, the contour lines are parallel to the line represented by y = x + b, where b stands for the OSNR difference between the two signals. This is because, as can be seen from Eq. (10), for a given averaged CL, ϕ rms 1 is only determined by k 2,1 = A 2 A 1 which actually stands for the OSNR difference between the two signals when we assume that optical amplifiers used in each branch are the same. Meanwhile, for a given OSNR difference, the allowable ϕ rms 1 is larger when MRC is adopted. Fig. 2(c) shows the difference between Fig. 2(a) and (b). For example, when the OSNR difference is 5dB, the allowable ϕ rms 1 for MRC and EGC is 20 and 18 degrees, respectively. The difference is only about 2 degrees. But when the OSNR difference is increased to 22dB, the allowable ϕ rms 1 is 146 and 34 degrees, respectively. The difference is as large as 112 degrees. This is because, in MRC, the signal with a larger OSNR is multiplied with a weighing factor larger than 1, thus reducing the impact of phase misalignment. When CL is large enough, the allowable error may even be as large as 180. In this case, the PAA in the DSP flow can even be skipped. As k 2,1 = 1 w 1 , we can get the following equation from Eqs. (5) and (10) From Eq. (19) we can conclude that when the OSNR difference is larger than 22.4dB, phase alignment is not required when CL = 0.1dB is permitted, which are corresponding to the blank regions in the upper left and lower right corners in Fig. 2(b) and (c). Fig. 2(d) shows the variations of M obtained by Eq. (4) for CL = 0.1dB as a function of the OSNR values when EGC is adopted. As we can see, the contour lines are approximately parallel to the line represented by y = −x + c, where c stands for the sum of x and y. Thus, we can conclude that, for EGC, M is determined by the overall quality of the two input signals to be combined [17]. Fig. 2(e) shows the corresponding results obtained when MRC is adopted. As we can see, the contour lines have a right-angle type structure. The parts above and below the line y = x are nearly perpendicular to the y-axis and x-axis, respectively. It means that M is mainly determined by the quality of the signal with a larger OSNR in MRC. The difference between Fig. 2 Fig. 2(f). The smallest difference occurs in the region around the line y = x. This is because, when the two OSNRs are the same, MRC degenerates into EGC. For the regions further away from y = x, the difference becomes larger and M required by MRC is much smaller than that required by EGC. The difference in Fig. 2(f) is as large as 500 when the two input OSNRs are about −12dB and −19dB, respectively. Fig. 3(a) shows the real ϕ rms 1 obtained by Monte Carlo simulations when M is set to be the corresponding values given in Fig. 2(e) and MRC is adopted. To obtain the results, 500 times of Monte Carlo simulations are carried out for each data point under random ASE noise patterns. As we can see, the contour lines are also approximately parallel to the line y = x +b which is similar to Fig. 2(b). Fig. 3(b) shows the difference between them. As we can see, the difference is lower than 5 degrees in most areas except the upper left and lower right corners where ϕ rms 1 is larger than 80 degrees. This is because of the assumption that MA 1 A 2 l > s and Eq. (4) is not valid when ϕ rms 1 is too large (see Appendix). However, in these two corners, the OSNR difference is very large (about 22dB) and CL is about 0.1dB even when PAA is not carried out. Fig. 3(c) shows the real CL obtained by Monte Carlo simulations when M is set to be the corresponding values given in Fig. 2(e) and MRC is adopted. As we can see, the real CL is close to the expected value (0.1dB) in most areas. The deviation of the simulation from the analytical results is shown in Fig. 3(d). In most areas, the deviation is smaller than 0.01dB except the upper right corner due to the rounding up error (M must be an integer). However, as we can see from Fig. 3(c), in this corner the real CL is smaller than the prescribed value (0.1dB) because of the rounding up operation, and thus, can still satisfy the CL requirement. Fig. 4(a) and (b) show the variations of the allowable ϕ rms 1 and the corresponding M obtained by the analytical expressions when CL is increased from 0.1 to 0.5dB and MRC is adopted. The contour lines in the two figures have similar characteristics as those in Fig. 2(b) and (e). But as the prescribed CL of 0.5dB is larger than that in Fig. 3, the allowable ϕ rms 1 is also larger. It can be proved from Eq. (19) that in this case phase alignment is not required when the OSNR difference between the two optical signals is larger than 15.4dB. For this reason, the blank regions on the upper left and lower right of Fig. 4(a) are larger than those in Figs. 2 and 3. As the tolerance to phase alignment error is larger, the maximal M on the lower left corner is reduced from 5000 (see Fig. 2(e)) to 1000 (see Fig. 4(b)). Fig. 4(c) shows the real CL obtained by Monte Carlo simulations when M is set to be the corresponding values given in Fig. 4(b) and MRC is adopted. The deviation of the real CL from the expected CL of 0.5dB is shown in Fig. 4(d). In most areas, the deviation is smaller than 0.02dB except the upper right corner due to the rounding up error. However, as we can see from Fig. 4(c), in this corner the real CL is actually smaller than the prescribed value (0.5dB) because of the rounding up operation, and thus, can still satisfy the CL requirement.

(d) (M required by EGC) and (e) (M required by MRC) is shown in
The relaxation of the CL from 0.1 to 0.5dB can greatly reduce the computation complexity. Fig. 5 (a) and (b) show the reduction when EGC and MRC are adopted, respectively. As we can see, in most cases, the computation complexity can be reduced by about 70∼80%. The reduction is smaller in the upper right corner because M required in this high OSNR region is already very small. Fig. 5(c) and (d) show the computation complexity reduction when EGC is replaced with MRC and CL = 0.1 and 0.5, respectively. As we can see, the reduction is smaller in the regions around the line y = x at which MRC degenerates into EGC. But when the OSNR difference is above 10dB (represented by white dashed lines), the reduction is about 55% and 60% for CL = 0.1 and 0.5dB, respectively.   numerical results for MRC are presented in Fig. 6(a) for comparison. Here the variation range of M is chosen to adjust CL from 0.1 to 0.5dB. In the numerical simulations, the averaged CL is obtained from 500 times of Monte Carlo simulations with different ASE noise patterns. As we can see, the averaged CL decreases with increasing M . Thus, selecting VOLUME 8, 2020 a proper M value can satisfy different CL and computation complexity requirements. For example, for case 1, the computation complexity can be reduced by about 80% when the allowable CL is relaxed from 0.1 to 0.5dB. As we can see, the analytical and numerical results agree very well for all of the cases. The difference between the real CL and the prescribed CL is shown in Fig. 6(b). The maximal error is lower than 0.05dB when M ≥ 10. Furthermore, as we can see, the error is smaller than 0.01dB at the tails of the curves where the CL is reduced to 0.1dB (see Fig. 6(a)). This is because when CL is smaller, the allowable OPO estimation error is smaller, and thus, the analytical expressions are more accurate.
The combining gains obtained with EGC and MRC as a function of M are shown in Fig. 6(c). As expected, the combining gain increases with M. For cases 1, 2, and 3, the curves flatten out when M is larger than 40, 20, and 20, respectively. For the same M and computation complexity, the MRC combining gain is always larger than that of EGC. The improvement is about 0.1dB for case 1, while it increases to about 0.6dB for cases 2 and 3 because the OSNR difference in the latter two cases is larger.
With the above method, the computation complexity can be minimized in an on-the-fly manner according to the input OSNRs. For the real-time platform, the hardware resources are often limited, so a specific platform may not be able to combine optical signals with arbitrarily low OSNR. In other words, the upper limit of M allowed by a specific real-time platform is limited. Thus, we also investigate the relationship between the input OSNR lower limit and the maximal computation load. Since the actual OSNR may vary in a very large range, we need to find the maximal M required in the worst case when all input OSNRs are equal to the lower limit. It is noteworthy that in this case, MRC degenerates into EGC. Fig. 7 (a) and (b) shows the maximal M required as a function of the OSNR lower limit when N = 2 and 4, respectively. In both figures, the allowable CL is set to be different values ranging from 0.1 to 1dB. The results are obtained with analytical expressions. As we can see, the computation complexity increases with decreasing OSNR exponentially and the smaller the CL is, the steeper the curve. In Fig.7 (a), when the lower limit is 0dB, 8, 2 and 1 complex multiplications are required for CL = 0.1, 0.5 and 1dB, respectively. While when the lower limit is −10dB, 224, 46 and 24 complex multiplications are required, respectively. Fig. 7(b) shows the results obtained when N = 4. When the lower limit of 0dB, 11, 3 and 2 complex multiplications are required for CL = 0.1, 0.5 and 1dB, respectively. While when the lower limit is −10dB, 269, 57 and 30 complex multiplications are required, respectively. We can observe that, for the same CL, the M required is much larger compared with the case with N = 2. This is because there are more stages of combining when N = 4 and thus the allowable CL in each stage is smaller. The increase of M is shown in Fig. 7(c). Fig. 7(d) and (e) show the reduction of computation complexity when CL is relaxed from 0.1 to 1dB when N = 2 and 4, respectively.  The results are obtained from Fig.7 (a) and (b). The lower limit of the input OSNRs is set to be 0, −5, and −10dB. As we can see, the curves are very steep when the prescribed CL is smaller than 0.5dB and begin to flatten out after 0.5dB. The reduction before and after 0.5dB is about 80% and 8%, respectively. Therefore, a prescribed CL of 0.5dB is preferred to balance the computation complexity and performance. It is noteworthy that the curves in Fig. 7(d) and (e) almost overlap with each other, and thus, the reduction in percentage is mainly determined by the prescribed CL.

IV. CONCLUSION
In this paper analytical expressions are deduced to describe the relationship between the computation complexity and CL. The results obtained with the analytical expressions agree well with those obtained by numerical simulations. With these expressions, the computation complexity can be minimized according to the prescribed CL and input OSNRs. The analysis shows that for MRC the computation complexity mainly depends on the quality of the signal with a larger OSNR. In contrast, it depends on the overall quality of the signals for EGC. The PAA computation complexity difference between MRC and EGC is more than 55% when the input signals have a OSNR difference larger than 10dB and the prescribed CL is higher than 0.1dB. The computation complexity increases as an exponential function of the input OSNR lower limit and the smaller the CL, the faster it increases. Furthermore, the maximal computation complexity can be reduced by about 80% when CL is relaxed from 0.1 to 0.5dB, while it is only 8% when CL is relaxed from 0.5 to 1dB. Thus, to balance the competing requirements of high performance and low computation complexity, a CL of 0.5dB is preferred. These results provide useful guidelines to realize practical digital phase alignment on a real-time platform with limited hardware resources.
To realize real-time digital phase alignment and coherent combining, we still face a great challenge to design the DSP chip which is able to deal with multiple branches of high-speed optical signals, simultaneously, under a much lower working frequency. We are now developing the real-time D-CBC system working in a highly parallel and pipe-lined manner based on a FPGA chip and the results will be reported in the near future.

APPENDIX
Here we briefly describe how to deduce Eq. (4) from Eqs. (1) and (2). Fig. 8 shows the statistical distribution of the complex random variable C sum on the complex plane. Without ASE noise, C sum is located at point P. However, when ASE noise is present, C sum will deviate from the point P to a random point R on the complex plane. Here r represents the distance from R to P, while d represents the distance from point R to the line OP. The angle between OP and the x-axis is equal to the OPO (ϕ 1 ), while the angle between OP and OR represents the OPO estimation error ( ϕ 1 ). Let's assume that MA 1 A 2 r > d and thus ϕ 1 can be estimated by Here,θ is the angle between r and OP. The RMS of r is equal to [17] r rms = 2M A 2 1 σ 2 2 + A 2 2 σ 2 1 + 2σ 2 1 σ 2 2 .
Thus, d rms can be written as d rms = ∞ 0 2π 0 r 2 sin 2 θ P θ dθ P r d r Here, P θ and P r stand for the probability distribution functions of θ and r, respectively. It is noteworthy there is a mistake in the deduction of the above expression in Ref. [17]. The constant factor should be 1/