Simplified Soft-Output Direct Detection FTN Algorithm for 56-Gb/s Optical PAM-4 System Using 10G-Class Optics

In this paper, a simplified soft-output direct detection faster-than-Nyquist (SO-DD-FTN) algorithm is proposed to eliminate severe inter-symbol interference in optical interconnects. The feasibility of the proposed algorithm is verified in a C-band 56-Gb/s 4-ary pulse amplitude modulation (PAM-4) system using 10G-class optics, in which the 3-dB system bandwidth is only 5.5 GHz. Compared with conventional SO-DD-FTN, the simplified SO-DD-FTN reduces the multiplication complexity by 90% and the addition complexity by 80% with almost no performance degradation. Experimental results show that, for the back-to-back (BTB) case, the receiver sensitivity achieved with simplified SO-DD-FTN is 4.5 dB better than that of conventional direct detection faster-than-Nyquist (DD-FTN). For 20-km standard single-mode fiber (SSMF) transmission, the system suffers from both optical/electronic bandwidth limitation and dispersion-induced power fading, so the conventional DD-FTN cannot reach the bit error rate (BER) threshold of $10^{-3}$, whereas the simplified SO-DD-FTN reaches this threshold at a received optical power of −14.5 dBm. In conclusion, the proposed simplified SO-DD-FTN algorithm achieves superior performance with low computational complexity and has the potential to be applied to optical interconnects.


I. INTRODUCTION
With the rapid development of bandwidth-thirsty services such as 4K/8K high-definition video, interactive games, and social media applications, data-center traffic is growing dramatically, and high-speed short-reach optical interconnects are required [1]-[3]. Next-generation Ethernet is envisioned to reach 800 G and/or 1.6 T within 5 years [4].
The associate editor coordinating the review of this manuscript and approving it for publication was San-Liang Lee.
The optical/electronic bandwidth limitation is a main challenge in realizing economical, high-capacity optical interconnects. For this reason, advanced modulation formats, including orthogonal frequency-division multiplexing (OFDM), carrier-less amplitude and phase (CAP) modulation, and 4-ary pulse amplitude modulation (PAM-4), are considered potential schemes in both research and commercial fields [5]-[7]. Among them, PAM-4 has been widely studied and was adopted for the 200-Gb/s and 400-Gb/s standards by the IEEE P802.3bs Task Force [8]-[12]. A single-wavelength rate of 50 Gb/s/λ is a main option for realizing 200-Gb/s and 400-Gb/s optical interconnects through 4 × 50-Gb/s and 8 × 50-Gb/s wavelength division multiplexing (WDM) systems [13]-[16]. Because such links are cost-sensitive, 50-Gb/s/λ short-reach optical interconnects tend to re-use off-the-shelf, low-cost but bandwidth-limited 10G-class optics [16]-[20]. However, inter-symbol interference (ISI) caused by insufficient system bandwidth is one of the main channel distortions in high-speed transmission systems with low-cost, low-bandwidth optics. Therefore, digital signal processing (DSP) algorithms have been widely studied to eliminate severe ISI.
Recently, high-speed transmission systems with bandwidth-limited devices have been investigated using advanced DSP algorithms [21]. Feed-forward equalization (FFE) is a commonly used DSP algorithm for eliminating ISI, but FFE enhances the in-band noise, which degrades the bit error rate (BER) performance [22]. Hence, the direct detection faster-than-Nyquist (DD-FTN) algorithm was proposed, which consists of FFE, a post-filter, and maximum likelihood sequence estimation (MLSE) with hard-decision output [23]. The post-filter is applied after FFE to suppress the enhanced noise, and MLSE based on the Viterbi algorithm is used to further eliminate the residual ISI [24]. However, soft-decision output performs better than hard-decision output in terms of decoding gain [25]. To acquire additional decoding gain from soft-decision forward error correction and achieve better system performance, a soft-output faster-than-Nyquist (SO-FTN) algorithm was proposed for coherent detection systems, in which the BCJR algorithm with soft-output information is used after the post-filter [26]-[28]. Following the use of BCJR in SO-FTN, the MLSE in DD-FTN can be replaced by BCJR to obtain better performance in direct detection systems, and the resulting joint algorithm can be called soft-output direct detection faster-than-Nyquist (SO-DD-FTN). However, the BCJR algorithm requires a large number of multiplications to calculate the posterior probabilities, which results in high computational complexity. Because short-reach direct detection optical interconnects must be low-cost and low-power, the high computational complexity of SO-DD-FTN is an urgent issue to be solved before the algorithm can be applied to such systems.
In this paper, a simplified SO-DD-FTN algorithm, which consists of FFE, a post-filter, and max-log-BCJR, is proposed and experimentally demonstrated for the first time, to the best of our knowledge. The main contributions of this paper are as follows:
• A simplified SO-DD-FTN algorithm is proposed to eliminate severe ISI in low-cost and low-power-consumption optical interconnects. With almost no performance degradation, the multiplication complexity and addition complexity of simplified SO-DD-FTN are reduced by 90% and 80%, respectively, compared with conventional SO-DD-FTN.
• We experimentally demonstrate a C-band 56-Gb/s PAM-4 system using 10G-class optics to verify the feasibility of the proposed simplified SO-DD-FTN algorithm. Moreover, the performance of simplified SO-DD-FTN is compared in detail with that of FFE and DD-FTN for the BTB case and for 20-km standard single-mode fiber (SSMF) transmission.
The remainder of this paper is organized as follows. In Section II, the principles of conventional and simplified SO-DD-FTN for PAM-4 systems are introduced, and the computational complexity is analyzed. In Section III, the experimental setup for a 56-Gb/s PAM-4 system is presented, and the experimental results with different algorithms are discussed. Finally, we conclude this paper in Section IV.

II. PRINCIPLE AND COMPUTATIONAL COMPLEXITY
The flow diagram of the DSP is shown in Fig. 1. At the transmitter, $u_t$ is the transmitted pseudo-random binary sequence (PRBS), and $N_2$ is the length of the PRBS. After rate-1/2 convolutional coding, the length of the coded bits is $2N_2$, and $b_i$ is a convolutional coded bit. After puncturing, the length of the coded bits is $2N_1$. $\Pi$ and $\Pi^{-1}$ denote the interleaver and de-interleaver, respectively. $b_j$ is a bit after the interleaver, and $c_k$ is a PAM-4 symbol. At the receiver, the conventional or simplified SO-DD-FTN algorithm is used to eliminate ISI. The algorithm structure is shown in Fig. 1(a); it consists of FFE, a post-filter, and (simplified) BCJR (i.e., BCJR or simplified BCJR) with turbo iteration. The system performance can be improved by iteratively exchanging soft extrinsic information between the (simplified) BCJR equalizer and the (simplified) BCJR decoder, where the soft information is the log-likelihood ratio (LLR).
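The interleaving and symbol-mapping steps of this flow can be illustrated with a minimal sketch. The paper does not state the interleaver design or the bit-to-symbol labelling, so the pseudo-random permutation and the Gray mapping below, as well as the names `make_interleaver` and `bits_to_pam4`, are assumptions made only for illustration.

```python
import numpy as np

# Assumed Gray labelling (MSB, LSB) -> PAM-4 level; not taken from the paper.
GRAY_MAP = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation and its inverse (assumed design)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    inv = np.argsort(perm)  # x[perm][inv] restores x
    return perm, inv


def bits_to_pam4(bits):
    """Map consecutive bit pairs (MSB first) to PAM-4 levels."""
    pairs = np.asarray(bits).reshape(-1, 2)
    return np.array([GRAY_MAP[(int(m), int(l))] for m, l in pairs])


if __name__ == "__main__":
    coded = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # stands in for punctured coded bits
    perm, inv = make_interleaver(len(coded), seed=1)
    interleaved = coded[perm]                    # interleaver
    symbols = bits_to_pam4(interleaved)          # two bits per PAM-4 symbol
    restored = interleaved[inv]                  # de-interleaver
    print(symbols, np.array_equal(restored, coded))
```

With two bits per symbol, a block of $2N_1$ punctured coded bits yields $N_1$ PAM-4 symbols, and applying the inverse permutation at the receiver returns the soft information to the decoder's bit order.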

A. CONVENTIONAL SO-DD-FTN
For the conventional SO-DD-FTN algorithm, the BCJR algorithm is adopted, which requires accurate channel information. The post-filter provides this channel information without a large number of training sequences by shaping the unknown channel response into a known intermediate state. The post-filter has three functions: a) suppressing the in-band noise enhanced by FFE; b) shaping the signal with a known, short-memory-length channel response; and c) providing the known channel information for the BCJR equalizer. In our work, the post-filter is designed as a partial-response filter. The transfer function of the post-filter in the z-domain is expressed as

$$H(z) = \sum_{n=0}^{L-1} h_n z^{-n} \tag{1}$$

where $L$ denotes the number of taps. The output of the post-filter depends on the current symbol and the $(L-1)$ previous symbols. The tap coefficients of the post-filter are $[h_0, h_1, \cdots, h_{L-1}]$ with $0 \le h_n \le 1$. Generally, $h_0$ is equal to 1. If $[h_1, \cdots, h_{L-1}]$ is a zero vector, the post-filter is equivalent to an all-pass filter. After FFE and the post-filter, the signal is first sent to the BCJR equalizer in the iterative loop. For a PAM-4 signal with two bits per symbol, the output of the BCJR equalizer is expressed as the LLRs of the most significant bit (MSB) and the least significant bit (LSB), given by Eqs. (2) and (3), where $b$ denotes the transmitted coded bits after puncturing and interleaving, each taking the value 0 or 1; $c_k$ is the transmitted PAM-4 symbol at time $k$, taking values in $\{-3, -1, 1, 3\}$; and $y$ is the received symbol sequence. $P(c_k|y)$ is a crucial parameter: it is the probability of the transmitted symbol $c$ at time $k$ conditioned on the received sequence $y$. $P(c_k|y)$ can be derived as [29]

$$P(c_k|y) = \alpha_{k-1}(s') \cdot \gamma_k(s', s) \cdot \beta_k(s) \tag{4}$$

In Eq. (4), $s'$ and $s$ represent the states of the trellis diagram at times $k-1$ and $k$, respectively.
$\alpha_{k-1}(s')$ is the forward probability, given by Eq. (5); $\beta_k(s)$ is the backward probability, given by Eq. (6); and $\gamma_k(s', s)$ denotes the branch transition probability, given by Eq. (7), where $x_k$ is the constellation point and $h_n$ is the channel information provided by the post-filter. $\Pr(c_k)$ is the a priori probability, which is obtained from the prior information $L^{D}_{pr}(b_j)$ in the iterations. The extrinsic information $L_{ext}(b_j)$ is then calculated in the iterative loop according to Eq. (8) [30].
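For reference, the textbook probability-domain BCJR relations that match the symbols defined above can be sketched as follows. This is a generic form under stated assumptions, not a transcription of the paper's own equations: the noise is modelled as additive white Gaussian with variance $\sigma^2$ (a symbol that is not defined in the text), the LLR is taken as $\ln[P(b=0)/P(b=1)]$, and the sums run over the trellis transitions $(s', s)$.

```latex
\begin{align*}
\alpha_k(s) &= \sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s) \\
\beta_{k-1}(s') &= \sum_{s} \gamma_k(s', s)\,\beta_k(s) \\
\gamma_k(s', s) &= \Pr(c_k)\,\frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{\bigl|y_k - \sum_{n=0}^{L-1} h_n x_{k-n}\bigr|^2}{2\sigma^2}\right) \\
L(b) &= \ln\frac{\sum_{c_k:\,b=0} P(c_k \mid y)}{\sum_{c_k:\,b=1} P(c_k \mid y)} \\
L_{ext}(b_j) &= L(b_j) - L^{D}_{pr}(b_j)
\end{align*}
```

In this form, every recursion step multiplies probabilities and the branch term requires an exponential, which is where the multiplication count of the conventional algorithm originates.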
After de-interleaving and de-puncturing, the probability of the coded bits $P(b_i)$ is calculated from $L^{deint}_{ext}(b_i)$ and sent to the BCJR decoder. The BCJR decoder differs from the BCJR equalizer only in the calculation of the branch transition probability $\gamma_t$, which for the BCJR decoder is given by Eq. (9). Afterwards, one output of the BCJR decoder, $L^{D}_{pos}(b_i)$, is fed back as the prior information of the BCJR equalizer in the iterative loop. After all iterations, the other output of the BCJR decoder, $L^{D}_{pos}(u_t)$, is sent to the hard-decision module. The hard-decision process is given by Eq. (10), where $\tilde{u}_t$ denotes the output of the hard decision. Taking a PAM-4 system with a one-memory-length channel as an example ($M = 4$, $L - 1 = 1$), the iterative implementation procedure of conventional SO-DD-FTN is described in Table 1. The trellis diagram for the BCJR equalizer is shown in Fig. 2. $\alpha_k$, $\beta_k$, and $\gamma_k$ can be calculated according to the trellis diagram. The number of states is equal to $M^L = 16$, and the states are denoted by S0∼S15. As seen from Table 1, the calculations involve a large number of matrix multiplications, and the computational complexity is closely related to the modulation order $M$ and the number of post-filter taps $L$. Therefore, the performance gain of conventional SO-DD-FTN comes at the cost of more logic-gate resources and higher power consumption, which limits its application in short-reach optical interconnects.
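As an illustration of the equalizer stage, the following is a minimal probability-domain BCJR sketch for a 2-tap post-filter response $[1, h_1]$. It is a sketch under assumptions rather than the authors' implementation: the noise is taken as Gaussian with variance `sigma2`, the trellis state is the previous symbol (a textbook convention that may differ from the state labelling in Fig. 2), the initial and final states are unknown, and the Gray labels in `GRAY_BITS` and the LLR sign convention $\ln[P(0)/P(1)]$ are assumed.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])               # PAM-4 levels from the text
GRAY_BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # assumed (MSB, LSB) labels


def bcjr_symbol_posteriors(y, h1, sigma2, prior=None):
    """Return P(c_k = LEVELS[j] | y) for the channel [1, h1] (assumed AWGN)."""
    y = np.asarray(y, dtype=float)
    K, M = len(y), len(LEVELS)
    if prior is None:
        prior = np.full((K, M), 1.0 / M)
    # expected[i, j]: noiseless output for previous symbol i and current symbol j
    expected = LEVELS[None, :] + h1 * LEVELS[:, None]
    # gamma[k, i, j]: branch transition probability up to a constant factor
    gamma = np.exp(-(y[:, None, None] - expected[None]) ** 2 / (2.0 * sigma2))
    gamma = gamma * prior[:, None, :]
    alpha = np.zeros((K + 1, M))
    beta = np.zeros((K + 1, M))
    alpha[0] = 1.0 / M          # unknown initial state
    beta[K] = 1.0 / M           # unknown final state
    for k in range(K):          # forward recursion, normalized for stability
        a = alpha[k] @ gamma[k]
        alpha[k + 1] = a / a.sum()
    for k in range(K - 1, -1, -1):   # backward recursion
        b = gamma[k] @ beta[k + 1]
        beta[k] = b / b.sum()
    # Combine alpha, gamma, beta and sum over the previous state i
    post = np.einsum("ki,kij,kj->kj", alpha[:-1], gamma, beta[1:])
    return post / post.sum(axis=1, keepdims=True)


def bit_llrs(post, eps=1e-300):
    """LLRs of MSB and LSB, defined here as ln P(bit=0) / P(bit=1)."""
    llr = np.empty((len(post), 2))
    for bit in range(2):
        p0 = post[:, GRAY_BITS[:, bit] == 0].sum(axis=1)
        p1 = post[:, GRAY_BITS[:, bit] == 1].sum(axis=1)
        llr[:, bit] = np.log((p0 + eps) / (p1 + eps))
    return llr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.choice(LEVELS, size=51)
    y = c[1:] + 0.7 * c[:-1]    # noiseless output of the post-filter [1, 0.7]
    post = bcjr_symbol_posteriors(y, h1=0.7, sigma2=0.05)
    print(np.array_equal(LEVELS[post.argmax(axis=1)], c[1:]))
```

The `prior` argument is where the a priori probabilities fed back from the decoder would enter in a turbo iteration; the decoder itself is not included in this sketch.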

B. SIMPLIFIED SO-DD-FTN
To reduce the power consumption of the DSP, max-log-BCJR is employed to simplify the SO-DD-FTN algorithm. The principle is that the computation is transferred to the logarithmic domain, so that multiplications are converted into additions. In addition, the max-log approximation of the Jacobian logarithm, $\ln(e^{a} + e^{b}) \approx \max(a, b)$, is used, so that the logarithm and exponent operations are converted into maximum operations.
The forward probability is expressed as Eq. (11). Similarly, the backward probability is expressed as Eq. (12). The output LLR obtained with the Jacobian logarithm is expressed as Eq. (13). It is worth noting that the forward recursion $\alpha_k(s)$ of Eq. (11) is equivalent to the calculation of the path metric in the Viterbi decoder of DD-FTN. Apart from FFE and the post-filter, the multiplication complexity of DD-FTN derives from the calculation of $\gamma_k(s', s)$ and is $M^{2L}$. For max-log-BCJR in simplified SO-DD-FTN, in addition to the forward recursion, the backward recursion $\beta_k(s)$ is calculated in the same way in the backward direction; the use of the backward recursion is one of the main reasons why simplified SO-DD-FTN outperforms conventional DD-FTN. As in DD-FTN, the multiplication complexity of simplified SO-DD-FTN derives from the calculation of $\gamma_k(s', s)$. As shown in Eq. (11) and Eq. (12), the addition complexity of $\beta_k(s)$ is the same as that of $\alpha_k(s)$. Therefore, the simplified SO-DD-FTN has the same multiplication complexity as DD-FTN, and its addition complexity is approximately twice that of DD-FTN.
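The log-domain recursions described above can be sketched for a 2-tap response $[1, h_1]$ as follows. This is a minimal illustration under assumptions, not the authors' code: Gaussian noise with variance `sigma2`, a trellis whose state is the previous symbol, unknown initial and final states, assumed Gray labels, and an LLR defined as the bit-0 metric minus the bit-1 metric. Sums of probabilities are replaced by `max`, and products by additions.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])               # PAM-4 levels from the text
GRAY_BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # assumed (MSB, LSB) labels


def max_log_bcjr(y, h1, sigma2, log_prior=None):
    """Return max-log symbol metrics (K x 4) for the channel [1, h1]."""
    y = np.asarray(y, dtype=float)
    K, M = len(y), len(LEVELS)
    if log_prior is None:
        log_prior = np.zeros((K, M))
    # expected[i, j]: noiseless output for previous symbol i and current symbol j
    expected = LEVELS[None, :] + h1 * LEVELS[:, None]
    # Log-domain branch metric: no exponential is evaluated
    g = -(y[:, None, None] - expected[None]) ** 2 / (2.0 * sigma2)
    g = g + log_prior[:, None, :]
    a = np.zeros((K + 1, M))    # forward metrics (same form as a Viterbi path metric)
    b = np.zeros((K + 1, M))    # backward metrics
    for k in range(K):
        a[k + 1] = np.max(a[k][:, None] + g[k], axis=0)
    for k in range(K - 1, -1, -1):
        b[k] = np.max(g[k] + b[k + 1][None, :], axis=1)
    # Best path metric through each current symbol j at every time k
    return a[1:] + b[1:]


def max_log_llrs(metric):
    """LLRs of MSB and LSB as (best bit-0 metric) - (best bit-1 metric)."""
    llr = np.empty((len(metric), 2))
    for bit in range(2):
        m0 = metric[:, GRAY_BITS[:, bit] == 0].max(axis=1)
        m1 = metric[:, GRAY_BITS[:, bit] == 1].max(axis=1)
        llr[:, bit] = m0 - m1
    return llr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.choice(LEVELS, size=51)
    y = c[1:] + 0.7 * c[:-1]    # noiseless output of the post-filter [1, 0.7]
    metric = max_log_bcjr(y, h1=0.7, sigma2=0.05)
    print(np.array_equal(LEVELS[metric.argmax(axis=1)], c[1:]))
```

The only multiplications left are those in the branch metric `g`, while the forward and backward passes use additions and comparisons only, which mirrors the complexity argument in the text.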
For conventional SO-DD-FTN, the multiplication complexity comes from the calculations of $\alpha_k(s)$, $\gamma_k(s', s)$, $\beta_k(s)$, and $P(b_k|y)$, as shown in Table 1. We assume that an exponent operation counts as one multiplication. The multiplication complexity is then $M^{2L} + 2M^{2L} + M^{2L} + M \cdot 2 \cdot M^{2L} = (4 + 2M) \cdot M^{2L}$. Therefore, the multiplication complexity of conventional SO-DD-FTN is $(4 + 2M)$ times that of DD-FTN.
Apart from FFE and the post-filter, the computational complexities of conventional DD-FTN, conventional SO-DD-FTN, and simplified SO-DD-FTN are compared in Table 2. For the PAM-4 system with a one-memory-length channel ($M = 4$, $L - 1 = 1$), thanks to the conversion of multiplications into additions, the simplified SO-DD-FTN reduces the multiplication complexity by about 90% ($(3072 - 256)/3072 \approx 91.7\%$) and the addition complexity by about 80% ($(2656 - 512)/2656 \approx 80.7\%$) compared with conventional SO-DD-FTN. The performance of these algorithms is compared and analyzed in the next section.
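The arithmetic behind these percentages can be checked with a short script. The multiplication counts follow the expressions given in the text ($M^{2L}$ and $(4 + 2M) \cdot M^{2L}$); the addition counts 2656 and 512 are simply the values quoted above, since their general formulas are not reproduced here. The function names are illustrative.

```python
def multiplication_counts(M, L):
    """Multiplication counts from the expressions in the text (FFE and post-filter excluded)."""
    base = M ** (2 * L)                      # M^(2L), from computing gamma
    return {
        "DD-FTN": base,
        "simplified SO-DD-FTN": base,
        "conventional SO-DD-FTN": (4 + 2 * M) * base,
    }


def relative_reduction(conventional, simplified):
    """Fraction of operations saved relative to the conventional algorithm."""
    return (conventional - simplified) / conventional


if __name__ == "__main__":
    counts = multiplication_counts(M=4, L=2)
    mult_saving = relative_reduction(
        counts["conventional SO-DD-FTN"], counts["simplified SO-DD-FTN"]
    )
    add_saving = relative_reduction(2656, 512)   # addition counts quoted in the text
    print(counts)
    print(f"{mult_saving:.1%} {add_saving:.1%}")  # prints "91.7% 80.7%"
```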

III. EXPERIMENTAL SETUP AND RESULTS
The experimental setup is described in detail, and the experimental results are shown and analyzed in this section.

A. EXPERIMENTAL SETUP
[...] employed according to Ref. [31] to improve coding efficiency, and 20% overhead is used [32]-[34]. The coded bits are mapped into PAM-4 symbols after the interleaver, which is used to avoid burst errors. The frame structure consists of 500 synchronization symbols, 1000 training symbols, and 40360 payload symbols. The signal is up-sampled and then shaped by a digital root-raised cosine (RRC) filter, whose roll-off factor is optimized in the following subsection. Afterwards, resampling is employed to match the sampling rate of the digital-to-analog converter (DAC), and the data is uploaded into the 64-GSa/s DAC with 8-bit resolution and a 3-dB bandwidth of 15 GHz. Then, an electrical amplifier (EA) amplifies the electrical PAM-4 signal to a peak-to-peak voltage of ∼3 V, and an external cavity laser (ECL) with a center wavelength of 1550.12 nm generates the optical carrier. The electrical PAM-4 signal is modulated onto the optical carrier by a Mach-Zehnder modulator (MZM, KG-SN-357807K) with a 3-dB bandwidth of 10 GHz and a bias voltage of +5 V, which ensures modulation in the linear region, as shown in Fig. 3(a). The generated 56-Gb/s optical PAM-4 signal with 6-dBm launch power is sent into 20-km SSMF.
At the receiver, a variable optical attenuator (VOA) is used to adjust the received optical power (ROP). Afterwards, the optical signal is sent to an avalanche photodiode (APD) with a trans-impedance amplifier (TIA), which has a 3-dB bandwidth of ∼7 GHz. The electrical signal after the APD is sampled by a 100-GSa/s digital phosphor oscilloscope (DPO) with 20-GHz bandwidth (Tektronix MSO72004C). The normalized power spectra and transfer functions for BTB and 20-km SSMF transmission are illustrated in Fig. 3(b) and Fig. 3(c). Due to the device bandwidth limitation, the overall system bandwidth is about 16 GHz.
The equivalent 3-dB bandwidths of the BTB and 20-km systems are both approximately 5.5 GHz. For the BTB case, the power in the highest-frequency region is approximately 38 dB lower than that in the lowest-frequency region owing to the bandwidth-limited experimental devices. For 20-km SSMF transmission, the high-frequency distortion is more severe due to the power fading caused by fiber dispersion. Finally, off-line DSP is performed on the sampled electrical signal in MATLAB, including resampling, matched filtering, time synchronization, simplified SO-DD-FTN, and BER counting.

1) OPTIMIZATION OF PULSE SHAPING IN OPTICAL BTB CASE
Pulse-shaping filters with different roll-off factors $\eta$ produce signals with different bandwidths, which can be used to alleviate the effect of the device bandwidth limitation on the signal. The relationship between the signal bandwidth and the roll-off factor is expressed as

$$BW = \frac{(1 + \eta) R_s}{2} \tag{14}$$

where $BW$ and $R_s$ denote the signal bandwidth and the baud rate, respectively. The roll-off factor $\eta$ ranges from 0 to 1. When the roll-off factor is equal to 0, the output of the filter is the ideal Nyquist signal. Meanwhile, the pulse-shaping filter also changes the peak-to-average power ratio (PAPR), which is closely related to the modulation efficiency. Therefore, the optimization of the roll-off factor is a tradeoff between the device bandwidth limitation and the modulation efficiency. Fig. 4 illustrates the complementary cumulative distribution function (CCDF) curves of the PAPR for different roll-off factors $\eta$ of the pulse-shaping filter. The PAPR does not decrease monotonically as the roll-off factor increases, because an RRC filter rather than a raised-cosine (RC) filter is used for pulse shaping. At a probability of $10^{-3}$, the PAPR of the ideal Nyquist signal is approximately 9.8 dB. The PAPR decreases to ∼6.5 dB as the roll-off factor increases. Once the roll-off factor reaches 0.4, the PAPR becomes stable, having decreased by 3.3 dB.
Furthermore, Fig. 5 shows the pre-BER versus the roll-off factor $\eta$ at ROPs of −11, −13, and −15 dBm using a 31-tap linear equalizer. The BER is lowest when the roll-off factor is 0.4. The reason is that when the roll-off factor is less than 0.4, the low modulation efficiency caused by the high PAPR dominates the performance degradation, whereas when the roll-off factor is greater than 0.4, the device bandwidth limitation dominates. Therefore, the optimized roll-off factor of 0.4 is adopted.
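To give a sense of scale for this tradeoff, the occupied bandwidth can be estimated for a few roll-off factors. The sketch below assumes the standard one-sided baseband relation $BW = (1 + \eta) R_s / 2$ for raised-cosine-family pulses and a symbol rate of 28 GBaud, which follows from 56 Gb/s with two bits per PAM-4 symbol; the function name is illustrative.

```python
def rrc_signal_bandwidth(baud_rate, rolloff):
    """One-sided bandwidth of a (root-)raised-cosine shaped baseband signal.

    Assumes the standard relation BW = (1 + rolloff) * baud_rate / 2.
    """
    if not 0.0 <= rolloff <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    return (1.0 + rolloff) * baud_rate / 2.0


if __name__ == "__main__":
    baud = 28e9  # 56 Gb/s PAM-4, two bits per symbol (assumed symbol rate)
    for eta in (0.0, 0.4, 1.0):
        print(eta, rrc_signal_bandwidth(baud, eta) / 1e9, "GHz")
```

Under these assumptions, the adopted roll-off factor of 0.4 corresponds to roughly 19.6 GHz of occupied bandwidth, compared with 14 GHz for the ideal Nyquist signal.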

2) EXPERIMENTAL RESULTS
In our experiment, the pre-BER performance of simplified SO-DD-FTN is demonstrated and compared with that of FFE, conventional DD-FTN, and conventional SO-DD-FTN in the BTB case and after 20-km SSMF transmission.
In the BTB case, the tap coefficient of the post-filter is analyzed first. Table 2 indicates that the computational complexity increases exponentially with the number of taps. Therefore, the tap number $L$ is set to 2 in our work to limit the computational complexity, and the tap coefficients are $[1, h_1]$. Fig. 6 shows the pre-BER performance obtained by scanning $h_1$ at ROPs of −14, −15, and −16 dBm. In all three cases, the best performance is achieved when $h_1$ is 0.7. Afterwards, the pre-BER performance of the different algorithms is analyzed and compared, as shown in Fig. 7. The solid markers on the solid lines represent the performance of conventional SO-DD-FTN, and the hollow markers on the dashed lines represent the performance of simplified SO-DD-FTN. Fig. 7 indicates that simplified SO-DD-FTN using max-log-BCJR achieves almost identical performance to conventional SO-DD-FTN using BCJR in the severe-ISI system. However, Refs. [35]-[37] show that the max-log-BCJR algorithm performs worse than BCJR because of its approximation and is not suitable for severe-ISI systems. Our results differ from these references because the joint FFE, post-filter, and max-log-BCJR algorithm (i.e., the simplified SO-DD-FTN algorithm) is applied to the severe-ISI system in this paper. FFE is used to eliminate the severe ISI, and the 2-tap post-filter is applied to suppress the noise enhanced by FFE and to shape the unknown channel response into a known, one-memory-length channel response. The max-log-BCJR algorithm then only has to address the known, slight ISI introduced by the 2-tap post-filter. Therefore, the simplified SO-DD-FTN is suitable for bandwidth-limited systems with severe ISI. With a 31-tap FFE, for which the recursive least squares (RLS) algorithm is adopted for quick convergence, the receiver sensitivity is −10 dBm. The receiver sensitivity of DD-FTN is 3.5 dB better than that of FFE, and the simplified SO-DD-FTN performs better still. The receiver sensitivity of simplified SO-DD-FTN without iterations is 1.5 dB better than that of DD-FTN. When the simplified SO-DD-FTN with three iterations is used, the receiver sensitivity is −18 dBm, which is an improvement of 8 dB and 4.5 dB over FFE and DD-FTN, respectively. The experimental results illustrate that the simplified SO-DD-FTN algorithm can effectively eliminate the severe ISI caused by the optical/electronic bandwidth limitation.
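The FFE stage trained with RLS can be sketched as follows. This is a generic symbol-spaced RLS equalizer written for illustration, not the off-line MATLAB code used in the experiment; the forgetting factor `lam`, the regularization `delta`, the centre-spike initialization, and the toy channel in the usage example are all assumptions.

```python
import numpy as np


def rls_ffe(rx, train, n_taps=31, lam=0.999, delta=0.01):
    """Train a symbol-spaced FFE with RLS on `train`, then equalize all of `rx`.

    Assumed settings: odd n_taps, forgetting factor lam, P initialized to I/delta.
    """
    rx = np.asarray(rx, dtype=float)
    half = n_taps // 2
    padded = np.concatenate([np.zeros(half), rx, np.zeros(half)])
    w = np.zeros(n_taps)
    w[half] = 1.0                        # centre-spike initialization
    P = np.eye(n_taps) / delta           # inverse correlation matrix estimate
    for k in range(len(train)):
        u = padded[k:k + n_taps][::-1]   # window centred on received sample k
        Pu = P @ u
        gain = Pu / (lam + u @ Pu)       # RLS gain vector
        err = train[k] - w @ u           # a priori error against the training symbol
        w = w + gain * err
        P = (P - np.outer(gain, Pu)) / lam
    windows = np.lib.stride_tricks.sliding_window_view(padded, n_taps)[:, ::-1]
    return windows @ w, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sym = rng.choice([-3.0, -1.0, 1.0, 3.0], size=3000)
    # Toy ISI channel [1, 0.5] with mild noise (illustrative, not the measured channel)
    rx = np.convolve(sym, [1.0, 0.5])[:len(sym)] + 0.05 * rng.standard_normal(len(sym))
    out, w = rls_ffe(rx, sym[:1000])
    print("MSE after training:", np.mean((out[1000:-20] - sym[1000:-20]) ** 2))
```

In the DD-FTN and SO-DD-FTN chains, the output of such an equalizer would then pass through the post-filter $[1, h_1]$ before sequence detection.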
For 20-km SSMF transmission, the tap coefficient of the post-filter is also analyzed. As shown in Fig. 8, $h_1 = 0.95$ achieves the best performance. Because the post-filter is applied to match the channel response as closely as possible, the coefficient $h_1$ of the post-filter is larger than in the BTB case, since the high-frequency distortion is more severe after 20-km SSMF. Afterwards, the pre-BER performance of simplified SO-DD-FTN for 20-km SSMF transmission is analyzed in Fig. 9. Consistent with the BTB case, simplified SO-DD-FTN has almost the same performance as conventional SO-DD-FTN. Although simplified SO-DD-FTN without iterations performs slightly better than FFE and DD-FTN, it still cannot reach the BER threshold of $10^{-3}$. The performance degrades because of the combination of the optical/electronic bandwidth limitation and the power fading caused by fiber dispersion. The equivalent 3-dB bandwidth of the system is only ∼5.5 GHz, and the high-frequency distortion is more severe. By means of iterative equalization, the simplified SO-DD-FTN with one, two, and three iterations reaches the BER threshold of $10^{-3}$ at ROPs of −12, −14, and −14.5 dBm, respectively. The experimental results show that the simplified SO-DD-FTN algorithm can effectively compensate for the channel distortions caused by both the optical/electronic bandwidth limitation and the chromatic dispersion of the optical fiber.

IV. CONCLUSION
In this paper, a simplified SO-DD-FTN algorithm is proposed to compensate for the channel distortions in low-cost and low-power-consumption optical interconnects. We analyze the computational complexity of the proposed algorithm, which reduces the multiplication complexity and addition complexity by 90% and 80%, respectively, compared with conventional SO-DD-FTN. The simplified SO-DD-FTN algorithm is then experimentally demonstrated in a C-band 56-Gb/s PAM-4 system using 10G-class optics. Experimental results indicate that the system performance with conventional and simplified SO-DD-FTN is almost identical. The simplified SO-DD-FTN performs better than the conventional FFE and DD-FTN algorithms, and further performance gains can be obtained with multiple iterations. For the BTB case, the receiver sensitivity using simplified SO-DD-FTN is improved by 8 dB and 4.5 dB compared with conventional FFE and DD-FTN, respectively. For 20-km SSMF transmission, the simplified SO-DD-FTN algorithm also shows superior performance and reaches the BER threshold of $10^{-3}$ at an ROP of −14.5 dBm. Therefore, we believe that this work is promising for short-reach optical interconnects.