Data-Driven Method for Nonlinear Optical Fiber Channel Modeling Based on Deep Neural Network

Recently, data-driven fiber channel modeling methods based on deep learning have been proposed for optical communication system simulations. We investigate a new data-driven method based on the deep neural network (DNN) to model the nonlinear fiber channel with the characteristics of attenuation, chromatic dispersion, amplified spontaneous emission noise, self-phase modulation (SPM), and cross-phase modulation (XPM). Demonstration in multiple dimensions, including constellations, optical waveforms, spectra, and the normalized mean square error, shows that the DNN can approximate the transfer function of the fiber channel accurately. Additionally, the DNN generalizes well across modulation formats and wavelength schemes. Moreover, the time complexity of the DNN-based method for modeling the nonlinear fiber channel is reduced significantly (by 96.5%) compared with the conventional model-driven method, which is based on the split-step Fourier method. This work demonstrates that the DNN can accurately model a nonlinear fiber channel that accounts for both SPM and XPM. Therefore, it can contribute to the application of data-driven methods in modern optical communication system simulations and designs.


I. INTRODUCTION
SIMULATIONS are vital in optical communication system designs [1], [2]. Conventional optical communication system simulations are based on a series of blocks that are characterized by rigorous numerical models, including a laser, modulator, fiber channel, optical amplifier, filter, detector, and analyzer [3]. Therefore, for the model-driven method, constructing a comprehensive and complete optical communication system simulation is a systematic engineering task that requires expert knowledge. This is why commercial optical communication simulation software is usually closed-source and expensive. Furthermore, the computation complexity of conventional simulations can be very high due to the nested-function structure and the repeated iterative operations, especially in the split-step Fourier method (SSFM), which models the fiber channel by solving the nonlinear Schrödinger equation (NLSE) [1], [4], [5]. Therefore, a method for optical fiber channel modeling with relatively low computation complexity is quite valuable. Deep learning (DL) is a powerful tool that has dramatically improved the state of the art in many domains, such as speech recognition, visual object recognition, object detection, and drug discovery [6]. DL has also been widely applied in the field of optical communication [7], including optical performance monitoring [8], [9], nonlinearity compensation [10], [11], equalization [12]-[14], predistortion [15], [16], software-defined networking [17], [18], and photonic device design [19], [20]. Recently, some DL-based methods have been proposed for optical fiber channel modeling. These methods can be divided into two categories: the principle-driven method [21]-[23] and the data-driven method [1], [2].
The principle-driven method views the fiber modeling problem as a partial differential equation solving problem and fully considers the prior knowledge, including the essential mathematical equations, physical theories, and the corresponding constraint conditions of the target problems, meaning that it requires much human expertise. In addition, it is usually applied to the pulse evolution task rather than the signal transmission task, which is the focus of this work [23]. Thus, only the data-driven method, which is designed for the signal transmission task, is considered in the following. The data-driven method, which regards the fiber modeling problem as a data regression task, is based on the fact that the deep neural network (DNN) can be considered a universal approximator for both linear and nonlinear functions [24]. A DNN can approximate the channel transfer function after being trained on a data set composed of the channel input and output signals. The data-driven method does not require complex mathematical theories or expert knowledge, and it has a distinct advantage in computation complexity because no complicated operation, such as the fast Fourier transform (FFT), is involved.
The data-driven fiber channel modeling method based on DL was first introduced by Danshi Wang et al. [1]. In their work, bidirectional long short-term memory (BiLSTM) neural networks are built to model optical fiber channels for on-off keying (OOK) and pulse amplitude modulation 4 (PAM4) signals. The BiLSTM learns the approximate transfer function of the fiber channel, and the computation time of fiber channel modeling by their data-driven method is reduced by 80% compared with the model-driven method. However, the modulation format is limited to OOK and PAM4, meaning that no advanced modulation format is studied. In [2], Hang Yang et al. build a generative adversarial network (GAN) to learn the distribution of the fiber channel transfer function. In their investigation, many channel effects are taken into consideration, including attenuation, chromatic dispersion (CD), self-phase modulation (SPM), and amplified spontaneous emission (ASE) noise, which is induced by the erbium-doped fiber amplifier (EDFA). The GAN successfully learns an accurate transfer function of the fiber channel and reduces the complexity of fiber channel modeling remarkably. However, with only SPM involved, other nonlinear effects, such as cross-phase modulation (XPM), which is an important impairment during transmission in modern optical communication systems, have not been studied. Apart from this, a GAN can be difficult to train, and it is often observed in practice that gradient-descent-based GAN optimization does not converge [25].
In this research, we choose a DNN built with fully connected layers, which can be trained much more easily than a GAN, to model the nonlinear optical fiber channel with the characteristics of attenuation, CD, ASE noise caused by the EDFA, SPM, and XPM. The capability of the DNN to approximate the transfer function of the fiber channel is demonstrated in multiple dimensions, including constellations, optical waveforms, spectra, and the normalized mean square error (MSE). Results show that the DNN can model the nonlinear fiber channel accurately. Besides, after being trained on a training data set that contains only 16-ary quadrature amplitude modulation (16QAM) symbols, the DNN-based fiber model makes a precise prediction of the fiber channel output signals for other modulation formats, such as quadrature phase-shift keying (QPSK), indicating that the DNN generalizes well across modulation formats. The analysis also shows that the DNN generalizes well across wavelength schemes. In addition, the time complexity of the DNN-based method is analyzed, indicating that the computing time is reduced by 96.5% relative to the SSFM-based method. Therefore, this method can be an auxiliary tool for future optical communication system simulations.

II. OPTICAL FIBER COMMUNICATION SYSTEM STRUCTURE
Throughout this work, we focus on several impairments in the optical fiber channel. Two effects of Kerr nonlinearity, SPM and XPM, are considered. SPM is an intra-channel nonlinear effect, while the inter-channel XPM effect involves two-channel interactions [5]. Considering that the nonlinear interference contributions of multiple wavelength division multiplexing (WDM) channels can be added up independently, the analysis is performed with only two channels [26], [27]. A WDM system, whose setup is shown in Fig. 1, is simulated to demonstrate the capability of the DNN to model the nonlinear fiber channel. The system consists of transmitters, (de)multiplexers, an optical fiber channel, and receivers. The optical fiber channel is affected by attenuation, CD, ASE noise, SPM, and XPM. The transmitter is assumed to use 16QAM, so all the symbols and samples in this system are complex-valued. In the transmitter, the modulation is followed by five-times up-sampling and a root raised cosine (RRC) filter that shapes the signal. Then power normalization is used to control the power of the optical signal, and for simplicity, the power values of the two channels are set to be the same. The laser phase noise, which is modeled by a Wiener process [28], is loaded before the two optical carriers are multiplexed into the optical fiber channel, which is composed of standard single-mode fiber (SSMF) and EDFAs. The main fiber channel parameters are shown in Table I.
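The Wiener-process phase noise mentioned above can be illustrated with a minimal sketch. It assumes the usual discrete-time model in which each phase increment is an independent zero-mean Gaussian with variance 2πΔν/f_s. The function names, the 100 kHz linewidth, and the 150 GSa/s sample rate (30 GBaud at five samples per symbol) are illustrative assumptions, not values taken from Table I.

```python
import cmath
import math
import random


def wiener_phase_noise(num_samples, linewidth_hz, sample_rate_hz, seed=None):
    """Return a list of phases (radians) that follow a discrete Wiener process."""
    rng = random.Random(seed)
    # Standard deviation of each independent Gaussian phase increment.
    sigma = math.sqrt(2.0 * math.pi * linewidth_hz / sample_rate_hz)
    phases = []
    phi = 0.0
    for _ in range(num_samples):
        phi += rng.gauss(0.0, sigma)  # accumulate the random walk
        phases.append(phi)
    return phases


def apply_phase_noise(samples, phases):
    """Rotate each complex sample by the corresponding phase."""
    return [s * cmath.exp(1j * p) for s, p in zip(samples, phases)]


# Hypothetical settings: 100 kHz linewidth, 150 GSa/s sampling.
phases = wiener_phase_noise(1000, 100e3, 150e9, seed=1)
noisy = apply_phase_noise([1 + 0j] * 1000, phases)
```

Phase noise only rotates the samples, so the signal power set by the preceding normalization step is unchanged.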
When two optical fields propagate simultaneously inside the fiber, they interact with each other through XPM. Provided that the wavelengths of the two optical beams are so close to each other that the group-velocity mismatch is negligible (i.e., $v_{g1} \approx v_{g2}$), as in dense WDM (DWDM) systems, the propagation of the two optical fields through a single-mode fiber is governed by a set of two coupled NLSEs [5], referred to as (1), in which $A_1$ and $A_2$ are the slowly varying complex envelopes of the optical fields and $z$ is the propagation distance. The parameters $\alpha$, $\beta_{21}$ and $\beta_{22}$, and $\gamma$ represent the propagation attenuation, the dispersion at the two wavelengths, and the nonlinear coefficient, respectively. SSFM is the most common numerical approach to solving the NLSE [1], [2], [5]. It advances each field over a step of size $h$ by applying $\hat{D}_i$ and $\hat{N}_i$, the linear operator and nonlinear operator of the $i$th ($i = 1, 2$) optical field, respectively. After each span, an EDFA is applied to compensate for the attenuation, which simultaneously introduces ASE noise.
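As an illustration of the procedure described above, the following is a minimal sketch of a symmetric split-step solver for two coupled fields. It assumes the standard coupled form with negligible walk-off, $\partial A_j/\partial z = -(\alpha/2) A_j - (i\beta_{2j}/2)\,\partial^2 A_j/\partial T^2 + i\gamma(|A_j|^2 + 2|A_k|^2) A_j$ for $j \neq k$, which may differ in sign convention or notation from the authors' equation (1). The function name, the symmetric step ordering, and the parameter values in the usage lines are assumptions. ASE noise and amplification are not included.

```python
import numpy as np


def coupled_ssfm_span(a1, a2, *, dt, length, n_steps, alpha, beta21, beta22, gamma):
    """Propagate two envelopes over one span with a symmetric split step.

    Units (assumed): alpha in 1/m (linear, not dB), beta2x in s^2/m,
    gamma in 1/(W m), dt in s, length in m, fields in sqrt(W).
    """
    h = length / n_steps
    omega = 2.0 * np.pi * np.fft.fftfreq(a1.size, d=dt)
    # Linear operator (dispersion + attenuation) over half a step,
    # applied in the frequency domain with NumPy's FFT sign convention.
    half_d1 = np.exp((0.5j * beta21 * omega**2 - 0.5 * alpha) * (h / 2))
    half_d2 = np.exp((0.5j * beta22 * omega**2 - 0.5 * alpha) * (h / 2))
    for _ in range(n_steps):
        a1 = np.fft.ifft(np.fft.fft(a1) * half_d1)
        a2 = np.fft.ifft(np.fft.fft(a2) * half_d2)
        # Nonlinear operator over a full step: SPM plus twice-weighted XPM.
        p1, p2 = np.abs(a1) ** 2, np.abs(a2) ** 2
        a1 = a1 * np.exp(1j * gamma * (p1 + 2.0 * p2) * h)
        a2 = a2 * np.exp(1j * gamma * (p2 + 2.0 * p1) * h)
        a1 = np.fft.ifft(np.fft.fft(a1) * half_d1)
        a2 = np.fft.ifft(np.fft.fft(a2) * half_d2)
    return a1, a2


# Hypothetical SSMF-like values; not taken from Table I.
rng = np.random.default_rng(0)
n = 256
a1_in = np.sqrt(2.5e-3 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
a2_in = np.sqrt(2.5e-3 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
grid = dict(dt=1 / 150e9, length=80e3, n_steps=80)
fiber = dict(alpha=4.6e-5, beta21=-21.7e-27, beta22=-21.7e-27, gamma=1.3e-3)
a1_out, a2_out = coupled_ssfm_span(a1_in, a2_in, **grid, **fiber)
```

Running the same routine on the outputs with all four fiber parameters negated undoes the span, which is the idea behind the DBP step used in the receiver.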
In the receiver, after demultiplexing, the laser phase noise is loaded, a matched RRC filter is applied, and then a multichannel digital backward propagation (DBP) algorithm is performed to compensate for CD, SPM, and XPM. Down-sampling is then performed, and carrier phase recovery (CPR) is used to compensate for the laser phase noise, followed by demodulation. Note that the channel transfer function has no analytical expression; therefore, the nonlinear channels modeled by SSFM and by the DNN cannot be compared directly. As a result, similar to the work of Hang Yang et al. [2], DBP compensation is utilized in this work as an auxiliary verification method. We denote the transfer functions of the nonlinear fiber channel modeled by SSFM and by the DNN as $f(\cdot)$ and $g(\cdot)$, respectively. DBP compensation can be regarded as the inverse of the channel transfer function, $f^{-1}(\cdot)$. If the DBP-compensated output of the DNN is similar to the channel input, $g(\cdot)$ and $f^{-1}(\cdot)$ can be regarded as a pair of inverse functions, meaning that $g(\cdot)$ and $f(\cdot)$ can be regarded as equivalent. This indicates that the DNN has learned the linear and nonlinear characteristics of the fiber channel and can estimate the distribution of the channel transfer function.

III. DEEP NEURAL NETWORK ARCHITECTURE
A DNN is defined and trained to model the nonlinear fiber channel. As is shown in Fig. 2, the input vector of DNN is defined to improve modeling accuracy and flexibility, which is similar to the definition of the condition vector in [2]. First of all, two channels are simulated in the optical communication system, therefore both of them must be taken into consideration. Secondly, as for each channel, considering inter-symbol interference (ISI) caused by CD, the input vector must include not only the current transmitted symbol but also the preceding symbols and subsequent symbols. The number of the preceding symbols and subsequent symbols, which is denoted by n in Fig. 2, is set to eight for each span in this paper, after taking account of the ISI and transmission rate of the signal. Thirdly, considering that the up-sampling rate is 5 in simulation, each symbol in the input vector of DNN includes five samples. The real part and imaginary part of the samples, i.e., the in-phase (I) and quadrature (Q) parts of the fiber channel input signals are concatenated so that the input vector of DNN can be real-valued, as is shown in Fig. 2. Finally, the optical launch power is also considered and its value is appended at the rear of the input vector of DNN. The length of the output vector of DNN is 20, meaning that one symbol per channel is generated by DNN.
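The input-vector layout described above can be sketched as follows. The ordering of the concatenated parts (channel 1 I, channel 1 Q, channel 2 I, channel 2 Q, then launch power) is an assumption, since the exact ordering is defined in Fig. 2; the function name and the use of dBm for the power entry are also illustrative. With n = 8 and five samples per symbol, the vector has 2 × 17 × 5 × 2 + 1 = 341 entries.

```python
def build_input_vector(ch1, ch2, k, launch_power, n=8, sps=5):
    """Build the real-valued DNN input for symbol index k.

    ch1, ch2: sequences of complex samples (sps samples per symbol).
    The window covers n preceding symbols, the current symbol, and
    n subsequent symbols of each channel.
    """
    lo, hi = (k - n) * sps, (k + n + 1) * sps
    if lo < 0 or hi > min(len(ch1), len(ch2)):
        raise ValueError("symbol window exceeds the signal boundaries")
    vec = []
    for ch in (ch1, ch2):
        window = ch[lo:hi]
        vec.extend(s.real for s in window)  # in-phase parts
        vec.extend(s.imag for s in window)  # quadrature parts
    vec.append(launch_power)  # launch power appended at the rear
    return vec


# Toy signals: 100 symbols per channel at 5 samples per symbol.
ch1 = [complex(i, -i) for i in range(100 * 5)]
ch2 = [complex(2 * i, 1) for i in range(100 * 5)]
x = build_input_vector(ch1, ch2, k=10, launch_power=4.0)
```

The normalization applied before the vector is fed to the network is omitted here.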
The architecture of the DNN is shown in Fig. 2 as well. The input vector must be normalized before being fed to the neural network, both to avoid slowing down the convergence of the model and to keep the average value of the input around 1 [2]. The Adam optimizer is used with a learning rate of 0.001. The loss function is the MSE loss. He initialization is applied to the weights of the linear layers, which helps the convergence of deep models with ReLU-like activation functions [29]. All the biases are initialized to 0. The batch size is set to 500.
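To make the initialization concrete, here is a minimal dependency-free sketch of He (normal) initialization and a fully connected forward pass. The hidden-layer widths and the choice of plain ReLU are hypothetical, since the actual architecture is given in Fig. 2. The input size of 341 (2 channels × 17 symbols × 5 samples × 2 for I/Q, plus the launch power) and the output size of 20 follow from the description in the text. The Adam training loop is not shown.

```python
import math
import random


def he_init_layer(fan_in, fan_out, rng):
    """Weights drawn from N(0, 2 / fan_in) (He normal); biases set to 0."""
    std = math.sqrt(2.0 / fan_in)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
    return weights, [0.0] * fan_out


def forward(layers, x):
    """Fully connected forward pass with ReLU on every layer except the last."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


rng = random.Random(0)
sizes = [341, 256, 256, 20]  # hidden widths (256) are hypothetical
layers = [he_init_layer(i, o, rng) for i, o in zip(sizes[:-1], sizes[1:])]
output = forward(layers, [1.0] * 341)
```

In a PyTorch implementation, the same choices would typically map to a Kaiming (He) initializer on each linear layer, zero-initialized biases, the Adam optimizer with a learning rate of 0.001, and an MSE loss.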

IV. DEMONSTRATION AND RESULTS
In the simulation, the dispersion and nonlinearity coefficients are kept constant, while the transmission distance and power are changed to control the accumulated dispersion and the nonlinear intensity. To prove the validity of the simulation by the DNN, a comparison between the optical signals generated by the SSFM- and DNN-based methods is made in multiple dimensions, including constellations, optical waveforms, spectra, and the normalized MSE. The constellations after DBP compensation are plotted to present the characteristics of dispersion and nonlinearity, which can verify the accuracy of the channel transfer function modeled by the DNN [2]. Optical waveforms and spectra demonstrate the accuracy of the simulation by the DNN in the time and frequency domains. The normalized MSE evaluates the similarity between the two simulation methods quantitatively. MSE denotes the average of the squares of the amplitude errors, i.e., the average squared deviation between the amplitude values of SSFM-generated and DNN-generated waveforms or spectra. Considering that optical communication systems were simulated with different optical launch powers and that the absolute MSE may increase with the power values, the normalized MSE rather than the absolute one is adopted [1]. In the definition of the normalized MSE, denoted MSE_nor, $m$ is the sample size, $y$ is the output label signal, and $\hat{y}$ is the output signal generated by the DNN. As mentioned in [1], [2], the acceptable upper limit of MSE_nor is set to 0.02. To model the optical fiber channel by the DNN, the data set is built by collecting the channel input and output when simulating via SSFM. The input vector, i.e., the training sample for the DNN, is generated from the fiber input data. It is then fed to the DNN, aiming to get an output that approximates the corresponding optical signals at the fiber output. The training data set size is $1 \times 10^{6}$.
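As a rough illustration of the metric, the sketch below computes a normalized MSE between a reference amplitude sequence and a predicted one. The normalization used here, error energy divided by reference energy, is one common choice and is an assumption; the exact definition adopted from [1] may differ. The function name is illustrative.

```python
def normalized_mse(reference, predicted):
    """Sum of squared amplitude errors divided by the reference energy."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    error_energy = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    reference_energy = sum(r ** 2 for r in reference)
    if reference_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    return error_energy / reference_energy


# Toy amplitudes standing in for SSFM-generated and DNN-generated samples.
ssfm_amplitudes = [1.0, 2.0, 3.0]
dnn_amplitudes = [1.1, 1.9, 3.0]
mse_nor = normalized_mse(ssfm_amplitudes, dnn_amplitudes)
acceptable = mse_nor < 0.02  # upper limit quoted in the text
```

With this normalization, scaling both signals by the same factor leaves the result unchanged, which is the property that motivates using a normalized rather than absolute MSE across launch powers.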

A. The Nonlinear Fiber Channel Modeling Capability of DNN
We first studied the capability of the DNN to model the nonlinear fiber channel. As a demonstration, WDM systems whose setup is shown in Fig. 1 were simulated. Then we trained DNNs with the corresponding data sets. For comparison purposes, BiLSTM, the DL algorithm that has been employed in single-channel modeling [1], is also implemented in PyTorch. The input size of the BiLSTM is 20 (5 × 2 × 2), which means that a symbol of channel 1 is concatenated with a symbol of channel 2. Every 17 adjacent symbols then constitute a group. These groups are fed into the BiLSTM in chronological order. The hidden size is 100. Finally, the output of the BiLSTM is mapped by a fully connected layer with 20 neurons into 2 symbols, one for channel 1 and the other for channel 2.
The losses of the DNN and the BiLSTM during training are shown in Fig. 3. As Fig. 3 shows, both losses decrease rapidly. However, the DNN converges faster. This result is quite different from that of Wang's work, in which the BiLSTM converges faster [1]. This may be attributed to the limited capability of the BiLSTM to learn the XPM effect of the fiber channel.
After the DNN and the BiLSTM are trained, the constellations after DBP compensation of SSFM-generated signals and DNN/BiLSTM-generated signals are plotted for comparison, as shown in Fig. 4. Note that the system transmission distance is 80 km, the optical launch power of the two channels is 4 dBm, and the format for both channels is 16QAM. The constellations of DNN-generated signals are very similar to those of SSFM-generated signals for both channels, meaning that the DNN can mimic the effects of CD, SPM, XPM, and EDFA ASE noise to approximate the transfer function of the nonlinear fiber channel. However, the constellations of BiLSTM-generated signals are not very similar to those of SSFM-generated signals. Thus, it can be concluded that the BiLSTM cannot approximate the transfer function of the nonlinear fiber channel, which accounts for both SPM and XPM, as well as the DNN does.
For a more diverse and accurate demonstration, Fig. 5 shows the amplitudes of the optical waveforms and spectra of SSFM-generated signals and DNN-generated signals. Note that the system transmission distance is 240 km, the optical launch power is 0 dBm, and the format for both channels is again 16QAM. Considering that the two channels have the same transmission distance, optical launch power, and modulation format, we present the results of only one channel, as shown in Fig. 5. It can be observed that the two waveforms are virtually identical from an overall perspective. Even after zooming in, they are still substantially identical. MSE_nor measures the distance between two waveforms or spectra; therefore, we calculated the MSE_nor values for quantitative analysis. For channel 1, MSE_nor in the time domain is 0.0032, which is the same as that of channel 2. Both values are much less than the upper limit of 0.02. The situation is very similar for the optical spectra. Fig. 5 also shows the spectra of SSFM-generated signals and DNN-generated signals within 30 GHz, which equals the transmission baud rate. The two spectra are nearly the same in both the overall view and the enlarged view. The MSE_nor in the frequency domain is 0.0004 for channel 1 and 0.0005 for channel 2, which is also far below 0.02. The waveforms (or spectra) of SSFM-generated signals and DNN-generated signals overlap closely, meaning that the DNN has learned the time- and frequency-domain characteristics of the signals. That is to say, the fiber channel has been modeled accurately by the DNN.

B. Generalization of DNN
In this section, the generalization of the trained DNN is also studied. The ability of DNN to generalize means how good the trained neural network is when applying the learnt information from training data sets to make accurate predictions on new, previously unseen data. This is quite important for the practical applications of fiber channel modeling. We analyzed the generalization of our model for the input with different modulation formats and wavelength schemes.
The training data set contains only 16QAM symbols. Therefore, other modulation formats, including QPSK, 8PSK, and 8QAM, are taken into account in our test data set. Fig. 6 shows the constellations after DBP compensation of SSFM-generated signals and DNN-generated signals with different modulation formats. In Fig. 6, the left ones are quite similar to the right ones, indicating visually that the DNN model has good generalization for modulation formats.
We also calculate the MSE_nor values between the SSFM-generated signals and DNN-generated signals in both the time and frequency domains, as shown in Table II. In addition, different wavelength schemes are considered in the test data set, and the corresponding MSE_nor values are shown in Table II as well. Note that, when testing the generalization for different modulation formats, we used the DNN model trained on the data set built under the case (transmission distance = 80 km, optical launch power = 4 dBm, format = '16QAM', λ1 = 1549.32 nm, and λ2 = 1550.12 nm). To demonstrate that a DNN model trained under a different case generalizes well too, we apply the DNN model trained on the data set built under another case (transmission distance = 160 km, optical launch power = 4 dBm, format = '16QAM', λ1 = 1549.32 nm, and λ2 = 1550.12 nm) to test the generalization for different wavelength schemes. As illustrated in Table II, the MSE_nor values are all much less than the upper limit of 0.02. Although the MSE_nor values increase as the wavelengths move farther from 1549.32 nm and 1550.12 nm, they are still far less than 0.02. Therefore, we believe that the DNN model generalizes well to inputs with different modulation formats and wavelength schemes.
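As a quick check on the training wavelength scheme, the short sketch below converts the two carrier wavelengths to optical frequencies and computes their spacing. The helper name is illustrative; only the two wavelengths come from the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_nm_to_thz(wavelength_nm):
    """Convert a vacuum wavelength in nm to an optical frequency in THz."""
    return C / (wavelength_nm * 1e-9) / 1e12


f1_thz = wavelength_nm_to_thz(1549.32)
f2_thz = wavelength_nm_to_thz(1550.12)
spacing_ghz = (f1_thz - f2_thz) * 1e3  # roughly 100 GHz
```

The two carriers sit close to 193.5 THz and 193.4 THz, so the 0.8 nm separation corresponds to a channel spacing of about 100 GHz.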

C. Time Complexity
Lastly, the time complexity of the DNN-based method is analyzed. In the SSFM-based method, the running time of simulation can be very long because the FFT operation, which is the principal calculation amount of SSFM [2], is performed repeatedly. In the DNN-based method, however, no FFT operation is performed, and only multiplication between neurons is involved. Therefore, the major advantage of DNN-based method compared with SSFM-based method is its low time complexity.
We measured the computing time of the SSFM-based fiber model and the DNN-based fiber model. Note that the computing time covers only the simulation of the nonlinear fiber channel, excluding the time consumed by the transmitter, receiver, etc. For fairness, we ran the code of the SSFM-based method and the DNN-based method on the same device (Intel Core i5-10210U). The absolute computing time depends strongly on the hardware performance and computing resources. Therefore, it is reasonable to compare the relative computing times of the two methods. The computing time of the DNN-based method with $2^{16}$ symbols at 80 km is selected as the reference, meaning that all computing time data are normalized by this value. The advantage of the DNN-based method could be enlarged if the code were run on a graphics processing unit. However, the relative computing time on the same device, even though that device is a central processing unit, is deemed sufficient to illustrate the advantage of the DNN-based method in time complexity.
As shown in Fig. 7, as the transmission distance and symbol length increase, the computing time of the SSFM-based method grows rapidly. In contrast, the computing time of the DNN-based method rises slowly with the symbol length and stays almost constant when the transmission distance increases. It is worth mentioning that, in the best case shown in Fig. 7 (distance = 240 km, symbol length = $2^{18}$), the computing time is reduced by 96.5% with the DNN-based method relative to the SSFM-based method.
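To put the reported figure in perspective, the small sketch below converts a relative reduction in computing time into an equivalent speed-up factor. The function names are illustrative; only the 96.5% value comes from the text.

```python
def relative_reduction(t_baseline, t_new):
    """Fraction of the baseline computing time that is saved."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return 1.0 - t_new / t_baseline


def speedup_factor(reduction):
    """Speed-up implied by a relative reduction, e.g. 0.965 -> 1 / 0.035."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


speedup = speedup_factor(0.965)  # roughly 28.6
```

A 96.5% reduction therefore means the DNN-based model needs about 3.5% of the SSFM computing time in that case, i.e., it runs roughly 28.6 times faster.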

V. CONCLUSION
This work demonstrates the capability of DNN to model the nonlinear fiber channel. Based on the comprehensive analysis of the constellations after DBP compensation, optical waveforms, spectra, and the normalized MSE, it is concluded that the DNN can learn well the linear and nonlinear effects in the fiber channel, such as CD, attenuation, ASE noise, SPM, and XPM. Additionally, it is also found that DNN has a good generalization ability for modulation formats and wavelength schemes, which provides flexibilities and versatility for fiber channel modeling.
Besides, the simulation with DNN-based method can reduce the time complexity significantly (96.5%) compared to SSFM-based method. Therefore, the DNN is a good candidate to model the optical fiber channel in optical communication system simulations. This work is an initial exploration of the DNN-based method applied in the WDM system simulations. Considering that there is no limitation on the input signals of DNN proposed in this work, this DNN-based method is capable of modeling a more complicated communication system with more channels. However, a new input vector structure and parameters of DNN may be required for the reason that the inter-channel interferences become more complex. Moreover, since DNN can be viewed as a universal approximator for both linear and nonlinear functions [24], it has the latent capacity theoretically to approximate the channel transfer function with a longer transmission distance after being trained on the corresponding data set which is composed of the channel input and output signals. Thus, this DNN-based method has the potential to be extended to model a more complicated optical communication system that takes account of polarization effect or more wavelengths or has a longer transmission distance.