Parameter Selection of Direct Modulation Semiconductor Laser for Shaping Current Based on Convolutional Neural Network

Shaping-current technology can efficiently and inexpensively suppress the relaxation oscillations (ROs) of the direct modulation semiconductor laser (DML) for high-performance optical systems. Parameter selection is the key problem in precisely constructing the injection current that yields the desired output waveform. A novel framework based on a convolutional neural network (CNN) is proposed to predict the shaped-current parameters, avoiding the time-consuming and computationally complex analytical solutions. In the network training, batch and min-max normalizations are adopted to optimize the neural networks, aiming to accelerate convergence and improve their approximation ability. The trained inverse CNN, fed with samples of the desired DML output waveform, is used to select the parameters for constructing the injection current. The trained forward CNN then verifies the validity of the selected parameters against the output waveform, establishing a unique correspondence between them. Simulation results agree closely with the theoretical values and show that the CNN models provide a powerful tool to select the parameters of the shaped current both accurately and quickly.


I. INTRODUCTION
SEMICONDUCTOR laser is a key element in optical communication transmission systems whose performance can determine the quality of the whole system [1], [2]. To transmit optical communication information, semiconductor lasers usually adopt either external modulation (EM) or direct modulation (DM). EM modulates the output intensity of the semiconductor laser with an external modulator driven by the external signal. This method has the advantages of a high transmission bit rate and a long transmission distance, but it needs expensive external equipment [3]. DM alters the injected current of the laser to generate a time-dependent output in the optical intensity, which is simple and low cost. However, the relaxation oscillations (ROs) produced by this method increase the bit error rate in optical communication [4]. To overcome the ROs, many techniques have been presented, including external electrical circuits [5], external optical feedback [6], light injection [7], and modification of the physical laser structure [8]. Compared with these methods, shaping the current is easier to achieve and cheaper: the ROs are suppressed by injecting a switching shaped current into the direct modulation laser (DML) [3], [9], [10]. However, the switching shaped current is discontinuous.
To overcome the discontinuity, our team has presented an improved continuous shaping current [11], which is a more attractive solution owing to its flexibility. The ROs in the desired output light pulse are determined by the parameters of the shaped current, which are difficult to obtain. In the traditional numerical method, a random set of parameters is drawn from the parameter ranges, the output waveform is calculated and compared with the target waveform, and the mean squared error between them is computed; a parameter set is output once the target precision or the maximum number of iterations is reached. Because this method is limited by the iterations, it tends to become trapped in local optima and requires a huge amount of calculation. Particularly for a semiconductor laser, which involves nonlinear processes and multi-physics interactions, the traditional numerical method is very time-consuming.
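The iterative procedure described above can be sketched as a simple random search. The function below and its toy objective are illustrative stand-ins, not the paper's laser model: in practice the objective would integrate the rate equations and compare the resulting output waveform with the target.

```python
import random

def random_search(objective, ranges, target_mse=1e-4, max_iters=10_000, seed=0):
    """Randomly sample parameter sets within `ranges` and keep the best one.

    `objective` maps a parameter tuple to the MSE against the target
    waveform; the loop stops once `target_mse` or `max_iters` is reached.
    """
    rng = random.Random(seed)
    best_params, best_mse = None, float("inf")
    for _ in range(max_iters):
        params = tuple(rng.uniform(lo, hi) for lo, hi in ranges)
        err = objective(params)
        if err < best_mse:
            best_params, best_mse = params, err
        if best_mse <= target_mse:
            break
    return best_params, best_mse

# Toy stand-in objective with its optimum at (a1, a2) = (0.2, 0.15); a real
# run would instead simulate the laser and compare output waveforms.
toy_objective = lambda p: (p[0] - 0.2) ** 2 + (p[1] - 0.15) ** 2
params, best = random_search(toy_objective, [(0.13, 0.33), (0.1, 0.23)])
```

Even on this trivial objective the search needs many evaluations; with an expensive laser simulation inside the loop, the cost becomes the bottleneck the paper describes.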
Because of the flexibility and learning ability of artificial neural networks (ANNs), they have been applied to inverse design [12], [13] and can tackle the non-uniqueness problem in training. For the inverse design of semiconductor lasers, P. Feng and Z. Ma have proposed a method that combines an ANN with particle swarm optimization to improve the accuracy and speed compared with the traditional method [14], [15]. As a special neural network, the convolutional neural network (CNN) introduces convolution and pooling operations to generate deep features, thereby enhancing the ability of pattern recognition [16]. CNNs have been used for signal extraction [17] and analysis [18] in photonics, where they are reliable and effective. Against this background, we propose a CNN-based scheme to select the parameters for constructing the shaped current waveforms.
In this paper, the novelty of the proposed solution is to use CNN models in place of the DML to construct the complicated relationship between the injection current and the output waveform through forward and inverse training. Initially, the theoretical injection current and desired output waveform are derived from the rate equations to generate training samples. Batch and min-max normalizations are embedded in the neural networks to obtain better performance. Afterward, the CNN models are trained to select and verify the parameters, while the loss function is used to evaluate the performance of the CNNs. In this way, the CNN avoids the time consumption and huge calculation of the traditional numerical method. Moreover, a series of input tests for the CNNs indicates that the proposed scheme is useful for the shaping-current technology and has the advantages of low cost, high efficiency, and accuracy.

A. Dimensionless Rate Equations
The rate equations are an important theoretical tool for the study of semiconductor lasers. The laser is assumed to be a single-mode laser, in which case its dynamics can be described by a system of two coupled ordinary differential equations for the carrier density and photon density [4]:

dS/dt = [Γ G(N(t), S(t)) − γc] S, (1)

dN/dt = (J0 + J(t))/(e d) − γs N − G(N(t), S(t)) S, (2)

where N and S represent the carrier density and photon density, J0 is the bias current, J(t) is the modulation around the bias, G(N(t), S(t)) is the optical gain coefficient including nonlinear effects, Γ is the confinement factor, e is the electron charge, d is the thickness of the active layer, and γc and γs are the decay rates of the photons and the spontaneous carriers, respectively. For numerical calculation and analysis it is convenient to work with dimensionless rate equations. In the absence of modulation (J(t) = 0), the nondimensionalization is carried out at the fixed point (S0, N0) of the laser. The nonlinear gain G(N(t), S(t)) around this point is expanded to first order as

G(N, S) ≈ G0 + Gn (N − N0) + Gp (S − S0), (3)

where G0 is the optical gain constant, Gn = ∂G/∂N, and Gp = ∂G/∂S. Setting dS/dt = dN/dt = 0 and substituting (S0, N0) into (1) and (2) gives the steady-state relations

Γ G0 = γc, (4)

J0/(e d) = γs N0 + G0 S0. (5)

The dimensionless photon density, carrier density, and current are defined by

s = S/S0, (6)

n = Γ Gn (N − N0)/ωR, (7)

J(τ) = Γ Gn J(t)/(e d ωR²), (8)

and the dimensionless gain g = G(t)/G0 is equivalent to

g = 1 + [ωR n − γp (s − 1)]/γc, (9)

where γn = Gn S0 and γp = −Γ Gp S0. The angular frequency of the relaxation oscillation is ωR = √(γc γn + γs γp) and the dimensionless time is τ = t ωR. Normalizing the decay rates by ωR (γx → γx/ωR), the dimensionless photon and carrier densities then satisfy

ds/dτ = [n − γp (s − 1)] s, (10)

dn/dτ = J(τ) − γs n − (1 − γs γp)(s − 1) − γn n s + γn γp (s − 1) s. (11)

These equations are used to obtain the input current and the corresponding output light pulse of the DML. The dimensionless quantities and parameters are listed in Table I, based on the work of Lucas Illing [4].
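As a rough illustration, the dimensionless rate equations can be integrated with a fixed-step RK4 scheme. The decay-rate values `G_S`, `G_N`, `G_P` below are illustrative assumptions, not the values of Table I; a step in the dimensionless current then produces the relaxation oscillations discussed in the introduction.

```python
import numpy as np

# Decay rates gamma_s, gamma_n, gamma_p normalized by w_R
# (illustrative values, not the paper's Table I).
G_S, G_N, G_P = 0.01, 0.1, 0.01

def derivs(s, n, J):
    """Dimensionless single-mode rate equations, eqs. (10)-(11)."""
    ds = (n - G_P * (s - 1.0)) * s
    dn = (J - G_S * n - (1.0 - G_S * G_P) * (s - 1.0)
          - G_N * n * s + G_N * G_P * (s - 1.0) * s)
    return ds, dn

def integrate(J_of_tau, tau_end=60.0, h=0.01):
    """Fixed-step RK4 starting from the J = 0 fixed point (s, n) = (1, 0)."""
    steps = int(tau_end / h)
    s, n = 1.0, 0.0
    out = np.empty(steps)
    for i in range(steps):
        t = i * h
        k1 = derivs(s, n, J_of_tau(t))
        k2 = derivs(s + 0.5*h*k1[0], n + 0.5*h*k1[1], J_of_tau(t + 0.5*h))
        k3 = derivs(s + 0.5*h*k2[0], n + 0.5*h*k2[1], J_of_tau(t + 0.5*h))
        k4 = derivs(s + h*k3[0], n + h*k3[1], J_of_tau(t + h))
        s += h/6.0 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        n += h/6.0 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        out[i] = s
    return out

# A step in the injected current excites relaxation oscillations in s(tau)
# that overshoot and slowly ring down toward the new steady level.
s_trace = integrate(lambda t: 0.5 if t > 1.0 else 0.0)
```

With these parameter values a current step J = 0.5 settles toward s = 1.5 after a strongly overshooting oscillatory transient, which is exactly the behavior the shaped current is designed to avoid.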

B. Shaping Current Technology Based on Dimensionless Rate Equations
The laser is treated as a nonlinear system. Writing the output waveform as s = e^y(τ) gives ds/dτ = e^y(τ) y′(τ); combining this with ds/dτ in formula (10), n(τ) can be derived as the following expression:

n(τ) = y′(τ) + γp (e^y(τ) − 1), (12)

where the decay rates are normalized by ωR. Solving (12) together with the dimensionless carrier-density equation yields the driving current (J_M). The relationship between driving current and output waveform can be expressed as

J_M(τ) = n′(τ) + γs n(τ) + (1 − γs γp)(e^y(τ) − 1) + γn n(τ) e^y(τ) − γn γp (e^y(τ) − 1) e^y(τ). (13)

This formula relates a given waveform s, via y(τ), to the injection current (J_S = 3 + J + J_M ≥ 0; the relaxation oscillations in the desired output waveform can be suppressed by injecting J_S) that causes it, which involves a complex nonlinear problem. The desired output waveform s is selected as an approximately rectangular pulse, and because of the symmetry of a rectangular wave, s is divided into two symmetrical parts: the rising edge (S_on) and the falling edge (S_off).
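A minimal numerical sketch of relations (12) and (13): given a target log-waveform y(τ), the modulation current J_M follows by differentiation. The decay-rate values are illustrative assumptions, not the paper's; for a constant target s = 1.5 the expression collapses to the d.c. value J_M = 0.5, a useful sanity check against the steady state of the rate equations.

```python
import numpy as np

# Decay rates normalized by w_R (illustrative values, not the paper's Table I).
G_S, G_N, G_P = 0.01, 0.1, 0.01

def shaping_current(y, h):
    """Map a target log-waveform y(tau) (with s = exp(y)) to the modulation
    current J_M via relations (12) and (13), using finite differences."""
    s = np.exp(y)
    dy = np.gradient(y, h)
    n = dy + G_P * (s - 1.0)                         # eq. (12)
    dn = np.gradient(n, h)
    jm = (dn + G_S * n + (1.0 - G_S * G_P) * (s - 1.0)
          + G_N * n * s - G_N * G_P * (s - 1.0) * s)  # eq. (13)
    return jm

h = 0.01
tau = np.arange(0.0, 20.0, h)
# Constant target s = 1.5: the formula reduces to the d.c. value J_M = 0.5.
jm_const = shaping_current(np.full_like(tau, np.log(1.5)), h)
```

For a time-varying y(τ) the same function returns the pre-distorted current profile whose overshoots pre-compensate the laser's relaxation dynamics.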
To make s a steady approximation to a rectangular pulse, y(τ) is constructed from a high-order Fourier series defined as in [11], with N = 2; the constant parameters a_k determine whether s approximates an ideal rectangular pulse. P(x) is a seventh-order polynomial, which on the rising edge is given by (15), where x = τ/T and T is the dimensionless transition time. To minimize the transition time between the rising and falling edges of s, an f-function is introduced, in which the value of a_0 is 2. In this paper we examine only S_on. The parameters (a_1, a_2) are the key to constructing the injection current, since the current can differ greatly even for small changes in them. Selecting the parameters of the injection current according to the desired output waveform is a multi-parameter nonlinear problem [19]. Selecting the parameters and obtaining J_S with numerical methods remains challenging in terms of time cost and computational burden. Therefore, we propose a CNN-based method to acquire the parameters and the injection current quickly and accurately. In addition, the forward neural networks are trained in advance to verify the selected parameters.

A. Artificial Neural Network Architectures for Parameter Verification
To obtain an ANN that can predict the S_on corresponding to the parameters (a_1, a_2) and J_S, a CNN and a DNN are constructed, as shown in Fig. 1. Both architectures contain 203 input units, which receive the parameters (a_1, a_2) and the sampling points of J_S.
The values of the parameters are randomly selected within the specified ranges a_1 ∈ [0.13, 0.33] and a_2 ∈ [0.1, 0.23], which are obtained from the shaping-current derivation based on the single-mode rate equations and from extensive parameter scans using calculation software.
The DNN architecture consists of an input layer, 20 hidden layers, and an output layer; its hyper-parameter settings are shown in Table II. The DNN is trained on the sampled data with the back-propagation algorithm [20], [21]. The hidden layers and output layer process the signal, and the final result (S_on) is produced by the output-layer neurons [21].
The CNN architecture is composed of convolutional, pooling, and fully connected layers and is a kind of neural network with local connections and weight sharing [22]. The filters in the convolutional layers scan the input data to extract features; the size we employ is 1 × 3 [18]. After each convolutional layer, maximum pooling is applied with a 2 × 2 kernel, which reduces the size of the feature map by a factor of two [18], [23]. Before training, the input signal containing the parameters (a_1, a_2) and J_S is reshaped into a 1 × 203 array, so the filters and maximum pooling used in the CNN are two dimensional. In addition, zero padding is applied to all convolutional and pooling layers, whose strides are set to 1 and 2, respectively. Table III presents the hyper-parameter settings of the CNN architecture.
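The convolution and pooling operations described above can be sketched in plain NumPy. This is an illustrative forward pass with hand-picked sizes, not the trained CNN1: a "same"-padded width-3 filter with stride 1, a ReLU, and a pooling stage that halves the length, applied to a 1 × 203 input.

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """'Same' 1-D convolution (cross-correlation) with zero padding,
    mirroring the paper's 1x3 filters with stride 1."""
    k = len(kernel)
    xp = np.pad(x, k // 2)
    return np.array([np.dot(xp[i:i + k], kernel)
                     for i in range(0, len(x), stride)])

def max_pool1d(x, size=2, stride=2):
    """Max pooling that reduces the length by a factor of two,
    as in the paper's pooling layers."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, stride)])

def relu(x):
    return np.maximum(x, 0.0)

# One conv -> ReLU -> pool stage on a 1x203 input
# (the two parameters a1, a2 followed by 201 current samples).
x = np.linspace(0.0, 1.0, 203)
feat = max_pool1d(relu(conv1d(x, np.array([-1.0, 0.0, 1.0]))))
```

The chosen [-1, 0, 1] kernel acts as a finite-difference edge detector, a simple example of the local features a trained filter bank would learn.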
The two models are trained to map the semiconductor laser behavior, producing S_on in response to given parameters (a_1, a_2) and injection current. The CNN model is denoted CNN1. During training, the loss function converges faster and stabilizes at a smaller value when the DNN and CNN adopt the hyperbolic tangent and rectified linear unit activation functions, respectively. Therefore, to better realize the forward modeling of the semiconductor laser, the two models employ different activation functions.

B. The Training of Neural Networks
49999 data samples are generated before the training of the neural networks: 44999 samples are used for training, while the validation and test sets contain 2500 samples each. In this paper, the loss function takes the form of the mean squared error (MSE), which can be written as

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)², (17)

where yᵢ is the i-th theoretical value, ŷᵢ is the i-th predicted value, and n is the sample size. During training, the trainable parameters are continuously updated with the goal of achieving the smallest possible MSE. The dropout method is adopted to randomly discard neurons, which avoids overfitting and improves the generalization ability of the networks; each neuron is retained with a fixed probability of 0.7 in the hidden layers of the DNN and the fully connected layer of the CNN. The success of a neural network can be verified by comparing its predicted values with the theoretical values of the shaping-current technology.
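A minimal sketch of the MSE loss of (17) and of dropout with keep probability 0.7. The inverted-scaling variant (dividing by the keep probability during training) is an implementation choice on our part, made so that no rescaling is needed at inference time.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, eq. (17)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def dropout(a, keep_prob=0.7, rng=None, training=True):
    """Inverted dropout: keep each activation with probability `keep_prob`
    and rescale the survivors, so inference uses the layer unchanged."""
    if not training:
        return a
    rng = rng or np.random.default_rng(0)
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

# Demo: thin a block of activations; surviving entries are scaled by 1/0.7.
activations = np.ones((4, 5))
thinned = dropout(activations)
```

At evaluation time the same layer is called with `training=False`, matching the convention that dropout is active only during training.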

C. Neural Network Optimization for Parameter Verification
Since the DNN model includes many hidden layers, the data distribution changes during training and the convergence speed of the model is affected to some extent. Thus, a batch normalization (BN) layer is added before the input layer and before each hidden layer of the DNN. The input data of these layers are normalized to a standard normal distribution, so that the values are transformed into the input-sensitive region. The net input values processed by BN can be defined as

ẑ⁽ˡ⁾ = (z⁽ˡ⁾ − E[z⁽ˡ⁾]) / √(var(z⁽ˡ⁾) + ε), (18)

where z⁽ˡ⁾ is the net input value of layer l, E[z⁽ˡ⁾] is the mean of z⁽ˡ⁾, var(z⁽ˡ⁾) is its variance, and ε is a small constant that prevents the denominator from being zero. At the same time, so as not to harm the representational capacity of the network, scaling and shifting are applied to restore the value range:

z̃⁽ˡ⁾ = γ ẑ⁽ˡ⁾ + β, (19)

where γ is the scaling parameter and β is the shifting parameter; both are adjusted continuously during the training of the neural network.
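A per-feature batch-normalization sketch corresponding to (18) and (19). Here γ and β are scalars for brevity, whereas in a real layer they are learned per feature.

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch per feature (eq. 18), then scale/shift (eq. 19)."""
    z_hat = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    return gamma * z_hat + beta

# Two features on very different scales are mapped to zero mean, unit variance.
batch = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
out = batch_norm(batch)
```

After normalization both columns have (approximately) zero mean and unit standard deviation, which is what keeps the net inputs in the activation function's sensitive region.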
To make the training easier, min-max normalization is adopted to bring the input data of the inverse CNN onto the same scale. Since most gradient directions then approximate the optimal search direction, the training efficiency is greatly improved [22]. The min-max normalization converts the inputs of the input layer to the range [0, 1] through the transformation

x̂ᵢ = (xᵢ − x_min)/(x_max − x_min), (20)

where xᵢ is the input data and x_max and x_min are the maximum and minimum values of the feature xᵢ over all data samples.

Fig. 2 compares the training and validation loss curves of the DNN and CNN1: the MSE of CNN1 converges much faster and to a much smaller value. To achieve satisfactory forward modeling, the MSE between the theoretical and predicted S_on should be less than 0.001. The training curve of the DNN drops drastically and gradually stabilizes around 10⁻⁴, while the training curve of CNN1 steadily decreases and finally stabilizes around 10⁻⁶.
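The min-max transformation above can be sketched as follows. The option of passing in precomputed extrema is our addition, reflecting the usual practice of scaling validation and test data with the training-set statistics.

```python
import numpy as np

def min_max(x, x_min=None, x_max=None):
    """Rescale features to [0, 1], eq. (20). The training-set extrema can be
    passed in so that validation/test data are scaled identically."""
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return (x - x_min) / (x_max - x_min)

scaled = min_max([2.0, 4.0, 6.0])  # endpoints map to 0 and 1
```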
In short, CNN1 should predict the S_on corresponding to each parameter combination and J_S more accurately.
To further demonstrate the consistency between the theoretical values of S_on and the values predicted by the neural networks, a sample is selected randomly from the test dataset. The DNN and CNN1 are fed the same J_S, a_1, and a_2, but Fig. 3(a) and (b) illustrate different outputs. Fig. 3 indicates the favorable approximation ability of both the DNN and CNN1, and the ROs are completely suppressed in S_on. In Fig. 3(b), the stars coincide closely with the blue dotted line, whereas in Fig. 3(a) the stars fluctuate around the theoretical dotted line. The MSEs (calculated by (17)) between the predicted and theoretical values in Fig. 3(a) and (b) are 2.17 × 10⁻⁴ and 6.44 × 10⁻⁶, respectively.
Tables IV and V show partial values of Fig. 3. S 0 represents the theoretical value of the output waveform in the rising edge in Fig. 3. S 1 and S 2 are output waveform values predicted by DNN and CNN1, which are indicated by black squares in Fig. 3(a) and (b). E 1 and E 2 are the absolute errors between the predicted value and theoretical value. RE 1 and RE 2 are the relative errors between the theoretical and the predicted values.
As shown in Tables IV and V, E_2 is much smaller than E_1, and likewise RE_2 is much smaller than RE_1. The prediction time for a sample by the DNN is 0.0473 s, which is 5.5 times that of CNN1. In comparison, the forward CNN achieves better performance than the DNN in terms of speed and accuracy while requiring fewer optimization measures. Therefore, CNN1 is adopted to verify the parameters selected by the inverse CNN.
To demonstrate the applicability of CNN1, 100 sets of parameters and J_S with Gaussian noise are injected into it. The MSE between the predicted and theoretical S_on is 2.7 × 10⁻⁴, which still meets the quantitative criterion. Fig. 4 shows the comparison between the theoretical and predicted values of the semiconductor laser output waveform. The trends of the red and black lines are similar, which illustrates the robustness of CNN1.
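The noise test described above can be mimicked with a small evaluation loop. The model here is a stand-in identity map (so the measured MSE simply equals the injected noise power); with the real CNN1 the same loop would report its robustness to input noise.

```python
import numpy as np

rng = np.random.default_rng(42)

def robustness_mse(model, x_clean, y_true, noise_std=0.01, trials=100):
    """Average prediction MSE of `model` over noisy copies of the input,
    mirroring the 100-sample Gaussian-noise test described above."""
    errs = []
    for _ in range(trials):
        x_noisy = x_clean + rng.normal(0.0, noise_std, x_clean.shape)
        errs.append(np.mean((model(x_noisy) - y_true) ** 2))
    return float(np.mean(errs))

# Stand-in 'model': identity, so the MSE is close to the noise variance 1e-4.
x = np.zeros(201)
m = robustness_mse(lambda v: v, x, x, noise_std=0.01)
```

A model that filters the noise (as a trained CNN tends to) would report an MSE below the injected noise power; a brittle model would report one well above it.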

A. Parameter Selection of Shaping Current Technology
An inverse CNN model is utilized to realize parameter selection (producing a_1, a_2 and the sampling points of J_S in response to S_on) for constructing the shaped current of semiconductor lasers, as shown in Fig. 5. This CNN model has 201 input units. To address the difficulty of training, min-max normalization is applied to the input data of the inverse CNN, which is denoted CNN2. The hyper-parameter settings of CNN2 are the same as those of CNN1, as shown in Table III, as are the filter size, pooling method, padding, and stride settings. For CNN2, the input signal (S_on) is reshaped into a 1 × 201 array. Fig. 6 shows the MSEs of CNN2 (the differences between the theoretical and predicted values of J_S and of the parameters) during training and validation; their tendencies are similar. The MSEs decline sharply in the first few epochs and finally converge around 10⁻³ and 10⁻². We can conclude that CNN2 has no overfitting problem, since the gap between the training and validation MSEs is small. To ensure the validity of CNN2, the model should meet the quantitative criterion of an MSE less than 0.04.
Moreover, to prove the effectiveness and stability of CNN2, 100 desired S_on randomly selected from the test set are fed into it. The MSEs (calculated by (17)) between the predicted parameters (a_1, a_2) and the theoretical values derived from these 100 samples are 4.12 × 10⁻⁵ and 2.21 × 10⁻⁵, respectively. The MSE between the theoretical and predicted J_S derived from the 100 samples is 0.003, which illustrates the approximation ability of CNN2.
To indicate the reliability of CNN2 more intuitively, four S_on samples are selected from the test set, and the predicted and theoretical values are presented. Fig. 7 compares the predicted J_S generated by CNN2 with the theoretical values of the shaping-current technology. The MSEs (calculated by (17)) between the theoretical and predicted values of J_S are 0.00268, 0.00216, 0.0045, and 0.005, which exhibit the good approximation ability of CNN2 in the inverse modeling of the semiconductor laser. The absolute and relative errors in Table VI are calculated from sample 1 and concretely show the difference between the theoretical and predicted parameters.
As shown in Fig. 7 and Table VI, the parameters and J S predicted by CNN2 are highly coincident with the theoretical values, so we conclude that CNN2 can overcome the challenges in the parameter selection of shaping current technology.
To confirm the robustness of the inverse CNN, relative Gaussian noise is added to S_on. 100 S_on samples with noise are randomly selected from the test set and put into the trained inverse CNN. The MSEs between the theoretical and predicted values of the parameters a_1, a_2 and of J_S are 0.00086, 0.0011, and 0.0151, which still satisfy the evaluation criterion. Fig. 8 compares the J_S resulting from the noisy S_on, which exhibits the good prediction ability of the inverse CNN.

B. The Validity Verification of the Selected Parameters
Given the generalization capacity of the CNN models, they are combined to select and verify the parameters of the shaping current. A set of desired values of S_on (different from the training, validation, and test sets) is input to CNN2, and the predicted parameters and J_S are then fed into CNN1. Fig. 9 depicts the J_S predicted by CNN2; the predicted a_1 and a_2 are 0.3296627 and 0.1970667. These are then input to CNN1 to acquire a sample of S_on. As shown in Fig. 10, the values predicted by CNN1 and the desired S_on agree closely, and the MSE (calculated from (17)) between them is 1.68 × 10⁻⁶. This also demonstrates the high approximation capabilities of the two CNN models.
Table VII presents some of the data in Fig. 10. S_3 is the desired output-waveform value, and S_4 is the S_on value obtained by inputting the predicted J_S and parameters into the trained CNN1. E_3 and RE_3 (calculated from the difference between S_3 and S_4) are the absolute and relative errors, respectively. As can be seen
from Table VII, the values of E_3 and RE_3 are small, which further proves the approximation and generalization abilities of the CNN models. Comparing the several groups of experimental results, we can conclude that the CNNs can accurately realize the parameter selection of the shaping-current technology with fewer resources.

V. CONCLUSION
For the construction of the shaped current, a novel parameter-selection scheme based on CNNs trained with data samples from the dimensionless rate equations has been proposed. The BN and min-max normalizations successfully resolve the convergence difficulties in training and reduce the required size of the neural networks. With the normalized data, the proposed inverse CNN performs well in realizing the parameter selection for the shaping-current technology. Moreover, the trained forward CNN is efficiently utilized to verify the parameters corresponding to S_on, and it is faster and more accurate than the DNN. Combining the results of the two CNNs, the selected parameters match the theoretical values well, and the generalization ability of the CNNs is demonstrated. The simulation results indicate that the trained CNNs have strong computing ability and high accuracy for realizing parameter selection and verification.
The proposed scheme offers insight into the development of shaping-current technology and can serve as a practical design guideline for suppressing the relaxation oscillations of DMLs. A potential future study is to use the CNN-predicted injection current to modulate a real semiconductor laser and analyze the output waveform.