Performance Enhancement of a Delay-Based Reservoir Computing System by Using Gradient Boosting Technology

Gradient boosting technology has been proven to be an effective scheme for enhancing the performance of spatially distributed reservoir computing (RC) systems. In this work, a gradient boosting scheme combining two reservoirs is proposed and numerically investigated in a delay-based RC system. The original reservoir in the delay-based RC system is a vertical-cavity surface-emitting laser (VCSEL) under polarization-rotated optical feedback (PR-OF), and it is trained on the desired output. A second VCSEL under PR-OF or polarization-preserved optical feedback (PP-OF) is added as an extra reservoir, which is trained on the remaining error of the original reservoir. Using the Santa-Fe time series prediction task and the 10th-order nonlinear autoregressive moving average (NARMA10) task, the performance of the delay-based RC system is evaluated before and after adding the extra reservoir, so that the effectiveness of gradient boosting in the delay-based RC system can be analyzed. The simulated results demonstrate that gradient boosting is effective in a delay-based RC system, and that the enhancement is more pronounced when a VCSEL with PR-OF is taken as the extra reservoir.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Vivek Kumar Sehgal.
With the vast amount of information and data available today, various efficient information processing technologies have been explored [1]. Artificial neural networks (ANNs), inspired by the way the human brain responds to external information, have attracted wide attention owing to their efficiency in processing complex tasks such as time series prediction and spoken digit recognition [2]-[4]. Since optical information processing offers parallelism and high speed, ANNs implemented in the optical domain are well suited to today's age of big data. As an important branch of photonic ANNs, photonic reservoir computing (RC) has attracted special attention due to its simple training procedure, high computational performance and fast processing rate [5]-[7]. Photonic RC falls into two categories: spatially distributed RC and delay-based RC. In 2014, Vandoorne et al. proposed a spatially distributed RC system composed of optical waveguides, splitters and combiners, and experimentally performed an XOR task and a header recognition task at rates up to 12.5 Gbit/s [8]. Very recently, Freiberger et al. demonstrated that the performance of such a spatially distributed photonic RC system can be further enhanced by combining two (or more) reservoirs in a gradient boosting approach [9], where only the first reservoir is trained on the desired output and the others are trained to correct the remaining errors of the preceding reservoirs. Although spatially distributed photonic RC systems possess good processing ability, they require a large number of physical devices as nodes, which makes practical implementation challenging because of the relatively high cost and the difficulty of collecting the node states.
In order to overcome these issues, photonic delay-based RC systems have been proposed and investigated, in which the reservoir is based on a single nonlinear node under time-delayed feedback [10]-[12]. Since semiconductor lasers (SLs) possess a relatively high relaxation oscillation frequency and can exhibit rich dynamical states under time-delayed optical feedback, they are among the most promising nonlinear nodes for high-speed photonic delay-based RC systems [13]-[18].
As one type of SL, vertical-cavity surface-emitting lasers (VCSELs) offer advantages such as small size, low threshold current, single-longitudinal-mode emission, and ease of integration [19], [20]. Moreover, two orthogonal polarization components (denoted X-PC and Y-PC) may coexist in a VCSEL under suitable operating parameters, so that richer nonlinear dynamics can be achieved under external perturbations. Therefore, in a photonic delay-based RC system, adopting a VCSEL as the reservoir helps map the input signal into a higher-dimensional state space [21]-[25], and better performance may be achieved. Vatin et al. theoretically and experimentally investigated an RC system based on a VCSEL under optical feedback, and the results demonstrated that the RC system yields better performance when the X-PC and Y-PC in the VCSEL are simultaneously excited [26].
At present, performance enhancement in delay-based RC systems depends almost entirely on optimizing the operating parameters or the structure of the reservoir. In order to further improve the performance of delay-based RC systems, new techniques need to be explored. Considering that gradient boosting has been proven effective for improving the performance of spatially distributed RC systems, in this work we examine whether it is also effective for enhancing the performance of delay-based RC systems. Furthermore, taking into account the advantages of the VCSEL, we take VCSELs with optical feedback as the reservoirs in the time-delayed RC system. Gradient boosting is implemented with two VCSEL-based reservoirs: the original reservoir, trained on the desired output, and the extra reservoir, trained to correct the remaining error of the original reservoir. The optical feedback scheme is polarization-rotated optical feedback (PR-OF) in the original reservoir, and either PR-OF or polarization-preserved optical feedback (PP-OF) in the extra reservoir. The simulated results demonstrate that gradient boosting is also effective in a delay-based RC system, and that the enhancement is more pronounced when a VCSEL with PR-OF is taken as the extra reservoir.
Fig. 1 is a schematic diagram of a delay-based RC system using gradient boosting technology. In this RC system, VCSEL1 subject to PR-OF and optical injection is utilized as the original reservoir, and VCSEL2 subject to PR-OF (or PP-OF) and optical injection is utilized as the extra reservoir. VCSEL1 is the nonlinear node in the original reservoir and maps the input information into a high-dimensional state space.
The fading memory is provided by an optical feedback loop composed of an optical circulator (OC1), a variable optical attenuator (VOA1), a polarization controller (PC1) and a fiber delay line. The feedback strength and polarization direction are controlled by VOA1 and PC1, respectively. Here, we only consider the case in which VCSEL1 is subject to PR-OF. The construction of the extra reservoir is similar to that of the original reservoir, except that two different optical feedback configurations, PR-OF and PP-OF, are examined.

II. SYSTEM AND METHODS
A time-dependent input signal is first converted into U(n) by sampling the signal and holding each sampled point for an operation time T. Then, U(n) is converted into a masked input signal S(t) by multiplying it by a temporal mask signal Mask(t) and a scaling factor γ. S(t) is loaded onto the injection light by modulating the output of a drive SL with a Mach-Zehnder modulator (MZM). The output of the MZM is split into two parts by a fiber coupler (FC5). One part is injected into the original reservoir after passing through PC3, and the other is injected into the extra reservoir after passing through PC4. Here, we consider the case in which the input signal is injected only into the X-PC of the two VCSELs, which can be achieved by adjusting PC3 and PC4, respectively.
The role of the mask is to ensure the variability of the input signal over the different virtual nodes where the information is read out. The period of Mask(t) is equal to the sampling time interval T, which is divided into N sub-intervals of duration θ = T/N. In the original (or extra) reservoir, the total output intensity $|E|^2 = |E^x|^2 + |E^y|^2$ ($E^x$ and $E^y$ are the electric fields of the X-PC and Y-PC, respectively) of VCSEL1 (VCSEL2), sampled at an interval of θ, is interpreted as the state of a virtual network node. For an input database of M data points, the state matrix $X_o$ of the original reservoir and the state matrix $X_e$ of the extra reservoir each contain M × N elements.
VOLUME 8, 2020
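The sample-and-hold and masking steps described above can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' code: the function name `build_masked_input`, the random seed, and the discrete representation of S(t) as an M × N array (one value per virtual-node slot of duration θ) are assumptions made for illustration. The two-level mask with values 1 and -1 follows the choice stated later in the paper.

```python
import numpy as np

def build_masked_input(u, n_nodes, gamma=1.0, seed=0):
    """Return (S, mask) with S[n, i] = gamma * mask[i] * u[n].

    u       : 1-D array of sampled inputs U(n), length M
    n_nodes : number of virtual nodes N per period T (T = N * theta)
    gamma   : input scaling factor
    seed    : seed for the random binary mask (hypothetical choice)
    """
    rng = np.random.default_rng(seed)
    # Random two-level mask with values in {-1, +1}, one value per
    # sub-interval theta; the same mask is reused for every sample.
    mask = rng.choice([-1.0, 1.0], size=n_nodes)
    u = np.asarray(u, dtype=float)
    # Sample-and-hold: each U(n) is held for one period T (N slots)
    # and multiplied by the mask and the scaling factor.
    S = gamma * u[:, None] * mask[None, :]
    return S, mask

u = np.array([0.2, -0.5, 1.0])
S, mask = build_masked_input(u, n_nodes=400)
print(S.shape)  # (3, 400)
```

Each row of `S` is one period T of the masked signal, so row n drives the N virtual nodes for input sample U(n).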
As mentioned above, in gradient boosting the original reservoir is trained on the desired output ȳ(n), and the remaining error is expected to be reduced by adding an extra reservoir. Thus, the training process for the extra reservoir differs from that for the original reservoir. For training the original reservoir, the training database is adopted as the input signal to acquire $X_o$, and then the output weight of the original reservoir $W_{o,out}$ can be calculated through the Moore-Penrose pseudo-inverse method, i.e.

$$W_{o,out} = X_o^{\dagger}\,\bar{y},$$

where $X_o^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of $X_o$. By calculating the normalized mean square error (NMSE) and the mean absolute error (MAE), the performance of the RC system before and after introducing the extra reservoir can be examined, and therefore the effectiveness of gradient boosting for performance enhancement can be evaluated. NMSE and MAE are respectively expressed as [27], [28]:

$$\mathrm{NMSE} = \frac{1}{L}\,\frac{\sum_{n=1}^{L}\left(y(n)-\bar{y}(n)\right)^{2}}{\mathrm{var}(\bar{y})}, \qquad \mathrm{MAE} = \frac{1}{L}\sum_{n=1}^{L}\left|y(n)-\bar{y}(n)\right|,$$

where y(n) is the output of the RC system, ȳ(n) is the desired output, L is the length of the testing data, and var(ȳ) represents the variance of the desired output.
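The two-stage training can be illustrated with a short NumPy sketch. It is written under stated assumptions and is not the authors' implementation: the state matrices are random stand-ins rather than VCSEL intensity traces, the function names are hypothetical, and the combined output is taken as the sum of the two readouts, which is the usual gradient boosting construction but is not spelled out in the text.

```python
import numpy as np

def nmse(y, y_target):
    """Normalized mean square error relative to the target variance."""
    return np.mean((y - y_target) ** 2) / np.var(y_target)

def mae(y, y_target):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_target))

def train_boosted(X_o, X_e, y_target):
    """Two-stage readout training.

    Stage 1: the original reservoir is fitted to the target.
    Stage 2: the extra reservoir is fitted to the residual of stage 1.
    """
    W_o = np.linalg.pinv(X_o) @ y_target   # Moore-Penrose pseudo-inverse
    residual = y_target - X_o @ W_o        # remaining error of stage 1
    W_e = np.linalg.pinv(X_e) @ residual
    return W_o, W_e

def predict_boosted(X_o, X_e, W_o, W_e):
    """Combined output: original prediction plus predicted correction."""
    return X_o @ W_o + X_e @ W_e

# Synthetic stand-ins for the state matrices (M samples x N virtual nodes).
rng = np.random.default_rng(0)
M, N = 200, 20
X_o = rng.normal(size=(M, N))
X_e = rng.normal(size=(M, N))
y_target = rng.normal(size=M)

W_o, W_e = train_boosted(X_o, X_e, y_target)
y_single = X_o @ W_o
y_boost = predict_boosted(X_o, X_e, W_o, W_e)
print(nmse(y_single, y_target), nmse(y_boost, y_target))
```

On the training data the second stage is a least-squares fit to the residual, so it cannot increase the squared error. Whether the improvement carries over to the testing data depends on the reservoirs, which is what the parameter scans in the paper examine.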
In this work, two benchmark tasks, the Santa-Fe time series prediction task and the 10th-order nonlinear autoregressive moving average (NARMA10) task, are utilized to evaluate the effectiveness of gradient boosting for enhancing the performance of a delay-based RC system.

III. THEORY MODEL
Based on the spin-flip model (SFM), the nonlinear dynamics of VCSEL1 in the original reservoir and VCSEL2 in the extra reservoir under optical feedback and optical injection are described by the rate equations, Eqs. (3)-(6), following [20]. In these equations, superscripts x and y represent the X polarization component (X-PC) and Y polarization component (Y-PC), respectively, and subscripts 1 and 2 represent VCSEL1 and VCSEL2, respectively. E is the slowly varying complex amplitude of the electric field, whose decay rate is κ. Q accounts for the total carrier inversion between the conduction and valence bands, and q is the difference between the carrier inversions for the spin-up and spin-down radiation channels. α is the linewidth enhancement factor, $γ_N$ is the decay rate of Q, $γ_s$ is the spin-flip rate, $γ_a$ represents the linear dichroism, and $γ_p$ represents the linear birefringence.
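For orientation, one commonly used way of writing the SFM rate equations with feedback and injection is sketched below, arranged to match the term ordering referred to in the text (gain, anisotropy, feedback, injection, noise). This is an assumed reconstruction from the standard SFM literature, not a copy of the authors' equations. The feedback-field symbols $E_{f,j}^{x,y}$, the noise symbols $F_j^{x,y}$, the index $j = 1, 2$ for the two VCSELs, and the use of $\gamma_a$ for the linear dichroism are notational assumptions. The equation numbers are inferred from the in-text references, and signs and normalization should be checked against [20].

```latex
\begin{align}
\frac{dE_j^x}{dt} &= \kappa(1+i\alpha)\left[(Q_j-1)E_j^x + i q_j E_j^y\right]
  - (\gamma_a + i\gamma_p)E_j^x + k_{dj} E_{f,j}^x + k_{inj\,j}\,\varepsilon(t) + F_j^x, \tag{3}\\
\frac{dE_j^y}{dt} &= \kappa(1+i\alpha)\left[(Q_j-1)E_j^y - i q_j E_j^x\right]
  + (\gamma_a + i\gamma_p)E_j^y + k_{dj} E_{f,j}^y + F_j^y, \tag{4}\\
\frac{dQ_j}{dt} &= \gamma_N\left[\mu - Q_j\left(1+|E_j^x|^2+|E_j^y|^2\right)
  + i q_j\left(E_j^x E_j^{y*} - E_j^y E_j^{x*}\right)\right], \tag{5}\\
\frac{dq_j}{dt} &= -\gamma_s q_j - \gamma_N\left[q_j\left(|E_j^x|^2+|E_j^y|^2\right)
  + i Q_j\left(E_j^y E_j^{x*} - E_j^x E_j^{y*}\right)\right]. \tag{6}
\end{align}
```

In this form the injection term appears only in the X-PC equation, consistent with the statement that the input signal is injected only into the X-PC of the two VCSELs.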
µ is the injection current normalized to threshold, so that µ = 1 at threshold. For simplicity, the intrinsic parameters of the two VCSELs are assumed to be identical. The third term in Eqs. (3) and (4) describes the influence of optical feedback, in which $k_d$ stands for the feedback strength. For the two feedback cases, PR-OF and PP-OF, the feedback electric fields follow [29], where $ω_0$ is the central angular frequency of the VCSEL. Here, we assume that the central angular frequencies of the two VCSELs are identical and equal to 2.22 × 10$^{15}$ rad/s, which corresponds to a central wavelength of 850 nm. τ is the feedback delay time.
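A hedged sketch of the two feedback configurations is given below, together with a check of the stated optical frequency. The field expressions assume the usual convention that PP-OF returns each polarization component to itself while PR-OF exchanges the two components. The symbols $E_f^{x,y}$ are notational assumptions, and the exact form should be checked against [29].

```latex
\begin{align*}
\text{PR-OF:}\quad & E_f^x(t) = E^y(t-\tau)\,e^{-i\omega_0\tau}, \qquad
                     E_f^y(t) = E^x(t-\tau)\,e^{-i\omega_0\tau},\\
\text{PP-OF:}\quad & E_f^x(t) = E^x(t-\tau)\,e^{-i\omega_0\tau}, \qquad
                     E_f^y(t) = E^y(t-\tau)\,e^{-i\omega_0\tau},\\[4pt]
\omega_0 &= \frac{2\pi c}{\lambda}
          = \frac{2\pi \times 2.998\times10^{8}\ \mathrm{m/s}}{850\times10^{-9}\ \mathrm{m}}
          \approx 2.22\times10^{15}\ \mathrm{rad/s}.
\end{align*}
```

The quoted value of 2.22 × 10$^{15}$ is therefore an angular frequency; the corresponding optical frequency $c/\lambda$ is about 3.53 × 10$^{14}$ Hz.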
In this work, the synchronization scheme is adopted, i.e. τ = T [30], [31]. The fourth term in Eq. (3) represents the injected optical field, in which $k_{inj}$ is the injection strength and ε(t) is the injection field output from the MZM. ε(t) follows [32], where $|ε_0|$ is the amplitude of the injection field and Δω is the angular frequency detuning between the drive SL and the VCSEL, which is set to 0 in this work. S(t) is the masked input data, which can be described as follows:

$$S(t) = \gamma \times \mathrm{Mask}(t) \times U(n),$$

where γ is the input scaling factor, Mask(t) is the mask signal, and U(n) is the sampled and held input signal.
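As an illustration of how the masked signal can enter the injected field, a phase-modulation model of the MZM that is common in the delay-based RC literature is sketched below. It is an assumption made for illustration and is not confirmed by the text, so the exact expression should be taken from [32].

```latex
\begin{align*}
\varepsilon(t) &= \frac{|\varepsilon_0|}{2}\left(1 + e^{iS(t)}\right)e^{i\Delta\omega t}
               = |\varepsilon_0|\cos\!\left(\frac{S(t)}{2}\right)e^{iS(t)/2}\,e^{i\Delta\omega t},\\
|\varepsilon(t)| &= |\varepsilon_0|\,\left|\cos\frac{S(t)}{2}\right|.
\end{align*}
```

Under this assumed model and with Δω = 0, S(t) = 0 gives the full amplitude $|\varepsilon_0|$ and S(t) = π extinguishes the field. Flipping the sign of the mask changes the phase S(t)/2 of the injected field but not its magnitude.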
The last terms in Eqs. (3) and (4) are the spontaneous emission noises, which are described as Langevin sources following [33], where $β_{sp}$ is the spontaneous emission coefficient, set to 10$^{-6}$ ns$^{-1}$, and ξ denotes complex Gaussian white noise with zero mean and unit variance. We use the fourth-order Runge-Kutta method to numerically solve Eqs. (3)-(6) with a step of 2 ps in MATLAB. During the simulations, the parameters of the two VCSELs are [34]: γ = 1, α = 3, $|ε_0|$ = 2/3, κ = 300 ns$^{-1}$, $γ_N$ = 1 ns$^{-1}$, $γ_s$ = 50 ns$^{-1}$, $γ_a$ = 0.1 ns$^{-1}$, $γ_p$ = 10 ns$^{-1}$ and µ = 1.01. In this work, we use a random signal with two discrete levels (1, -1) as the mask signal Mask(t) for both tasks. θ is set to 10 ps for the Santa-Fe time series prediction task and 22 ps for the NARMA10 task, and the number of virtual nodes N is fixed at 400 for both tasks.
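The numerical integration can be sketched in Python as follows. This is a simplified illustration, not the authors' MATLAB code. It assumes the standard SFM form of the rate equations, omits the spontaneous emission noise, and holds the delayed feedback and injection terms constant across each Runge-Kutta step. The function names `sfm_rhs` and `rk4_step` and the initial condition are hypothetical.

```python
import numpy as np

# Parameters quoted in the text (rates in ns^-1, times in ns).
KAPPA, ALPHA = 300.0, 3.0
GAMMA_N, GAMMA_S = 1.0, 50.0
GAMMA_A, GAMMA_P = 0.1, 10.0
MU = 1.01
H = 0.002  # 2 ps integration step expressed in ns

def sfm_rhs(state, fb_x=0.0, fb_y=0.0, inj_x=0.0):
    """Deterministic SFM right-hand side (assumed standard form).

    state      : complex array [Ex, Ey, Q, q]; Q and q are real-valued
    fb_x, fb_y : feedback terms k_d * E_f, already scaled
    inj_x      : injection term k_inj * eps(t), applied to the X-PC only
    """
    ex, ey = state[0], state[1]
    Q, q = state[2].real, state[3].real
    inten = abs(ex) ** 2 + abs(ey) ** 2
    # Ey Ex* - Ex Ey* is purely imaginary, so 1j * cross is real.
    cross = ey * np.conj(ex) - ex * np.conj(ey)
    gain = KAPPA * (1 + 1j * ALPHA)
    aniso = GAMMA_A + 1j * GAMMA_P
    dex = gain * ((Q - 1) * ex + 1j * q * ey) - aniso * ex + fb_x + inj_x
    dey = gain * ((Q - 1) * ey - 1j * q * ex) + aniso * ey + fb_y
    dQ = GAMMA_N * (MU - Q * (1 + inten) - (1j * q * cross).real)
    dq = -GAMMA_S * q - GAMMA_N * (q * inten + (1j * Q * cross).real)
    return np.array([dex, dey, dQ, dq], dtype=complex)

def rk4_step(state, fb_x=0.0, fb_y=0.0, inj_x=0.0, h=H):
    """One classical RK4 step; feedback and injection are held constant
    over the step (a simplification made for this sketch)."""
    k1 = sfm_rhs(state, fb_x, fb_y, inj_x)
    k2 = sfm_rhs(state + 0.5 * h * k1, fb_x, fb_y, inj_x)
    k3 = sfm_rhs(state + 0.5 * h * k2, fb_x, fb_y, inj_x)
    k4 = sfm_rhs(state + h * k3, fb_x, fb_y, inj_x)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Free-running demo: small seed field, no feedback, no injection.
state = np.array([1e-3, 1e-3, MU, 0.0], dtype=complex)
for _ in range(1000):  # 1000 steps of 2 ps = 2 ns
    state = rk4_step(state)
intensity = abs(state[0]) ** 2 + abs(state[1]) ** 2
print(intensity)
```

In a full reservoir simulation the feedback terms would be read from a history buffer of length τ/h, and the total intensity would be sampled every θ to fill one row of the state matrix per input sample.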

IV. RESULTS AND DISCUSSION
A. SANTA-FE TIME SERIES PREDICTION TASK
The Santa-Fe time series prediction task is a one-step-ahead prediction of a chaotic time series experimentally recorded from a far-infrared laser operating in a chaotic state. The Santa-Fe data set contains 9000 points; the first 4000 points are adopted in the simulation, of which the first 75% and the remaining 25% are used for training and testing, respectively. For this task, the system is considered to perform well if NMSE ≤ 0.1 [35]. First, we discuss the performance of the delay-based RC system before introducing gradient boosting, i.e. when the RC system includes only the original reservoir. Fig. 2 [...] which is the corresponding value for a point with $(k_{d1}, k_{inj1})$ = (32 ns$^{-1}$, 16 ns$^{-1}$) at the boundary line in Fig. 3(a). Comparing Fig. 2(a) and Fig. 2(b), one can find that the trends of NMSE and MAE are basically similar. Therefore, in the following discussion, only NMSE is adopted to assess the performance of the RC system.
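The data split described at the start of this subsection can be sketched as follows. The Santa-Fe record itself is not reproduced here, so a sine wave of the same length stands in for it; the function name `one_step_ahead_split` is a hypothetical helper, and the way inputs and targets are paired is an assumption about how the one-step-ahead task is set up.

```python
import numpy as np

def one_step_ahead_split(series, n_used=4000, train_fraction=0.75):
    """Build one-step-ahead (input, target) pairs and split them.

    Only the first `n_used` points are kept. The input at step n is
    series[n] and the target is series[n + 1], so n_used - 1 pairs are
    available and the last test block is one pair short of 25%.
    """
    s = np.asarray(series[:n_used], dtype=float)
    inputs, targets = s[:-1], s[1:]
    n_train = int(train_fraction * n_used)
    return (inputs[:n_train], targets[:n_train],
            inputs[n_train:], targets[n_train:])

# Stand-in for the 9000-point Santa-Fe record (not the real data).
series = np.sin(0.05 * np.arange(9000))
u_tr, y_tr, u_te, y_te = one_step_ahead_split(series)
print(len(u_tr), len(u_te))  # 3000 999
```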
Next, we select two points (point A and point B) in Fig. 2(a) to examine whether the RC performance can be improved after introducing an extra reservoir. For point A and point B, the operation parameters $(k_{inj1}, k_{d1})$ are (10 ns$^{-1}$, 36 ns$^{-1}$) and (31 ns$^{-1}$, 30 ns$^{-1}$), and the corresponding NMSEs are 0.2 and 0.0173, respectively. The performance of the RC system is therefore poor at point A and good at point B. Fig. 3 displays the corresponding results after introducing the extra reservoir. The top (bottom) row corresponds to the original reservoir operating at point A (B) in Fig. 2(a), and the left (right) column is for the PR-OF (PP-OF) configuration adopted in the extra reservoir. Here, the injection strength $k_{inj2}$ and feedback strength $k_{d2}$ of the extra reservoir are taken as variable parameters. With the original reservoir operating at point A (NMSE = 0.2), we focus on the parameter region of the extra reservoir required for achieving good performance (NMSE ≤ 0.1), which is the region surrounded by the white dashed line in the top row of Fig. 3. With the PR-OF configuration in the extra reservoir, the parameter region yielding good performance is wider, and the minimal NMSE of the RC system can be decreased to 0.08 at $(k_{inj2}, k_{d2})$ = (35 ns$^{-1}$, 45 ns$^{-1}$). With only the original reservoir operating at point B, the NMSE is 0.0173. In this case, we inspect whether gradient boosting can improve the performance of the RC system beyond the best performance of the RC system with only the original reservoir [...]. For comparison, Table 1 summarizes some typical results for the Santa-Fe prediction task obtained with different methods, including RC with a chaotic mask [14] and RC based on a deep learning structure (deep DFR) [36].
One can find that, after adopting gradient boosting technology, the performance of this delay-based RC system for predicting Santa-Fe time series can be improved to a level comparable with the other two RC systems.

B. NARMA10 TASK
The NARMA10 task is one of the most widely used benchmark tasks for RC systems. For this task, the input $u_k$ of the system is composed of scalar random numbers drawn from a uniform distribution in the interval [0, 0.5], and the target $y_{k+1}$ is given by [37]:

$$y_{k+1} = 0.3\,y_k + 0.05\,y_k\left(\sum_{i=0}^{9} y_{k-i}\right) + 1.5\,u_{k-9}\,u_k + 0.1 \qquad (13)$$

A sequence of 4000 data points is generated, where the first 3000 points and the next 1000 points are used for training and testing, respectively.
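The recurrence in Eq. (13) can be generated with a short loop. The sketch below is illustrative: the function name `narma10`, the seed, the zero initial conditions, and the retry loop are assumptions. The retry is included because the NARMA10 recurrence can occasionally diverge for an unlucky input sequence.

```python
import numpy as np

def narma10(n_samples, seed=0, max_tries=100):
    """Generate (u, y) following the NARMA10 recurrence of Eq. (13).

    u[k] is uniform in [0, 0.5]; y[k+1] depends on the last ten outputs
    and on u[k] and u[k-9]. If the output diverges, a new input sequence
    is drawn, up to max_tries times.
    """
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        u = rng.uniform(0.0, 0.5, size=n_samples)
        y = np.zeros(n_samples)
        with np.errstate(all="ignore"):
            for k in range(9, n_samples - 1):
                y[k + 1] = (0.3 * y[k]
                            + 0.05 * y[k] * np.sum(y[k - 9:k + 1])
                            + 1.5 * u[k - 9] * u[k]
                            + 0.1)
        if np.all(np.isfinite(y)):
            return u, y
    raise RuntimeError("NARMA10 sequence diverged in every attempt")

u, y = narma10(4000)
# First 3000 points for training, next 1000 for testing; the target
# paired with input u[k] is y[k + 1].
u_train, y_train = u[:3000], y[:3000]
u_test, y_test = u[3000:], y[3000:]
print(u_train.shape, u_test.shape)  # (3000,) (1000,)
```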
First, we check the performance of the delay-based RC system with only the original reservoir. The NMSE values in the parameter space of $k_{d1}$ and $k_{inj1}$ are given in Fig. 4, in which the region surrounded by the white dashed lines is considered to give good performance, i.e. NMSE ≤ 0.2. The region in which the RC system performs well is relatively small, and the minimum NMSE is 0.12, located at [...]. For the original reservoir operating at point A in Fig. 4, before adding the extra reservoir, the NMSE of the RC system is 0.3, which means that the performance of the RC system is poor. After adding the extra reservoir, the NMSEs of the RC system in the parameter space of $k_{inj2}$ and $k_{d2}$ are shown in Fig. 5(a) and (b). The parameter region required for achieving good performance is wider when the extra reservoir adopts the PR-OF configuration, and the minimal NMSE of the RC system can be decreased to 0.13 at $(k_{inj2}, k_{d2})$ = (35 ns$^{-1}$, 35 ns$^{-1}$). For the original reservoir operating at point B in Fig. 4, the corresponding results after introducing the extra reservoir are presented in Fig. 5(c) and (d). The white dashed line represents NMSE = 0.12, which is the minimum NMSE before introducing the extra reservoir. When the parameters of the extra reservoir lie within the region surrounded by the white dashed line, the performance of the RC system exceeds the best performance of the RC system with only the original reservoir. The minimal NMSE of the RC system is 0.09, obtained under PR-OF with $(k_{inj2}, k_{d2})$ = (31 ns$^{-1}$, 33 ns$^{-1}$). For comparison, Table 2 summarizes some typical results for the NARMA10 task obtained with different methods, including a recurrent neural network (RNN) [37], a long short-term memory (LSTM) system [38], an optoelectronic RC system [39], and the system proposed in this work.
From this table, it can be seen that adopting gradient boosting is effective for enhancing the performance of a delay-based RC system. Although the minimum NMSE is slightly higher than those obtained with the RNN and LSTM systems, it may be further decreased by combining more reservoirs in the gradient boosting approach.

V. CONCLUSION
In summary, using the Santa-Fe time series prediction task and the 10th-order nonlinear autoregressive moving average (NARMA10) task, we have numerically demonstrated the effectiveness of gradient boosting for improving the performance of a delay-based reservoir computing (RC) system. In this work, gradient boosting is implemented by introducing an extra reservoir to correct the error of the original reservoir in the RC system. Taking into account the advantages of VCSELs, a VCSEL under PR-OF is taken as the original reservoir and a VCSEL under PR-OF or PP-OF is used as the extra reservoir. The simulated results show that, for both tasks, adopting gradient boosting raises a poor RC performance to a good level and further enhances an already good performance. The enhancement is more pronounced when the VCSEL with PR-OF is taken as the extra reservoir.