Fast Nonlinear Fourier Transform Algorithms Using Higher Order Exponential Integrators

The nonlinear Fourier transform (NFT) has recently gained significant attention in fiber optic communications and other engineering fields. Although several numerical algorithms for computing the NFT have been published, the design of highly accurate low-complexity algorithms remains a challenge. In this paper, we present new fast NFT algorithms that achieve accuracies that are orders of magnitudes better than current methods, at comparable run times and even for moderate sampling intervals. The new algorithms are compared to existing solutions in multiple, extensive numerical examples.


I. INTRODUCTION
T HE fast Fourier transform (FFT) is a well-known success story in engineering. From a numerical point of view, this is quite surprising since, at first sight, the FFT provides a mere first-order approximation of the discrete-time Fourier transform one is actually interested in. Upon closer inspection, it however turns out that approximations based on FFTs perform much better once the underlying signal is smooth [17]. Recently, nonlinear Fourier transforms (NFTs) have been gaining much attention in engineering areas such as fiberoptic communications [3], [35] and coastal engineering [12], [24]. NFTs are generalizations of the conventional Fourier transform that allow to solve specific nonlinear evolution equations in a way that is analog to how Fourier solved the heat equation [1]. The evolution of the nonlinear Fourier spectrum is, exactly like in the linear case, much simpler than the evolution of the original signal. NFTs also have unique data analysis capabilities that enable the detection of particlelike signal components known as solitons [31]. Recently, a nonlinear variant of the FFT has been derived [41], [42]. These type of fast NFTs (FNFTs) can provide up to second-order accuracy [35]. Unfortunately, unlike for the FFT, the accuracy of the FNFTs in [35], [41], [42] remains (at most) second-order even if the underlying signal is smooth. As a result, engineers currently have to strongly oversample even smooth signals in order to get reliable numerical results ( [16,Sec. 4]). Several authors have proposed NFT algorithms with higher orders of accuracy, utilizing either Runge-Kutta [13], [45] or implicit Adams methods [36]. However, even though these methods have higher accuracy orders, they require very small sampling intervals in order to actually perform better than standard second-order methods such as [10]. For practically relevant sampling intervals, they are typically not the best choice as they are slower and may even perform worse in these regimes. Numerical methods that provide better complexity-accuracy trade-offs in practically relevant regimes have been an open problem until recently. In [14], the authors introduced a new numerical method that can compute the NFT with accuracies that are orders of magnitudes better than those of standard methods while having comparable run times. In this paper, the following three contributions are made.
1) Improved variants of the numerical NFT in [14] with even better complexity-accuracy trade-offs are presented and investigated in detailed numerical examples. 2) Fast versions of the new numerical NFTs are derived.

3) A series acceleration technique is investigated to im-
prove the complexity-accuracy trade-off even further. To give an illustration of the achieved gains, we point out that in one of our numerical examples, the best-performing of our new methods achieves an accuracy that is hundred million times better than the standard second-order method in [10] at a comparable run time. 1 The paper is structured as follows. In Sec. II, we derive improved versions of our recently proposed numerical NFT in [14], and compare them with both conventional second-order and other higher-order NFT algorithms in multiple numerical examples. Then, in Sec. III, we demonstrate how some of our new numerical NFTs can be made fast. The fast versions are compared with their slow counterparts. Next, in Sec. IV, we investigate how series acceleration techniques can improve the complexity-accuracy trade-off of the fast NFT methods even further. Sec. V deals with a special part of the NFT that occurs only in some cases. The paper is finally concluded in Sec. VI. column and jth row: [·] i,j ; Fourier transform of function f (t), f (ξ) = ∞ −∞ f (t)e −itξ dt : F (f (t)); Inverse Fourier transform of functionf (ξ), f (t) = 1 2π ∞ −∞ f (t)e itξ dξ : F −1 (f (ξ)).

A. Nonlinear Fourier Transform (NFT)
In this section we describe the mathematical machinery beyond the NFT. Let q(x, t) denote the complex envelope of the electric field in an ideal optical fiber, whose evolution in normalized coordinates is described by the nonlinear Here, x ≥ 0 denotes the location in the fiber and t denotes retarded time. The parameter κ determines if the dispersion in the fiber is normal (-1) or anomalous (+1). When κ = +1 Eq. 1 is referred to as the focusing NSE and for κ = −1 as the defocusing NSE. The NFT that solves the NSE (Eq. 1) is due to Zakharov and Shabat [46]. It transforms any signal that vanishes sufficiently fast for t → ±∞ from the time-domain to the nonlinear Fourier domain through the analysis of the linear ordinary differential equation (ODE) where q(t) = q(0, t), subject to the boundary conditions The term λ ∈ C is a spectral parameter similar to s in the Laplace domain. The matrix V (t, λ) is said to contain the eigenfunctions since Eq. 2 can be rearranged into an eigenvalue equation with respect to λ [1]. One can view the eigenfunctions V (t, λ) as being scattered by q(t) as they move from t → −∞ to t → ∞. Hence Eq. 2 is referred to as the scattering problem [46]. (Many problems in signal processing can be expressed through such scattering problems [11].) For Eq. 2 subject to boundary conditions Eq. 3, there exists a unique matrix called the scattering matrix such that [1] φ(t, λ)φ(t, λ) = ψ (t, λ) ψ(t, λ) S(λ).
The components a(λ), b(λ),b(λ) andā(λ) are known as the scattering data. The components a(λ) and b(λ) are sufficient to describe the signal completely. Their evolution along the x dimension (along the length of the fiber) is simple [1, Sec. III] The reflection coefficient is then defined as ρ(λ) = b(λ)/a(λ) for λ ∈ R. The nonlinear Fourier spectrum can also contain a so-called discrete spectrum in the case of κ = 1. It corresponds to the complex poles of the reflection coefficient in the upper half-plane, which are given by the zeros λ k ∈ H of a(λ). It is known that there are only finitely many (N ) such poles. The poles λ k are also referred to as eigenvalues and a corresponding set of values b k := b(λ k ) are known as norming constants. There are different ways to define a nonlinear Fourier spectrum. One possibility is {ρ(λ)} λ∈R , (λ k , ρ k : [18]. In this paper we are primarily interested in computation of ρ(λ) but some notes regarding computation of b(λ) and the λ k will also be given. Although we will illustrate our algorithms by applying them to the specific case of NFT of NSE with vanishing boundary condition, it should be noted that we are primarily presenting algorithms for solving equations similar to Eq. 2 [1,Eq. 2]. Hence the algorithms presented in this paper can be extended to compute the NFTs w.r.t to other nonlinear evolution equations such as the Korteweg-de Vries equation [26].

II. NUMERICAL COMPUTATION OF NFT USING HIGHER ORDER EXPONENTIAL INTEGRATORS
In this section we will start by outlining some assumptions required for the numerical methods that will be presented. We will give a brief overview of one of the approaches for computing the NFT and then talk specifically about our proposal to use a special class of so-called exponential integrators. To evaluate the methods, we describe examples and performance criteria. We will finally show and compare the results for various methods applied to the mentioned examples.

A. Assumptions
The numerical computation of the NFT is carried out with discrete data samples on finite precision machines. Hence, we need to make the following assumptions: 1) The support of the potential q(t) is truncated to a finite interval, t ∈ [T − , T + ], instead of t ∈ (−∞, ∞). The values T ± are chosen such that the resulting truncation error is sufficiently small. The approximation is exact if We assume that the signal is sampled at the midpoints of each subinterval t n = T − + (n+ 0.5)h, n = 0, 1, . . . , D − 1 such that q n := q(t n ).

B. Numerical Scattering
The main step in numerically computing the NFT is to solve the scattering problem in Eq. 2 for φ(T + , λ) for different values of λ. We can view the D subintervals as layers which scatter the eigenfunction φ(t, λ) as it moves from t = T − to t = T + . Using numerical ODE solvers we solve for an approximationφ(T + , λ) of φ(T + , λ). By taking ψ(T + , λ) equal to the limit in Eq. 3 at t = T + , we can compute a numerical approximation of the scattering data and ultimately the reflection coefficient.

C. Exponential Integrators
Almost any numerical method available in literature for solving ODEs can be used to solve for φ(T + , λ) [13], [37]. However, we are particularly interested in so-called exponential type integrators. These methods have been shown to provide a very good trade-off between accuracy and computational cost in several numerical benchmark problems while being fairly easy to implement, see [7] and references therein. We propose to use a special sub-class known as commutatorfree quasi-Magnus (CFQM) exponential integrators as some NFT algorithms based on these integrators turn out to have the special structure [41] that is needed to make them fast. We show this in Sec. III. For rest of the paper we adopt the notation used in [8]. The results in [8] provide the following iterative scheme to compute a numerical approximationφ(T + , λ) of φ(T + , λ): where (8) By iterating from n = 0, 1, . . . , D−1, we obtain the numerical approximation of φ(T + , λ) that we need to compute the NFT. Denoting CF J (t n , λ) in short by G n (λ) we can writê For an integrator CF J , r is the order and J is the number of matrix exponentials required for each subinterval. An integrator of order r has local error (error in each subinterval) of O(h r+1 ). Over D (∝ 1/h) such subintervals i.e., over the interval [T − , T + ], the global error will be O(h r ). This distinction of local and global error will become important when we define the error metric used to compare the various algorithms in Sec.II-D. K is the number of points within each subinterval where the potential value needs to be known. The coefficients c k and a jk are unique for each combination of r and J. We refer the reader to [8] for their derivation. The integrator CF is also sometimes referred to as the exponential midpoint rule. It was used in the context of NFT for the defocusing NSE (κ = −1) by Yamada and Sakuda [43] and later by Boffetta and Osborne [10]. For CF [2] 1 , Eq. 7 reduces toφ n+1 (λ) = G n (λ)φ n (λ), where G n (λ) = expm(hC n (λ)), This is applied repeatedly as in Eq. 9 to obtainφ(T + , λ).
In [14] we investigated the possibility of using CF [4] 2 (first introduced in [34]) to obtainφ(T + , λ). We were able to show its advantage over CF [2] 1 when considering the tradeoff between an error and execution time. Here we investigate further in this direction and evaluate CF [4] 3 , CF [5] 3 and CF [6] 4 , which are fourth-, fifth-and sixth-order methods respectively. The CFQM exponential integrators require multiple nonequispaced points within each subinterval. However, it is unrealistic to assume that signal samples at such non-equispaced points can be obtained in a practical setting. Hence in [14] we proposed the use of local cubic-spline based interpolation to obtain the non-equispaced points from the mid-points of each subinterval. We will refer to the samples at these midpoints as the given samples. Although the use of a local cubic-spline based interpolation was found to be sufficiently accurate for CF [4] 2 , it is insufficient for higher-order methods. Here, we propose to combine bandlimited interpolation with the timeshift property of Fourier transform, i.e., to obtain the irregularly spaced samples. This interpolation rule is accurate when q(t) is sampled in accordance with the Nyquist criterion. As we are working with discrete signal samples, the interpolation can be implemented efficiently using fast Fourier transform (FFT).

D. Error Metric and Numerical Examples
In this section, we compare the performance of CFQM exponential integrators CF [2] 1 , CF [4] 2 , CF [4] 3 , CF [5] 3 and CF [6] 4 , the two-step Implicit-Adams method (IA 2 ) introduced in [36] and the fourth-order Runge-Kutta method [13] (RK 4 ) for computation of the reflection coefficient. The fourth-order Runge-Kutta method (r = 4) was the first fourth-order method used for computation of reflection coefficient in [13], [45]. We include the third-order two-step Implicit-Adams method (r = 3) here as it was the first time a higher-order method was introduced in the context of fast nonlinear Fourier transform. The meaning of "fast" will be made precise in Sec. III. We are interested in evaluating the trade-off between the increased accuracy and execution time due to use of higherorder methods. We asses the accuracy of different methods using the relative L 2 -error where ρ(λ) is the analytical reflection coefficient,ρ(λ) is the numerically computed reflection coefficient and λ n are M equally-spaced points in [−λ max , λ max ]. E ρ is a global error and hence it is expected to be O(h r ) for an integrator of order r as explained in Sec. II-C. We fixed M = 2 9 to ensure acceptable execution times for the numerical tests. 1) Example 1: Hyperbolic Secant, κ = 1: As the first numerical example we chose the signal q(t) =qe −2iλ0t sech(t). The closed form of the reflection coefficient is given by applying the frequency-shift property [44, Sec. D] to the analytical known reflection coefficient of the secant-hyperbolic signal [29], where Γ(·) is the gamma function. The discrete spectrum is We setq = 5.4, λ 0 = 3, λ max = 10, and chose [T − , T + ] = [−32, 32] to ensure negligible truncation error.
2) Example 2: Rational Reflection Coefficient with one pole, κ = 1: The signal is given by [28] where α, β are scalar parameters and γ = αα * + β 2 . We used α = 1 and β = −1. The corresponding reflection coefficient is then known to be We set λ max = 60 and chose [T − , T + ] = [−12, 0]. As the signal in Eq. 17 has a discontinuity, it cannot be interpolated well using bandlimited interpolation. We hence assume only in this example that we can sample the signal at exactly the points that we require.
3) Example 3: Hyperbolic Secant, κ = −1: The signal is given by where F , L and Q are scalar parameters. We used F = 1.5, L = 0.04 and Q = 5.5. The corresponding reflection coefficient is known to be [25] where Γ(·) is the gamma function, The numerical methods were implemented in MATLAB using the exact representation of a 2 × 2 matrix exponential in [6] for the CFQM exponential integrators. The execution times for each method were recorded using the MATLAB stopwatch function (tic-toc). The execution times are of course implementation-specific, but we tried to make the implementations as comparable as possible. Fig. 1 shows the error measure E ρ , as defined in Eq. 12, for Example 1 for a range of relatively large sampling intervals h. These results suggest that the higher-order methods can always be preferred over the lower-order methods. The error measure E ρ for smaller sampling intervals h for Example 1, 2 and 3 are shown in Figs. 2, 3 and 4 respectively. To ensure that the discontinuity in Example 2 is faithfully captured, we use a slightly different time grid for the Runge-Kutta method and the Implicit-Adams method, compared to the description in Sec. II-A. For all three examples, the slopes of the error-lines are in agreement with the order r of each method except for IA 2 . For smooth signals IA 2 is seen to have an error of order four rather than the expected three. This observation is in agreement with [36,Fig. 2]. However, for the discontinuous signal of Example 2 we see third-order behavior as expected. We can also see that a higher r generally corresponds to better accuracy (lower E ρ ) for the same h. However, that is not necessarily obvious as seen in Fig. 2, where CF [5] 3 is more accurate than CF [6] 4 for larger h. The advantage of using three exponentials (J = 3) in CF [3] 4 instead of two in CF [2] 4 is also clear from the figures. The thirdorder Implicit-Adams method (IA 2 with r = 3) and fourthorder Runge-Kutta method (RK 4 ) may be more accurate than CF [1] 2 depending on the signal and other parameters, but have lower accuracy compared to CF [2] 4 and CF [3] 4 . The error E ρ reaches a minimum around 10 −12 and can start rising again as seen in Fig. 4 for CF [4] 6 . To understand this behavior, note that the error per layer in Eq. 7 is actually O(h r+1 +ε), where ε is a small constant due to finite precision effects that can normally be neglected. The total error is thus O(h r + εh −1 ). As h is becoming smaller and smaller, the second component also known as the arithmetic error, becomes dominant and eventually causes the total error to rise again [22].
For the CFQM exponential integrators, computation of the transfer-matrix H(λ) in Eq. 9 for each λ requires JD multiplications of 2 × 2 matrices (Eq. 7) for D(∝ 1/h) given samples. If the reflection coefficient is to be computed at M points then the overall computational complexity will be of the order O(DM ), because J ≪ D. In Fig. 5 we plot the execution times of all the methods for Example 1. These execution times are representative for all examples. We can see that the execution time scales linearly with D as expected since M is kept constant in the numerical examples. The execution time of the CFQM exponential integrators is also a linear function of J. The IA 2 , RK 4 and CF [2] 4 methods have similar execution times. To evaluate the trade-off between the execution time and accuracy, we plot the execution time on the x-axis and the error on the y-axis in Fig. 6 for Example 1. To read such trade-off plots we look at the error achieved by each method for a given amount of time. For Example 1 it turns out that CF [3] 5 provides the best trade-off, but we can conclude that extra computation cost of the higher-order methods is generally justified by increased accuracy.
If M is of the same order as D then the computational complexity of the NFT will be O(D 2 ). Although performing matrix multiplications of 2 × 2 matrices is fast, the total cost of the NFT is significantly higher when compared to its linear analogue, the FFT, which has a complexity of only O (D log D). So the natural question to ask is; Can the complexity be reduced?     In this section, we demonstrate how the complexity of some of them can be reduced to O(D log 2 D) by integrating them into the framework in [41], [42].

A. Fast Scattering Framework
In the framework proposed in [41], each matrix G n (λ) is approximated by a rational function matrixĜ n (z), where z = z(λ) is a transformed coordinate. By substituting these approximations in Eq. 9, a rational function approximation H(z) of H(λ) is obtained.
We want to compute the coefficients of the numerator and denominator polynomials, respectively. A straightforward implementation of the matrix multiplication where each entry Hence it is referred to as fast scattering. The resulting rational function approximationĤ(z) is explicitly parametrized in z and hence Eq. 9 is reduced to polynomial evaluations for each z. To elaborate, we again restrict ourselves to G n (λ) of the form Eq. 7. Hence for CF [2] 1 , we need to approximate G n (λ) = expm(hC n (λ)). The matrix exponential can be approximated to various orders of accuracy using rationals [9] or using splitting-schemes such as the well-known Strangsplitting. Higher-order splitting-schemes were developed in [26] that map λ ∈ R, the domain of the reflection coefficient, to z = exp(iλh/m) on the unit circle, where m is a real rational. Such mappings have a certain advantage when it comes to polynomial evaluations which we cover in Sec. III-B. For a higher-order CF [r] J integrator, each G n (λ) is a product of J matrix exponentials. For example let us look at CF [4] 2 . We can write G n (λ) = expm(hC 2 n (λ)) expm(hC 1 n (λ)), Each of the two matrix exponentials can be approximated individually using a splitting-scheme from [26].Ĥ(z) can then be obtained as in Eq. 21. However, there are a few caveats which prevent extension of this idea to higher-order methods. The splitting-schemes in [26] should not be applied to CFQM exponential integrators with complex coefficients a jk . Complex coefficients mean that λ ∈ R is no longer mapped to z on the unit circle. Such a mapping is undesirable for polynomial evaluation as will be explained in Sec. III-B. In addition, we do not even obtain a polynomial structure if there exists no z such that exp(iλh K k=1 a j,k ) is an integer power of this z for all j. Furthermore, if such a z exists but only for high co-prime integer powers,Ĝ n (λ) will consist of sparse polynomials of high degree, which can significantly hamper the computational advantage of using the approximation. Due to these reasons we restrict ourselves to fast implementations of CF and FCF [4] 2 . The FCF [2] 1 algorithm is already available as a part of the FNFT software library [40]. However, accuracy and execution times for it haven't been published formally anywhere in literature. The FCF [4] 2 algorithm is completely new. For both FCF [2] 1 and FCF [4] 2 we use the fourth-order accurate splitting [26,Eq. 20].

B. Fast Evaluation
Once we obtain the rational function approximationĤ(z) of H(λ) in terms of numerator and denominator coefficients, we only have to evaluate the numerator and denominator functions for each value of z = z(λ) in order to compute the reflection coefficient. The degree of the polynomials to be evaluated will be at least D which can be in the range of 10 3 -10 4 . It is known that evaluation of such high-degree polynomials for large values of z can be numerically problematic [42, Sec. IV.E]. However, by choosing the mapping z = exp(iλh/m), which maps the real line to the unit circle, the polynomials need to be evaluated on the unit circle where evaluation of even high-degree polynomials is numerically easier. The higher-order splitting schemes in [26] were developed with such a mapping in mind allowing for approximations of the matrix exponentials as rational functions in z. Evaluating any polynomial of degree D using Horner's method has a complexity of O(D). Hence for M values of z, the total cost of fast scattering followed by polynomial evaluation will be O(D log 2 D + M D). Mapping λ ∈ R to z on the unit circle has an additional computational advantage. Let p(z) = p N z N + p N −1 z N −1 + · · · + p 0 be a polynomial in z of degree N . Evaluation of p(z) at a point z k can be written as For M points z k , k = 1, · · · , M on the unit circle, this is equivalent to taking the Chirp Z-transform (CZT) of the polynomial coefficients along an appropriately defined contour. The CZT can be computed efficiently using the algorithm in [27] which utilizes FFT's. We can also see Eq. 23 as a non-equispaced discrete Fourier transform (NDFT) of the polynomial coefficients which allows us to utilize efficient NFFT algorithms in [20] for evaluating the polynomial. If the number of evaluation points M is in the same order of magnitude as D, the complexity of evaluation becomes O(D log D) and hence the overall complexity of the fast nonlinear Fourier transform (FNFT) is O(D log 2 D).

C. Numerical Examples
We implemented FCF [2] 1 and FCF [4] 2 using the fourth-order accurate splitting scheme [26, Eq. 20] and the fast polynomial  multiplication algorithm [41, Alg. 1]. We used the NDFT routine from [20] for evaluating the polynomials. Now we compare the implementations of CF [2] 1 and CF [4] 2 presented in Sec. II-D and their fast versions FCF [2] 1 and FCF [4] 2 . We plot the error versus the execution time for Example 1 in Fig. 7, for Example 2 in Fig. 8 and for Example 3 in Fig. 9. In the three figures we can see that the fast FCF algorithms achieve similar errors as their slow CF counterparts in a significantly shorter time. From the other viewpoint, for the same execution time, the FCF algorithms achieve significantly lower errors compared to CF algorithms. The trade-off shifts further towards FCF algorithms as the number of evaluation points M increases. FCF [4] 2 outperforms FCF [2] 1 in the trade-off for both the examples which again highlights the advantage of using higher-order CFQM exponential integrators. FCF [4] 2 reaches a minimum error around 10 −11 and then starts rising. This is again attributed to the arithmetic error becoming the dominant source of error. We remark that in numerical tests the CZT was found to perform equally well as the NDFT before the error minimum but the error rise thereafter was significantly steeper.
Since the NFT is a nonlinear transform, it changes its form under scaling, and computing it typically becomes increasingly difficult when being scaled up [37]. Hence it is of interest to study scaling of error with increase in signal energy. To test the scaling we use Example 1 and sweep the signal amplitudeq from 0.4 to 10.4 in steps of 1.0 while keeping all other parameters the same as before. As the time-window remains the same, scaling the signal amplitude leads to directly proportional scaling of the signal energy. We compute the error E ρ for decreasing h for each value ofq for the CF and FCF algorithms. We plot E ρ versus the sampling interval h on a log-scale for CF algorithms in Fig. 10 and for FCF algorithms in Fig. 11. Instead of plotting individual lines for each value ofq, we represent the amplitude using different shades of gray. As shown in the colourbar, lighter shades represents lowerq and darker shades represent higherq. The stripes with a higher slope are the higher-order methods. The piecewise constant signal approximation becomes less accurate with increasing signal amplitude, which directly translates to a higher error E ρ for larger values ofq. All the four algorithms i.e., CF [2] 1 , CF [4] 2 , FCF [2] 1 and FCF [4] 2 show similar trends for the scaling of error with signal energy. The CF [2] 1 algorithm was compared with other methods in [37] (where it is referred to as BO), and they conclude that CF [2] 1 scales the best with increasing signal amplitude. Hence the results shown in Fig. 10 are very motivating as CF [4] 2 is seen to scale almost as well as CF [2] 1 . The error of approximations used in the FCF algorithms also depends onq. However, comparing Fig. 10 and Fig. 11 we can see that the contribution of the approximation error is negligible. These results combined with the results in the trade-off plots (Figs. 7, 8 and 9) make a strong case for the FCF [4] 2 algorithm.
As an efficient FNFT algorithm can be combined with the recently proposed b-modulation [19], [21], [39] scheme to develop a complete NFT based fiber-optic communication sys- The fourth-order CF [4] 2 algorithm is seen to have gradual increase in error with increase in amplitude similar to the second-order CF [2] 1 algorithm. tem, accurate and fast computation of the scattering coefficient b(λ) is of interest. Hence to test the performance of both the FCF algorithms in computation of b-coefficient, we define where b(λ) is the analytically known andb(λ) is the numerically computed scattering coefficient. For the numerical test we again use the signal from Example 1 as b(λ) is known. We plot the error E b for both the FCF methods for decreasing sampling interval h in Fig. 12. FCF [4] 2 clearly outperforms FCF [2] 1 even after considering the additional computational cost.
From the results of the numerical tests presented in this section it is clear that approximating H(λ) in Eq. 9 using rational functions provides a significant computational advantage. However, as mentioned earlier we have had to restrict ourselves to fourth-order method FCF [4] 2 . To further improve the accuracy and order of convergence while restricting ourselves to a fourth-order method, we explore the possibility of using series acceleration techniques in the next section.

IV. SERIES ACCELERATION
Series acceleration methods are techniques for improving the rate of convergence of a series. In particular, they were used to improve an inverse NFT algorithm for the defocusing case in [15]. Series acceleration was also used to improve the accuracy of the split-step Fourier algorithm [32] commonly used to solve the NSE directly. Here we study the applicability of a specific series acceleration technique known as the Richardson extrapolation [30] to the FCF algorithms.

A. Richardson Extrapolation
Given an r th -order numerical approximation methodρ(λ, h) for ρ(λ) that depends on the step-size h, we can write We assume thatρ(λ, h) has series expansion in h, In Richardson extrapolation [30], we combine two numerical approximationsρ(λ, h) andρ(λ, 2h) as follows, Using the series expansion, we find that the order of the new approximationρ [RE] (λ, h) is at least r + 1: We apply this idea to FCF [2] 1 and FCF [4] 2 to obtain FCF RE and FCF RE [4] 2 algorithms. We remark that series acceleration can also be applied to the slow algorithms in Sec. II-C.

B. Numerical examples
We test FCF RE for all three examples. Since Richardson extrapolation requires us to compute two approximations, which increases the computational complexity, we again evaluate the complexity-accuracy trade-off. We plot the error versus execution time curves for the three examples in the Figs. 13 to 15. In all figures we can see that the FCF RE algorithms achieve slopes of r + 2 rather than the expected slope of r + 1. This is an example of superconvergence [33]. Specifically, the error of FCF RE [2] 1 decreases with slope four and that of FCF RE [4] 2 decreases with slope six. As seen before in Sec. II-D, the arithmetic error starts to dominate after a certain point and causes the error to rise. Although the executions times of FCF RE algorithms are higher, the error achieved is almost an order of magnitude lower even for large h. From the other viewpoint, for the same execution time, the FCF RE algorithms achieve significantly lower errors compared to FCF algorithms. FCF RE [4] 2 outperforms FCF RE [2] 1 in the tradeoff for all the three examples again highlighting the advantage of using higher-order CFQM exponential integrators. These results suggest that Richardson extrapolation should be applied to improve the considered FNFT algorithms. The FCF RE [4] 2 algorithm provides the best trade-off among all the algorithms presented in this paper.

V. COMPUTING EIGENVALUES
Recall that for the case of focusing NSE (κ = 1), the nonlinear Fourier spectrum has two parts: a continuous and a discrete part. In this section, we are concerned with the numerical computation of the discrete part. We first mention some of the existing approaches and then show how one of them can be extended to work with the new fast higher-order  NFT algorithms. We will also show that Richardson extrapolation can be applied to improve the accuracy at virtually no extra computational cost.

A. Existing approaches
Finding the eigenvalues consists of finding the complex upper half-plane roots of the function a(λ). Most of the existing approaches can be classified into four main categories. 1) Search methods: Newton's method.
2) Eigenmethods: Spectral methods based on the solution of a suitably designed eigenproblem [44]. 3) Gridding methods: They find λ k using iterative methods or optimized grid search [13], [44]. Recently a method based on contour integrals was proposed [38]. 4) Hybrid methods: Any combination of the above. Eigenmethods with rougher sampling can e.g. be used to find initial guesses for a search method [4]. Our proposed method will be a hybrid of a eigenmethod and a search method in the spirit of [4], where an eigenproblem is solved to obtain initial guesses for Newton's method.

B. Proposed method
Remember that the discrete spectrum consists of eigenvalues, which are the zeros of a(λ) in the complex upper halfplane (H), and their associated norming constants. We start with discussing an approximation of a(λ) that will be useful for locating the eigenvalues. From Eqs. 3, 4 and 5 we can write Over the finite interval [T − , T + ] using Eq. 9 we can see that Hence we hope that the zeros of H 1,1 (λ) are approximations of the zeros of a(λ) if the signal truncation and discretization errors are small enough. In Sec. III-A we explained how H(λ) can be approximated by a rational function in a transformed co-ordinate z. Hence we can further write whereâ num (z) andâ den (z) are polynomials in z(λ). Let a num (z) =â N z N +â N −1 z N −1 + . . . +â 0 . Thusâ num (z) will have N zeros. These zeros or roots ofâ num (z) can be found using various methods [23]. Of these N zeros, there should be K (typically, K ≪ N ) values that are approximations of zeros of a(λ) in H. Example: We would like to add clarity through a visual representation of the roots. We choose the signal from Example 1 with D = 2 9 . We plot all the zeros ofâ num (z) of FCF [2] 1 with 'x' in Fig. 16. Here z = e iλh . We can then map these zeros back to obtain values of λ. These are plotted with 'x' in Fig. 17. From the definition of discrete spectrum, we can filter out all the values that are not in H. We then filter further and keep only values of λ such that |Re(λ)| < 0.9π/h which is close to a limit imposed by the chosen mapping from z → λ. The filtered roots are plotted in Figs. 17 and 16 with 'o'. For the chosen value ofq = 5.4 the set of eigenvalues is Λ = {3 + 4.9i, 3 + 3.9i, 3 + 2.9i, 3 + 1.9i, 3 + 0.9i}. From Fig. 17 we see that the values marked with 'o' are indeed approximations of the values in set Λ. However, there is no guarantee that we will always be able to locate approximations for all values in Λ as that depends on several factors, some of which are the signal magnitude q o , signal interval [T − , T + ], step-size h and values of the eigenvalues themselves.
For the example chosen in the visual demonstration, the number of zeros is N = 1024 and the number of eigenvalues is K = 5. For the chosen mapping from λ → z, the K values of interest will always lie inside the unit circle in Fig. 16 and most other spurious zeros ofâ num (z) will lie on the unit circle. Even with the best eigenmethods available for polynomial root-finding, which have a complexity of O(N 2 ) [5], execution time grows very steeply, making this approach infeasible for large N . To reduce the complexity, it was suggested in [4] to sub-sample the given signal to reduce the dimensionality of the root-finding problem. The algorithm is summarized below.
Input: Samples q 0 , . . . , q D−1 , D sub , [T − , T + ] Output: Estimated eigenvaluesλ k 1: Subsample the given samples → q sub n := q n⌊D/Dsub⌉ . Build a polynomial approximationâ num (z) using the D sub subsampled samples. Apply a fast eigenmethod to find the roots ofâ num (z). Filter the roots. 2: Refine the roots via Newton's method using all samples.
Filter the refined roots. 3: Apply Richardson extrapolation to the unrefined and refined roots.
We now discuss the three stages of the algorithm in detail.

1) Root finding from subsampled signal
The given signal q n is subsampled to give q sub n with D sub samples. The corresponding step-size is h sub . There are no results for the minimum number of samples that guarantee that all eigenvalues will be found. One choice can be based on limiting the overall computational complexity to O(D log 2 D), which is the complexity for the reflection coefficient. For a root-finding algorithm with O(D 2 ) complexity, we choose to use D sub = round D log 2 D samples. The polynomialâ num (z) is then built from these D sub samples. For FCF [4] 2 , the nonequispaced samples should be obtained from the original D samples and not the D sub samples. An eigenmethod is then used to find all zeros ofâ num (z). We used the algorithm in [5]. The values of z are mapped backed to λ and filtered to remove implausible values.

2) Root refinement using Newton method
The Newton method based on the slow CF methods is used for root refinement. The derivative ∂a(λ)/∂λ is calculated numerically along with a(λ) as in [10] using all the given samples q n . The values of λ that remain after filtering in the previous step are used as the initial guesses for the Newton method. We chose to stop the iterations if the change in value goes below 10 −15 or if a maximum of 15 iterations is reached. The resulting roots are filtered again.

3) Richardson extrapolation
We pair the roots resulting from the Newton step, λ Newton k , with the corresponding initial guessesλ init k . Then, we apply Richardson extrapolation: λ k is then an improved approximation and constitutes the discrete part of the FCF RE algorithm. It may so happen that more than oneλ init k converge to the samê λ Newton The numerical algorithms may not find particular eigenvalues or find spurious ones. LetΛ be the set of approximations found Note that the first term in the outer maximum grows large if an algorithm fails to approximate a part of the set Λ while the second term becomes large if an algorithm finds spurious values that have no correspondence with values in Λ. EΛ is expected to be of order r for an algorithm of order r.

C. Numerical Example
In this section, we compare different variants of our proposed algorithm using Example 1. We compute the error EΛ for the following three types of algorithms: 1) Discrete part of FCF algorithms. An eigenmethod is applied to the approximationâ num (z) built using all samples. No sub-sampling is used.
2) Discrete part of FCF algorithms with sub-sampling. Only steps 1 and 2 of the algorithm mentioned above. 3) Discrete part of FCF RE algorithms. All the three steps mentioned above. To demonstrate the effect of sub-sampling, we show in Fig.  18 the errors for the second-and fourth-order algorithms of types 1 and 2. For h > 0.3 the errors are high either due to failure to find approximations close to the actual eigenvalues or spurious values. For h ≤ 0.3, FCF [4] 2 of type 1 and FCF [2] 1 of type 2 find exactly five values that are close approximations of the values in Λ. However FCF [2] 1 of type 1 and FCF [4] 2 of type 2 find good approximations only for h ≤ 0.06. The error of FCF [2] 1 algorithms decreases with slope two and that of FCF [4] 2 algorithms decreases with slope four as expected from the order of the underlying numerical schemes.
In Fig. 19 we show the errors for the second-and fourthorder algorithms of type 2 and 3 to indicate the advantage of the extrapolation step. The extrapolation step improves the approximation significantly for FCF RE [2] 1 while adding negligible computation cost to the algorithm. However, there is only minor improvement in case of FCF RE [4] 2 over FCF [4] 2 .
In Fig. 20 we plot the execution times for the FCF algorithms of types 1 and 2. The execution times of algorithms of type 3 are almost the same as those of type 2. For type 1 algorithms, these times include the time required to build a num (z) and the time taken by the root-finder. For algorithms of type 2, the additional time required for root-refinement by Newton's method is also included. Even with sub-sampling, we see that the execution times are an order of magnitude higher than the execution times for the continuous part. The FCF RE algorithms seem to provide the best trade-off between accuracy and computation cost similar to the case of continuous part. The overall computational complexity may be decreased by using alternative methods to find the initial guesses.

VI. CONCLUSION
In this paper, we proposed new higher-order nonlinear Fourier transform algorithms based on a special class of exponential integrators. We also showed how some of these algorithms can be made fast using special higher-order exponential splittings. The accuracy of these fast algorithms was improved even further using Richardson extrapolation. Numerical demonstrations showed that the proposed algorithms are highly accurate and provide much better complexity-accuracy trade-offs than existing algorithms. In the future we plan to integrate the algorithms from this paper into the open source software library FNFT [40].   [4] 2 and FCF [4] 2 algorithms for computing eigenvalues of Example 1.