Capacity Lower Bounds of the Noncentral Chi-Channel With Applications to Soliton Amplitude Modulation

The channel law for amplitude-modulated solitons transmitted through a nonlinear optical fiber with ideal distributed amplification and a receiver based on the nonlinear Fourier transform is a noncentral chi-distribution with <inline-formula> <tex-math notation="LaTeX">$2n$ </tex-math></inline-formula> degrees of freedom, where <inline-formula> <tex-math notation="LaTeX">$n=2$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$n=3$ </tex-math></inline-formula> correspond to the single- and dual-polarisation cases, respectively. In this paper, we study the capacity lower bounds of this channel under an average power constraint in bits per channel use. We develop an asymptotic semi-analytic approximation for a capacity lower bound for arbitrary <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> and a Rayleigh input distribution. It is shown that this lower bound grows logarithmically with signal-to-noise ratio (SNR), independently of the value of <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>. Numerical results for other continuous input distributions are also provided. A half-Gaussian input distribution is shown to give larger rates than a Rayleigh input distribution for <inline-formula> <tex-math notation="LaTeX">$n=1,2,3$ </tex-math></inline-formula>. At an SNR of 25 dB, the best lower bounds we developed are approximately 3.68 bit per channel use. The practically relevant case of amplitude shift-keying (ASK) constellations is also numerically analyzed. For the same SNR of 25 dB, a 16-ASK constellation yields a rate of approximately 3.45 bit per channel use.

Optical fibre transmission systems have undergone a long process of increasing engineering complexity and sophistication [1]-[3]. However, the key physical effects affecting the performance of these systems remain largely the same: attenuation, chromatic dispersion, fibre nonlinearity due to the optical Kerr effect, and optical noise. Although the bandwidth of optical fibre transmission systems is large, these systems are ultimately band-limited. This bandwidth limitation, combined with the ever-growing demand for data rates, is expected to result in a so-called "capacity crunch" [4], which caps the rate increase of error-free data transmission [4]-[7]. Designing spectrally efficient transmission systems is therefore a key challenge for future optical fibre transmission systems.
The channel model used in optical communication that includes all three above-mentioned key effects for two states of polarisation is the so-called Manakov equation (ME) [7, eq. (1.26)], [8, Sec. 10.3.1]. The ME describes the propagation of the optical field for systems employing polarisation division multiplexing. The ME therefore generalises the popular scalar nonlinear Schrödinger equation (NSE) [6]-[9], used for single-polarisation systems. In both models, the evolution of the optical field along the fibre is represented by a nonlinear partial differential equation with complex additive Gaussian noise (the precise mathematical expressions for both channel models are given in Sec. II-A). The accumulated nonlinear interaction between the signal and the noise makes the analysis of the resulting channel model a very difficult problem. As recently discussed in, e.g., [10, Sec. 1], [11], [12], exact channel capacity results for fibre optical systems are scarce, and many aspects related to this problem remain open.
Until recently, the common belief among some researchers in the field of optical communication was that nonlinearity was always a nuisance that necessarily degrades the system performance. This led to the assumption that the capacity of the optical channel had a peaky behaviour when plotted as a function of the transmit power (nondecaying bounds can, however, be found in the literature, e.g., in [10] and [13] (lower bounds) and [14] and [15] (upper bounds)). Partially motivated by the idea of improving the data rates in optical fibre links, a multitude of nonlinearity compensation methods have been proposed (see, e.g., [16]-[21]), each resulting in different discrete-time channel models. Recently, a paradigm-shifting approach for overcoming the effects of nonlinearity has been receiving increased attention. This approach relies on the fact that both the ME and NSE in the absence of losses and noise are exactly integrable [22], [23].
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
One of the consequences of integrability is that the signal evolution can be represented using nonlinear normal modes. While the pulse propagation in the ME and NSE is nonlinear, the evolution of these nonlinear modes in the so-called nonlinear spectral domain is essentially linear [24], [25]. The decomposition of the waveform into the nonlinear modes (and the reciprocal operation) is often referred to as the nonlinear Fourier transform (NFT), due to its similarity with the application of the conventional Fourier decomposition in linear systems [26]. The linear propagation of the nonlinear modes implies that the nonlinear cross-talk in the NFT domain is theoretically absent, an idea exploited in the so-called nonlinear frequency division multiplexing [24], [27]. In this method, the nonlinear interference can be greatly suppressed by assigning users different ranges in the nonlinear spectrum, instead of multiplexing them using the conventional Fourier domain.
Integrability (and the general ideas based around the NFT) has also led to several nonlinearity compensation, transmission and coding schemes [28]-[38]. These can be seen as a generalisation of soliton-based communications [8], [9], [39, Ch. 5], which follow the pioneering work by Hasegawa and Nyu [40], and where only the discrete eigenvalues were used for communication. The development of efficient and numerically stable algorithms has also attracted a lot of attention [41]. Furthermore, there have also been a number of experimental demonstrations and assessments for different NFT-based systems [33]-[38]. However, for systems governed by the ME, the only results available come from the recent theoretical work of Maruta and Matsuda [32].
Two nonlinear spectra (types of nonlinear modes) exist in the NSE and the ME. The first one is the so-called continuous spectrum, which is the exact nonlinear analogue of the familiar linear FT: its evolution in an optical fibre is exactly equivalent to that of the linear spectrum under the action of chromatic dispersion, and the energy contained in the continuous spectrum is related to that in the time domain by a modified Parseval equality [26], [31]. The unique feature of the NFT is, however, that apart from the continuous spectrum, it can support a set of discrete eigenvalues (the nondispersive part of the solution). In the time domain, these eigenvalues correspond to stable localised multi-soliton waveforms immune to both dispersion and nonlinearity [8]. The spectral efficiency of multiple-eigenvalue encoding schemes is an area of active research [29], [42], [43]. Multi-soliton transmission has also received increased attention in recent years; see, e.g., [44] and [45] and references therein. Finding the capacity of multi-eigenvalue-based systems in the presence of in-line noise that breaks integrability remains an open research problem. If only a single eigenvalue per time slot is used, the problem is equivalent to a well-known time-domain amplitude-modulated soliton transmission system. In this paper, we consider this simple set-up, where a single eigenvalue is transmitted in every time slot. The obtained results are applicable not only to classical soliton communication systems, but also to the novel area of eigenvalue communications.
Although the set-up we consider in this paper is one of the simplest ones, its channel capacity is still unknown. Furthermore, the only results available in the literature [29], [42], [43], [46]-[49] are exclusively for the NSE, leaving the ME completely unexplored. In particular, previous results include those by Meron et al. [48], who recognised that mutual information (MI) in a nonlinear integrable channel can (and should) be evaluated through the statistics of the nonlinear spectrum, i.e., via the channel defined in the NFT domain. Using a Gaussian scalar model for the amplitude evolution with in-line noise, a lower bound on the MI and capacity of a single-soliton transmission system was presented. The case of two or more solitons per time slot was also analysed, and data rate gains of continuous soliton modulation over an on-off-keying (OOK) system were shown. A bit-error rate analysis for the case of two interacting solitons has been presented in [50]. The derivations presented there, however, cannot be used straightforwardly for information-theoretic analysis. Yousefi and Kschischang [29] addressed the question of achievable spectral efficiency for single- and multi-eigenvalue transmission systems using a Gaussian model for the nonlinear spectrum evolution. Some results on the continuous spectrum modulation were also presented. Later in [42], the spectral efficiency of a multi-eigenvalue transmission system was studied in more detail. In [43], the same problem was studied by considering the correlation functions of the spectral data obtained in the quasi-classical limit of a large number of eigenvalues. Achievable information rates for multi-eigenvalue transmission systems utilising all four degrees of freedom of each scalar soliton in the NSE were analytically obtained in [46]. These results were obtained within the framework of a Gaussian noise model provided in [29] and [47] (non-Gaussian models have been presented in [51] and [52]) and assuming a continuous uniform input distribution subject to peak power constraints. The spectral efficiency for the NFT continuous spectrum modulation was considered in [53]-[55]. Periodic NFT methods have been recently investigated in [56].
In [49], we used a non-Gaussian model for the evolution of a single soliton amplitude and the NSE. Our results showed that a lower bound for the capacity per channel use of such a model grows without bound with the effective signal-to-noise ratio (SNR). In this paper, we generalise and extend our results in [49] to the ME. To this end, we use perturbation-based channel laws for soliton amplitudes previously reported in [51] and [52] (for the NSE) and [57] (for the ME). Both channel laws are a noncentral chi (χ) distribution with 2n degrees of freedom, where n = 2 and n = 3 correspond to the NSE and ME, respectively. Motivated by the similarity of the channel models mentioned above, in this paper we study asymptotic lower bound approximations on the capacity (in bit per channel use) of a general noncentral chi channel with an arbitrary (even) number of degrees of freedom. To the best of our knowledge, this has not been previously reported in the literature. Similar models, however, do appear in the study of noise-driven coupled nonlinear oscillators [58].
The first contribution of this paper is to numerically obtain lower bounds for the channel capacity for three continuous input distributions, as well as for amplitude shift-keying (ASK) constellations with a discrete number of constellation points. For all the continuous inputs, the lower bounds are shown to be nondecreasing functions of the SNR under an average power constraint. The second contribution of this paper is to provide an asymptotic closed-form expression for the MI of the noncentral chi-channel with an arbitrary (even) number of degrees of freedom. This asymptotic expression shows that the MI grows without bound and at the same rate, independently of the number of degrees of freedom.

A. The Propagation Equations
The propagation of light in optical fibres in the presence of amplified spontaneous emission (ASE) noise can be described by a stochastic partial differential equation which captures the effects of chromatic dispersion, nonlinear polarisation mode dispersion, optical Kerr effect, and the generation of ASE noise from the optical amplification process. Throughout this paper we assume that the fibre loss is continuously compensated along the fibre by means of (ideal) distributed Raman amplification (DRA) [59], [60]. In this work we consider the propagation of a slowly varying 2-component envelope $\mathbf{E}(\ell,\tau) = [E_1(\ell,\tau), E_2(\ell,\tau)] \in \mathbb{C}^2$ over a nonlinear birefringent optical fibre, where $\tau$ and $\ell$ represent time and propagation distance, respectively. Our model also includes the 2-component ASE noise $\mathbf{N}(\ell,\tau) = [N_1(\ell,\tau), N_2(\ell,\tau)]$ due to the DRA. We also assume a uniform change of polarised state on the Poincaré sphere [61].
The resulting lossless ME is then given by [7]

$$ i\mathbf{E}_{\ell} - \frac{\beta_2}{2}\mathbf{E}_{\tau\tau} + \frac{8}{9}\gamma\,\|\mathbf{E}\|^2\,\mathbf{E} = \mathbf{N}(\ell,\tau), \tag{1} $$

where the retarded time $\tau$ is measured in the reference frame moving with the optical pulse average group velocity, $\mathbf{E} \equiv \mathbf{E}(\ell,\tau)$ represents the slowly varying 2-component envelope of the electric field, $\beta_2$ is the group velocity dispersion coefficient characterising the chromatic dispersion, and $\gamma$ is the fibre nonlinearity coefficient. The pre-factor 8/9 in (1) comes from the averaging of the fast polarisation rotation [8, Sec. 10.3.1], [61]. For simplicity we will further work with the effective averaged nonlinear coefficient $\gamma^* \triangleq 8\gamma/9$ when addressing the ME.
Throughout this paper, vectors are denoted by boldface symbols $\mathbf{x} = [x_1, x_2, x_3, \ldots]$, while scalars are denoted by nonboldface symbols. The scalar product is denoted by $\langle\cdot,\cdot\rangle$, and an over-bar denotes complex conjugation. The Euclidean norm is denoted by $\|\mathbf{x}\|^2 \triangleq |x_1|^2 + |x_2|^2 + \ldots$. The partial derivatives in the partial differential equations are expressed as subscripts, e.g., $\mathbf{E}_{\ell} \triangleq \partial\mathbf{E}/\partial\ell$ and $\mathbf{E}_{\tau\tau} \triangleq \partial^2\mathbf{E}/\partial\tau^2$.
In the case of a single polarisation state, the propagation equation above reduces to the lossless generalised scalar NSE [6], [9]

$$ iE_{\ell} - \frac{\beta_2}{2}E_{\tau\tau} + \gamma\,|E|^2 E = N(\ell,\tau). \tag{2} $$

In this paper we consider the case of anomalous dispersion ($\beta_2 < 0$), i.e., the focusing case. In this case, both the ME in (1) and the NSE in (2) permit bright soliton solutions ("particle-like waves"), which will be discussed in more detail in Sec. II-B.
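It is customary to re-scale (1) to dimensionless units. We shall use the following normalisation. The power will be measured in units of $P_0 = 1$ mW, since it is a typical power level used in optical communications. The normalised (dimensionless) field then becomes $\mathbf{q} = \mathbf{E}/\sqrt{P_0}$. For the distance and time, we define the dimensionless variables $z$ and $t$ as $z = \ell/\ell_0$ and $t = \tau/\tau_0$, where

$$ \ell_0 = \frac{1}{\gamma^* P_0}, \qquad \tau_0 = \sqrt{|\beta_2|\,\ell_0}. \tag{3} $$

For the scalar case (2), we use the same normalisation but we replace $\gamma^*$ by $\gamma$. Then, the resulting ME reads

$$ i\mathbf{q}_z + \frac{1}{2}\mathbf{q}_{tt} + \|\mathbf{q}\|^2\,\mathbf{q} = \mathbf{n}(z,t), \tag{4} $$

while the NSE becomes

$$ iq_z + \frac{1}{2}q_{tt} + |q|^2 q = n(z,t). \tag{5} $$

The ASE noise $\mathbf{n}(z,t) = [n_1(z,t), n_2(z,t)]$ in (4) is a normalised version of $\mathbf{N}(\ell,\tau)$, and is assumed to have the following correlation properties:

$$ \mathbb{E}[n_i(z,t)] = \mathbb{E}[n_i(z,t)\,n_j(z',t')] = 0, \qquad \mathbb{E}[n_i(z,t)\,\bar{n}_j(z',t')] = D\,\delta_{ij}\,\delta(z-z')\,\delta(t-t'), \tag{6} $$

with $i, j \in \{1, 2\}$, where $\delta_{ij}$ is the Kronecker symbol, $\mathbb{E}[\cdot]$ is the mathematical expectation operator, and $\delta(\cdot)$ is the Dirac delta function. The correlation properties (6) mean that each noise component $n_i(z,t)$ is assumed to be a zero-mean, independent, white circular Gaussian noise. The scalar case follows by considering a single noise component only.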
The noise intensity $D$ in (6) is (in dimensionless units)

$$ D = \sigma_0^2\,\frac{\ell_0}{P_0\,\tau_0}, \tag{7} $$

where $\sigma_0^2$ is the spectral density of the noise, with real-world units [W/(km · Hz)]. For ideal DRA, $\sigma_0^2$ can be expressed through the optical fibre and transmission system parameters as $\sigma_0^2 = \alpha_{\text{fibre}} K_T \cdot h\nu_0$, where $\alpha_{\text{fibre}}$ is the fibre attenuation coefficient, $h\nu_0$ is the average photon energy, and $K_T$ is a temperature-dependent phonon occupancy factor [6].
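To give a feeling for the orders of magnitude involved, the normalisation can be evaluated numerically. The sketch below assumes the scales $\ell_0 = 1/(\gamma^* P_0)$ and $\tau_0 = \sqrt{|\beta_2|\ell_0}$ (one consistent choice that reduces the ME to unit coefficients) and $D = \sigma_0^2\ell_0/(P_0\tau_0)$. All fibre parameter values are illustrative assumptions and are not taken from this paper.

```python
import math

# Illustrative standard-fibre values (assumptions made for this sketch only).
gamma = 1.27                    # nonlinearity coefficient, 1/(W km)
beta2 = -21.7e-24               # group velocity dispersion, s^2/km (anomalous)
alpha_db = 0.2                  # fibre attenuation, dB/km
K_T = 1.13                      # phonon occupancy factor
h_nu0 = 6.626e-34 * 193.4e12    # photon energy near 1550 nm, J
P0 = 1e-3                       # power unit, W

gamma_star = 8.0 * gamma / 9.0              # effective averaged coefficient for the ME
l0 = 1.0 / (gamma_star * P0)                # distance scale, km
tau0 = math.sqrt(abs(beta2) * l0)           # time scale, s
alpha = alpha_db * math.log(10.0) / 10.0    # attenuation converted to 1/km
sigma0_sq = alpha * K_T * h_nu0             # noise spectral density, W/(km Hz)
D = sigma0_sq * l0 / (P0 * tau0)            # dimensionless noise intensity

print(f"l0 = {l0:.1f} km, tau0 = {tau0 * 1e12:.1f} ps, D = {D:.3e}")
```

With these assumed values the distance scale is several hundred kilometres and the time scale is of the order of a hundred picoseconds, so $D$ is a small dimensionless number.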
From now on, all the quantities in this paper are in normalised units unless specified otherwise. Furthermore, we define the continuous time channel as the one defined by the normalised ME and the NSE. This is shown schematically in the inner part of Fig. 1, where the transmitted and received waveforms are x(t) ≡ q(0, t) and y(t) ≡ q(Z , t), respectively, where Z is the propagation distance.

B. Fundamental Soliton Solutions
It is known that the noiseless ($\mathbf{n}(z,t) = 0$) ME (4) possesses a special class of solutions, the so-called fundamental bright solitons. In general, the Manakov fundamental soliton is fully characterised by 6 parameters [57] (4 in the NSE case): frequency (also having the meaning of velocity in some physical applications), phase, phase mismatch, centre-of-mass position, polarisation angle, and amplitude (the latter is inversely proportional to the width of the soliton). In this paper we consider amplitude-modulated solitons, and thus, no information is carried by the other 5 parameters. The initial values of these 5 parameters can therefore be set to arbitrary values. In this paper, all of them have been set to zero. For the initial frequency, this can be further motivated by the need to avoid deterministic pulse walk-offs. As for the initial phase, phase mismatch, and centre-of-mass position, as we shall see in the next section, their initial values do not affect the marginal amplitude channel law. Under these assumptions, the soliton solution at $z = 0$ is given by [57], [62]

$$ \mathbf{q}(0,t) = [\cos\beta_0, \sin\beta_0]\,A\,\mathrm{sech}(At), \tag{8} $$

where $A$ is the soliton amplitude and $0 < \beta_0 < \pi/2$ is the polarisation angle. The value of $\beta_0$ can be used to control how the signal power is split across the two polarisations.
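For any $\beta_0$, the Manakov soliton solution after propagation over a distance $Z$, with the initial condition given by (8), is expressed as

$$ \mathbf{q}(Z,t) = [\cos\beta_0, \sin\beta_0]\,A\,\mathrm{sech}(At)\,\exp\big(i\phi(Z)\big), \tag{9} $$

$$ \phi(Z) = \frac{A^2 Z}{2}. \tag{10} $$

The soliton solution for the NSE in (5) can be obtained by using $\beta_0 = 0$ in (8)-(10), which gives

$$ q(0,t) = A\,\mathrm{sech}(At) \tag{11} $$

and

$$ q(Z,t) = A\,\mathrm{sech}(At)\,\exp\left(\frac{iA^2 Z}{2}\right). \tag{12} $$

As shown by (10) and (12), the solitons in (8) and (11) only acquire a phase rotation after propagation. When the noise is not zero, however, these solutions will change. This will be discussed in detail in the following section.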
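The statement that a fundamental soliton only acquires a phase rotation can be checked numerically. The sketch below assumes the normalised noiseless NSE in the form $iq_z + \frac{1}{2}q_{tt} + |q|^2 q = 0$ and the soliton $q(z,t) = A\,\mathrm{sech}(At)\exp(iA^2z/2)$, and evaluates the residual of the equation with central finite differences. The function names are hypothetical. Because the Manakov soliton is this scalar solution multiplied by the constant unit vector $[\cos\beta_0, \sin\beta_0]$, the same check covers each polarisation component.

```python
import cmath
import math


def soliton(z, t, amp):
    """Fundamental soliton A sech(A t) exp(i A^2 z / 2) of the normalised NSE."""
    return amp / math.cosh(amp * t) * cmath.exp(1j * amp * amp * z / 2.0)


def nse_residual(z, t, amp, h=1e-3):
    """|i q_z + q_tt / 2 + |q|^2 q| using central finite differences of step h."""
    q = soliton(z, t, amp)
    q_z = (soliton(z + h, t, amp) - soliton(z - h, t, amp)) / (2.0 * h)
    q_tt = (soliton(z, t + h, amp) - 2.0 * q + soliton(z, t - h, amp)) / (h * h)
    return abs(1j * q_z + 0.5 * q_tt + abs(q) ** 2 * q)


# Largest residual over a few amplitudes, distances, and time instants.
worst = max(nse_residual(z, t, amp)
            for amp in (0.5, 1.0, 2.0)
            for z in (0.0, 1.0)
            for t in (-2.0, -0.5, 0.0, 0.7, 3.0))
print(f"largest residual: {worst:.2e}")
```

The residual is limited only by the finite-difference step, which is consistent with the pulse shape $|q(z,t)|$ being independent of $z$.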

A. Amplitude-Modulated Solitons: One and Two Polarisations
We consider a continuous-time input signal

$$ \mathbf{x}(t) = \sum_{k}\mathbf{s}_k(t), \tag{13} $$

where $k$ is the discrete-time index. Motivated by the results in Sec. II-B, the pulses $\mathbf{s}_k(t)$ are chosen to be

$$ \mathbf{s}_k(t) = [\cos\beta_0, \sin\beta_0]\,A_k\,\mathrm{sech}\big(A_k(t - kT_s)\big), \tag{14} $$

where $T_s$ is the symbol period. In principle, it is also possible to encode information by changing the polarisation angle $\beta_0$ from slot to slot. However, in this paper, we fix its value to be the same for all the time slots, corresponding to a fixed (generally elliptic) degree of polarisation. Thus, the transmitted waveform corresponds to soliton amplitude modulation, which is schematically shown in Fig. 2 for the scalar (NSE) case. At the transmitter, we assume that symbols $X_k$ are mapped to soliton amplitudes $A_k$ via $A_k = X_k^2$. This normalisation is introduced only to simplify the analytical derivations in this paper. To avoid soliton-to-soliton interactions, we also assume that the separation $T_s$ is large, i.e., $\exp(-A_k T_s) \ll 1$, $\forall k$. The receiver in Fig. 1 is assumed to process the received waveform during a window of $T_s$ via the forward NFT [22], [32] and to return the amplitude of the received soliton, which we denote by $R_k = Y_k^2$.
Before proceeding further, it is important to discuss the role of the amplitudes $A_k$ in a potential enhancement of soliton-soliton interactions. The interaction force prefactor is known to scale as the amplitude cubed [8, Ch. 9.2], [9, Ch. 5.4]. However, the interaction also decays exponentially as $\exp(-A_k T_s)$. This exponential decay dominates the interaction, and thus, considering very large amplitudes (or equivalently, very large powers, as we will do later in the paper) is in principle not a problem. At extremely large amplitudes, however, the model used in this paper is invalid for different reasons: higher order nonlinearities should be taken into account. These include stimulated Brillouin scattering (for very large powers) and Raman scattering (for very short pulses). Studying these effects is, however, out of the scope of this paper.
We would also like to emphasise that for a fixed pulse separation $T_s$, the channel model we consider in this paper is not applicable for low soliton amplitudes. This is due to two reasons. The first one is that for low-amplitude solitons, the perturbation theory used to derive the channel law becomes inapplicable as the signal becomes of the same order as the noise. Secondly, low-amplitude solitons are also very broad, and thus, nonnegligible soliton interactions are generated. These two limitations can be overcome if the soliton amplitudes are always forced to be larger than a certain cutoff amplitude $\hat{a}$, which we will now estimate. For the first case (noise-limited), the threshold $\hat{a}_{\text{noise}}$ is proportional to $\sigma_N^2$. In the second case (interaction-limited), the threshold is proportional to the symbol rate, i.e., $\hat{a}_{\text{inter}} \propto T_s^{-1}$. This shows that for fixed system parameters, the threshold $\hat{a} = \max\{\hat{a}_{\text{noise}}, \hat{a}_{\text{inter}}\}$ is a constant. The implications of this will be discussed at the end of Sec. IV.
Having defined the transmitter and receiver, we can now define a discrete-time channel model, which encompasses the transmitter, the optical fibre, and the receiver, as shown in Fig. 1. Because the solitons are assumed to be well separated in time, we model the channel as memoryless, and thus, from now on we drop the time index k. This memoryless assumption is supported by additional numerical simulations we performed, which are included in Appendix A. Nevertheless, at this point it is important to consider the implications of a potential mismatch between the memoryless assumption of the model and the true channel in the context of channel capacity lower bounds. In particular, if in some regimes (e.g., low power or large transmission distances) the memoryless assumption does not hold, considering a memoryless channel model would result in approximate lower bounds on the channel capacity. Provable lower bounds can be obtained by using mismatched decoding theory [63] (as done in [64, Sec. III-A and III-B]) or by considering an average memoryless channel (as done in [6, Sec. III-F]). Although both approaches can in principle be used in the context of amplitude-modulated solitons, they both rely on having access to samples from the true channel, and not from a (potentially memoryless) model. Such samples can only be obtained through numerical simulations or an optical experiment, which is beyond the scope of this paper. In this context, the channel capacity lower bounds in Sec. IV should be considered as a first step towards more involved analyses.
The conditional probability density function (PDF) for the received soliton amplitude $R$ given the transmitted amplitude $A$ was obtained in [57, eq. (15)] using a standard perturbative approach and the Fokker-Planck equation method. The result can be expressed as a noncentral chi-squared distribution

$$ p_{R|A}(r|a) = \frac{1}{\sigma_N^2}\,\frac{r}{a}\,\exp\left(-\frac{a+r}{\sigma_N^2}\right) I_2\left(\frac{2\sqrt{ar}}{\sigma_N^2}\right), \tag{15} $$

where

$$ \sigma_N^2 = \frac{DZ}{2} \tag{16} $$

is the normalised variance of the accumulated ASE noise, and $I_2(\cdot)$ is the modified Bessel function of the first kind of order two. The expression in (15) is a noncentral chi-squared distribution with six degrees of freedom (see, e.g., [65, eq. (29.4)]), providing non-Gaussian statistics for Manakov soliton amplitudes. By making the change of variables $Y = \sqrt{R}$, and using $X = \sqrt{A}$, the PDF in (15) can be expressed as

$$ p_{Y|X}(y|x) = \frac{2y^3}{\sigma_N^2 x^2}\,\exp\left(-\frac{x^2+y^2}{\sigma_N^2}\right) I_2\left(\frac{2xy}{\sigma_N^2}\right), \tag{17} $$

which corresponds to the noncentral chi-distribution with six degrees of freedom. The extra factor $2y$ before the exponential function comes from the Jacobian. For the NSE, it is possible to show that the channel law becomes [49], [51], [52]

$$ p_{Y|X}(y|x) = \frac{2y^2}{\sigma_N^2 x}\,\exp\left(-\frac{x^2+y^2}{\sigma_N^2}\right) I_1\left(\frac{2xy}{\sigma_N^2}\right), \tag{18} $$

which corresponds to a noncentral chi-distribution with four degrees of freedom. We note that although in this paper we only consider an amplitude modulation $A_k$ (or, in NFT terms, the imaginary part of each discrete eigenvalue), it is possible to include other discrete degrees of freedom corresponding to various soliton parameters in (14) in order to improve the achievable information rates. This is, however, beyond the scope of this paper. Furthermore, the channel models presented in this section were obtained via a perturbative treatment, and thus, in the context of soliton/eigenvalue communications they are technically valid only at high SNR (see footnote 8). Despite that, in the current paper we will also study capacity lower bounds of a general noncentral chi-channel with an arbitrary number of degrees of freedom at any range of SNR. While admittedly the low-SNR region is currently only of interest when n = 1 (noncoherent phase channel), we believe its generalisation for n > 1 can still be of interest for the new generation of nonlinear optical regeneration systems.
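The change of variables from the chi-squared to the chi form can be written out in one step. The worked equation below is a sketch that assumes the six-degree-of-freedom chi-squared law has the standard form $p_{R|A}(r|a) = \sigma_N^{-2}(r/a)\exp(-(a+r)/\sigma_N^2)\,I_2(2\sqrt{ar}/\sigma_N^2)$; with $r = y^2$ and $a = x^2$, the Jacobian is $\mathrm{d}r/\mathrm{d}y = 2y$.

```latex
\begin{align}
p_{Y|X}(y|x)
  &= p_{R|A}\big(y^2 \,\big|\, x^2\big)\,\left|\frac{\mathrm{d}r}{\mathrm{d}y}\right| \nonumber\\
  &= 2y \cdot \frac{1}{\sigma_N^2}\,\frac{y^2}{x^2}\,
     \exp\left(-\frac{x^2+y^2}{\sigma_N^2}\right)
     I_2\left(\frac{2\sqrt{x^2y^2}}{\sigma_N^2}\right) \nonumber\\
  &= \frac{2y^3}{\sigma_N^2 x^2}\,
     \exp\left(-\frac{x^2+y^2}{\sigma_N^2}\right)
     I_2\left(\frac{2xy}{\sigma_N^2}\right). \nonumber
\end{align}
```

The same step applied to a chi-squared law with four degrees of freedom replaces $y^2/x^2$ by $y/x$ and $I_2$ by $I_1$.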

B. Generalised Discrete-Time Channel Model
The results in the previous section show that both scalar and vector soliton channels can be modelled using the same class of noncentral chi-distributions with an even number of degrees of freedom 2n, with n = 2, 3. The simplest channel of this type corresponds to n = 1, which describes a fibre optical communication channel with zero dispersion [13] as well as the noncoherent phase channel studied in [66] (see also [67]). Motivated by this, here we consider a general communication channel described by the noncentral chi-distribution with an arbitrary even number of degrees of freedom 2n. Although we are currently not aware of any physically relevant communication system that can be modelled with n ≥ 4, we present results for arbitrary n to provide an exhaustive treatment for channels of this type.
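The channel in question is therefore modelled via the PDF corresponding to the noncentral chi-distribution

$$ p_{Y|X}(y|x) = \frac{2y^n}{\sigma_N^2 x^{n-1}}\,\exp\left(-\frac{x^2+y^2}{\sigma_N^2}\right) I_{n-1}\left(\frac{2xy}{\sigma_N^2}\right), \tag{19} $$

with $n \in \mathbb{N}$, where $\mathbb{N} \triangleq \{1, 2, 3, \ldots\}$. This channel law corresponds to the following input-output relation

$$ Y^2 = \frac{1}{2}\sum_{i=1}^{2n}\left(\frac{X}{\sqrt{n}} + N_i\right)^2, \tag{20} $$

where $\{N_i\}_{i=1}^{2n}$ is a set of independent and identically distributed Gaussian random variables with zero mean and variance $\sigma_N^2$. The above input-output relationship is schematically shown in Fig. 3, and it particularises to (17) and (18) for n = 3 and n = 2, respectively.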
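The equivalence between the channel law and the input-output relation can be illustrated with a small simulation. The sketch below assumes the relation $Y^2 = \frac{1}{2}\sum_{i=1}^{2n}(X/\sqrt{n} + N_i)^2$ and evaluates the noncentral chi density through the power series of the Bessel function, which is the same as a Poisson mixture of central chi laws. The function names and the parameter values are hypothetical choices made for the illustration.

```python
import math
import random


def sample_output(x, n, sigma2, rng):
    """Draw Y with Y^2 = 0.5 * sum_{i=1}^{2n} (x / sqrt(n) + N_i)^2, N_i ~ N(0, sigma2)."""
    mean = x / math.sqrt(n)
    std = math.sqrt(sigma2)
    total = sum((mean + rng.gauss(0.0, std)) ** 2 for _ in range(2 * n))
    return math.sqrt(0.5 * total)


def log_central_chi(y, m, sigma2):
    """log of 2 y^(2m-1) exp(-y^2/sigma2) / (Gamma(m) sigma2^m): the x = 0 law with 2m dof."""
    return (math.log(2.0) + (2 * m - 1) * math.log(y) - y * y / sigma2
            - math.lgamma(m) - m * math.log(sigma2))


def chi_pdf(y, x, n, sigma2):
    """Noncentral chi density with 2n dof, summed as a Poisson mixture of central laws."""
    if y <= 0.0:
        return 0.0
    lam = x * x / sigma2                       # Poisson mean of the mixture
    if lam == 0.0:
        return math.exp(log_central_chi(y, n, sigma2))
    peak = x * y / sigma2                      # the summand is largest near k = peak
    kmax = int(peak + 10.0 * math.sqrt(peak) + 30.0)
    total = 0.0
    for k in range(kmax + 1):
        log_weight = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_weight + log_central_chi(y, n + k, sigma2))
    return total


rng = random.Random(1)
x, n, sigma2 = 2.0, 3, 0.5
samples = [sample_output(x, n, sigma2, rng) for _ in range(20000)]
mean_y2 = sum(y * y for y in samples) / len(samples)   # expected: x^2 + n * sigma2

step = 0.01                                            # midpoint rule on (0, 8]
area = sum(chi_pdf((i + 0.5) * step, x, n, sigma2) for i in range(800)) * step
print(f"E[Y^2] ~ {mean_y2:.3f} (theory {x * x + n * sigma2:.3f}), area = {area:.5f}")
```

The series form avoids an external Bessel routine and stays numerically stable because every term is evaluated in the log domain.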

IV. MAIN RESULTS
In this section, we study capacity lower bounds of the channel in (19). We will show results as a function of the effective SNR defined as $\rho \triangleq \sigma_S^2/\sigma_N^2$, where $\sigma_S^2$ is the second moment of the input distribution $p_X$ and $\sigma_N^2$ is given by (16). The value of $\sigma_S^2$ also corresponds to the average soliton amplitude, i.e., $\sigma_S^2 = \mathbb{E}[X^2] = \mathbb{E}[A]$. It can be shown that for given system parameters, the noise power (in real-world units) is constant and proportional to $\sigma_N^2$, and the signal power (in real-world units) is proportional to $\sigma_S^2$. The parameter $\rho$ therefore indeed corresponds to an effective SNR.
As previously explained, the inter-symbol interference due to pulse interaction can be neglected because of the large soliton separation assumed, and thus, the channel can be treated as memoryless (see Appendix A for more details). The channel capacity, in bits per channel use, is then given by [68], [69]

$$ C(\rho) = \max_{p_X(x)} I_{X,Y}(\rho), \tag{21} $$

where the MI is

$$ I_{X,Y}(\rho) = h_Y(\rho) - h_{Y|X}(\rho), \tag{23} $$

and

$$ h_Y(\rho) = -\int_0^\infty p_Y(y)\log_2 p_Y(y)\,\mathrm{d}y, \qquad h_{Y|X}(\rho) = -\int_0^\infty\!\!\int_0^\infty p_X(x)\,p_{Y|X}(y|x)\log_2 p_{Y|X}(y|x)\,\mathrm{d}y\,\mathrm{d}x $$

are the output and conditional differential entropies, respectively. The optimisation in (21) is performed over all possible statistical distributions $p_X(x)$ that satisfy the power constraint. In our case this constraint corresponds to a fixed second moment of the input symbol distribution or, equivalently, to a fixed average signal power in a given symbol period.
Footnote 8: More precisely, when the total soliton energy in the time slot is much greater than that of the ASE noise.
The exact solution for the power-constrained optimisation problem (21) with the channel law (19) is unknown. For the noncentral chi-distribution with 2 degrees of freedom (i.e., the noncoherent additive noise channel), it was shown in [66] that the capacity-achieving distribution is discrete with an infinite number of mass points. To the best of our knowledge, that proof has not been extended to a higher number of degrees of freedom; however, we expect that this will be the case for (19) too.
In this paper, we do not aim at finding the capacity-achieving distribution, but instead, we study lower bounds on the capacity. We do this because the capacity problem is in general very difficult, but also because of the relevance of having nondecreasing lower bounds on the capacity for the optical community. To obtain a lower bound on the capacity, we will simply choose an input distribution $p_X(x)$ (as done in, e.g., [5], [49]). Without claiming generality, we consider four important candidates for the input distribution. First, following [49], we use symbols drawn from a Rayleigh distribution

$$ p_X(x) = \frac{2x}{\sigma_S^2}\exp\left(-\frac{x^2}{\sigma_S^2}\right), \quad x \in [0, \infty). \tag{24} $$

As we will see later, this input distribution is not the one giving the highest lower bound. However, it has one important advantage: it allows some analytical results for the mutual information. The other three distributions are considered later in this section as numerical examples. The next two lemmas provide an exact closed-form expression for the conditional differential entropy $h_{Y|X}(\rho)$ and an asymptotic expression for the output differential entropy $h_Y(\rho)$.
Lemma 1: For the channel in (19) and the input distribution (24), the conditional differential entropy $h_{Y|X}(\rho)$ is given by the closed-form expression (25), where $\psi(x) \triangleq \mathrm{d}\log\Gamma(x)/\mathrm{d}x$ is the digamma function and $\Phi(\alpha, 1, n)$ is the special case of the Lerch transcendent function [70, eq. (9.551)]

$$ \Phi(\alpha, 1, n) = \sum_{k=0}^{\infty}\frac{\alpha^k}{n+k}. \tag{26} $$

The function $F_n(\rho)$ is defined in (27) through an integral involving $K_n(x)$, the modified Bessel function of the second kind of order $n$.
Proof: See Appendix B.
Lemma 2: For the channel in (19) and the input distribution (24), the output differential entropy $h_Y(\rho)$ admits the asymptotic expression (28).
Proof: See Appendix C.
The next theorem is one of the main results of this paper.
Theorem 1: The MI for the channel in (19) and the input distribution (24) admits the asymptotic expansion (29).
Fig. 4. The MI (23) (numerically calculated) for the chi-distribution with different degrees of freedom and the channel model (19). The asymptotic estimate given by Theorem 1 is also shown. Lower and upper bounds for n = 1 are also shown.
Proof: We expand the function $F_n(\rho)$ in (27), which defines the conditional entropy in Lemma 1. At fixed large $\rho$ the integrand asymptotically decays as $\exp(-\xi/(2\rho))$, i.e., with a small decrement (which can be proven by the standard large-argument asymptotes of the Bessel functions). This means that the main contribution to the integral comes from the asymptotic region $1 \ll \xi \ll \rho$, in most of which the large-argument expansion of both Bessel functions is indeed justified. Using it uniformly, we obtain the large-$\rho$ asymptote of $F_n(\rho)$, which, used in (25), gives the asymptotic expression (30) for the conditional entropy. The proof is completed by combining (30) and (28) with (23).
The result in Theorem 1 is a universal and n-independent expression. The expression in (29) shows that the capacity lower bound is asymptotically equivalent to half of the logarithm of the SNR plus a constant which is order-independent. Fig. 4 shows the numerical evaluation of $I_{X,Y}(\rho)$ for n = 1, 2, 3, 12, obtained by numerically evaluating all the integrals in the exact expressions for the conditional and output entropies in (25) and (53), as well as the asymptotic expression in Theorem 1. Interestingly, we can see that even in the medium-SNR region, the influence of the number of degrees of freedom on the MI is minimal, and the curves are quite close to each other. In this figure, we also include the lower and upper bounds for n = 1 given by [67, eq. (21)] and [66, eq. (41)], respectively. These results show that the asymptotic results in Theorem 1 correctly follow these two bounds.
The main reason for considering a Rayleigh input distribution was that it yields a semi-analytical lower bound on the capacity. In the following example, we consider three other input distributions and numerically calculate the resulting MI.
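The MI for a Rayleigh input can also be estimated by plain Monte Carlo simulation, as a complement to the numerical integration. The sketch below is an illustration under stated assumptions and is not the method used for Fig. 4: it assumes the input-output relation $Y^2 = \frac{1}{2}\sum_{i=1}^{2n}(X/\sqrt{n}+N_i)^2$, writes the channel law as a Poisson mixture of central chi laws, and uses the elementary fact that averaging Poisson weights over an exponentially distributed $X^2$ gives geometric weights with ratio $\alpha = \rho/(1+\rho)$. The function names, the SNR of 10 dB, and the sample size are hypothetical choices.

```python
import math
import random


def log_central_chi(y, m, sigma2):
    """log of 2 y^(2m-1) exp(-y^2/sigma2) / (Gamma(m) sigma2^m)."""
    return (math.log(2.0) + (2 * m - 1) * math.log(y) - y * y / sigma2
            - math.lgamma(m) - m * math.log(sigma2))


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def series_length(peak):
    """Number of series terms kept beyond the index where the summand peaks."""
    return int(peak + 10.0 * math.sqrt(peak) + 30.0)


def log_p_cond(y, x, n, sigma2):
    """log p(y|x): Poisson mixture with mean x^2 / sigma2."""
    lam = x * x / sigma2
    if lam == 0.0:
        return log_central_chi(y, n, sigma2)
    kmax = series_length(x * y / sigma2)
    return log_sum_exp([-lam + k * math.log(lam) - math.lgamma(k + 1)
                        + log_central_chi(y, n + k, sigma2) for k in range(kmax + 1)])


def log_p_out(y, n, rho, sigma2):
    """log p(y) for a Rayleigh input: geometric mixture with ratio rho / (1 + rho)."""
    alpha = rho / (1.0 + rho)
    kmax = series_length(y * y / sigma2)
    return log_sum_exp([math.log1p(-alpha) + k * math.log(alpha)
                        + log_central_chi(y, n + k, sigma2) for k in range(kmax + 1)])


def mi_rayleigh(n, rho_db, num_samples, seed=0):
    """Monte Carlo estimate of E[log2 p(Y|X) / p(Y)] in bits per channel use."""
    sigma2 = 1.0
    rho = 10.0 ** (rho_db / 10.0)
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        a = rng.expovariate(1.0 / (rho * sigma2))   # A = X^2 is exponential for Rayleigh X
        x = math.sqrt(a)
        mean = x / math.sqrt(n)
        y2 = 0.5 * sum((mean + rng.gauss(0.0, math.sqrt(sigma2))) ** 2
                       for _ in range(2 * n))
        y = math.sqrt(y2)
        acc += log_p_cond(y, x, n, sigma2) - log_p_out(y, n, rho, sigma2)
    return acc / num_samples / math.log(2.0)


estimates = {n: mi_rayleigh(n, 10.0, 1500) for n in (1, 2, 3)}
for n, value in estimates.items():
    print(f"n = {n}: MI ~ {value:.2f} bit per channel use")
```

With only 1500 samples the estimates carry a statistical error of a few hundredths of a bit, so the sketch is suited to spot checks rather than to reproducing smooth curves.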
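Example 1: Consider the geometric (exponential), half-Gaussian, and Maxwell-Boltzmann distributions given by

$$ p_X(x) = \frac{\sqrt{2}}{\sigma_S}\exp\left(-\frac{\sqrt{2}\,x}{\sigma_S}\right), \quad x \in [0, \infty), \tag{31} $$

$$ p_X(x) = \frac{\sqrt{2}}{\sqrt{\pi}\,\sigma_S}\exp\left(-\frac{x^2}{2\sigma_S^2}\right), \quad x \in [0, \infty), \tag{32} $$

and

$$ p_X(x) = \sqrt{\frac{2}{\pi}}\,\frac{3^{3/2}x^2}{\sigma_S^3}\exp\left(-\frac{3x^2}{2\sigma_S^2}\right), \quad x \in [0, \infty), \tag{33} $$

respectively. The MIs for these three distributions for n = 1, 2, 3 are shown in Fig. 5. They show that the lower bound given by the geometric input distribution in (31) displays high MI in the low-SNR regime ($\rho < 10$ dB), whereas the half-Gaussian input distribution in (32) is better for medium and large SNR. On the other hand, the Maxwell-Boltzmann distribution in (33) gives the lowest MI for all SNR. Numerical results also indicate that all the presented MIs asymptotically exhibit an equivalent growth irrespective of the number of degrees of freedom 2n.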
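When comparing input distributions under an average power constraint, each density must integrate to one and have second moment $\sigma_S^2$. The sketch below checks this numerically for the Rayleigh, exponential, half-Gaussian, and Maxwell-Boltzmann families. The parameterisations in the code are assumptions fixed only by these two requirements, and the function names and the value of $\sigma_S^2$ are hypothetical.

```python
import math


def rayleigh(x, s2):
    return 2.0 * x / s2 * math.exp(-x * x / s2)


def exponential(x, s2):
    lam = math.sqrt(2.0 / s2)            # E[X^2] = 2 / lam^2 = s2
    return lam * math.exp(-lam * x)


def half_gaussian(x, s2):
    return math.sqrt(2.0 / (math.pi * s2)) * math.exp(-x * x / (2.0 * s2))


def maxwell_boltzmann(x, s2):
    a2 = s2 / 3.0                        # E[X^2] = 3 * a2 = s2
    return math.sqrt(2.0 / math.pi) * x * x / a2 ** 1.5 * math.exp(-x * x / (2.0 * a2))


def moment(pdf, s2, order, steps=100000):
    """Midpoint-rule estimate of the integral of x^order * pdf(x) over [0, 40 sqrt(s2)]."""
    h = 40.0 * math.sqrt(s2) / steps
    return sum(((i + 0.5) * h) ** order * pdf((i + 0.5) * h, s2) for i in range(steps)) * h


sigma_s2 = 2.0
checks = {f.__name__: (moment(f, sigma_s2, 0), moment(f, sigma_s2, 2))
          for f in (rayleigh, exponential, half_gaussian, maxwell_boltzmann)}
for name, (mass, second) in checks.items():
    print(f"{name}: total mass = {mass:.6f}, second moment = {second:.6f}")
```

All four densities share the same average power, so any difference in MI between them comes from the shape of the distribution only.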
The following example considers the use of discrete constellations. In particular, we assume that the soliton amplitudes take values in a set $\mathcal{X} \triangleq \{x_1, \ldots, x_M\}$, where $M \triangleq |\mathcal{X}| = 2^m$ is the cardinality of the constellation and $m$ is the number of bits per symbol. In this case, the MI (23) is evaluated by replacing the integral over the input with a sum over the $M$ constellation points, where the symbols are assumed to be equally likely.
Example 2: Consider ASK constellations $\mathcal{X} = \{0, 1, \ldots, M-1\}$ with $m = 1, 2, 3, 4$ and second moment $\sigma_S^2$, which correspond to OOK, 4-ASK, 8-ASK, and 16-ASK, respectively. The MI numerically evaluated for these constellations is shown in Fig. 6 for the chi-channel with $n = 1, 2, 3$. As a reference, this figure also shows (black lines) the MI for the (continuous) half-Gaussian input distribution. The results show that, in the low-SNR regime, binary modulation is in fact better than the half-Gaussian distribution. This can, however, be remedied by using a geometric distribution, which, as shown in Fig. 5, outperforms the half-Gaussian distribution in the low-SNR regime. In the high-SNR regime, however, this is not the case.
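As a small illustration of the normalisation implied by Example 2, the following Python sketch scales the set {0, 1, ..., M − 1} so that equally likely symbols have a prescribed second moment. It is only a sketch under stated assumptions: the common spacing `d`, the function names, and the default unit noise variance are hypothetical and are not taken from the paper.

```python
import math


def ask_constellation(m, rho_db, sigma2_n=1.0):
    """Scaled M-ASK amplitudes with M = 2**m and second moment rho * sigma2_n.

    Assumption (not taken from the paper): the set {0, 1, ..., M-1} is
    multiplied by a common spacing d, and the symbols are equally likely.
    """
    M = 2 ** m
    rho = 10.0 ** (rho_db / 10.0)          # SNR from dB to linear scale
    sigma2_s = rho * sigma2_n              # target second moment sigma_S^2
    # Second moment of the unscaled set {0, ..., M-1}: (M-1)(2M-1)/6.
    base = (M - 1) * (2 * M - 1) / 6.0
    d = math.sqrt(sigma2_s / base)         # spacing that meets the target
    return [k * d for k in range(M)]


def second_moment(points):
    """Second moment of an equally likely constellation."""
    return sum(x * x for x in points) / len(points)


if __name__ == "__main__":
    for m in (1, 2, 3, 4):                 # OOK, 4-ASK, 8-ASK, 16-ASK
        pts = ask_constellation(m, rho_db=25.0)
        print(m, len(pts), second_moment(pts))
```

Fixing the second moment rather than the peak amplitude mirrors the average power constraint under which the constellations are compared.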
Finally, let us address the impact of the cutoff $\hat{a}$ introduced in Sec. III. All our results for continuous input distributions have been obtained for input distributions that are not bounded away from zero (see (24), (31)-(33)). Therefore, some symbols $X_k$ are generated below the threshold $\hat{x} = \sqrt{\hat{a}}$, where the channel law considered in this paper does not hold. We consider here only the case of the Rayleigh input (24), as this distribution was used to obtain the main result of this section. We will show that in the high-power (i.e., high-SNR) regime, the effect of the cutoff on the achievable data rate tends to zero. To do so, we note that for fixed fibre parameters and propagation distance, the cutoff $\hat{x}^2 = \hat{a} = \max\{\hat{a}_{\text{noise}}, \hat{a}_{\text{inter}}\}$ is also fixed, while $\sigma_S^2 = \rho\sigma_N^2$ grows linearly with SNR. In other words, one can achieve high SNR at the expense of high-power solitons for fixed noise variance.
One possible way of showing that the effect of the cutoff on the achievable rate vanishes as SNR tends to infinity is to consider a transmitter which generates a dummy symbol every time $X_k \le \hat{x}$. The value of the threshold $\hat{x}$ is message-independent and can thus be assumed to be known to the receiver, which will discard sub-threshold symbols. This allows us to keep the main results of the paper at the expense of a data rate loss (since part of the time, dummy symbols are transmitted). The probability $\eta$ of such an "outage" event is given by the integral of the input distribution from zero to the threshold. For the Rayleigh input PDF (24), this probability is $\eta = 1 - \exp(-\hat{a}/\sigma_S^2)$ (see (64)-(67)). Therefore, asymptotically, $\eta(\rho) \approx \hat{a}/(\rho\sigma_N^2) \to 0$ as $\rho \to \infty$. The average rate is then reduced by the factor $1 - \eta(\rho)$, which tends to one, so the rate loss vanishes as $\rho \to \infty$.
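The outage argument above can be illustrated numerically. The sketch below evaluates the exact Rayleigh outage probability and its high-SNR approximation for a fixed cutoff; the cutoff value, the unit noise variance, and the function names are hypothetical choices made only for illustration.

```python
import math


def outage_probability(a_hat, rho, sigma2_n=1.0):
    """Exact eta = 1 - exp(-a_hat / sigma_S^2) with sigma_S^2 = rho * sigma_N^2."""
    sigma2_s = rho * sigma2_n
    return -math.expm1(-a_hat / sigma2_s)   # accurate for tiny arguments


def outage_asymptote(a_hat, rho, sigma2_n=1.0):
    """High-SNR approximation eta ~ a_hat / (rho * sigma_N^2)."""
    return a_hat / (rho * sigma2_n)


if __name__ == "__main__":
    a_hat = 1.0                              # hypothetical cutoff value
    for rho_db in (10, 20, 30, 40):
        rho = 10.0 ** (rho_db / 10.0)
        eta = outage_probability(a_hat, rho)
        # Columns: SNR in dB, exact eta, asymptote, rate scaling factor 1 - eta.
        print(rho_db, eta, outage_asymptote(a_hat, rho), 1.0 - eta)
```

As the SNR grows, the exact value and the approximation agree ever more closely, and the rate scaling factor approaches one.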
An alternative and more rigorous approach to the problem above is to consider directly the difference between the MI asymptote obtained in this paper (i.e., Theorem 1) and that obtained with a truncated Rayleigh input distribution, which simply does not generate sub-threshold symbols. This difference can be shown to tend to zero as $\rho \to \infty$. The proof is given in Appendix D.

V. CONCLUSIONS
A non-Gaussian channel model for the conditional PDF of well-separated (in time) soliton amplitudes was used to study lower bounds on the channel capacity. Results for propagation of signals over a nonlinear optical fibre using one and two polarisations were presented. The results in this paper demonstrated both analytically and numerically that there exist lower bounds on the channel capacity that display an unbounded growth with the effective SNR, similarly to the linear Gaussian channel. All the results in this paper are given in bit per channel use only, and thus, they should be considered as a first step towards analysing the more practically relevant problem of channel capacity in bit per second per unit bandwidth. This is a considerably more challenging problem, which is left for further investigation.
Apart from the ME soliton channel model, this paper also studied lower bounds on the capacity of an abstract general noncentral chi-channel with an arbitrary number of degrees of freedom. Similar channel models appear in the study of relatively general systems of noise-driven coupled nonlinear oscillators [58]. Therefore, we believe that the results for a large number of degrees of freedom might also some day find applications in nonlinear communication channels.
The results obtained in this paper for the general noncentral chi-channel are true capacity lower bounds for that channel model. For the application considered in this paper (amplitude-modulated soliton systems), however, the presented analysis was based on a perturbative model which holds at high SNR. This model also does not consider potential interaction between solitons, and thus the results in this paper are limited to solitons well separated in time. Another way of interpreting these results is that the obtained expressions are approximate lower bounds on the capacity of the true channel. Bounds that consider memory effects are left for future work.

APPENDIX A
In this section, we present numerical simulations to verify the memoryless assumption for the discrete channel model in Sec. III. To this end, we simulated the propagation of sequences of $N = 10$ soliton symbols through the scalar waveform channel given by (5). Two launch powers (−1.5 and 1.45 dBm) and two propagation distances (500 km and 2000 km) were considered. The simulations were carried out via the standard split-step Fourier method. The soliton amplitudes were generated as i.i.d. samples from a Rayleigh input distribution (see (24)), and the variance of $X$ was chosen to be 1.25 and 20, so that the resulting soliton waveforms have powers of −1.5 and 1.45 dBm, respectively. The transmitted waveform $x(\tau)$ was created using (13) at a symbol rate of 1.7 GBd. To guarantee an accurate simulation, the time-domain samples were taken every 4.6 ps and the step size was 0.1 km. White Gaussian noise was added at each step to model the ideal DRA process. The simulation parameters are similar to those used in [44] and are summarised in Table I. Fig. 7 shows the waveforms before and after propagation through the channel given in (5). As expected, the received signal is a noisy version of the transmitted waveform, where the noise increases with the propagation distance.
These results show that doubling the transmission distance and/or (approximately) doubling the launch power has very little effect on the soliton shapes.
The noisy waveforms shown in Fig. 7 were then used to obtain the soliton amplitudes $\mathbf{Y} \triangleq [Y_1, Y_2, \ldots, Y_{10}]$ via the forward NFT. Each amplitude is obtained by processing the corresponding symbol period via the spectral matrix method [28, …]. To test the memoryless assumption, we perform a simple correlation test. In particular, we consider the normalised correlation matrix of the output symbols. The obtained correlation matrices are shown in Fig. 8, where statistics were gathered by performing $10^3$ Monte-Carlo runs.

APPENDIX B PROOF FOR LEMMA 1
The MI is invariant under a simultaneous linear rescaling of the variables $x \to x/\sigma_N$ and $y \to y/\sigma_N$. For notational simplicity, and without loss of generality, throughout this proof we thus assume $\sigma_N^2 = 1$. Furthermore, we study the conditional entropy as a function of $\rho = \sigma_S^2$, and all the results will be given in nats.
We express the conditional differential entropy as in (37), which follows from (19). In what follows, we will compute the five expectations in (37). The third and fourth terms in (37) can be readily obtained using (24), giving (38) and (39). To compute the second and fifth terms in (37), we first calculate the output distribution (41) by integrating the joint distribution $p_{X,Y}(x, y)$ over the input. The joint distribution (42) is expressed using (19) and (24), with $\alpha$ defined in (43), and (41) can be obtained using symbolic integration software. Using (41), we obtain (44) (again using symbolic integration software), where $\psi(n)$ is the digamma function and the function with arguments $(\alpha, 1, n)$ is given by (26). The second moment of the output distribution is obtained directly from the channel input-output relation (20), yielding (45). Substituting (38), (39), (44) and (45) into (37), we obtain (46).
The last step is to compute the term $h^{(6)}_{Y|X}(\rho)$, which can be expressed using (42). We then make the change of variables $\xi = 2xy$, $\eta = y^2$, with the Jacobian $\partial(x, y)/\partial(\xi, \eta) = (4y^2)^{-1}$, yielding (49). The integration over $\eta$ can be performed analytically, yielding (50), where $K_n(x)$ is the modified Bessel function of the second kind of order $n$. Using (50) in (49) gives (52). The proof is completed by using (52) in (46), the definition of $\alpha$ in (43), and by returning to logarithm base 2.
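The Jacobian of the change of variables $\xi = 2xy$, $\eta = y^2$ used in this proof can be checked numerically. The sketch below inverts the map for $y > 0$ and estimates the determinant by central differences; the function names, the step size, and the test point are arbitrary illustrative choices.

```python
import math


def inverse_map(xi, eta):
    """Map (xi, eta) back to (x, y), given xi = 2*x*y and eta = y**2 with y > 0."""
    y = math.sqrt(eta)
    x = xi / (2.0 * y)
    return x, y


def jacobian_det(xi, eta, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(xi, eta)."""
    x_a, y_a = inverse_map(xi + h, eta)
    x_b, y_b = inverse_map(xi - h, eta)
    x_c, y_c = inverse_map(xi, eta + h)
    x_d, y_d = inverse_map(xi, eta - h)
    dx_dxi, dy_dxi = (x_a - x_b) / (2 * h), (y_a - y_b) / (2 * h)
    dx_deta, dy_deta = (x_c - x_d) / (2 * h), (y_c - y_d) / (2 * h)
    return dx_dxi * dy_deta - dx_deta * dy_dxi


if __name__ == "__main__":
    x, y = 0.7, 1.3                          # arbitrary test point
    # Numerical determinant next to the closed form 1 / (4 y^2).
    print(jacobian_det(2 * x * y, y * y), 1.0 / (4.0 * y * y))
```

Analytically, $\partial(\xi, \eta)/\partial(x, y)$ is the determinant of the matrix with rows $(2y, 2x)$ and $(0, 2y)$, i.e., $4y^2$, whose inverse is the stated Jacobian.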

APPENDIX C PROOF FOR LEMMA 2
From (41), it follows that the output entropy can be expressed as in (53),$^{9}$ where $\alpha$ is given by (43) and $p_Y(y)$ is given by (41). Notice that, from its definition, the function $f(z)$ is confined to the interval $0 \le f(z) \le 1$. We shall now prove that $h$ …
$^{9}$ Similarly to Appendix B, the results in this proof are in nats.
Next, one notices that $h^{(4)}_Y(\rho)$ is positive and can be upper-bounded by a term of the form $C/\rho$, where the constant $C$ is given by an integral. It therefore only remains to prove that this integral converges, i.e., that the constant $C$ is finite. This is done using the inequality $-x \ln x \le 1 - x$, $x \in (0, 1]$. Therefore, asymptotically, $h^{(4)}_Y(\rho)$ decays no slower than $1/\rho$.
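The elementary inequality $-x \ln x \le 1 - x$ on $(0, 1]$ follows because the slack $\varphi(x) = 1 - x + x \ln x$ satisfies $\varphi(1) = 0$ and $\varphi'(x) = \ln x \le 0$, so $\varphi$ decreases towards zero. A quick numerical sanity check on a grid can be sketched as follows; the grid size and function names are arbitrary, and a grid check is of course not a proof.

```python
import math


def gap(x):
    """Slack (1 - x) - (-x ln x) = 1 - x + x ln x of the inequality at x in (0, 1]."""
    return (1.0 - x) + x * math.log(x)


def min_gap(num_points=10000):
    """Smallest slack over the grid x = k / num_points, k = 1, ..., num_points."""
    return min(gap(k / num_points) for k in range(1, num_points + 1))


if __name__ == "__main__":
    print(min_gap())      # nonnegative if the inequality holds on the grid
    print(gap(0.5))       # slack at x = 0.5
```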
The asymptotic expression for the output entropy is obtained by combining (60), (44), (45) and (53). The proof is completed by returning to logarithm base 2.

APPENDIX D PROOF OF THE ASYMPTOTICALLY VANISHING RATE LOSS
Here we shall prove that an input distribution bounded (truncated) away from zero gives the same results as Theorem 1 in the limit of large average power, $\sigma_S \to \infty$. To this end, consider a system where the transmitted amplitudes $X$ are drawn from a Rayleigh distribution with the PDF given in (24). Let us introduce a threshold $\hat{x}$ on the amplitude realisations, below which our channel law model is expected to be inapplicable. Let us now introduce an alternative system where the symbols $\tilde{X}$ are drawn from a "truncated" Rayleigh distribution with the PDF (63), which involves the Heaviside step function $H(x - \hat{x})$ and the probability $\eta$ that an untruncated symbol falls below the threshold. For the Rayleigh PDF (24), this probability is $\eta = 1 - \exp(-\hat{x}^2/\sigma_S^2)$. As discussed in Sec. III-A and Sec. IV, the threshold $\hat{x}$ is a constant, and thus $\lim_{\sigma_S \to \infty} \eta = 0$.
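To make the truncated input concrete, the following sketch draws samples from a Rayleigh distribution conditioned on exceeding the threshold, using inverse-CDF sampling. It assumes that the untruncated CDF is $1 - \exp(-x^2/\sigma_S^2)$ (consistent with the expression for $\eta$) and that the truncated PDF is the renormalised restriction to amplitudes above the threshold; the threshold value and all function names are hypothetical.

```python
import math
import random


def outage(sigma2_s, x_hat):
    """Probability that an untruncated Rayleigh amplitude falls below x_hat."""
    return -math.expm1(-x_hat ** 2 / sigma2_s)


def sample_truncated_rayleigh(sigma2_s, x_hat, rng):
    """Inverse-CDF sample of a Rayleigh variable conditioned on X >= x_hat.

    Assumes the untruncated CDF 1 - exp(-x^2 / sigma_S^2), for which the
    conditional CDF is 1 - exp(-(x^2 - x_hat^2) / sigma_S^2) for x >= x_hat.
    """
    u = rng.random()                         # uniform on [0, 1)
    return math.sqrt(x_hat ** 2 - sigma2_s * math.log(1.0 - u))


if __name__ == "__main__":
    rng = random.Random(1)
    x_hat = 1.0                              # hypothetical threshold
    for sigma2_s in (1.0, 10.0, 100.0):
        xs = [sample_truncated_rayleigh(sigma2_s, x_hat, rng) for _ in range(5)]
        # Columns: input second moment, outage probability, smallest sample.
        print(sigma2_s, outage(sigma2_s, x_hat), min(xs))
```

Under these assumptions, the squared truncated amplitude is the squared threshold plus an exponential variable of mean $\sigma_S^2$, so the truncation shifts the second moment by a fixed amount that becomes negligible relative to $\sigma_S^2$ at high power.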
To prove that the rate loss tends to zero, we shall prove (68) or, equivalently, (69) and (70). To prove (69), we consider the Kullback-Leibler divergence (relative entropy) between the distributions $p_{\tilde{Y}}(y)$ and $p_Y(y)$. Using the nonnegativity of the relative entropy together with (79), and the fact that the relative entropy is zero if and only if $p_{\tilde{Y}}(y) = p_Y(y)$ almost everywhere [69, Th. 8.6.1], we conclude that (69) is fulfilled, since the integrands in the differential entropy integrals differ on a set of measure zero.
Let us now turn to the first conditional differential entropy in (70). It is expressed through the function $g(x)$ in (83), which represents the conditional differential entropy of $p_{Y|X}(y|x)$, where $p_{Y|X}(y|x)$ is given by the noncentral chi-distribution (19). Using (63), the conditional differential entropy $h_{\tilde{Y}|\tilde{X}}$ can be expressed as in (85). The first term on the r.h.s. of (85) tends to the conditional entropy of the untruncated distribution. We shall now prove that the last (integral) term in (85) tends to zero as $\sigma_S \to \infty$. We note that, according to (24), the input distribution $p_X(x)$ tends to zero uniformly on the interval $[0, \hat{x}]$ as $\sigma_S \to \infty$. Then, according to the bounded convergence theorem, in order to prove that the integral term in (85) vanishes asymptotically, it is sufficient to prove that the function $g(x)$ remains bounded on the interval $[0, \hat{x}]$. We shall do so by providing separate upper and lower bounds for this function.
The upper bound for $g(x)$ can be obtained by considering the relative entropy between the channel law $p_{Y|X}(y|x)$ and an auxiliary distribution $p_Y(y)$ supported on $[0, \infty)$. The nonnegativity of the relative entropy immediately provides an upper bound for the differential entropy (83), namely, (86). Choosing the half-Gaussian distribution $p_Y(y) = (2/\sqrt{\pi}) \exp(-y^2)$ immediately gives the upper bound $g(x) \le \mathbb{E}[Y^2] - \log(2/\sqrt{\pi})$. The second moment of the noncentral chi-distribution is readily available, e.g., from (20), leading to the upper bound (87). Note that this upper bound is bounded inside an arbitrary finite interval.
Establishing a lower bound for $g(x)$ is slightly more involved. The first step is to transform the noncentral chi-distribution into a noncentral chi-squared distribution by making the following change of variable in the integral (83): $z = 2y^2/\sigma_N^2$. Introducing the additional notation $\lambda = 2x^2/\sigma_N^2$ and $n = k/2$, where $k$ is the number of degrees of freedom of the noncentral chi-squared distribution, we obtain the PDF (88), with $z \in [0, \infty)$. We can now express $g(x)$ in (83) as an average with respect to the noncentral chi-squared distribution, in terms of two functions: $g^{(1)}(\lambda)$, which represents the differential entropy of the noncentral chi-squared distribution $p_{Z|\Lambda}(z|\lambda)$, and $g^{(2)}(\lambda)$ in (92), which stands for minus half of the so-called expected-log.
The motivation for the above transformation stems from the fact that it has been proven in [71] that the noncentral chi-squared distribution function (88) is log-concave (i.e., $\log p_{Z|\Lambda}(z|\lambda)$ is concave) if the number of degrees of freedom $k \ge 2$, i.e., $n \ge 1$, which is always the case. On the other hand, the differential entropy of any log-concave distribution function can be lower-bounded [72, Th. 3], which provides a lower bound for $g^{(1)}(\lambda)$. Finally, let us now provide a lower bound for $g^{(2)}(\lambda)$ in (92).
This can be obtained by applying Jensen's inequality.

Aston University, U.K., and since 2012, he has been a Research Associate with the Aston Institute of Photonics Technologies, Aston University. He has authored over 60 journal papers and conference contributions in the fields of nonlinear physics, solitons, nonlinear signal-noise interaction, optical transmission, and signal processing. His current research interests include (but are not limited to) optical transmission systems and networks, nonlinearity mitigation methods, nonlinear Fourier-based optical transmission methods, soliton usage for telecommunications, information theory, and methods for optical signal processing.