Challenges in Estimating the Information Capacity of the Fiber-Optic Channel

Since its early commercial deployment in the late 1980s, optical fiber has evolved to become the predominant carrier of the globe’s communications. Yet, after accommodating the world’s exponentially growing appetite for transmitted data for more than three decades, its ability to continue doing so is being challenged by fundamental factors. In this article, we review these factors and examine their consequences in terms of information capacity. In particular, we review the difficulties that are imposed by the nonlinear nature of fiber-optic transmission on the assessment of the capacity and on the definition of fundamental concepts, such as bandwidth and spectral efficiency. We discuss relevant approximations and regimes of operation in which bounds for the capacity can be effectively assessed while covering a broad range of applications ranging from interdatacenter communications to links spanning transoceanic distances. We relate to a broad variety of transmission schemes and discuss the potential benefits of spatial multiplexing with multimode and multicore fibers. State-of-the-art transmission experiments are also reviewed and compared with theoretical capacity bounds.


I. INTRODUCTION
For more than three decades, fiber-optic communications have been the unchallenged champion in satisfying the world's exponentially growing appetite for transmitted data [1]. Today, the vast majority of exchanged information passes through an optical fiber somewhere along its journey from source to destination. The bandwidth that is available for communication in optical fibers is larger by many orders of magnitude than in other communications media, such as copper cables and wireless. Today's fiber-optic communication networks carry well over one exabit ($10^{18}$ bits) per second over multiple billions of kilometers of glass fiber, which is wrapped around the globe [1].
When fiber communications were first commercially adopted in the late 1980s, the available bandwidth appeared to be infinite relative to the needs in those days, and the challenges that needed to be met were related to the physical aspects of signal generation and transmission. Since bandwidth was abundant, no efforts were invested into spectrally efficient signaling, and information was typically encoded by directly modulating the source laser power, while the reception was based on the direct detection (DD) of intensity [2], [3]. Characteristic communication rates over standard single-mode fibers (SMFs) that were commercial in those days were of the order of 1 Gb/s, and the typical system reach was below 100 km [2], whereas communications over longer distances required signal reception and retransmission (i.e., electro-optical regeneration).
The first important turning point in the evolution of fiber-communication systems was the commercialization of erbium-doped fiber amplifiers (EDFAs) [4], [5], [6], which enabled in-fiber optical amplification within a bandwidth of 4 THz around the highest-transparency region of the optical fiber (near 1.5 μm). This invention was accompanied by the development of optical dispersion-compensating modules [7], [8], which lifted the reach limit of fiber-communication systems, making nonregenerated optical transmission over transoceanic distances of 10 000 km possible [9].
The next turning point was the transition to wavelength-division-multiplexed (WDM) transmission [10]. In this paradigm, a single fiber is shared by multiple transmitter and receiver pairs, each operating within a uniquely prescribed frequency band. With the adoption of WDM, the data rates transmitted over deployed optical fiber systems increased at a rate of 100% per year [11] until the end of the 1990s, when the available optical bandwidth was essentially exhausted. At that stage, a single fiber typically supported on the order of 80 WDM channels, each operating at 10 Gb/s in the conventional EDFA band (C-band). The subsequent deployment of enhanced-bandwidth (C + L band) EDFAs led to the doubling of the transmitted data rates, but it became obvious that a further increase in throughput would require the replacement of intensity modulation (IM) and direct detection with more spectrally efficient signaling and detection techniques.
This notion was the driving force behind the transition to coherent communications schemes, which was the next important turning point in the evolution of fiber-communication systems. The shift to coherent was made possible by advancements in high-speed electronics and digital signal processing (DSP) [12] and allowed communications to take advantage of both quadratures of the transmitted electric field and its two orthogonal polarization components, thereby increasing the number of exploited degrees of freedom by a factor of four [13], [14], [15]. The transition to coherent transmission and detection opened the door to a plethora of communication methods that have been devised over decades in wireless and wire-line radio frequency communications, including the use of advanced modulation formats and digital compensation for physical propagation phenomena [14]. In particular, it enabled digital compensation of chromatic dispersion after detection [16], which resulted in abandoning inline optical dispersion-compensating modules and considerably simplified the link design. Other propagation phenomena, such as polarization-mode dispersion (PMD), which was a major obstacle to increasing data rates in intensity-modulated DD systems [18], are now also routinely mitigated in the digital domain. As a result, coherent communication systems were able to comfortably accommodate the continuing growth in demand until the end of the first decade of this millennium, when it became clear that a capacity crunch was becoming imminent [19].
The reasons for this capacity crunch lie in the unique properties of light propagation through optical fibers, in particular, in the fact that, at high optical powers, signal propagation becomes nonlinear, resulting in waveform distortions that hinder data recovery. This type of nonlinearity is unique to glass optical fibers [20], and in its presence, the reasoning customarily used for linear communication channels no longer holds. The nonlinearity of the fiber is responsible for the fact that, with known transmission schemes, there is always a system-specific launch power limit beyond which the communication performance of the system degrades. A significant fraction of the research conducted in optical communications over the past decades has focused on understanding and characterizing the fiber nonlinearity and its consequences, with the aim of mitigating the nonlinear distortions and pushing the launch power limit farther away.
In the search for schemes that allow continued economically viable growth of data rates that are transmitted over fiber-communication systems, space-division multiplexing (SDM) has established itself as a prominent candidate in the past decade [21]. The principle of SDM is that SMFs are replaced with multimode or multicore fibers in which multiple information streams can propagate simultaneously in orthogonal spatial modes [22], [23], [24], [25], [26], [27], [28], [29]. The idea is to encourage the integration of end-equipment and inline optical components that will allow growth in the information throughput without a corresponding increase in cost, as would be the case when simply deploying parallel systems.
In this article, we discuss the ultimate limits to the rate at which information can be reliably communicated over the fiber-optic channel. In the framework of information theory, this quantity is known as the channel capacity, a concept that was introduced and formulated by Shannon [30] in 1948. While Shannon provided a formal expression for the capacity of a generic channel, applying it to a given physical channel is often a very challenging task, particularly so in the presence of nonlinearities, such as those characterizing fiber-optic transmission. Indeed, in spite of the numerous attempts that have been conducted over the years [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], determining the fiber-optic channel information capacity remains an open problem. Nonetheless, methods for assessing capacity lower bounds have been proposed and studied [44], [45], [46], [47], [48], [49], [50].
In what follows, we review the relevant phenomena involved in fiber-optic transmission and discuss their impact on capacity. We include considerations related to ultralong-haul and undersea cable systems, on the one hand, and short interdatacenter systems, on the other hand. In the process, we discuss the main previously reported attempts at assessing the fiber channel capacity and review the state of the art of transmission rates that have been demonstrated experimentally.
We stress that, in this article, we do not account for any practical constraints on coding and modulation. The effect of such constraints on the achievable information rate is extensively treated in [51] and references therein. Other fundamental implications of coding are addressed in [52].
This article is organized as follows. In Section II, we review the essential preliminaries that are needed for understanding the subject. These include the necessary concepts from information theory, on the one hand, and from the theory of fiber-optic propagation, on the other hand. In Section III, we discuss the difficulty of assessing the fiber-optic channel capacity and review some of the reported attempts. In Section IV, we introduce the concept of nonlinear interference noise (NLIN) and discuss its modeling in the context of information capacity. Section V discusses the construction of a lower bound for the nonlinear fiber-optic channel capacity, while the capacity upper bound is reviewed in Section VI. Section VII discusses the effects of polarization-related propagation phenomena on the assessments of the information capacity, and in Section VIII, we provide some concluding remarks regarding the capacity of long-haul transmission systems. Section IX is devoted to reviewing fundamental capacity considerations of direct-detection systems, which are often the solution of choice for short-reach and interdatacenter communications. In Section X, we review capacity-related aspects of space-division multiplexed transmission in submarine and terrestrial systems. Finally, Section XI reviews state-of-the-art results in laboratory experiments and commercial systems.

II. ESSENTIAL PRELIMINARIES
Initially, the field of optical communications started as a branch of physics engineering, gradually adopting concepts from communications theory, signal processing, and information sciences, to the extent that it has eventually become a truly interdisciplinary science. Individuals interested in understanding fiber communications today need to be simultaneously familiar with concepts of information and communication theories, as well as with the physical properties of optical fibers that govern signal propagation. In order to provide the reader with the essential background, in this section, we introduce the key concepts of information theory, on the one hand, and fiber optic transmission, on the other hand.

A. Information Capacity
Fundamentally, communication systems contain three essential components: a transmitter that encodes information onto some transmitted physical object, a channel through which the object propagates over distance, and a receiver whose role is to extract the encoded information from the physical object once it arrives. What makes the process of communications nontrivial is the fact that, while propagating through the channel, the transmitted object is affected by distortions and noise. In most cases of interest to modern telecommunications, the physical object onto which information is encoded is the electromagnetic field. Denoting by $X(t)$ the signal that is generated by the transmitter and launched into the channel and by $Y(t)$ the signal that emerges from the channel and impinges upon the receiver, the effect of the channel is described by the conditional probability density of receiving $Y(t)$, given that $X(t)$ was transmitted. As information is usually encoded on the complex envelope of electric fields, the elements of $X(t)$ and $Y(t)$ are complex-valued. Moreover, in such cases, $X(t)$ and $Y(t)$ are vectors whose components represent orthogonal polarizations or propagation modes of the electric field. In what follows, we start from the consideration of scalar complex-valued $X(t)$ and $Y(t)$ signals, leaving the generalization to the multidimensional case for a later stage, after the main results are established.
Formally, in order to relate to the conditional probability of receiving $Y(t)$ given that $X(t)$ is transmitted, the continuous scalar time entities $X(t)$ and $Y(t)$ are represented by vectors $X$ and $Y$ having a finite number of elements. Such representation is always possible when dealing with band-limited signals whose spectrum is constrained to be within a bandwidth $B$ and which are considered within a finite time frame $T_f$. Sampling at a rate that satisfies Nyquist's condition is perhaps the most obvious example of a procedure through which such representation can be achieved. Since the number of degrees of freedom in the continuous-time signals is $2BT_f$ (the factor of two accounts for the two signal quadratures), the dimension of the vectors $X$ and $Y$ is $BT_f$, as each vector element is complex-valued and, therefore, carries two degrees of freedom. The elements of $X$ and $Y$ are referred to as the transmitted and received symbols, respectively, and the effect of the channel is represented by the conditional probability density function $P_{Y|X}(y|x)$. Similarly, the operation of the transmitter is fully characterized by the probability density function $P_X(x)$.
One of Shannon's most groundbreaking achievements was to demonstrate that the largest amount of information per symbol that can be reliably communicated between the transmitter and the receiver is given by
$$I(X;Y) = \frac{1}{BT_f}\, E\!\left[\log_2 \frac{P_{X|Y}(x|y)}{P_X(x)}\right] \tag{1}$$
where $E$ denotes ensemble averaging, and $P_{X|Y}(x|y) = P_{Y|X}(y|x)P_X(x)/P_Y(y)$. This quantity is known as mutual information per symbol, and it is measured in units of bits. The related quantity known as channel capacity is obtained by optimizing the mutual information with respect to the transmitter in the limit of an infinitely long time frame, namely,
$$C = \lim_{T_f \to \infty} \sup_{P_X} I(X;Y). \tag{2}$$
Notice that, different from some of the existing literature, normalization with respect to $BT_f$ is included in our definition of mutual information. In particular, the division by $B$ implies that, in our definition, $C$ is capacity per unit bandwidth, which is equivalent to what is sometimes referred to as spectral efficiency (SE) and often expressed in bit/s/Hz rather than simply in bits, although, of course, the two coincide with each other. In this article, for brevity, we will often simply refer to $C$ as capacity and reserve the term SE for Section XI, where we use it to relate to the information throughput per unit bandwidth that is successfully transmitted in a given experimental system. It should also be noted that, while, in linear channels, the total capacity is given by the product of the capacity per unit bandwidth $C$ and the total bandwidth allocated for transmission, this relation does not hold in the case of nonlinear channels, as will be discussed in Section III.
While (1) and (2) provide a clear path for determining the capacity of a known channel, establishing knowledge of the channel law $P_{Y|X}(y|x)$ is often a dauntingly difficult task, and as we shall see, it is particularly difficult in the case of the fiber-optic channel, which is the focus of this article. A famous and important special case in which an explicit formula for the channel capacity exists is that of the memoryless additive Gaussian noise (GN) channel where the relation between the $k$th input and output symbols is given by $Y_k = X_k + N_k$, with the noise elements $N_k$ being identically distributed and statistically independent complex-circular zero-mean Gaussian variables. In this case, under the constraint of a given average power, the optimal distribution of each of the symbols $X_k$ is Gaussian, and the capacity per unit transmission bandwidth is given by Shannon's famous formula [30]
$$C = \log_2(1 + \mathrm{SNR}). \tag{3}$$
This result can now be generalized to the case where information is encoded simultaneously over multiple modes (such as polarizations or spatial propagation modes in fibers). Under the assumption that the noise is independent and identically distributed in the various signal dimensions, the capacity is achieved by dividing the transmitted energy equally between all dimensions, with the result
$$C = M \log_2(1 + \mathrm{SNR}) \tag{4}$$
where $M$ is the number of scalar modes and SNR is the signal-to-noise ratio defined as the ratio between the average signal and noise powers in any of the modes
$$\mathrm{SNR} = \frac{E[|X_k|^2]}{E[|N_k|^2]}. \tag{5}$$
1) Useful Lower Bound for Capacity: Apart from being attractive from an analytical standpoint, an important practical feature of the Gaussian memoryless channel is that it can be used to obtain a lower bound for the capacity of a generic communications channel, as discussed in [53] and [54]. The idea is based on the notion that white GN has the highest entropy for a given noise power, and in that sense, it is the "noisiest" noise possible.
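In order to obtain the bound, one constructs an auxiliary memoryless additive GN channel
$$Y_k = cX_k + N_k \tag{6}$$
with a scaling factor
$$c = \frac{E[X_k^* Y_k]}{E[|X_k|^2]} \tag{7}$$
whose role is to ensure that the auxiliary channel satisfies the same correlation relation $E[X_k^* Y_k]$ as the original one. With the noise power of the auxiliary channel set to $E[|Y_k - cX_k|^2]$, the desired lower bound is obtained by applying Shannon's formula
$$C \geq M \log_2\!\left(1 + \frac{|c|^2 E[|X_k|^2]}{E[|Y_k - cX_k|^2]}\right) \tag{8}$$
which is identical to (4) and (5), except for the scaling factor $c$.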
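
The construction of this bound can be illustrated numerically. The following is a minimal sketch, assuming that paired samples of transmitted and received symbols are available and that the transmitted symbols are Gaussian distributed; the function names `shannon_capacity` and `gaussian_lower_bound`, and the toy channel used to exercise them, are hypothetical and are not taken from the cited references.

```python
import numpy as np

def shannon_capacity(snr, num_modes=1):
    """Capacity per unit bandwidth (bit/s/Hz) of the memoryless additive GN channel."""
    return num_modes * np.log2(1.0 + snr)

def gaussian_lower_bound(x, y, num_modes=1):
    """Auxiliary-channel lower bound from paired complex symbols x (sent) and y (received)."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    signal_power = np.mean(np.abs(x) ** 2)
    # Scaling factor that reproduces the measured correlation E[X* Y].
    c = np.mean(np.conj(x) * y) / signal_power
    # Whatever is not explained by c * x is treated as Gaussian noise.
    noise_power = np.mean(np.abs(y - c * x) ** 2)
    snr_eff = np.abs(c) ** 2 * signal_power / noise_power
    return shannon_capacity(snr_eff, num_modes), c, snr_eff

# Toy channel (an assumption for illustration): gain 0.8, phase rotation 0.3 rad,
# unit-power Gaussian input and additive Gaussian noise of power 0.01.
rng = np.random.default_rng(0)
n = 200_000
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = 0.8 * np.exp(0.3j) * x + noise

bound, c, snr_eff = gaussian_lower_bound(x, y)
print(f"c = {c:.3f}, effective SNR = {snr_eff:.1f}, bound = {bound:.2f} bit/s/Hz")
```

For this toy channel, the noise really is Gaussian, so the bound is tight and the estimate should approach the Shannon value for an SNR of 64. For a nonlinear channel, the same procedure would in general yield a value below the true capacity.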

B. Fiber-Optic Channel
Optical communication systems that are used today rely on single-mode optical fibers. These fibers are dielectric waveguides supporting propagation of only one spatial mode with two orthogonal polarizations. The two field polarizations can be, and in fact are, addressed as two scalar communication channels operating simultaneously. Systems of this type are referred to as polarization-multiplexed (PM).
In the past decade, the use of fibers supporting multiple spatial modes has been proposed and demonstrated in numerous laboratory experiments [22], [23], [24], [25], [26], [27], [28], [29]. In these fibers, the number of simultaneously transmitted communication channels is $2N$, where $N$ is the number of spatial modes and where the factor of two accounts for two orthogonal polarizations. Systems of this type are referred to as space-division multiplexed. For simplicity, we focus the fundamental description on the case of PM systems, postponing the generalization to the case of SDM systems to Section X.
The transparency window of available optical fibers is centered around the wavelength of 1.5 μm, which is where almost all fiber communications take place. The transmission bandwidth is determined mostly by the amplification bandwidth of available optical amplifiers, and in the most common cases, using EDFAs [4], [5], [6], it is of the order of 4 THz, although extended-bandwidth systems in which the amplification bandwidth reaches 10 THz also exist and are attracting significant attention [55]. Typically, fiber-optic systems are wavelength-division multiplexed, which means that an array of transmitters is deployed, where each transmitter operates within a unique spectral window. When combined with polarization multiplexing, each transmitter imposes two complex-valued information-carrying signals onto the two orthogonal polarizations of the generated electric field. A schematic description of a typical WDM system is illustrated in Fig. 1. The signals generated by the various transmitters are combined with the help of an optical multiplexer and launched into the link, which consists of multiple spans of optical fiber separated by optical amplifiers. At the receiver, an optical demultiplexer separates the individual WDM channels and feeds each channel into its corresponding receiver.
Since our focus in this article is on estimates of the fiber-optic channel capacity, we will assume that the transmitters and receivers are ideal. Namely, the transmitter is assumed to be capable of producing any desirable waveform whose spectrum is contained in a specified channel bandwidth B, and the receiver is capable of measuring any waveform that impinges upon it with infinite precision.
Within this framework, the performance of fiber communications is limited solely by the combination of fiber-propagation distortions and amplification noise, to which we refer in more detail in what follows. The length of the fiber spans varies between ∼40 km in legacy transoceanic systems and ∼100 km in terrestrial transmission and some modern transoceanic links (e.g., the Curie cable and the Google-owned Subcom system [56]). As the fiber loss is of the order of 0.2 dB/km, the loss of the individual span is between 8 and 20 dB. The amplifiers placed at the end of each span compensate for this loss. A less desirable but fundamental property of amplification, dictated by quantum mechanics, is the generation of noise.
An optical amplifier with power gain $G$ ($G \geq 1$) produces complex-circular GN, which is referred to as amplified spontaneous emission (ASE) noise, whose spectral density in each polarization is given by [57]
$$\hbar\omega_0 n_{sp}(G - 1) \tag{9}$$
where $\omega_0$ is the central angular frequency of the amplified signal, $\hbar$ is the reduced Planck constant, and $n_{sp}$ is the so-called inversion factor, which is equal to 1 for ideal amplifiers and is greater than 1 in all practical cases (in typical amplifiers, $n_{sp}$ is between 1.6 and 2). Although the presence of $\omega_0$ in (9) implies a direct dependence of the noise power on the optical frequency, this dependence is negligible because the range in which $\omega_0$ varies within the amplification bandwidth is very small relative to the central frequency of the amplification band. Therefore, within regions in which the amplification spectrum is reasonably flat (as required in communication systems), the noise can be legitimately approximated as white.
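
To give a feeling for the magnitudes involved, the span-loss figures quoted above and the spectral density in (9) can be evaluated directly. The sketch below assumes a carrier wavelength of 1.55 μm, an amplifier gain that exactly compensates the span loss, and an inversion factor of 1.6; the helper names `span_loss_db` and `ase_psd` are hypothetical.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J*s
C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s

def span_loss_db(length_km, loss_db_per_km=0.2):
    """Total loss of a fiber span in dB."""
    return length_km * loss_db_per_km

def ase_psd(gain_db, n_sp, wavelength_m=1.55e-6):
    """ASE spectral density per polarization in W/Hz: hbar * omega0 * n_sp * (G - 1)."""
    omega0 = 2.0 * math.pi * C_LIGHT / wavelength_m
    gain = 10.0 ** (gain_db / 10.0)
    return HBAR * omega0 * n_sp * (gain - 1.0)

loss_short = span_loss_db(40.0)    # legacy transoceanic span
loss_long = span_loss_db(100.0)    # terrestrial span
# Assumption: the amplifier gain equals the span loss it compensates.
psd = ase_psd(gain_db=loss_long, n_sp=1.6)
print(f"span loss: {loss_short:.0f} dB to {loss_long:.0f} dB")
print(f"ASE spectral density after a 100 km span: {psd:.2e} W/Hz per polarization")
```

Under these assumptions, a single 20 dB amplifier contributes a spectral density of roughly 2 × 10⁻¹⁷ W/Hz per polarization; multiplying by the channel bandwidth and the number of spans gives an order-of-magnitude estimate of the accumulated ASE power.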
The unique features of optical communications are related to the propagation properties of optical fibers. Most importantly, fibers are characterized by loss, chromatic dispersion, and nonlinear propagation distortions resulting from the dependence of the refractive index of glass on optical power. These phenomena interact nontrivially with each other and with the amplification noise that accompanies the signal, thereby challenging the process of extracting the transmitted information. The equation that captures these phenomena is known as the Manakov equation [58], [59], [60]
$$\frac{\partial \vec{E}}{\partial z} = -\frac{\alpha}{2}\vec{E} - i\frac{\beta_2}{2}\frac{\partial^2 \vec{E}}{\partial t^2} + i\gamma|\vec{E}|^2\vec{E} \tag{10}$$
where $\vec{E}(z, t)$ is a complex-valued 2-D column vector, describing the electric field as a function of position and time. As is customary in the description of signal propagation, $t$ is defined with respect to a moving reference frame. The orientation of the vector $\vec{E}$ is the signal's state of polarization, and the units of $\vec{E}$ are chosen such that $|\vec{E}|^2$ is the optical power. The term $\alpha$ is the fiber-loss coefficient, $\beta_2$ is the chromatic-dispersion coefficient, and the coefficient $\gamma$ accounts for the fiber nonlinearity. Equation (10) is derived from the more fundamental coupled nonlinear Schrödinger equations [60], [61], and it accounts for the fact that propagation in long optical fibers is accompanied by rapid random polarization rotations that average the nonlinear effect. As a result of this averaging, the nonlinearity enters (10) only through the optical power $|\vec{E}|^2$, and hence, its effect is independent of the signal polarization. Yet, we should stress that the Manakov equation does not include other polarization effects, such as PMD and polarization-dependent loss (PDL), to which we refer separately in Sections VII-A and VII-B, respectively.
At low signal powers, where $\gamma|\vec{E}|^2$ is small to the extent that the nonlinear term can be neglected, the propagating field is given by
$$\vec{E}(z, t) = \exp\!\left(-\frac{\alpha}{2}z\right)\exp\!\left(-i\frac{\beta_2}{2}z\frac{\partial^2}{\partial t^2}\right)\vec{E}(0, t) \tag{11}$$
which can be implemented in the frequency domain as
$$\tilde{\vec{E}}(z, \omega) = \exp\!\left(-\frac{\alpha}{2}z + i\frac{\beta_2}{2}\omega^2 z\right)\tilde{\vec{E}}(0, \omega) \tag{12}$$
where $\tilde{\vec{E}}(z, \omega) = \int_{-\infty}^{\infty}\vec{E}(z, t)\exp(i\omega t)\,dt$ is the Fourier transform of the complex envelope of the field in the time domain so that $\omega$ is the angular frequency (see footnote 5). The roles of attenuation and chromatic dispersion become evident when looking at (12), which also shows that chromatic dispersion is an all-pass filter. It is clear that amplifiers can compensate for the effect of attenuation, whereas the effect of dispersion can be undone by applying an all-pass filter with dispersion of the opposite sign. This can be done either in the optical domain, or electronically, by DSP, after coherent detection.
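
The all-pass nature of chromatic dispersion and its compensation by a filter of opposite sign can be checked with a few lines of code. This is a minimal sketch for a scalar field, assuming the frequency-domain transfer function exp(−αz/2 + iβ₂ω²z/2); the dispersion value of −21.7 ps²/km, the 20 ps Gaussian pulse, and the function name `linear_propagate` are illustrative assumptions.

```python
import numpy as np

def linear_propagate(field, dt, z, beta2, alpha=0.0):
    """Apply loss and chromatic dispersion over a distance z in the frequency domain."""
    omega = 2.0 * np.pi * np.fft.fftfreq(field.size, d=dt)
    # The dispersive phase depends on omega**2, so it is the same for either
    # sign convention of the Fourier transform.
    transfer = np.exp(-0.5 * alpha * z + 0.5j * beta2 * omega**2 * z)
    return np.fft.ifft(np.fft.fft(field) * transfer)

dt = 1e-12                                   # 1 ps sampling interval
t = (np.arange(4096) - 2048) * dt
pulse = np.exp(-0.5 * (t / 20e-12) ** 2).astype(complex)

beta2 = -21.7e-27                            # s^2/m, assumed value for standard SMF
z = 100e3                                    # 100 km, loss ignored (alpha = 0)

dispersed = linear_propagate(pulse, dt, z, beta2)
restored = linear_propagate(dispersed, dt, z, -beta2)   # filter of opposite sign

print("peak amplitude after dispersion:", np.max(np.abs(dispersed)))
print("max reconstruction error:", np.max(np.abs(restored - pulse)))
```

The pulse broadens considerably in the time domain, while the magnitude of its spectrum is untouched, which is what makes the operation exactly invertible in the absence of noise and nonlinearity.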
Footnote 5: The sign convention in the definition of the Fourier transform is consistent with the optical communication literature and is opposite to what is common in some other fields of electrical engineering. Similarly, following the same convention, all analytical expressions in this article contain the angular frequency $\omega$ as opposed to the frequency $\omega/2\pi$. Nonetheless, bandwidths and frequency separations are specified in units of Hz.
Some insight into the effect of nonlinearity can be gained by solving the Manakov equation in the absence of chromatic dispersion ($\beta_2 = 0$), yielding
$$\vec{E}(z, t) = \exp\!\left(-\frac{\alpha}{2}z\right)\exp\!\left(i\gamma|\vec{E}(0, t)|^2 z_{\mathrm{eff}}\right)\vec{E}(0, t) \tag{13}$$
where $z_{\mathrm{eff}} = [1 - \exp(-\alpha z)]/\alpha$ is known as the effective propagation distance [62]. Equation (13) shows that, in the absence of dispersion, nonlinearity imposes power-dependent phase modulation onto the propagating signal, and thereby, it affects its spectrum, typically spreading it far beyond the original bandwidth. In all practical situations, when both dispersion and nonlinearity are present, the propagation dynamics become more involved, and in general, the evolution of $\vec{E}(z, t)$ along the fiber can only be evaluated by solving the Manakov equation numerically. Effective numerical methods for solving the Manakov equation are well known [63], and they are deployed routinely when simulating fiber-communication systems in the process of system analysis and design. A relevant and interesting property of this equation and the numerical methods used for its solution is that it can also be solved backward. This property has important fundamental and practical consequences in the operation of the system and in the estimation of its capacity. The idea is that the electric field that is recovered by a receiver positioned at the end of the fiber ($z = L$) can be digitized and used as the boundary condition for a numerical backward solution of (10), so as to approximate the field at the beginning of the fiber at $z = 0$.
This procedure is known as digital back propagation (DBP) [64], and while it is still absent in commercial systems owing to its high computational complexity, it is occasionally implemented in laboratory systems with off-line processing. The reason why DBP can only approximately reconstruct the input waveform is that it also unavoidably back-propagates amplification noise generated along the system.
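
The backward-solution idea can be sketched with a symmetric split-step Fourier integrator, one common numerical approach. The example below is a simplified illustration rather than the method of any cited reference: it treats a scalar field obeying ∂E/∂z = −(α/2)E − i(β₂/2)∂²E/∂t² + iγ|E|²E over a single unamplified 80 km span, adds noise only at the receiver instead of along the link, and uses assumed fiber parameters; `split_step` and `relative_error` are hypothetical names.

```python
import numpy as np

def split_step(field, dt, length, beta2, gamma, alpha, steps):
    """Symmetric split-step integration over `length`; a negative length runs backward."""
    omega = 2.0 * np.pi * np.fft.fftfreq(field.size, d=dt)
    dz = length / steps
    half_linear = np.exp((-0.5 * alpha + 0.5j * beta2 * omega**2) * dz / 2.0)
    out = field.astype(complex)
    for _ in range(steps):
        out = np.fft.ifft(np.fft.fft(out) * half_linear)        # half step: loss + dispersion
        out = out * np.exp(1j * gamma * np.abs(out) ** 2 * dz)  # full step: nonlinear phase
        out = np.fft.ifft(np.fft.fft(out) * half_linear)        # half step: loss + dispersion
    return out

def relative_error(estimate, reference):
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)

# Assumed parameters, typical of standard single-mode fiber.
alpha = 0.2e-3 * np.log(10.0) / 10.0   # 0.2 dB/km expressed in 1/m
beta2 = -21.7e-27                      # s^2/m
gamma = 1.3e-3                         # 1/(W m)
length, steps = 80e3, 200

# Random band-limited launch waveform (32 GHz wide) with 5 mW average power.
rng = np.random.default_rng(1)
n, dt = 2048, 2e-12
spectrum = rng.standard_normal(n) + 1j * rng.standard_normal(n)
spectrum[np.abs(np.fft.fftfreq(n, d=dt)) > 16e9] = 0.0
launch = np.fft.ifft(spectrum)
launch *= np.sqrt(5e-3 / np.mean(np.abs(launch) ** 2))

received = split_step(launch, dt, length, beta2, gamma, alpha, steps)
noise = 1e-4 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

clean_error = relative_error(
    split_step(received, dt, -length, beta2, gamma, alpha, steps), launch)
noisy_error = relative_error(
    split_step(received + noise, dt, -length, beta2, gamma, alpha, steps), launch)
print(f"back propagation error without noise: {clean_error:.1e}")
print(f"back propagation error with receiver noise: {noisy_error:.1e}")
```

Without noise, the backward pass undoes the forward pass to within numerical precision, whereas with noise the reconstruction is only approximate because the noise is back-propagated together with the signal.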
In addition to numerical solutions of the Manakov equation, useful approximate analytical solutions that rely on perturbation analysis can be applied in cases of weak or moderate nonlinearity. These will be discussed in Section IV.

III. ASSESSING THE FIBER-OPTIC CHANNEL CAPACITY
Prior to delving into the details of assessing the capacity of the fiber-optic channel, it is useful to set up our expectations based on the signal propagation properties represented in (10). The expected features of the dependence of capacity on the average power of the launched optical signal are illustrated schematically in Fig. 2. At low powers, signal propagation in the fiber is essentially linear. Moreover, since dispersion is an all-pass filter and loss is frequency-independent, the amplification noise remains white Gaussian, and Eq. (4) for the capacity holds. In this regime, by increasing the signal power, the SNR increases accordingly and so does the capacity. However, as the optical power increases to the extent that nonlinearity can no longer be neglected, distortions are introduced both into the signal and into the noise, whose statistical distribution eventually changes. In this regime, the capacity is unknown, and only bounds for it are available. An upper bound is the capacity of the additive white Gaussian noise (AWGN) channel [65], [66], whereas, in the case of the lower bound, multiple system-specific estimates have been reported [44], [45], [46], [47], [48].
Shtaif et al.: Challenges in Estimating the Information Capacity of the Fiber-Optic Channel

Fig. 2. Illustration of the dependence of the fiber-channel capacity on the launched signal power. At low power levels, propagation is linear, and the fiber channel is equivalent to an AWGN channel, whose capacity is given by (4). When the power increases to an extent that nonlinear effects become significant, the capacity is unknown. It is bounded from above by the AWGN capacity, whereas lower bounds are the subject of ongoing research, as discussed in Section V. The flat dashed line relates to the option of clipping the power coupled into the link to the value at which the system performance peaks [67].

In Section V, we discuss some of these lower bounds. Here, we wish to illuminate the fundamental difficulties of addressing the estimation of the fiber-channel capacity in the regime of nonlinear propagation. One such difficulty is the absence of a closed-form relation between the input and output waveforms [except in the special cases represented by (12) and (13)], which implies that an expression for $P_{Y|X}(y|x)$, which is a crucial component in the determination of the capacity in (1) and (2), also does not exist. The second difficulty follows from the fact that, in the presence of nonlinear propagation, bandwidth, another crucial component in the concept of capacity, cannot be properly defined. The information-carrying signal changes its bandwidth in the process of propagation according to a nontrivial pattern that depends on the shape of the launched waveform, its average power, and the physical properties of the link [62]. Curiously, in principle, spectral broadening even allows one to construct schemes in which the capacity of a nonlinear fiber system within the bandwidth allocated to the transmitter exceeds the capacity of a linear channel of that same bandwidth [68] (although the resulting scheme's complexity makes its actual implementation impractical). Nonlinearity also implies interference between spectrally separated information-carrying signals, from which it follows that the information rate that can be realized within a given frequency band depends on what is being transmitted outside of it.
For all of these reasons, information capacity estimates of fiber-communication systems have almost always been performed in the so-called pseudolinear regime [69], [70], [71], in which nonlinearity can be treated as a small perturbation to a mostly linear propagation. Even the early works by Mitra and Stark [31] and Turitsyn et al. [37], which do not formally limit their analysis to weak nonlinearities, do not hold at very high powers since they too ignore the spectral broadening that characterizes this regime.
The most influential study dealing with the assessment of capacity in the practical context of fiber-communication systems came out of Bell Labs in 2008 [35] and 2010 [34], and outlined the broad framework that has been used in almost all capacity estimates ever since. This framework focuses on the practical setting of WDM transmission in which one of the channels is declared as the channel of interest (COI), whereas all other channels are referred to as interfering channels (ICs). The assumption is that different WDM channels are transmitted and received independently so that the receiver of the COI has no knowledge of the data transmitted in the other channels. Finally, the COI is assumed to be ideally back-propagated so that nonlinear distortions that are experienced in its propagation are eliminated. We emphasize that, since the entire system is assumed to operate in the pseudolinear regime, the difference between the input and output bandwidths is negligible.
The relevance of this framework of study follows from the fact that the vast majority of fiber-communication systems deployed in the past two decades are indeed operated in the regime of pseudolinear propagation. The system studied in [34] also assumed that chromatic dispersion is compensated for only at the receiver (as opposed to being compensated in a distributed fashion at the amplification sites), but this assumption is not critical to the validity of the analysis, except that it extends the range of signal powers for which the assumption of pseudolinearity holds. Nonetheless, most modern communication systems are of this type. In this framework, the interference that the ICs impose on the COI is experienced as noise, to which we refer as NLIN. The NLIN comes on top of the ASE noise that is contributed by amplification. The approach in [34] was to evaluate the channel distribution numerically. To this end, the authors considered discrete constellation points arranged on multiple concentric rings in the complex constellation space.
The number of rings and the number of points in each ring were chosen to be such that, in the absence of nonlinearity, the capacity of the system was well approximated by (4) for the considered range of SNRs. The numerical evaluation of the channel distribution in the presence of nonlinearity, which was performed by means of Monte Carlo simulations, revealed that, for each input-symbol energy, $P_{Y|X}(y|x)$ could be well approximated by a bivariate Gaussian distribution, specified by the best-fitting covariance matrix. It was also evident from the results of [34] that the covariance matrix of the NLIN was signal dependent, and in particular, it appeared as containing a visible contribution of phase noise (e.g., see [34, Fig. 2]). As the study in [34] was concerned with assessing capacity, the system operating conditions were idealized in various ways. In particular, the study assumed ideal distributed amplification. 6 Furthermore, in order to alleviate the computational effort, this study was performed for a scalar (single polarization) case, as opposed to PM transmission. Essiambre et al. [34], [35] provided an important data point in the study of the fiber-optic channel capacity as they reported a lower bound for capacity that was higher by approximately a factor of two than the data rates observed experimentally around that time (see [34, Fig. 38]).

IV. MODELING THE NONLINEAR INTERFERENCE NOISE
As noted earlier, empirical evidence suggests that the NLIN can be well approximated as a Gaussian process, and hence, all modeling efforts are aimed at characterizing its second-order statistics. These are the power and correlation properties, both with respect to quadratures and polarizations, and with respect to time. Practically all analytical studies of the NLIN statistics rely on a perturbation analysis, in the sense that they are accurate only to the first order with respect to the nonlinearity coefficient $\gamma$. Furthermore, as fiber-communication systems operate at an SNR that is much higher than unity, the nonlinear mixing between signal and noise is assumed to be negligible. Within this framework, the propagating field is first evaluated under the assumption of purely linear transmission (i.e., setting $\gamma$ to 0), where the solution is given by (11). We call this the zeroth-order solution and denote it by $E^{(0)}(z,t)$. Then, to first order with respect to the nonlinearity coefficient $\gamma$, the field of the NLIN, $\Delta E(z,t)$, is obtained by solving the Manakov equation (10) with $E^{(0)}(z,t)$ substituted into the nonlinear term, which gives (14), with the initial condition $\Delta E(0,t) = 0$. The solution to (14) can be conveniently expressed in the form (15), where we made use of (11) to express $E^{(0)}(z,t)$. At this point, one may consider two distinct approaches for characterizing the NLIN. One is to proceed in the time domain, as we shall review in what follows, and the other is to derive $\Delta E(z,\omega)$ by converting (15) to the frequency domain, where the product $|E^{(0)}|^2 E^{(0)}$ is replaced by its Fourier transform (which consists of two convolution integrals), and $\partial/\partial t$ is replaced by $-i\omega$. The frequency-domain approach is equivalent to describing the NLIN as resulting from four-wave mixing processes involving all of the propagating-signal frequency components. This approach, which was first proposed by Splett et al. [79] in 1993, underpins the development of the well-known GN model, led by the Optcom group of the Politecnico di Torino [80], [81], which is rigorously valid under the assumption of Gaussian modulation. Removing this assumption resulted in the so-called enhanced GN (EGN) model developed in [73] and [82].
A conspicuous disadvantage of the frequency-domain approach is that it makes the extraction of temporal correlations of the NLIN, as well as some of its other temporal features, less transparent. For this reason, in what follows, we concentrate on the time-domain analysis of NLIN, first introduced in [87]. We focus on the case where all WDM channels use linear single-carrier modulation so that the complex signal appearing in (15) can be expressed as in (16), where the first summation describes the COI and the second represents the ICs. For convenience, the central frequency of the COI is defined as the reference optical frequency, and the central frequencies of the ICs are denoted by $\Omega_m$. By the term $a_n$, we denote the 2-D column vector whose elements are the complex-valued constellation symbols transmitted in the $n$th symbol duration over the two polarizations. Similarly, $b_n^{(m)}$ represents the data transmitted in the $n$th symbol time slot over the $m$th IC. The symbol duration is denoted by $T$, and $g(z,t) = \exp\left(-i\frac{\beta_2}{2} z \frac{\partial^2}{\partial t^2}\right) g(0,t)$ describes the propagation of the fundamental pulse waveform. We assume the usual case where the launched pulse $g(0,t)$ is chosen such that it satisfies the orthogonality condition $\int dt\, g(0,t-jT)\, g^*(0,t-kT) = \delta_{j,k}$, so that, in the absence of propagation-induced distortions, no intersymbol interference (ISI) is present when the signal is received after matched filtering. The field of the NLIN is obtained by substituting (16) into (15). The effect of the NLIN on the received $n$th constellation point after matched filtering, $\Delta a_n$, is then obtained by projecting the NLIN field at $z = L$ onto the propagated pulse $g(L,t-nT)$, where $L$ is the system length and where we assume that the sampling time after the matched filter has been chosen ideally. Notice that the use of the propagated matched filter $g(L,t)$ is equivalent to performing dispersion compensation and then matched filtering with respect to the launched waveform $g(0,t)$.
While the procedure of expressing $\Delta a_n$ and its statistical properties explicitly in terms of the physical parameters of the problem is somewhat involved, the structure of the solution can be easily anticipated. From the form of the nonlinear term [see (15)] and given (16), it is clear that the expression for $\Delta a_n$ contains all of the following triplets of symbols. First, triplets of the form $a_l^\dagger a_m a_n$, which represent a single-channel nonlinearity involving only the COI. Second, triplets that combine one symbol of the COI with two symbols of the same IC, which represent a two-channel interaction involving the COI and one of the ICs. Third, triplets that represent a three-channel interaction involving the COI and two different ICs, and finally, triplets that account for a four-channel interaction, where three different ICs produce interference at a fourth frequency that coincides with the COI. In the special case where $k = h/2$ (with $k$ and $h$ indexing the ICs that appear in the triplet), there are only two different ICs in the latter triplet, and it represents a three-channel interaction. 7 In the jargon of optical nonlinearities, the three- and four-channel interactions belong to the class of four-wave-mixing (FWM) processes, whose effectiveness strongly depends on so-called phase-matching conditions [88]. These conditions are strictly fulfilled only in the absence of chromatic dispersion, which is never the case in contemporary fibers. For this reason, three- and four-channel interactions can be safely neglected in the vast majority of cases 8 so that only one- and two-channel interactions need to be taken into account. In this case, the symbol perturbation that is generated by NLIN can be accurately described by (18). The terms $S_{l,m,n}$ and $X^{(k)}_{l,m,n}$ appearing in it are proportionality coefficients that are obtained by following through the perturbation analysis procedure presented in (15) and whose expressions can be found in [84]. These coefficients are uniquely determined by the fiber parameters, span lengths, fundamental pulse waveform $g(0,t)$, and the frequencies of the WDM channels. In particular, the two-channel interaction coefficients $X^{(k)}_{l,m,n}$ reduce in magnitude monotonically with the separation between the COI and the IC. This reduction reflects the effect of dispersion, which causes the interacting channels to propagate at different velocities and, thereby, reduces the effectiveness of the nonlinear interaction.
7 All other triplets represent interference fields whose central frequencies coincide with that of other channels that are not the COI. Triplets whose central frequency coincides with channels that are immediately adjacent to the COI may contribute to the NLIN owing to the spectral broadening induced by nonlinearity, as pointed out in [82]. This contribution is very small in high-dispersion links carrying high symbol rates, and we neglect it in this article. Triplets whose central frequency coincides with farther away channels do not contribute at all to the NLIN.
8 The contributions of three- and four-channel interactions to NLIN may not be negligible in dispersion-managed systems, systems with low-dispersion fiber, or low-baud-rate systems (typically well below 10 Gbaud) [83].
From the standpoint of capacity, the role of single-channel effects is immaterial because the interference can be removed, at least in principle [50], [64], by means of back propagation, as we discussed earlier. For this reason, we will omit this term in what follows.

A. ISI Model for the NLIN
In order to better appreciate the effect of nonlinear signal distortions on the fiber channel capacity, the NLIN can be conveniently interpreted as ISI. To this end, we define the ISI coefficient matrices $R_n(j)$ [85] according to (19), which allows expressing the second term on the right-hand side of (18), which accounts for two-channel interactions, as the time-dependent ISI process (20). Describing NLIN in terms of (20) is meaningful when the dependence of the ISI coefficient matrices $R_n(j)$ on the time $j$ is slow relative to their dependence on $n$. This is because, in the presence of chromatic dispersion, different WDM channels propagate at different velocities, and each symbol of the COI is passed by a large number of symbols in each of the ICs. This implies that adjacent or nearby symbols of the COI typically experience very similar interference. The contribution of the zeroth-order matrix $R_0(j)$ to NLIN has a unique nature that differs from the nature of the higher-order ISI contributions. Examination of (19) reveals that the matrix $R_0(j)$ is Hermitian, and therefore, its contribution manifests itself as phase and polarization-rotation noise (PPRN) [86]. In order to see that, note that, within the framework of first-order analysis, $a_j + iR_0(j)\,a_j \simeq \exp[iR_0(j)]\,a_j$ (21), where $\exp[iR_0(j)]$ is a unitary matrix, and thereby, it only produces a rotation of polarization and phase. An important point here is that the mean value of the PPRN matrix, which is determined by the coefficients $X^{(k)}_{l-j,l-j,0}$, is proportional to the identity matrix, $\mathrm{E}[R_0(j)] = \Delta\theta\, I$ (22). This mean value is responsible for the presence of a deterministic phase shift, which is immaterial to the process of detection and needs to be removed in the evaluation of the NLIN variance.
Hence, the NLIN variance is obtained in a straightforward, albeit rather tedious procedure, yielding the result (23) for the case where all channels are characterized by the same average input power $P$ and modulation format [73], [84]. The expression in (23) involves a constellation-dependent factor, with $b$ representing a single-polarization constellation point of an IC, and with the averaging performed over the data constellation points. The first term on the right-hand side of (23) coincides with the GN model result [81], whereas the second term, which was first reported in [73], contains an explicit dependence on the modulation format, and it vanishes in the case of Gaussian modulation. Both terms are weighted by coefficients whose computation is discussed in [84]. An additional important feature of the NLIN that will be useful in the section that follows is that its polarization components are uncorrelated with each other and, as should be expected from symmetry, have identical variances [see (27)]. Similarly, no cross-polarization correlations exist between the NLIN $\Delta a_j$ and the signal $a_j$, as can be seen by evaluating their correlation matrix using (20) and (22), $\mathrm{E}[\Delta a_j a_j^\dagger] = i\Delta\theta\, \mathrm{E}[|a|^2]\, I$ (28), where $\mathrm{E}[|a|^2] = PT/2$ is the mean symbol energy per polarization.

V. CAPACITY LOWER BOUND
We are now ready to deploy the formalism established in Section II-A1 in order to obtain a lower bound for the capacity of the fiber-optic channel. To this end, consistently with (20), we express the input-output relation of our channel as $\tilde{a}_j = a_j + \Delta a_j + n_{j,\mathrm{ASE}}$ (29), where $\tilde{a}_j$ and $a_j$ are associated with the quantities denoted by $Y_k$ and $X_k$ in Section II-A1, respectively. We now wish to establish an auxiliary additive GN channel $\tilde{a}_j = c\,a_j + n_j$ (30), whose capacity constitutes a lower bound for that of the actual channel. Formally, one may argue that, since $\tilde{a}_j$ and $a_j$ are vectors, the arguments presented in Section II-A1 should be generalized to the case of vector symbols, possibly allowing the scaling coefficient $c$ to be a matrix. However, such generalization is unnecessary in our case, owing to the fact that there is no correlation between the polarization components of the vectors involved, and they have identical variances, as implied by (27) and (28). This yields the lower bound (32). Since the ASE noise $n_{j,\mathrm{ASE}}$ is statistically independent of $a_j$ and $\Delta a_j$, and using (23), it is evident that the denominator in the square parentheses of (32) reduces to the form given in (33). Moreover, within the perturbation analysis considered here, the term $\Delta\theta^2$ in the numerator should be neglected so that the capacity lower bound simplifies to the form $C \ge 2\log_2(1 + \mathrm{SNR_{eff}})$ (34), where $\mathrm{SNR_{eff}} = P/(\sigma^2_{\mathrm{ASE}} + \xi P^3)$ (35) is referred to as the effective SNR. An important point that must be stressed at this stage is that, although the concept of effective SNR is meaningful in general, i.e., regardless of the deployed modulation format, its use for extracting a capacity lower bound according to (34) is legitimate only when the modulation is Gaussian, in which case $\xi = \chi_1$. The reason for this is that the right-hand side of (34) constitutes the capacity of the auxiliary AWGN channel only when the input symbols $a_j$ are Gaussian distributed.
As is evident from (34), the capacity lower bound is determined exclusively by the effective SNR, and therefore, in what follows, we concentrate on characterizing this quantity. In particular, we will discuss ways of exploiting correlations that are present within the NLIN, so as to reduce $\xi$ (or $\chi_1$, as we are using Gaussian modulation) and increase $\mathrm{SNR_{eff}}$. Before that, it is instructive to examine the direct consequences of (35), which are instrumental for setting one's expectations regarding the capacity. Fig. 3 shows a plot of the effective SNR of (35), as a function of the launched average optical power $P$. The peak value of the effective SNR and the power at which it is obtained are indicated in the figure. A critical feature of the peak effective SNR is that it is proportional to $\xi^{-1/3}$, which implies that $\xi$ (and equivalently the NLIN power) needs to be reduced considerably in order to produce a noticeable effect on $\mathrm{SNR_{eff}}$ [89], [90]. For example, reducing the NLIN by 3 dB (namely by a factor of 2) results in merely 1 dB of effective SNR improvement. The corresponding increase in the capacity lower bound would be 0.35 bit/s/Hz per single polarization in the high SNR regime (i.e., when $\mathrm{SNR_{eff}} \gg 1$). As is demonstrated in what follows, a 3-dB reduction of $\xi$ on the basis of correlations present in the NLIN process turns out to be highly nontrivial.
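As a rough numerical illustration of the cube-root scaling discussed above, the following Python sketch assumes that the effective SNR has the form $\mathrm{SNR_{eff}} = P/(\sigma^2_{\mathrm{ASE}} + \xi P^3)$, i.e., ASE noise plus an NLIN power growing as $\xi P^3$. The function names and the numerical values of $\sigma^2_{\mathrm{ASE}}$ and $\xi$ are arbitrary placeholders and are not taken from any system considered in this article.

```python
import math

def snr_eff(p, sigma2_ase, xi):
    """Effective SNR at launch power p, assuming an NLIN power of xi * p**3."""
    return p / (sigma2_ase + xi * p ** 3)

def optimum(sigma2_ase, xi):
    """Peak of snr_eff: setting the derivative to zero gives xi * p**3 = sigma2_ase / 2."""
    p_opt = (sigma2_ase / (2.0 * xi)) ** (1.0 / 3.0)
    return p_opt, snr_eff(p_opt, sigma2_ase, xi)

# Illustrative values only (arbitrary units), not parameters of a real link.
sigma2_ase, xi = 1e-5, 1e3

_, snr_a = optimum(sigma2_ase, xi)
_, snr_b = optimum(sigma2_ase, xi / 2.0)      # NLIN coefficient reduced by 3 dB

gain_db = 10 * math.log10(snr_b / snr_a)      # improvement of the peak effective SNR
extra_bits = math.log2(snr_b / snr_a)         # high-SNR capacity gain per polarization

print(f"peak-SNR gain: {gain_db:.2f} dB")
print(f"capacity gain per polarization: {extra_bits:.2f} bit/s/Hz")
```

Under this assumed form, halving $\xi$ raises the peak effective SNR by a factor of $2^{1/3}$, i.e., by about 1 dB, which at high SNR translates into roughly one third of a bit per polarization, independently of the placeholder values chosen above.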
We now move to evaluate the potential effective-SNR improvement that can be achieved by taking advantage of correlations present in the NLIN. As investigated in [77], this improvement can be achieved by means of an adaptive equalizer that takes advantage of the slow temporal evolution of the matrices $R_n(j)$, so as to learn the channel and mitigate the effects of ISI. The improvement in performance due to mitigation obviously depends on the temporal correlation properties of the $R_n(j)$ matrices, which are, in turn, dependent upon the system design and most importantly on the dispersion coefficient, frequency separation between channels, baud rate, and the loss/amplification profile. In addition, as can be seen from [77, Fig. 3], the characteristic correlation time is different for different ISI matrices. In particular, the PPRN matrix $R_0(j)$ makes the largest contribution to NLIN, and therefore, its mitigation is the most beneficial. Luckily, it is also characterized by the longest correlation time of all ISI matrices, and hence, its equalization is less demanding in terms of the required equalizer adaptation speed than the equalization of the other ISI terms. For example, in the case studied in [77], the correlation length of $R_0(j)$ is approximately 150 symbols, whereas that of $R_{\pm 1}(j)$ and $R_{\pm 2}(j)$ is approximately one order of magnitude smaller.
The potential effect of ISI compensation can be bounded by removing the corresponding terms in the ISI description of NLIN [see (20)]. To illustrate this, we consider an example of a ten-span link over standard SMF (loss of 0.2 dB/km, $\beta_2 = -21$ ps$^2$/km, and $\gamma = 1.3$ W$^{-1}$ km$^{-1}$), with a span length of 100 km, assuming a noise figure of 5 dB for the inline EDFAs. The system consisted of 51 WDM Gaussian-modulated channels spaced by 80 GHz with ideal Nyquist pulses, at a symbol rate of 75 Gbaud. In Fig. 4, we plot the effective SNR as a function of the launched average power per channel. The thick curve shows the effective SNR, including all ISI terms, the dashed curve shows the effect of removing the PPRN term proportional to $R_0$, the dotted curve shows the effect of removing also the terms proportional to $R_{\pm 1}$, and the dashed-dotted curve shows the case in which also the terms proportional to $R_{\pm 2}$ are removed. The gain observed by simply removing the PPRN term is approximately 1 dB, and it increases to 1.8 dB by removing the other terms listed above. Equation (34) implies that the capacity lower bound may be increased accordingly by ∼0.6 and ∼1.1 bit/s/Hz in the relevant regime of high SNR, respectively. Mitigation of higher ISI orders would improve this result, yet addressing them via adaptive equalization is extremely challenging, given the shortening of the correlation times with the ISI order [77].
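The slow variation of the ISI matrices rests on the walk-off between the COI and the ICs. A back-of-the-envelope Python sketch for the dispersion, channel spacing, symbol rate, and span length of the example above is given below. It relies on the standard estimate that two channels separated by an angular frequency $\Omega$ accumulate a group-delay difference of $|\beta_2|\Omega$ per unit length; the function names are ours, and placing the COI at the center of the 51-channel comb is an assumption made only for illustration.

```python
import math

def walkoff_ps_per_km(beta2_ps2_per_km, spacing_ghz):
    """Group-delay difference per km between two channels spaced by spacing_ghz."""
    omega = 2 * math.pi * spacing_ghz * 1e-3      # angular frequency separation, rad/ps
    return abs(beta2_ps2_per_km) * omega

def symbols_passed(beta2_ps2_per_km, spacing_ghz, baud_gbaud, length_km):
    """Number of IC symbols that slide past one COI symbol over length_km."""
    symbol_ps = 1e3 / baud_gbaud                  # symbol duration, ps
    return walkoff_ps_per_km(beta2_ps2_per_km, spacing_ghz) * length_km / symbol_ps

# Example link: beta2 = -21 ps^2/km, 80-GHz spacing, 75 Gbaud, 100-km spans.
nearest = symbols_passed(-21.0, 80.0, 75.0, 100.0)
# Outermost channel, assuming the COI sits at the center of the 51-channel comb.
farthest = symbols_passed(-21.0, 25 * 80.0, 75.0, 100.0)

print(f"nearest neighbor: {nearest:.0f} symbols per span")
print(f"outermost channel: {farthest:.0f} symbols per span")
```

Even for the nearest neighbor, several tens of IC symbols pass each COI symbol within a single span, which is consistent with the picture in which nearby COI symbols experience very similar interference.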

VI. CAPACITY UPPER BOUND
An important result that was demonstrated in [65] and [66] is that the capacity of a nonlinear system is upper bounded by that of a linear AWGN system operating with the same average input power, i.e., with $\mathrm{SNR_{lin}} = P/\sigma^2_{\mathrm{ASE}}$. For the optimal power $P_{\mathrm{opt}}$, which is expressed in Fig. 3, the ratio between the two SNRs is given by $\mathrm{SNR_{lin}}/\mathrm{SNR_{eff,opt}} = 1/(2^{1/3} \times 0.53) \simeq 1.5$ (37), so that, in cases where the SNR is much greater than 1, the difference between the upper and lower capacity bounds is $\Delta C \simeq 2\log_2(\mathrm{SNR_{lin}}/\mathrm{SNR_{eff,opt}}) \simeq 1.17$ bit/s/Hz (38). Notice that this difference is a fixed number, independent of any system parameter or mode of operation, as long as the first-order perturbation analysis remains valid, and the SNR is sufficiently large. Intuitively, this result can be attributed to the fact that improving the tolerance of a system to nonlinear effects allows the transmission of higher signal powers, so that both the lower and upper capacity bounds increase by the same amount.
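The parameter independence of this gap can be checked numerically. The Python sketch below assumes an effective SNR of the form $P/(\sigma^2_{\mathrm{ASE}} + \xi P^3)$ and a two-polarization capacity of $2\log_2(1+\mathrm{SNR})$; the function name and the parameter sets are arbitrary choices made for illustration only.

```python
import math

def bound_gap(sigma2_ase, xi):
    """Return (SNR ratio, high-SNR gap, exact gap) at the optimum launch power."""
    p_opt = (sigma2_ase / (2.0 * xi)) ** (1.0 / 3.0)   # maximizes the effective SNR
    snr_lin = p_opt / sigma2_ase                        # linear channel, same power
    snr_eff_opt = p_opt / (sigma2_ase + xi * p_opt ** 3)
    ratio = snr_lin / snr_eff_opt
    approx_gap = 2.0 * math.log2(ratio)                 # valid when both SNRs >> 1
    exact_gap = 2.0 * math.log2((1.0 + snr_lin) / (1.0 + snr_eff_opt))
    return ratio, approx_gap, exact_gap

# Arbitrary (sigma2_ase, xi) pairs in arbitrary units, for illustration only.
for sigma2_ase, xi in [(1e-5, 1e3), (3e-6, 50.0), (1e-4, 2e4)]:
    ratio, approx_gap, exact_gap = bound_gap(sigma2_ase, xi)
    print(f"ratio = {ratio:.3f}, gap = {approx_gap:.2f} (exact: {exact_gap:.2f}) bit/s/Hz")
```

At the optimum power, the NLIN power equals half of the ASE power, so the ratio is 1.5 for every parameter set, while the exact gap approaches the high-SNR value from below as the SNR grows.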

VII. POLARIZATION EFFECTS
Throughout the analysis, we have not accounted for the effect of polarization-related phenomena. In SMFs, these are the phenomena of PMD [18], [91], [92] and PDL [93], [94], [95], [96], which we treat separately in what follows.

A. Polarization-Mode Dispersion
PMD results from structural imperfections in the fiber that perturb its circular symmetry, thereby causing the polarization of different frequency components of the light in the fiber to evolve differently in the process of propagation. Since PMD is a strictly unitary phenomenon (polarizations rotate as a result of it, but they do not decay), fundamentally, it does not have any effect on the channel capacity in the linear propagation regime. Even in practical contexts, with the PMD levels that characterize modern SMFs, digital equalizers that are commonly deployed in fiber-optic receivers are very effective in mitigating the effects of PMD in the case of linear propagation, particularly in view of the slow time dynamics of PMD [97], [98], [99], [100], [101], [102], [103], [104], [105], compared to available signal processing speeds.
The effect of PMD becomes much more involved when its interplay with NLIN needs to be taken into account. It is particularly complicated in the highly nonlinear regime [106], [107], where very little can be said about it. Yet, in the regime of pseudolinear transmission, which characterizes the majority of fiber-communication systems today, the effect of PMD on NLIN appears to be negligible, as has been pointed out through extensive numerical simulations in [108]. Some physical intuition for this can be gained by considering the process through which NLIN is created. Fundamentally, interchannel NLIN is formed when pulses representing symbols in different WDM channels overlap with one another in time. Owing to the different propagation velocities of different WDM channels in the presence of chromatic dispersion, the temporal overlap between given pulses is maintained over a limited-length section in the fiber, to which we refer as collision length [86]. When PMD scrambles the relative states of polarization of the nonlinearly interfering pulses significantly within the collision length, the effectiveness of the nonlinear process and, consequently, the magnitude of NLIN reduce. The amount of PMD that is required in order to produce this result can be assessed based on the polarization autocorrelation function between two tones with a frequency separation $\Omega$ [109], [110], [111], given in (39), where $\kappa_{\mathrm{PMD}}$ is the PMD coefficient and $l$ is the distance that the signals travel in the fiber. For the relative polarization to be significantly scrambled, the propagation length $l$ needs to be at least of the order of $8\,\Omega^{-2}\kappa_{\mathrm{PMD}}^{-2}$. In modern SMFs, $\kappa_{\mathrm{PMD}}$ is of the order of 0.05 ps/$\sqrt{\mathrm{km}}$, or less, implying that, for channel separations ranging between 50 and 500 GHz, this length ranges between more than 30 000 km and 300 km. These are values that exceed typical collision lengths in relevant fiber communication settings.
In the case of farther separated channels, l shortens, and the suppression of NLIN may become more significant, but the combined contribution of all such channels to NLIN is typically negligible. As we show in Section X-B, in the case of fibers used in spatially multiplexed transmission, the effect of modal dispersion (MD) (which generalizes PMD) on the NLIN becomes considerable.
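The length scales quoted above for polarization scrambling can be reproduced with a few lines of Python. The sketch below simply evaluates $8\,\Omega^{-2}\kappa_{\mathrm{PMD}}^{-2}$ with $\Omega = 2\pi\Delta f$; the function name and the unit handling are ours.

```python
import math

def scrambling_length_km(delta_f_ghz, kappa_pmd_ps_per_sqrt_km):
    """Length scale 8 / (Omega^2 * kappa_PMD^2) for a channel separation delta_f_ghz."""
    omega = 2 * math.pi * delta_f_ghz * 1e-3   # angular frequency separation, rad/ps
    return 8.0 / (omega ** 2 * kappa_pmd_ps_per_sqrt_km ** 2)

kappa_pmd = 0.05                               # ps/sqrt(km), value quoted in the text
for delta_f_ghz in (50.0, 500.0):
    length_km = scrambling_length_km(delta_f_ghz, kappa_pmd)
    print(f"{delta_f_ghz:5.0f} GHz separation: {length_km:8.0f} km")
```

For separations of 50 and 500 GHz, this evaluates to roughly 32 000 km and 320 km, respectively, in line with the figures given in the text.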

B. Polarization-Dependent Loss
Polarization-dependent loss is another linear polarization phenomenon. It describes a situation in which the attenuation that is experienced by a signal propagating through the fiber depends on its state of polarization [93]. Since, unlike PMD, PDL is not a unitary effect, it implies a fundamental reduction in the channel capacity, as has been studied in [94] and [112].
In modern fiber-communication systems, PDL turns out to be dominated by lumped inline optical elements, such as optical amplifiers, much more than by the transmission fiber itself.
Each PDL element has two orthogonal polarization axes that apply different levels of attenuation to the propagating signal. Since the PDL elements are separated by sections of fiber that randomly rotate the polarization of the signal propagating between them, the PDL axes of different elements can be viewed as randomly rotated relative to one another [93], [113]. The effect of PDL on capacity must be considered separately for the linear and nonlinear propagation regimes. In the regime of linear propagation, PDL leads to the distortion of the propagating waveform, and it also randomizes the received SNR. It is interesting that, in what concerns the quality of communications, the effect of waveform distortion turns out to be totally negligible relative to the effect of SNR randomization, as has been demonstrated in [95]. The randomization of the SNR occurs because the signal does not experience the same PDL as the ASE noise. The signal passes through all of the PDL elements, whereas the noise that consists of ASE contributions produced at the various amplification sites only sees the PDL of elements that are present after the point at which it is generated. 9 This difference introduces randomness into the SNR and, thereby, affects the communication capacity of the system. Since the PDL-induced variations of the SNR are very slow, it is appropriate to treat the channel capacity as random and assess the penalty due to PDL in terms of a power margin that needs to be allocated in order to prevent the system capacity from reducing below the desired value with a prescribed outage probability [115].
As may be expected, the assessment of the impact of PDL becomes much more difficult in the regime of nonlinear propagation. Randomly varying signal powers randomize the generation of NLIN and affect its statistics in a rather nontrivial manner. This problem has been addressed in a recent study by Serena et al. [116], where the GN model is extended so as to include the effects of PDL.

VIII. CONCLUDING REMARKS ON LONG-HAUL SYSTEMS CAPACITY
Since, fundamentally, the regime of long-haul transmission is limited by nonlinear propagation phenomena, a rigorous capacity assessment is unlikely to be achievable in any practical setting.
It seems that the only known tractable analyses are the ones relying on the first-order perturbation approach. Within this framework, one can treat the effect of nonlinear propagation in terms of NLIN and obtain a useful lower bound for the fiber channel capacity. This lower bound is the capacity of an AWGN channel, where the noise power is the sum of the amplification noise power and the NLIN power obtained from the simplest GN model [81]. As discussed in Section VI, this lower bound is always 1.17 bit/s/Hz below the capacity upper bound in (36), which is the capacity of a linear system operating at the same optical power that is optimal in the nonlinear system [65], [66].
Since the nature of NLIN is similar to that of slowly varying ISI, the NLIN power can be reduced by means of ISI mitigation techniques that leverage the temporal correlations existing in the NLIN process. The potential benefit of ISI compensation depends on the specific system settings, and in a characteristic example considered in Section V, ideal zeroth-order ISI (PPRN) mitigation was shown to increase the capacity lower bound by up to 0.6 bit/s/Hz. An increase of ∼1.1 bit/s/Hz was shown to be achievable when the five lowest ISI orders are mitigated.
Finally, it should be noted that the modeling of the NLIN reviewed in this article relies on the use of the Manakov equation, which is valid for transmission bandwidths of the order of a few terahertz, covering the case of C-band systems. However, the NLIN analysis becomes more involved in the case where high-throughput transmission is pursued by extending the transmission bandwidth beyond the conventional C-band. In this regime, the Manakov equation needs to be supplemented with additional terms accounting for stimulated Raman scattering [61], which manifests itself in the form of power transfer from high-frequency channels to low-frequency channels. The extent of this power transfer increases with the frequency separation between the ICs, peaking at a separation of approximately 13 THz. As a result, in ultrawideband systems, the Raman scattering modifies the z-dependence of the individual channels' powers, thereby modifying the formation of the NLIN. The extension of the NLIN model accounting for Raman scattering is discussed in [117], [118], and [119].

IX. DIRECT-DETECTION SYSTEMS
In short optical links, system design is usually dominated by cost-related considerations, in which case the assessment of information capacity can only be meaningful when accounting for specific cost and complexity constraints.
In what follows, we outline the consideration of the capacity in systems constrained to the use of direct detection, which, in short-reach transmission, is typically preferred over the more costly alternative of coherent detection. This is often the situation in the case of data-center interconnecting links, which constitute an increasingly significant fraction of the global fiber-optic network. Typically, data-center interconnects (DCIs) extend over a few tens to 100 km. As in coherent systems, noise is dominated by optical amplification, but the noise in the received power waveform is neither additive nor Gaussian, and hence, the capacity expression (4) does not apply. Interestingly, the actual capacity is only 1 bit lower than that of a coherent system [38], [120], provided that the bandwidth of the intensity receiver is twice as large as that of the coherent receiver, 10 and electrical receiver noise is negligible. Otherwise, if the bandwidth is the same as that of the coherent system, the capacity reduces to 1 bit less than the capacity of a single quadrature, as shown in [33]. This latter result can be intuitively understood by noting that intensity constitutes a single degree of freedom (comparable to just one of the two quadratures of the electric field). In addition, since intensity detection does not allow distinguishing between positive and negative signal values, an additional 1 bit of information is lost.
The old generation of intensity-modulated direct-detection systems used one sample per symbol, and therefore, in the high-SNR limit at which they are typically operated, their capacity was 1 bit less than the capacity of a single-quadrature AWGN channel of (3), as demonstrated in [33]. In modern systems, direct detection is used for the purpose of reducing the optical complexity of receivers, typically in short links [121] (albeit not exclusively [122]), and it is complemented with digital processing in the electrical domain. These implementations are commonly referred to as advanced direct-detection schemes. In these schemes, it is possible for the receiver bandwidth to be consistent with the bandwidth of the intensity waveform (i.e., twice the optical bandwidth), in which case the capacity is only 1 bit lower than that of a coherent channel [38].
Most advanced direct-detection schemes involve the transmission of a continuous-wave (CW) carrier $A$ together with the information-carrying signal $s(t)$ accompanied by Gaussian amplification noise, which we ignore for simplicity of illustration. Such systems are frequently referred to as self-coherent. The photocurrent from which the data are to be extracted is given by $I_{\mathrm{PD}} = \eta\,|A + s(t)|^2 = \eta\left[A^2 + 2A\,\mathrm{Re}\{s(t)\} + |s(t)|^2\right]$ (40), where $\eta$ is a proportionality coefficient accounting for the quantum efficiency of the detector [123], and where we have assumed with no loss of generality that $A$ is real-valued. It is clear that the information resides exclusively in the second and third terms on the right-hand side of (40), whereas the first term $A^2$ involves only the carrier and is, therefore, immaterial to the data-recovery process. When $|A| \gg |s(t)|$, the third term $|s(t)|^2$ becomes negligible, and $\mathrm{Re}\{s(t)\}$ is readily extracted from $I_{\mathrm{PD}}$. If $s(t)$ is a single-sideband signal (i.e., its spectrum resides only on one side of the CW carrier), then knowledge of $\mathrm{Re}\{s(t)\}$ implies knowledge of the entire signal $s(t)$ because the real and imaginary parts are related by the Hilbert transform.
This is contrary to a situation in which the spectrum of $s(t)$ is centered at the carrier frequency, in which case only the real part of $s(t)$ can be exploited for communicating, thereby halving the SE.¹¹ The main issue with the above approach is that neglecting $|s(t)|^2$ often requires an impractically high carrier power, whereas, for practical values of $A$, $|s(t)|^2$ is nonnegligible, and it is referred to as signal-to-signal beat noise (SSBN). Many studies conducted on advanced direct-detection systems focused on deploying DSP algorithms that increase the tolerance of the system to SSBN and, thereby, allow operation with lower carrier powers [124], [125], [126], [127], [128], [129], [130], [131], [132]. Many of these studies relied on iterative approaches, where the SSBN estimated in each iteration is subtracted in the iteration that follows. An alternative scheme that has been reported more recently is that of the Kramers-Kronig (KK) receiver [133]. This scheme allows the exact reconstruction of the received complex-valued optical signal, provided that the information-carrying signal is single sideband with respect to the carrier, and the carrier amplitude $A$ exceeds a well-defined finite value, which ensures that the entire waveform (signal and carrier) is minimum phase. Numerical and experimental demonstrations using the KK receiver have shown that excellent performance is achieved with carrier-to-signal power ratios of the order of 6 dB and above [134], [135] (notably lower than that required by previously studied advanced direct-detection schemes). A number of variants of the KK receiver have been proposed and demonstrated in recent years, as reviewed in [136].
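The difference between neglecting $|s(t)|^2$ and the KK reconstruction can be illustrated numerically. The following is a minimal noise-free sketch and not the implementation used in the cited works: it assumes a periodic, heavily oversampled waveform, a real carrier, and a hypothetical test signal made of random tones on the positive-frequency side only. The function names and parameter values are illustrative assumptions. The KK step uses the fact that, for a minimum-phase waveform, the phase is the Hilbert transform of the logarithm of the magnitude.

```python
import numpy as np


def hilbert_transform(x):
    """Hilbert transform of a real, periodic sequence, computed with the FFT."""
    spectrum = np.fft.fft(x)
    # +1 for positive frequencies, -1 for negative frequencies, 0 at DC
    sign = np.sign(np.fft.fftfreq(len(x)))
    return np.real(np.fft.ifft(-1j * sign * spectrum))


def kk_reconstruct(photocurrent, carrier, eta=1.0):
    """Recover s(t) from I_PD = eta * |A + s(t)|^2, assuming A + s(t) is minimum phase."""
    magnitude = np.sqrt(photocurrent / eta)
    phase = hilbert_transform(np.log(magnitude))  # phase from the log-magnitude
    return magnitude * np.exp(1j * phase) - carrier


def small_signal_reconstruct(photocurrent, carrier, eta=1.0):
    """Recover s(t) by neglecting |s(t)|^2, which leaves the SSBN in the estimate."""
    real_part = (photocurrent / eta - carrier**2) / (2 * carrier)
    return real_part + 1j * hilbert_transform(real_part)


def make_ssb_signal(num_samples=4096, num_tones=32, peak=0.5, seed=0):
    """Hypothetical single-sideband test signal with peak magnitude `peak`."""
    rng = np.random.default_rng(seed)
    spectrum = np.zeros(num_samples, dtype=complex)
    # Populate positive-frequency bins only (no DC, no negative frequencies)
    spectrum[1:num_tones + 1] = rng.normal(size=num_tones) + 1j * rng.normal(size=num_tones)
    s = np.fft.ifft(spectrum)
    return peak * s / np.max(np.abs(s))


carrier = 1.0                      # A, real-valued
s = make_ssb_signal()              # |s(t)| <= 0.5 * A, so A + s(t) is minimum phase
photocurrent = np.abs(carrier + s) ** 2
kk_error = np.max(np.abs(kk_reconstruct(photocurrent, carrier) - s))
ssbn_error = np.max(np.abs(small_signal_reconstruct(photocurrent, carrier) - s))
print(f"KK error: {kk_error:.2e}, small-signal error: {ssbn_error:.2e}")
```

In this idealized setting, the KK estimate agrees with $s(t)$ to numerical precision, while the small-signal estimate retains an error of the order of $|s(t)|^2/(2A)$. The large oversampling factor is a deliberate choice, since the logarithm broadens the spectrum of the processed waveform.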
One major limitation of direct-detection schemes is that they do not easily accommodate polarization multiplexing. Various schemes to overcome this shortcoming have been proposed, the most prominent of which are the ones based on the Stokes-space receiver [137] (sometimes in combination with the KK algorithm [138], [139]) or carrier-assisted differential detection [140]. These schemes have been used to achieve record data transmission rates with polarization multiplexing [141], albeit at the expense of significantly higher optical complexity.
For completeness, another category of direct-detection systems that should be mentioned in this section is the one in which the transmitted waveform is obtained by directly modulating a CW laser. In this case, the receiver consists of a single photodiode without optical chromatic dispersion compensation and without DSP. These systems are attractive for their very low cost, robustness, and low power consumption compared to all other alternatives. Because of the low-cost constraints, such systems resort to the use of temperature-unstabilized free-running lasers. Since the central optical frequency of such lasers wanders by hundreds of GHz, it is not meaningful to refer to the system information capacity or SE. The performance of such systems is limited primarily by the fiber chromatic dispersion, which, in combination with the direct-detection process, corrupts the received intensity waveform to the extent that the information cannot be recovered. Indeed, for a modulated power waveform of the kind $P(t) = P_0[1 + m(t)]$, where $|m(t)| \ll 1$, the transfer function experienced by $m(t)$ after propagation in a fiber link of length $L$ is given by [142]
$$H(\omega) = \cos\!\left(\tfrac{1}{2}\beta_2\,\omega^2 L\right)$$
which has notches at $\omega^2 = (1 + 2k)\pi L^{-1}|\beta_2|^{-1}$, $k = 0, 1, 2, \ldots$ For this reason, it is very challenging to transmit signals $m(t)$ with a bandwidth of 20 GHz over more than 10 km of standard SMF.
¹¹ Notice that the SE of the two schemes is the same if one refers it to the required electrical bandwidth because photodetection doubles the bandwidth of only the single-sideband signal when the term $|s(t)|^2$ is negligible.
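The 20-GHz, 10-km figure can be checked against the notch condition quoted above. The short sketch below evaluates $\omega^2 = (1 + 2k)\pi/(L|\beta_2|)$ for a 10-km link. It assumes $|\beta_2| = 21$ ps²/km, the SMF dispersion value quoted later in the article for the capacity bounds of Fig. 5. The function names are illustrative.

```python
import math


def fading_transfer(freq_ghz, length_km, beta2_ps2_per_km):
    """Small-signal IM-DD response H = cos(beta2 * omega^2 * L / 2)."""
    omega = 2 * math.pi * freq_ghz * 1e-3  # rad/ps (1 GHz = 1e-3 cycles per ps)
    return math.cos(0.5 * beta2_ps2_per_km * omega**2 * length_km)


def notch_frequencies_ghz(length_km, beta2_ps2_per_km, count=3):
    """Frequencies at which omega^2 = (1 + 2k) * pi / (L * |beta2|), k = 0, 1, ..."""
    notches = []
    for k in range(count):
        omega = math.sqrt((1 + 2 * k) * math.pi / (length_km * abs(beta2_ps2_per_km)))
        notches.append(omega / (2 * math.pi) * 1e3)  # rad/ps -> GHz
    return notches


# Assumed link: 10 km of standard SMF with |beta2| = 21 ps^2/km
notches = notch_frequencies_ghz(length_km=10.0, beta2_ps2_per_km=21.0)
print([round(f, 1) for f in notches])
```

With these assumed values, the first notch falls just below 20 GHz, which is consistent with the statement above. Since the notch frequencies scale as $L^{-1/2}$, a fourfold increase in length halves the usable bandwidth.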

X. SPACE-DIVISION MULTIPLEXED TRANSMISSION
As hero experiments reported by the end of the first decade of the present millennium appeared to be gradually closing the gap to early estimates of the SMF channel capacity (see [143] and references therein), SDM came into the spotlight of optical communications as a candidate for enabling sustainable scaling of fiber-optic communication rates [1], [144]. It is implemented by taking advantage of fiber structures supporting the propagation of multiple spatial modes, where all the modes are used for the transmission of information. Depending on the specific fiber design, spatial modes can be coupled or uncoupled. For example, in a multicore fiber where the cores are sufficiently far apart, the coupling between them is negligible, and each constitutes an independent information channel. On the other hand, in multimode fibers or in multicore fibers containing a larger number of cores (so that the intercore distances are smaller), the spatial modes couple to each other during propagation. In this situation, the extraction of the information at the receiver becomes more complicated, as it requires the use of multiple-input-multiple-output (MIMO) techniques, but, at the same time, the nonlinear distortions discussed in Section IV are reduced relative to the uncoupled case (or equivalently, relative to the case of separate SMFs), as we discuss in what follows.
In this section, we review some of the main implications of the use of SDM transmission on channel capacity. We use N to denote the number of spatial modes that are supported by the fiber so that the total number of modes M used in (4) is equal to M = 2N, where the factor of two accounts for the fact that each spatial mode is twofold degenerate with respect to polarization.

A. Uncoupled-Mode SDM Transmission in Power-Limited Links
In transoceanic transmission, the rules of the game are different from what they are in terrestrial transmission. One of the biggest issues is the power supply to the undersea amplifiers. The use of multiple spatial paths introduces a degree of freedom that can be exploited in order to increase the power efficiency, which is defined as
$$\eta_p = \frac{C}{P_{\mathrm{tot}}} \tag{42}$$
where $C$ is the capacity and $P_{\mathrm{tot}}$ is the total optical power that needs to be fed into the system. The latter is determined by $P_{\mathrm{out}}$, the output power of the amplifiers, under the assumption that they are operated in saturation mode, and its expression simplifies for amplifier gains $G \gg 1$, as happens in most cases of interest. As shown by Sinkin et al. [145], the power efficiency, under the assumption of linear transmission, can be expressed in terms of the SNR and of $P_{\mathrm{ASE}}$, the noise power per mode at the output of the last amplifier, by using the fact that $P_{\mathrm{out}} = P_{\mathrm{ASE}} + P_{\mathrm{sig}}$, with $P_{\mathrm{sig}}$ denoting the signal launch power in each spatial mode ($P_{\mathrm{sig}}/2$ in each polarization). The power efficiency $\eta_p$ can be seen to be largest when the SNR is equal to $e - 1 \simeq 1.72$. This relation allows the extraction of the optimal number of spatial modes for a prescribed available total optical power and number of spans. Notice that $\eta_p$ is the largest for a relatively low SNR value, which suggests that the assumption of linear transmission is justified in this case. This observation is in contrast to traditional single-mode transoceanic systems that were designed with the goal of optimizing the system capacity, and hence, they are operated in the nonlinear regime.
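The optimal SNR quoted above can be reproduced with a short numerical check. The sketch below is not the derivation of [145]. It rests on two labeled assumptions: for a fixed number of modes and spans, the capacity is proportional to $\log_2(1 + \mathrm{SNR})$, and the total power is proportional to $P_{\mathrm{out}} = P_{\mathrm{ASE}}(1 + \mathrm{SNR})$, with prefactors that do not depend on the SNR. Under these assumptions, $\eta_p$ is proportional to $\log_2(1 + \mathrm{SNR})/(1 + \mathrm{SNR})$, and the function names are illustrative.

```python
import math


def relative_power_efficiency(snr):
    """log2(1 + SNR) / (1 + SNR): power efficiency up to an SNR-independent factor."""
    return math.log2(1.0 + snr) / (1.0 + snr)


def optimal_snr(lo=0.01, hi=100.0, tol=1e-10):
    """Golden-section search for the SNR that maximizes the relative power efficiency."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if relative_power_efficiency(c) > relative_power_efficiency(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 0.5 * (a + b)


snr_star = optimal_snr()
# SE per spatial mode (two polarizations) at the optimum, in bit/s/Hz
se_per_spatial_mode = 2 * math.log2(1.0 + snr_star)
print(snr_star, se_per_spatial_mode)
```

Setting the derivative of $\ln(1+x)/(1+x)$ to zero gives $1 + x = e$, so the search should return a value close to $e - 1$. The corresponding SE is then $\log_2 e$ per mode, a low value that is in line with the remark that linear transmission is a reasonable assumption at the optimum.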
Since the approach of [145] focuses only on the optimization of power efficiency, other factors that play an important role in the design, operation, and cost of transoceanic systems are not taken into account. These include the costs of deployment, cable and fiber, amplifiers, and transponders. Some of these aspects are taken into account in [146] and [147]. Additional aspects having to do with nonlinear propagation, as well as with the physics and operation of the EDFAs, are addressed in [72], [148], and [149].

B. Coupled-Mode SDM Transmission
An important aspect of transmission in SDM fibers has to do with the effect of spatial mode coupling on the nonlinear signal distortion and performance impairments. It has been found in [150] and [151] that, as the random mode coupling increases, the nonlinear penalty to performance increases initially and then reduces to the extent that it becomes lower than that characterizing the case of no intermode coupling, thereby outperforming single-mode transmission. This finding encourages intentionally introducing strong random coupling between modes as a means of mitigating nonlinear transmission impairments. The benefit of strong mode coupling was first reported in [152] on the basis of numerical studies, and the physical mechanism responsible for it was described in [153], while experimental evidence was later presented in [154] and [155]. The main idea can be summarized by inspecting the equation describing nonlinear propagation in SDM fibers with strong mode mixing, which is known as the multicomponent Manakov equation [156]. The vector $\vec{E}$ appearing in this equation contains $M = 2N$ complex-valued elements so that each element describes the excitation of a specific space and polarization mode. This equation represents an idealized case where all modes are degenerate; namely, they are characterized by the same propagation constant $\beta$ and the same loss coefficient $\alpha$. It is identical in form to the Manakov equation of (10), and it is derived from a set of $2N$ coupled nonlinear Schrödinger equations, taking into account the fast and random mixing occurring between modes during propagation. Here too, $|\vec{E}|^2 = \sum_{n=1}^{2N} |E_n|^2$ represents the total optical power, and $|E_n|^2$ is the optical power in the $n$th mode. The coefficient $\kappa$ accounts for the mode multiplicity, and in the case of $N = 1$, it is equal to 8/9, consistently with (10).
In the case of a multicore fiber with $N$ strongly coupled cores, $\kappa$ is given by [153]
$$\kappa = \frac{4}{3}\,\frac{2N}{2N+1}\,\frac{1}{N} \simeq \frac{4}{3N} \tag{47}$$
where the approximate equality holds for large core counts. The above implies that the effective SNR scales as $(2N+1)^{2/3}$, consistently with the discussion in Fig. 3, so that a lower bound for the capacity is given by
$$C \geq 2N \log_2\!\left(1 + \mathrm{SNR}^{(N)}_{\mathrm{eff,opt}}\right) \tag{48}$$
where $\mathrm{SNR}^{(N)}_{\mathrm{eff,opt}} = (2N+1)^{2/3}\,\mathrm{SNR}^{(1)}_{\mathrm{eff,opt}}$, with $\mathrm{SNR}^{(N)}_{\mathrm{eff,opt}}$ denoting the optimal SNR in a fiber with $N$ strongly coupled cores. We note, however, that experiments so far have reported a benefit lower than that predicted by (48) when using coupled-core multicore fibers [155]. The reason for this discrepancy requires further investigation and could be attributed to various factors, such as mode-dependent loss (MDL).
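To get a feeling for the numbers, the sketch below evaluates the nonlinearity coefficient and the resulting SNR scaling for a few core counts. It assumes the form $\kappa = \frac{4}{3}\frac{2N}{2N+1}\frac{1}{N}$, which reproduces the value 8/9 at $N = 1$ and the $1/N$ behavior at large $N$, together with the $(2N+1)^{2/3}$ scaling of the optimal effective SNR stated above. The function names are illustrative, and the gains are idealized values that, as noted, exceed what experiments have reported so far.

```python
import math


def kappa(num_cores):
    """Assumed nonlinearity coefficient for N strongly coupled cores."""
    return (4.0 / 3.0) * (2 * num_cores / (2 * num_cores + 1)) / num_cores


def snr_scaling(num_cores):
    """Scaling factor (2N + 1)^(2/3) of the optimal effective SNR."""
    return (2 * num_cores + 1) ** (2.0 / 3.0)


def relative_snr_gain_db(cores_a, cores_b):
    """Idealized SNR gain, in dB, when going from cores_a to cores_b coupled cores."""
    return 10.0 * math.log10(snr_scaling(cores_b) / snr_scaling(cores_a))


for n in (1, 4, 7, 32):
    print(n, kappa(n), relative_snr_gain_db(1, n))
```

Because the scaling goes as the 2/3 power of $2N + 1$, the idealized gain grows slowly with the core count, at roughly 2 dB per doubling of $N$ for large $N$.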
To develop an intuition for the inverse dependence on $N$, we emphasize that the Manakov equation is obtained by averaging the nonlinear terms of the coupled nonlinear Schrödinger equations with respect to the distributed random mode coupling [156]. This is equivalent to averaging the nonlinear terms on the length scale over which random coupling is effective (which is much shorter than the length scale over which the nonlinear effects become appreciable). As the field in each core becomes a random mixture of the fields transmitted in all of the cores, the effect of this averaging is to suppress the coherent (i.e., phase-dependent) products between the field contributions originating from different cores, with the result that the power transmitted in each fiber core equalizes rapidly between all cores. We stress that this mixing occurs continuously along the fiber. Although (47) was derived for multicore fibers, a similar scaling also holds for multimode fibers, provided that all mode groups are strongly mixed over a sufficiently short length scale [153].
The way in which the $1/N$ dependence of $\kappa$ translates into the observed reduction of the NLIN can be understood as follows. The nonlinear perturbation $\Delta E_j$ that is imposed on the electric field in the $j$th mode is proportional to $\kappa|\vec{E}|^2$ or, equivalently, to $\sum_n |E_n|^2/N$. Since the powers $|E_n|^2$ of the individual modes are statistically independent, the variance of $\Delta E_j$ is proportional to the sum of their variances divided by $N^2$. This implies that the variance of $\Delta E_j$ is smaller by a factor of $N$ than it would be in the absence of mode coupling, where the $1/N$ dependence of $\kappa$ is absent. The effective SNR increases accordingly, and the effect on the capacity can then be assessed by considering the lower bound given in (4). Achievable rates for channels obeying the multicomponent Manakov equation are currently under investigation [157].
Another important aspect of SDM transmission is related to MD and MDL, two concepts that generalize PMD and PDL, respectively, to the case of SDM fibers.
MD, just like PMD, is responsible for introducing frequency dependence into the random mode coupling process, and being a unitary phenomenon, in principle, it does not affect the system capacity in the linear transmission regime. Yet, since the average delay spread produced by MD is greater by two or three orders of magnitude than that produced by PMD in the same length of fiber, the increase in receiver complexity that it entails is considerable [158]. The large MD implies that the frequency correlation of mode-related phenomena becomes very short. For example, after propagation in a 1000-km link with a typical average delay spread coefficient of 5 ps/km$^{1/2}$, the evolutions of the modal content of two different spectral components of the propagating signal separated by 1 GHz are already uncorrelated from each other. This is in contrast to what happens in a typical SMF, where, for the same propagation distance, the polarization states of two spectral components become similarly decorrelated only when the frequency separation between them exceeds the order of 1 THz. For this reason, the effect of MD on the propagating signal is averaged over the signal spectrum, with the result that the output intensity waveform can be expressed as a convolution of the input intensity waveform with a deterministic impulse response, namely the intensity impulse response of the fiber [159], [160], [161], [162].
In the regime of nonlinear propagation, on the other hand, MD can be beneficial as it contributes to further reducing the accumulation of NLIN [163], [164]. Indeed, while, as noted in Section VII-A, the relative polarization rotation of nonlinearly interfering frequency channels is too small to affect the build-up of nonlinearity on the length scale of pulse collisions in SMFs, the equivalent effect of MD is much larger, and the build-up of the nonlinear interference is effectively reduced. This reduction comes on top of the above-discussed NLIN reduction due to strong mode mixing.
We conclude this section by briefly discussing the effect of MDL on the capacity of SDM systems. Like PDL in single-mode systems, MDL is a nonunitary phenomenon, and hence, it is responsible for a fundamental reduction in the information capacity. This reduction can be quantified either in terms of the ratio between the actual capacity and some reference capacity [21], [165] or in terms of the difference between the two, normalized to the total number of modes [114], [166]. The latter form has been shown to be more convenient in the high-SNR limit, where its average value and variance become independent of the SNR itself and the number of modes. In general, we note that the modeling of how MDL affects the system capacity is rather involved, and there are subtleties that require special attention. These include the choice of a reference system with respect to which the capacity loss is assessed, the operation mode of the amplifiers, and the way in which noise loading is described. It should be noted, however, that, because of the fast frequency decorrelation imposed by MD discussed earlier in this section, the fluctuations of capacity induced by the randomness of MDL are effectively suppressed, and the relevant figure of merit becomes either the average capacity ratio or the average capacity loss per mode. As a reference, the average capacity loss per mode ranges between 0.1 and 0.8 bit/s/Hz when the mean link MDL ranges between 5 and 20 dB, respectively [114].
As seen in Fig. 5, the SE drops rather quickly with the transmission distance. This reduction is consistent with the theoretical bounds discussed in Section V. The dashed curves represent the lower bound $2\log_2(1 + \mathrm{SNR}_{\mathrm{eff,opt}})$, where $\mathrm{SNR}_{\mathrm{eff}}$ is the effective SNR given by (35), which accounts for all NLIN contributions [82] (intrachannel and interchannel) in the absence of nonlinearity compensation, and it is evaluated at the optimal launch power under the assumption of Gaussian modulation, where $\mathrm{SNR}_{\mathrm{eff}} = \mathrm{SNR}_{\mathrm{eff,opt}}$.¹²,¹³ While the data points shown in the figure were obtained for a variety of system settings, the lower bound curves refer to a system with 50 tightly packed 32-Gbaud WDM channels transmitted over an SMF with a loss coefficient of 0.15 dB/km, $\beta = 21$ ps$^2$/km, and $\gamma = 1.3$ W$^{-1}$ km$^{-1}$. The solid curves represent the capacity upper bounds for the two considered span lengths, and as explained in Section VI, they are 1.17 bit/s/Hz higher than their corresponding lower bounds.
¹² As noted in the context of (23), the assumption of Gaussian modulation implies that $\mathrm{SNR}_{\mathrm{eff,opt}}$ coincides with the optimal effective SNR obtained with the GN model.
¹³ The lower bounds displayed in the figure were obtained under the assumption of SMF transmission. The corresponding lower bound for SDM systems should account for the dependence of the SNR on the number of modes. Depending on the degree of mode coupling, the SDM capacity lower bound may be higher or lower than the corresponding bound in the case of SMF. In the limit of strong coupling, the lower bound would be higher, consistently with (48).
The green curves were obtained for span lengths of 100 km and should be compared with experimental results achieved in the range of terrestrial links (roughly up to 2000 km). Transoceanic experiments (above 2000 km), where short spans were used, should be compared with the red curves, which were obtained for span lengths of 50 km. The inset in Fig. 5 shows the effect of ideal PPRN compensation on the capacity lower bounds. Consistently with [86], the effect of PPRN mitigation is more significant in short-span systems (which are closer to the case of distributed amplification [73]). It is largest in the single-span case but rapidly reduces with the number of spans reaching the order of ∼1-bit/s/Hz improvement for the 50-km-span system and ∼0.5 bit/s/Hz in the system using 100-km spans.
It is interesting to note that the largest SE values in all cases were obtained in SMF transmission systems, which is consistent with the higher maturity of SMF technology.
Because some of the SE data points shown in Fig. 5 were obtained with a small number of channels, it is important to provide a more complete picture showing record results for the aggregate system throughput. Fig. 6 summarizes experimentally demonstrated record throughputs per fiber. We group the demonstrations into four types: 1) SMF transmission with WDM channels fully loaded on the C-band; 2) SMF transmission with the fully loaded extended band (C + L or S + C + L); 3) SDM fiber (multicore fiber and/or few-mode fiber) transmission with fully loaded C-band; and 4) SDM fiber with the fully loaded extended band (C + L or S + C + L). It is evident that, when operating only in the C-band, the throughput in the case of SMF saturates at ∼41 Tb/s for ∼500-km transmission [172] and ∼35 Tb/s for ∼6000-km transmission [176]. Extending the wavelengths to the L-band yields at most a factor of two in the throughput, reaching ∼70 Tb/s for ∼7000-km transmission [181]. Using S-, C-, and L-bands, the achieved link throughput is seen to increase to 74 Tb/s for 6300-km transmission [190] and 115 Tb/s for 100-km transmission [173] (with the S-band emulated by ASE noise). Higher rates have been demonstrated for shorter distances [209].
A further increase in throughput is achieved by using multiple spatial modes. In [193], a 1-Pb/s throughput was demonstrated with a 32-core fiber. Many of the record-achieving experiments relied on the use of advanced modulation formats, with probabilistically shaped (PS) QAM [210] playing a prominent role. This technique produces Gaussian-like amplitude distributions, offering shaping gain and the flexibility for continuously tuning the constellation entropy. An example of using PS-QAM to achieve shaping gain and SE flexibility can be seen in [168], where PS-256-QAM is used and the SE is varied from 12.6 bit/s/Hz for 500-km transmission to 10.1 bit/s/Hz for 2000-km transmission. Other coded modulation schemes, such as 64-APSK [176], are also used to improve the SE. For capacity-approaching experiments, it is also common to use advanced DSP, such as nonlinear noise compensation (NLC) algorithms. Zhang et al. [176] and Cai et al. [181] are two examples where DBP is used for nonlinearity compensation. In particular, Zhang et al. [176] show that NLC improves the Q factor by ∼0.8 dB in a fully loaded C-band system with 6375-km transmission, and Cai et al. [181] report NLC gain ranging from 1.1 to 1.7 dB for the 295 measured C + L band channels after 7600-km transmission.
Similar to SMF links, advanced modulation formats, NLC, and extended bands are also used in SDM systems. For example, in [200], S-, C-, and L-bands are fully loaded with 552 wavelength channels in each fiber core. The combination of the four-core fiber and the ultrawideband utilization results in 319 Tb/s achieved in a 3001-km link. As mentioned in Section X-B, one of the advantages of SDM fibers is a higher tolerance to nonlinearity. This aspect of SDM transmission is studied systematically through experiments in [201] and [203]. In particular, Ryf et al. [201] compare transmission results of QPSK and 16-QAM signals with coupled four-core fiber, coupled seven-core fiber, and ultralarge effective area SMFs. The coupled four- and seven-core fibers show, as noted earlier, up to 1 dB higher Q factors and higher optimal launch power at all distances ranging from ∼1000 to ∼12 000 km. Naturally, SDM fibers exploit the spatial degree of freedom to further increase the link throughput. For example, the transmission of tens of Tb/s has been achieved either by fully loading the C-band of SMF with high-order QAM [171], or by transmitting channels with lower order QAM modulation in a few-mode fiber [203], or by means of even lower order QAM transmission over an SDM fiber with a large number of spatial modes [206].
We now switch to briefly describing high-speed links extending over distances shorter than 100 km, as would be the case with DCIs. In contrast to long-distance systems, where the link throughput is typically limited by the fiber's capacity, the throughput of short-reach systems is usually limited by cost constraints that are applied to the transceivers. Indeed, commercial short-reach transceivers typically rely on IM and DD (IM-DD) and support rates between 10 and 50 Gb/s per wavelength. The lasers in such modules are usually uncooled, implying that the wavelength drifts significantly with temperature variations, thereby preventing the use of dense WDM (DWDM). Hence, short-link DCIs (e.g., up to several kilometers) typically carry four or eight wavelengths per fiber and provide a link throughput within the range of a few hundreds of Gb/s. IM-DD signals with longer transmission distances suffer from dispersion-induced spectral fading (see Section IX). In practice, IM-DD transmission of more than 50 Gb/s over distances exceeding 10 km is very challenging in the absence of optical dispersion compensation.
Communications between data centers that are separated by more than 10 km require more advanced transceiver configurations. Advanced self-coherent DD schemes, such as the self-homodyne scheme, have been demonstrated in real time at 600 and 800 Gb/s per wavelength in [207] and [208], respectively. KK receiver experiments achieving 104 and 279 Gb/s per wavelength are also reported in [134] and [211], and a Stokes receiver system carrying 186 Gb/s per wavelength is demonstrated in [212]. Combining KK and Stokes receiver techniques, the per-wavelength data rate was shown to achieve 400 Gb/s and 1 Tb/s using a superchannel [213]. In combination with DWDM, the potential per-fiber link throughput for self-coherent systems can be tens of Tb/s with transmission distances approaching ∼100 km. Nonetheless, since self-coherent transmission relies on a new transceiver design that is usually not backward or forward compatible with IM-DD or coherent transmission, it has not yet been adopted in commercial links. Instead, DCIs extending over distances longer than 10 km use simplified coherent transceiver technology. In particular, such transceivers use simpler modulation formats than in long-haul systems (e.g., uniform 16-ary QAM instead of high-order PS-QAM) and simpler digital processing algorithms (e.g., simpler adaptive filtering using a much smaller number of taps). Currently, commercially available DCI coherent modules support ∼60 WDM channels operating at ∼60 GBaud with 400 Gb/s per channel, corresponding to a total fiber throughput of ∼24 Tb/s. More details about such links can be found in [214].

XII. CONCLUSION
We reviewed the problem of assessing the information capacity of fiber-communication systems. As the complexity of nonlinear fiber propagation prevents the extraction of information capacity by directly applying Shannon's formulation, most efforts focus on the search for useful bounds that allow its assessment. We reviewed these bounds in a variety of applications, ranging from interdatacenter links extending over tens of kilometers to submarine systems of thousands of kilometers in length. We also considered a variety of transmission schemes and discussed the potential benefits of spatial multiplexing using multimode or multicore fibers. Finally, state-of-the-art transmission experiments addressing the various operation regimes have been reviewed and presented. While the complexity of the fiber-optic channel prevents the extraction of clear numbers for its information capacity, current technology appears to be rapidly approaching capacity bounds, making the continuing growth of information throughputs increasingly challenging.
Acknowledgment
Mark Shtaif acknowledges important discussions with R. Zamir of Tel Aviv University regarding capacity lower bounds.