Statistical BER Analysis of Wireline Links With Non-Binary Linear Block Codes Subject to DFE Error Propagation

This paper presents a statistical model that accurately estimates the post-FEC BER of high-speed wireline links using standard linear block codes, such as the RS(544,514,15) KP4 and RS(528,514,7) KR4 codes. A hierarchical approach is adopted to analyze the propagation of PAM-symbol and FEC-symbol errors through a two-layer Markov model. A series of techniques, including state aggregation, time aggregation, state reduction, and dynamic programming, are introduced to keep the time complexity of computing post-FEC BERs below $10^{-15}$ reasonable. Error bounds associated with each method are derived. The efficiency of the proposed model allows it to handle a larger state space, more DFE taps, and more sophisticated linear block codes than prior work. A 4-PAM 60 Gb/s wireline transceiver fabricated in a 7 nm FinFET technology is used as a test vehicle to validate this model. Measured data with two different channels reveal that the statistical model properly predicts the post-FEC error floor with standard FEC codes. While this paper demonstrates the method for capturing DFE error propagation, the method is general and can be applied to model other communication systems having memory effects. Moreover, the proposed model can be easily extended to higher-level PAM schemes and other advanced equalizer architectures to assist in making architectural choices for wireline transceivers.

transceivers to achieve a target BER below 10 −15 without expensive overdesign [1]. A typical design practice, sometimes referred to as the FEC limit paradigm [2], is to design the serializer-deserializer (SerDes) for a targeted BER (e.g. 10 −6 ) without FEC, called the pre-FEC BER, assuming that an appropriate FEC code will correct most of the resulting errors providing a post-FEC BER of some desired level (e.g. 10 −12 or 10 −15 ). However, this approach is naïve. For example, the 100GBase-KP4 standard [3] specifies transmitting 4-PAM symbols at 100 Gb/s over four backplane interconnects with less than 33 dB insertion loss at 7 GHz, targeting at a post-FEC BER better than or equal to 10 −12 using a RS(544,514,15) FEC code. Depending on the equalization techniques used in the SerDes, the same pre-FEC BER may result in different post-FEC BER. In particular, error propagation in decision feedback equalization (DFE) can significantly impact BER. A DFE removes channel ISI by registering past equalized symbols in the feedback path and using them to estimate and cancel ISI from the current symbol. However, if any past decision registered in the DFE is wrong, the receiver's decision is biased and may increase the probability of additional symbol errors. Errors may thus propagate around the DFE feedback loop and result in FEC code failures. Unfortunately, simulations of the targeted post-FEC BERs are prohibitively long, especially for exploring architectural alternatives. Instead of using the FEC limit paradigm currently employed by many designs [4]- [6], which doesn't consider DFE error propagation, a model that accurately predicts very low post-FEC BERs is important for modern SerDes design.
Subject to various noise sources in wireline links [5], [7], [8], several models have been developed for BER estimation, each having its own limits. For example, the Gilbert model [9], [10] captures DFE burst errors, but its complexity grows exponentially with the number of DFE taps. Peak distortion analysis [11] focuses on the impact of residual (unequalized) inter-symbol interference (ISI) but may require too much simulation time to find all critical data patterns that contribute to BER.
A key challenge for statistical modeling is to accurately capture the impact of DFE error propagation on post-FEC BER. Ref [12] explains the approach in the IEEE 10GBASE standard for handling DFE error propagation. It considers bursts combining correct and erred bits, and enumerates all possible burst-error patterns to estimate BER and link performance. However, this time-consuming approach is ill-suited to the longer linear block codes adopted in recent wireline standards [13], [14]. Another possibility is to extrapolate to very low BERs based on a few simulations at higher BER [15]. The validity of such methods holds only if the BER-SNR correlation remains stable when BER is extrapolated to lower orders of magnitude, which we will show is impractical for many wireline links.
Past work on post-FEC BER estimation has focused on systems with BCH codes, which operate in GF(2), using 2-PAM signaling [16]. A Markov chain model from [17] was adopted in [16] to account for DFE error propagation, and possible burst-error patterns are systematically grouped through trellis-based dynamic programming. However, 4-PAM signaling is becoming increasingly critical for 50 Gb/s+ wireline links [18]-[20], often with DFEs [21], [22]. Hence more powerful Reed-Solomon (RS) codes are being used to correct up to $t$ FEC symbol errors caused by DFE error propagation. Nonetheless, very few attempts have been made to model and analyze the post-FEC BER for codes in higher-order Galois fields, GF($2^m$), $m > 1$, in the presence of DFE error propagation [23], [24]. In [23], DFE error propagation across PAM symbols is considered using a method similar to [12], and post-FEC BER is calculated by enumerating all symbol-error combinations that result in $t + 1$ or more FEC-symbol errors. The method was applied only to a 1-tap DFE, and its complexity can grow exponentially for a multi-tap DFE and large $t$ values. In [24], the probability of having an error-free RS symbol is assumed to be independent of other symbol errors in a codeword, which may not be a valid assumption; the approach is thus incapable of accurately modeling error bursts for a post-FEC BER below $10^{-15}$.
Our proposed BER estimation method for wireline links is an extension of [16], and provides a set of tools to assist in making architectural choices for wireline transceivers, such as co-design of the equalization and FEC in the presence of DFE error propagation and various noise sources. An extension from 2-PAM to Gray-Coded 4-PAM signaling is included in our model to calculate the bit-error probability for a multi-bit PAM symbol accurately. We generalize trellis-based dynamic programming to the FEC-symbol level, resulting in a hierarchical model containing many PAM sub-trellises allowing us to look at post-FEC BER for both non-interleaved and interleaved FEC codes in a reasonable amount of time. The model is simplified through state reductions to accelerate the statistical analysis. The efficiency of the proposed model allows it to handle a larger state space, more DFE taps, and more sophisticated linear block codes than prior work. Our proposed model can be easily extended to higher-level PAM schemes, and is also applicable to other advanced equalizer architectures that are likely to arise in ADC-based receivers for 100 Gb/s+ wireline links [25].
The modeling of DFE error propagation will be discussed in Section II. This will then be followed by the application of state aggregation and trellis-based dynamic programming to improve the computational efficiency of BER estimation in Section III. In Section IV we will propose a statistical model to estimate post-FEC BER for high-order PAM schemes and linear block FEC codes on GF(2 m ), m > 1. A time-aggregated trellis model will be used to consider the error propagation at both the PAM-symbol and FEC-symbol levels. Section V will describe a method for post-FEC BER estimation and steps to minimize its computational complexity. Subsequently, in Section VI, the statistical model is experimentally verified on a 4-PAM 60 Gb/s SerDes link. Finally, conclusions are drawn in Section VII.

II. MODELING DFE ERROR PROPAGATION
The statistical model proposed in [16] will be introduced in this section to estimate the pre-FEC BER in the presence of DFE error propagation. We first explain how DFE error propagation is modeled using Markov chain theory, and then apply trellis-based dynamic programming to efficiently collect probabilities of all error patterns that are needed for post-FEC BER calculation in Section IV.
First, consider the link model shown in Fig. 1, communicating symbols $b_k$ with time index $k$. The symbols are filtered by a finite-impulse-response (FIR) channel response $h_p$ with main cursor $h_0$, and are subject to additive noise $n_k$. Without limiting the scope of this work, it is assumed that all pre-cursor and higher-order post-cursor ISI has been removed by linear equalizers. The detected symbols $d_k$ may differ from the transmitted symbols, resulting in the error sequence
$$D_k = d_k - b_k. \tag{1}$$
This results in an additive error $n^{dfe}_k$ generated by non-zero error terms in the DFE feedback path. Assuming a perfect zero-forcing N-tap DFE,
$$n^{dfe}_k = -\sum_{p=1}^{N} h_p D_{k-p}. \tag{2}$$
Then the DFE slicer input $r_k$ becomes
$$r_k = h_0 b_k + n_k + n^{dfe}_k. \tag{3}$$
Error propagation is modeled as a Markov process whose state is specified by the error terms in the DFE feedback, $\langle D_{k-1}, D_{k-2}, \ldots, D_{k-N} \rangle$. Hence, the rates at which $d_k = b_k$ and $d_k \neq b_k$ can be determined from the appropriate standard error function. This may be straightforwardly extended to include other impairments such as jitter, crosstalk, or residual ISI by appropriately changing the probability density function (pdf) of the received samples $r_k$ [7], [8]. The one-step state-transition probabilities $q_{ii'}$ from a source state $i$ to a sink state $i'$ can be calculated by applying (3), since $n^{dfe}_k$ in (3) is exclusively dictated by the source state $i$. With all $q_{ii'}$ calculated, we may find the steady-state probability $\pi_i$ of any state $i$ in the Markov model by solving the global balance equation [26],
$$\pi_{i'} = \sum_{i} \pi_i q_{ii'}, \tag{4}$$
subject to
$$\sum_{i} \pi_i = 1. \tag{5}$$
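As a minimal sketch of this Markov model, the Python below builds the one-step transition matrix and steady-state vector for a hypothetical 2-PAM link with a single DFE tap. It assumes AWGN, equiprobable symbols $b_k \in \{\pm 1\}$, a slicer threshold at zero, and a feedback error of $-h_1 D_{k-1}$ after a wrong decision. The tap values, noise level, and function names are illustrative and not taken from the paper.

```python
import math

STATES = [0, +2, -2]  # previous error value D_{k-1} = d_{k-1} - b_{k-1}

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def transition_matrix(h0, h1, sigma):
    """One-step transition matrix over STATES for a 2-PAM, 1-tap DFE."""
    rows = []
    for d_prev in STATES:
        n_dfe = -h1 * d_prev  # residual ISI left by a wrong fed-back decision
        # b_k = -1 is detected as +1 when -h0 + n_dfe + n_k > 0  (D_k = +2)
        p_plus = 0.5 * q_func((h0 - n_dfe) / sigma)
        # b_k = +1 is detected as -1 when  h0 + n_dfe + n_k < 0  (D_k = -2)
        p_minus = 0.5 * q_func((h0 + n_dfe) / sigma)
        rows.append([1.0 - p_plus - p_minus, p_plus, p_minus])
    return rows

def steady_state(q, iterations=1000):
    """Solve the global balance equation by power iteration (pi = pi * Q)."""
    size = len(q)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        pi = [sum(pi[i] * q[i][j] for i in range(size)) for j in range(size)]
    return pi

# Assumed example values: h0 = 1.0, h1 = 0.5, sigma = 0.3
q = transition_matrix(1.0, 0.5, 0.3)
pi = steady_state(q)
pre_fec_ber = pi[1] + pi[2]  # for 2-PAM every symbol error is one bit error
```

Power iteration is used here only to keep the sketch dependency-free; a direct linear solve of the balance equations would serve equally well for larger state spaces.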

III. REDUCING COMPUTATIONAL COMPLEXITY IN MARKOV MODEL

A. Aggregation of Weakly Lumpable Markov Process
Applying state lumping (sometimes referred to as state aggregation) to a Markov process generates an aggregated chain with a comparatively smaller state space, reducing analytical complexity. The aggregated chain provides a coarser analysis of the state space and can be used to perform DFE error-rate analysis for the original Markov chain without losing analytical accuracy [27]. Consider a homogeneous and irreducible Markov process $X$ with finite state space $S = \{1, 2, \ldots, s\}$ whose chain is defined by its one-step transition matrix $Q = [q_{ii'}]$ and initial probability vector $\gamma$. We say $X$ is lumpable with respect to a partition $C = \{C_1, C_2, \ldots, C_r\}$, where $\bigcup_i C_i = S$, $C_i \neq \emptyset$, and $C_u \cap C_v = \emptyset$ for any $u \neq v$, if the aggregated chain $Y$ with state space $\tilde{S} = \{1, 2, \ldots, r\}$ is also a homogeneous Markov process [28]. If the above definition holds true for all $\gamma$, we say $X$ is strongly lumpable with respect to the partition $C$; if it applies to at least one but not necessarily all choices of $\gamma$, then we say $X$ is weakly lumpable with respect to $C$.
In the scope of this work, we consider the link illustrated in Fig. 1 subject to AWGN, having equally spaced DFE slicer thresholds, and an equally probable symbol set b k that is independent of noise sample n k . Under this particular setting, it is proven in [27] that, by exploiting the symmetry in the error states < D k−1 , D k−2 . . . , D k−N >, an N-tap DFE Markov process is weakly lumpable with respect to the partition lumping all states having the same error magnitude |D k | at each DFE tap. In addition, it is always assumed in our work that a Markov chain is initialized by its steady-state probability vector π, which is proven in [29] to be always a choice of γ leading to a homogeneous Markov chain if the chain is weakly lumpable.
In order to obtain the aggregated Markov model from the original one, we define $\gamma^{C_i}$ as the initial vector restricted to a set $C_i$ in partition $C$. For all elements in $\gamma^{C_i}$, we assign zeros to those that correspond to states not in $C_i$ and normalize $\gamma^{C_i}$ so that its elements sum to one. Therefore, the $k$-th element $\gamma^{C_i}_k$ is
$$\gamma^{C_i}_k = \begin{cases} \gamma_k \big/ \sum_{l \in C_i} \gamma_l, & k \in C_i \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
Let $U_\pi$ be the $r \times s$ distributor matrix whose $i$-th row is $\pi^{C_i}$, which is the steady-state probability vector restricted to set $C_i$; let $V$ be the $s \times r$ collector matrix generated by transposing the distributor matrix $U_\pi$ and replacing all non-zero elements by 1.
Denote $p_{i_m i'_m}$ as the one-step state-transition probability from a lumped state $i_m$ to a lumped state $i'_m$. The transition matrix $P = [p_{i_m i'_m}]$ of the lumped process $Y$ is then given by
$$P = U_\pi Q V. \tag{7}$$
Moreover, a more straightforward two-step procedure for computing the aggregated state probabilities from the original chain is provided in [30]. First, the aggregated steady-state probabilities $\Pi_{i_m}$ can be calculated from the results obtained by (4) and (5),
$$\Pi_{i_m} = \sum_{i \in C_{i_m}} \pi_i. \tag{8}$$
Next, the aggregated state-transition probabilities $p_{i_m i'_m}$ can be computed by
$$p_{i_m i'_m} = \frac{1}{\Pi_{i_m}} \sum_{i \in C_{i_m}} \sum_{i' \in C_{i'_m}} \pi_i q_{ii'}. \tag{9}$$
We may also numerically verify the weak lumpability using a sufficient condition proposed in [28]. That is, a Markov process $X$ is weakly lumpable with respect to a partition $C$ if the condition given in [28] is satisfied.
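The two-step aggregation procedure can be sketched in a few lines of Python. The 3-state chain below is a made-up example (states ordered as error values 0, +2, -2, with the two error states lumped by magnitude), and the function names are hypothetical. The sketch assumes the aggregated steady-state probability of a block is the sum of its members' probabilities, and that each aggregated transition probability is the steady-state-weighted average of the original transitions between the two blocks.

```python
def steady_state(q, iterations=500):
    """Steady-state vector of transition matrix q by power iteration."""
    size = len(q)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        pi = [sum(pi[i] * q[i][j] for i in range(size)) for j in range(size)]
    return pi

def lump(pi, q, partition):
    """Aggregate chain (pi, q) over `partition`, a list of lists of state indices."""
    # Step 1: aggregated steady-state probabilities
    big_pi = [sum(pi[i] for i in block) for block in partition]
    # Step 2: aggregated transition probabilities, weighted by pi within each block
    p = [[sum(pi[i] * q[i][j] for i in src for j in dst) / big_pi[a]
          for dst in partition]
         for a, src in enumerate(partition)]
    return big_pi, p

# Hypothetical symmetric chain over error values [0, +2, -2]
Q = [[0.98, 0.01, 0.01],
     [0.70, 0.05, 0.25],
     [0.70, 0.25, 0.05]]
PARTITION = [[0], [1, 2]]  # lump +2 and -2 by error magnitude

pi = steady_state(Q)
big_pi, P = lump(pi, Q, PARTITION)
```

Because the example chain is symmetric in the sign of the error, the lumped chain keeps the same error-rate statistics with one fewer state.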

B. Trellis-Based Dynamic Programming
We next apply trellis-based dynamic programming [31] to the Markov model to efficiently calculate the probability of bit errors in a codeword. The lumped Markov model for an N-tap DFE with M-PAM signaling may be represented by an M N -state radix-M trellis. Rather than finding the BER by enumerating all possible error patterns in the trellis, dynamic programming solves the problem much faster by grouping the probability of all trellis paths having the same number of bit errors. The same aggregation procedure is repeated recursively when traversing through each stage in the trellis, resulting in a significant reduction in computational complexity. For a length-B, t-error-correcting block code, without dynamic programming, one must calculate the probability of all paths through the length-B trellis corresponding to t + 1 or more bit errors, adding them together to find the probability of a codeword error. For example, a trellis representation of the binary 2-tap DFE Markov model with B = 3 is shown in Fig. 3, highlighting all paths that result in exactly 1 detection error in the highlighted 2 nd bit position. Combining the computed steady-state error probabilities and branch probabilities, one may compute the probability of each of these paths, along with those of paths having errors in the 1 st bit and 3 rd bit to find the total probability of 1-bit error within a 3-bit sequence. Unfortunately, the challenge of enumerating and computing these probabilities grows exponentially with block length B making the computations intractable for practical FEC codes typically having B > 1,000.
Instead, dynamic programming calculates the probability of long error patterns recursively in terms of state and error probabilities at the preceding stage. We denote by $\mathrm{Pr}^{j}_{k}(i)$ the probability of arriving at Markov state $i$ at time step $k$ after traversing all trellis paths containing exactly $j$ bit errors. For example, $\mathrm{Pr}^{1}_{2}(3)$ represents the probability of arriving at state #3 at the 2nd stage of the trellis having traversed all paths corresponding to exactly 1 error. Hence, the bit-error probabilities at time $k + 1$, $\mathrm{Pr}^{j}_{k+1}(i')$, can be found iteratively from the values of $\mathrm{Pr}^{j}_{k}(i)$, $\mathrm{Pr}^{j-1}_{k}(i)$ and the branch probabilities $p_{ii'}$. For states $i'$ where the most recently received bit is correct,
$$\mathrm{Pr}^{j}_{k+1}(i') = \sum_{i} \mathrm{Pr}^{j}_{k}(i)\, p_{ii'}, \tag{11}$$
whereas for states $i'$ where the most recently received bit is incorrect,
$$\mathrm{Pr}^{j}_{k+1}(i') = \sum_{i} \mathrm{Pr}^{j-1}_{k}(i)\, p_{ii'}. \tag{12}$$
For example, according to (12), $\mathrm{Pr}^{1}_{2}(3) = \mathrm{Pr}^{0}_{1}(1)\, p_{13} + \mathrm{Pr}^{0}_{1}(2)\, p_{23}$, corresponding to the red-highlighted paths in Fig. 3 where #1 and #2 are the only possible source nodes. Moreover, $\mathrm{Pr}^{0}_{1}(1)$ and $\mathrm{Pr}^{0}_{1}(2)$ can be found recursively from (11) when calculating all node probabilities for the 1st trellis stage. By repeating this procedure for all $k$, $j$ and $i$, we obtain the probability of all error counts through the trellis with a computing time that increases only linearly with $B$. The recursion is initialized assuming the link has reached a steady state, so $\mathrm{Pr}^{0}_{0}(i) = \Pi_i$ and $\mathrm{Pr}^{j}_{0}(i) = 0$ for $j > 0$.
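The recursion can be illustrated with a small Python sketch. It assumes a lumped 2-state chain for a binary 1-tap DFE (state 0: last bit correct, state 1: last bit in error) with made-up branch probabilities, starts from the steady-state vector, and returns the probability of each bit-error count in a length-$B$ block. Names and numbers are illustrative only.

```python
def error_count_distribution(p, pi, is_error, block_len):
    """Probability of exactly j bit errors in `block_len` symbols, j = 0..block_len."""
    n_states = len(pi)
    # pr[j][i]: probability of sitting in state i having seen exactly j errors
    pr = [[0.0] * n_states for _ in range(block_len + 1)]
    pr[0] = list(pi)  # steady-state start, zero errors so far
    for _ in range(block_len):
        nxt = [[0.0] * n_states for _ in range(block_len + 1)]
        for j in range(block_len):
            for i in range(n_states):
                for i2 in range(n_states):
                    # entering an error state adds one bit error to the path
                    j2 = j + 1 if is_error[i2] else j
                    nxt[j2][i2] += pr[j][i] * p[i][i2]
        pr = nxt
    return [sum(row) for row in pr]

# Hypothetical lumped chain; PI is its exact steady-state vector
P = [[0.99, 0.01],
     [0.60, 0.40]]
PI = [60.0 / 61.0, 1.0 / 61.0]
IS_ERROR = [False, True]

dist = error_count_distribution(P, PI, IS_ERROR, 3)
mean_errors = sum(j * prob for j, prob in enumerate(dist))
```

Each stage reuses the grouped probabilities of the previous stage, so no individual trellis path is ever enumerated.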

IV. 4-PAM STATISTICAL MODEL FOR NON-BINARY LINEAR BLOCK CODES
In the previous section, we reviewed a 2-PAM statistical model and the application of trellis-based dynamic programming to model DFE error propagation. In current long-reach wireline SerDes applications, such as 100GBase-KP4, Gray-coded 4-PAM signaling and RS FEC are standard. For linear FEC codes on GF($2^m$), the encoder groups every $m$ bits into one FEC symbol, and correspondingly the decoder can detect and correct up to $t$ erroneous FEC symbols in an $n$-symbol codeword. Up to $m$ bit errors in each erred FEC symbol can be corrected so long as the total number of FEC symbol errors does not exceed $t$. Hence the higher-order RS codes provide stronger burst-error correction ability than BCH codes, a measure taken in part to accommodate DFE error propagation. In this section, we extend this statistical model to higher-order M-PAM schemes and linear block FEC codes on GF($2^m$), for $m$ an integer multiple of $\log_2 M$, including the standardized wireline RS codes [13], [14]. The analysis is performed in two layers:
• First, a PAM trellis is defined to model the propagation of 4-PAM (physical-layer) symbol errors within a DFE over the course of one individual GF($2^m$) FEC symbol.
• Second, a higher layer of analysis groups 4-PAM symbols into GF($2^m$) FEC symbols through a time-aggregation approach, and the probability of error propagation across FEC symbol boundaries is considered using a higher-level FEC trellis.
Dynamic programming is applied to analyze both trellises, ultimately resulting in the post-FEC BER.

A. 4-PAM Markov Model
With M-PAM signaling, there are in total $M^2$ symbol-detection outcomes considering all possible pairs of transmitted/detected PAM symbols. Hence an M-PAM N-tap DFE can be represented by an $M^{2N}$-state Markov model without applying state aggregation. Fig. 4 shows a receiver eye diagram indicating all possible detection outcomes for a link communicating Gray-coded 4-PAM symbols $b_k \in \{\pm 3, \pm 1\}$. The resulting error values $D_k$, together with their associated bit-error patterns, are also labeled in the same figure. The subscript of each error value denotes its relative position in the 4-PAM eye from top to bottom. Note that states having the same error value may correspond to different bit-error patterns. For example, subject to an error event $D_k = +2_M$, the 1st bit of the received PAM symbol is in error, which corresponds to the pdf plot superimposed in Fig. 4 with $b_k = -1$, $d_k = +1$ and $n^{dfe}_k = 0$. However, the combination of $b_k = +1$ and $d_k = +3$ results in $D_k = +2_T$, which instead makes the 2nd bit erroneous while having the same error value.
Next, in the $M^{2N}$-state Markov model, all states having the same error magnitude are aggregated together by applying weak lumpability, resulting in a much smaller $M^N$-state state space. Specifically, we can define a new set of $D_k \in \{0, \pm 2, \pm 4, \pm 6\}$ for the 4-PAM example given in Fig. 4. Steady-state and state-transition probabilities of the new aggregated chain can be calculated using (7)-(9), similar to what has been done in the 2-PAM case.

B. 4-PAM Trellis Model
When traversing an M-PAM trellis using dynamic programming, each branch decision corresponds to between 0 and $\log_2 M$ bit errors. We define $j_{PAM}$ as the number of bit errors in a PAM symbol detection. For example, in a link communicating 4-PAM symbols $b_k \in \{\pm 3, \pm 1\}$, $j_{PAM} \in \{0, 1, 2\}$ and the receiver error sequence defined in (1) is $D_k \in \{\pm 6, \pm 4, \pm 2, 0\}$. Assuming Gray coding, an error value of $\pm 2$ or $\pm 6$ corresponds to $j_{PAM} = 1$, whereas an error value of $\pm 4$ indicates $j_{PAM} = 2$. In each trellis iteration, for states $i'$ where the most recently received 4-PAM symbol has $j_{PAM}$ bit errors,
$$\mathrm{Pr}^{j}_{k+1}(i') = \sum_{i} \mathrm{Pr}^{j - j_{PAM}}_{k}(i)\, p_{ii'}. \tag{13}$$
The trellis model can be further simplified to a $2^N$-state radix-2 trellis, as demonstrated in Fig. 5(b), by ignoring all the dotted paths in Fig. 5(a) that have unlikely $\pm 4$ and $\pm 6$ error events. In the following subsection, we provide a quantitative justification and discuss the general conditions for ignoring these error events in the post-FEC BER analysis.
With the M-PAM trellis properly defined, a trellis of length $B = m/\log_2 M$ may be analyzed using the methods in this section to find the probability of at least one bit error corrupting the GF($2^m$) FEC symbol.
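The mapping from error value to $j_{PAM}$ can be checked with a short Python sketch. The Gray labeling below is an assumption chosen to agree with the bit-error patterns described for Fig. 4 (a $-1 \to +1$ error flips the 1st bit, a $+1 \to +3$ error flips the 2nd bit); any 2-bit Gray labeling yields the same error counts.

```python
# Assumed Gray labeling of the 4-PAM levels (not specified numerically in the text)
GRAY_BITS = {-3: "00", -1: "01", +1: "11", +3: "10"}

def bit_errors(b, d):
    """Number of bit errors when symbol b is transmitted and d is detected."""
    return sum(x != y for x, y in zip(GRAY_BITS[b], GRAY_BITS[d]))

# Collect the set of possible bit-error counts for each error magnitude |D_k|
j_pam = {}
for b in GRAY_BITS:
    for d in GRAY_BITS:
        j_pam.setdefault(abs(d - b), set()).add(bit_errors(b, d))

print(j_pam)
```

The result shows that each error magnitude maps to a single bit-error count, which is what allows states lumped by magnitude to carry a well-defined $j_{PAM}$.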

C. 4-PAM Trellis Model Simplification
We apply the 4-PAM Markov model to a link communicating $b_k \in \{\pm 3, \pm 1\}$ as depicted in Fig. 1 with $N = 4$ and channel response
$$h = \frac{1}{A}\left(1 + \alpha z^{-1} + \alpha^{2} z^{-2} + \alpha^{3} z^{-3} + \alpha^{4} z^{-4}\right).$$
The channel response is normalized by $A = 1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4$ to maintain a peak-amplitude constraint on the transmitter, typically imposed by supply voltage limitations. Hence, larger $\alpha$ indicates higher $A$, lower channel bandwidth, and a weaker main cursor in the channel response. A zero-forcing 4-tap DFE is assumed at the receiver and may thus introduce error propagation. As $\alpha$ is increased, a lower noise variance $\sigma^2$ is required to maintain the same pre-FEC BER, and thus a larger proportion of errors are caused by DFE error propagation. Fig. 6 plots the probability of each error value versus pre-FEC BER with $\alpha = 0.4$ and $0.7$, respectively. Noise variance $\sigma^2$ is swept to generate each curve. Clearly, for each slicer decision the probability of $\pm 2$ error events (associated with the nearest-neighboring PAM signal levels) will be greater than that of $\pm 4$ and $\pm 6$ error events. Despite the very large DFE tap weights in these channels, the probabilities of $\pm 4$ and $\pm 6$ error events are several orders of magnitude lower than that of $\pm 2$ error events. This fact can also be qualitatively verified by the example given in Fig. 4, where the noise pdf is obtained by setting $b_k = -1$, $d_k = +1$ and $n^{dfe}_k = 0$. A smaller noise variance $\sigma^2$ leads to a tighter pdf and a lower BER, and the area of each shaded pdf region is proportional to the probability of each error event. As BER decreases, the $\pm 4$ and $\pm 6$ event probabilities, corresponding to the area under the Gaussian-like exponential tail, decline much faster than the probability of $\pm 2$ error events. This ultimately results in a much higher slope for $\pm 4$ and $\pm 6$ events in the plot. As pre-FEC BER is the weighted average of these error event probabilities, neglecting $\pm 4$ and $\pm 6$ error events will not impact the accuracy of pre-FEC BER analysis at levels below $10^{-2}$.
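To make the effect of $\alpha$ concrete, the following Python sketch evaluates the normalized taps under the assumption that the $p$-th tap equals $\alpha^p / A$, which is how the normalization by $A$ is read here. The function name is hypothetical.

```python
def channel_taps(alpha, n_post=4):
    """Normalized taps [h_0, ..., h_n_post], assuming h_p = alpha**p / A."""
    raw = [alpha ** p for p in range(n_post + 1)]
    a = sum(raw)  # A = 1 + alpha + ... + alpha**n_post
    return [x / a for x in raw]

for alpha in (0.4, 0.7):
    taps = channel_taps(alpha)
    main_cursor = taps[0]
    post_cursor_sum = sum(taps[1:])  # ISI that the 4-tap DFE must cancel
    print(alpha, round(main_cursor, 4), round(post_cursor_sum, 4))
```

Under this reading, the post-cursor ISI handled by the DFE exceeds the main cursor when $\alpha = 0.7$, which is consistent with the stronger error propagation reported for that channel.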
When traversing the PAM trellis in a codeword, all error patterns contributing to the post-FEC BER are recursively computed by aggregating the probability of all trellis paths having more than $t$ FEC symbol errors. It is also possible to neglect $\pm 4$ and $\pm 6$ error events in post-FEC error-rate analysis if the probability of burst errors across multiple FEC symbols is not impacted. This can be quantitatively justified by analyzing the error propagation probability $P_{burst}$ between two neighboring FEC symbols using a length-$2m/\log_2 M$ PAM trellis, where $\mathrm{Pr}[x]^{j}_{k}(i)$ denotes the probability computed for the $x$-th FEC symbol, $x \in [1, 2]$, in each trellis iteration. We can obtain $P_{burst}$ by calculating the error probability of a FEC symbol given an erroneous preceding FEC symbol. First, we traverse the trellis for $x = 1$. Then, the probability space in the leading FEC symbol is normalized by excluding all error-free trellis paths using the scaling factor
$$\frac{1}{1 - \sum_{l} \mathrm{Pr}[1]^{0}_{m/\log_2 M}(l)}.$$
Next, the normalized probability of visiting state $i$ at the last PAM stage in the erred leading FEC symbol becomes the initial condition $\mathrm{Pr}[x+1]^{0}_{0}(i)$ of the following FEC symbol,
$$\mathrm{Pr}[x+1]^{0}_{0}(i) = \frac{\sum_{j \geq 1} \mathrm{Pr}[x]^{j}_{m/\log_2 M}(i)}{1 - \sum_{l} \mathrm{Pr}[x]^{0}_{m/\log_2 M}(l)}.$$
Similarly, we can use the above method to generate $P'_{burst}$ by ignoring the $\pm 4$ and $\pm 6$ error events. The relative error introduced by the simplified trellis model is
$$\delta_{burst} = \frac{\left| P_{burst} - P'_{burst} \right|}{P_{burst}}.$$
With $m = 10$, $M = 4$ and $N = 4$, $\delta_{burst}$ versus pre-FEC BER for $\alpha = 0.4$ and $0.7$ is also reported in Fig. 6.
For the case where $\alpha = 0.7$, a larger proportion of errors are caused by DFE error propagation, which increases the probability of $\pm 4$ and $\pm 6$ error events. The relative error $\delta_{burst}$ monotonically decreases with smaller $\sigma^2$, and $\delta_{burst} \leq 0.01\%$ for pre-FEC BER $\leq 10^{-2}$. When estimating the probability of a codeword containing 100 FEC symbol errors, the relative estimation error is bounded by the worst-case scenario having 100 consecutive errors, $1 - (1 - 0.0001)^{100} \approx 1\%$. As modern SerDes links generally operate with a pre-FEC BER $\leq 10^{-2}$, $m = 10$, $N \leq 4$, $t = 15 \ll 100$, and $\alpha \leq 0.4$ [19]-[21], the simplified trellis model can be practically applied to the post-FEC error-rate analysis with an estimation error much less than 1%. In addition, we can always apply the original $4^N$-state PAM trellis to verify the results generated by the simplified model. Therefore, to further reduce the complexity of the model, we consider a $2^N$-state radix-2 trellis for all 4-PAM analysis with $D_k \in \{\pm 2, 0\}$ in the remainder of this work.
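The worst-case figure quoted above follows from compounding the per-boundary relative error over consecutive FEC symbol errors. A minimal Python check, with a hypothetical helper name, also shows how the bound scales for shorter bursts:

```python
def worst_case_relative_error(delta_burst, n_consecutive):
    """Compound a per-boundary relative error over a run of consecutive symbol errors."""
    return 1.0 - (1.0 - delta_burst) ** n_consecutive

delta_burst = 1e-4  # 0.01% per FEC-symbol boundary, as stated in the text
bound_100 = worst_case_relative_error(delta_burst, 100)
bound_16 = worst_case_relative_error(delta_burst, 16)  # burst just past t = 15
print(bound_100, bound_16)
```

For the $t + 1 = 16$ symbol errors that dominate the post-FEC BER of a $t = 15$ code, the same bound is below 0.2%.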

D. Time-Aggregated FEC Trellis Model
Using the methods described so far, every FEC symbol in GF($2^m$) can be decomposed into a length-$m/2$ 4-PAM trellis describing link behavior in the physical layer. Recall from the example in Fig. 5 that we apply (13) to recursively compute $\mathrm{Pr}^{j}_{k}(i)$. Note that all paths in the trellis representing $\mathrm{Pr}^{j}_{k}(i)$, the probability of arriving at state $i$ at the $k$-th stage of the trellis after traversing all trellis paths containing exactly $j$ bit errors, can be decomposed into $2^N$ groups of trellis paths, each starting with one of the $2^N$ Markov states at $k = 0$. For example, in Fig. 5(b) all trellis paths representing $\mathrm{Pr}^{1}_{2}(2)$ must begin with one of the two DFE states at $k = 0$. As such, we may simplify the entire length-$m/2$ $2^N$-state radix-2 trellis to a length-1 $2^N$-state radix-$(2^N \cdot m/2)$ trellis by aggregating all $j$-bit-error paths within each of the $2^N$ groups into a one-step direct transition between the two states at $k = 0$ and $k = m/2$. Each one-step transition in the simplified trellis is equivalent to traversing $m/2$ 4-PAM symbols in the fully expanded trellis. Fig. 7 shows an example of a time-aggregated 4-PAM trellis with $N = 1$, where we denote by $a^{j}_{ii'}$ the one-step state-transition probability from source state $i$ to sink state $i'$ with exactly $j$ bit errors. Depending on the choice of sink state $i'$ and the number of aggregated PAM-symbol stages, there are in total $m/2$ possible transitions between any two states in the simplified trellis. For example, for the transition $a^{j}_{22}$ in Fig. 7, $j \in \{1, \ldots, m/2\}$, as every aggregated path ending at $i' = 2$ has at least 1 bit error.
As such, we may construct a new trellis model for the entire FEC block, assuming that each state transition from the $k_F$-th to the $(k_F + 1)$-th stage has traversed a group of length-$m/2$ PAM-trellis paths. This is referred to as time aggregation of a Markov decision process [32]; we group trellis paths over $m/2$ consecutive 4-PAM symbols while the time-aggregated Markov model preserves both the time-homogeneity and bit-error information. We call this time-aggregated PAM trellis the FEC trellis model, distinguishing it from the PAM symbol-level trellis considered thus far. With this approach, a total number of iterations are required to analyze the probability of all error patterns in a FEC symbol. Compared with the computational , [14]. In addition, due to the trade-off between power, area, and speed in a multi-tap DFE design, $N \leq 2$ in most high-speed wireline applications [20]-[22]. Therefore, time aggregating the underlying PAM trellis of each FEC symbol results in a significant reduction in computational complexity.
In order to analyze the FEC trellis, we must first find all the state-transition probabilities of these $2^N$ states by analyzing each underlying 4-PAM trellis. Fig. 8 shows an example illustrating the time aggregation of a 4-PAM trellis for $N = 1$ and $m = 6$. The FEC trellis is expanded in Fig. 8 to show the underlying 4-PAM trellis and illustrate how we may find the state-transition probabilities $a^{j}_{ii'}$ in the FEC trellis. First, we instantiate the expanded PAM trellis by assuming that the PAM trellis starts at the state $i$ in $a^{j}_{ii'}$ with a probability of 1,
$$\mathrm{Pr}^{0}_{0}(i) = 1.$$
Next, after traversing the expanded 4-PAM trellis using the dynamic programming procedure described in (13), the transition probability $a^{j}_{ii'}$ to the next $(k_F + 1)$-th FEC trellis stage can be calculated by summing the probability of all $j$-bit-error PAM-trellis paths ending at state $i'$,
$$a^{j}_{ii'} = \mathrm{Pr}^{j}_{m/2}(i').$$
For example, in Fig. 8
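The time-aggregation step can be sketched in Python as follows, assuming the simplified 2-state radix-2 PAM trellis for $N = 1$ (state 0: last PAM symbol correct, state 1: last PAM symbol has a $\pm 2$ error, that is, one bit error). The branch probabilities are invented for illustration and the function name is hypothetical.

```python
def time_aggregate(p, is_error, stages):
    """Return a[i][i2][j]: probability of moving from state i to state i2 over
    `stages` PAM symbols while accumulating exactly j bit errors."""
    n_states = len(p)
    a = []
    for start in range(n_states):
        # Start the expanded PAM trellis in `start` with probability 1
        pr = [[0.0] * n_states for _ in range(stages + 1)]
        pr[0][start] = 1.0
        for _ in range(stages):
            nxt = [[0.0] * n_states for _ in range(stages + 1)]
            for j in range(stages):
                for i in range(n_states):
                    for i2 in range(n_states):
                        j2 = j + 1 if is_error[i2] else j
                        nxt[j2][i2] += pr[j][i] * p[i][i2]
            pr = nxt
        # Collect the j-bit-error path probabilities ending at each sink state
        a.append([[pr[j][i2] for j in range(stages + 1)] for i2 in range(n_states)])
    return a

P = [[0.99, 0.01],
     [0.60, 0.40]]
IS_ERROR = [False, True]
m = 6                                  # GF(2^6) FEC symbol, as in the Fig. 8 example
a = time_aggregate(P, IS_ERROR, m // 2)  # m/2 = 3 PAM symbols per FEC symbol
```

Every transition into the error state carries at least one bit error, so the zero-error entry for any sink in state 1 is zero, matching the range $j \in \{1, \ldots, m/2\}$ noted for transitions ending in an error state.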

E. Dynamic Programming for FEC Codes in GF(2 m )
To compute the post-FEC BER, we must apply dynamic programming to enumerate the probability of all error patterns having more than $t$ FEC symbol errors in a codeword. However, the dynamic programming algorithm described by (11)-(13) can only track the total number of bit errors. Therefore, we create another error index allowing us to aggregate all error patterns in terms of both FEC symbol errors and bit errors. In the FEC trellis, we denote Pr_FEC

A. Post-FEC BER Estimation
We first define $\mathrm{Pr\_FEC}^{j_s, j_b}_{n}$ as the grouped probability of all error patterns having $j_s$ symbol errors and $j_b$ bit errors along a FEC trellis path of length $n$, computed by
$$\mathrm{Pr\_FEC}^{j_s, j_b}_{n} = \sum_{i} \mathrm{Pr\_FEC}^{j_s, j_b}_{n}(i).$$
Next, denote by $W(j_s)$ the probability of having exactly $j_s$ FEC symbol errors in an $n$-symbol codeword,
$$W(j_s) = \sum_{j_b} \mathrm{Pr\_FEC}^{j_s, j_b}_{n}.$$
To calculate BER, we define $E_{avg}(j_s)$ as the average number of bit errors in each erroneous FEC symbol given that exactly $j_s$ symbol errors occurred in an $n$-symbol codeword,
$$E_{avg}(j_s) = \frac{\sum_{j_b} j_b\, \mathrm{Pr\_FEC}^{j_s, j_b}_{n}}{j_s\, W(j_s)}.$$
Then, the pre-FEC BER can be calculated as
$$\text{pre-FEC BER} = \frac{1}{mn} \sum_{j_s = 1}^{n} j_s\, E_{avg}(j_s)\, W(j_s). \tag{27}$$
Finally, to estimate the post-FEC BER for a $t$-error-correcting RS code in GF($2^m$) of block length $n$,
$$\text{post-FEC BER} = \frac{1}{mn} \sum_{j_s = t+1}^{n} j_s\, E_{avg}(j_s)\, W(j_s). \tag{28}$$
When evaluating BER, the time complexity of the dynamic programming procedures may become excessive because all combinations of $j_b$ and $j_s$ must be iterated at each trellis stage. For an $n$-symbol codeword in GF($2^m$), the $2^N$-state FEC trellis model would require a total of iterations. In Section V-B, we propose a pruning method to reduce the complexity of this dynamic-programming algorithm.
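A compact Python sketch of the two-index dynamic programming and the resulting BER estimates is given below. It assumes the simplified 2-state PAM trellis for 4-PAM with $N = 1$, a steady-state start, and that a codeword with more than $t$ symbol errors keeps all of its channel bit errors. The chain, the toy code parameters ($m = 4$, $n = 8$, $t = 1$), and all function names are illustrative assumptions, not values from the paper.

```python
from collections import defaultdict

def symbol_transitions(p, stages):
    """a[i] maps (j, i2) -> probability of going from state i to i2 across one
    FEC symbol (`stages` PAM symbols) with exactly j bit errors.
    State 0: last PAM symbol correct; any other state adds one bit error."""
    a = []
    for start in range(len(p)):
        pr = {(0, start): 1.0}
        for _ in range(stages):
            nxt = defaultdict(float)
            for (j, i), prob in pr.items():
                for i2, q in enumerate(p[i]):
                    nxt[(j + (i2 != 0), i2)] += prob * q
            pr = nxt
        a.append(dict(pr))
    return a

def fec_ber(p, pi, m, n, t):
    """Return (W, pre-FEC BER, post-FEC BER) for an n-symbol GF(2^m) codeword."""
    a = symbol_transitions(p, m // 2)  # 4-PAM: m/2 PAM symbols per FEC symbol
    pr = defaultdict(float)            # key: (symbol errors, bit errors, state)
    for i, prob in enumerate(pi):
        pr[(0, 0, i)] = prob
    for _ in range(n):
        nxt = defaultdict(float)
        for (js, jb, i), prob in pr.items():
            for (j, i2), q in a[i].items():
                # a FEC symbol is in error if it contains at least one bit error
                nxt[(js + (j > 0), jb + j, i2)] += prob * q
        pr = nxt
    w = defaultdict(float)         # W(js)
    bit_mass = defaultdict(float)  # js * E_avg(js) * W(js)
    for (js, jb, _), prob in pr.items():
        w[js] += prob
        bit_mass[js] += jb * prob
    pre = sum(bit_mass.values()) / (m * n)
    post = sum(v for js, v in bit_mass.items() if js > t) / (m * n)
    return dict(w), pre, post

P = [[0.99, 0.01], [0.60, 0.40]]
PI = [60.0 / 61.0, 1.0 / 61.0]  # exact steady state of P
w, pre_ber, post_ber = fec_ber(P, PI, m=4, n=8, t=1)
```

As a sanity check, the pre-FEC BER of this toy chain equals the steady-state PAM-symbol error probability divided by the two bits per symbol, independently of how the symbols are grouped into FEC symbols.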

B. Pruning-Based Dynamic Programming Algorithm
At low BER, as $W(j_s)$ decreases exponentially with increasing $j_s$, pruning trellis paths having negligible probabilities can result in a significant reduction in computation. This is achieved by replacing the upper summation limit $n$ in (27) and (28) with a maximum symbol-error count $j_s^{max}$, which reduces the required number of iterations.
Consider the trellis tree diagram in Fig. 9 for $n = 4$. All trellis paths having $j_s > j_s^{max}$ (dashed lines) are discarded during each dynamic programming iteration by (23). First, the post-FEC BER can be calculated by modifying the upper summation limit in (28),
$$\text{post-FEC BER} \approx \frac{1}{mn} \sum_{j_s = t+1}^{j_s^{max}} j_s\, E_{avg}(j_s)\, W(j_s). \tag{31}$$
Since all paths having more than $j_s^{max}$ symbol errors are neglected, $W(j_s) = 0$ and $E_{avg}(j_s) = 0$ for $j_s > j_s^{max}$. Naturally, some error is incurred by neglecting the pruned paths, but we can accurately estimate this error to ensure it is negligible. We define $\varepsilon(j_s^{max})$ as the summed probability of all truncated paths,
$$\varepsilon(j_s^{max}) = \sum_{j_s = j_s^{max}+1}^{n} W(j_s).$$
Moreover, $\varepsilon(j_s^{max}) \approx W(j_s^{max} + 1)$ since $W(j_s)$ decreases exponentially with increasing $j_s$. Consequently, the absolute error in the BER estimate of (31) can be approximated by
$$e(j_s^{max}) \approx \frac{(j_s^{max} + 1)\, E_{avg}(j_s^{max} + 1)\, W(j_s^{max} + 1)}{mn}.$$
We may use the fact that $E_{avg}(j_s^{max} + 1) \approx E_{avg}(j_s^{max})$ to approximate $e(j_s^{max})$ without having to calculate $E_{avg}(j_s^{max} + 1)$ using the full FEC trellis. Moreover, for a FEC code correcting $t$ symbol errors, we may also define the relative error $e_r(j_s^{max})$ in our estimate of post-FEC BER,
$$e_r(j_s^{max}) = \frac{e(j_s^{max})}{\text{post-FEC BER}},$$
where the post-FEC BER is evaluated using (31). That the effect of this pruning approach is negligible is illustrated by the example in Fig. 10. The statistical model is applied to the link depicted in Fig. 1 with $n = 544$, $t = 15$, $m = 10$ and $N = 4$ for two different channel $\alpha$ settings. Under the same $10^{-3}$ pre-FEC BER, results for $e_r(j_s^{max})$ and $W(j_s)$ are plotted in Fig. 10. A larger $\alpha$ intensifies DFE error propagation and thus results in increased $W(j_s)$ at longer burst lengths. Since the same pre-FEC BER is assumed in both cases, the $\alpha = 0.4$ channel has shorter bursts and thus higher $W(j_s)$ over shorter burst lengths. The relative error function $e_r(j_s^{max})$ also increases with a larger $\alpha$ but decreases exponentially with increasing $j_s^{max}$. Accurate pre-FEC BER results are obtained by computing the weighted average of all error event probabilities provided in Section IV-C.
The best value of $j_s^{\max}$ can be determined by increasing $j_s^{\max}$ from $t+1$ until a given accuracy requirement $\eta$ on $e_r(j_s^{\max})$ is met at a pre-selected pre-FEC BER level which corresponds to the desired post-FEC BER. For the example given in Fig. 10, if $\eta$ is 1% with $10^{-3}$ pre-FEC BER, the best choice of $j_s^{\max}$ is 18 for both $\alpha$ settings.
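The iterative choice of the pruning depth described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact expressions for the truncated BER and its error are not reproduced here, so the sketch assumes the kept BER mass is proportional to the sum of `W[j] * E_avg[j]` for `j` from `t+1` to `j_max`, and that the dropped mass is approximated by `W[j_max + 1] * E_avg[j_max]`. The function names and the geometric test data are hypothetical.

```python
def relative_truncation_error(W, E_avg, t, j_max):
    """Approximate relative error caused by pruning paths with > j_max symbol errors.

    W[j] and E_avg[j] are indexed by the number of FEC symbol errors j.
    Assumed forms (illustrative, not the paper's exact equations):
      kept    = sum_{j=t+1}^{j_max} W[j] * E_avg[j]
      dropped ~ W[j_max + 1] * E_avg[j_max]
    """
    kept = sum(W[j] * E_avg[j] for j in range(t + 1, j_max + 1))
    dropped = W[j_max + 1] * E_avg[j_max]
    return dropped / kept


def choose_j_max(W, E_avg, t, eta):
    """Increase j_max from t+1 until the relative error is at most eta."""
    for j_max in range(t + 1, len(W) - 1):
        if relative_truncation_error(W, E_avg, t, j_max) <= eta:
            return j_max
    raise ValueError("accuracy requirement not met within the available range")


# Hypothetical data: W decays by 10x per extra symbol error, E_avg grows with j.
W = [0.1 ** j for j in range(31)]
E_avg = [float(j) for j in range(31)]
j_max = choose_j_max(W, E_avg, t=15, eta=0.01)
```

With a faster-decaying `W`, the loop stops closer to `t+1`; heavier burst tails push the selected depth higher, which matches the trend described for larger $\alpha$.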

C. Model Verification
A 4-PAM statistical model is applied to a link as depicted in Fig. 1 with a channel response $h = 0.6 + 0.2z^{-1} - 0.2z^{-2}$. Such a response may, for example, arise from the combination of a lowpass channel and a continuous-time linear equalizer (CTLE) that over-equalizes the channel. The solid line in Fig. 11 reports the pre-FEC vs. post-FEC BER calculated by the model.
In Fig. 11, we may identify two regions of interest. First, consider an extreme case where no burst errors are present. In such a case, a codeword will be decoded incorrectly only when there are $(t+1)$ random bit errors, each having probability $p$. Hence, post-FEC BER $\sim p^{(t+1)}$. This case corresponds to region (a) in Fig. 11, where the slope of post-FEC vs. pre-FEC BER is $(t+1)$ on a logarithmic scale. The other extreme case is represented by region (b), where individual random bit errors turning into very long bursts are the dominant source of post-FEC errors. If some small fraction, $b$, of pre-FEC random errors generates bursts long enough to create post-FEC errors, post-FEC BER $\sim b \cdot p$. Thus, the slope of post-FEC vs. pre-FEC BER in this region is 1 on a logarithmic scale.
However, our statistical model does not consider decoder failures in the presence of more than t symbol errors, where the decoder may correct to the wrong codeword, thus increasing the number of bit errors. The probability of such a decoder failure is bounded by 1/t! [33]. In typical wireline SerDes applications, t is relatively large to correct burst errors, so that decoding to the wrong codeword does not affect the modeling accuracy of, for example, the standard RS(544,514,15) code.
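The two slopes identified in Fig. 11 can be reproduced with a toy two-term model. The sketch below is an illustration only: it assumes the post-FEC BER is the sum of a random-error term `c_random * p**(t+1)` and a burst term `b * p`, and the constants `C_RANDOM` and `B` are hypothetical values chosen to place the transition at a convenient pre-FEC BER, not values from the measurements.

```python
import math


def post_fec_ber(p, t, c_random, b):
    """Toy model: random-error term plus burst-error term (both assumed forms)."""
    return c_random * p ** (t + 1) + b * p


def loglog_slope(f, p, rel_step=1e-3):
    """Finite-difference slope of log10(f) versus log10(p)."""
    p_hi = p * (1.0 + rel_step)
    rise = math.log10(f(p_hi)) - math.log10(f(p))
    run = math.log10(p_hi) - math.log10(p)
    return rise / run


T = 7            # symbol-error correcting capability, as for a KR4-like code
C_RANDOM = 1e15  # hypothetical scale of the random-error term
B = 1e-12        # hypothetical fraction of errors that grow into long bursts


def model(p):
    return post_fec_ber(p, T, C_RANDOM, B)


slope_region_a = loglog_slope(model, 1e-2)  # random errors dominate
slope_region_b = loglog_slope(model, 1e-6)  # burst errors dominate
```

In this toy model the first slope is close to `T + 1` and the second is close to 1, so the location of the knee between them depends only on the ratio of the two hypothetical constants.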

A. Device Under Test
We have measured a 4-PAM 60 Gb/s SerDes link based on a chip fabricated in 7 nm FinFET technology [34]. The overall system-level block diagram of the link is shown in Fig. 12. Specifically, subject to a 1 V peak-to-peak differential maximum output swing, the transmitter has a programmable 3-tap FIR filter to mitigate both pre-cursor and post-cursor ISI. At the receiver, a 13-tap FFE with 5 pre-cursor taps and 7 post-cursor taps is adaptively optimized to cancel ISI in the channel. A 2-tap DFE equalizes the first two post-cursor ISI terms. An on-chip statistical unit monitors and stores BER for PRBS31 data in memory. Both the RS(544, 514, 15) KP4 and RS(528, 514, 7) KR4 codes in GF($2^{10}$) are implemented in the FEC encoder/decoder.

B. Modeling 2:1 Bit Multiplexing
To comply with IEEE wireline system standards, a 2:1 bit multiplexer is implemented in the PMA sublayer as illustrated in Fig. 12. The 2:1 bit multiplexing adds an extra layer of complication and must be considered in our proposed statistical model. Fig. 13 shows an example of FEC symbol distribution and 2:1 bit multiplexing at the transmitter. FEC symbols $C_1, C_2, \ldots, C_{544}$ in a KP4-encoded codeword are distributed to two PCS lanes in a round-robin fashion. Then, a bit multiplexer in the PMA layer groups every two bits from each PCS lane and forms a physical-layer 4-PAM symbol. At the receiver, the signal flow in Fig. 13 is reversed to retrieve the codeword $C$. As a result, burst errors in the physical layer are shuffled across multiple FEC symbols, thus making the BER worse.
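The effect of the bit multiplexer on a short burst can be illustrated with a small counting sketch. It rests on simplifying assumptions that go beyond the text: with multiplexing, every block of 10 consecutive 4-PAM symbols is assumed to carry two FEC symbols, one built from the MSBs and one from the LSBs; without multiplexing, each FEC symbol is assumed to occupy 5 consecutive 4-PAM symbols. The function name and the example burst are hypothetical.

```python
BITS_PER_FEC_SYMBOL = 10  # FEC symbols are elements of GF(2^10)


def fec_symbols_hit(pam_errors, bit_mux):
    """Return the set of FEC symbol indices corrupted by 4-PAM bit errors.

    pam_errors: iterable of (pam_index, msb_wrong, lsb_wrong).
    bit_mux=True : assumed mapping, MSBs of a 10-symbol block form one FEC
                   symbol and LSBs of the same block form the next one.
    bit_mux=False: assumed mapping, 5 consecutive 4-PAM symbols per FEC symbol.
    """
    hit = set()
    for k, msb_wrong, lsb_wrong in pam_errors:
        if not (msb_wrong or lsb_wrong):
            continue
        if bit_mux:
            block = k // BITS_PER_FEC_SYMBOL
            if msb_wrong:
                hit.add(2 * block)
            if lsb_wrong:
                hit.add(2 * block + 1)
        else:
            hit.add(k // (BITS_PER_FEC_SYMBOL // 2))
    return hit


# Hypothetical 4-symbol burst straddling a block boundary, with the
# erroneous bit alternating between MSB and LSB.
burst = [(8, True, False), (9, False, True), (10, True, False), (11, False, True)]
without_mux = fec_symbols_hit(burst, bit_mux=False)
with_mux = fec_symbols_hit(burst, bit_mux=True)
```

Under these assumptions the same burst corrupts more FEC symbols when the multiplexer is present, which is the mechanism by which the shuffling degrades post-FEC BER.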
To model 2:1 bit multiplexing, we carefully consider the error pattern of each erroneous 4-PAM symbol and identify the exact bit-error location. First, we apply weak lumpability to define a new set of simplified 4-PAM error states. Whereas we previously lumped together all 4-PAM symbol errors with value ±2, we must now distinguish between errors in the first and second bit of the Gray-coded symbol. Thus, from the original 16 ... For states $i$ where the most recently received 4-PAM symbol is $\pm 2_{LSB}$, ...
Then, in the FEC trellis model, because the 2:1 bit multiplexing correlates every two FEC symbols in GF($2^{10}$), trellis paths over every 10 consecutive 4-PAM symbols are time-aggregated to obtain our FEC trellis analysis of error propagation and RS FEC decoding. Hence, we consider each transition in the FEC trellis as having traversed a length-10 4-PAM trellis with $j_1$ MSB errors and $j_2$ LSB errors. This results in a $3^N$-state radix-$(5 \cdot 3^N)$ FEC trellis model if we neglect all ±4 and ±6 error events, where all the branch probabilities $a_{ii}^{j_1, j_2}$ can be found using the procedures described in Section IV-D. To perform ...

C. Test Setup
The test bench setup for the 60 Gb/s SerDes link is also superimposed in Fig. 12. A FlexTC temperature forcing system from Mechanical Devices is used to keep the device at room temperature with ±0.2 °C accuracy. Approximately Gaussian-distributed crosstalk noise is coupled to the channel through a crosstalk injection board. Different measurement cases are established by varying the channel insertion loss using an ARTEK CLE1000 variable ISI channel. The corresponding overall pulse responses (including TX FIR, TX driver, channel, RX CTLE and ADC) for two different cases are also tabulated in Fig. 12.
In case A, the overall insertion loss is 29 dB. We intentionally configure the CTLE in this case to over-equalize so that the second post-cursor ISI of the overall impulse response becomes large but negative. DFE error propagation is particularly bad in this case compared with all-positive post-cursor ISI (footnote b). With large negative DFE tap weights, a measurable floor is expected in the post-FEC BER where burst errors due to error propagation in the DFE dominate. In this region, we expect a plot of post-FEC vs. pre-FEC BER to exhibit a slope of 1. In case B, the system has a lower overall insertion loss of 24 dB so that the KR4 code can provide adequate coding gain at low BER.

D. Experimental Results
In Fig. 14, measured results for both the RS(544, 514, 15) KP4 and RS(528, 514, 7) KR4 codes are reported. Gray encoding is enabled to reduce BER. Different data points are generated by varying the amount of Gaussian-like crosstalk injected into the channel. To minimize the impact of random jitter, all data points are measured with the CDR phase and DFE tap weights locked once the LMS adaptation of the DFE tap weights has converged. The curves generated by our statistical model are also superimposed in Fig. 14, treating the crosstalk as additive white Gaussian noise. Following the iterative procedure described in Section V-B, we select ...
Footnote b: See the Appendix for a justification.
All data points in Fig. 14 are measured down to a post-FEC BER of $10^{-11}$. Good consistency is observed between the theoretical curves and measured results. The combined effect of many noise sources including ISI, crosstalk, and ADC quantization noise in wireline links has a pdf that is well approximated by a Gaussian [5], [7]. Thus, the shape of the post-FEC vs. pre-FEC BER curve is mainly dictated by the DFE tap weights. Moreover, for case A where a large amount of error propagation is present, our statistical model can properly predict the error floor with the RS(528, 514, 7) KR4 code. Importantly, our statistical model accurately predicts the measured transition between the two regions for the KR4 and KP4 FEC in case A. Furthermore, the model indicates that for the KP4 FEC, in order to ensure a post-FEC BER of $10^{-18}$, a pre-FEC BER of $10^{-4}$ is adequate for case B, whereas a pre-FEC BER of $10^{-10}$ is required for case A. Such conclusions would have been almost impossible to draw using existing methods. Our statistical model can be used to quantify the precise pre-FEC BER required to achieve very low post-FEC BER depending on the channel and equalizer.

VII. CONCLUSION
This paper described a systematic and efficient method that can be used to accurately estimate post-FEC BER for high-speed wireline communication channels using standard linear block codes on GF($2^m$). We proposed a two-level hierarchical statistical model allowing us to model the propagation of both PAM-symbol and FEC-symbol errors corrupting the FEC decoder. The model is simplified through a series of techniques including state aggregation, time aggregation, state reduction, and pruning-based trellis dynamic programming to accelerate the statistical analysis. The error bound associated with each method is also clearly defined. Because of the hierarchical approach, the time complexity of the analysis depends only on the FEC code and not on the underlying PAM sub-trellises. An experimental prototype verified the proposed model, with all measured results agreeing closely with those predicted by the theory.
A comparison of simulation times using the statistical model, a behavioral Simulink model, and a laboratory 60 Gb/s bit error rate test (BERT) measurement is recorded in Table I. The statistical model has all simulation parameters identical to those reported in Fig. 14. The behavioral model is accelerated by parallel processing using a 16-core processor, resulting in 6.81 μs per bit in the simulation. The total time needed to simulate or measure three post-FEC BER levels is reported in the table, assuming each BER simulation or measurement must observe at least 1000 bit errors. Note that our statistical analysis results extend down to $10^{-15}$ or even further without increasing the number of calculations. At these low BER levels, the impact of error propagation is significant, but behavioral simulation and even laboratory BERT measurement are impractical. In addition, the statistical simulation can be prohibitively long without using the techniques introduced in this work to improve the efficiency of the model.
For example, according to (30) the statistical simulation performed in Table I would require a total of $4.57 \times 10^{7}$ trellis node iterations assuming a KP4 code with $j_s^{\max} = 20$. Without pruning, by (29) the FEC trellis model would instead require $1.08 \times 10^{10}$ iterations, making the simulation time more than two orders of magnitude higher (a factor of roughly 240).
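The time figures discussed above follow from simple arithmetic, sketched below. The per-bit cost of the behavioral model (6.81 μs), the 60 Gb/s line rate, the 1000-error requirement, and the two iteration counts are taken from the text; the BER levels in `EXAMPLE_BER_LEVELS` are illustrative assumptions and are not necessarily the three levels listed in Table I.

```python
SECONDS_PER_YEAR = 365.0 * 24.0 * 3600.0

BEHAVIORAL_SECONDS_PER_BIT = 6.81e-6  # 16-core behavioral simulation
BERT_SECONDS_PER_BIT = 1.0 / 60e9     # 60 Gb/s laboratory measurement


def seconds_to_observe(ber, seconds_per_bit, min_errors=1000):
    """Time needed to see min_errors bit errors at a given BER."""
    bits_needed = min_errors / ber
    return bits_needed * seconds_per_bit


# Illustrative BER levels (assumed for this sketch).
EXAMPLE_BER_LEVELS = [1e-9, 1e-12, 1e-15]

for ber in EXAMPLE_BER_LEVELS:
    sim_years = seconds_to_observe(ber, BEHAVIORAL_SECONDS_PER_BIT) / SECONDS_PER_YEAR
    bert_hours = seconds_to_observe(ber, BERT_SECONDS_PER_BIT) / 3600.0
    print(f"BER {ber:.0e}: behavioral {sim_years:.3g} years, BERT {bert_hours:.3g} hours")

# Speed-up from pruning, using the iteration counts quoted in the text.
pruning_speedup = 1.08e10 / 4.57e7
print(f"pruning reduces trellis node iterations by about {pruning_speedup:.0f}x")
```

Both direct methods scale inversely with the target BER, whereas the iteration count of the statistical model is set by the code parameters and the pruning depth.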
While this paper demonstrates the statistical analysis method in the presence of DFE error propagation, the method is general and can be applied to model other communication systems having memory effects. Moreover, our proposed model can be extended to higher-level PAM schemes and other advanced equalizer architectures to assist in making architectural choices for wireline transceivers, such as co-design of the equalization and FEC in the presence of error propagation and various noise sources.

APPENDIX
According to (2), a single receiver error $D_{k-1}$ results in the additive error (38) at the receiver input. If another error $D_k$ arises, the additive error at time $k+1$ is given by (39). If $h_1 > 0$, the sign of (38) is opposite that of the preceding error, thus increasing the probability of a new error $D_k$ also having an opposing sign. In this case, since $D_{k-1}$ and $D_k$ have opposing signs, the two terms in (39) add constructively, resulting in the largest possible additive error term, only if $h_1$ and $h_2$ have opposing signs, implying $h_2 < 0$. Alternatively, if $h_1 < 0$, the additive error (38) has the same sign as $D_{k-1}$, increasing the probability of another error $D_k$ also having the same sign. In this case, the additive error (39) is increased when $h_2$ has the same sign as $h_1$; that is, when $h_2 < 0$. Thus, in either case the probability of propagating errors two or more time steps is maximized by a negative $h_2$.
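The sign argument above can be checked numerically with a short sketch. It assumes, consistently with the statement that the first additive error opposes the preceding error when $h_1 > 0$, that the additive error at time $k+1$ has the form $-(h_1 D_k + h_2 D_{k-1})$; this form, the function name, and the tap magnitudes (borrowed from the example channel of Section V-C) are assumptions for illustration.

```python
def second_step_error(h1, h2, d_prev=2.0):
    """Additive error at time k+1 after errors D_{k-1} and D_k (assumed form).

    The sign of D_k is set to the most likely one according to the appendix
    argument: opposite to D_{k-1} when h1 > 0, equal to D_{k-1} when h1 < 0.
    """
    d_next = -d_prev if h1 > 0 else d_prev
    return -(h1 * d_next + h2 * d_prev)


TAP = 0.2  # magnitude of both post-cursor taps in this illustration

for h1 in (+TAP, -TAP):
    for h2 in (+TAP, -TAP):
        err = second_step_error(h1, h2)
        print(f"h1={h1:+.1f}, h2={h2:+.1f}: |additive error| = {abs(err):.2f}")
```

For both signs of `h1`, the magnitude of the second-step error is larger with a negative `h2`, in line with the conclusion that a negative second post-cursor tap maximizes propagation over two or more time steps.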
To prove that the probability of having errors with the same sign is higher if $h_1 < 0$ and vice versa, we assume $D_{k-1} = \pm 2$ and an equal probability of transmitting $b_k \in \{\pm 1\}$. According to (3), the probability $P_{+2}$ of $D_k = +2$ is given by (40). Similarly, under the same assumption, the probability $P_{-2}$ of $D_k = -2$ is given by (41). With a positive $h_0$ and negative $h_1$, $P_{+2} > P_{-2}$ if $D_{k-1} = +2$ and $P_{-2} > P_{+2}$ if $D_{k-1} = -2$. Therefore, it is much more likely that $D_{k-1}$ and $D_k$ have the same sign if $h_1 < 0$. Similarly, if $h_1 > 0$ in (40) and (41), it can be easily proven that $D_{k-1}$ and $D_k$ are likely to have opposing signs.