ALL-DIGITAL approaches to receiver design have drawn considerable attention for high-speed communication systems [1]. In general, high-level modulation schemes are preferred for high-speed data transmission. Consequently, the loop delay in the timing loop must be reduced to meet the low-jitter requirement of a symbol timing recovery system.

Fig. 1 shows a simplified baseband block diagram for conventional receivers. A received signal *r*(*t*) is applied to an analog low-pass filter (LPF) to suppress out-of-band noise and limit the bandwidth of the signals of interest. The filtered signals are then sampled by an analog-to-digital converter (ADC) that operates at a free-running frequency 1/*T*_{i}. In general, the sampling clock 1/*T*_{i} differs from the baud rate 1/*T*. Therefore, a sampling-rate converter (SRC) is used to convert the asynchronous samples to synchronous ones. This can be done by an interpolator together with a symbol timing recovery (STR) loop [2]. The STR includes a timing error detector (TED), a loop filter (LF), and a numerically controlled oscillator based (NCO-based) controller. According to the averaged timing error information provided by the TED, this controller determines when to interpolate a baud-spaced output sample so that the back-end digital signal processing (DSP) blocks can work properly. The DSP block, which includes a feed-forward equalizer (FFE), a feedback equalizer (FBE), and a slicer, is used to equalize the channel and make decisions. In general, the FFE and FBE are adaptive filters so that they can adapt to channel variation. One of the prevalent adaptation algorithms is the least-mean-square (LMS) algorithm [3].

However, this conventional receiver architecture has two design issues. The first issue is loop delay. Since the FFE is included in the timing loop, the delay of the timing loop is increased, which induces larger phase noise [4], [5]. The second issue is interaction. There are two coupled loops in the system model: the timing loop and the equalization loop. The adaptation of the FFE changes its phase response and therefore changes the equivalent sampling phase. This adaptation tries not only to equalize the channel but also to compensate for the sampling phase error so that the signal-to-noise ratio (SNR) at the slicer input is maximized. At the same time, the NCO-based controller varies its control signals in the timing loop to eliminate the inter-symbol interference (ISI). Such interaction may cause an unpredictable result.

Bergmans *et al.* proposed a simple solution to reduce the loop delay, in which the FFE was moved out of the STR loop and adapted asynchronously [6], [7]. However, they assumed that the SRC could perfectly obtain the timing error information and did not take the interaction issue into consideration. Daecke and Schenk proposed using the equalizer coefficients to estimate the phase offset and then feeding this estimate to the timing loop to cancel its effects [8]. However, the accuracy of this equalization-based timing error estimation method depends on the adaptation algorithm, and the optimal equalizer coefficients are assumed to be known in advance; otherwise, the timing error estimate would be biased. Staszewski *et al.* proposed a constrained LMS algorithm that improves the stability of the STR loop [9]. However, this constraint compromises the convergence of the LMS algorithm. Gysel and Gilg proposed a timing recovery scheme in which the STR and equalization loops first converge independently, after which the FFE coefficients are fixed so that there is no interaction between the two loops [10].

In this paper, we propose a new STR architecture in which the loop delay of the timing loop is reduced and the FFE coefficients are asynchronously updated. We also apply this architecture to 10GBASE-T systems to demonstrate its feasibility.

## II. Proposed Reduced Loop Delay Approach

As shown in Fig. 2, we propose a new architecture with reduced loop delay. Note that the FFE is moved out of the timing loop, and we adopt a modified delayed LMS [11] together with an inverse SRC (ISRC) to continuously update the coefficients of the FFE. Moreover, we use an extra interaction-free path together with a timing recovery scheme to avoid the interaction between the timing and equalization loops. The details are described as follows.

### A. The Design of SRC and ISRC

Fig. 3 shows the block diagram of the SRC and ISRC. For the SRC, the sample selector is controlled by the NCO-based controller through *m*_{n} *T*_{i} with *m*_{n} = ⌊ *nT*/*T*_{i} ⌋, where ⌊·⌋ denotes the floor function. The coefficients of the sinc interpolator are likewise determined by the NCO-based controller through φ_{n} *T*_{i} with φ_{n} = *nT*/*T*_{i} − *m*_{n}, and can be expressed as

$$c_k^{\phi_n} = \left.\frac{\sin(\pi t/T_i)}{\pi t/T_i}\right\vert_{t=\phi_n T_i} \tag{1}$$

For the ISRC, the sample selector is controlled by *m*_{k} *T* with *m*_{k} = ⌊ *kT*_{i}/*T*⌋; the coefficients of the sinc interpolator are determined by φ_{k} *T* with φ_{k} = *kT*_{i}/*T*− *m*_{k} and can be expressed as

$$c_n^{\phi_k} = \left.\frac{\sin(\pi t/T)}{\pi t/T}\right\vert_{t=\phi_k T} \tag{2}$$

Since the structures of the SRC and ISRC are “symmetric”, we can simply set φ_{n} *T*_{i} = mod {−φ_{k} *T*, 1}, where mod{·, 1} denotes the modulo-1 operation, and make the sample selector of the ISRC act opposite to that of the SRC. In other words, whenever the SRC decides to interpolate one more sample, the ISRC decides to skip a sample at that time. Fig. 4 illustrates this for *T*_{i}/*T* = 2/3: the interpolator produces one extra sample when φ = 0 and no sample when φ = 1; otherwise, it produces one sample at a time. This implementation approach reduces the hardware cost. Throughout this paper, the time indices *n* and *k* denote signals sampled at equivalent rates of 1/*T* and 1/*T*_{i}, respectively.
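The SRC control described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `nco_schedule` and `src_interpolate`, the symmetric 30-tap truncation of the sinc kernel, the evaluation of each tap at the offset φ_{n} − *j*, and the test tone are hypothetical choices and are not taken from the paper. The ratio *T*/*T*_{i} = 3/2 corresponds to the *T*_{i}/*T* = 2/3 case of Fig. 4.

```python
import math
from fractions import Fraction

def sinc(u):
    """Normalized sinc, sin(pi*u)/(pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def nco_schedule(num_out, ratio):
    """Return (m_n, phi_n) for n = 0..num_out-1, where ratio = T / T_i."""
    m, phi = [], []
    for n in range(num_out):
        pos = n * ratio            # n*T/T_i, exact when ratio is a Fraction
        m_n = math.floor(pos)      # integer part: base sample index
        m.append(m_n)
        phi.append(pos - m_n)      # fractional part, 0 <= phi_n < 1
    return m, phi

def src_interpolate(x, m, phi, half_taps=15):
    """Interpolate T_i-spaced samples x at the instants (m_n + phi_n)*T_i."""
    y = []
    for m_n, phi_n in zip(m, phi):
        acc = 0.0
        for j in range(-half_taps + 1, half_taps + 1):  # 2*half_taps taps
            i = m_n + j
            if 0 <= i < len(x):                         # zero-pad outside x
                acc += x[i] * sinc(float(phi_n) - j)
        y.append(acc)
    return y

# Assumed example: T/T_i = 3/2 and a slowly varying test tone.
ratio = Fraction(3, 2)
x = [math.sin(2 * math.pi * 0.05 * i) for i in range(200)]
m, phi = nco_schedule(120, ratio)
y = src_interpolate(x, m, phi)
```

With this ratio the fractional phase alternates between 0 and 1/2, so every other output coincides with an input sample and the remaining outputs fall halfway between two input samples.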

### B. Design of Interaction-Free Loops

The key to solving the interaction issue is to decouple the timing loop and the equalization loop. The simplest way to do this is to use the signal before the FFE to estimate the timing error. However, the TED needs synchronous data to estimate the timing error. Therefore, extra delay-matching and SRC elements are required. We delay the input of the SRC by *D*_{F} symbols (*T*_{i}-spaced), where *D*_{F} is determined by the delay of the FFE. The delayed input is then applied to the extra SRC so that its outputs *x*_{T}[*n*] are equivalent to the *T*-spaced symbols.

### C. Proposed AD-LMS Adaptation Algorithm

After decoupling the timing loop from the equalization loop, we can correctly train the coefficients of the FFE. Note that the tap inputs and error signals used in an adaptation algorithm must be aligned in sampling rate; otherwise, the algorithm will fail. We adopt the ISRC to translate the *T*-spaced error signals *e*_{T}[*n*] to the *T*_{i}-spaced version *e*_{Ti}[*k*]. In addition, we propose an asynchronous delayed LMS (AD-LMS) algorithm that takes into account the delay of the FFE, *D*_{F}, of the SRC, *D*_{S}, and of the ISRC, *D*_{I}. In general, the delays of the SRC and ISRC are not integers. Because both the SRC and ISRC are controlled by the same parameter φ with opposite sign, their delays can be expressed as

$$\begin{aligned} D_S &= D_{S_I} + D_{S_F} \\ D_I &= D_{S_I} - D_{S_F} \end{aligned} \tag{3}$$

where *D*_{SI} and *D*_{SF} denote the integral and fractional parts of the delay of the SRC, respectively, so that the fractional parts cancel in the sum *D*_{S} + *D*_{I}. The AD-LMS algorithm can be expressed as

$$\begin{aligned} {\bf w}[k+1] &= {\bf w}[k] + \mu_1 e_{T_i}[k-D]\,{\bf x}[k-D] \\ \text{with}\ e_{T_i}[k] &= \mathit{ISRC}\{\hat{a}[n]-\tilde{a}[n]\} \\ \tilde{a}[n] &= \mathit{SRC}\{{\bf x}^T[k]{\bf w}[k]\} - b[n-D_F-D_{S_I}] \end{aligned} \tag{4}$$

where **w** is an *N*_{F} × 1 coefficient vector of the FFE, **x** is an *N*_{F} × 1 tap-input vector of the FFE, *e*_{Ti} is an asynchronous error signal, and μ_{1} is a step-size parameter; *SRC* { · } and *ISRC*{ · } denote the SRC and ISRC operations, respectively; *D* = *D*_{F} + *D*_{S} + *D*_{I} = *D*_{F} + 2*D*_{SI} denotes the delay parameter; and *b*[*n*] is the output of the FBE.
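To illustrate the delayed coefficient update in (4), the following Python sketch implements a plain delayed LMS in a single-rate setting. It is a simplification under stated assumptions, not the AD-LMS itself: the SRC and ISRC are treated as identity operations, the FBE term is omitted, and the filter is trained on a toy system-identification problem. The names `fir`, `delayed_lms`, and `h_true`, as well as the step size and delay values, are hypothetical.

```python
import random
from collections import deque

def fir(h, x):
    """Causal FIR filter output with zero initial state."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]

def delayed_lms(x, d, num_taps, mu, delay):
    """LMS whose update at time k uses the error and tap inputs from k - delay."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps
    pending = deque()                       # (error, tap vector) awaiting use
    for x_k, d_k in zip(x, d):
        taps = [x_k] + taps[:-1]            # shift the new sample into the taps
        y_k = sum(wi * xi for wi, xi in zip(w, taps))
        pending.append((d_k - y_k, taps))   # error computed with current w
        if len(pending) > delay:
            e_old, x_old = pending.popleft()    # pair from time k - delay
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w

# Assumed toy setup: identify a short FIR response from binary data.
random.seed(0)
h_true = [0.5, -0.3, 0.2, 0.1]
x = [random.choice((-1.0, 1.0)) for _ in range(5000)]
d = fir(h_true, x)
w = delayed_lms(x, d, num_taps=4, mu=0.01, delay=5)
```

The key point carried over from (4) is that the error and the tap-input vector are delayed by the same amount, so each update pairs an error with the inputs that produced it. A larger delay generally calls for a smaller step size to keep the adaptation stable.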

The FBE coefficients are adaptively updated according to a conventional LMS algorithm, which can be expressed as

$$\begin{aligned} {\bf m}[n+1] &= {\bf m}[n] + \mu_2 e_T[n]\,{\bf y}[n] \\ \text{with}\ e_T[n] &= \hat{a}[n] - \tilde{a}[n] \end{aligned} \tag{5}$$

where **m** is an *N*_{B} × 1 coefficient vector of the FBE, **y** is an *N*_{B} × 1 tap-input vector of the FBE, *e*_{T} is a synchronous error signal, and μ_{2} is a step-size parameter.

### D. Timing Recovery Scheme

We extend the timing recovery scheme proposed in [10] by using two types of TED: mMM-TED [12], [13] and B-TED [14] in different timing phases.

In *timing phase I*, we exploit the extra interaction-free path so that the mMM-TED can estimate the timing error information χ by the following equation:

$$\begin{aligned} \chi[n] &= \tilde{a}[n]x_T[n-1] - \tilde{a}[n-1]x_T[n] \\ &\quad + \frac{1}{2}\left(\tilde{a}[n]x_T[n-2] - \tilde{a}[n-2]x_T[n]\right) \end{aligned} \tag{6}$$

At the end of this phase, the FFE will be well-trained.
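As a small worked example of (6), the Python function below evaluates the mMM-TED output for one time index. The function name `mmm_ted` and the sample values are hypothetical and serve only to show how the terms combine.

```python
def mmm_ted(a_tilde, x_t, n):
    """Timing error chi[n] according to (6); requires n >= 2."""
    first = a_tilde[n] * x_t[n - 1] - a_tilde[n - 1] * x_t[n]
    second = a_tilde[n] * x_t[n - 2] - a_tilde[n - 2] * x_t[n]
    return first + 0.5 * second

# Assumed sample values for the equalized signal and the T-spaced path output.
a_tilde = [1.0, -1.0, 1.0]
x_t = [0.9, -1.1, 1.0]
chi = mmm_ted(a_tilde, x_t, 2)   # -0.1 + 0.5 * (-0.1), about -0.15
```

When the two input sequences are identical, both bracketed differences vanish and the detector output is zero.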

In *timing phase II*, the B-TED exploits the detection output and postcursor ISI to estimate the timing error information χ by the following equation:

$$\begin{aligned} \chi[n] &= -e_T[n-1]\left(d[n] - d[n-2]\right) \\ \text{with}\ d[n] &= b[n] + \hat{a}[n] \end{aligned} \tag{7}$$

In this phase, the input signal of the TED is taken after the FFE to obtain a cleaner signal, and hence the timing jitter is reduced. Note that the coefficients of the FFE are fixed in this phase to avoid the interaction.
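Equation (7) can likewise be evaluated directly. In the Python sketch below, the function name `b_ted` and the sample values are hypothetical; the code only mirrors the two lines of (7).

```python
def b_ted(e_t, b, a_hat, n):
    """Timing error chi[n] according to (7); requires n >= 2."""
    d = [b_i + a_i for b_i, a_i in zip(b, a_hat)]   # d[n] = b[n] + a_hat[n]
    return -e_t[n - 1] * (d[n] - d[n - 2])

# Assumed sample values for the error, FBE output, and decisions.
e_t = [0.0, 0.1, 0.0]
b = [0.1, 0.0, -0.1]
a_hat = [1.0, 1.0, -1.0]
chi = b_ted(e_t, b, a_hat, 2)    # -0.1 * (-1.1 - 1.1), about 0.22
```

A zero error sample *e*_{T}[*n* − 1] yields a zero detector output regardless of the data.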

## III. Simulation Results With 10GBASE-T Application

We apply the proposed architecture with reduced loop delay to 10GBASE-T systems [15], which support a non-loop-timed STR mode. The channel model of the insertion loss (IL) is taken from [16]. The baud rate is 800 MHz. We assume that the sampling frequency offset (SFO) varies from 50 to 2000 parts per million (ppm) and that both the SRC and ISRC are sinc interpolators with a tap length of 30; the received SNR is 33 dB; *N*_{F} = 75 and *N*_{B} = 16.

According to the Bode stability criterion ([17], chap. 4), a stable control loop requires a positive phase margin. Moreover, a low phase margin causes peaking in the closed-loop gain near the unity-gain frequency. This peaking amplifies the noise in that frequency range, thus increasing the total output noise. The phase margin and gain peaking comparisons for the conventional and the proposed approaches are shown in Fig. 5. Our approach has a higher phase margin and lower gain peaking than the conventional approach, particularly when the SFO is high.
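The link between loop delay and phase margin can be illustrated numerically. The Python sketch below assumes a generic second-order timing loop (a proportional-plus-integral loop filter followed by an NCO modeled as a delayed integrator) with `delay` extra samples of latency in the loop. The loop model, the gains `kp` and `ki`, and the function names are assumptions made for illustration and are not the loop parameters used in the paper.

```python
import cmath
import math

def open_loop(w, kp, ki, delay):
    """Open-loop response at w rad/sample with `delay` extra samples."""
    z_inv = cmath.exp(-1j * w)
    loop_filter = kp + ki / (1 - z_inv)     # proportional + integral paths
    nco = z_inv / (1 - z_inv)               # NCO as a delayed integrator
    return loop_filter * nco * z_inv ** delay

def crossover(kp, ki, lo=1e-6, hi=math.pi, iters=200):
    """Unity-gain frequency by bisection (the gain falls as w increases)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if abs(open_loop(mid, kp, ki, 0)) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phase_margin_deg(kp, ki, delay):
    """Phase margin in degrees; a pure delay only adds a phase lag w_c*delay."""
    wc = crossover(kp, ki)
    base = cmath.phase(open_loop(wc, kp, ki, 0))
    return 180.0 + math.degrees(base - wc * delay)

# Assumed loop gains; compare no extra delay with 20 samples of extra delay.
pm_short = phase_margin_deg(0.01, 1e-5, 0)
pm_long = phase_margin_deg(0.01, 1e-5, 20)
```

Because a pure delay has unit magnitude, it leaves the unity-gain frequency unchanged and subtracts a phase proportional to the delay, so every sample of latency removed from the loop is recovered as phase margin.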

The comparisons of the loop behavior during *timing phase I* are shown in Fig. 6. It is observed that the interaction-free path successfully decouples the timing and equalization loops. In contrast, for the conventional approach, the two loops interfere with each other and ultimately neither loop converges.

The LF output comparison for the proposed architecture during *timing phase I* and *timing phase II* is shown in Fig. 7. As expected, the variation of the LF output during *timing phase II* is roughly 13.73 dB less than that during *timing phase I*. Therefore, the timing jitter is reduced. At the end of *timing phase II*, the LF output settles at about −5 × 10^{−4}, i.e., 500 ppm in magnitude, which indicates that the SFO is correctly compensated.

## IV. Conclusions and Future Works

In this paper, we have presented an all-digital receiver architecture that reduces the timing loop delay and proposed a corresponding timing recovery scheme that reduces the timing jitter. The reduced loop delay comes at the cost of a more complicated AD-LMS algorithm. The novelty of the proposed architecture is the extra interaction-free path, which keeps the timing loop and the equalization loop from interfering with each other. In addition, our timing recovery scheme further reduces the variation of the LF output by roughly 13.73 dB, at the cost of fixing the coefficients of the FFE. Future work will focus on the coordination between the timing and equalization loops so that the all-digital receiver can combat channel variation by adaptively updating the coefficients of the FFE.

### Acknowledgment

The authors would like to thank the National Science Council (NSC), R.O.C., for the financial support provided under Grant NSC96-2220-E-002-008.