Error-Backpropagation-Based Background Calibration of TI-ADC for Adaptively Equalized Digital Communication Receivers

A novel background calibration technique for Time-Interleaved Analog-to-Digital Converters (TI-ADCs) is presented in this paper. This technique is applicable to equalized digital communication receivers. As shown in the literature, in a digital receiver it is possible to treat the TI-ADC errors as part of the communication channel and take advantage of the adaptive equalizer to compensate for them. Therefore, calibration becomes an integral part of channel equalization. No special-purpose analog or digital calibration blocks or algorithms are required. However, there is a large class of receivers where the equalization technique cannot be directly applied because other signal processing blocks are located between the TI-ADC and the equalizer. The technique presented here generalizes earlier works to this class of receivers. The error backpropagation algorithm, traditionally used in machine learning, is applied to the error computed at the receiver slicer and used to adapt an auxiliary equalizer adjacent to the TI-ADC, called the Compensation Equalizer (CE). Simulations using a dual-polarization optical coherent receiver model demonstrate accurate and robust mismatch compensation across different application scenarios. Several Quadrature Amplitude Modulation (QAM) schemes are tested in simulations and experimentally. Measurements on an emulation platform which includes an 8-bit, 4 GS/s TI-ADC prototype chip fabricated in 130 nm CMOS technology show an almost ideal mitigation of the impact of the mismatches on the receiver performance when 64-QAM and 256-QAM schemes are tested.

Although the CE solves the compensation problem, there is still a problem with the adaptation of the CE, because slicer error components associated with different interleaves are also combined by the signal pre-processing blocks. Thus, the slicer error is not directly applicable to adapt the CE. To solve the adaptation problem, in this work we propose to adapt the CE using a post-processed version of the error at the slicer of the receiver. The post-processing is based on the backpropagation algorithm [43], widely used in machine learning applications [44]. Its main characteristic is that, in a multi-stage processing chain where several cascaded blocks have adaptive parameters, it is able to determine the contribution to the error generated by each one of these blocks and their associated parameters for all the stages. Backpropagation is used in combination with the Stochastic Gradient Descent (SGD) algorithm to adjust the coefficients of the CE in order to minimize the slicer Mean Squared Error (MSE). The use of the CE in combination with the backpropagation algorithm results in robust, fast-converging background calibration. As we shall show, this proposal is not limited to the compensation of individual TI-ADCs (which is the case for most calibration techniques), but extends to the entire receiver Analog Front End (AFE), enabling the compensation of impairments such as time skew, quadrature, and amplitude errors between the in-phase and the quadrature components of the signal in a receiver based on Phase Modulation (PM) or Quadrature Amplitude Modulation (QAM).

Because ultrafast adaptation is usually not needed, the backpropagation algorithm can be implemented in a highly subsampled hardware block which does not require parallel processing. Therefore, the implementation complexity of the proposed technique is low, as will be discussed in detail.
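The idea of adapting an upstream filter from an error measured after downstream processing can be illustrated with a minimal sketch. It is not the receiver architecture of this paper: the single real-valued downstream FIR `h`, the tap values, the block-averaged gradient, and all function names are assumptions chosen for illustration. The error observed after `h` is passed through the adjoint of `h` (the backpropagation step) and then correlated with the CE input to form the gradient used by the gradient-descent update.

```python
import random

def fir(coeffs, x):
    """Causal FIR with zero initial state: y[n] = sum_j coeffs[j] * x[n-j]."""
    return [sum(c * x[n - j] for j, c in enumerate(coeffs) if n - j >= 0)
            for n in range(len(x))]

def backprop_through_fir(coeffs, e):
    """Adjoint of fir(): eb[m] = sum_j coeffs[j] * e[m+j]."""
    N = len(e)
    return [sum(c * e[m + j] for j, c in enumerate(coeffs) if m + j < N)
            for m in range(N)]

def adapt_ce(x, d, h, n_taps, mu, n_iter):
    """Gradient descent on the CE taps g using the backpropagated error."""
    g = [1.0] + [0.0] * (n_taps - 1)      # start from a pass-through filter
    N = len(x)
    for _ in range(n_iter):
        v = fir(g, x)                     # CE output
        y = fir(h, v)                     # output of the downstream block
        e = [yi - di for yi, di in zip(y, d)]   # error seen after h
        eb = backprop_through_fir(h, e)   # error referred back to the CE output
        for l in range(n_taps):
            grad = sum(eb[m] * x[m - l] for m in range(l, N)) / N
            g[l] -= mu * grad
    return g

random.seed(0)
x = [random.choice((-1.0, 1.0)) for _ in range(400)]
h = [0.5, 1.0, 0.5]          # hypothetical fixed downstream block
g_true = [0.9, 0.2, -0.1]    # hypothetical CE response to be learned
d = fir(h, fir(g_true, x))   # reference signal at the output of the chain
g_hat = adapt_ce(x, d, h, n_taps=3, mu=0.3, n_iter=300)
```

In this toy setting `g_hat` approaches `g_true` even though the error is only available after `h`, which is the property the CE adaptation relies on.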
Although the technique presented here is general and can be used in digital receivers for different applications, the primary example in this paper is a receiver for coherent optical communications. State-of-the-art coherent optical receivers operate at symbol rates around 96 Giga-Baud (GBd) and require ADC sampling rates close to 150 GS/s and bandwidths of about 50 GHz. In the near future symbol rates will increase to 128-150 GBd or higher, requiring bandwidths in the range of 65-75 GHz and sampling rates in the 200-250 GS/s range. High-order QAM schemes (e.g., 64-QAM, 256-QAM and higher) will be deployed to increase spectral efficiency [45]. High-order modulation schemes increase the resolution and overall performance requirements on the ADC. The benefits of the proposed technique are experimentally verified using 64-QAM and 256-QAM schemes.

The rest of this paper is organized as follows. Section II lists the requirements of calibration techniques suitable for digital receivers. These requirements set this application apart from more generic applications. Section II also compares the technique proposed here with other state-of-the-art techniques in the light of said requirements. Section III presents a discrete-time model of the TI-ADC system in a Dual-Polarization (DP) optical coherent receiver. The error-backpropagation-based adaptive CE is introduced in Section IV. Simulation results are discussed in Section V, while the experimental evaluation is presented in Section VI. The hardware complexity of the proposed scheme is discussed in Section VII and conclusions are drawn in Section VIII.

The technique proposed in this paper meets all the above requirements. In the following we compare it to some of the state-of-the-art calibration techniques presented in the technical literature, with focus on communications receiver applications. The following is a broad categorization of the most important techniques described in the literature and their comparison with the one proposed in this paper:

• Group 1 (G1): Techniques based on the autocorrelation of the quantized signal [21], [22], [23], [24], [25]. These techniques are well suited to estimating the sampling time errors, but do not provide information on frequency response mismatches or mismatches affecting different TI-ADCs in the AFE of a QAM receiver. This group of techniques satisfies Criterion 1 but not Criteria 2, 3, and 4.

• Group 4 (G4): Techniques based on dither injection [33], [34], [35], [36]. Dither injection techniques are based on the addition of a known signal to the sample that is being quantized in order to estimate the calibration parameters of an individual TI-ADC. This group may meet Criteria 1 and 4 but not 2 and 3.

Table 1 summarizes the comparison of the above techniques with the one proposed in this paper on the basis of Criteria 1 through 4 listed above.

VOLUME 10, 2022

It is important to notice that communications applications of TI-ADCs enjoy an important advantage over more general applications. This advantage is the availability of the global optimality criterion referred to as Criterion 3 above, in other words, the maximization of the SNR at the slicer. In general, applications other than digital communications receivers lack a global criterion such as the slicer SNR, whose optimization can be exploited to compensate the impairments of the TI-ADC, or more generally, of the AFE. Therefore, TI-ADCs

where ω is the angular frequency, L is the fiber length,

where $^*$ denotes complex conjugate. Matrix $\mathbf{J}(\omega)$ is unitary (i.e., $\det(\mathbf{J}(\omega)) = |U(\omega)|^2 + |W(\omega)|^2 = 1,\ \forall\omega$) and models the effects of the PMD. Chromatic dispersion is modeled as:

where $\beta_2$ is related to the dispersion parameter $D = \frac{2\pi c}{\lambda^2}\beta_2$, with $c$ and $\lambda$ being the speed of light and the wavelength, respectively.¹

We highlight that a coherent receiver can compensate for PMD and CD impairments without noise enhancement or signal-to-noise ratio penalty [46].

The symbols $\hat{a}_k$ are assumed to be independent and identically distributed such that $E\{\hat{a}_m \hat{a}_k^*\} = \delta_{m-k}$, with $\delta_k$ being the discrete-time impulse function. We also define $\hat{s}^{(H)}(t)$ and $\hat{s}^{(V)}(t)$ as the complex signals at the receiver input for polarizations $H$ and $V$, respectively. Then, the noise-free complex electrical signals provided by the optical demodulator can be expressed

¹ Parameter $D$ is expressed in ps/(nm·km), representing the differential delay, or time spreading (in ps), for a source with a spectral width of 1 nm traveling on 1 km of fiber. It depends on the fiber type, and in the absence of equalization it would limit the error-free bit rate or the transmission distance.

The four electrical signals s (1)  respectively. Every path gain/attenuation is modeled by is the gain error.

The quantizer is modeled as additive white noise with uniform distribution since the resolution of the ADC is con-

The digitized high-frequency samples can be written as (see Appendix)

(36), and $q^{(i)}[n]$ is the quantization noise.

Errors and mismatches of the TI-ADC can be compensated by using digital finite impulse response (FIR) filters applied to each interleaved branch. In the case of a communication receiver, the digitized signal could be applied to a time-varying equalizer immediately following the TI-ADC (see [1] for more details). The practical implementation of this periodically time-varying equalizer is briefly addressed in Section IV-A, and in more detail in [3].
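As an illustration of a periodically time-varying FIR equalizer, the following minimal sketch applies a different set of taps to each of $M$ interleave phases. The indexing convention (the taps used for output sample `n` are selected by `n % M`), the choice $M = 2$, and the gain values are assumptions made for this example and are not taken from the paper's equations.

```python
def periodic_fir(taps, w):
    """M-periodic time-varying FIR.

    taps[m] holds the impulse response used at time indices n with n % M == m,
    so that y[n] = sum_l taps[n % M][l] * w[n - l] (zero initial state).
    """
    M = len(taps)
    out = []
    for n in range(len(w)):
        g = taps[n % M]
        out.append(sum(g[l] * w[n - l] for l in range(len(g)) if n - l >= 0))
    return out

# Hypothetical two-way interleaved ADC whose second interleave has a gain of 0.8.
gains = [1.0, 0.8]
ideal = [1.0] * 8                                   # constant test input
w = [gains[n % 2] * s for n, s in enumerate(ideal)] # samples with gain mismatch

# One tap set per interleave; the second one undoes the 0.8 gain.
taps = [[1.0, 0.0, 0.0], [1.25, 0.0, 0.0]]
y = periodic_fir(taps, w)                           # mismatch-free samples
```

With more than one nonzero tap per set, the same structure can also shape the frequency response of each interleave, which is what time skew and bandwidth mismatch compensation require.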

Similarly to what was done in previous works [1], [2], [3], [26], in the backpropagation-based architecture introduced in this paper we propose to adaptively compensate the TI-ADC mismatch, after the mitigation of the offset, using a filter with an $M$-periodic time-varying impulse response:

, $L_g$ is the number of taps of the compensation filters, and $w^{(i)}[n]$ is the DC offset-free signal given by

The adaptation algorithm of the CE as proposed in [1] or [2] cannot be implemented in coherent optical communication receivers. This is because of the presence of several signal pre-processing blocks placed between the CE and the slicers, such as the BCD or the MIMO FFE that compensates

where $l = 0, \cdots, L_g - 1$ and $n_0$ is an arbitrary time index

² The structure of the CE shown in Fig. 4 can be extended to include the compensation of the quadrature error of the optical demodulator. This will be addressed in future work.

³ Although the receiver DSP for wireline and wireless may include other algorithms, the technique presented here can be applied to them with minor modifications.
In high-speed optical communication applications, the use of parallel implementations is mandatory. Typically, a parallelism factor on the order of 128 or higher is adopted. Furthermore, given the number of interleaves of the TI-ADC $M$, the parallelism factor $P$ can be selected to be a multiple of $M$, i.e., $P = q \times M$ with $q$ an integer. In this way, the different time-multiplexed taps are located in fixed positions of the parallel implementation, and we do not incur significant additional complexity when compared to a filter with just one set of coefficients (see [2] for more details). The complexity of the resulting filter is similar to that of the I/Q-skew compensation filter already present in current coherent receivers [4]. Therefore, the typical skew correction filter can be replaced by the CE without adding significant penalties in area or power since the CE is also able to correct time skew.
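The reason why choosing $P$ as a multiple of $M$ keeps each tap set in a fixed position can be checked with a short sketch. It assumes that lane `p` of block `b` processes sample `n = b * P + p` and that the tap set is selected by `n % M`; the function names and the example values of `M` are hypothetical.

```python
def lane_interleave_indices(P, M, n_blocks):
    """For each parallel lane, collect the interleave indices it sees over time."""
    return [{(b * P + p) % M for b in range(n_blocks)} for p in range(P)]

def taps_are_fixed_per_lane(P, M, n_blocks=16):
    """True if every lane always needs the same one of the M tap sets."""
    return all(len(seen) == 1
               for seen in lane_interleave_indices(P, M, n_blocks))

fixed = taps_are_fixed_per_lane(128, 8)      # P = 16 * M: one tap set per lane
rotating = taps_are_fixed_per_lane(128, 6)   # P is not a multiple of M
```

When `P` is a multiple of `M`, `(b * P + p) % M` reduces to `p % M` for every block, so the coefficients of each lane can be hard-wired and no coefficient multiplexing is needed.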

The filter coefficients of the impulse response in (14) are adapted using the slicer error at the output of the receiver DSP block. Let $u^{(i)}_k$ be the equalized signal at the input of the slicer (see Fig. 4). The latter is a quantization device that makes the symbol decisions $\tilde{a}$

As usual (e.g., see [48]), in the analysis we assume that there are no decision errors,⁴ and thus we use $a$

Since the slicer operates at $1/T$ sampling rate, a subsampling of $T/T_s$ is needed after the receiver DSP block. Then, the total squared error at the slicer at time instant $k$ is defined as

Let $E\{\mathcal{E}_k\}$ be the MSE at the slicer with $E\{\cdot\}$ denoting the expectation operator. In this work we iteratively adapt the real coefficients of the CE defined by (14) by using the Least Mean Squares (LMS) algorithm, in order to minimize the MSE at the slicer:

$\mathbf{g}^{(i)}_{m,p}$ is the $L_g$-dimensional coefficient vector at the $p$-th iteration given by

where $\beta$ is the adaptation step, and $\nabla_{\mathbf{g}^{(i)}}$

of the $T/2$ receiver DSP block (see Fig. 5) as

(17) can be expressed as

Then, we can derive an all-digital compensation scheme using an adaptive CE with coefficients updated as

where $\mu = \alpha\beta$ is the adaptation step-size. Moreover, it is possible to estimate the DC offsets in the input samples, using the backpropagated error defined in (24), as follows

Since channel impairments change slowly over time, the coefficient updates given by (26) and (27) do not need to operate at full rate, and subsampling can be applied. The latter allows implementation complexity to be significantly reduced. Additional complexity reduction is enabled by: 1) strobing the algorithms once they have converged, and/or 2) implementing them in firmware in an embedded processor, typically available in coherent optical transceivers. Practical aspects of the hardware implementation will be discussed in Section VII.
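The slicer decision and the total squared error introduced at the beginning of this subsection can be sketched as follows. This is an illustrative sketch only: it assumes four real-valued slicer inputs (one per I/Q component and polarization), an 8-PAM alphabet per component as in 64-QAM, and an error defined as the slicer input minus the decision. The names `PAM8`, `slice_symbol` and `total_squared_error` are hypothetical.

```python
PAM8 = [-7, -5, -3, -1, 1, 3, 5, 7]   # per-component levels of 64-QAM

def slice_symbol(u, levels=PAM8):
    """Slicer: return the constellation level closest to the input u."""
    return min(levels, key=lambda a: abs(u - a))

def total_squared_error(u_k, levels=PAM8):
    """Sum over the slicer inputs of the squared decision error at instant k."""
    return sum((u - slice_symbol(u, levels)) ** 2 for u in u_k)

# Hypothetical equalized samples at the four slicer inputs at one instant k.
u_k = [0.9, -3.2, 6.5, -7.4]
decisions = [slice_symbol(u) for u in u_k]
E_k = total_squared_error(u_k)
```

Averaging `E_k` over many symbol instants gives an estimate of the slicer MSE that the adaptation seeks to minimize.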

In this section we address the convergence properties of the proposed calibration algorithm, using the traditional LMS algorithm as a reference. In particular, it is well known that convergence of the latter is not affected by local minima of the error surface where the gradient algorithm could get trapped, because the error surface is quadratic. The proposed system is equivalent to a traditional LMS adaptive filter [50] with the exception that the gradient is not cal-

where except for the leakage factor $(1 - \mu\zeta)$ the adaptation is equivalent to that of the traditional LMS.
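The role of a leakage factor $(1 - \mu\zeta)$ in an LMS update can be illustrated with a generic leaky LMS sketch. It is a plain system-identification example and not the CE update of this paper: the sign convention (error defined as filter output minus reference), the two-tap target response, and the values of `mu` and `zeta` are assumptions.

```python
import random

def leaky_lms(x, d, n_taps, mu, zeta):
    """Leaky LMS: g <- (1 - mu*zeta) * g - mu * e[n] * x_window.

    With zeta = 0 this reduces to the traditional LMS update.
    """
    g = [0.0] * n_taps
    for n in range(n_taps - 1, len(x)):
        window = [x[n - l] for l in range(n_taps)]
        e = sum(gl * xl for gl, xl in zip(g, window)) - d[n]
        g = [(1.0 - mu * zeta) * gl - mu * e * xl
             for gl, xl in zip(g, window)]
    return g

random.seed(1)
x = [random.choice((-1.0, 1.0)) for _ in range(3000)]
g_true = [0.8, 0.3]   # hypothetical response to identify
d = [g_true[0] * x[n] + (g_true[1] * x[n - 1] if n >= 1 else 0.0)
     for n in range(len(x))]

g_plain = leaky_lms(x, d, 2, mu=0.05, zeta=0.0)    # traditional LMS
g_leaky = leaky_lms(x, d, 2, mu=0.05, zeta=0.05)   # LMS with leakage
```

In this example the leakage pulls the coefficients slightly toward zero, so `g_leaky` settles near a scaled-down version of `g_true`, while `g_plain` converges to `g_true`.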

In the following, we summarize the proposed error-backpropagation-based background calibration algorithm.

Since the technique operates in background, steps 2 through 10 run continuously to enable tracking of parameter variations caused by temperature, voltage, aging, etc. This happens even after the algorithm reaches convergence. However, since ultrafast compensation is usually not needed, steps 7 to 10 can be implemented in a highly subsampled hardware block. Consequently, the implementation complexity and power can be reduced.

In this section the proposed backpropagation-based mismatch compensation technique is tested using computer simulations. The simulation setup is shown in Fig. 6. The simulated parameters are summarized in Table 2.
An improvement in BER of one order of magnitude can be achieved with the proposed technique. In particular, notice that the serious impact on the receiver performance of the I/Q time skew values of Table 2 is essentially eliminated by the proposed CE with $L_g = 7$ taps.

BER histograms for the receiver with and without the CE in the presence of the combined effects are shown in Fig. 9. Results of 4000 cases with random gain errors, sampling phase errors, I/Q time skews, BW mismatches, and DC offsets, as defined in Table 2, are presented. Fig. 9 also depicts the performance of the CE with $L_g = 13$ taps. Without the CE, a severe degradation of the receiver performance as a consequence of the combined effects of the TI-ADC mismatches is observed. However, note that the CE is able to compensate the impact of all combined impairments, improving the BER in some cases by almost 100 times. Moreover, note that a slight performance improvement can be achieved by increasing the number of taps $L_g$ from 7 to 13.

In multi-gigabit transceivers, the impairments of the AFE and TI-ADCs change very slowly over time, as mentioned in Section IV-C. Hence, decimation can be applied since the coefficient updates given by (26) and (27) do not need to be made at full rate. In ultra-high-speed transceiver implementations (e.g., for optical coherent communication), block processing and frequency domain equalization based on the Fast Fourier Transform (FFT) are widely used [4]. Therefore, we propose to update the CE performing block decimation over the error samples. The procedure is detailed as follows. Let $N$ be the block size in samples to be used for implementing the EBP. Define $D_B$ as the block decimation factor. In this way, the CE is updated using only one block of $N$ consecutive samples of the oversampled slicer error (25) every $D_B$ blocks,

FIGURE 9. Histogram of the BER for 4000 random cases with combined impairments as defined in Table 2. Reference BER of $\sim 1 \times 10^{-3}$. Top: without CE. Middle: CE w/ $L_g = 7$ taps. Bottom: CE w/ $L_g = 13$ taps.
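The block decimation described above, where only one block of $N$ consecutive error samples out of every $D_B$ blocks is used, can be sketched as follows. The function name and the small example values of `N` and `D_B` are assumptions for illustration; trailing samples that do not fill a complete block are ignored in this sketch.

```python
def decimated_blocks(samples, N, D_B):
    """Yield one block of N consecutive samples out of every D_B blocks."""
    n_blocks = len(samples) // N          # number of complete blocks
    for b in range(0, n_blocks, D_B):     # keep blocks 0, D_B, 2*D_B, ...
        yield samples[b * N:(b + 1) * N]

# Hypothetical error sequence of 20 samples, N = 4, D_B = 2.
error = list(range(20))
kept = list(decimated_blocks(error, N=4, D_B=2))
```

Only the kept blocks would be passed to the EBP and to the coefficient update, so the processing rate of that hardware drops by a factor of `D_B`.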

We demonstrate the benefits of our proposal using a digital

which is implemented on a host computer. Multiple Pseudo-Random Binary Sequences (PRBSs) with configurable length and seed are generated in the FPGA. The amplitude of the symbols and the Additive White Gaussian Noise (AWGN) can be set through the coefficients $G_S$ and $G_N$, respectively. Thus, we are able to evaluate different SNR scenarios. The symbol with added noise is sent to a commercial 16-bit Digital-to-Analog Converter (DAC) board [56] using an LVDS interface. The DAC synthesizes the samples at $1/T = 1$ GS/s. This sampling rate is adopted due to limitations on the FPGA and DAC clocks. The communication channel is modeled as a low-pass filter with a −3 dB cut-off frequency of 650 MHz [57]. Figure 12 shows the measured eye diagrams at the input and output of the channel with Binary Phase Shift Keying (BPSK) modulation. Notice that significant ISI is added by the channel. Although not explicitly shown, the impact of the ISI is even more significant for the higher-order modulations used in the experiments, such as 8-PAM/64-QAM and 16-PAM/256-QAM. This ISI is an important part of the experiment since it enables the verification of the backpropagation technique, as discussed later in this section. On the receiver side, the signal is acquired by the TI-ADC described in [58], operating at a sampling rate of 2 GS/s (i.e., an oversampling ratio of $T/T_s = 2$ is used in the DSP blocks). The clocks for both DAC and ADC are generated from a single 10 MHz clock reference. More details of the experimental platform as well as the fabricated TI-ADC can be found in [58], [59].
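A PRBS source with a configurable seed, and the SNR obtained from the two gain coefficients, can be sketched as follows. The PRBS length is not specified here, so the PRBS7 polynomial $x^7 + x^6 + 1$ is an assumption, as are the function names and the normalization in `snr_db` (unit-power symbols and unit-variance noise before scaling by $G_S$ and $G_N$).

```python
import math

def prbs7(seed, n_bits):
    """PRBS7 generator (x^7 + x^6 + 1) implemented as a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("the seed must have a nonzero 7-bit value")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out

def snr_db(G_S, G_N):
    """SNR in dB for unit-power symbols and unit-variance noise (assumed)."""
    return 20.0 * math.log10(G_S / G_N)

bits = prbs7(seed=1, n_bits=254)   # two periods of the sequence
snr = snr_db(10.0, 1.0)
```

Changing the seed only shifts the starting point within the 127-bit period, which is a convenient way to obtain several sequences from the same generator structure.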

As explained before, the available experimental setup has one TI-ADC. Therefore, a suitable signal for the coherent receiver has to be assembled by combining four independent measurements. This is done by collecting one set of samples for each

The comparison of the BER curves for the receiver with and without the proposed technique is shown in Fig. 14.

The performance of the receiver is severely affected when the TI-ADC mismatch is not mitigated. A sampling phase error of 4 % has been set for Fig. 14(a), whereas 1 % is set for Fig. 14

although the mismatch in this case is much smaller than in the previous case. After enabling the proposed technique, the performance of the receiver is restored to almost replicate the case without mismatch. This result indicates that our proposal is able to nearly eliminate the receiver penalty introduced by the mismatches of the TI-ADC.

⁶ Mismatches exercised in our experiments are, as a percentage of the symbol rate or the sampling period of the receiver AFE, comparable to those observed in more advanced technology nodes and recently reported coherent optical transceivers (see [60]). In other words, a more advanced technology enables at the same time: (1) higher sampling rates (96 GS/s in [60]), and (2) smaller impairments (when measured in absolute units, such as picoseconds). The net result is that the relative impact on the receiver performance in our experiments is comparable to that experienced in more advanced technology nodes and state-of-the-art coherent transceivers.

The spectrum comparison for a 972 MHz sinusoidal input is shown in Fig. 15. Samples from one of the emulated channels are collected before and after running the technique on the communication setup of Fig. 11. Since the CE would not adapt properly with a sinusoidal signal, for this experiment the CE is frozen after being exercised with pseudo-random 64-QAM signals. Mismatches of ±4 %T in the sampling phase and ±5 % of gain with respect to unity are applied. The input tone is identified with a marker, and spurs from mismatches in the TI-ADC are marked with a ×. Notice that the spurs caused by the mismatches among the interleaves seriously degrade both the Signal-to-Noise-plus-Distortion Ratio (SNDR) and Spurious-Free Dynamic Range (SFDR) to 19.4 dBFS and 21.9 dBFS, respectively. After applying the proposed technique, the performance of the TI-ADC is boosted to 39 dBFS and 46.6 dBFS, for SNDR and SFDR, respectively.
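The location of the mismatch spurs in such a spectrum can be estimated with a short sketch based on the standard result that gain and timing mismatches in an $M$-way interleaved ADC create images of the input tone at $k f_s/M \pm f_{in}$, folded into the first Nyquist zone. The number of interleaves used below ($M = 4$) is a hypothetical value chosen for the example; the 972 MHz tone and the 2 GS/s sampling rate are those of the experiment.

```python
def fold(f, fs):
    """Alias a frequency into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def mismatch_spurs(f_in, fs, M):
    """Frequencies of the gain/timing mismatch spurs of an M-way TI-ADC."""
    spurs = set()
    for k in range(1, M):
        for f in (k * fs / M + f_in, k * fs / M - f_in):
            spurs.add(round(fold(f, fs), 6))
    spurs.discard(round(fold(f_in, fs), 6))   # the input tone is not a spur
    return sorted(spurs)

# Frequencies in MHz; M = 4 is an assumed number of interleaves.
spurs_mhz = mismatch_spurs(f_in=972.0, fs=2000.0, M=4)
```

Offset mismatches produce additional tones at multiples of $f_s/M$ that do not depend on the input frequency; they are not included in this sketch.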

A comparison to other calibration techniques is reported in Table 3. The technique proposed in this paper is the only one that meets all four criteria established in Section II. As stated there, these criteria are important for digital communication receivers. In addition, our technique is the one that provides the largest improvement when the calibrated high-frequency SNDR (HF SNDR) is compared with the uncalibrated one. The best way to compare performance of calibration techniques in the context of a digital communication receiver application is on the basis of the receiver BERs before and after calibration (see Figure 14). In the publications referenced in Table 3, this data is either not provided,

In these architectures, the parallelism factor $P$ can be chosen to be a multiple of the ADC parallelism factor $M$, i.e.,

Fig. 16). We highlight that the resulting filter is equivalent in complexity to the I/Q-skew compensation filter already present in current coherent receivers [4]. Since the proposed scheme also corrects skew, the classical skew correction filter can be replaced by the proposed CE without incurring significant additional area or power.

⁷ The focus of our paper is the calibration technique proposed, and not the absolute performance of the ADC. The latter may be limited by effects that cannot be calibrated, such as random jitter, kT/C noise, thermal noise, etc.

A straightforward implementation of error backpropagation must include a processing stage for each DSP block located between the ADCs and the slicers. Typically these blocks comprise the BCD, FFE, TR interpolators, and the FCR. All these blocks can be mathematically modeled as a sub-case of the generic receiver DSP block used in Section IV-C and the Appendix. The EBP block is algorithmically equivalent to its corresponding DSP block with the only difference that the coefficients are transposed. Therefore, in the worst case, the EBP complexity would be similar to that of the receiver DSP block.⁸ Since doubling power and area consumption is not acceptable for commercial applications, important simplifications must be provided.
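The statement that the EBP block mirrors its DSP block with transposed coefficients can be illustrated with a real-valued MIMO FIR and its adjoint. This is a sketch under simplifying assumptions: real coefficients (for complex coefficients the transpose would become a conjugate transpose), a 2×2 filter with two taps per branch, and hypothetical function names. The final comparison checks the defining property of the adjoint, namely that the inner product of the forward output with an error equals the inner product of the input with the backpropagated error.

```python
import random

def mimo_fir(H, x):
    """Forward block: y_i[n] = sum_j sum_k H[i][j][k] * x_j[n-k]."""
    n_out, n_in, N = len(H), len(x), len(x[0])
    return [[sum(H[i][j][k] * x[j][n - k]
                 for j in range(n_in)
                 for k in range(len(H[i][j])) if n - k >= 0)
             for n in range(N)]
            for i in range(n_out)]

def mimo_fir_backprop(H, e):
    """EBP block: eb_j[m] = sum_i sum_k H[i][j][k] * e_i[m+k].

    The coefficient matrix is used transposed (roles of i and j swapped)
    and each impulse response is applied time-reversed.
    """
    n_out, n_in, N = len(H), len(H[0]), len(e[0])
    return [[sum(H[i][j][k] * e[i][m + k]
                 for i in range(n_out)
                 for k in range(len(H[i][j])) if m + k < N)
             for m in range(N)]
            for j in range(n_in)]

def inner(a, b):
    """Inner product of two multi-channel sequences."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(2)
H = [[[1.0, 0.5], [0.2, -0.1]],
     [[-0.3, 0.4], [0.8, 0.1]]]           # hypothetical 2x2 MIMO FIR taps
x = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(2)]
e = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(2)]

lhs = inner(mimo_fir(H, x), e)            # <H x, e>
rhs = inner(x, mimo_fir_backprop(H, e))   # <x, H^T e>
```

Both blocks use the same number of multiplications per sample, which is consistent with the worst-case complexity estimate given in the text.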

Considering that AFE and TI-ADC impairments change very slowly over time in multi-gigabit optical coherent transceivers, the coefficient updates given by (26) and (27)

Typically, a serial implementation requires that hardware such as multipliers be reused with variable numerical values of coefficients, whereas in a parallel implementation hardware can be optimized for fixed coefficient values. This results in a somewhat higher power per operation in a serial implementation. Nevertheless, the drastic power reduction achieved through decimation greatly outweighs this effect.

Equivalent discrete-time model of the analog front-end and TI-ADC system with impairments for the signal component given by (35) (i.e., without DC offsets and quantization noise) for signal $s^{(i)}(t)$ with $i = 1, 2, 3, 4$.

Next we review the model of the TI-ADC with impairments used in this paper (see Fig. 3). The effects of the sampling time errors $\delta$

The total impulse response of the $m$-th interleaved channel is defined as

Therefore, it can be shown that the digitized high-frequency signal can be expressed as:

(34).¹⁰ We highlight that the impact of both the AFE impairments and the $M$-channel TI-ADC mismatches is included in (35). Finally, the digitized high-frequency sequence is obtained by replacing (35) in (32)