A CNRZ-7 Based Wireline Transceiver With High-Bandwidth-Density, Low-Power for D2D Communication

A novel high-speed transceiver based on 7-bit correlated non-return-to-zero (CNRZ-7) signaling, with high bandwidth density (bandwidth per unit length) and low power for die-to-die (D2D) communication, is proposed. To improve on the SNR and bandwidth of CNRZ-5 in D2D communication, a CNRZ-7 transmitter matrix and receiver matrix are first proposed, derived from the Walsh-Hadamard (W-H) transform and its inverse. In addition, to reduce the power consumption of the transmitter, the encoding driver based on the CNRZ-7 transmitter matrix is designed with source-series-terminated (SST) drivers. To further improve the SNR of the receiver, the decoding circuit based on the CNRZ-7 receiver matrix is designed with special multi-input comparators (MICs) that contain equalizer circuits. The transceiver is designed in a 28 nm CMOS technology and the core area is 0.66 mm². Post-layout simulation results show that the transceiver can operate at 280 Gb/s, with a data rate of 35 Gb/s/wire. The worst-case width of the receiver eye diagrams is 0.45 UI when the transceiver operates over a 50 mm PCB channel with 10 dB insertion loss at 20 GHz, and the total BER is less than 1E-15. The power consumption is 1.1 pJ/b at the normal corner.

The parallel interface of single-ended transmission has weak equalization capability. In standard packages, the total bandwidth of the D2D communication port is limited by the channel transmission distance [9]. As the signal frequency rises, the signal edges become faster, and the bandwidth of the single-ended transmission line gradually fails to meet the needs of D2D communication, so the traditional D2D communication port adopts differential parallel NRZ transmission. However, the pin efficiency of a differential parallel interface is half that of a single-ended parallel interface, so its bandwidth density is much lower than that of single-ended data transmission. Because the aggregate bandwidth is raised by increasing the number of channels, the pin efficiency of the chip becomes the primary reason for the low bandwidth density of the D2D communication port.

In advanced packaging, the advantages of single-ended parallel transmission are high bandwidth density, low power consumption and low error rate. In 2019, Intel introduced the AIB 1.2 protocol. As shown in Table 1, the protocol uses EMIB advanced packaging technology to achieve single-ended data transmission of up to 24 channels with a maximum single-channel rate of 2 Gb/s and a bandwidth density of up to 150 Gb/s/mm [10]. In 2020, TSMC and ARM collaborated to launch the LIPINCON protocol. The protocol enables single-ended data transmission in a silicon interposer using CoWoS packaging technology, with a maximum single-channel rate of 8 Gb/s and a bandwidth density of up to 600 Gb/s/mm [11]. In February 2022, Intel, TSMC, Samsung, AMD, ARM, and Qualcomm jointly launched the UCIe protocol. The protocol uses the EMIB and CoWoS advanced packaging technologies to achieve single-ended data transmission of up to 64 channels in silicon interposers, with single-channel rates of 32 Gb/s and a bandwidth density of up to 1317 Gb/s/mm [12]. However, as the data rate increases, the interference immunity of single-ended signals becomes weaker, so signal transmission depends more heavily on advanced packaging technology, resulting in a high cost. Therefore, applying single-ended signaling to standard packaging for D2D communication remains a technical challenge.

VOLUME 10, 2022

To effectively increase the bandwidth density, a high-pin-efficiency D2D communication interface technology with chord coding has been used [13], [14]. This technology takes [...] distribute 5 bits of energy to each channel, ensuring that the binary eyes decoded by the receiver are the same (in eye height and eye width). This helps the bathtub curves of the five eye diagrams to be similar, making the worst eye-opening width wider than in [17]. However, compared with ENRZ, the CNRZ-5 encoded signal has an asymmetric waveform at the transmitter output. This results in a low SNR of the received signal and makes the data rate difficult to increase.

In order to further improve the SNR and the bandwidth [...]

As mentioned above, due to higher bandwidth requirements, D2D communication uses differential signal transmission. However, the pin efficiency of differential parallel NRZ transmission is half that of single-ended transmission, so its inherent properties make it extremely difficult to increase the bandwidth density of differential parallel NRZ transmission. Chord coding solves this problem well.

To further explain the pin efficiency of signal transmission and chord coding, we introduce the Walsh function, which is composed of square-wave signals and is very convenient for computer calculation [19]. It takes the form of (1), where $R(k+1,t)$ is an arbitrary Rademacher function, $g(i)$ is the Gray code of $i$, $g(i)_k$ is the $k$th digit of this Gray code, which is either 1 or -1, and $P$ is a positive integer. This transformation has been applied in digital circuits, such as switched-capacitor realizations [20] and DSP [21], but in this work we apply it to the analog design of a chord-coding circuit. When $P=1$, (2) and (3) are obtained, and sampling them yields the second-order orthogonal matrix (4).
As shown in matrix (4), the Walsh matrix is orthogonal and its first column is all 1s. We express differential signal transmission as this matrix transformation. Fig.2 shows that the common-mode voltage $V_{cm}$ corresponds to the first column of the matrix, and the signal is encoded above and below the common-mode point. Thus, differential signal transmission is the simplest form of chord coding, and it is a way to solve the SSN problem, because differential transmission can eliminate common-mode noise (CMN) and greatly improve signal quality.

It can be observed that, for a given bandwidth, the pin efficiency is only 50% relative to single-ended signal transmission. Therefore, improving pin efficiency while maintaining the advantages of differential signal transmission at high speed has become a key technical issue.

[...] interval. Each 2nd-order Hadamard matrix corresponds to a discrete Walsh function. It can be summarized as equation (10).
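
The link between the second-order matrix and differential signaling can be sketched in a few lines. The sketch below assumes the common Sylvester construction of Walsh-Hadamard matrices, which may be ordered differently from the paper's equations (1) to (10), and the function names and millivolt values are illustrative only. It builds the matrix, checks the two properties mentioned above (orthogonality and an all-ones first column), and shows that multiplying the second-order matrix by a common-mode value and a data value gives a differential pair.

```python
def hadamard(order):
    """Sylvester construction of a Walsh-Hadamard matrix (order = power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # H_2n = [[H_n, H_n], [H_n, -H_n]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def is_orthogonal(h):
    """True if H * H^T equals order * I, i.e. all rows are mutually orthogonal."""
    n = len(h)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(h[i], h[j]))
            if dot != (n if i == j else 0):
                return False
    return True


def differential_encode(vcm_mv, data_mv):
    """Wire voltages = H_2 applied to [common mode, data] (illustrative values)."""
    h2 = hadamard(2)
    return [row[0] * vcm_mv + row[1] * data_mv for row in h2]


def pin_efficiency(order):
    """The first column carries no data, so order - 1 bits use order wires."""
    return (order - 1) / order


if __name__ == "__main__":
    print(hadamard(2))                    # [[1, 1], [1, -1]]
    print(is_orthogonal(hadamard(8)))     # True
    print(differential_encode(500, 100))  # [600, 400]
    print(pin_efficiency(2))              # 0.5
```

With order 2, one data bit occupies two wires, which is the 50% pin efficiency noted above. Higher orders leave the same single common-mode column unused while carrying more bits.
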
The matrix was proposed in 2013 and applied to ENRZ [...] of the four transmitter terminals are identical. The coding method can work at higher rates and higher insertion loss (20 dB). However, as can be seen from Fig.5, when the CNRZ-5 coding method is used, the encoding matrix and decoding matrix are asymmetrical, and the multi-level signal at the transmitting end appears as two kinds of eye diagrams with unequal heights. The SNR of the CNRZ-5 transmitted signal is lower than that of ENRZ encoding, which limits the rate increase of D2D communication. Although the CNRZ-5 transmission method has higher pin efficiency, it sacrifices some SNR.

According to equation (12), the 8th-order W-H matrix $H_H$ is derived. D2D communication that transmits 7 bits of data over 8 channels is designed, which improves the pin efficiency to 87.5%. The eye diagram after encoding and decoding is shown in Fig.6. This scheme avoids the unequal-height eye diagrams of CNRZ-5, but the problem of low SNR remains even though pin efficiency improves, and as the signal rate increases, the BER also increases. Therefore, to solve this problem, the matrix A is used for the row transformation (13), which significantly improves the SNR without losing signal information.

[...] is composed of −1, 0, and 1. In the encoding process shown in Fig.7, the 7-bit signal is allocated to the eight corresponding channels according to the transmitter matrix. As shown in the post-simulation result in Fig.21, the number of levels transmitted on each channel is four. Using the transmitter matrix in Fig.7, a 7-bit signal can be transmitted on 8 wires. Compared with the CNRZ-5 transmission scheme (5 bits/6 wires), it not only alleviates the low-SNR problem of CNRZ-5, but also further improves pin efficiency.
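
The idea of sending 7 bits over 8 wires can be illustrated with the plain 8th-order Walsh-Hadamard matrix. This is a minimal sketch and not the paper's transmitter matrix: it assumes the Sylvester ordering and skips the row transformation (13), so it produces more levels per wire than the four reported for CNRZ-7. The names `encode` and `decode` are hypothetical. It does show the two properties the scheme relies on: every bit is recovered by correlating the wires with one column, and a common offset on all wires cancels out.

```python
def hadamard(order):
    """Sylvester construction of a Walsh-Hadamard matrix (order = power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


H8 = hadamard(8)
DATA_COLUMNS = range(1, 8)  # column 0 is all ones (common mode) and carries no data


def encode(bits):
    """Map 7 bits (0/1) onto 8 wire levels using columns 1..7 of H8."""
    if len(bits) != 7:
        raise ValueError("expected exactly 7 bits")
    symbols = [2 * b - 1 for b in bits]  # 0/1 -> -1/+1
    return [sum(H8[w][k] * symbols[k - 1] for k in DATA_COLUMNS) for w in range(8)]


def decode(wires):
    """Correlate the wires with each data column; the sign of the result is the bit."""
    bits = []
    for k in DATA_COLUMNS:
        corr = sum(H8[w][k] * wires[w] for w in range(8))  # equals 8 * symbol
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1]
    wires = encode(bits)
    print(sum(wires))                         # 0: the wire levels are balanced
    print(decode(wires) == bits)              # True
    # A common-mode offset on every wire does not change the decoded bits.
    print(decode([w + 3 for w in wires]) == bits)  # True
```

Each correlation in `decode` plays the role that one multi-input comparator plays in hardware: a signed sum of wire voltages followed by a binary decision.
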
To further illustrate the advantages of CNRZ-7, a factor Q is introduced, which represents the SNR parameter of the eye diagram [22]. The relationship between Q and SNR is given by (14), and Q is calculated by (15). $P_t$ represents the average value of the logic-high level of the signal and $P_b$ represents the average value of the logic-low level. $\sigma_1$ is the rms value of the logic-high-level noise and $\sigma_0$ is the rms value of the logic-low-level noise; their sum is the rms value of the signal noise. The Q factor is the ratio of the signal amplitude to the noise amplitude.

We selected four kinds of signals (NRZ, ENRZ, CNRZ-5, and CNRZ-7) for comparison and analyzed the Q factor of the worst eye at the transmitting end. As shown in Fig.8(a), the simulation results indicate that:

(1) As the Nyquist frequency ($F_N$) increases, the overall SNR of the signal decreases. The SNR of CNRZ-7 is comparable to that of ENRZ.

(3) At the same $F_N$, the total bandwidth of CNRZ-7 is 7 times that of NRZ, 2.3 times that of ENRZ, and 1.4 times that of CNRZ-5.
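
The ratios in item (3) follow from the number of bits each scheme carries per symbol, and the pin efficiencies follow from bits per wire. The sketch below assumes NRZ here means one bit on one differential pair, and that ENRZ carries 3 bits on 4 wires (consistent with the 2.3 ratio); the dictionary and function names are illustrative.

```python
from fractions import Fraction

# (bits per symbol, wires) for each scheme; the NRZ and ENRZ entries are assumptions.
SCHEMES = {
    "NRZ": (1, 2),     # one bit on one differential pair
    "ENRZ": (3, 4),
    "CNRZ-5": (5, 6),
    "CNRZ-7": (7, 8),
}


def pin_efficiency(scheme):
    """Bits carried per wire, as an exact fraction."""
    bits, wires = SCHEMES[scheme]
    return Fraction(bits, wires)


def bandwidth_ratio(scheme, reference):
    """Total bandwidth of `scheme` relative to `reference` at the same symbol rate."""
    return Fraction(SCHEMES[scheme][0], SCHEMES[reference][0])


if __name__ == "__main__":
    for name in SCHEMES:
        print(f"{name}: pin efficiency {float(pin_efficiency(name)):.1%}")
    for ref in ("NRZ", "ENRZ", "CNRZ-5"):
        print(f"CNRZ-7 vs {ref}: {float(bandwidth_ratio('CNRZ-7', ref)):.1f}x")
```

Under these assumptions the script reproduces the 7x, 2.3x and 1.4x figures, and the 50% and 87.5% pin efficiencies quoted for differential NRZ and CNRZ-7.
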

The SNR quality at the transmitter determines the quality of the decoded binary eye diagram at the receiver. Fig.8(b) shows the receiver eye-opening widths of the four signals when the BER is less than 1E-15 under different insertion losses at $F_N$. It can be seen that at 10 dB@20 GHz, CNRZ-7 still maintains an eye width of 0.45 UI when the BER is less than 1E-15.
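
The Q factor used in this comparison can be estimated from sampled eye levels. This is a minimal sketch based on the verbal definition given earlier (signal amplitude $P_t - P_b$ divided by the summed rms noise $\sigma_1 + \sigma_0$); equation (15) is not reproduced here, so the exact form is an assumption, and the sample values and the function name `q_factor` are hypothetical.

```python
from statistics import mean, pstdev


def q_factor(high_samples, low_samples):
    """Q = (P_t - P_b) / (sigma_1 + sigma_0), estimated from eye samples.

    high_samples / low_samples are voltages taken at the sampling instant
    for logic-high and logic-low symbols respectively.
    """
    p_t = mean(high_samples)        # average logic-high level
    p_b = mean(low_samples)         # average logic-low level
    sigma_1 = pstdev(high_samples)  # rms noise around the high level
    sigma_0 = pstdev(low_samples)   # rms noise around the low level
    noise = sigma_1 + sigma_0
    if noise == 0:
        raise ValueError("noise is zero; Q is unbounded")
    return (p_t - p_b) / noise


if __name__ == "__main__":
    # Hypothetical samples in millivolts.
    high = [290, 310]
    low = [-10, 10]
    print(q_factor(high, low))  # 15.0
```

A larger Q means the two levels are further apart relative to their noise, which is why the worst eye of each scheme is the one compared.
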

Therefore, CNRZ-7 not only addresses the low-SNR problem of CNRZ-5, but also further improves pin efficiency and bandwidth.

The architecture of the entire system is shown in Fig.9. The overall structure includes two transmitters, two receivers and a shared clock path. One of the transmitters com- [...] drivers. The characteristic impedance $Z_0$ of this channel is 50 Ω. According to the impedance matching principle, the output resistance of the driver should also be 50 Ω. The output resistance of the driver is given by (16).
To ensure impedance matching and prevent reflections from affecting the signal, four pairs of NMOS and PMOS transistors are introduced, using the resistance characteristics of the MOS transistor to form a matching impedance circuit. The four pairs of digitally controlled NMOS and PMOS transistors and the SST are used for matching and adjustment. The resulting variable resistance is controlled by a pair of complementary signals R<0:3> and R_<0:3>. The logical expression for the SST circuit is given by (17), where A (A=1/3) is the amplification factor of the drive circuit. The links $W_1$ to $W_7$ are given by equations (18) to (24).

The eight SST circuits are used for encoding and driving, placing the encoded signal on eight lines. The receiving end also includes seven decoding circuits (MICs), as shown in Fig.12, which use a differential structure to decode the encoded 4-level signal into two levels, as shown in [...]

$$V_{OUT,5} = [w_6 - w_7] \times g_m R \tag{31}$$

Fig.13(a) shows an improvement on the traditional equalizer, whose transfer function is (32). Fig.14 shows the channel S12 parameters used for the simulation in this study. The parameters of this channel are extracted from a real 50 mm long organic substrate channel. At the 20 GHz Nyquist [...] 20 GHz Nyquist sampling frequency.
By changing the W/L of the input PMOS in Fig.13(a), the MIC is formed as shown in Fig.13(b). The circuit can not only decode the binary signal, but also perform equalization.

The RX-PLL shown in Fig.15 [...] and V5, V2 and V6, V3 and V7 form four differential pairs.

FIGURE 16. TEM-CODE block: binary code ($P_3P_2P_1A_5A_4A_3A_2A_1$) to thermometer code. (a) Lower five bits encoded to thermometer code; (b) upper three bits encoded to P<2:0>; (c) phase variation design.

Fig.17(a) shows the specific circuit structure of the PI. R1 and R2 are load resistors with equal resistance. Below each differential-pair transistor is a bank of equal-value parallel current sources controlled by 32 switches. The PI is controlled by the 8-bit digital output of the CDA. P1, P2, and P3 form the 3-bit phase control code, and Gray coding is used to avoid races and hazards. The lower five bits are used to generate BIT0-BIT31, the thermometer code that controls the phase. BIT0 is the lowest bit of the control code and is used to fine-tune the clock phase so that edge information is detected effectively; the output has good linearity, as shown in Fig.17.
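
The split of the 8-bit PI control word into a Gray-coded 3-bit part and a 32-bit thermometer part can be sketched as follows. The exact bit conventions of the TEM-CODE block are not given in the text, so this sketch assumes a standard reflected Gray code and a thermometer code in which BITi is set when i is below the 5-bit value; the function names are hypothetical.

```python
def split_code(code):
    """Split an 8-bit control word into (upper 3 bits, lower 5 bits)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("control word must fit in 8 bits")
    return code >> 5, code & 0x1F


def to_gray(value):
    """Standard reflected binary Gray code."""
    return value ^ (value >> 1)


def to_thermometer(value, width=32):
    """Thermometer code as a list [BIT0, ..., BIT31]; assumes BITi = 1 for i < value."""
    if not 0 <= value <= width:
        raise ValueError("value out of range for thermometer width")
    return [1 if i < value else 0 for i in range(width)]


def hamming_distance(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    upper, lower = split_code(0b10100011)
    print(upper, lower)                  # 5 3
    print(format(to_gray(upper), "03b")) # 111
    print(sum(to_thermometer(lower)))    # 3 current-source switches enabled
    # Adjacent Gray codes differ in exactly one bit, so only one select line toggles.
    print([hamming_distance(to_gray(i), to_gray(i + 1)) for i in range(7)])
```

Both encodings change a single bit per step, which is the property that avoids the glitches a plain binary code would cause when several bits switch at once.
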

At the normal corner, the simulated DNL and INL curves for control codes from 0 to 256 (0° to 360°) are shown in Fig.18. The maximum DNL is 0.54 LSB and the maximum INL is 0.68 LSB.
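
DNL and INL figures of this kind are typically derived from the phase-versus-code curve. The sketch below assumes an ideal step of 360°/256 and an INL measured against the ideal line through the first point; the text does not state which definition was used for Fig.18, and the function name and the perturbed test curve are illustrative.

```python
def dnl_inl(phases_deg, full_scale_deg=360.0):
    """Return (dnl, inl) in LSB for a measured phase-versus-code curve.

    phases_deg[k] is the output phase at control code k; the ideal step
    (1 LSB) is full_scale_deg divided by the number of steps.
    """
    steps = len(phases_deg) - 1
    if steps < 1:
        raise ValueError("need at least two points")
    lsb = full_scale_deg / steps
    # DNL: deviation of each actual step from one ideal LSB.
    dnl = [(phases_deg[k + 1] - phases_deg[k]) / lsb - 1.0 for k in range(steps)]
    # INL: deviation of each point from the ideal straight line.
    inl = [(phases_deg[k] - phases_deg[0]) / lsb - k for k in range(steps + 1)]
    return dnl, inl


if __name__ == "__main__":
    lsb = 360.0 / 256
    curve = [k * lsb for k in range(257)]  # ideal interpolator
    curve[10] += 0.5 * lsb                 # hypothetical error at code 10
    dnl, inl = dnl_inl(curve)
    print(max(abs(d) for d in dnl))  # 0.5
    print(max(abs(i) for i in inl))  # 0.5
```
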

The transceiver can work over a channel length of 50 mm for D2D communication. Fig.19 shows the circuit layout (TX and RX) of the CNRZ-7 transceiver system, whose total bandwidth is 560 Gb/s. The RXs are located in the left half of the layout and the TXs in the right half. There are 18 PADs in the transceiver for data and clock. Two CNRZ-7 transceivers occupy 16 channels and share a common pair of differential clock channels, as shown in Fig.20.

As shown in Fig.20, the clock adopts a forwarded-clock structure, and a pair of differential clocks is located in the middle. The top and bottom PADs are used for other signals and power. The transceiver system uses a 9-metal-layer (9 ML) 28 nm CMOS technology and occupies an area of 0.66 mm × 1 mm.

Fig.21 shows the eye diagrams ($W_0$ to $W_7$) from the transmitter. The simulation results show that the voltage swing margin is 302 mV. The minimum eye height is 99 mV and the maximum is 102 mV, as shown in Fig.22. The eye linearity is 1.03 from Equation (34) [23].

Table 3 compares the performance of this work with previous work. The area is reduced relative to [17], [18], [24], and the power consumption is slightly lower than in [24]. This work increases the bandwidth density up to 848 Gb/s/mm, which can support an effective throughput of 35 Gb/s/wire for D2D communication. Compared to [17], [18], CNRZ-7 exhibits greater advantages in bandwidth density. With better encoding, it can support a higher insertion loss (10 dB) and lower BER.

Fig.25(a) shows the BER for 7 bits of data without the layout parasitics. The worst eye opening is 14 ps (0.56 UI) at BER = 1E-15. However, due to the limited simulation time and data quantity, the bathtub curve can only be simulated down to 1E-11 after extracting the layout parasitics, as shown in Fig.25(b). The worst eye opening in Fig.25(b) is 11.2 ps (0.45 UI) at BER = 1E-15, obtained by fitting and extrapolating with reference to Fig.25(a).
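
The headline numbers quoted above are mutually consistent, which a short calculation makes explicit. The sketch assumes a symbol rate of twice the 20 GHz Nyquist frequency, a bandwidth density taken per 0.66 mm of die edge, and an eye linearity equal to the ratio of the largest to the smallest eye height; Equation (34) is not reproduced here, so that last definition in particular is an assumption. All constant names are illustrative.

```python
NYQUIST_GHZ = 20.0        # Nyquist frequency of the channel characterization
BITS_PER_SYMBOL = 7       # CNRZ-7 carries 7 bits per symbol
WIRES_PER_TRANSCEIVER = 8
TRANSCEIVERS = 2
EDGE_MM = 0.66            # assumed die-edge length used for bandwidth density

symbol_rate_gbaud = 2 * NYQUIST_GHZ                    # 40 GBaud
ui_ps = 1000.0 / symbol_rate_gbaud                     # 25 ps unit interval
rate_gbps = BITS_PER_SYMBOL * symbol_rate_gbaud        # 280 Gb/s per transceiver
per_wire_gbps = rate_gbps / WIRES_PER_TRANSCEIVER      # 35 Gb/s/wire
total_gbps = TRANSCEIVERS * rate_gbps                  # 560 Gb/s
density_gbps_per_mm = total_gbps / EDGE_MM             # about 848 Gb/s/mm

eye_pre_layout_ui = 14.0 / ui_ps                       # 0.56 UI
eye_post_layout_ui = 11.2 / ui_ps                      # 0.448 UI, quoted as 0.45 UI
eye_linearity = 102.0 / 99.0                           # about 1.03 (assumed definition)

if __name__ == "__main__":
    print(f"UI = {ui_ps:.0f} ps")
    print(f"rate = {rate_gbps:.0f} Gb/s, {per_wire_gbps:.0f} Gb/s/wire")
    print(f"total = {total_gbps:.0f} Gb/s, density = {density_gbps_per_mm:.0f} Gb/s/mm")
    print(f"eye widths = {eye_pre_layout_ui:.2f} UI, {eye_post_layout_ui:.2f} UI")
    print(f"eye linearity = {eye_linearity:.2f}")
```
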