UP-Down OLC: New One-Lambda Crosstalk Avoidance Code Design Based on 5-Wire Model

Crosstalk avoidance codes (CACs) are extensively utilized to mitigate crosstalk faults. Among the promising existing CAC methods, One-Lambda Codes (OLCs) have attracted much attention because they prevent the worst-case crosstalk-induced delay. Unfortunately, OLCs face two challenging issues: 1) an exact model is required to estimate the delay of wires in order to provide accurate OLCs, and 2) the OLC codec suffers from high energy consumption and from overheads in critical path and area occupation. In this research, an accurate probability model, called the Accurate Crosstalk Model (ACM), is introduced to overcome these problems. When crosstalk faults occur, ACM estimates the delay of communication channels by using a 5-wire delay model. It improves on the recently proposed analytical model, which suffers from a lack of accuracy. Next, we present UP-Down OLC (UD-OLC), an accurate OLC that is based on a numeral system and can completely eliminate transition patterns, including OLC-induced patterns. In comparison with other state-of-the-art OLCs, the UD-OLC mapping algorithm reduces wire energy consumption and average latency by up to 61% and 27%, respectively. Experimental results also reveal that this overhead-efficient numeral-based method reduces codec overhead in terms of dynamic power, critical path, and area by up to 52%, 45%, and 20%, respectively, compared with the other methods.


I. INTRODUCTION
The advancements in nanotechnology have made it possible to integrate a huge number of processing elements (PEs) into a single chip [1]. These PEs carry out processing by exchanging data with each other through parallel and adjacent channels of wires [2], [3]. Moreover, the International Technology Roadmap for Semiconductors predicts that the total wire length on a chip will increase to 16000 m/cm² by 2026 [1]. This unprecedented abundance of PEs requires a reliable communication architecture to send and receive data between PEs in order to achieve efficient performance.
However, the use of nano-scale Very-Large-Scale Integration (VLSI) technologies in the manufacturing of communication channels has made reliability one of the main obstacles in the design as well as the system implementation. One of the reliability challenges that severely affects data reliability is crosstalk faults [4]. From an electrical perspective, crosstalk is the result of inductive and capacitive coupling between long and adjacent wires of channels. Transition patterns in the 3-wire model are classified into five different classes according to the relative delay of a wire [16].

The associate editor coordinating the review of this manuscript and approving it for publication was Chaitanya U. Kshirsagar.
The consequences of crosstalk are rising/falling delays and rising/falling speed-ups [5], [6]. These effects alter the channel delay, cause the transmission of incorrect data, and lead to extra energy consumption in communication channels. Crosstalk faults depend strongly on the data: when data traverse between PEs, the transition patterns on the wires determine the severity of the fault.
Mechanisms at the physical [7], [8] and transistor [9] levels impose high overheads and require accurate timing adjustment between the sender and receiver, respectively. However, at the Register Transfer Level (RTL) [11], [17], numerical-based Crosstalk Avoidance Codes (CACs) [13] can efficiently reduce crosstalk by omitting particular transition patterns from wires. Crosstalk avoidance codes can be divided according to the patterns they eliminate and hence the delay they reduce. These codes are Forbidden Pattern Condition (FPC) [13], [15] codes, Forbidden Overlap Condition (FOC) [10], [16] codes, Forbidden Transition Condition (FTC) [11] codes, and One Lambda Codes (OLCs) [10], [18]. OLCs are considered the most effective among the stated mechanisms. An OLC allows no opposite-direction transitions on neighboring wires. This means that a boundary can be defined between each pair of neighboring wires across which such transition patterns are avoided. If only 00, 01, and 11 appear across a boundary, it is a boundary of 01-type. Otherwise, if only 00, 10, and 11 appear, it is a boundary of 10-type [10].
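The boundary definition above can be illustrated with a minimal Python sketch. The function names and the example codebooks are hypothetical and chosen only for illustration; codewords are assumed to be equal-length bit strings, and a boundary that shows only 00 and 11 is reported here as 01-type by convention.

```python
def boundary_type(codewords, i):
    """Classify the boundary between wire i and wire i+1 (0-based, left to right)."""
    pairs = {word[i:i + 2] for word in codewords}
    if pairs <= {"00", "01", "11"}:
        return "01-type"   # "10" never appears across this boundary
    if pairs <= {"00", "10", "11"}:
        return "10-type"   # "01" never appears across this boundary
    return None            # both "01" and "10" appear, so 01 <-> 10 can occur


def is_olc_codebook(codewords):
    """True if every boundary of the codebook is of 01-type or 10-type."""
    width = len(next(iter(codewords)))
    return all(boundary_type(codewords, i) is not None for i in range(width - 1))


# Hypothetical 3-wire codebooks used only to exercise the definition.
print(is_olc_codebook({"000", "001", "011", "111"}))  # every boundary is 01-type
print(is_olc_codebook({"01", "10"}))                  # opposite transitions possible
```

If a boundary is of neither type, some pair of codewords switches the two neighboring wires in opposite directions, which is exactly the situation an OLC is meant to exclude.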
However, OLCs face two problems. First, the modeling of crosstalk delay lacks accuracy, especially as the technology size moves toward the nanometer regime [19], [25]. In fact, a recently proposed delay model does not consider the transition overlaps [16]. Second, the codec of OLCs, including the encoder and decoder, imposes high energy consumption, area overhead, and a long critical path.
To solve these two problems, this paper first presents an accurate delay model for wires based on a 5-wire model [20], allowing designers to choose more effective crosstalk mitigation mechanisms up to three times faster. In the next step, based on ACM, an efficient numeral-based OLC called UP-Down OLC (UD-OLC) is proposed. UD-OLC removes OLC-induced transition patterns by utilizing a numeral system. Compared to other well-known OLCs, the UD-OLC mapping algorithm improves energy consumption and wire latency by up to 61% and 27%, respectively. In addition, experimental results reveal that UD-OLC outperforms other methods by improving dynamic energy consumption (by up to 52%), critical path (by up to 45%), and area occupation (by up to 20%).
To sum up, the contributions of this paper are as follows:
1) To overcome the limitation of the 3-wire delay model (which does not handle transition overlaps) and to accelerate the evaluation phase of crosstalk tackling mechanisms, we present a 5-wire model, called the Accurate Crosstalk Model (ACM), to predict the delay of wires. This analytical model guides designers in selecting more efficient crosstalk mitigation approaches faster.
2) A novel coding mechanism based on the 5-wire model, called UP-Down OLC (UD-OLC), is proposed. It outperforms other similar OLC CAC methods by improving the wire delay and the codec overheads with respect to energy consumption, area overhead, and critical path.
The organization of this paper is as follows: Section II presents the background of fault models, and related work is reviewed in Section III. Our proposed model is presented in Section IV. Section V describes the proposed OLC coding mechanism. Our evaluation results are given in Section VI. Finally, the paper is concluded in Section VII.

Figure 1 shows the architecture of communication channels with 5 wires. As the feature size shrinks, the loading capacitance between each wire and ground decreases, whereas the coupling capacitance between adjacent wires increases [3]. When the coupling capacitance surpasses the loading capacitance, the transition delay on communication channels can be several times larger than that of a single wire transition [3]. This delay penalty is a function of the ratio of the coupling capacitance to the loading capacitance, and it is called crosstalk delay. Based on the recent projection of the International Technology Roadmap for Semiconductors (ITRS) [1], it can be identified as a grand challenge.

II. BACKGROUND: CROSSTALK
The main source of inefficiency in communication channels is the crosstalk fault, which grows proportionally to the wire length. This means that long, parallel wires have a higher chance of experiencing this kind of fault [4]. Based on predictions, the total wire length will reach 16000 m/cm² by 2026, which may increase the occurrence of crosstalk faults.
The delay that occurs on a wire depends on the transition pattern that appears on the channel. Wire delays are therefore classified by the type of transition pattern, and each group is called a transition class. In [16], crosstalk is categorized with a 3-wire model into classes C0, C1, C2, C3, and C4 according to their relative delay; C0 has the smallest delay and C4 the largest. However, the 3-wire model has two limitations:
1) Due to the dependency on only three wires, the model lacks accuracy: it overestimates the delays of patterns.
2) In some specific classes, the actual delay range intersects with those of other classes.
As a result, the delays of CACs built on this model are not tightly controlled. This encouraged the authors of [14] to consider a higher number of wires and to classify the transition patterns so that the delay ranges do not overlap. The resulting classification, based on 5 wires, is finer and resolves the limitation of Sotiriadis's delay model. In this classification, the transitions that occur on channels are classified into seven categories: C0, C1, C2, C3, C4, C5, and C6. A model based on this classification is accurate enough to support the design of efficient crosstalk avoidance codes (CACs). Table 1 shows the transition patterns based on the 5-wire model.
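To make the 3-wire classification concrete, the following Python sketch assigns a class index to the middle wire of a 3-wire transition. The rule used here, n = |Δv − Δl| + |Δv − Δr| for a switching middle wire (with Δ equal to +1 for a rising edge, −1 for a falling edge, and 0 otherwise), is the formulation commonly used for 3-wire crosstalk models; it is an assumption of this sketch rather than a definition quoted from [16]. The function names are hypothetical.

```python
def delta(before, after):
    """+1 for a rising edge, -1 for a falling edge, 0 for no transition."""
    return int(after) - int(before)


def three_wire_class(f0, f1):
    """Class index (0..4) of the middle wire when the channel goes from f0 to f1.

    f0 and f1 are 3-character bit strings (left, middle, right wire).
    Returns None when the middle wire does not switch.
    """
    d_left, d_mid, d_right = (delta(a, b) for a, b in zip(f0, f1))
    if d_mid == 0:
        return None
    # Assumed rule: each neighbor contributes 0 (same direction),
    # 1 (neighbor quiet) or 2 (opposite direction).
    return abs(d_mid - d_left) + abs(d_mid - d_right)


print(three_wire_class("000", "111"))  # all wires rise together -> 0
print(three_wire_class("010", "101"))  # both neighbors switch oppositely -> 4
```

Under this rule, C0 corresponds to all three wires switching in the same direction and C4 to both neighbors switching against the middle wire, which matches the ordering of least and largest delay stated above.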

III. PREVIOUS WORK
Recently, dealing with crosstalk effects has been extensively explored [7], [15]. The proposed methods can be classified into three levels: the physical level, the transistor level, and the RTL level.
At the physical level [7], [8], shielding wires and inserting repeaters between the segments of wires are proposed to remove crosstalk faults [7]. Repeater insertion relies on the fact that a wire can be separated into several segments, each of which is driven by non-inverting or inverting buffers [21].
At the transistor level, skewed transitions are utilized to lessen the delay uncertainty that results from crosstalk coupling [9]. Skewed transitions introduce a relative delay to prevent simultaneous transitions in different directions on neighboring wires.
Conversely, for crosstalk mitigation at the RTL level, Crosstalk Avoidance Codes (CACs) [11], [17] have been introduced as a leading contender. CACs reduce crosstalk faults by preventing specific transition patterns from different classes of transitions, with less area occupation and energy consumption. In terms of area overhead, coding the data has less overhead than shielding techniques [21]. However, codec overhead is the challenging issue in Crosstalk Avoidance Codes. One potential way to reduce CAC codec overhead is channel partitioning [11], which suffers from forbidden transition classes appearing at the partition borders [22]. Numerical systems are another potential approach that can help to reduce codec overhead by utilizing mathematical notations.
In this approach, each codeword bit has its own weight, and codewords represent the numbers of a given set by using bases in a consistent manner [13]. As a result, utilizing a suitable numerical system plays a pivotal role in reducing the area occupation and the critical path of the codec modules and in improving energy consumption, and it can easily decrease the encoder and decoder overheads. The efficient numerical-based CACs can be classified, based on the transition patterns that they prevent, into groups such as Forbidden Transition Free (FTF) [11] codes, Forbidden Pattern Free (FPF) [13], [15] codes, Forbidden Overlap Condition (FOC) [10], [16] codes, and One Lambda Codes (OLCs) [10], [18].
FTC, FPF, FOC, and OLC omit the transitions of C2, C2, C3, and C1, respectively [10]. FPF CACs can be generated by using Fibonacci-based [11], [12] and non-Fibonacci-based [15], [17] numerical systems. The Fibonacci numerical system, Fibo-CAC, is one of the Fibonacci-based numerical systems [11]. Fibo-CAC produces code words utilizing a Fibonacci sequence. The Improved-Fibonacci coding mechanism (hereafter referred to as ''Improved-Fibo-CAC'') is the other Fibonacci-based numerical system [11]. Like the previous numerical system, Improved-Fibo-CAC uses the bases of the Fibonacci sequence [23], with one difference: the penultimate bit position of the bases is duplicated compared to the Fibo-CAC numerical system. Another numerical-based CAC is Penultimate-Subtracted Fibonacci (hereafter referred to as ''PS-Fibo'') [24]. PS-Fibo generates FPF CACs using a non-Fibonacci-based numerical system. Recently, the OLC coding mechanisms Subtraction-based-Numeral (SubNum) [18] and WU-OLC [10] have been proposed. The symbols used in this paper are shown in Table 2. However, to accelerate the evaluation phase of crosstalk tackling mechanisms, designers need comprehensive knowledge of the effects of crosstalk faults. Compared with these mechanisms, the coding mechanism proposed in this paper has a lower area overhead and lower cost. In addition, this paper aims to present a precise delay model on the basis of the 5-wire delay classification.

IV. ACCURATE 5-WIRED-BASED CROSSTALK MODEL (ACM)
The literature review shows that none of the crosstalk tackling mechanisms can entirely prevent all classes of transition patterns. Hence, even when these mechanisms are incorporated, crosstalk faults still produce various timing delays, and estimating these delays helps to evaluate crosstalk tackling mechanisms more accurately. In this respect, an analytical model can be proposed that estimates the timing delay of channels in the presence of crosstalk faults. This analytical model should predict the delay of k-wire communication channels by computing the expected number of transition patterns in each class. However, the recently proposed analytical model does not have adequate accuracy, because it estimates the delay based on 3-wire communication channels. An objective of this study is to present an Accurate Crosstalk Model for delay estimation based on 5-wire communication channels when crosstalk occurs. This model considers more wires in the delay model and in the relevant transition classification; hence, it is more accurate than previous models.
A probability model is presented that estimates the timing delay of communication channels based on the 5-wire model by considering the effect of crosstalk faults. Using this model accelerates the evaluation of the delay of communication channels. ACM first computes the expected number of transition patterns in all seven classes of the 5-wire classification model [15]; the formulation is then extended to all categories of harmful transition patterns in communication channels with a width of k bits. Table 3 presents all transitions between two consecutive data words f_0 and f_1 in a 3-wire model. For example, in column A, '000' is assigned to data f_0, and data f_1 can take any of the eight alternative values given in the rows. The right part of column A shows the transition appearing on the channel. In this context, the symbols ↑ and ↓ stand for the transitions 0 → 1 and 1 → 0, respectively, and - represents the situation in which no transition occurs.
In Table 4, the occurrence frequency and occurrence probability of the transition patterns of Table 3 are given. In ACM, the probabilities of occurrence of '0's and '1's are the same and equal to 1/2.
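The construction of the transition patterns and their occurrence probabilities can be reproduced with a short Python sketch. It enumerates every pair of consecutive 3-wire data words, derives the transition string with the symbols ↑, ↓, and -, and counts how often each string occurs under the stated assumption that '0' and '1' are equally likely. The function names are hypothetical, and the layout of the resulting dictionary is not meant to mirror Tables 3 and 4 exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

SYMBOL = {("0", "1"): "↑", ("1", "0"): "↓", ("0", "0"): "-", ("1", "1"): "-"}


def transition(f0, f1):
    """Transition string seen on the channel when f0 is followed by f1."""
    return "".join(SYMBOL[a, b] for a, b in zip(f0, f1))


def transition_table(width=3):
    """Map each transition string to (occurrence frequency, occurrence probability)."""
    words = ["".join(bits) for bits in product("01", repeat=width)]
    counts = Counter(transition(f0, f1) for f0 in words for f1 in words)
    total = len(words) ** 2  # all (f0, f1) pairs are equally likely
    return {pattern: (n, Fraction(n, total)) for pattern, n in counts.items()}


table = transition_table()
print(transition("000", "101"))  # ↑-↑
print(table["---"])              # (8, Fraction(1, 8))
```

Each wire that does not switch can be in either of two states (00 or 11), which is why patterns with more '-' symbols occur more frequently among the 64 equally likely pairs.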
Considering Table 4, the patterns of the 5-wire classification model can be generated by concatenating the transition patterns of Table 3. Therefore, the probability of one C6 pattern in the 6-wire model can be obtained.
According to [15], the probability of 'i' patterns of class C6 in a k-wire model is calculated by Equation 2, where k and i stand for the width of the communication channel and the number of expected C6 transition patterns appearing on the channel, respectively. To clarify this equation, consider windows with a fixed length of 6. These windows show the locations at which transition patterns can take place. As an example, Figure 2 shows these windows for a 20-bit communication channel (k = 20). The number of windows is determined by k and equals ⌊k/6⌋. The number i also varies between 0 and ⌊k/6⌋ (0 ≤ i ≤ 3 for k = 20).
One problem with the model in [19] and with the above equation is that the overlap between windows is not considered. To resolve this issue, a parameter j is defined, varying from 0 to 5. Figure 3 shows these windows for different values of j. The number of these windows is again determined by k; for k = 20 there are 15 windows in total, so 0 ≤ i ≤ 15. Hence, considering the overlapping windows, Equation 2 is rewritten, and the expected number of C6 patterns appearing on a k-wire channel is derived from it. In the same way, the probability of 'i' patterns of class Cn (0 ≤ n ≤ 6) in a k-wire communication channel can be obtained.
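The window counts quoted for k = 20 can be checked with a small Python sketch. It assumes that the parameter j acts as a shift of the starting position of back-to-back windows of length 6; this interpretation, the function names, and the per-window probability `p_window` are assumptions made for illustration and are not taken from the source equations. The expected-count helper relies only on linearity of expectation, which holds even though overlapping windows are not independent.

```python
def window_starts(k, offset=0, length=6):
    """Start positions of back-to-back windows of `length` wires, shifted by `offset`."""
    return list(range(offset, k - length + 1, length))


def count_windows(k, length=6):
    """Return (number of non-overlapping windows, number of windows over all shifts j)."""
    non_overlapping = len(window_starts(k, 0, length))
    overlapping = sum(len(window_starts(k, j, length)) for j in range(length))
    return non_overlapping, overlapping


def expected_patterns(k, p_window, length=6):
    """Expected number of windows showing a pattern that occurs with probability p_window."""
    _, overlapping = count_windows(k, length)
    return overlapping * p_window


print(count_windows(20))           # (3, 15)
print(expected_patterns(20, 0.5))  # 7.5, with a purely illustrative probability
```

For k = 20 the shifts j = 0, 1, 2 each give three windows and the shifts j = 3, 4, 5 each give two, which adds up to the 15 windows stated in the text.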
The expected number of Cn patterns that occur on a k-wire channel is obtained analogously. According to [15], similarly to P(C6) and P(C5), the probability of one C4, C3, C2, C1, or C0 pattern in a 6-wire communication channel can be calculated using Equation 4. By exploiting Equations 5 and 6, we can obtain the probability of having 'i' patterns of a given class and the expected number of patterns of that class appearing on a k-wire channel. Also, by using E(C6), E(C5), ..., and E(C0), the channel delay can be computed, where D(Ci) is the delay of a Ci transition. D(Ci) is in turn computed from V_3(L, D(Ci)), which denotes the transient signal at time D(Ci) and position L. By solving the above equation for the different transitions, D(Ci) is calculated for each Ci.

V. WIRE MODEL-BASED ONE-LAMBDA CROSSTALK AVOIDANCE ALGORITHM
This section presents 1) an extensible numeral system called UP-Down OLC that can be used for different channel widths, 2) an OLC codeword generator and a numeral-system-based mapping algorithm, and finally, 3) the codec hardware model.

A. UP-DOWN ONE-LAMBDA CROSSTALK AVOIDANCE
The overheads of the codec module are directly affected by the OLC numeral system; hence, the coding overheads can be reduced by utilizing an efficient numeral system. In this regard, we propose a numeral system called UP-Down OLC (UD-OLC) that is overhead-efficient. Equation 9 represents generating bases for greater channel widths. It can be extended to any channel width.
However, to generate the UP-Down OLC code words, it is necessary to utilize a mapping algorithm. The algorithm used in the UP-Down OLC coding mechanism consists of three parts that map the data word d_k d_{k−1} ... d_2 d_1 to the equivalent OLC codeword c_k c_{k−1} ... c_2 c_1, where k is the channel width. The values of d_i and r_i are determined as the algorithm proceeds for i steps. d_i stands for the i'th wire of the OLC codeword, and r_i represents the remaining input value, which is used in the subsequent step. Figure 4 presents the calculation steps of the OLC code words. UP-Down OLC is free of OLC-induced transitions under the following conditions:
1) UP-Down OLC is complete if, for every λ_i, λ_i ≤ 1 + Σ_{j=1}^{i−1} λ_j holds, i.e., for every input data word, a specific UP-Down OLC code word exists.
2) OLC codewords should be generated by the UP-Down OLC algorithm. Codewords generated by the UP-Down OLC algorithm contain alternating 01-type and 10-type boundaries.
The proposed mapping algorithm prevents the appearance of the 01 pattern at the boundaries d_{2i} d_{2i+1} because, based on this algorithm, at the 2i'th stage, if the equation d_{i+1} = 1 holds, c_i and the d_{2i}'th stage output will equal 1 and d_{2i−1}, and the stage output is equal to 0 in case d_{2i} = 0. Hence, it prevents the occurrence of the 10 pattern at the boundaries d_{2i−1} d_{2i}.
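The completeness condition λ_i ≤ 1 + Σ_{j=1}^{i−1} λ_j can be checked mechanically. The Python sketch below tests the condition for a list of bases and, for comparison, enumerates by brute force every value that binary codewords can represent with those bases. The base lists in the example are illustrative only and are not the UD-OLC bases generated by Equation 9; the function names are hypothetical.

```python
from itertools import product


def satisfies_completeness(bases):
    """Check lambda_i <= 1 + sum of all previous bases (bases[0] is lambda_1)."""
    running_sum = 0
    for base in bases:
        if base > 1 + running_sum:
            return False
        running_sum += base
    return True


def representable_values(bases):
    """All values reachable as a weighted sum of the bases with binary digits."""
    return {
        sum(base * bit for base, bit in zip(bases, bits))
        for bits in product((0, 1), repeat=len(bases))
    }


# Illustrative bases only, not the bases of the proposed numeral system.
print(satisfies_completeness([1, 1, 2, 3, 5]))  # True
print(satisfies_completeness([1, 3, 4]))        # False: 3 > 1 + 1
```

When the condition holds, every integer from 0 up to the sum of the bases has at least one codeword, so no input data word is left without a representation; the second example fails the condition and indeed cannot represent the value 2.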

VI. EXPERIMENTAL EVALUATIONS
The UP-Down coding mechanism is evaluated in this section. The evaluation includes 1) ACM model validation, 2) the efficiency of the UP-Down OLC coding mechanism with regard to the codec overheads, including area overhead, critical path, and energy consumption, and 3) the energy consumption and delay of wires. These results are compared with well-known OLC codings. For this purpose, SPICE simulation and the Synopsys Design Compiler tools are used [3]. In addition, we used the SPEC2006 benchmarks.

A. ACM MODEL VALIDATION
To validate ACM, SPICE simulations are carried out under several different working conditions. Figure 5 shows the electrical model of a 5-wire channel that is used in the simulations. This model considers both capacitive and inductive coupling, which makes it possible to imitate the behavior of communication channels in the deep submicron region accurately. As shown in Figure 5, each wire b_i of a channel has load capacitance C_iG, resistance R_i, and inductance L_i. C_ij and L_ij are the coupling capacitance and inductive coupling between wire b_i and wire b_j, respectively [20]. Based on this model, each wire is coupled with its four adjacent neighbors.
In this section, simulations are performed in HSPICE, and the obtained results are based on 10 metal layers in a 45 nm technology [15]. We focus on global communication channels in the top metal layer. The wire parameters are obtained from structure 1 of the Predictive Technology Model (PTM).
In each simulation, 8000 data bits are transferred across the wire channels. These 8000 data bits are grouped into data words with widths of 16, 32, 64, and 128 bits for the 16-wire, 32-wire, 64-wire, and 128-wire channels, respectively. Figure 6 shows the delay of the transmitted data when 250 data words of 32 bits traverse the channel. While each data word is transmitted, the rise/fall times of each wire are measured, and the average of the rise/fall times is taken as the delay of the data transmission.
In Figure 7, a comparison is drawn between the average delays obtained in simulations and those of our proposed model. Figure 7 also shows the speed-up and the error percentage of the model for various channel widths. Based on Figure 7, our proposed model provides a remarkable speed-up of about three orders of magnitude while keeping the error percentage below 8%. In contrast, the model in [19] does not have enough accuracy in deep submicron technology.

We utilize SPICE simulation to estimate the energy consumption and wire delay of our proposed coding mechanism. To model these on-chip wires, a predictive interconnect model (PIM) is exploited. In the simulations, every channel wire b_i has a capacitance, resistance, and inductance of C_iG, R_i, and L_i, respectively. Also, wires b_{i−2}, b_{i−1}, b_{i+1}, and b_{i+2} surround the wire b_i.
To be more accurate and to align with standards, real workload traffic extracted from standard benchmarks is transferred over the wires, namely SPEC 2006 [20], including the gcc, mcf, namd, soplex, h264, omnetpp, and astar benchmarks, run on the gem5 simulator [21]. These bit streams are coded using the UP-Down, Sub-Num, and WU-OLC coding mechanisms, and their wire energy consumption and wire delays are compared.
Figure 8 shows the percentage improvement in wire energy consumption in the presence of the UP-Down, Sub-Num, and WU-OLC coding mechanisms for various benchmarks and channel widths. A comparison is drawn between channel widths ranging from 16 to 128 wires and a channel without any coding mechanism. Because data words are transferred consecutively over the wires, the energy consumption of wires within channels depends on the switching activity. Since OLCs eliminate transitions from 01 to 10 and from 10 to 01, Sub-Num, WU-OLC, and UP-Down all reduce the energy consumption of wires. However, the reduction in energy consumption achieved by UP-Down OLC coding is on average 61% higher than that of the other coding mechanisms. According to these results, UP-Down OLC can reduce the energy consumption of wires for different channel widths and across different benchmarks. More precisely, the results demonstrate that the switching activity reduction achieved by OLCs is about 8% to 11% better in more advanced technologies.
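The dependence of wire energy on switching activity can be illustrated with a simple Python sketch that counts transitions in a stream of data words. It reports two quantities: the number of individual wire toggles and the number of adjacent wire pairs that switch in opposite directions (the 01 to 10 and 10 to 01 cases). This is only a proxy for the SPICE-based energy measurement used in the evaluation, and the function name and example streams are hypothetical.

```python
def switching_activity(words):
    """Count wire toggles and opposite-direction neighbor switches in a word stream.

    `words` is a sequence of equal-length bit strings sent consecutively.
    Returns (self_toggles, opposite_pairs).
    """
    self_toggles = 0
    opposite_pairs = 0
    for prev, curr in zip(words, words[1:]):
        deltas = [int(c) - int(p) for p, c in zip(prev, curr)]
        self_toggles += sum(d != 0 for d in deltas)
        # A product of -1 means one wire rises while its neighbor falls.
        opposite_pairs += sum(a * b == -1 for a, b in zip(deltas, deltas[1:]))
    return self_toggles, opposite_pairs


print(switching_activity(["0101", "1010", "1010"]))  # (4, 3)
print(switching_activity(["0000", "1111"]))          # (4, 0)
```

Both streams toggle four wires, but only the first one contains opposite-direction switching between neighbors, which is the kind of activity that an OLC removes from the channel.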

B. ENERGY CONSUMPTION AND DELAY OF WIRES
In Figure 9, the wire delay improvement for each channel width using the UP-Down mechanism is given in percent. Each bar shows a separate benchmark in the presence of the proposed UP-Down OLC, relative to a channel without coding. Our simulations show that the UP-Down coding mechanism is better than Sub-Num and WU-OLC in reducing the wire delay of channels. Based on the obtained results, UP-Down achieves an average improvement of 61% with respect to Sub-Num and WU-OLC.

C. UP-DOWN OLC CODEC OVERHEAD
The hardware architecture of the UP-Down OLC codec includes an encoder and a decoder for generating and decoding code words for a k-wire channel. Based on the UP-Down OLC coding mechanism, the proposed method computes the value of the i'th wire of the OLC code, i.e., d_i, at the i'th stage; it also calculates the remainder of the i'th stage, i.e., r_i. In the proposed architecture, subtractor modules generate the residuals that will be utilized in the following stages, whereas comparator and multiplexer modules implement the if-then-else statements of the algorithm. The UP-Down OLC decoder module calculates Σ_{i=1}^{k} c_i λ_i, where c_k c_{k−1} ... c_2 c_1 is the OLC code word and λ_k λ_{k−1} ... λ_2 λ_1 are the bases of the numeral system for a k-wire channel. Hence, the decoder module can simply be implemented for an arbitrary value of k. For example, the 5-wire UP-Down OLC codeword 01111 encoded with bases 10302 means 1 × 0 + 0 × 1 + 3 × 1 + 0 × 1 + 2 × 1. The embedded codec modules contain an encoder and a decoder. In this regard, the hardware overhead of UP-Down OLC, including area occupation, energy consumption, and critical path, is evaluated with respect to Sub-Num and WU-OLC in this section.
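The decoding step is a weighted sum of the codeword bits and the bases, which the following Python sketch reproduces for the worked example in the text (codeword 01111 with bases 1, 0, 3, 0, 2). The function name is hypothetical, and the sketch models only the arithmetic of the decoder, not its hardware structure.

```python
def decode(codeword, bases):
    """Weighted sum of codeword bits and bases, both listed from wire k down to wire 1."""
    if len(codeword) != len(bases):
        raise ValueError("codeword and bases must have the same length")
    return sum(base * int(bit) for base, bit in zip(bases, codeword))


# Worked example from the text: 1*0 + 0*1 + 3*1 + 0*1 + 2*1
print(decode("01111", [1, 0, 3, 0, 2]))  # 5
```

Because the decoder needs only additions of selected bases, the same routine applies unchanged to any channel width once the bases for that width are known.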

1) AREA OCCUPATION OF CODEC
The area overhead that the codec imposes on chips is evaluated by implementing the encoder and decoder in VHDL. The code is synthesized using the Design Compiler tool in 45 nm technology, and the area overhead of the codec is measured. The area overhead of the codec for channel widths from 8 to 128 wires is shown in Figure 10. We compare the results of UP-Down with Sub-Num and WU-OLC. These results confirm that the codec of UP-Down occupies less area than Sub-Num and WU-OLC, by an average of 20% across the different channel widths.

2) ENERGY CONSUMPTION OF CODEC
To evaluate the energy consumption of the codecs, they are synthesized with the Design Compiler tool. The results for dynamic and leakage energy consumption are shown in Figure 11 and Figure 12, respectively. Based on these results, UP-Down OLC improves the dynamic and leakage energy consumption of the codec by an average of 52% and 83%, respectively. This improvement also grows as the channel width increases.

3) CRITICAL PATH OF CODEC
The critical path of the codec is also evaluated in 45 nm technology. With regard to the critical path, the UP-Down coding mechanism achieves a 45% improvement with respect to Sub-Num and WU-OLC. The critical path results are shown in Figure 13.

VII. CONCLUSION
In this paper, we show that the crosstalk fault is the main source of inefficiency in communication channels and that it grows proportionally to the wire length. To overcome this problem, we present an accurate probability model, called the Accurate Crosstalk Model (ACM). In the presence of crosstalk faults, ACM estimates the delay of communication channels by using a 5-wire delay model. Given the number of transition patterns, ACM predicts the delay of a k-wire communication channel considerably faster, specifically in deep submicron technology. It improves on the previously presented analytical model, which is not sufficiently accurate. Then, based on this model, the UP-Down OLC coding mechanism is proposed. Compared to other well-known OLCs, UD-OLC improves energy consumption and average wire delay by up to 61% and 27%, respectively. In addition, in terms of codec overhead and in comparison with other numeral-based OLCs, UD-OLC reduces dynamic power consumption, critical path, and area occupation by 52%, 45%, and 20%, respectively. As future work, intelligent mechanisms can be used for CAC selection based on the transition patterns appearing on the wires.