A Convolutional Code for On-Chip Interconnect Crosstalk Reduction

Interconnects are now considered the bottleneck in system-on-chip (SoC) design because of the delay and power consumption they introduce. Data coding for interconnect power and timing optimization is a promising way to address this issue. Based on realistic observations on interconnect delay and power estimation, a new data-coding technique called “Convolutional Encoder for Crosstalk Reduction” (CECR) is proposed. It reduces delay, power consumption (including the extra power consumed by the codecs) and noise for on-chip buses. The principle of the technique is to reduce the switching activity on the encoded wires to its minimum. Results show the efficiency of the technique for different technologies and bus lengths. The power consumption reduction can reach up to 12% for a 10 mm bus in the 65 nm technology, and more if buses are longer. The technique also accelerates data propagation by 20% and reduces the overall number of worst-case noise transitions by 51%.

SECTION I

Introduction

Today's systems-on-chip (SoCs) are increasingly complex and require many computational resources, implying a large volume of data to be stored or transmitted. To transfer these data from memory to processor or from one processor to another, on-chip interconnect buses or networks have to be used. In current SoCs, interconnects can represent up to 50% of the total power consumption [1]. Moreover, the scaling of transistor and wire dimensions has an increasingly strong impact on the propagation time and energy due to wires [2]. Therefore, the estimation and optimization of the power and delay due to interconnects have become a major issue in SoC design. Many works have addressed interconnect power optimization at different abstraction levels [3], [4], [5], [6], [7], [8], [9]. Unfortunately, most of the proposed techniques are not efficient at reducing the power consumption due to interconnects. The first reason is that the interconnect lengths used in their experiments do not correspond to those of today's SoCs; the efficiency of most techniques designed for long interconnects therefore no longer holds [10]. The second reason is the hardware complexity of the codecs in the published techniques, which consume too much power compared with the power saved on the buses.

Finally, previous works showed that, when crosstalk effects are considered, the impact on delay is quite different from the impact on power consumption.

In this paper, we propose a data-coding technique called “Convolutional Encoder for Crosstalk Reduction” (CECR), which improves power consumption, propagation time and noise for on-chip buses (a theoretical approach was introduced in [11]). This paper is organized as follows. Section II presents the previous works that allowed us to develop the proposed optimization technique. The CECR technique and its hardware implementation are described in Section III. In Section IV, experimental results in terms of power consumption, propagation time and noise variation using the technique are discussed. The last section concludes this paper.

SECTION II

Preliminary

Crosstalk is the effect of the coupling capacitance between a victim wire and its neighboring wires and depends on their transitions and states.

First, results presented in [12] show that the transition classification (according to the crosstalk capacitance (Cc) seen by the victim wire) differs depending on whether power consumption or delay is considered. Table I, extracted from [12], shows that the transition classification, from a power consumption point of view, starts with rising transitions followed by falling transitions. Therefore, a first key point for power optimization is to encode data such that falling transitions on the bus occur with the lowest crosstalk capacitance and thus consume as little energy as possible.

TABLE I Delay and Power Consumption Results According to the Transition Patterns. ↑ Represents a Rising Transition, ↓ a Falling Transition and − Means That the Wire Remains at a Stable Logic Level (GND or Vdd). Cs Is the Wire-to-Substrate Capacitance and Cc Is the Crosstalk Capacitance

Secondly, when the activity profile of data stimuli files is analyzed, it can be noticed that applying performance optimization techniques on least significant bits (LSB) has a better impact in terms of power consumption reduction. This is due to the fact that the LSBs have the strongest activity as demonstrated in [13]. For instance, the Partial Bus Invert technique (Partial Bus Invert is Bus Invert [6] applied to LSBs) has better results in terms of power consumption reduction than the classical Bus Invert.

Finally, to be fair, the energy saving should take into account the whole system (i.e., the codecs and the wires).

Based on this analysis, the next section introduces the concept of the CECR technique for the optimization of delay, power consumption and noise for on-chip buses.

SECTION III

Proposed Optimization Technique

The problem can be stated as follows: how to encode a sequence of n-bit words $E_k$ into a sequence of m-bit words $S_k$ (i.e., the m encoded wires) so that two consecutive values $S_k$ and $S_{k+1}$ minimize both the Hamming distance $H(S_k, S_{k+1})$ between consecutive symbols (power dissipation minimization) and the maximum delay D induced by crosstalk. In this paper, the maximum delay is fixed to 2 (i.e., the simultaneous switching of two neighbouring wires is forbidden). In other words, $S_k \oplus S_{k+1}$ never contains two consecutive 1s (delay condition).
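The delay condition can be made concrete by brute force. The sketch below reads the condition as a constraint on the transition vector (the bitwise XOR of two consecutive bus words) and takes a three-wire group as a small example; the function names are illustrative and not part of the original work.

```python
from itertools import product

def has_adjacent_ones(bits):
    """Return True if two neighbouring positions are both 1."""
    return any(a == 1 and b == 1 for a, b in zip(bits, bits[1:]))

def allowed_transition_vectors(m):
    """All m-bit transition vectors (S_k XOR S_{k+1}) with no two adjacent 1s."""
    return [t for t in product((0, 1), repeat=m) if not has_adjacent_ones(t)]

# Vectors that satisfy the delay condition on three wires.
candidates = allowed_transition_vectors(3)

# Among them, keep the vectors that toggle at most one wire (Hamming weight <= 1).
single_toggle = [t for t in candidates if sum(t) <= 1]

for t in candidates:
    print("".join(map(str, t)), "Hamming weight:", sum(t))
```

For three wires, five vectors satisfy the delay condition and four of them toggle at most one wire, which is exactly enough to carry a 2-bit input symbol.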

In a sense, this problem is the dual of error-correcting code construction. In the latter, the code construction tries to maximize the Hamming distance between codewords (or sequences of codewords). Since memory is involved ($S_k$ and $S_{k+1}$ are related), the problem can be solved by looking for a rate r = n/m convolutional encoder that minimizes the Hamming distance between two consecutive symbols and satisfies the crosstalk condition. This general problem can be optimally solved for n = 2 and m = 3. Indeed, the input symbol $E_k$ can take 4 different values. Since $S_k \oplus S_{k+1}$ must not contain two consecutive 1s, it takes its values in the set {000, 100, 010, 001, 101}. This list contains 5 symbols. Symbol 101 is discarded because it corresponds to a Hamming distance of 2 between $S_k$ and $S_{k+1}$. The four remaining symbols are then enough to code the 4 input values. The simplest encoder/decoder implementing this scheme is given by the following equations (the aim is to transform the two input bits using a demultiplexer structure):

$$\begin{aligned} C_k(1) &= \bar{E}_k(1)\,E_k(0) \\ C_k(2) &= E_k(1)\,\bar{E}_k(0) \\ C_k(3) &= E_k(1)\,E_k(0) \end{aligned} \tag{1}$$

A transition is then generated on each wire m for which $C_k(m)$ equals 1:

$$S_k(m) = S_{k-1}(m) \oplus C_k(m) \quad \text{for } m = 1, 2, 3 \tag{2}$$

The values $S_k(m)$, m = 1, 2, 3, are sent on the three wires.

At the receiver side, the decoding process is symmetrical:

$$D_k(m) = S_{k-1}(m) \oplus S_k(m) \quad \text{for } m = 1, 2, 3 \tag{3}$$

Note that, by construction, $D_k(m) = C_k(m)$. Then, symbols $E_k(0)$ and $E_k(1)$ are computed as:

$$\begin{aligned} E_k(0) &= D_k(1)\ \text{OR}\ D_k(3) \\ E_k(1) &= D_k(2)\ \text{OR}\ D_k(3) \end{aligned} \tag{4}$$

The architecture of the complete coding/decoding process is illustrated in Fig. 1.
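As a behavioural illustration of Eq. (1) to (4), the following sketch encodes and decodes a short sequence of 2-bit symbols in software. It is not the hardware implementation of Fig. 1; the function names and the all-zero initial wire state shared by both ends are assumptions made for the example.

```python
def encode_step(e1, e0, s_prev):
    """One encoding step: Eq. (1) selects the wire to toggle, Eq. (2) toggles it.

    e1, e0 are the input bits E_k(1), E_k(0); s_prev is the previous
    3-wire state (S(1), S(2), S(3)). Returns the new wire state.
    """
    c = ((1 - e1) & e0,   # C_k(1): set for input 01
         e1 & (1 - e0),   # C_k(2): set for input 10
         e1 & e0)         # C_k(3): set for input 11
    return tuple(s ^ ci for s, ci in zip(s_prev, c))

def decode_step(s_prev, s_curr):
    """One decoding step: Eq. (3) recovers D_k, Eq. (4) rebuilds (E_k(1), E_k(0))."""
    d = tuple(a ^ b for a, b in zip(s_prev, s_curr))
    e0 = d[0] | d[2]
    e1 = d[1] | d[2]
    return e1, e0

def transmit(symbols, s0=(0, 0, 0)):
    """Encode then decode a list of (e1, e0) pairs.

    s0 is the assumed initial wire state, known to both encoder and decoder.
    Returns (wire_states, decoded_symbols).
    """
    states, decoded = [], []
    s_prev = s0
    for e1, e0 in symbols:
        s_curr = encode_step(e1, e0, s_prev)
        decoded.append(decode_step(s_prev, s_curr))
        states.append(s_curr)
        s_prev = s_curr
    return states, decoded

data = [(0, 1), (1, 1), (0, 0), (1, 0), (1, 1)]
states, decoded = transmit(data)
print(states)           # wire states after each cycle
print(decoded == data)  # True when the round trip is lossless
```

In this sketch at most one of the three wires changes per cycle, and the input 00 leaves the bus untouched.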

Fig. 1. Architecture of the bus coding/decoding scheme.

SECTION IV

Experimental Results

For the simulations, the buses have been modeled with a distributed RC Π3 model that takes crosstalk capacitances into account, as defined in [12]. Experimental results have been obtained with a SPICE simulator (ELDO v5.7) for different technologies (130 nm, 90 nm and 65 nm), using a fully random data file and an image stimuli file (our results follow the same behavior for other data files such as music or speech). As each technology has a specific number of metal layers, SPICE simulations have been performed on all metal layers, from the lowest ones (mostly reserved for short wires) to the highest ones (mostly reserved for buses, which are our topic of interest). As propagation time becomes critical on interconnects, techniques that accelerate data propagation by inserting buffers on wires have been proposed in [14], [15]. Thus, both buffered and non-buffered interconnects have been simulated in our experiments. The power consumption results include the extra power consumed by the codecs.

A. Effects on Delay

As stated in the previous section, to minimize the switching activity, the maximum Hamming distance between two consecutive values is one; in other words, only one of the three encoded wires can switch at a time. Thus, the worst transition pattern with respect to delay belongs to the Cs + 2Cc class, namely (−,↑,−) or (−,↓,−).
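The Cs + 2Cc class can be illustrated with the classical first-order coupling model, in which each neighbour contributes 0, 1 or 2 units of Cc to a switching victim depending on whether it switches in the same direction, stays stable, or switches in the opposite direction. This model is an assumption of the sketch below and is not necessarily the classification of Table I; all names are illustrative.

```python
RISE, FALL, STABLE = 1, -1, 0

def coupling_factor(victim, neighbour):
    """Units of Cc that one neighbour adds to a switching victim (assumed model)."""
    if victim == STABLE:
        raise ValueError("the victim must be switching")
    if neighbour == victim:
        return 0   # same direction: the coupling capacitance is not charged
    if neighbour == STABLE:
        return 1   # stable neighbour: one Cc
    return 2       # opposite direction: Miller effect doubles Cc

def delay_class(left, victim, right):
    """Return k such that the victim sees an effective capacitance Cs + k*Cc."""
    return coupling_factor(victim, left) + coupling_factor(victim, right)

# Pattern allowed by the coding scheme: only the central wire switches.
print(delay_class(STABLE, RISE, STABLE))  # 2, i.e. Cs + 2Cc

# Pattern possible on an uncoded bus: both neighbours switch the other way.
print(delay_class(FALL, RISE, FALL))      # 4, i.e. Cs + 4Cc
```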

SPICE simulation results show that, for a 65 nm technology and a 1 mm bus length, the worst-case propagation delay is reduced by 20% when the CECR technique is used (the computed delay also includes the coding and decoding logic).

B. Effects on Energy Consumption

As stated in Section II, applying coding techniques to the bits with the strongest activity leads to the best results. First, results are presented for the worst consumption case, i.e., applying the technique to a bus carrying fully random bits, to determine the best power consumption reduction the technique can bring. Then, results are presented for the CECR solution applied to the 4 least significant bits (which have the same switching activity as random bits, as shown in Section IV.C) of an 8-bit data transmission bus; two coding/decoding blocks are therefore used. As the MSBs are more correlated (i.e., their toggling probability is very low, as demonstrated in [13]), applying the CECR technique to them would not have any real impact.
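The partial-bus configuration can be pictured with a small behavioural model: the 4 MSBs are sent unchanged and the 4 LSBs go through two encoder blocks, giving 10 physical wires. The pairing of bits to blocks, the wire ordering and the class name below are assumptions made for illustration only.

```python
def cecr_step(e1, e0, s_prev):
    """Toggle at most one of three wires according to the 2-bit input (e1, e0)."""
    c = ((1 - e1) & e0, e1 & (1 - e0), e1 & e0)
    return tuple(s ^ ci for s, ci in zip(s_prev, c))

class PartialCecrBus:
    """8-bit bus model: 4 MSBs sent raw, 4 LSBs through two encoder blocks."""

    def __init__(self):
        # Assumed reset state of the two three-wire groups.
        self.block_lo = (0, 0, 0)  # carries data bits 1 and 0
        self.block_hi = (0, 0, 0)  # carries data bits 3 and 2

    def send(self, word):
        """Return the 10 wire values for one 8-bit word."""
        bits = [(word >> i) & 1 for i in range(8)]  # bits[0] is the LSB
        self.block_lo = cecr_step(bits[1], bits[0], self.block_lo)
        self.block_hi = cecr_step(bits[3], bits[2], self.block_hi)
        msbs = tuple(bits[7:3:-1])                  # bit 7 down to bit 4
        return msbs + self.block_hi + self.block_lo

bus = PartialCecrBus()
wires = bus.send(0b10100110)
print(len(wires), wires)
```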

The coding and decoding logic also consumes power, which has to be taken into account to evaluate the quality of the proposed method. To this end, SPICE simulations have been performed with the previously defined set of parameters.

First, as shown in Fig. 2, the efficiency of the CECR technique increases as technology shrinks, which is an important property since energy consumption reduction is a major issue in current and future technologies.

Fig. 2. Energy consumption gain (in %) according to technology, metal layer and bus length for a full random and an image data flow.

Secondly, the CECR technique is already efficient on the lowest metal layers, but it is more efficient on the highest metal layers reserved in particular for long buses.

Thirdly, Fig. 2 shows that the longer the bus, the higher the energy consumption reduction. The reduction reaches up to 12% for a fully random data bus and up to 7% for a typical image data flow, and can be higher if buses are longer than the simulated ones (10 mm). In state-of-the-art optimization techniques, the interconnect lengths used for experimental results are often unrealistic. For instance, the technique in [7] reports an energy consumption reduction for a 7.5 cm bus. In addition, results presented in [10] show that many coding techniques only become efficient for very long buses because of the extra consumption due to codecs (e.g., 2 cm for Bus Invert). Moreover, many coding techniques ([8], [9] for instance) do not always take the extra consumption due to codecs into account when presenting power consumption results for buses.

C. Effects on Switching Activity

When the codecs are applied to bits that have an average switching activity of 1/2 in data transmission (as shown in [13]), the average activity of each encoded wire is 1/4, as illustrated in Table II. In one cycle, 3/4 of a transition occurs on average over a block of three encoded wires, compared with an activity of 1 for the direct transmission of two bits (two wires, each with a transition probability of 1/2).
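The 1/4 and 3/4 figures can be reproduced exactly by enumerating the four input symbols. The sketch assumes that the 2-bit input symbols are independent and uniformly distributed in each cycle, which corresponds to the random-data case; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

def transition_vector(e1, e0):
    """Wires toggled by the encoder for the input (e1, e0), as in Eq. (1)."""
    return ((1 - e1) & e0, e1 & (1 - e0), e1 & e0)

# Assumption: the four input symbols are equally likely in every cycle.
inputs = list(product((0, 1), repeat=2))
p = Fraction(1, len(inputs))

# Toggle probability of each encoded wire.
per_wire = [sum(p * transition_vector(e1, e0)[w] for e1, e0 in inputs)
            for w in range(3)]

# Expected number of transitions per cycle on the three encoded wires.
coded_total = sum(per_wire)

# Two uncoded wires, each toggling with probability 1/2.
uncoded_total = 2 * Fraction(1, 2)

print(per_wire, coded_total, uncoded_total)
```

Under this assumption the coded group makes 3/4 of a transition per cycle on three wires, against 1 transition per cycle on the two uncoded wires.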

TABLE II Activity of Each Bit of the Bus for a Full Random and an Image Data Flow When the Technique Is Applied or Not

D. Effects on Noise

The CECR technique can also significantly reduce noise on the encoded wires. For a wire that remains at a stable logic level, the worst cases for noise occur when one or both of its neighbours switch in the same direction [16] (i.e., (↑, GND, ↑), (↓, Vdd, ↓), (↑, GND, −) or (↓, Vdd, −) transitions). Fig. 3 presents the (↑, GND, ↑) and (↑, GND, −) cases, where the victim wire is the central wire. The resulting unwanted voltage peak (above GND or below Vdd) can cause errors if it crosses the threshold voltage of the buffer at the end of the bus. As illustrated in Table III, the CECR technique reduces the overall number of worst-case transitions by up to 51% for a random data flow. These results have been obtained by counting the transitions in which the two neighbours switch simultaneously in the same direction.
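The counting described above can be sketched as follows. The function counts, for a sequence of bus states, the cases where an inner wire stays stable while both neighbours switch in the same direction, restricted to the (↑, GND, ↑) and (↓, Vdd, ↓) patterns. The function name and the two short example sequences are hypothetical and are not the data files used in the experiments.

```python
def count_worst_noise_events(states):
    """Count (cycle, wire) pairs matching (up, GND, up) or (down, Vdd, down).

    states is a list of equal-length tuples of 0/1 wire levels, one per cycle.
    Only inner wires (with two neighbours) are considered as victims.
    """
    events = 0
    for prev, curr in zip(states, states[1:]):
        for i in range(1, len(prev) - 1):
            if prev[i] != curr[i]:
                continue                       # the victim must stay stable
            left = curr[i - 1] - prev[i - 1]   # +1 rising, -1 falling, 0 stable
            right = curr[i + 1] - prev[i + 1]
            if left == right == 1 and curr[i] == 0:
                events += 1                    # (up, GND, up)
            elif left == right == -1 and curr[i] == 1:
                events += 1                    # (down, Vdd, down)
    return events

# Hypothetical uncoded three-wire sequence.
raw = [(0, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]

# Hypothetical three-wire sequence in which at most one wire toggles per cycle.
coded = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0)]

print(count_worst_noise_events(raw))
print(count_worst_noise_events(coded))
```

Within a single three-wire group where only one wire may toggle per cycle, this pattern cannot occur; on a wider bus built from several groups it can still appear across group boundaries, so the count depends on the data and on the wire arrangement.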

Fig. 3. Noise generated on the victim wire according to its neighbours' transitions.

TABLE III Percentage Reduction of Worst-Case Noise Transitions for Different Data Flows

SECTION V

Conclusion

Based on previous analyses of interconnect delay and power optimization, a new optimization technique called “Convolutional Encoder for Crosstalk Reduction” (CECR) is presented. This technique aims at reducing as much as possible the switching activity on the most power-consuming wires (i.e., the LSBs) of on-chip data buses. After the presentation of the concept of the technique, one implementation has been proposed. Then, experimental results in terms of power consumption, delay, switching activity and noise reduction, using three different technologies and their associated metal layers with varying technological parameters, are presented. Results are given for technologies and bus lengths used in today's SoCs. The energy consumption reduction can reach up to 12% for a 10 mm bus in the 65 nm technology, and more if buses are longer. The technique also accelerates data propagation by 20% and reduces the overall number of worst-case noise transitions by 51%.

Acknowledgment

This work has been supported by the European Union and the Brittany Region in the context of Programme Objectif 2 Bretagne 2000–2006.

Footnotes

Antoine Courtay, Emmanuel Boutillon and Johann Laurent are with the Université Européenne de Bretagne-UBS, Lab-STICC, Lorient, France. Email: Antoine.Courtay@univ-ubs.fr, Emmanuel.Boutillon@univ-ubs.fr, Johann.Laurent@univ-ubs.fr.

References

1. Interconnect-power dissipation in a microprocessor

N. Magen, A. Kolodny, U. Weiser, N. Shamir

Proceedings of the International Workshop on System Level Interconnect Prediction, 2004, 7–13

2. The future of wires

R. Ho, K. Mai, M. Horowitz

Proceedings of the IEEE, vol. 89, issue (4), p. 490–504, 2001

3. A bus delay reduction technique considering crosstalk

K. Hirose, H. Yasuura

Proceedings of the Conference on Design, Automation and Test in Europe, 2000, 441–445

4. Crosstalk Noise Immune VLSI Design Regular Layout Fabrics

S. P. Khatri, R. K. Brayton, A. L. Sangiovanni-Vincentelli

Hingham, MA, USA
Crosstalk Noise Immune VLSI Design Regular Layout Fabrics, Kluwer Academic Publishers, 2001

5. Saving power in the control path of embedded processors

C. L. Su, C. Y. Tsu, A. M. Despain

IEEE Design & Test of Computers, vol. 11, issue (4), p. 24–31, 1994

6. Bus-invert coding for low-power I/O

M. R. Stan, W. P. Burleson

IEEE Trans. on Very Large Scale Integration Systems, vol. 3, issue (1), p. 49–58, 1995

7. Low power chip interface based on bus data encoding with adaptive code-book method

S. Komatsu, M. Ikeda, K. Asada

Proceedings of the 9th IEEE Great Lakes Symposium on VLSI, 1999, 368–371

8. Asymptotic zero-transition activity encoding for address busses in low-power microprocessor based systems

L. Benini, E. Micheli, E. Macii, D. Sciuto, C. Silvano

Proceedings of the 7th IEEE Great Lakes Symposium on VLSI, 1997, 77–82

9. Area efficient temporal coding schemes reducing crosstalk effects

J. M. Philippe, S. Pillement, O. Sentieys

Proceedings of the International Symposium on Quality Electronic Design, 2006, 334–339

10. Why transition coding for power minimization of on-chip buses does not work

C. Kretzschmar, A. K. Nieuwland, D. Muller

Proceedings of the Conference on Design, Automation and Test in Europe, 2004, 10512–10517

11. Two-dimensional crosstalk avoidance codes

X. Wu, Z. Yan, Y. Xie

Proceedings of the IEEE Workshop on Signal Processing Systems, 2008, 106–111

12. High-level interconnect delay and power estimation

A. Courtay, O. Sentieys, J. Laurent, N. Julien

J. Low Power Electronics, vol. 4, issue (1), p. 21–33, 2008

13. Architectural power analysis: The dual bit type method

P. E. Landman, J. M. Rabaey

IEEE Trans. on Very Large Scale Integration Systems, vol. 3, issue (2), p. 173–187, 1995

14. Optimal interconnection circuits for VLSI

H. B. Bakoglu, J. D. Meindl

IEEE Trans. on Electron. Devices, vol. 32, issue (5), p. 903–909, 1985

15. Low-power repeaters driving RC and RLC interconnects with delay and bandwidth constraints

G. Chen, E. G. Friedman

IEEE Trans. on VLSI, vol. 14, issue (2), p. 161–172, 2006

16. Efficient coupled noise estimation for on-chip interconnects

A. Devgan

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 1997, 147–153


Indexed by Inspec

© Copyright 2011 IEEE – All Rights Reserved