Networks-on-Chip, 2009. NoCS 2009. 3rd ACM/IEEE International Symposium on

Date 10-13 May 2009

  • 3rd ACM/IEEE international symposium on networks-on-chip

    Page(s): I
  • 3rd ACM/IEEE international symposium on networks-on-chip

    Page(s): i
  • [Copyright notice]

  • Organizing Committee

    Page(s): iii
  • Program Committee

    Page(s): iv
  • Steering Committee

    Page(s): v
  • Additional reviewers

    Page(s): vi
  • Message from the chairs

    Page(s): vii
  • Table of contents

    Page(s): viii - xi
  • Keynote 1 NoCs: It is about the memory and the programming model

    Page(s): 1

    CPUs are multicore (and multi-cache), supported by a coherent, global, shared memory model. FPGAs offer a vast number of distributed programmable function blocks and distributed memory blocks across distributed memory spaces. This presentation will discuss a hybrid computing architecture that unifies the development of applications for a combined CPU-FPGA platform. The proposed programming model is based on message passing (MPI) and distributed memory. NoCs are at the heart of the hybrid platform, managing the control and data flows. On the CPU portion of the hybrid computing platform, NoCs are implemented through shared memory buffers. On parallel hardware, NoCs are implemented as application-specific point-to-point networks exploiting the abundant routing and switching resources of the FPGA. NoCs enable application-specific memory models while keeping standard, familiar programming models such as MPI.

  • HiRA: A methodology for deadlock free routing in hierarchical networks on chip

    Page(s): 2 - 11

    The complexity of designing large NoCs can be managed by using the concept of hierarchical networks. In this paper, we propose a methodology for designing deadlock-free routing algorithms for hierarchical networks by combining the routing algorithms of component subnets. Specifically, our methodology ensures reachability and deadlock freedom for the complete network if the routing algorithms for the subnets are deadlock free. We evaluate and compare the performance of hierarchical routing algorithms designed using our methodology with routing algorithms for the corresponding flat networks. We show that hierarchical routing, combining the best routing algorithm for each subnet, has the potential to provide better performance than using any single routing algorithm. This is observed for both synthetic traffic and traffic from real applications. We also demonstrate, by measuring jitter in throughput, that hierarchical routing algorithms lead to a smoother flow of network traffic. A router architecture that supports scalable table-based routing is briefly outlined.

  • Using adaptive routing to compensate for performance heterogeneity

    Page(s): 12 - 21

    Scalable and power-efficient multi-core architectures must be performance heterogeneous to accommodate semiconductor parametric variations and non-uniform access to shared resources. Due to its rate matching, a NoC on a Voltage-Frequency Island architecture can connect cores without forcing each one to give up its own operating point for the chip-wide common worst case. With run-time adaptive routing and task-to-core mapping, a NoC can run at the average, not the worst-case, network saturation bandwidth. These run-time processes compensate for variations because they match application resource requirements with heterogeneous cores and routers. We focus on adaptive routing that simultaneously combats communication load imbalance from on-die variations and application topology. We show that even with static, fixed task-to-core mapping on multi-core architectures affected by stochastic variations, our MATC router increases the expected saturation bandwidth by 7-25% versus a Dimension Order router. With systematic variations, the improvements are 5-50%. These gains compensate for saturation bandwidth degradation due to manufacturing variations and help to reduce design guard-bands.

  • Fault-tolerant architecture and deflection routing for degradable NoC switches

    Page(s): 22 - 31

    Networks-on-Chip (NoCs) provide inherent structural redundancy of on-chip communication pathways. This redundancy can be exploited to maintain connectivity even if some components of an NoC exhibit faults, which will appear at an increasing rate in future chip generations. Based on a fine-grained functional fault model, error-detecting circuitry, and distributed online fault diagnosis, we determine the fault status of NoC switches, including their adjacent links. The remaining functionality of partly defective switches is utilized by a modified deflection routing algorithm to achieve graceful degradation of packet throughput.

  • Adaptive stochastic routing in fault-tolerant on-chip networks

    Page(s): 32 - 37

    Due to shrinking transistor geometries, on-chip circuits are becoming vulnerable to errors, but at the same time on-chip networks are required to provide reliable services over unreliable physical interconnects. A connection oriented stochastic routing (COSR) algorithm has been used on one NoC platform that provides excellent fault-tolerance and dynamic reconfiguration capability. A probability model has been built to analyze the COSR algorithm. According to the model, the performance may be improved by implementing a self learning mechanism in each router. Thus a new adaptive stochastic routing (ASR) algorithm is proposed whereby each router learns the network status from acknowledgement flits and stores the outcomes in a routing table. Simulation of both algorithms reveals that the ASR algorithm shows a higher path reservation success rate and a larger maximal accepted traffic than the COSR algorithm. The simulations also show that the learning procedures are accurate and that both algorithms are fault-tolerant to intermittent/permanent errors.

  • Static virtual channel allocation in oblivious routing

    Page(s): 38 - 43

    Most virtual channel routers have multiple virtual channels to mitigate the effects of head-of-line blocking. When there are more flows than virtual channels at a link, packets or flows must compete for channels, either in a dynamic way at each link or by static assignment computed before transmission starts. In this paper, we present methods that statically allocate channels to flows at each link when oblivious routing is used, and ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available. We then experimentally explore the performance trade-offs of static and dynamic virtual channel allocation for various oblivious routing methods, including DOR, ROMM, Valiant and a novel bandwidth-sensitive oblivious routing scheme (BSORM). Through judicious separation of flows, static allocation schemes often exceed the performance of dynamic allocation schemes.

  • Analysis of worst-case delay bounds for best-effort communication in wormhole networks on chip

    Page(s): 44 - 53

    In packet-switched networks-on-chip, computing worst-case delay bounds is crucial for designing predictable and cost-effective communication systems, yet it is an intractable problem due to complicated resource sharing scenarios. For wormhole networks with credit-based flow control, the existence of cyclic dependency between flit delivery and credit generation further complicates the problem. Based on network calculus, we propose a technique for analyzing communication delay bounds for individual flows in wormhole networks. We first propose router service analysis models for flow control, link and buffer sharing. Based on these analysis models, we obtain a buffer-sharing analysis network, which is open-ended and captures both flow control and link sharing. Furthermore, we compute equivalent service curves for individual flows using the network contention tree model in the buffer-sharing analysis network, and then derive their delay bounds. Our experimental results verify that the theoretical bounds are correct and tight.

  • Lookahead-based adaptive voltage scheme for energy-efficient on-chip interconnect links

    Page(s): 54 - 63

    This paper presents a novel adaptive voltage scheme based on a lookahead circuit that checks the transmitter buffer for data transitions. The advance knowledge of incoming data patterns is used to adjust the link swing voltage, improving delay and energy performance. In the presented example system, a transition detection circuit is used to check the transmitter buffer for rising transitions ('0' in cycle t, '1' in cycle t+1). When a rising transition is detected, a higher supply voltage is applied to the driver for a small portion of the clock cycle to reduce the rising-edge delay, improving link performance. A lower voltage is used for all other transmissions, improving the delay performance of falling-edge transitions and the link energy dissipation. For a 1 GHz link frequency, the proposed approach improves energy dissipation by 45% compared to a traditional two-inverter buffer. An energy saving of up to 15% is achieved compared to a previously proposed dual-voltage scheme.
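
Editorial illustration, not taken from the paper: the rising-transition check described in the abstract can be sketched in a few lines of Python. The voltage values and the names `V_HIGH`, `V_LOW`, `is_rising`, and `driver_voltages` are hypothetical, and the sketch ignores the fact that the boosted supply is applied for only part of the clock cycle.

```python
from typing import List

V_HIGH = 1.0  # hypothetical boosted driver supply, in volts (assumption)
V_LOW = 0.8   # hypothetical default driver supply, in volts (assumption)


def is_rising(bit_t: int, bit_t_plus_1: int) -> bool:
    """A rising transition is '0' in cycle t followed by '1' in cycle t+1."""
    return bit_t == 0 and bit_t_plus_1 == 1


def driver_voltages(buffered_bits: List[int]) -> List[float]:
    """Pick a supply level for each transmission between consecutive bits.

    Only rising transitions get the higher level; falling transitions and
    repeated values use the lower level.
    """
    pairs = zip(buffered_bits, buffered_bits[1:])
    return [V_HIGH if is_rising(a, b) else V_LOW for a, b in pairs]


schedule = driver_voltages([0, 1, 1, 0, 0, 1])
print(schedule)
```

For the six buffered bits above there are five transitions, of which the first and last are rising.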

  • Recursive partitioning multicast: A bandwidth-efficient routing for Networks-on-Chip

    Page(s): 64 - 73

    Chip Multi-processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Networks-on-Chip (NOCs) provide a scalable communication method for CMP architectures. NOCs must be carefully designed to meet constraints of power consumption and area, and provide ultra low latencies. Existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multicast and broadcast traffic efficiently. In this paper, we propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversity, which yields better bandwidth efficiency and ultimately improves the performance of the whole network. Our simulation results using a detailed cycle-accurate simulator show that, compared with the most recent multicast scheme, RPM saves 25% of crossbar and link power and 33% of link utilization, with a 50% network performance improvement. RPM is also more scalable to large networks than the recently proposed VCTM.
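
Editorial illustration, not taken from the paper: the unicast baseline named in the abstract, Dimension Order Routing, is commonly realized on a 2D mesh as XY routing, where a packet first corrects its X coordinate and then its Y coordinate. The function name `xy_route` and the `(x, y)` node encoding are assumptions made for this sketch. It does not implement RPM.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in a 2D mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the hop-by-hop path of XY dimension-order routing."""
    x, y = src
    path = [src]
    # Travel along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)
```

Because the route depends only on source and destination, every packet between the same pair of nodes takes the same path, which is the lack of path diversity that the abstract contrasts RPM against.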

  • Analytical modeling and evaluation of On-Chip Interconnects using Network Calculus

    Page(s): 74 - 79

    Network-on-Chip (NoC) has been proposed as an alternative to bus-based schemes to achieve high performance and scalability in System-on-Chip (SoC) design. Performance evaluation of On-Chip Interconnect (OCI) architectures is widely based on simulation, which becomes computationally expensive, especially for large-scale NoCs. In this paper, a performance analysis model using Network Calculus is presented to characterize and evaluate the performance of NoC-based applications. The 2D Mesh on-chip interconnect is analyzed, and the main performance metrics, such as end-to-end delay and buffer size requirements, are computed and compared against the results produced by a discrete event simulator. The results shed more light on the potential of this analytical technique as a useful tool for NoC design and performance analysis.
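
Editorial illustration, not taken from the paper: the two metrics named in the abstract have simple closed forms in the textbook Network Calculus setting. Assume a flow constrained by a token-bucket arrival curve (burst b, rate r) crossing servers that each offer a rate-latency service curve (rate R, latency T). Servers in series combine into one rate-latency curve with the minimum rate and the summed latency, the delay bound is T + b/R, and the backlog (buffer) bound is b + r*T, provided r <= R. All class and function names below are hypothetical, and contention between flows is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenBucket:
    burst: float  # b, e.g. in flits
    rate: float   # r, e.g. in flits per cycle


@dataclass(frozen=True)
class RateLatency:
    rate: float     # R, e.g. in flits per cycle
    latency: float  # T, e.g. in cycles


def concatenate(servers: Iterable[RateLatency]) -> RateLatency:
    """Equivalent service curve of rate-latency servers in series."""
    servers = list(servers)
    return RateLatency(min(s.rate for s in servers),
                       sum(s.latency for s in servers))


def delay_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case delay T + b/R; only valid when r <= R."""
    if flow.rate > server.rate:
        raise ValueError("unbounded: arrival rate exceeds service rate")
    return server.latency + flow.burst / server.rate


def backlog_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Worst-case buffer occupancy b + r*T; only valid when r <= R."""
    if flow.rate > server.rate:
        raise ValueError("unbounded: arrival rate exceeds service rate")
    return flow.burst + flow.rate * server.latency


# Hypothetical numbers: a 4-flit burst at 0.25 flits/cycle over three routers.
flow = TokenBucket(burst=4.0, rate=0.25)
path = concatenate([RateLatency(rate=1.0, latency=2.0)] * 3)
print(delay_bound(flow, path), backlog_bound(flow, path))
```

With these numbers the path behaves like a single server with R = 1.0 and T = 6.0, giving a delay bound of 10 cycles and a backlog bound of 5.5 flits.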

  • Energy efficient application mapping to NoC processing elements operating at multiple voltage levels

    Page(s): 80 - 85

    An efficient technique for mapping application tasks to heterogeneous processing elements (PEs) on a network-on-chip (NoC) platform, operating at multiple voltage levels, is presented in this paper. The goal of the mapping is to minimize energy consumption subject to the performance constraints. Such a mapping involves solving several subproblems. Most research efforts in this area address these subproblems sequentially or address only a subset of them. We take a unified approach to the problem without compromising the solution time and provide techniques for optimal and heuristic solutions. We prove that the voltage assignment component of the problem is itself NP-hard and is inapproximable within any constant factor. Our optimal solution utilizes a mixed integer linear program (MILP) formulation of the problem. The heuristic utilizes MILP relaxation and randomized rounding. Experimental results based on E3S benchmark applications and a few real applications show that our heuristic produces near-optimal solutions in a fraction of the time needed to find the optimal solution.

  • The design of a latency constrained, power optimized NoC for a 4G SoC

    Page(s): 86

    Network-on-Chip (NoC) is being adopted by chip architects as a means to improve design productivity. As the number of modules connected to a bus increases, its physical implementation becomes very complex, and achieving the desired throughput and latency requires time-consuming custom modifications. Conversely, NoCs are designed separately from the functional units of the system to handle all foreseen inter-module communication needs. Their inherently scalable architecture facilitates the integration of the system and shortens the time-to-market of complex products. In this work, we discuss and evaluate the design process of a NoC for a state-of-the-art system-on-chip (SoC). More specifically, we describe our experience in designing a cost-optimized NoC interconnect for a high-performance, power-constrained 4G wireless modem. We focus on the power and performance aspects of various module mapping schemes, looking for a tradeoff characterized by minimal power consumption that still meets the timing requirements of all targeted applications. Using a simulated annealing based mapping process, we place the system's modules on a grid, minimizing the dynamic energy consumed by the transmission of packets over the NoC.
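
Editorial illustration, not taken from the paper: a minimal simulated-annealing placement of modules on a grid. As a stand-in for dynamic energy, it assumes the cost is traffic volume multiplied by Manhattan hop distance; the paper's actual cost model, timing constraints, and annealing schedule are not given in the abstract. The module names, traffic volumes, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Traffic = Dict[Tuple[str, str], float]  # (source, destination) -> volume
Placement = Dict[str, Tuple[int, int]]  # module -> (x, y) grid slot


def comm_cost(placement: Placement, traffic: Traffic) -> float:
    """Energy proxy: sum of traffic volume times Manhattan hop distance."""
    return sum(
        vol * (abs(placement[a][0] - placement[b][0])
               + abs(placement[a][1] - placement[b][1]))
        for (a, b), vol in traffic.items()
    )


def anneal(modules: List[str], traffic: Traffic, width: int,
           steps: int = 2000, t0: float = 10.0, alpha: float = 0.995,
           seed: int = 0) -> Tuple[Placement, float]:
    """Swap-based simulated annealing; returns the best placement seen."""
    rng = random.Random(seed)
    rows = math.ceil(len(modules) / width)
    slots = [(x, y) for y in range(rows) for x in range(width)]
    placement = dict(zip(modules, slots))  # initial row-major placement
    cost = comm_cost(placement, traffic)
    best, best_cost = dict(placement), cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(modules, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = comm_cost(placement, traffic)
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo
        temp *= alpha  # geometric cooling
    return best, best_cost


modules = ["cpu", "dsp", "mem", "rf"]
traffic = {("cpu", "mem"): 8, ("dsp", "mem"): 4, ("rf", "dsp"): 6, ("cpu", "rf"): 1}
initial_cost = comm_cost(dict(zip(modules, [(0, 0), (1, 0), (0, 1), (1, 1)])), traffic)
best, best_cost = anneal(modules, traffic, width=2)
print(initial_cost, best_cost, best)
```

The sketch only swaps modules among occupied slots, so it is a simplification when the grid has more slots than modules.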

  • Performance Evaluation of NoC Architectures for Parallel Workloads

    Page(s): 87

    Network-on-Chip is the state-of-the-art approach to interconnect many processing cores in the next generation of general-purpose processors. In this context, the problem is to choose NoC architectures capable of achieving high performance for parallel programs. Therefore, the main goal of this paper is to evaluate the performance of three NoC architectures using well-known parallel workloads.

  • Packet-level static timing analysis for NoCs

    Page(s): 88

    Networks-on-chip (NoCs) are used in a growing number of SoCs and multi-core processors, increasing the need for accurate and efficient modeling to aid the design of these highly-integrated systems. Towards this modeling goal, we present a methodology for packet-level static timing analysis in NoCs. Our methodology enables quick and accurate gauging of the performance parameters of a virtual-channel wormhole NoC without using simulation techniques and supports any topology, link capacities, and buffer depths. It provides per-flow analysis that is orders-of-magnitude faster than simulation while being both significantly more accurate and more complete than prior static modeling techniques. Our methodology is inspired by models of industrial flow-lines. Using a carefully derived and reduced Markov chain, the model can statically represent the dynamic network state and closely estimate the average latency of each flow. Use of the model in a placement optimization problem is shown as an example application of the method.

  • Increasing NoC power estimation accuracy through a rate-based model

    Page(s): 89

    This research work presents and compares two NoC power estimation models, one based on the volume of information transmitted in the network, and another based on the transmission rates of each router.

  • On-Chip photonic interconnects for scalable multi-core architectures

    Page(s): 90

    In this paper, we propose PROPEL, a photonic network-on-chip (NoC) that improves performance and power with energy-efficient opto-electronic components for future chip multiprocessors (CMPs). Our analytical and simulation results indicate that PROPEL improves throughput and reduces power over optical and electrical networks for various traffic traces while requiring fewer photonic components and devices.
