2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI)

24-26 Aug. 2016

  • [Front cover]

    Publication Year: 2016, Page(s): c1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2016, Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2016, Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2016, Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2016, Page(s): v - vi
    Freely Available from IEEE
  • Message from the General Co-Chairs

    Publication Year: 2016, Page(s): vii - viii
    Freely Available from IEEE
  • Message from the Technical Program Co-Chairs

    Publication Year: 2016, Page(s): ix
    Freely Available from IEEE
  • Committee Lists

    Publication Year: 2016, Page(s): x - xi
    Freely Available from IEEE
  • Technical Program Committee

    Publication Year: 2016, Page(s): xii
    Freely Available from IEEE
  • Keynotes

    Publication Year: 2016, Page(s): xiii - xiv

    Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • Invited Talks

    Publication Year: 2016, Page(s): xv - xvi

    Provides an abstract for each of the invited presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • [Panel] Many-core reality check - How increasing core counts, on-node networks, and deep integration will impact system interconnects

    Publication Year: 2016, Page(s): xvii

    Summary form only given, as follows. The complete presentation was not made available for publication as part of the conference proceedings. Node architectures have entered an era of intense innovation, with trends toward increasing numbers of devices per processor; integration of memory and the introduction of new memory technologies; and rapid increases in the scale of on-chip, on-package, and o...
  • Tutorials

    Publication Year: 2016, Page(s): xviii - xxvi

    These tutorials discuss the following: Accelerating Big Data Processing with Hadoop, Spark and Memcached over High-Performance Interconnects; Designing and Developing Performance Portable Network Codes; Data-Center Interconnection (DCI) Technology Innovations in Transport Network Architectures; Efficient Communication in GPU Clusters with GPUDirect Technologies.
  • Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks

    Publication Year: 2016, Page(s): 1 - 8

    Lossless networks, such as InfiniBand, use flow control to avoid packet loss due to congestion. This introduces dependencies between input and output channels; in case of cyclic dependencies, the network can deadlock. Deadlocks can be resolved by splitting a physical channel into multiple virtual channels with independent buffers and credit systems. Currently available routing engines for InfiniBand...
  • Scalable, Global, Optimal-bandwidth, Application-Specific Routing

    Publication Year: 2016, Page(s): 9 - 18

    High performance computing platforms can benefit from additional bandwidth from the interconnection network because there are many applications with significant communication demands. Further, many HPC applications expressed as MPI programs have stable communication patterns across runs. Ideally, one would like to exploit the stable communication patterns by using global routing of communication p...
  • Traffic Pattern-Based Adaptive Routing for Intra-Group Communication in Dragonfly Networks

    Publication Year: 2016, Page(s): 19 - 26

    The Cray Cascade architecture uses Dragonfly as its interconnect topology and employs a globally adaptive routing scheme called UGAL. UGAL directs traffic based on link loads but may make inappropriate adaptive routing decisions in various situations, which degrades its performance. In this work, we propose to improve UGAL by incorporating a traffic pattern-based adaptation mechanism for intra-grou...
  • Improvements to the InfiniBand Congestion Control Mechanism

    Publication Year: 2016, Page(s): 27 - 36

    The InfiniBand Congestion Control mechanism (IB CC) is able to reduce the negative consequences of congestion in many situations. However, its effectiveness depends on a set of parameters that must be set by administrators. If the parameters are not appropriately configured, IB CC could negatively impact network performance. Additionally, no one has been able to find a universal parameter setting ...
  • Scalable High-Radix Modular Crossbar Switches

    Publication Year: 2016, Page(s): 37 - 44

    Crossbars are a basic building block of networks on chip that can be used as fast, single-stage networks or in router cores for larger scale networks. However, scaling crossbars to high radices presents a number of efficiency, performance, and area challenges. Thus, we propose modular flow-through crossbar switch cores that perform better at high radices than conventional monolithic designs. The m...
  • A Clos-Network Switch Architecture Based on Partially-Buffered Crossbar Fabrics

    Publication Year: 2016, Page(s): 45 - 52

    Modern Data Center Networks (DCNs) that scale to thousands of servers require high performance switches/routers to handle high traffic loads with minimum delays. Today's switches need to be scalable, have good performance and, more importantly, be cost-effective. This paper describes a novel three-stage Clos-network switching fabric with partially-buffered crossbar modules and different scheduling algo...
  • Race Cars vs. Trailer Trucks: Switch Buffers Sizing vs. Latency Trade-Offs in Data Center Networks

    Publication Year: 2016, Page(s): 53 - 59

    This paper raises the data center designer's question of the trade-off between high-buffer switches and low-latency switches. Packet buffer hardware dictates this trade-off due to the constraints of DRAM and SRAM technologies. While designers who prefer robust network solutions would typically choose large-buffer switches and settle for high latency, the designers who can adapt applications t...
  • A Multilevel NOSQL Cache Design Combining In-NIC and In-Kernel Caches

    Publication Year: 2016, Page(s): 60 - 67

    Since a large-scale in-memory data store, such as key-value store (KVS), is an important software platform for data centers, this paper focuses on FPGA-based custom hardware to further improve the efficiency of KVS. Although such FPGA-based KVS accelerators have been studied and have shown high performance per watt compared to software-based processing, since their cache capacity is strictly limit...
  • RoB-Router: Low Latency Network-on-Chip Router Microarchitecture Using Reorder Buffer

    Publication Year: 2016, Page(s): 68 - 75

    Switch allocation is the critical pipeline stage for networks-on-chip (NoCs) and it is influenced by the order of packets in input buffers. Traditional input-queued routers in NoCs only have a small number of virtual channels (VCs) and the packets in a VC are organized in fixed order. Such a design is susceptible to head-of-line (HoL) blocking as only the packet at the head of a VC can be allocated ...
  • Offloading Collective Operations to Programmable Logic on a Zynq Cluster

    Publication Year: 2016, Page(s): 76 - 83

    This paper describes our architecture and implementation for offloading collective operations to programmable logic in the communication substrate. Collective operations - operations that involve communication between groups of co-operating processes - are widely used in parallel processing. The design and implementation strategies of collective operations play a significant role in their perform...
  • Exploring Data Vortex Network Architectures

    Publication Year: 2016, Page(s): 84 - 91

    In this work, we present an overview of the Data Vortex interconnection network, a network designed for both traditional HPC and emerging irregular and data analytics workloads. The Data Vortex network consists of a congestion-free, high-radix network switch and a Vortex Interconnection Controller (VIC) that interfaces the compute node with the rest of the network. The Data Vortex network is desig...
  • Exploring Wireless Technology for Off-Chip Memory Access

    Publication Year: 2016, Page(s): 92 - 99

    The trend of shifting from multi-core to many-core processors is exceeding the data-carrying capacity of the traditional on-chip communication fabric. While the importance of the on-chip communication paradigm cannot be denied, the off-chip memory access latency is fast becoming an important challenge. As more memory intensive applications are developed, off-chip memory access will limit the perfo...