14th IEEE Symposium on High-Performance Interconnects

Date: 23-25 August 2006

  • 14th IEEE Symposium on High-Performance Interconnects - Cover

    Publication Year: 2006 , Page(s): c1
  • 14th IEEE Symposium on High-Performance Interconnects - Title

    Publication Year: 2006 , Page(s): i - iii
  • 14th IEEE Symposium on High-Performance Interconnects - Copyright

    Publication Year: 2006 , Page(s): iv
  • 14th IEEE Symposium on High-Performance Interconnects - TOC

    Publication Year: 2006 , Page(s): v - vi
  • General Chairs' message

    Publication Year: 2006 , Page(s): vii
  • General Co-Chairs' Message

    Publication Year: 2006 , Page(s): viii
  • Program Committee

    Publication Year: 2006 , Page(s): ix
  • Organizing Committee

    Publication Year: 2006 , Page(s): x
  • Technical Programming Committee

    Publication Year: 2006 , Page(s): xi
  • Keynote Speaker I

    Publication Year: 2006 , Page(s): xii

    Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter.

  • Keynote Speaker II

    Publication Year: 2006 , Page(s): xiii

    Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter.

  • Tutorial I

    Publication Year: 2006 , Page(s): xiv - xv

    Summary form only for tutorial.

  • Tutorial II

    Publication Year: 2006 , Page(s): xvi

    Summary form only for tutorial.

  • Tutorial III

    Publication Year: 2006 , Page(s): xvii

    Summary form only for tutorial.

  • Tutorial IV

    Publication Year: 2006 , Page(s): xviii

    Summary form only for tutorial.

  • Loosely Coupled TCP Acceleration Architecture

    Publication Year: 2006 , Page(s): 3 - 8
    Cited by:  Papers (5)

    We present a novel approach for scalable network acceleration. The architecture uses limited hardware support and preserves protocol processing flexibility, combining the benefits of TCP offload and onload. The architecture is based on decoupling the data movement functions, accelerated by a hardware engine, from complex protocol processing, controlled by an isolated software entity running on a central CPU. These operate in parallel and interact asynchronously. We describe a prototype implementation which achieves multi-gigabit throughput with extremely low CPU utilization.

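    A rough software analogue of the decoupling described in this abstract, assuming nothing about the authors' actual hardware/software interface: a data-mover worker (standing in for the hardware engine) and a protocol engine exchange descriptors and completions over asynchronous queues. All names below are invented for illustration; this is a sketch of the idea, not the paper's implementation.

      # Illustrative sketch only: decoupled data movement vs. protocol control.
      import queue, threading

      work_q = queue.Queue()    # protocol engine -> data mover: payloads to move
      done_q = queue.Queue()    # data mover -> protocol engine: completions

      def data_mover():
          # Stands in for the hardware data-movement engine.
          while True:
              desc = work_q.get()
              if desc is None:
                  break
              # ... copy desc["payload"] to/from NIC buffers here ...
              done_q.put({"conn": desc["conn"], "bytes": len(desc["payload"])})

      def protocol_engine(segments):
          # Stands in for the isolated TCP state machine on the host CPU.
          for conn, payload in segments:
              work_q.put({"conn": conn, "payload": payload})  # hand off, do not wait
          work_q.put(None)                                    # no more work

      t = threading.Thread(target=data_mover)
      t.start()
      protocol_engine([(1, b"hello"), (1, b"world")])
      t.join()
      while not done_q.empty():
          print(done_q.get())   # completions are consumed asynchronously
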
  • Network I/O Acceleration in Heterogeneous Multicore Processors

    Publication Year: 2006 , Page(s): 9 - 14
    Cited by:  Papers (3)  |  Patents (1)

    Chip multiprocessor (CMP) architectures are fast becoming the dominant design for general-purpose processors. Whereas current-generation server and desktop processors use homogeneous CMP architectures, network processors (NPs) have used heterogeneous CMP architectures for years. At the same time, the failure of network stacks in traditional processors to scale with increased network bandwidths has spawned numerous proposals for new approaches to accelerate network processing. This paper looks at moving network stack processing from the main CPU to a series of smaller, closely coupled, and more efficient processors in a heterogeneous CMP by implementing such an architecture on an Intel IXP network processor. Our experiments show that the close coupling and flexible nature of the IXP's microengines allow them to greatly accelerate network processing for a small cost in area.

  • A Case Study in I/O Disaggregation using PCI Express Advanced Switching Interconnect (ASI)

    Publication Year: 2006 , Page(s): 15 - 24
    Cited by:  Patents (1)

    Decoupling the processor and I/O subsystem provides immense benefits that include high availability, efficient allocation, and cost-effective upgrade of system resources. Such a disaggregation model calls for a high-performance interconnect to isolate the processor and I/O subsystem domains, yet provide the veneer of a single system. PCI Express (PCIe) is one such interconnect and is becoming the de facto I/O fabric. However, PCIe, as specified currently, provides limited support for I/O disaggregation and does not yet natively support dynamic sharing of I/O resources amongst processor subsystems; this is the next major step in I/O disaggregation. PCI Express Advanced Switching Interconnect (ASI) is well suited to enhancing the capabilities of PCIe in a non-disruptive manner. ASI is built upon PCIe and has the innate ability to co-exist with PCIe devices, owing to its common link/physical layer with PCIe as well as its native support for encapsulating PCIe packets. Towards a simple yet illustrative demonstration of ASI-based disaggregation of PCIe devices, we employed StarGen's ASI products to create a basic ASI fabric and disaggregated a PCIe-based GigE NIC from a host system. The initial set of results showed a marginal effect on the application's latency but, contrary to expectations, a significant impact on throughput. Further analysis revealed that this unexpected drop in throughput could be rectified easily, and indeed the final results confirm that the use of ASI for supporting I/O disaggregation does not result in sub-optimal utilization of the GigE NIC.

  • Designing Full-Connectivity WDM Optical Interconnects with Reduced Switching and Conversion Complexity

    Publication Year: 2006 , Page(s): 25 - 30
    Cited by:  Papers (1)

    Most existing wavelength division multiplexing (WDM) optical interconnects make use of a large number of switching elements and require wide-range tunable wavelength converters to support full connectivity among inputs and outputs. This results in complex and expensive designs. In this paper, we propose new full-connectivity single-stage and multi-stage WDM optical interconnects with reduced hardware complexity. The proposed designs require a smaller number of switching elements and use only fixed-range wavelength conversion. Analysis of hardware complexity shows that the proposed designs have lower switching and conversion costs than most existing interconnects.

  • A New Dynamic Bandwidth Re-Allocation Technique in Optically Interconnected High-Performance Computing Systems

    Publication Year: 2006 , Page(s): 31 - 36
    Cited by:  Papers (1)

    As bit rates increase, high-performance computing (HPC) systems based on optical interconnects improve performance over traditional electrical interconnects by increasing the available bandwidth (using wavelength-division multiplexing (WDM) and space-division multiplexing (SDM)) and decreasing power dissipation. While static allocation of wavelengths (channels) in optical interconnects provides every node with an equal opportunity for communication, it can lead to network congestion for non-uniform traffic patterns. In this paper, we propose an opto-electronic interconnect for designing a flexible, high-bandwidth, low-latency, dynamically reconfigurable architecture for scalable HPC systems. Reconfigurability is realized by monitoring traffic intensities and implementing a dynamic bandwidth re-allocation (DBR) technique that adapts to changes in communication patterns. We propose a DBR technique, lock-step (LS), that balances the load on each communication channel based on past utilization. Simulation results indicate that the reconfigured architecture shows 40% higher throughput and 20% lower network latency than HPC electrical networks.

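    The abstract does not spell out the lock-step algorithm, but the general idea of utilization-driven re-allocation can be pictured as follows. This is a hypothetical simplification, not the authors' LS technique: channels are periodically re-assigned in proportion to each node's traffic in the previous monitoring window.

      # Hypothetical sketch: proportional channel re-allocation from past utilization.
      def reallocate_channels(total_channels, past_utilization, min_per_node=1):
          nodes = list(past_utilization)
          alloc = {n: min_per_node for n in nodes}           # guaranteed minimum share
          spare = total_channels - min_per_node * len(nodes)
          total = sum(past_utilization.values()) or 1
          for n in nodes:
              # Busier nodes in the last window receive more channels in the next one.
              alloc[n] += int(spare * past_utilization[n] / total)
          return alloc                                        # unassigned spares stay in a free pool

      # Node "a" dominated the last monitoring window, so it gets most of the channels.
      print(reallocate_channels(16, {"a": 900, "b": 100, "c": 50}))
      # -> {'a': 12, 'b': 2, 'c': 1}
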
  • Strictly Nonblocking Multicasting WDM Optical Cross Connects Using Multiwavelength Converters

    Publication Year: 2006 , Page(s): 37 - 44
    Cited by:  Papers (1)

    In this paper, we propose new strictly nonblocking multicast-capable optical cross-connect (MC-OXC) architectures that exploit the potential of multi-wavelength converters (MWCs). An MWC is capable of simultaneously replicating a signal on an input wavelength to several output wavelengths. We investigate two families of MC-OXCs based on the use of full- and limited-range MWCs, and present a number of architectures in each of the two families. The proposed architectures present a trade-off between switching complexity, conversion cost, and signal loss.

  • ExpressEther - Ethernet-Based Virtualization Technology for Reconfigurable Hardware Platform

    Publication Year: 2006 , Page(s): 45 - 51
    Cited by:  Papers (5)  |  Patents (4)

    We propose ExpressEther, an Ethernet-based virtualization technology for a reconfigurable hardware platform. It groups modularized hardware resources interconnected by Ethernet and transports PCI Express (PCIe) packets between the grouped modules by encapsulating them into Ethernet frames. The configuration of a group is changed dynamically over an Ethernet connection, followed by a standardized PCIe hot-plug event. Such reconfigurability enables sharing of a physical resource among computer entities. We demonstrate that an I/O device can be shared by servers, using our prototype consisting of an interface card and an I/O concentrator. A commercially available server and serial ATA card are used for the demonstration, without any changes to the operating system, device driver, PCIe interface, or Ethernet switch. I/O benchmarks show at most 16% degradation, which is caused by implementation details of our prototype. No degradation is measured when data flows from an I/O device to a server.

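    The core mechanism, carrying a PCIe packet inside an Ethernet frame, can be pictured roughly as below. The header layout and EtherType are placeholders chosen for illustration only, not the ExpressEther wire format.

      # Illustrative sketch only: tunnelling a PCIe TLP inside an Ethernet frame.
      import struct

      def encapsulate(dst_mac: bytes, src_mac: bytes, tlp: bytes,
                      ethertype: int = 0x88B5) -> bytes:
          # 0x88B5 (IEEE local experimental EtherType) is used purely as a placeholder.
          eth_header = dst_mac + src_mac + struct.pack("!H", ethertype)
          return eth_header + struct.pack("!H", len(tlp)) + tlp   # length-prefixed TLP

      frame = encapsulate(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01", b"\x00" * 16)
      print(len(frame))   # 14-byte Ethernet header + 2-byte length + 16-byte TLP = 32
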
  • Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand

    Publication Year: 2006 , Page(s): 52 - 60
    Cited by:  Papers (2)  |  Patents (1)

    As multi-core systems gain popularity for their increased computing power at low cost, the rest of the architecture, such as the memory subsystem, must be kept in balance. Many existing memory subsystems can suffer from scalability issues and show memory performance degradation with more than one process running. To address these scalability issues, fully-buffered DIMMs (FB-DIMMs) have recently been introduced. In this paper we present an initial performance evaluation of the next-generation multi-core Intel platform by evaluating the FB-DIMM-based memory subsystem and the associated InfiniBand performance. To the best of our knowledge, this is the first such study of Intel multi-core platforms with multi-rail InfiniBand DDR configurations. We provide an evaluation of the current-generation Intel Lindenhurst platform as a reference point. We find that the Intel Bensley platform can provide memory scalability to support memory accesses by multiple processes on the same machine, as well as drastically improved inter-node throughput over InfiniBand. On the Bensley platform we observe a 1.85-times increase in aggregate write bandwidth over the Lindenhurst platform. For inter-node MPI-level benchmarks we show bi-directional bandwidth of over 4.55 GB/sec for the Bensley platform using 2 DDR InfiniBand host channel adapters (HCAs), an improvement of 77% over the current-generation Lindenhurst platform. The Bensley system is also able to achieve a throughput of 3.12 million MPI messages/sec in the above configuration.

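    A back-of-the-envelope reading of the reported comparison, assuming the quoted 77% improvement is relative to the Lindenhurst bi-directional bandwidth:

      bensley_bw = 4.55                    # GB/s, Bensley with 2 DDR InfiniBand HCAs (from the abstract)
      lindenhurst_bw = bensley_bw / 1.77   # implied Lindenhurst baseline under that assumption
      print(round(lindenhurst_bw, 2))      # ~2.57 GB/s
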
  • A Reconfigurable Architecture for Multi-Gigabit Speed Content-Based Routing

    Publication Year: 2006 , Page(s): 61 - 66
    Cited by:  Papers (2)

    This paper presents a reconfigurable architecture for high-speed content-based routing. Our architecture goes beyond simple pattern matching by implementing a parsing engine that defines the semantics of patterns that are parsed within the data stream. Defining the semantics of patterns allows for more accurate processing and routing of packets using any fields that appear within the payload of the packet. The architecture consists of several components, including a pattern matcher, a parsing structure, and a routing module. Both the pattern matcher and parsing structure are automatically generated using an application-specific compiler that is described in this paper. The compiler accepts a grammar specification as input and outputs a data parser in VHDL. The routing module receives control signals from both the pattern matcher and the parsing structure that aid in the routing of packets. We illustrate how a content-based router can be implemented with our technique using an XML parser as an example. The XML parser presented was designed, implemented, and tested in a Xilinx Virtex XCV2000E FPGA on the FPX platform. It is capable of processing 32 bits of data per clock cycle and runs at 100 MHz. This allows the system to process and route XML messages at 3.2 Gbps.

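    A software analogue of the content-based routing step, purely for illustration (the routing table, field name, and XML schema below are invented; the paper's implementation is a generated VHDL parser, not Python): the payload is parsed and an output port is chosen from a field inside it. Note that the quoted rate follows from the abstract's own numbers: 32 bits per cycle at 100 MHz is 3.2 Gbps.

      # Illustrative sketch only: pick an output port from a field in the payload.
      import xml.etree.ElementTree as ET

      ROUTES = {"orders": 1, "quotes": 2}   # hypothetical payload-field -> port table

      def route(xml_payload: bytes, default_port: int = 0) -> int:
          try:
              topic = ET.fromstring(xml_payload).findtext("topic", "")
          except ET.ParseError:
              return default_port           # unparsable payloads take the default route
          return ROUTES.get(topic, default_port)

      print(route(b"<msg><topic>orders</topic><body>...</body></msg>"))   # -> 1
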
  • Fast Buffer Memory with Deterministic Packet Departures

    Publication Year: 2006 , Page(s): 67 - 72
    Cited by:  Papers (3)

    High-performance routers need to temporarily store a large number of packets in response to congestion. DRAM is typically used to implement the needed packet buffers, but DRAM devices are too slow to match the bandwidth requirements. To bridge the bandwidth gap, a number of hybrid SRAM/DRAM packet buffer architectures have been proposed (S. Iyer and N. McKeown, 2002) (S. Kumar et al., 2005). These packet buffer architectures assume a very general model where the buffer consists of many logically separated FIFO queues that may be accessed in random order. For example, virtual output queues (VOQs) are used in crossbar routers, where each VOQ is a logical queue associated with a particular output. Depending on the scheduling algorithm used, the access pattern to these logical queues may indeed be random. However, for a number of router architectures, this worst-case random-access assumption is unnecessary since packet departure times are deterministic. One such architecture is the switch-memory-switch router architecture (A. Prakash et al., 2002) (S. Iyer et al., 2002), which efficiently mimics an output queueing switch. Another is the load-balanced router architecture (C.S. Chang et al., 2002) (I. Keslassy et al., 2003), which has interesting scalability properties. In these architectures, for best-effort routing, the departure times of packets can be deterministically calculated before inserting packets into packet buffers. In this paper, we describe a novel packet buffer architecture based on interleaved memories that takes advantage of the known packet departure times to achieve simplicity and determinism. The number of interleaved DRAM banks required to implement the proposed packet buffer architecture is independent of the number of logical queues, yet the proposed architecture can achieve the performance of an SRAM implementation.

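    The key observation, that a known departure time fixes which memory bank a packet should live in, can be sketched as follows. This is a hypothetical simplification of the interleaving scheme, not the paper's exact design: writing each packet to bank (departure slot mod B) means any B consecutive departures touch B distinct banks, so no bank is read more often than its access time allows.

      # Hypothetical sketch: departure-time-driven bank interleaving.
      NUM_BANKS = 8
      banks = [dict() for _ in range(NUM_BANKS)]      # each bank maps departure slot -> packet

      def enqueue(packet, departure_slot):
          banks[departure_slot % NUM_BANKS][departure_slot] = packet

      def dequeue(departure_slot):
          return banks[departure_slot % NUM_BANKS].pop(departure_slot, None)

      for t, pkt in enumerate([b"p0", b"p1", b"p2", b"p3"]):
          enqueue(pkt, departure_slot=t + 5)           # departure slots computed before insertion
      print([dequeue(t) for t in range(5, 9)])         # -> [b'p0', b'p1', b'p2', b'p3']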