A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture


This paper appears in:
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Date of Conference: 25-29 April 2006
Author(s): Ying Ping Zhang (Dept. of Electr. & Comput. Eng., Delaware Univ., Newark, DE, USA) ; Taikyeong Jeong ; Fei Chen ; Haiping Wu ; Nitzsche, R. ; Gao, G.R.

Product Type: Conference Publications


Abstract

The design of high-performance processor architectures is moving toward the integration of a large number of processing cores on a single chip. The IBM Cyclops-64 (C64) is a petaflop supercomputer built on multi-core system-on-a-chip technology. Each C64 chip employs a multistage pipelined crossbar switch as its on-chip interconnection network to provide high-bandwidth, low-latency communication between the 160 thread processing cores, the on-chip SRAM memory banks, and other components. In this paper, we present a simulation-based study of the architecture and performance of the C64 on-chip interconnection network. Our experimental results yield the following observations on network behavior: (1) Dedicated channels can be created between any output port and any input port of the C64 crossbar with latency as low as 7 cycles, and the C64 crossbar has the potential to reach the full hardware bandwidth and to exhibit non-blocking behavior; (2) The C64 crossbar is a stable network; (3) The network logic design appears to provide a reasonable opportunity for sharing the channel bandwidth between traffic in either direction; (4) A simple circular neighbor arbitration scheme can achieve a performance level competitive with that of the complex segmented LRU (least recently used) matrix arbitration scheme without losing fairness; (5) Application-driven benchmarks provide results comparable to those of synthetic workloads.
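
Observation (4) contrasts a simple circular scheme with an LRU-based one. The abstract does not define either scheme, so the sketch below is only a generic illustration of the two families of arbiter: a rotating-priority (round-robin) arbiter and a least-recently-granted arbiter. All class and function names are hypothetical, and neither class should be read as the C64 circular neighbor or segmented LRU matrix design.

```python
class RoundRobinArbiter:
    """Generic rotating-priority arbiter (illustrative; not the C64 design)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.next_port = 0  # port holding highest priority in the next cycle

    def grant(self, requests):
        """Return the granted port index, or None if no port is requesting.

        `requests` is a set of port indices contending for one output.
        """
        for offset in range(self.num_ports):
            port = (self.next_port + offset) % self.num_ports
            if port in requests:
                # Priority moves to the neighbor just after the winner.
                self.next_port = (port + 1) % self.num_ports
                return port
        return None


class LRUArbiter:
    """Generic least-recently-granted arbiter (illustrative; not the C64 design)."""

    def __init__(self, num_ports):
        # Front of the list = least recently granted = highest priority.
        self.order = list(range(num_ports))

    def grant(self, requests):
        for port in self.order:
            if port in requests:
                # The winner becomes the most recently granted port.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None


def grant_counts(arbiter, requests, cycles):
    """Count grants per port when the same ports request every cycle."""
    counts = {}
    for _ in range(cycles):
        winner = arbiter.grant(requests)
        if winner is not None:
            counts[winner] = counts.get(winner, 0) + 1
    return counts


if __name__ == "__main__":
    contenders = {0, 1, 3}  # assumed: three of four ports always requesting
    print(grant_counts(RoundRobinArbiter(4), contenders, 9))
    print(grant_counts(LRUArbiter(4), contenders, 9))
```

Under this toy constant-contention workload both arbiters share grants evenly among the requesting ports, while the rotating-priority arbiter keeps only a single pointer of state. This says nothing about the paper's measured results, which come from simulating the actual crossbar.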
