Multitoroidal Interconnects For Tightly Coupled Supercomputers

6 Author(s)

The processing elements of many modern tightly coupled multicomputers are connected via mesh or toroidal networks. Such interconnects are simple and highly scalable, but suffer from high fragmentation, low utilization, and insufficient fault tolerance when the resources allocated to each job are dedicated. High-dimensional interconnects may be more efficient in certain cases, but they are based on complex and expensive components and scale poorly. We present a novel hardware/software architectural approach that detaches the processing elements of the system from the interconnect and augments the traditional toroidal topology to provide additional connectivity options and additional link redundancy. We explore the properties of the new "multitoroidal" topology and the improvements it offers in resource utilization and failure tolerance. We present the results of extensive simulation studies to show that, for practically important types of workloads, resource utilization may be increased by 50 percent and, in certain cases, by as much as 100 percent compared to toroidal machines, and is in fact close to the theoretically optimal case of a full crossbar interconnect. The combined hardware/software architectural innovation is a significant improvement in resource utilization on top of the state of the art in scheduling algorithm research. In addition, multitoroidal multicomputers are able to work under link failure rates of 0.002 failures per week, a rate that would shut down toroidal machines. A variant of the multitoroidal architecture is implemented in the Blue Gene/L supercomputer.
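For readers unfamiliar with the baseline topologies named in the abstract, the following is a minimal sketch of how neighbor sets differ between a mesh and a torus. It illustrates only the conventional mesh and toroidal interconnects, not the multitoroidal design, whose details are not given here. The function names `torus_neighbors` and `mesh_neighbors` and the 4 x 4 x 4 machine shape are assumptions chosen for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes one hop from `node` in a torus of shape `dims`.

    Hypothetical helper for illustration: each axis wraps around, so a
    node on the edge still has a link in both directions.
    """
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            moved = list(node)
            moved[axis] = (node[axis] + step) % size  # wraparound link
            candidate = tuple(moved)
            # Skip self-links (size 1) and duplicate links (size 2).
            if candidate != node and candidate not in neighbors:
                neighbors.append(candidate)
    return neighbors


def mesh_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes one hop from `node` in a mesh (no wraparound)."""
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            position = node[axis] + step
            if 0 <= position < size:  # edge nodes lose a link here
                moved = list(node)
                moved[axis] = position
                neighbors.append(tuple(moved))
    return neighbors


if __name__ == "__main__":
    shape = (4, 4, 4)  # assumed machine shape, for illustration only
    corner = (0, 0, 0)
    print(len(torus_neighbors(corner, shape)))
    print(len(mesh_neighbors(corner, shape)))
```

In this sketch a corner node of the 3D torus keeps a link in both directions along every axis, while the same node in a mesh has only one link per axis. Either way the neighbor set is fixed by the node's coordinates, which is the rigidity the abstract associates with fragmentation when jobs receive dedicated partitions.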

Published in:

IEEE Transactions on Parallel and Distributed Systems (Volume: 19, Issue: 1)