
Partitioning regular grid applications with irregular boundaries for cache-coherent multiprocessors


Authors: Yang Zeng (Dept. of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, USA); S. G. Abraham

We consider the problem of partitioning, for a cache-coherent multiprocessor, applications that operate on a regular grid but have irregular boundaries. Domain decomposition techniques such as recursive spectral bisection (RSB) have commonly been used to reduce interprocessor communication on message-passing multiprocessors. We apply these partitioning algorithms on cache-coherent multiprocessors to reduce cache-coherency traffic. We find that the actual cache-coherency traffic is approximately double the estimated true coherency traffic, primarily because of false sharing and the resulting false coherency traffic. We devise two techniques that eliminate false-sharing traffic in partitions produced by the common domain decomposition algorithms. In our compensation algorithm, we modify the partition produced by domain decomposition so that all the nodes on a cache line are assigned to the same processor. In our coalescing algorithm, nodes belonging to the same cache line are coalesced into a single node, and the weights on nodes and arcs are adjusted to represent the overall computation and communication costs of the coalesced nodes. The coalesced graph is partitioned using a domain decomposition algorithm, and the coalesced nodes in the partition are then expanded. Our experimental results, using an Indian Ocean circulation application on the KSR1 multiprocessor, show that compensation reduces coherency traffic by as much as 55% and execution time by up to 18%, and that graph coalescing reduces coherency traffic by up to 74%.
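The two techniques can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions: node IDs are taken to equal each active grid point's array offset (so a cache line holds `nodes_per_line` consecutive IDs); compensation is assumed to give each cache line to the processor that owns the largest total node weight on it; and all function names (`shared_lines`, `compensate`, `coalesce`, `expand`) are hypothetical. The actual graph partitioner (e.g. RSB) is not shown. A hand-picked partition stands in for its output.

```python
from collections import Counter, defaultdict


def cache_line(node: int, nodes_per_line: int) -> int:
    """Cache line holding a node (assumes node ID == array offset)."""
    return node // nodes_per_line


def shared_lines(partition: dict, nodes_per_line: int) -> int:
    """Count cache lines whose nodes are split across more than one processor,
    i.e. lines that can suffer false sharing."""
    owners = defaultdict(set)
    for node, proc in partition.items():
        owners[cache_line(node, nodes_per_line)].add(proc)
    return sum(1 for procs in owners.values() if len(procs) > 1)


def compensate(partition: dict, nodes_per_line: int, weight: dict = None) -> dict:
    """Reassign every node on a cache line to a single processor.
    Assumed rule: the processor holding the most node weight on the line wins
    (ties go to the lowest processor ID); unit weight if none is given."""
    weight = weight or {}
    votes = defaultdict(Counter)
    for node, proc in partition.items():
        votes[cache_line(node, nodes_per_line)][proc] += weight.get(node, 1)
    winner = {line: min(c, key=lambda p: (-c[p], p)) for line, c in votes.items()}
    return {node: winner[cache_line(node, nodes_per_line)] for node in partition}


def coalesce(node_weight: dict, edge_weight: dict, nodes_per_line: int):
    """Merge all nodes on a cache line into one node. Node weights (computation)
    are summed; arcs crossing between lines (communication) are summed per line
    pair; arcs inside a line disappear."""
    cw = defaultdict(int)
    for node, w in node_weight.items():
        cw[cache_line(node, nodes_per_line)] += w
    ce = defaultdict(int)
    for (u, v), w in edge_weight.items():
        a, b = cache_line(u, nodes_per_line), cache_line(v, nodes_per_line)
        if a != b:
            ce[(min(a, b), max(a, b))] += w
    return dict(cw), dict(ce)


def expand(line_partition: dict, nodes, nodes_per_line: int) -> dict:
    """Map a partition of the coalesced graph back onto the original nodes."""
    return {n: line_partition[cache_line(n, nodes_per_line)] for n in nodes}


# Hypothetical example: 8 grid points in a strip, 4 nodes per cache line.
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
print(shared_lines(part, 4))             # 1 (line 0 is split between P0 and P1)
print(shared_lines(compensate(part, 4), 4))  # 0

cw, ce = coalesce({i: 1 for i in range(8)}, {(i, i + 1): 1 for i in range(7)}, 4)
print(cw, ce)                            # {0: 4, 1: 4} {(0, 1): 1}
print(expand({0: 0, 1: 1}, range(8), 4))  # nodes 0-3 -> P0, nodes 4-7 -> P1
```

Compensation patches a partition after the fact, so it can slightly unbalance the load. Coalescing instead lets the partitioner see cache-line granularity from the start, which is one plausible reason it can remove more coherency traffic.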

Published in:

Proceedings of the 9th International Parallel Processing Symposium, 1995

Date of Conference:

25-28 Apr 1995