Reducing remote conflict misses: NUMA with remote cache versus COMA

Authors: Zheng Zhang (Center for Supercomputing Research and Development, University of Illinois, Urbana, IL, USA); Josep Torrellas

Abstract:

Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache, called a remote cache, that caches remote data (NUMA-RC), or to organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is best. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data-sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, instead, needs to house both the replication and the migration data. However, simulations of seven SPLASH-2 applications show that COMA does not outperform NUMA-RC. This is due to two reasons. First, the extra memory added has more associativity in NUMA-RC than in COMA and can therefore be utilized better by the working set in NUMA-RC. Second, COMA memory accesses are more expensive. Of course, our results are affected by the applications used, which have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is cheaper, NUMA-RC is more cost-effective for these applications.
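
The capacity argument in the abstract can be illustrated with a small back-of-the-envelope sketch in Python. It is only an illustration of the reasoning, not the paper's simulation model: the function name extra_memory_needed and the example footprints (16 MB of replication data, 48 MB of migration data) are hypothetical values chosen for the example.

# Illustrative sketch of the extra-memory capacity argument, under assumed
# (hypothetical) working-set footprints; not figures from the paper.

def extra_memory_needed(replication_mb: float, migration_mb: float) -> dict:
    """Estimate the per-node extra memory needed to hold the remote working set.

    COMA: migration data simply moves between attraction memories, so the
    extra memory ideally only has to absorb the replicated (read-mostly) data.
    NUMA-RC: the remote cache must hold copies of both the replicated data
    and the migration data the node is currently using.
    """
    return {
        "COMA": replication_mb,
        "NUMA-RC": replication_mb + migration_mb,
    }

# Example: a working set dominated by migration data favors COMA on paper;
# the abstract reports that associativity and access-cost effects make
# NUMA-RC win in practice despite this capacity advantage.
print(extra_memory_needed(replication_mb=16, migration_mb=48))
# -> {'COMA': 16, 'NUMA-RC': 64}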

Published in:

Third International Symposium on High-Performance Computer Architecture (HPCA), 1997

Date of Conference:

1-5 Feb 1997