By Topic

Analyzing and improving performance scalability of commercial server workloads on a chip multiprocessor

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)

A chip multiprocessor (CMP) with many low performance cores can achieve high performance or high performance/power for commercial server applications. The large number of hardware threads of a CMP with many low performance cores poses significant challenges to application developers in writing scalable applications. Many papers have assessed the architectural characteristics and the performance scalability, and some of them have identified lock contention as one of the scalability bottlenecks. However, there are few studies that resolved these problems, analyzed their causes, and compared the architectural characteristics before and after the scalability limitations were addressed. We analyzed and resolved some of the problems limiting the scalability of three commercial server applications with 64 hardware threads. We also did before and after comparisons of the architectural characteristics affected by the scalability enhancements, supporting the development of new processors. We addressed the lock contention with changes in the Java code. Our enhancements improved the performance scalability by up to 132%. We show that though the causes of lock contention are in different software layers, they share certain similarities and can be organized in three categories. Our comparisons reveal that the CPI and data TLB miss rates decrease, but the L2 data cache miss rates, L2 instruction cache miss rates, and memory traffic increase. These results suggest that we need to address the performance scalability problems of an application before we can accurately measure the architectural characteristics of a CMP.

Published in:

Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on

Date of Conference:

4-6 Oct. 2009