Accessing hardware performance counters in order to measure the influence of cache on the performance of integer sorting

Authors:

Cerin, C. (Univ. de Picardie Jules Verne, Amiens, France); Fkaier, H.; Jemni, M.

Hardware performance counters are available on most modern microprocessors. They are implemented as a small set of registers that count events related to the processor's operation. The Perfctr toolkit is one of the most popular toolkits for monitoring these events on x86 processors. In this paper, it is used to determine the impact of L1 data cache misses on the overall performance of six integer sorting algorithms. Most of these algorithms are recently introduced cache-conscious algorithms, are known to behave well according to previous simulations, or have not been explored at all. We demonstrate through experiments on an Athlon processor that a good balance between L1 data cache misses and retired instructions yields the fastest sorting algorithm in practical cases. The fastest algorithm is not the implementation with the smallest number of misses and the smallest number of instructions. In practice, the fastest algorithm is a new flavour of merge-sort that we have developed, which beats its rival.

Keywords: hardware performance counters, cache-conscious and cache-oblivious algorithms, in-core sorting algorithms, two-level memory hierarchy, chip-level parallelism.
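The balance between L1 data cache misses and retired instructions can be illustrated with a toy cost model. The sketch below is only an illustration under stated assumptions: the linear model, the `Profile` type, the `estimated_cycles` function, the cycles-per-instruction and miss-penalty values, and all counter values are hypothetical and are not measurements or methods from the paper.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical counter readings for one sorting implementation."""
    name: str
    instructions: int  # retired instructions (made-up value)
    l1_misses: int     # L1 data cache misses (made-up value)


def estimated_cycles(p, cycles_per_instruction=1.0, miss_penalty=20.0):
    """Assumed linear cost model: base instruction cost plus a fixed stall per miss."""
    return p.instructions * cycles_per_instruction + p.l1_misses * miss_penalty


# Three hypothetical implementations; none of these numbers come from the paper.
profiles = [
    Profile("fewest-instructions", 100_000_000, 6_000_000),
    Profile("fewest-misses", 180_000_000, 1_000_000),
    Profile("balanced", 120_000_000, 2_000_000),
]

# Pick the implementation with the lowest estimated cycle count.
fastest = min(profiles, key=estimated_cycles)
print(fastest.name, estimated_cycles(fastest))
```

Under these assumed numbers, the implementation that minimises neither counter on its own has the lowest estimated cost, which is one way to read the abstract's claim about balance. Real counter values would have to come from a tool such as Perfctr.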

Published in:

Parallel and Distributed Processing Symposium, 2003. Proceedings. International

Date of Conference:

22-26 April 2003