Sorting on the SGI Origin 2000: comparing MPI and shared memory implementations

4 Author(s)
Jimenez-Gonzalez, D. ; Dept. d'Arquitectura de Comput., Univ. Politecnica de Catalunya, Barcelona, Spain ; Guinovart, E. ; Larriba-Pey, J.-L. ; Navarro, J.J.

Analyses the C3-Radix (Communication- and Cache-Conscious Radix) sort algorithm under the distributed memory and shared memory parallel programming models. C3-Radix was originally proposed as a variant of the classic Radix sort that exploits locality in the memory hierarchy and reduces the amount of communication on distributed memory computers. We implement C3-Radix on the SGI Origin 2000 NUMA multiprocessor, using the Message Passing Interface (MPI) and the native shared memory directives of that computer to realise the two programming models we want to analyse. We give results for up to 16 processors and 64 million 32-bit keys. The results show that for data sets that are small compared to the number of processors, the MPI implementation is faster, while for data sets that are large, the shared memory implementation is faster. In this paper, we explain the reasons for the different behaviours depending on the size of the data sets.
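
For readers unfamiliar with the classic Radix sort that C3-Radix builds on, the following is a minimal sequential sketch of a least-significant-digit radix sort for unsigned 32-bit keys. It is not the C3-Radix algorithm and includes none of its cache or communication optimisations; the function name `radix_sort_u32` and the default of 8-bit digits are assumptions chosen only for illustration.

```python
def radix_sort_u32(keys, digit_bits=8):
    """Classic LSD radix sort for unsigned 32-bit integers.

    Illustrative only: a plain sequential radix sort, not C3-Radix.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    src = list(keys)
    # Process digits from least to most significant.
    for shift in range(0, 32, digit_bits):
        # Counting pass: histogram of the current digit.
        counts = [0] * radix
        for k in src:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: start offset of each digit's bucket.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Stable scatter of keys into their buckets.
        dst = [0] * len(src)
        for k in src:
            d = (k >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1
        src = dst
    return src
```

Each pass reads every key twice (once to count, once to move) and writes it once to a data-dependent location. This is the kind of memory traffic that a cache-conscious variant would aim to keep local, and in a parallel setting the scatter step is where keys would have to move between processors.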

Published in:

Computer Science Society, 1999. Proceedings. SCCC '99. XIX International Conference of the Chilean

Date of Conference: