By Topic

SIMD parallelization of applications that traverse irregular data structures

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

6 Author(s)
Ren, B. ; Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA ; Agrawal, G. ; Larus, J.R. ; Mytkowicz, T.
more authors

Fine-grained data parallelism is increasingly common in mainstream processors in the form of longer vectors and on-chip GPUs. This paper develops support for exploiting such data parallelism for a class of non-numeric, non-graphic applications, which perform computations while traversing many independent, irregular data structures. While the traversal of any one irregular data structure does not give opportunity for parallelization, traversing a set of these does. However, mapping such parallelism to SIMD units is nontrivial and not addressed in prior work. We address this problem by developing an intermediate language for specifying such traversals, followed by a run-time scheduler that maps traversals to SIMD units. A key idea in our run-time scheme is converting branches to arithmetic operations, which then allows us to use SIMD hardware. In order to make our approach fast, we demonstrate several optimizations including a stream compaction method that aids with control flow in SIMD, a set of layouts that reduce memory latency, and a tiling approach that enables more effective prefetching. Using our approach, we demonstrate significant increases in single-core performance over optimized baselines for two applications.

Published in:

Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on

Date of Conference:

23-27 Feb. 2013