Performance Study of SIMD Programming Models on Intel Multicore Processors

4 Author(s)
Kristof, P. ; Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA ; Hongtao Yu ; Zhiyuan Li ; Tian, X.

Modern multicore hardware employs a variety of parallel execution units, including multiple CPU cores for executing multiple threads simultaneously, vector units such as the Intel SIMD units on the CPU cores, as well as GPU-like processing arrays. The availability of such an unprecedented level of parallelism on mainstream computers offers enormous potential to enable a new generation of computation-intensive nontraditional applications. On the other hand, how best to harness this hardware parallelism presents a new challenge to application programmers, language designers and compiler developers. In this paper, we evaluate the impact of several different parallel execution models, especially the new SIMD vectorization methods, supported by the latest Intel ICC compiler (version 12.1), using three computation-intensive nontraditional parallel applications as the test workload. Unlike traditional numerical programs, these applications use highly irregular data structures and therefore present nontrivial challenges to effective use of SIMD vector units. The first application is a game engine architecture requiring real-time performance. The second application involves a kd-tree traversal, which is typical of state-of-the-art 3D ray-tracing applications. The last application processes data for a large-scale weather visualization system on the order of tens of minutes. We compare the execution time of these codes using different SIMD models supported by ICC in conjunction with parallel threading under TBB and OpenMP.

Published in:

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Date of Conference:

21-25 May 2012