Modern multicore hardware employs a variety of parallel execution units, including multiple CPU cores that execute multiple threads simultaneously, vector units such as the Intel SIMD units on the CPU cores, and GPU-like processing arrays. The availability of such an unprecedented level of parallelism on mainstream computers offers enormous potential to enable a new generation of computation-intensive nontraditional applications. On the other hand, how best to harness this hardware parallelism presents a new challenge to application programmers, language designers, and compiler developers. In this paper, we evaluate the impact of several parallel execution models, especially the new SIMD vectorization methods, supported by the latest Intel ICC compiler (version 12.1), using three computation-intensive nontraditional parallel applications as the test workload. Unlike traditional numerical programs, these applications use highly irregular data structures and therefore present nontrivial challenges to the effective use of SIMD vector units. The first application is a game engine architecture that requires real-time performance. The second application involves kd-tree traversal, which is typical of state-of-the-art 3D ray-tracing applications. The last application processes data for a large-scale weather visualization system, with processing times on the order of tens of minutes. We compare the execution times of these codes using the different SIMD models supported by ICC in conjunction with parallel threading under TBB and OpenMP.