By Topic

Optimizing SIMD Parallel Computation with Non-Consecutive Array Access in Inline SSE Assembly Language

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Chen Juan ; Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China ; Yang Canqun

Many processors, such as Intel Xeon processor 5100 series, AMD Athlon 64, support SIMD computation model with the Streaming SIMD Extensions (SSE), SSE2 and SSE3. Using double-precision SSE/SSE2/SSE3 instructions simultaneously can handle two packed double-precision floating-point data elements with 128-bit XMM vector registers, which greatly improves floating-point performance. Sometimes non-consecutive data instead of consecutive ones appear in SIMD computation, which prevents SIMD optimization. That is because two non-consecutive double precision floating-point data elements cannot be loaded into 128-bit vector registers simultaneously and they have to be loaded for twice. How to implement SIMD optimization for non-consecutive data is our concern. Loop unrolling exposes the rule and characteristics of such non-consecutive data. Register rotation can help transform non-consecutive data to vector data. Based on a representative kernel program, we illustrate our SIMD optimization combining loop unrolling with register rotation. Through vectorizing non-consecutive data, the performance of "KERNEL" code is improved by 42.4% and PQMRCGSTAB application is improved by 15.3%.

Published in:

Intelligent Computation Technology and Automation (ICICTA), 2012 Fifth International Conference on

Date of Conference:

12-14 Jan. 2012