SIMD Architectural Enhancements to Improve the Performance of the 2D Discrete Wavelet Transform

2 Author(s)
Shahbahrami, A. ; Comput. Eng. Lab., Delft Univ. of Technol., Delft, Netherlands ; Juurlink, B.

The 2D Discrete Wavelet Transform (DWT) is a time-consuming kernel in many multimedia applications such as JPEG2000 and MPEG-4. The 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. The vertical filtering is easy to vectorize (assuming row-major order), but vectorizing the horizontal filtering requires many overhead instructions. In this paper we propose three SIMD architectural enhancements, namely the MAC operation, extended subwords, and the matrix register file technique, to develop high-performance implementations of the 2D DWT on SIMD architectures. The MAC operation performs four 32-bit single-precision floating-point multiplications with accumulation. The matrix register file allows data stored consecutively in memory to be loaded into a column of the register file, where a column consists of the corresponding subwords of different registers. These two techniques avoid the need for data rearrangement instructions. In addition, to avoid data type conversion instructions, the extended subword technique is applied to the (5, 3) lifting transform. Extended subwords use registers that are wider than the packed format used to store the data. Together, these techniques provide speedups of up to 2.90 and 1.32 for the (5, 3) lifting and Daub-4 transforms, respectively.
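To make the row-then-column structure described in the abstract concrete, the following is a minimal scalar sketch of a 2D (5, 3) lifting transform. It is not the paper's SIMD implementation. The function names (`lift53_forward`, `lift53_inverse`, `dwt2d_53`) are hypothetical, and the integer rounding and symmetric boundary extension follow the reversible 5/3 form used in JPEG2000, which is an assumption because the abstract does not state these details. Even signal lengths are also assumed.

```python
def lift53_forward(x):
    """One level of the 1D integer (5, 3) lifting transform (assumed JPEG2000 form).

    Returns (s, d): low-pass and high-pass coefficients. Assumes even len(x).
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "this sketch assumes an even-length signal"
    half = n // 2

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Symmetric extension at the right edge: x[n] mirrors x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: each even sample plus a rounded quarter of adjacent details.
    s = []
    for i in range(half):
        # Symmetric extension at the left edge: d[-1] mirrors d[0].
        d_left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((d_left + d[i] + 2) >> 2))
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward by reversing the update and predict steps."""
    half = len(s)
    even = []
    for i in range(half):
        d_left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((d_left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x


def dwt2d_53(image):
    """One level of the 2D transform: filter every row, then every column."""
    # Horizontal pass: neighbours are adjacent in a row, with even and odd
    # samples interleaved in memory.
    rows = []
    for row in image:
        s, d = lift53_forward(list(row))
        rows.append(s + d)

    # Vertical pass: the same arithmetic is applied independently to each column.
    height, width = len(rows), len(rows[0])
    out = [[0] * width for _ in range(height)]
    for c in range(width):
        s, d = lift53_forward([rows[r][c] for r in range(height)])
        for r, value in enumerate(s + d):
            out[r][c] = value
    return out
```

In this sketch the vertical pass performs identical operations on neighbouring columns, which is why it maps naturally onto SIMD lanes in row-major storage, whereas the horizontal pass combines interleaved even and odd samples within a row, which is where a straightforward SIMD version would need data rearrangement.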

Published in:

Digital System Design, Architectures, Methods and Tools, 2009. DSD '09. 12th Euromicro Conference on

Date of Conference:

27-29 Aug. 2009