Parallel and Pipeline Architectures for High-Throughput Computation of Multilevel 3-D DWT

2 Author(s)
Basant K. Mohanty; Dept. of Electronics and Communication Engineering, Jaypee Institute of Engineering and Technology, Guna, India; Pramod K. Meher

In this paper, we present a throughput-scalable parallel and pipeline architecture for high-throughput computation of the multilevel 3-D discrete wavelet transform (3-D DWT). The computation of the 3-D DWT for each level of decomposition is split into three distinct stages, and all three stages are implemented in parallel by a processing unit consisting of an array of processing modules. The processing unit for the first-level decomposition of a video stream of frame-size (M × N) consists of Q/2 processing modules, where Q is the number of input samples available to the structure in each clock cycle. The processing unit for a higher level of decomposition requires 1/8 times the number of processing modules required by the processing unit for its preceding level. For a J-level 3-D DWT of a video stream, each of the proposed structures involves J processing units in a cascaded pipeline. The proposed structures have a small output latency and can perform multilevel 3-D DWT computation with 100% hardware utilization efficiency. The throughput rates of the proposed structures are Q/7 times higher than that of the best of the corresponding existing structures. Interestingly, the proposed structures involve a frame-buffer of O(MN), while the frame-buffer size of the existing structures is O(MNR). Besides, the on-chip storage and the frame-buffer size of the proposed structure are independent of the input-block size, which favors the derivation of highly concurrent parallel architectures for high-throughput implementation. The overall area-delay products of the proposed structures are significantly lower than those of the existing structures, because they involve significantly less frame-buffer and a smaller storage-word-delay product, although they involve slightly larger multiplier-delay and adder-delay products.
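The three-stage split of each decomposition level can be illustrated with a small pure-Python sketch. The abstract does not state which wavelet filters the architecture uses, so this sketch assumes a Haar-like average/half-difference filter pair, and it assumes the stage order rows, then columns, then frames. The names `haar_1d`, `dwt3d_one_level`, `lll_subband` and `dwt3d_multilevel` are hypothetical; the sketch models only the arithmetic on a finite group of frames, not the hardware pipeline. Each subsequent level transforms only the low-low-low (LLL) subband, which holds 1/8 of the samples of its input, consistent with the 1/8 ratio of processing modules between successive levels.

```python
from typing import List

# A group of frames indexed as v[t][r][c]: frame t, row r, column c.
Volume = List[List[List[float]]]


def haar_1d(x: List[float]) -> List[float]:
    """Assumed Haar-like analysis: pairwise averages (low-pass) then half-differences (high-pass)."""
    assert len(x) % 2 == 0, "signal length must be even"
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) / 2 for k in range(half)]
    return low + high


def dwt3d_one_level(v: Volume) -> Volume:
    """One decomposition level computed as three separable 1-D filtering stages."""
    T, M, N = len(v), len(v[0]), len(v[0][0])
    # Stage 1: filter every row (along the column index c).
    s1 = [[haar_1d(v[t][r]) for r in range(M)] for t in range(T)]
    # Stage 2: filter every column (along the row index r).
    s2 = [[[0.0] * N for _ in range(M)] for _ in range(T)]
    for t in range(T):
        for c in range(N):
            col = haar_1d([s1[t][r][c] for r in range(M)])
            for r in range(M):
                s2[t][r][c] = col[r]
    # Stage 3: filter across frames (along the frame index t).
    s3 = [[[0.0] * N for _ in range(M)] for _ in range(T)]
    for r in range(M):
        for c in range(N):
            tube = haar_1d([s2[t][r][c] for t in range(T)])
            for t in range(T):
                s3[t][r][c] = tube[t]
    return s3


def lll_subband(w: Volume) -> Volume:
    """Extract the low-low-low octant, which holds 1/8 of the coefficients."""
    T, M, N = len(w) // 2, len(w[0]) // 2, len(w[0][0]) // 2
    return [[row[:N] for row in frame[:M]] for frame in w[:T]]


def dwt3d_multilevel(v: Volume, levels: int) -> List[Volume]:
    """Cascade of levels: level j+1 transforms the LLL subband produced by level j."""
    outputs = []
    current = v
    for _ in range(levels):
        w = dwt3d_one_level(current)
        outputs.append(w)
        current = lll_subband(w)
    return outputs
```

For example, a 4 × 4 × 4 ramp volume v[t][r][c] = 16t + 4r + c gives a first-level LLL coefficient of 10.5 at the origin, which is the mean of its 2 × 2 × 2 corner block. Every dimension of the input must be divisible by 2^J for a J-level decomposition in this sketch.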
The throughput rate of the proposed structure can easily be scaled without increasing the on-chip storage and frame-memory by using more processing modules, and it provides a greater advantage over the existing designs for higher frame-rates and larger input-block sizes. The fully parallel implementation of the proposed scalable structure provides its best performance. When the very high throughput generated by such a parallel structure is not required, the structure could be operated with a slower clock, where speed could be traded for power by scaling down the operating voltage, and/or the processing modules could be implemented by slower but hardware-efficient arithmetic circuits.
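The scaling relations stated in the abstract can be turned into a back-of-the-envelope calculator. This is a sketch under stated assumptions: `modules_per_level` applies the Q/2 and 1/8 rules literally, and `required_clock_hz` assumes that, at 100% hardware utilization, the first-level unit accepts exactly Q new samples per cycle, so an M × N frame takes MN/Q cycles. Both function names are hypothetical. The rules yield whole module counts only when Q is a multiple of 2 × 8^(J-1); the abstract does not say how other values of Q are handled, so the sketch simply returns exact fractions.

```python
from fractions import Fraction
from typing import List


def modules_per_level(q: int, levels: int) -> List[Fraction]:
    """Modules per level: Q/2 for level 1, then 1/8 of the preceding level (exact fractions)."""
    counts = []
    m = Fraction(q, 2)
    for _ in range(levels):
        counts.append(m)
        m /= 8
    return counts


def required_clock_hz(m: int, n: int, frame_rate: float, q: int) -> float:
    """Assumed clock rate for M x N frames at frame_rate when Q samples are accepted per cycle."""
    cycles_per_frame = m * n / q
    return cycles_per_frame * frame_rate


if __name__ == "__main__":
    print(modules_per_level(128, 3))               # module counts for levels 1..3
    print(required_clock_hz(1920, 1080, 30, 16))   # clock in Hz for the given stream
```

With Q = 128 and J = 3, the sketch gives 64, 8 and 1 modules; a 1920 × 1080 stream at 30 frames/s with Q = 16 would need a clock of about 3.9 MHz under these assumptions, illustrating how a larger Q permits the slower clock mentioned in the abstract.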

Published in:

IEEE Transactions on Circuits and Systems for Video Technology (Volume: 20, Issue: 9)