Skip to Main Content
In this brief, we propose a new parallel lifting-based 2-D DWT architecture with high memory efficiency and short critical path. The memory efficiency is achieved with a novel scanning method that enables tradeoff of external memory bandwidth and on-chip memory. Based on the data flow graph of the flipped lifting algorithm, processing units (PUs) are developed for maximally utilizing the inherent parallelism. With S number of PUs, the throughput can be scaled while keeping the latency constant. Compared with the best existing architecture, the proposed architecture requires less memory. For an N × N image, the proposed architecture consumes a total of only 3N + 24S words of transposition memory, temporal memory, and pipeline registers. The synthesized results in a 90-nm CMOS process show that it achieves better area-delay products than the best existing design by 32.3%, 31.5%, and 27.0% when S = 2, 4, and 8, respectively, and by 26%, 26%, and 22% when the overhead for buffering the required overlapped pixels is taken into account.