A high-performance implementation of the 2-D lifting-based Discrete Wavelet Transform (DWT) for JPEG2000 applications is presented, designed around a low-memory, highly pipelined architecture. The architecture consists of a row processor module, a column processor module, and two memory modules. We present two new architectures for the row/column processors and the memory, one of which includes 7 dual-port RAMs. For an N×N image tile, a temporary memory of only size 4N is required for the 5/3 filter. Symmetric extension is used at the boundaries, and two outputs are generated every cycle. Finally, the proposed architecture was implemented in behavioral Verilog HDL on an FPGA (Virtex-II), where it occupied about 400 slices and operated at about 120 MHz.
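For readers unfamiliar with the lifting scheme, the following is a minimal software sketch of one level of the reversible 5/3 lifting transform in 1-D, with whole-sample symmetric extension at the boundaries. It is a behavioral reference model only and does not reproduce the proposed hardware architecture; the function names `reflect`, `forward_53`, and `inverse_53` are hypothetical and chosen for illustration. A 2-D transform is obtained by applying the 1-D step along rows and then along columns.

```python
def reflect(i, n):
    """Whole-sample symmetric extension: fold index i into [0, n)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def forward_53(x):
    """One level of reversible 5/3 lifting on a list of ints (len >= 2).

    Returns (s, d): low-pass and high-pass coefficient lists.
    """
    n = len(x)
    assert n >= 2, "sketch assumes at least two samples"

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = [x[2 * k + 1] - ((x[2 * k] + x[reflect(2 * k + 2, n)]) >> 1)
         for k in range(n // 2)]

    def d_ext(k):
        # High-pass value at position k, extended symmetrically via the
        # odd sample positions of the original signal.
        return d[(reflect(2 * k + 1, n) - 1) // 2]

    # Update step: each even sample plus a rounded quarter of the neighbouring d.
    s = [x[2 * k] + ((d_ext(k - 1) + d_ext(k) + 2) >> 2)
         for k in range((n + 1) // 2)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the two lifting steps."""
    n = len(s) + len(d)

    def d_ext(k):
        return d[(reflect(2 * k + 1, n) - 1) // 2]

    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(len(s)):
        x[2 * k] = s[k] - ((d_ext(k - 1) + d_ext(k) + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for k in range(len(d)):
        x[2 * k + 1] = d[k] + ((x[2 * k] + x[reflect(2 * k + 2, n)]) >> 1)
    return x
```

Because each lifting step is inverted by subtracting exactly what was added, the integer transform is perfectly reversible regardless of rounding, which is one reason the lifting form suits compact hardware.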