This paper proposes an efficient VLSI architecture for the 2-D lifting-based discrete wavelet transform (DWT). The architecture is pipelined and parallelized to increase throughput and hardware utilization. A time-division multiplexing (TDM) scheme realizes the prediction step and the update step on the same hardware, which reduces circuit size. An embedded mirror-symmetric boundary extension technique is used to optimize the 1-D DWT architecture. The architecture was coded in Verilog HDL, implemented on an FPGA, and verified on a real-time platform comprising a CMOS image sensor, an FPGA, and a PC.
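As a rough software illustration of the two ideas above, the sketch below runs a 1-D lifting DWT in which the prediction and update steps share a single arithmetic kernel and the mirror-symmetric extension is folded into the index computation instead of padding the signal. It is not the proposed hardware: the abstract does not name the wavelet filter, so the integer LeGall 5/3 lifting scheme is assumed here, and the names `mirror`, `lift_step`, `dwt53_forward`, and `dwt53_inverse` are hypothetical.

```python
def mirror(i, n):
    """Reflect index i into [0, n-1] (whole-sample symmetric extension)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift_step(center, left, right, sign, offset, shift):
    """One shared kernel reused for both predict and update.

    Only (sign, offset, shift) change between the two steps, which loosely
    mirrors time-multiplexing a single datapath.
    """
    return center + sign * ((left + right + offset) >> shift)


def dwt53_forward(x):
    """One level of the integer 5/3 lifting DWT (assumed filter), even length."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict: d[k] = x[2k+1] - floor((x[2k] + x[2k+2]) / 2).
    # The right neighbour index is mirrored at the boundary, no padding.
    d = [lift_step(x[2 * k + 1], x[2 * k], x[mirror(2 * k + 2, n)], -1, 0, 1)
         for k in range(half)]
    # Update: s[k] = x[2k] + floor((d[k-1] + d[k] + 2) / 4), with d[-1] = d[0].
    s = [lift_step(x[2 * k], d[max(k - 1, 0)], d[k], +1, 2, 2)
         for k in range(half)]
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward by running the same kernel with the sign flipped."""
    half = len(s)
    even = [lift_step(s[k], d[max(k - 1, 0)], d[k], -1, 2, 2)
            for k in range(half)]
    odd = [lift_step(d[k], even[k], even[min(k + 1, half - 1)], +1, 0, 1)
           for k in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x
```

Under these assumptions, a 2-D transform would apply `dwt53_forward` to every row and then to every column of the row-transformed image; because each lifting step is inverted exactly, `dwt53_inverse(*dwt53_forward(x))` returns `x` for any even-length integer list.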