The Discrete Wavelet Transform (DWT) has surpassed competing transforms because of its attractive properties, and has therefore been adopted in many image processing algorithms. However, with the emergence of real-time, resource-constrained embedded imaging platforms, the DWT has become a bottleneck. This article presents a hardware implementation of the 2-D DWT. The proposed architecture is area-efficient, parallel and pipelined, and combines a modified image scan with "multiplier-free" multiplications. Simulation and implementation results show that the proposed scheme is a fast, area-efficient and power-efficient solution for the DWT.
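To illustrate what "multiplier-free" means for a 2-D DWT, here is a minimal software sketch. It assumes the reversible LeGall 5/3 wavelet computed by integer lifting, which is the JPEG 2000 lossless filter. The abstract does not say which wavelet the proposed hardware uses, and the sketch does not model its modified image scan, parallelism or pipelining. All function names (`lift53_1d`, `inv_lift53_1d`, `dwt53_2d`) are hypothetical. Each lifting step uses only additions and arithmetic right shifts, and the 2-D transform is separable: rows are filtered first, then columns.

```python
def lift53_1d(x):
    """One level of the forward LeGall 5/3 integer lifting transform.

    Uses only additions and arithmetic right shifts (>> 1 divides by 2,
    >> 2 divides by 4, rounding toward minus infinity), so no multiplier
    is needed. Assumes len(x) is even and >= 2. Boundaries use
    whole-sample symmetric extension.
    """
    n = len(x)
    half = n // 2
    # Predict step: detail (high-pass) coefficients.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # mirror x[n] -> x[n-2]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: smooth (low-pass) coefficients.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # mirror d[-1] -> d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def inv_lift53_1d(s, d):
    """Inverse of lift53_1d: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x


def dwt53_2d(img):
    """One level of a separable 2-D 5/3 DWT on a list-of-lists image.

    Rows are transformed first, then columns. Subbands are named with the
    horizontal filter first: "lh" is low-pass along rows and high-pass
    along columns. Assumes even height and width.
    """
    rows_lo, rows_hi = [], []
    for row in img:
        s, d = lift53_1d(list(row))
        rows_lo.append(s)
        rows_hi.append(d)

    def columns(mat):
        # Transform every column of mat and return (low rows, high rows).
        cols = list(zip(*mat))
        lo_cols, hi_cols = zip(*(lift53_1d(list(c)) for c in cols))
        lo = [list(r) for r in zip(*lo_cols)]
        hi = [list(r) for r in zip(*hi_cols)]
        return lo, hi

    ll, lh = columns(rows_lo)
    hl, hh = columns(rows_hi)
    return ll, lh, hl, hh


if __name__ == "__main__":
    # A flat 4x4 image has no detail: every high-pass subband is zero.
    flat = [[7] * 4 for _ in range(4)]
    print(dwt53_2d(flat))
```

The 5/3 filter is chosen here because its lifting coefficients (1/2 and 1/4) are powers of two, so the transform is exactly multiplier-free. Wavelets with irrational coefficients, such as the 9/7, would instead need each constant approximated by a small sum of shifted terms, which trades some precision for removing hardware multipliers.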