This paper presents efficient line-based architectures for the two-dimensional discrete wavelet transform (2-D DWT). We first present a four-input/four-output architecture for direct 2-D DWT that performs a one-level decomposition of an N×N image in approximately N²/4 intra-working clock cycles (ccs) by exploiting the parallelism among the four subband transforms in lifting-based 2-D DWT. Building on this four-input/four-output architecture, we propose a novel pipelined architecture for multilevel 2-D DWT that performs a complete dyadic decomposition of an N×N image in approximately N²/4 ccs. Performance analysis and comparison results demonstrate that the proposed architectures achieve a higher throughput rate and perform well in terms of the product of throughput rate and hardware cost, as well as hardware utilization. The proposed pipelined architecture could be an efficient alternative for high-speed and/or low-power applications.