In this brief an efficient folded architecture (EFA) for lifting-based discrete wavelet transform (DWT) is presented. The proposed EFA is based on a novel form of the lifting scheme that is given in this brief. Due to this form, the conventional serial operations of the lifting data flow can be optimized into parallel ones by employing parallel and pipeline techniques. The corresponding optimized architecture (OA) has short critical path latency and is repeatable. Further, utilizing this repeatability, the EFA is derived from the OA by employing the fold technique. For the proposed EFA, hardware utilization achieves 100%, and the number of required registers is reduced. Additionally, the shift-add operation is adopted to optimize the multiplication; thus, the proposed architecture is more suitable for hardware implementation. Performance comparisons and field-programmable gate array (FPGA) implementation results indicate that the proposed EFA possesses better performances in critical path latency, hardware cost, and control complexity.