Using the lifting scheme to construct VLSI architectures for discrete wavelet transforms outperforms using convolution in many aspects, such as computation complexity and boundary extension. Nevertheless, the critical path of the lifting scheme is potentially longer than that of convolution. Although pipelining can reduce the critical path, it will prolong the latency and require more registers for a 1D architecture as well as larger memory size for a 2D line-based architecture. In this paper, an efficient VLSI architecture is proposed to provide a variety of hardware implementations for improving and possibly minimizing the critical path and memory requirements of lifting-based discrete wavelet transforms by flipping conventional lifting structures. By case studies of a JPEG2000 defaulted filter and an integer filter, the efficiency of the proposed flipping structure is shown.