In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the number and period of clock cycles for the DWT computation with little or no overhead on the hardware resources by maximizing the inter- and intrastage parallelisms of the pipeline. The interstage parallelism is enhanced by optimally mapping the computational load associated with the various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations. The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks that can be performed independently in parallel and by optimally organizing the bitwise operations for performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of the output sample for the filtering operation is minimized. It is shown that an architecture designed based on the proposed scheme requires a smaller number of clock cycles compared to that of the architectures employing comparable hardware resources. In fact, the requirement on the hardware resources of the architecture designed by using the proposed scheme also gets improved due to a smaller number of registers that need to be employed. Based on the proposed scheme, a specific example of designing an architecture for the DWT computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme, the architecture thus designed is simulated and implemented on a field-programmable gate-array board. It is seen that the simulation and implementation results conform to the stated goals of the proposed scheme, thus making the scheme a viable approach for designing a practical and realizable architecture for real-time DWT computation.