This paper presents a VLSI design approach, called the embedded instruction code (EIC), for the discrete wavelet transform (DWT). The approach is derived from the essential computations of the DWT, from which we define a set of multiplication instructions (MUL) and an addition instruction (ADD). We also propose a parallel arithmetic logic unit (PALU), called 2M4A, composed of two multipliers and four adders. With these instructions, the DWT computation paths can be executed efficiently on a limited number of PALUs. Because the EIC executes on the PALU, the number of internal registers required depends on the length of the wavelet filters. The boundary problem of the DWT is resolved by symmetric extension. Moreover, the two-dimensional inverse DWT (2D IDWT) can be computed on the same PALU as the 2D DWT; only the instruction codes and filter coefficients need to change. The chip supports up to six decomposition levels and a range of image formats, including VGA, MPEG-1, MPEG-2, and 1024×1024.
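To make the MUL/ADD view of the DWT concrete, the following is a minimal Python sketch of one level of a 1D DWT computed as multiply-accumulate steps, with whole-sample symmetric extension at both borders. The LeGall 5/3 filter pair, the helper names (`sym_index`, `fir_at`, `dwt_1d`), and the operation counts are illustrative assumptions. The abstract does not name the filters or the instruction encoding, and this sketch is not the EIC itself. A separable 2D DWT would apply the same routine to the rows and then to the columns.

```python
# Hypothetical sketch: one level of a 1D DWT as MUL/ADD steps.
# Assumed filter choice: LeGall 5/3 analysis filters (not stated in the abstract).
LOW_TAPS = [-1/8, 2/8, 6/8, 2/8, -1/8]  # lowpass, centered on even samples
HIGH_TAPS = [-1/2, 1.0, -1/2]           # highpass, centered on odd samples


def sym_index(i, n):
    """Map index i into [0, n) using whole-sample symmetric extension,
    i.e. x[-1] = x[1], x[-2] = x[2], x[n] = x[n-2], ..."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive period
    return period - i if i >= n else i


def fir_at(x, taps, center):
    """One filter output: a chain of multiplications (MUL) accumulated
    with additions (ADD) over symmetrically extended samples."""
    half = len(taps) // 2
    acc = 0.0
    for k, c in enumerate(taps):
        acc += c * x[sym_index(center + k - half, len(x))]
    return acc


def dwt_1d(x):
    """Return (lowpass, highpass, mul_count, add_count) for one level.
    Counts assume a direct-form filter: T multiplies and T-1 adds per output."""
    n = len(x)
    low = [fir_at(x, LOW_TAPS, 2 * i) for i in range((n + 1) // 2)]
    high = [fir_at(x, HIGH_TAPS, 2 * i + 1) for i in range(n // 2)]
    muls = len(low) * len(LOW_TAPS) + len(high) * len(HIGH_TAPS)
    adds = len(low) * (len(LOW_TAPS) - 1) + len(high) * (len(HIGH_TAPS) - 1)
    return low, high, muls, adds


if __name__ == "__main__":
    low, high, muls, adds = dwt_1d([5.0] * 8)
    print(low, high, muls, adds)
```

For a constant input, the lowpass band reproduces the constant and the highpass band is zero, including at the borders, which is the behavior symmetric extension is meant to preserve. The operation count here is deliberately naive: because the 5/3 filters are symmetric, mirrored samples could be pre-added so that a 5-tap output needs 3 multiplications instead of 5, a trade-off that matters when multipliers are scarce, as in a two-multiplier unit.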