This paper proposes a boundary postprocessing technique for computing the discrete wavelet transform (DWT) near block boundaries. The basic idea is to use available lifting factorizations of the filterbank to model the DWT as a finite state machine (FSM). The technique can reduce both the size of the auxiliary buffers in block-based DWT implementations and the communication overhead between adjacent blocks. Two new DWT system architectures based on this technique are presented: Overlap-State sequential and Split-and-Merge parallel. Experimental results show that, for the popular (9, 7) filters, the size of the auxiliary buffers can be reduced by 42% and that the parallel algorithm is 30% faster than existing approaches.
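The FSM view can be illustrated with a small sketch. It is not the implementation evaluated in the paper: for brevity it assumes the shorter two-step (5, 3) lifting factorization rather than the (9, 7) filters, a single decomposition level, and zero extension at the signal ends. The names `OverlapStateDWT53`, `dwt53_whole`, and `dwt53_blockwise` are hypothetical. Each sample is updated in place by the lifting steps, so at a block boundary the only data carried forward is a pair of partially updated samples, which are completed once the first sample of the next block arrives.

```python
def dwt53_whole(x):
    """Reference: one-level (5, 3) lifting over the whole signal.

    Assumes an even-length input and zero extension at both ends.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    d = []
    for i in range(half):  # predict step
        right = even[i + 1] if i + 1 < half else 0.0
        d.append(odd[i] - 0.5 * (even[i] + right))
    s = []
    for i in range(half):  # update step
        left = d[i - 1] if i > 0 else 0.0
        s.append(even[i] + 0.25 * (left + d[i]))
    return s, d


class OverlapStateDWT53:
    """Block-based (5, 3) lifting that keeps partially updated samples as state."""

    def __init__(self):
        self.s_part = None  # even sample, left detail contribution already added
        self.d_part = None  # odd sample, left even contribution already subtracted

    def push(self, block):
        """Consume one even-length block; return the coefficients completed so far."""
        if len(block) % 2:
            raise ValueError("block length must be even")
        s_out, d_out = [], []
        for i in range(0, len(block), 2):
            even, odd = block[i], block[i + 1]
            left_d = 0.0  # zero extension before the first sample
            if self.d_part is not None:
                # The new even sample completes the pending pair.
                left_d = self.d_part - 0.5 * even
                d_out.append(left_d)
                s_out.append(self.s_part + 0.25 * left_d)
            # Apply every operation that is already possible to the new pair.
            self.s_part = even + 0.25 * left_d
            self.d_part = odd - 0.5 * even
        return s_out, d_out

    def flush(self):
        """Complete the last pair using zero extension at the end of the signal."""
        if self.d_part is None:
            return [], []
        d = self.d_part
        s = self.s_part + 0.25 * d
        self.s_part = self.d_part = None
        return [s], [d]


def dwt53_blockwise(x, block_size):
    """Transform x block by block (block_size even), carrying only the state."""
    fsm = OverlapStateDWT53()
    s, d = [], []
    for start in range(0, len(x), block_size):
        bs, bd = fsm.push(x[start:start + block_size])
        s += bs
        d += bd
    fs, fd = fsm.flush()
    return s + fs, d + fd
```

In this sketch the state crossing a block boundary is two numbers, whereas recomputing the boundary coefficients from raw input would require overlapping the blocks by several input samples. The block-by-block output should agree with `dwt53_whole` on the same signal for any even block size.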