This paper presents efficient VLSI architectures of the shape-adaptive discrete cosine transform (SA-DCT) and its inverse transform (SA-IDCT) for MPEG-4. Two of the challenges encountered during the exploitation of more efficient architectures for the SA-DCT and SA-IDCT are addressed. One challenge is to handle the architectural irregularity due to the shape-adaptive nature. The other one is to provide acceptable throughput using minimal hardware. In the algorithm-level optimization, this work exploits the numerical properties found in the transform matrices of various lengths, and derives a fine-grained zero-skipping scheme for the IDCT which can perform 22.6% more zero-skipping than the common vector-based coarse-grained zero-skipping scheme does. In the architecture-level design, the 1-D variable-length DCT/IDCT architectures designed on the basis of the numerical properties are proposed. An auto-aligned transpose memory that aligns the data of different lengths is also incorporated. In addition, a zero-index table is also included in the transpose memory to support the fine-grained zero-skipping in the SA-IDCT. The synthesized designs of the SA-DCT and SA-IDCT are implemented using UMC 0.18-mum technology. The SA-DCT architecture has 26 635 gates, and its average cycle-throughput is 0.66 pixels/cycle, which is comparable to other proposed architectures. On the other hand, the SA-IDCT architecture has 29 960 gates, and its cycle-throughput is 6.42 pixels/cycle. While decoding for CIF@30FPS, the SA-IDCT is clocked at 0.7 MHz, and the power consumption is 0.14 mW. Both the throughput and power consumption of the proposed SA-IDCT architecture are an order better than those of the existing SA-IDCT architectures.