This paper proposes a high-performance direct two-dimensional (2-D) transform coding IP design for the MPEG-4 AVC/H.264 video coding standard. Because an H.264 encoding system requires four kinds of 4 × 4 transforms, i.e., the forward, inverse, forward-Hadamard, and inverse-Hadamard transforms, a high-performance multitransform accelerator is needed to compute all of these transforms and meet real-time processing requirements. Accordingly, this paper proposes a direct 2-D transform algorithm that arranges the data processing sequences of the row and column transforms in H.264 CODEC systems so that data transposition is completed on the fly. The resulting transform architecture raises the data processing rate to 8 pixels/cycle. In addition, an interlaced I/O schedule is presented to balance the data I/O rate against the data processing rate when the proposed multitransform design is integrated into H.264 systems. In a 0.18-μm CMOS technology, the optimum operating clock frequency of the proposed multitransform design is 100 MHz, which yields a data throughput rate of 800 Mpixels/s at a cost of 6482 gates. This performance is sufficient for real-time multitransform processing of digital cinema video (4096 × 2048@30 Hz). When the data throughput rate per unit area is used as the index of hardware efficiency, the proposed design is at least 1.94 times more efficient than existing designs. Moreover, the proposed multitransform design meets the processing requirements of HDTV 720p, HDTV 1080i, and digital cinema video while consuming only 0.58, 2.91, and 24.18 mW when operated at 22, 50, and 100 MHz with 0.7, 1.0, and 1.8 V power supplies, respectively.
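
The throughput figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the proposed design: the function names are hypothetical, and the 4:2:0 chroma format (which adds half as many chroma samples as luma samples) is an assumption that the text above does not state.

```python
# Figures quoted in the abstract.
PIXELS_PER_CYCLE = 8          # data processing rate of the architecture
CLOCK_HZ = 100_000_000        # optimum operating clock frequency (100 MHz)
GATE_COUNT = 6482             # reported hardware cost


def throughput(pixels_per_cycle: int, clock_hz: int) -> int:
    """Peak throughput in pixels per second."""
    return pixels_per_cycle * clock_hz


def luma_rate(width: int, height: int, fps: int) -> int:
    """Luma samples per second for a given video format."""
    return width * height * fps


def sample_rate_420(width: int, height: int, fps: int) -> int:
    """Luma plus chroma samples per second.

    ASSUMPTION: 4:2:0 sampling, i.e. 1.5 samples per luma pixel.
    The abstract does not state the chroma format.
    """
    return luma_rate(width, height, fps) * 3 // 2


if __name__ == "__main__":
    peak = throughput(PIXELS_PER_CYCLE, CLOCK_HZ)
    needed = sample_rate_420(4096, 2048, 30)
    print("peak throughput (pixels/s):", peak)
    print("digital cinema sample rate (samples/s):", needed)
    print("headroom factor:", peak / needed)
    print("throughput per gate (pixels/s/gate):", peak / GATE_COUNT)
```

Under these assumptions, 8 pixels/cycle at 100 MHz gives the stated 800 Mpixels/s, which exceeds the sample rate of a single pass over 4096 × 2048@30 Hz video. An encoder applies more than one transform to each block, so the real requirement depends on how many transform passes the system schedules per sample; the sketch does not model that.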