The transform engine is a critical part of a video codec, and increased coding efficiency often comes at the cost of increased complexity in the transform module. In this work, we propose a shared transform engine for the H.264/AVC and VC-1 video coding standards that exploits the structural similarity and symmetry of their transforms. We also propose an approach that eliminates the explicit transpose memory in 2-D transforms, and we exploit data dependency to reduce power consumption. Ten versions of the transform engine, covering configurations such as with and without hardware sharing and with and without transpose memory, are implemented in the design. The design is fabricated in a commercial 45-nm CMOS technology, and all implemented versions are verified. The shared transform engine without transpose memory supports Quad Full-HD (3840 × 2160) video encoding at 30 fps while operating at 0.52 V, with a measured power of 214 μW. This highly scalable design supports 1080p at 30 fps while operating down to 0.41 V, with a measured power of 79 μW, and 720p at 30 fps while operating down to 0.35 V, with a measured power of 43 μW. Hardware sharing saves 30% area compared with the combined area of individual H.264 and VC-1 implementations. Eliminating the explicit transpose memory by using a 2-D (8 × 8) output buffer reduces area by 23% and power by 26%. The ideas proposed here can potentially be extended to future video coding standards such as HEVC.
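For readers unfamiliar with the role of a transpose memory, the following minimal sketch shows the conventional row-column decomposition of a separable 2-D transform, in which the result of the first 1-D pass is transposed before the second pass. It is an illustration only and not the architecture proposed in this work: it uses the well-known 4 × 4 forward core transform matrix of H.264/AVC, whereas the design described here also covers VC-1 and uses an 8 × 8 output buffer. The function names (`transform_1d`, `transform_2d_row_column`, `transform_2d_direct`) are hypothetical and chosen for this example.

```python
# H.264/AVC 4x4 forward core transform matrix (integer DCT approximation).
C4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_1d(vec, matrix=C4):
    """Apply the 1-D transform to a single 4-sample vector."""
    return [sum(c * v for c, v in zip(row, vec)) for row in matrix]


def transpose(block):
    """Transpose a square block stored as a list of rows."""
    return [list(col) for col in zip(*block)]


def transform_2d_row_column(block):
    """Compute Y = C X C^T as two 1-D passes with an explicit transpose.

    The intermediate `buffered` block plays the role that a transpose
    memory plays in a conventional hardware implementation.
    """
    stage1 = [transform_1d(row) for row in block]      # row pass: X C^T
    buffered = transpose(stage1)                       # explicit transpose
    stage2 = [transform_1d(row) for row in buffered]   # column pass
    return transpose(stage2)                           # back to row order


def transform_2d_direct(block, matrix=C4):
    """Reference result computed directly as C X C^T, for comparison."""
    n = len(matrix)
    cx = [[sum(matrix[i][k] * block[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(cx[i][k] * matrix[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    sample = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(transform_2d_row_column(sample) == transform_2d_direct(sample))
```

Because the second pass cannot start on a column until every row of the first pass has produced its contribution to that column, a conventional design stores the whole intermediate block; this storage is the cost that removing the explicit transpose memory is meant to avoid.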