The emergence of programmable graphics processing units (GPUs) has led to increasing interest in offloading numerically intensive computations onto graphics hardware. The discrete cosine transform and its inverse (DCT/IDCT) are widely adopted in modern image and video compression standards and are usually among the most computationally expensive stages. We present several techniques for efficient implementation of DCT/IDCT on generic programmable GPUs using direct matrix multiplication. Our experimental results demonstrate that the speed of IDCT on a GPU using the proposed techniques can well exceed that on a CPU with MMX optimization.
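To make the phrase "direct matrix multiplication" concrete, the following is a minimal CPU-side sketch in plain Python, not the GPU implementation described in the paper. It assumes the common 8x8 block size and the orthonormal DCT-II basis matrix C, so that the forward transform of a block X is C X C^T and the inverse is C^T Y C. The function names (`dct_matrix`, `dct2`, `idct2`) are illustrative and do not come from the source.

```python
import math

N = 8  # assumed block size; typical for JPEG/MPEG-style codecs


def dct_matrix(n=N):
    """Build the n x n orthonormal DCT-II basis matrix C."""
    c = []
    for k in range(n):
        # Row 0 uses a different scale so that C is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Forward 2D DCT as two matrix products: Y = C X C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT as two matrix products: X = C^T Y C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

Because C is orthonormal, `idct2(dct2(block))` recovers the original block up to floating-point rounding. On a GPU, each of the two products would map onto the hardware's vector multiply-add operations; how the paper arranges that mapping is not stated in the abstract and is not modeled here.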