Skip to Main Content
We studied the efficient implementation of a motion estimation algorithm for H.264/AVC on TMS 320C64x, a VLIW (very long instruction word) SIMD (single instruction multiple data) digital signal processor. H.264 motion estimation algorithms demand much arithmetic operations especially because of the variable block size optimization. The SAD (sum of absolute difference) reuse method is chosen not only to reduce the computation but also to utilize the regular algorithmic structure, which is essential for efficient implementation in parallel and pipelined processors. We applied a few techniques, such as loop length increase for efficient software pipelining, multi-block SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1 size (720*480) video is possible using a 720 MHz TMS320C6416 digital signal processor.