H.264/AVC uses intra-prediction modes with 4×4 and 16×16 block sizes for the luma component and an 8×8 block size for the chroma components to improve rate-distortion performance. However, the large number of intra-prediction modes drastically increases the computational complexity of the H.264 encoder. Recently, efficient hardware architectures have been proposed for the fast execution of H.264/AVC intra-prediction mode selection. This paper proposes an efficient pipelining method for 4×4 intra-prediction mode selection. In particular, we exploit the GPU's streaming architecture for 4×4 intra-prediction mode selection in H.264/AVC, and we develop a dedicated strategy, including instruction optimization and full use of shared memory, to further exploit the fine-grained parallelism of GPUs. Experimental results show a speedup of up to about 3× for our GPU-based algorithms over sequential CPU implementations.
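The abstract does not give the GPU kernels themselves. As a rough, non-authoritative illustration of the per-block work being parallelized, the sketch below picks the cheapest 4×4 luma intra mode for one block. It assumes a plain SAD cost (a real encoder would typically use SATD or a full rate-distortion cost) and implements only the vertical (0), horizontal (1), and DC (2) modes from H.264; the six directional modes are omitted. The function names `predict`, `sad`, and `best_mode` are hypothetical and do not come from the paper.

```python
"""Sketch: H.264/AVC 4x4 luma intra-prediction mode decision (modes 0-2 only).

Assumptions (not from the paper): SAD is used as the cost instead of SATD or
a rate-distortion cost, and only the vertical, horizontal, and DC modes are
evaluated; the six directional modes are omitted. Names are hypothetical.
"""

from typing import List, Optional, Tuple

Block = List[List[int]]  # 4 rows x 4 columns of 8-bit samples


def predict(mode: int, top: Optional[List[int]],
            left: Optional[List[int]]) -> Optional[Block]:
    """Return the 4x4 prediction for `mode`, or None if its neighbours are unavailable."""
    if mode == 0:  # Vertical: copy the row above into every row
        if top is None:
            return None
        return [list(top) for _ in range(4)]
    if mode == 1:  # Horizontal: copy each left neighbour across its row
        if left is None:
            return None
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of whichever neighbours exist
        if top is not None and left is not None:
            dc = (sum(top) + sum(left) + 4) >> 3
        elif left is not None:
            dc = (sum(left) + 2) >> 2
        elif top is not None:
            dc = (sum(top) + 2) >> 2
        else:
            dc = 128  # no neighbours: mid-grey for 8-bit video
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode {mode} is not implemented in this sketch")


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(a[y][x] - b[y][x]) for y in range(4) for x in range(4))


def best_mode(block: Block, top: Optional[List[int]],
              left: Optional[List[int]]) -> Tuple[int, int]:
    """Evaluate every available mode and return (mode, cost) with the lowest SAD.

    Each mode's cost is independent of the others, which is the kind of
    fine-grained work a GPU implementation can spread across threads.
    DC is always available, so a result is always returned; on ties the
    lower mode number wins.
    """
    best: Optional[Tuple[int, int]] = None
    for mode in (0, 1, 2):
        pred = predict(mode, top, left)
        if pred is None:
            continue
        cost = sad(block, pred)
        if best is None or cost < best[1]:
            best = (mode, cost)
    assert best is not None
    return best


if __name__ == "__main__":
    # A block of vertical stripes whose top neighbours match exactly.
    stripes = [[10, 50, 90, 130] for _ in range(4)]
    print(best_mode(stripes, [10, 50, 90, 130], [10, 10, 10, 10]))  # (0, 0)
```

In a real encoder each 4×4 block's neighbours are *reconstructed* samples of previously coded blocks, which creates a dependency chain between blocks; that dependency is what a pipelining scheme has to work around, and this single-block sketch does not model it.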