Skip to Main Content
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of Full Search block matching algorithm in the CUDA architecture. A comparison among the performance achieved by this solution with a theoretical model and two other implementations (sequential and parallel using OpenMP library) is made as well. We obtained a O(n^2/log^2n) speed-up which fits the proposed theoretical model considering different search areas. It represents up to 600x gain compared to the serial implementation, and 66x compared to the parallel OpenMP implementation.