To take advantage of the byte-type data parallelism available in existing single-instruction multiple-data (SIMD) techniques, this paper introduces the concept of 8-bit partial sums. Each partial sum is obtained by applying a 4-bit right shift to the sum of the 16 luminance values in one column of a 16 × 16 block of a video frame. Because each partial sum occupies only eight bits, eight of them can be processed concurrently in a single 64-bit SIMD register. A method of employing these partial sums to speed up a given block motion-estimation algorithm is then proposed. The notion of 8-bit partial sums is extended to the four-level case, and it is shown that there are 15 possible methods of using these multilevel 8-bit partial sums to accelerate a block motion-estimation algorithm without any loss of accuracy. Each of the 15 methods is applied to the full-search algorithm to determine the one with the lowest computational complexity, and that method is adopted as the chosen scheme for accelerating various block motion-estimation algorithms. Extensive simulations on eight video sequences show that substantial speed-up is achieved when the chosen scheme is incorporated into the various motion-estimation algorithms. The simulation results also demonstrate that implementation on SIMD architectures can further accelerate the execution of the proposed scheme by more than 93%.
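As an illustration of the 8-bit partial sums described above, the following is a minimal sketch in Python, not the authors' implementation. The function names `column_partial_sums` and `pack8`, the little-endian byte order within the 64-bit word, and the synthetic test block are all assumptions made for this example. The sketch only shows how the partial sums are formed and why eight of them fit into one 64-bit word; it does not reproduce the paper's candidate-elimination method, which the abstract does not specify.

```python
def column_partial_sums(block):
    """Return the 16 partial sums of a 16 x 16 luminance block.

    Each partial sum is the sum of the 16 values in one column,
    right-shifted by 4 bits. With 8-bit luminance the column sum is
    at most 16 * 255 = 4080, so the shifted value is at most 255
    and fits in a single byte.
    """
    assert len(block) == 16 and all(len(row) == 16 for row in block)
    return [sum(block[r][c] for r in range(16)) >> 4 for c in range(16)]


def pack8(values):
    """Pack eight 8-bit values into one 64-bit integer.

    Assumption for this sketch: values[0] goes into the lowest byte.
    This mimics how eight partial sums would share a 64-bit register.
    """
    assert len(values) == 8
    word = 0
    for i, v in enumerate(values):
        assert 0 <= v <= 0xFF
        word |= v << (8 * i)
    return word


# Synthetic block (hypothetical data): pixel value = 16 * row + column.
block = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
sums = column_partial_sums(block)

# The 16 partial sums of one block occupy two 64-bit words.
lo, hi = pack8(sums[:8]), pack8(sums[8:])
```

The 4-bit shift is what keeps each column sum within one byte: without it a column sum needs 12 bits, and only four or five sums would fit in a 64-bit register instead of eight.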