H.264 Color Components Video Decoding Parallelization on Multi-core Processors

4 Author(s)
Baaklini, E. ; Dept. of Inf. Technol. & Comput., Arab Open Univ., Beirut, Lebanon ; Sbeity, H. ; Niar, S. ; Amaneddine, N.

Multiprocessor system-on-chip (MPSoC) will be the dominant architecture in embedded systems because it improves system performance by increasing concurrency rather than clock speed, which raises power consumption. However, this concurrency must be exploited in order to improve system performance across different application environments. The emerging H.264/AVC coding standard is designed to cover a wide range of applications, including real-time conversational services such as videoconferencing and video telephony. It has many new features that require more complex computation than previous video coding standards, which will make it a challenging workload for future MPSoC embedded systems. Parallelism in video codec applications can be exploited at the data level, the functional level, or both simultaneously. In this paper we explore the parallelism that naturally exists in the H.264 decoder software itself, without any modification to the encoder phase, rather than forcing parallelization techniques onto it. Our novel idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, yet is implemented to decode them in series. Our approach is to execute the decoding phases of the luminance signals in parallel with those of the chrominance signals. Using two cores to decode the luma and chroma signals in parallel gives a gain of 15-20% in decoding processing time, and when this is combined with a functional pipelined implementation over four or more cores, the gain can reach 60% compared to the current sequential execution.
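
The luma/chroma split described in the abstract can be illustrated with a minimal sketch. This is not the authors' decoder: `decode_plane`, the macroblock dictionary layout, and the two-worker pool are all hypothetical stand-ins, and the per-plane work is reduced to adding a residual to a prediction and clipping to 8 bits. The sketch only shows the structure of issuing the two independent planes to two workers instead of decoding them in series. Python threads share one interpreter lock, so it demonstrates the decomposition rather than any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_plane(residuals, prediction):
    """Hypothetical per-plane reconstruction: prediction + residual, clipped to 0..255."""
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residuals)]


def decode_macroblock_sequential(mb):
    """Baseline: luma first, then chroma, one after the other."""
    luma = decode_plane(mb["luma_res"], mb["luma_pred"])
    chroma = decode_plane(mb["chroma_res"], mb["chroma_pred"])
    return luma, chroma


def decode_macroblock_parallel(mb, pool):
    """Submit luma and chroma as two independent tasks, then join on both."""
    luma_future = pool.submit(decode_plane, mb["luma_res"], mb["luma_pred"])
    chroma_future = pool.submit(decode_plane, mb["chroma_res"], mb["chroma_pred"])
    return luma_future.result(), chroma_future.result()


if __name__ == "__main__":
    # Assumed toy macroblock: 16x16 luma samples and two 8x8 chroma blocks (4:2:0).
    mb = {
        "luma_pred": [128] * 256,
        "luma_res": [3] * 256,
        "chroma_pred": [128] * 128,
        "chroma_res": [-2] * 128,
    }
    with ThreadPoolExecutor(max_workers=2) as pool:
        luma, chroma = decode_macroblock_parallel(mb, pool)
    print(len(luma), len(chroma))
```

The two tasks share no data, so no locking is needed between them. The join at the end of `decode_macroblock_parallel` is the only synchronization point in this sketch.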

Published in:

Digital System Design: Architectures, Methods and Tools (DSD), 2010 13th Euromicro Conference on

Date of Conference:

1-3 Sept. 2010