The multiprocessor system-on-chip (MPSoC) is set to become the dominant architecture in embedded systems, as it improves performance by increasing concurrency rather than clock speed, which raises power consumption. However, this concurrency must be exploited to improve system performance across different application environments. The emerging H.264/AVC coding standard is designed to cover a wide range of applications, including real-time conversational services such as videoconferencing and video telephony. It introduces many new features that require complex computation compared to previous video coding standards, making it a challenging workload for future MPSoC embedded systems. Parallelism in video codec applications can be exploited at the data level, at the functional level, or at both simultaneously. Our intention in this paper is to exploit the parallelism naturally present in the H.264 decoder software itself, without any modification to the encoder, rather than forcing parallelization techniques onto it. Our novel idea is based on the fact that the H.264 decoder processes the luminance and chrominance signals independently, yet current implementations decode them in series. Our approach is to execute the decoding phases of the luminance signal in parallel with those of the chrominance signal. Using two cores to decode the luma and chroma signals in parallel yields a gain of 15-20% in decoding time, and combining this with a functionally pipelined implementation over four or more cores raises the gain to 60% compared to the current sequential execution.
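The luma/chroma decoupling described above can be sketched as follows. This is a minimal illustration, in Python for brevity rather than the decoder's actual implementation language; `decode_luma` and `decode_chroma` are hypothetical placeholders standing in for the real H.264 decoding phases (entropy decoding, inverse transform, prediction, deblocking), and the only point shown is that the two colour-plane paths can be submitted to separate workers (separate cores on an MPSoC) instead of being run back to back.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the decoder's luma and chroma paths.
# In a real decoder each would reconstruct its own sample plane
# (one 16x16 luma block vs. two 8x8 chroma blocks per macroblock).
def decode_luma(frame_bits):
    return [b * 2 for b in frame_bits]   # placeholder work

def decode_chroma(frame_bits):
    return [b + 1 for b in frame_bits]   # placeholder work

def decode_frame_parallel(frame_bits):
    # Run the two independent colour-plane paths concurrently,
    # instead of luma first and chroma second as in the
    # conventional sequential implementation.
    with ThreadPoolExecutor(max_workers=2) as pool:
        luma_future = pool.submit(decode_luma, frame_bits)
        chroma_future = pool.submit(decode_chroma, frame_bits)
        return luma_future.result(), chroma_future.result()

luma, chroma = decode_frame_parallel([1, 2, 3])
```

Because the two planes share no intermediate state in this phase of decoding, no synchronization is needed beyond joining the two results at the end of the frame.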