This paper presents two parallelization methods for H.264 video decoder software on an embedded multicore processor. Parallelizing an H.264 decoder on a typical shared-memory embedded multicore processor raises two problems: computational load imbalance among cores, and memory access contention. The first method is a coarse, flexible partitioning adapted to the H.264 decoding functions, which balances the load with only a small number of synchronizations. The second method is preloading based on predicted execution time, which reduces both memory access contention and the redundant time spent waiting for other cores. Experimental results show that, together, the two proposed methods reduce the cycles consumed in decoding QVGA-size H.264 bitstreams from 217 to 96 Mcps, a 2.3 times speedup over a non-parallel decoder.
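The reported speedup follows directly from the two cycle figures, and a short check makes the arithmetic explicit. The sketch below is only illustrative: the function names `speedup` and `cycles_saved_fraction` are hypothetical helpers introduced here, and the only inputs taken from the text are the 217 and 96 Mcps figures.

```python
def speedup(baseline_mcps: float, parallel_mcps: float) -> float:
    """Return the ratio of cycles consumed before and after parallelization."""
    if baseline_mcps <= 0 or parallel_mcps <= 0:
        raise ValueError("cycle figures must be positive")
    return baseline_mcps / parallel_mcps


def cycles_saved_fraction(baseline_mcps: float, parallel_mcps: float) -> float:
    """Return the fraction of the baseline cycles that parallelization removes."""
    if baseline_mcps <= 0 or parallel_mcps <= 0:
        raise ValueError("cycle figures must be positive")
    return 1.0 - parallel_mcps / baseline_mcps


# Figures quoted in the abstract: non-parallel decoder vs. both methods applied.
BASELINE_MCPS = 217.0
PARALLEL_MCPS = 96.0

ratio = speedup(BASELINE_MCPS, PARALLEL_MCPS)
saved = cycles_saved_fraction(BASELINE_MCPS, PARALLEL_MCPS)

print(f"speedup: {ratio:.2f}x")
print(f"cycles saved: {saved:.1%}")
```

The ratio rounds to the 2.3 times figure quoted above, which corresponds to removing a little over half of the cycles consumed by the non-parallel decoder.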