The H.264 decoder has a sequential, control-intensive front end that makes it difficult to exploit the potential performance of emerging manycore processors. Preparsing is a functional parallelization technique that resolves this front-end bottleneck. However, the resulting parallel macroblock (MB) rendering tasks have highly input-dependent execution times and precedence constraints, which make them difficult to schedule efficiently on manycore processors. To address these issues, we propose a two-step approach: (i) a custom preparsing technique that resolves control dependencies in the input stream and exposes MB-level data parallelism, and (ii) an MB-level scheduling technique that allocates and load-balances MB rendering tasks. The run-time MB-level scheduling increases the efficiency of parallel execution in the rest of the H.264 decoder, providing a 60% speedup over greedy dynamic scheduling and a 9-15% speedup over static compile-time scheduling for more than four processors. The preparsing technique, coupled with run-time MB-level scheduling, enables a potential 7× speedup for H.264 decoding.
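To make the precedence constraints concrete, the following is a minimal sketch of how MB rendering tasks might be dispatched under them. It is not the scheduler proposed here. It is a plain greedy list scheduler, and it assumes the commonly cited H.264 neighbour pattern in which an MB may depend on its left, top-left, top, and top-right neighbours. The names `mb_dependencies` and `simulate_schedule` and the `cost` callback, which stands in for the input-dependent rendering time, are hypothetical.

```python
import heapq


def mb_dependencies(x, y, width):
    """Neighbours an MB may depend on (assumed): left, top-left, top, top-right."""
    deps = []
    if x > 0:
        deps.append((x - 1, y))
    if y > 0:
        if x > 0:
            deps.append((x - 1, y - 1))
        deps.append((x, y - 1))
        if x + 1 < width:
            deps.append((x + 1, y - 1))
    return deps


def simulate_schedule(width, height, cost, workers):
    """Greedy list scheduling: an idle worker starts the first ready MB in raster order.

    cost(x, y) is a stand-in for the input-dependent rendering time of an MB.
    Returns (makespan, start_times).
    """
    remaining = {}   # MB -> number of unfinished predecessors
    dependents = {}  # MB -> MBs waiting on it
    for y in range(height):
        for x in range(width):
            deps = mb_dependencies(x, y, width)
            remaining[(x, y)] = len(deps)
            for d in deps:
                dependents.setdefault(d, []).append((x, y))

    # Ready queue keyed by (row, column), i.e. raster order.
    ready = [(y, x) for (x, y), n in remaining.items() if n == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, MB)
    now, idle, start = 0.0, workers, {}

    while ready or running:
        # Hand ready MBs to idle workers.
        while ready and idle > 0:
            y, x = heapq.heappop(ready)
            start[(x, y)] = now
            heapq.heappush(running, (now + cost(x, y), (x, y)))
            idle -= 1
        # Advance time to the next completion and release its dependents.
        now, done = heapq.heappop(running)
        idle += 1
        for mb in dependents.get(done, []):
            remaining[mb] -= 1
            if remaining[mb] == 0:
                heapq.heappush(ready, (mb[1], mb[0]))
    return now, start
```

With unit costs and enough workers, the MB at column x and row y starts at step x + 2y. This is the familiar diagonal wavefront, and it bounds the available parallelism regardless of core count. Passing a non-uniform `cost` shows how input-dependent execution times disturb that wavefront and leave workers idle.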