The H.264/AVC deblocking filter is becoming the performance bottleneck of H.264/AVC parallelization on many-core platforms. Parallelizing the deblocking filter efficiently on such platforms is challenging because its complicated data dependencies expose too little parallelism for so many cores. Furthermore, parallelization can incur significant synchronization and load-imbalance overhead. At present, research on parallelizing the deblocking filter on many-core platforms is rare and focuses on data-level parallelization. In this paper, we propose a three-step framework that combines task-level segmentation with data-level parallelization to parallelize the deblocking filter efficiently. First, we review the entire deblocking filter process at the 4 × 4 block edge level and divide it into two parts: 1) boundary strength computation (BSC) and 2) edge discrimination and filtering (EDF), which increases the parallelism. Then, we apply the Markov empirical transition probability matrix and Huffman tree (METPMHT) to the BSC, which alleviates the load imbalance problem. Finally, we use independent pixel connected area parallelization (IPCAP) for the EDF, which increases the parallelism and reduces synchronization. In experiments, we apply our parallel method to the deblocking filter of the H.264/AVC reference software JM15.1 on the Tile64 platform without any Tile64 platform-based optimizations. Compared to the well-known 2D-wavefront method, the proposed method achieves average speed-ups of 14.85, 17.83, and 10.60 times for QCIF, CIF, and HD videos, respectively, using 62 cores.
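
For context, the 2D-wavefront baseline named in the abstract can be sketched as follows. This is a minimal illustration, not code from JM15.1 or from the proposed method: it assumes the commonly described dependency pattern in which filtering macroblock (x, y) must wait for its left neighbour (x-1, y) and its top neighbour (x, y-1), and the function names are hypothetical. The sketch groups macroblocks by anti-diagonal and reports how many can be filtered concurrently, which suggests why macroblock-level parallelism alone falls short of 62 cores for QCIF (11 × 9 macroblocks) and CIF (22 × 18 macroblocks) frames.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into waves that can be filtered concurrently.

    Assumption: MB (x, y) depends only on MB (x-1, y) and MB (x, y-1),
    so every MB on the anti-diagonal x + y = k is independent of the others.
    """
    waves = [[] for _ in range(mb_cols + mb_rows - 1)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves[x + y].append((x, y))
    return waves


def max_parallelism(mb_cols, mb_rows):
    """Largest number of macroblocks that are ready in any single wave."""
    return max(len(wave) for wave in wavefront_schedule(mb_cols, mb_rows))


if __name__ == "__main__":
    # Frame sizes in 16x16 macroblocks: QCIF is 176x144, CIF is 352x288.
    for name, (cols, rows) in {"QCIF": (11, 9), "CIF": (22, 18)}.items():
        waves = wavefront_schedule(cols, rows)
        print(name, "waves:", len(waves), "peak MBs per wave:", max_parallelism(cols, rows))
```

Under these assumptions the peak wave width equals the smaller frame dimension in macroblocks, and the first and last waves contain a single macroblock each, so most cores idle during ramp-up and ramp-down.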