In this paper, we present two efficient, low-power H.264 deblocking filter (DBF) hardware implementations that can be used as part of an H.264 video encoder or decoder for portable applications. The first implementation (DBF_4×4) starts filtering the available edges as soon as a new 4×4 block is ready, using a novel edge filtering order to overlap the execution of the DBF module with that of the other modules in the H.264 encoder/decoder. This overlap improves the performance of the H.264 encoder/decoder. The second implementation (DBF_16×16) starts filtering the available edges only after a new 16×16 macroblock is ready. Both DBF hardware architectures are implemented in Verilog HDL and synthesized to a 0.18 µm UMC standard cell library. Both implementations can work at 200 MHz and can process 30 VGA (640×480) frames per second. Excluding on-chip memories, the DBF_4×4 and DBF_16×16 hardware implementations are synthesized to 7.4 K and 5.3 K gates, respectively. These gate counts are the lowest among the H.264 DBF hardware implementations presented in the literature, which makes our implementations more cost-effective solutions for portable applications. DBF_16×16 consumes 36% less power than DBF_4×4 on a Xilinx Virtex II FPGA on an ARM Versatile PB926EJ-S development board. Therefore, DBF_4×4 hardware can be used in an H.264 encoder or decoder for which performance is more important, whereas DBF_16×16 hardware can be used in an H.264 encoder or decoder for which power consumption is more important.
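To put the stated operating point in perspective, the short sketch below derives the cycle budget available per macroblock from the figures quoted above (200 MHz, 640×480 frames, 30 frames per second). It is only an upper bound under the assumption that every clock cycle is available to the DBF; it is not the cycle count of either implementation, which is not given here. The function and constant names are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 200_000_000             # 200 MHz operating frequency
FRAME_WIDTH, FRAME_HEIGHT = 640, 480   # VGA resolution
FRAMES_PER_SECOND = 30
MB_SIZE = 16                       # an H.264 macroblock covers 16x16 luma pixels


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of macroblocks needed to cover one frame (rounding up)."""
    mbs_wide = -(-width // mb_size)    # ceiling division
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high


def cycle_budget_per_macroblock(clock_hz, width, height, fps):
    """Upper bound on clock cycles available per macroblock.

    Assumption (not from the source): the whole clock budget is
    available to the deblocking filter.
    """
    mbs_per_second = macroblocks_per_frame(width, height) * fps
    return clock_hz // mbs_per_second


if __name__ == "__main__":
    print(macroblocks_per_frame(FRAME_WIDTH, FRAME_HEIGHT))
    print(cycle_budget_per_macroblock(
        CLOCK_HZ, FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND))
```

A VGA frame contains 40 × 30 = 1200 macroblocks, so 30 frames per second corresponds to 36,000 macroblocks per second, and a 200 MHz clock leaves at most 5555 cycles for each of them.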