This paper describes the design and VLSI implementation of a highly efficient, single-port SRAM-based deblocking filter. It can achieve 204 cycles/macroblock throughput for H.264/AVC real-time decoding. Several deblocking filter designs in the literature have been compared and the possibility of realizing them in a pipeline is studied. Eventually we came up with a completely new design which has a five-stage pipeline with gated clock to increase system throughput while reducing power. Data hazards and structure hazards, which are the two most critical issues for a pipelined filter, are analyzed and resolved. Efficient memory organization for both on-chip SRAM and transposition buffers is employed. By using innovative hybrid edge filtering sequence and out-of-order memory update scenario, we obtain zero stall cycle in normal pipeline flow, making the best out of a pipelined architecture. Compared with existing designs, our design achieves at least 18% clock cycle reduction, as well as 20% lower power consumption owing to its efficient pipeline and memory architecture. The total gate count is comparable to other designs in literature without using any expensive two-port or dual-port on-chip SRAMs.
Published in:
Circuits and Systems for Video Technology, IEEE Transactions on
(Volume:18
,
Issue:
3
)
Date of Publication: March 2008