Intra prediction is the most important technology in an H.264/AVC intra-frame encoder. However, the intra prediction process involves highly complex data dependencies and a very large amount of computation. To meet the requirements of real-time coding while avoiding wasted hardware, this paper presents a parallel, highly efficient H.264/AVC intra prediction architecture that targets high-resolution (e.g. 4k×2k) video encoding applications. In this architecture, the optimized intra 4×4 prediction engine processes sixteen pixels in parallel at a slightly higher hardware cost than the previous four-pixel parallel architecture. The intra 16×16 prediction engine works in parallel with the intra 4×4 prediction engine and reuses the adder tree of the Sum of Absolute Transformed Differences (SATD) generator. Moreover, to reduce the data dependency in the intra 4×4 reconstruction loop, a block-level and mode-level co-reordering strategy is proposed. As a result, the performance bottleneck of H.264/AVC intra encoding is greatly alleviated. The proposed architecture supports full-mode intra prediction for the H.264/AVC baseline, main and extended profiles. It takes only 163 cycles to complete the intra prediction process for one macroblock (MB). The design is synthesized with a SMIC 0.13 μm CMOS cell library. The results show that it occupies 61k gates and can run at 215 MHz, supporting real-time encoding of 4k×2k@40fps video sequences.
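The relation between the stated cycle count, clock frequency and frame rate can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the 4096×2048 frame size is an assumption (the abstract does not say which exact resolution "4k×2k" denotes), and it assumes that intra prediction at 163 cycles per 16×16 macroblock is the only throughput limit.

```python
MB_SIZE = 16  # an H.264/AVC macroblock covers 16x16 luma samples


def ceil_div(a, b):
    """Integer division rounded up, so partial macroblocks are counted."""
    return (a + b - 1) // b


def macroblocks_per_frame(width, height):
    """Number of macroblocks needed to cover a width x height frame."""
    return ceil_div(width, MB_SIZE) * ceil_div(height, MB_SIZE)


def max_frame_rate(width, height, cycles_per_mb=163, clock_hz=215e6):
    """Upper bound on frames per second if every macroblock costs
    cycles_per_mb cycles and nothing else stalls the pipeline
    (an assumption of this sketch)."""
    cycles_per_frame = macroblocks_per_frame(width, height) * cycles_per_mb
    return clock_hz / cycles_per_frame


# Assumed frame size: 4096x2048 (one possible reading of "4k x 2k").
fps = max_frame_rate(4096, 2048)
print(f"{macroblocks_per_frame(4096, 2048)} MBs/frame, {fps:.2f} frames/s")
```

Under these assumptions a 4096×2048 frame contains 32768 macroblocks, and the bound comes out slightly above 40 frames per second, which is consistent with the stated 4k×2k@40fps figure. Other frame sizes can be passed to `max_frame_rate` to see how the bound changes.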