Variable-block-size motion estimation (VBSME) is a major contributor to H.264/AVC's excellent coding efficiency. However, its high computational complexity and memory requirements make hardware design difficult. In this paper, we propose a memory-efficient hardware architecture for full-search VBSME (FSVBSME). Our architecture consists of sixteen 2-D arrays, each of which consists of 16×16 processing elements (PEs). Four arrays form a group that matches four reference blocks against one current block in parallel. Four groups perform block matching for four current blocks in a consecutive and overlapped fashion. Taking advantage of the reference pixel overlap between multiple reference blocks of a current block and between the search windows of several adjacent current blocks, we propose a novel data reuse scheme to reduce memory access. Compared with the popular Level C data reuse method, our design can save 98% of on-chip memory access with only 27% memory overhead. Synthesized with a TSMC 130 nm CMOS cell library, our design takes 453K logic gates and 2.94 Kbytes of on-chip memory. Running at 130 MHz, it is capable of processing 1920×1088 video at 30 fps with a 64×64 search range (SR) and two reference frames (RF). We suggest a criterion called design efficiency for comparing related designs. It shows that our design is 27% more efficient than the best design to date.
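As a rough plausibility check on the throughput figures quoted above, the following sketch compares the clock cycles available per macroblock with the cycles a sixteen-array full search might need. It is a back-of-envelope estimate, not the authors' timing model. It assumes that a 64×64 search range means 64 × 64 = 4096 candidate positions per reference frame, that each PE array evaluates one candidate position per cycle, and that pipeline fill and memory stalls are ignored. All function and constant names are hypothetical.

```python
# Back-of-envelope cycle budget. The modelling choices marked "assumed"
# are illustrative assumptions, not details taken from the paper.

MB_SIZE = 16                    # an H.264 macroblock covers 16x16 luma pixels
WIDTH, HEIGHT, FPS = 1920, 1088, 30
CLOCK_HZ = 130_000_000          # 130 MHz operating frequency
SEARCH_POSITIONS = 64 * 64      # assumed: 64x64 search range = 4096 candidates
REF_FRAMES = 2
PE_ARRAYS = 16                  # 4 groups x 4 arrays of 16x16 PEs


def macroblocks_per_second(width, height, fps, mb=MB_SIZE):
    """Number of macroblocks that must be processed each second."""
    return (width // mb) * (height // mb) * fps


def cycle_budget_per_mb(clock_hz, mb_rate):
    """Clock cycles available for each macroblock at the given clock."""
    return clock_hz / mb_rate


def cycles_needed_per_mb(positions, ref_frames, arrays):
    """Cycles for a full search, assumed one candidate per array per cycle."""
    return positions * ref_frames / arrays


mb_rate = macroblocks_per_second(WIDTH, HEIGHT, FPS)
budget = cycle_budget_per_mb(CLOCK_HZ, mb_rate)
needed = cycles_needed_per_mb(SEARCH_POSITIONS, REF_FRAMES, PE_ARRAYS)

print(f"macroblocks per second: {mb_rate}")
print(f"cycle budget per MB:    {budget:.1f}")
print(f"cycles needed per MB:   {needed:.1f}")
```

Under these assumptions the search needs 512 cycles per macroblock against a budget of roughly 531, which is consistent with the claim that 130 MHz suffices for 1920×1088 video at 30 fps. The small margin suggests the real design keeps its arrays almost fully utilized.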