Skip to Main Content
This work presents a domain-specific memory subsystem based on a two-level memory hierarchy. It targets the application domain of video post-processing applications including video enhancement and format conversion. These applications are based on motion compensation and/or broad class of content adaptive filtering to provide the highest quality of pictures. Our approach meets the required performance and has sufficient flexibility for the application domain. It especially aims at the implementation-wise most challenging applications: compute-intensive and bandwidth-demanding applications that provide the highest quality at high picture resolutions. The lowest level of the memory hierarchy, closest to the processing element, the L0 scratchpad, is organized specifically to enable fast retrieval of an arbitrarily positioned 2-D block of pixels to the processing element. To guarantee the performance, most of its addressing logic is hardwired, leaving a user a set of API for initialization and storing/loading the data to/from the L0 scratchpad. The next level of the memory hierarchy, the L1 scratchpad, minimizes the off-chip memory bandwidth requirements. The L1 scratchpad is organized specifically to enable efficient aligned block-based accesses. With lower data rates compared to the L0 scratchpad and aligned block access, software-based addressing is used to enable full flexibility. The two-level memory hierarchy exploits prefetching to further improve the performance.