Current embedded systems are usually designed for data-dominated applications, but they have a tight energy and time budget. Scratch-pad memories are completely software-controlled memories with predictable behaviour and good performance and energy characteristics, thus they tend to become a standard feature in many embedded systems. However, their predictability is not helping if the application accesses its data dynamically, when the addresses of the accessed data depend on the application's input. In such cases, predetermining the scratch-pad content at design-time is not always possible as the compiler cannot predict the runtime input. Moreover, in this case, both data reuse and data placement in the scratch-pad are inefficient because chunks of data already stored cannot be efficiently reused and combined with the runtime accessed data blocks. State-of-the art techniques copy each new data block to the scratch-pad without considering whether portions of them are already in it. Such dynamic temporal locality cannot be predicted or exploited by the compiler. The authors here present a system architecture, strongly connected to the system's scratch-pad and the processor's compiler, which is able to efficiently exploit run-time data reuse in the scratch-pad by being capable of holding valuable information, such as the exact data contents of the scratch-pad at runtime, and using it to do all the necessary operations for placing each new data block in scratch-pad. It is fine tuned for applications with run-time reuse between rectangular data blocks. The application domain of the proposed architecture is multimedia applications with run-time reuse, certain applications with linked lists and multi-threaded applications. It operates in a time and energy-efficient manner when compared with existing scratch-pad architectures without the authors' scratch-pad accelerator engine, showing its higher normalised performance and lower normalised energy consumption. Experime- tal results show up to 2.5 times performance increase compared with existing scratch-pad architectures and 5 times compared with cache architectures and energy decrease up to 1.9 and 3.9 times, respectively.