Designing efficient cache, memory, and storage subsystems for modern embedded systems that support a variety of applications is of great importance. Embedded systems are increasingly deployed with multicore processors, which support parallel and distributed computing and so meet the demand for higher processing speed. Multiple cores offer many options for organizing multi-level caches, and a variety of cache memory hierarchies have been proposed to satisfy the requirements of high-performance, low-power multicore embedded systems. In this paper, we investigate the impact of level-2 cache (CL2) organization on the performance and power consumption of multicore embedded systems. We simulate two 4-core architectures, one with a shared CL2 and the other with private CL2s. We use the MPEG4, FFT, MI, and DFT applications/algorithms in our experiments. Simulation results show that the mean delay and total power consumption vary significantly with the CL2 organization and with the application. We observe that reductions of up to 43% in total power consumption and up to 36% in mean delay per task are possible with an optimized CL2, the optimal choice being a 256 KB CL2 cache with a 64 B line size and 8-way associativity.
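To make the reported optimal CL2 configuration concrete, the sketch below derives the cache geometry it implies (number of sets and how an address splits into tag, index, and offset fields). This is a minimal illustration of standard set-associative cache indexing, not the simulator used in the paper; the 32-bit address width and the helper names `cache_geometry` and `split_address` are assumptions made here for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def cache_geometry(cache_bytes: int, line_bytes: int, ways: int, addr_bits: int = 32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    addr_bits=32 is an assumed address width, not a value from the paper.
    """
    for value in (cache_bytes, line_bytes, ways):
        if not is_power_of_two(value):
            raise ValueError("sizes and associativity must be powers of two")
    lines = cache_bytes // line_bytes      # total cache lines
    sets = lines // ways                   # lines grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Optimal CL2 reported in the abstract: 256 KB, 64 B lines, 8-way.
sets, off_bits, idx_bits, tag_bits = cache_geometry(256 * 1024, 64, 8)
print(sets, off_bits, idx_bits, tag_bits)
print(split_address(0x12345678, off_bits, idx_bits))
```

For this configuration, 256 KB / 64 B gives 4096 lines, and grouping them 8 per set gives 512 sets, so 6 address bits select the byte within a line and 9 bits select the set. In a shared-CL2 design, all four cores index into this single structure; with private CL2s, each core would index its own cache, whose geometry depends on the per-core size chosen.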