Skip to Main Content
In massive tile-based multi-core architectures, it is important that the execution of the packets of a particular flow takes place in a set of cores physically close to each other in order to minimize the average latency to the common data structures across the local caches of the different cores. An static NUCA implementation provides a substrate for a cost-effective implementation of a cache sharing mechanism. However, a careful mapping of the different data structures in the system's memory, along with a smart load-balancing mechanism of the packets to the different cores, is fundamental in order to avoid long latencies to remote data. This work proposes a methodology for load balancing packets to cores in an S-NUCA tile-based architecture with a large number of cores.