Clustered microarchitectures are a viable solution to wire delays in communication-bound architectures because they partition monolithic datapath structures into smaller components. While they support high clock frequencies, clustered processors usually degrade instruction throughput because of inter-cluster communication delays and unbalanced workload distribution. In this paper, we propose and evaluate novel instruction steering policies that reduce or eliminate cross-cluster communication delays while respecting workload balance. Our first technique hides inter-cluster communication latencies by examining operand readiness information. The proposed policy steers instructions with two register sources to the cluster predicted to generate the later-produced operand. While the later-produced operand is being generated, the transport of the earlier-produced operand can occur in parallel, hiding the communication delay. Our second technique steers an entire group of instructions co-renamed in a cycle to the same cluster if the number of intra-group register dependencies exceeds a threshold. The target cluster is selected in a round-robin fashion to reduce the impact on workload balance.
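The two policies can be illustrated with a minimal Python sketch. It is not the evaluated implementation: the names `Instr`, `steer_by_readiness`, `count_intra_group_deps`, and `GroupSteerer`, the tables `producer_cluster` and `predicted_ready`, and the use of plain register numbers in place of renamed registers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record: one destination, up to two sources."""
    dest: Optional[int]
    srcs: Tuple[int, ...]


def steer_by_readiness(instr: Instr,
                       producer_cluster: Dict[int, int],
                       predicted_ready: Dict[int, int],
                       fallback: int) -> int:
    """Policy 1 (sketch): steer to the cluster that produces the source
    predicted to become ready last, so the earlier operand can be
    transported while the later one is still being computed."""
    # Only sources with a known producing cluster are considered (assumption).
    pending = [r for r in instr.srcs if r in producer_cluster]
    if not pending:
        return fallback
    latest = max(pending, key=lambda r: predicted_ready.get(r, 0))
    return producer_cluster[latest]


def count_intra_group_deps(group: List[Instr]) -> int:
    """Count source operands produced by an earlier instruction in the
    same rename group."""
    written = set()
    deps = 0
    for ins in group:
        deps += sum(1 for r in ins.srcs if r in written)
        if ins.dest is not None:
            written.add(ins.dest)
    return deps


class GroupSteerer:
    """Policy 2 (sketch): steer a whole rename group to one cluster when
    its intra-group dependency count exceeds a threshold, choosing the
    cluster round-robin."""

    def __init__(self, num_clusters: int, threshold: int) -> None:
        self.num_clusters = num_clusters
        self.threshold = threshold
        self.next_cluster = 0

    def steer(self, group: List[Instr]) -> Optional[int]:
        """Return the cluster for the whole group, or None when the group
        should fall back to per-instruction steering."""
        if count_intra_group_deps(group) > self.threshold:
            cluster = self.next_cluster
            self.next_cluster = (cluster + 1) % self.num_clusters
            return cluster
        return None
```

In this sketch the round-robin counter advances only when a group is steered as a whole, which is one possible reading of the policy; the threshold value and the source of the readiness predictions are left open here.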