Summary form only given. Reducing the effect of hot spots is increasingly important for extracting performance from modern processor clusters. Traditionally, compiler techniques have been used to statically analyze hot spot patterns in parallel applications, and the operating system then performs optimizations to reduce hot spot overhead. However, hot spots cannot be avoided entirely because of the dynamic nature of applications. We propose a new hot spot optimization scheme based on a broadcast-based optical interconnection network, the SOME-Bus, in which each node has a dedicated broadcast channel connecting it to all other nodes without contention. The scheme introduces additional hardware that considerably reduces the latency of hot spot requests and acknowledgments. Hot spots are assumed to be identifiable either through static analysis or by a run-time profiler. Our scheme then caches these hot spot blocks much closer to the network channel, providing a very low-latency path between the input and output queues in the network. The technique has been implemented in a SOME-Bus simulator and verified with popular parallel algorithms such as matrix-matrix multiplication. Preliminary results show that the scheme reduces application completion times by up to 24% over a system without channel caching.
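The intuition behind channel caching can be illustrated with a toy latency model. This is a minimal sketch with hypothetical cycle counts, not the authors' SOME-Bus simulator: requests for a cached hot-spot block take a short direct path between the network's input and output queues, while all other requests traverse the full node path (input queue, memory access, output queue).

```python
# Toy model of the channel-caching idea. All latency values are assumed
# for illustration; they are not taken from the SOME-Bus simulator.

NODE_PATH_LATENCY = 50     # cycles: input queue + memory access + output queue (assumed)
CHANNEL_CACHE_LATENCY = 5  # cycles: direct input-queue -> output-queue path (assumed)

def total_latency(requests, hot_blocks, channel_cache_enabled):
    """Sum the service latency for a sequence of block requests."""
    total = 0
    for block in requests:
        if channel_cache_enabled and block in hot_blocks:
            # Hot-spot block is held in the cache at the channel,
            # bypassing the node's memory path.
            total += CHANNEL_CACHE_LATENCY
        else:
            total += NODE_PATH_LATENCY
    return total

# Suppose 80% of requests target one hot-spot block
# (e.g. a lock or a shared reduction variable).
requests = ["hot"] * 80 + [f"cold{i}" for i in range(20)]
baseline = total_latency(requests, {"hot"}, channel_cache_enabled=False)
cached = total_latency(requests, {"hot"}, channel_cache_enabled=True)
print(baseline, cached)  # prints "5000 1400"
```

Under these assumed parameters the cached configuration cuts aggregate latency sharply whenever hot-spot traffic dominates, which is the scenario the scheme targets; the 24% completion-time reduction reported above comes from the full simulator, not from this sketch.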