This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy that distributes the workload among clusters while reducing the number of intercluster communications. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into account all of the constraints that influence the final schedule. To further reduce the number of intercluster communications, heuristics for instruction replication are included. The proposed scheme is evaluated using the SPECfp95 programs, where it outperforms a state-of-the-art scheduler for all programs and across different cluster configurations. For some configurations, the speedup obtained with the new scheme is greater than 40 percent, and for selected programs, performance more than doubles.
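To make the partitioning trade-off concrete, the following minimal sketch scores a candidate assignment of instructions to clusters by the two quantities the abstract mentions: intercluster communications and workload distribution. It is not the AGAMOS algorithm. The function `partition_cost`, the example dependence graph, and the cost model (one communication per produced value per remote destination cluster) are assumptions introduced only for illustration.

```python
from collections import Counter

def partition_cost(edges, cluster_of):
    """Return (intercluster communications, instructions per cluster).

    Assumed cost model: a value consumed in a remote cluster costs one
    communication per (producer, destination cluster) pair, however many
    consumers it has in that cluster.
    """
    transfers = {
        (src, cluster_of[dst])
        for src, dst in edges
        if cluster_of[src] != cluster_of[dst]
    }
    load = Counter(cluster_of.values())
    return len(transfers), dict(load)

# Hypothetical loop body: a and b feed c; c feeds d and e.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]

# Two candidate assignments of the five instructions to two clusters.
grouped = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 1}

print(partition_cost(edges, grouped))
print(partition_cost(edges, scattered))
```

Under this assumed model, the grouped assignment needs one communication and the scattered one needs two, although both split the five instructions three to two. A partitioner of the kind described would prefer the first, and instruction replication could in principle remove the remaining transfer by recomputing the shared value in the consuming cluster.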