Fat-tree networks are the most popular topology among indirect networks in today's supercomputers. Current supercomputers are generally operated as shared environments under the control of a job scheduler, executing many parallel applications simultaneously. Competition among these applications for the same network resources degrades their performance. An application that must wait for network resources occupied by another application's messages is said to experience inter-application contention. The extent of the degradation caused by inter-application contention is known to depend on multiple factors: the network topology, the routing scheme, the task placement, etc. Note that these factors also directly affect intra-application contention. Our work evaluates the impact of inter-application contention on actual competing HPC workloads under different routing schemes in slimmed fat trees. In contrast with previous works, which focus mostly on the performance of individual applications, we take a more system-centric view. We estimate the system performance loss that inter-application contention causes in current HPC systems, which we have measured at around 10%. We also project the impact of inter-application contention on near- and mid-term future HPC systems by scaling node computational power and network link speeds to foreseeable values. Our results suggest that network speed does not need to increase at the same fast pace as computational power, but it still needs to be scaled up. For some application mixes, our projection shows that inter-application contention can cause a 15% throughput loss even with link speeds of 40 Gb/s.
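To make the system-centric view concrete, the sketch below contrasts a per-application slowdown with a workload-wide throughput loss. It is a minimal sketch under stated assumptions: the abstract does not define its loss metric, so `system_throughput_loss` assumes loss is the fraction of consumed node-seconds that would not have been needed without contention. The `JobRun` record and all numbers in the usage line are hypothetical and illustrative, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One job in a workload mix (hypothetical record type)."""
    nodes: int          # nodes allocated to the job
    t_isolated: float   # runtime with the network to itself, in seconds
    t_mixed: float      # runtime while sharing the network with the mix, in seconds


def slowdown(job: JobRun) -> float:
    """Per-application view: how much longer the job ran inside the mix."""
    return job.t_mixed / job.t_isolated


def system_throughput_loss(jobs: list[JobRun]) -> float:
    """System-centric view (assumed metric): fraction of node-seconds lost.

    loss = 1 - (node-seconds needed in isolation) / (node-seconds consumed in the mix)
    """
    needed = sum(j.nodes * j.t_isolated for j in jobs)
    consumed = sum(j.nodes * j.t_mixed for j in jobs)
    return 1.0 - needed / consumed


if __name__ == "__main__":
    # Illustrative mix of two jobs, each slowed down by about 11%.
    jobs = [JobRun(64, 90.0, 100.0), JobRun(128, 180.0, 200.0)]
    print(f"{system_throughput_loss(jobs):.2%}")  # prints "10.00%"
```

Weighting by node count means that a slowdown in a large job costs the system more than the same slowdown in a small one, which is what distinguishes this view from averaging per-application slowdowns.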
The difference in impact on a given application when it runs in different mixes accounts for the performance variability described in previous works, but our work sets a better bound on this variability than studies performed by injecting network noise. Finally, we found a high correlation between the communication volume of the applications in a workload and the amount of inter-application contention they experience.
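A correlation of this kind is commonly quantified with the Pearson coefficient. The sketch below assumes that measure, since the abstract does not state which one was used; the per-workload communication volumes and losses in the usage line are hypothetical values chosen only to show the call, not data from this work.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical workloads: total communication volume (GB) vs. contention loss.
    comm_volume_gb = [12.0, 35.0, 48.0, 70.0]
    contention_loss = [0.03, 0.06, 0.09, 0.12]
    print(round(pearson(comm_volume_gb, contention_loss), 3))
```

A coefficient close to 1 would indicate that workloads moving more data through the network also lose more throughput to contention, which is the relationship the abstract reports.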
Date of Conference: 18-20 Aug. 2010