Demonstrating improved application performance using dynamic monitoring and task mapping

IEEE Conference Publication


Abstract:

This work demonstrates the integration of monitoring, analysis, and feedback to perform application-to-resource mapping that adapts to both static architecture features and dynamic resource state. In particular, we present a framework for mapping MPI tasks to compute resources based on run-time analysis of system-wide network data, architecture-specific routing algorithms, and application communication patterns. We address several challenges: we collect local utilization data within each node; we consolidate that information into a global view of system performance that accounts for system-wide factors such as competing applications; we provide an interface through which applications can query this global information; and we use the resulting system information to change the mapping of tasks to nodes so that system bottlenecks are avoided. We demonstrate the benefit of this monitoring and feedback by remapping MPI tasks based on route-length, bandwidth, and credit-stall metrics for a parallel sparse matrix-vector multiplication kernel. In the best case, remapping based on dynamic network information in a congested environment recovered 48.9% of the time lost to congestion, reducing matrix-vector multiplication time by 7.8%. Our experiments focus on the Cray XE/XK platform, but the integration concepts are generally applicable to any platform for which the relevant metrics and routing information can be obtained.
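The abstract does not spell out the remapping algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a 3D torus topology (Cray XE/XK systems use the Gemini interconnect, a 3D torus), approximates route length as the sum of minimal per-dimension distances, scores a task-to-node placement as traffic volume times route length, and improves it by greedy pairwise swaps. The names `torus_hops`, `mapping_cost`, and `greedy_swap`, and the per-node `stall` penalty standing in for a dynamic metric such as credit stalls, are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int, int]            # (x, y, z) position of a node in the torus
Traffic = Dict[Tuple[int, int], float]  # (task_i, task_j) -> message volume


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Route length between nodes a and b on a 3D torus of size dims.

    Assumes minimal routing in every dimension (wraparound links allowed),
    so the hop count is the sum of per-dimension shortest distances.
    Real Gemini routes may differ; this is a simplification.
    """
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)
    return total


def mapping_cost(traffic: Traffic, placement: List[Coord], dims: Coord,
                 stall: Optional[Dict[Coord, float]] = None) -> float:
    """Sum of volume * route length over communicating task pairs.

    `stall` is a hypothetical per-node penalty (e.g. a normalized credit-stall
    rate); when given, each pair's cost is scaled by 1 + the penalties of its
    two endpoint nodes, steering heavy traffic away from congested nodes.
    """
    cost = 0.0
    for (i, j), volume in traffic.items():
        a, b = placement[i], placement[j]
        penalty = 1.0
        if stall is not None:
            penalty += stall.get(a, 0.0) + stall.get(b, 0.0)
        cost += volume * torus_hops(a, b, dims) * penalty
    return cost


def greedy_swap(traffic: Traffic, placement: List[Coord], dims: Coord,
                stall: Optional[Dict[Coord, float]] = None,
                max_rounds: int = 10) -> Tuple[List[Coord], float]:
    """Local search: keep any swap of two tasks' nodes that lowers the cost."""
    placement = list(placement)  # do not mutate the caller's list
    best = mapping_cost(traffic, placement, dims, stall)
    for _ in range(max_rounds):
        improved = False
        for i, j in combinations(range(len(placement)), 2):
            placement[i], placement[j] = placement[j], placement[i]
            cost = mapping_cost(traffic, placement, dims, stall)
            if cost < best:
                best, improved = cost, True
            else:  # revert a swap that did not help
                placement[i], placement[j] = placement[j], placement[i]
        if not improved:
            break
    return placement, best


if __name__ == "__main__":
    dims = (8, 1, 1)  # a 1D torus (ring) of 8 nodes
    ring = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}  # 4 tasks in a ring
    start = [(0, 0, 0), (4, 0, 0), (1, 0, 0), (5, 0, 0)]  # scattered default placement
    remapped, best = greedy_swap(ring, start, dims)
    print(f"{mapping_cost(ring, start, dims):.1f} -> {best:.1f}")  # prints 14.0 -> 8.0
```

The sketch only permutes tasks among nodes already allocated to the job, and pairwise swapping finds a local rather than a guaranteed global optimum. Each trial recomputes the full cost, which keeps the code short but would likely be replaced by incremental cost updates for large task counts.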
Date of Conference: 22-26 September 2014
Date Added to IEEE Xplore: 01 December 2014
Electronic ISBN: 978-1-4799-5548-0

Conference Location: Madrid, Spain