Modeling and visualizing the performance of task-based parallel algorithms is challenging. Libraries such as Intel Threading Building Blocks (TBB) and Microsoft's Parallel Patterns Library provide high-level algorithms that are implemented using low-level tasks, yet current tools present performance at this lower level. Developers prefer to tune and debug at the same level of abstraction at which they code, so in this paper we propose tools and a two-step methodology that target that level. In the first step, the system-level metrics of utilization and overhead are collected to determine whether performance is acceptable. If a problem is suspected, the second step projects these metrics onto the algorithms contained in the application. Using these projections, many common performance issues can be quickly diagnosed. We demonstrate our methodology using a prototype implementation that is integrated with the Intel Threading Building Blocks library. We show the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.
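As a rough illustration of the two-step idea, the following C++ sketch aggregates hypothetical per-task timing samples into system-level utilization and overhead, then projects the same metrics onto the algorithm that spawned each task. The `TaskSample` record, the function names, and the definition of each metric as a fraction of total thread capacity (threads × wall time) are assumptions made for this example; they are not taken from the prototype or from the TBB API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-task record; field names are illustrative assumptions.
struct TaskSample {
    std::string algorithm;  // high-level algorithm that spawned the task
    double work_sec;        // time spent running user code
    double overhead_sec;    // time spent in scheduling and bookkeeping
};

// Both metrics are expressed as fractions of total thread capacity.
struct Metrics {
    double utilization = 0.0;
    double overhead = 0.0;
};

// Step 1 (assumed definition): system-level utilization and overhead.
Metrics system_metrics(const std::vector<TaskSample>& samples,
                       int threads, double wall_sec) {
    assert(threads > 0 && wall_sec > 0.0);
    const double capacity = threads * wall_sec;
    Metrics total;
    for (const auto& s : samples) {
        total.utilization += s.work_sec / capacity;
        total.overhead += s.overhead_sec / capacity;
    }
    return total;
}

// Step 2 (assumed definition): project the same metrics onto each algorithm,
// so the per-algorithm values sum to the system-level values.
std::map<std::string, Metrics> project_by_algorithm(
        const std::vector<TaskSample>& samples, int threads, double wall_sec) {
    assert(threads > 0 && wall_sec > 0.0);
    const double capacity = threads * wall_sec;
    std::map<std::string, Metrics> per_algorithm;
    for (const auto& s : samples) {
        Metrics& m = per_algorithm[s.algorithm];
        m.utilization += s.work_sec / capacity;
        m.overhead += s.overhead_sec / capacity;
    }
    return per_algorithm;
}
```

With four threads, a 2-second run, and samples of (3.0 s work, 0.5 s overhead) and (2.0 s, 0.5 s) for `parallel_for` plus (1.0 s, 1.0 s) for `pipeline`, the sketch reports 0.75 utilization and 0.25 overhead overall, of which `parallel_for` accounts for 0.625 and 0.125 respectively.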