Distributed Shared-Memory (DSM) multiprocessors provide an attractive combination of cost-effective commodity architecture and, thanks to the shared-memory abstraction, relative ease of programming. Unfortunately, it is well known that tuning applications for scalable performance on these machines is time-consuming. To address this problem, programmers use performance monitoring tools. However, these tools are often costly to run, especially if highly processed information is desired. In addition, they usually cannot be used to experiment with hypothetical architecture organizations. In this paper, we present Scal-Tool, a tool that isolates and quantifies scalability bottlenecks in parallel applications running on DSM machines. The scalability bottlenecks currently quantified include insufficient caching space, load imbalance, and synchronization. The tool is based on an empirical model that takes as inputs measurements from the processor's hardware event counters. A major advantage of the tool is that it is quite inexpensive to run: it needs only the event counter values for the application running with a few different processor counts and data set sizes. In addition, it provides ways to analyze the effect of varying several machine parameters.