Skip to Main Content
Fine tuning the performance of large parallel programs is a very difficult task. A profiling tool can provide detailed insight into the utilization and communication of the different processors, which helps identify performance bottlenecks. In this paper we present a profiler for the fine-grained parallel programming language Split-C, which provides a simple global address space memory model. As our experience shows, it is much more challenging to profile programs that make use of efficient, low-overhead communication. We incorporated techniques which minimize profiling effects on the running program. We quantify the profiling overhead and present several Split-C applications which show that the profiler is useful in determining performance bottlenecks.