PGAS languages' support for a global name space facilitates the expression of parallel algorithms, since communication is implicit. This is especially convenient when writing irregular applications with data-dependent, dynamically changing communication patterns. However, programming in a shared-memory style, with no explicit control of communication, may result in poor performance. The problem may be due to weaknesses of current implementations of PGAS languages or to limitations inherent in these languages. To clarify which is the case, we discuss an implementation in UPC of the Barnes-Hut algorithm. A literal port of a good-quality shared-memory implementation (merely replacing shared arrays with partitioned global arrays) achieves abysmal performance, more than 1000 times worse than that of a message-passing implementation. With a series of optimizations, we achieve performance in UPC comparable to that of message passing. Most of these optimizations could be performed with limited changes to the source code, using an enhanced run-time and a few language extensions or pragmas. We discuss the implications for the programmer, the compiler, and PGAS languages themselves.