Parallel versions of a representative N-body application that uses L. Greengard and V. Rokhlin's (1987) adaptive fast multipole method (FMM) are presented. While parallel implementations of the uniform FMM are straightforward and have been developed on a variety of architectures, the adaptive version makes effective parallel performance harder to obtain, owing to the nonuniform and dynamically changing nature of the problem domains to which it is applied. The authors propose and evaluate two techniques for providing load balancing and data locality, both of which exploit key insights into the method and its typical applications. Using the better of these techniques, they demonstrate 45-fold speedups on galactic simulations on a 48-processor Stanford DASH machine, a state-of-the-art shared-address-space multiprocessor, even for relatively small problems. They also show good speedups on a two-ring Kendall Square Research KSR-1. Finally, they summarize some key architectural implications of this important computational method.
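The abstract does not describe the two techniques themselves, but the central difficulty it identifies, namely that work is concentrated unevenly across an adaptive tree, is commonly addressed by cost-based partitioning: cells are assigned to processors in tree-traversal order so that each processor receives roughly equal total work while keeping spatially nearby cells together. The sketch below illustrates that general idea only; the cell names, costs, and `partition_by_cost` function are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the paper's algorithm): cost-based partitioning
# of adaptive-tree cells among processors. In a real adaptive FMM, each
# cell's cost would be measured or estimated from the previous timestep.

def partition_by_cost(cells, num_procs):
    """Assign (name, cost) cells, given in tree-traversal order, to
    processors so each processor gets a contiguous range of roughly
    equal total cost. Contiguity in traversal order keeps spatially
    nearby cells on the same processor, helping data locality."""
    total = sum(cost for _, cost in cells)
    target = total / num_procs
    assignment = [[] for _ in range(num_procs)]
    acc = 0.0   # cost assigned so far
    proc = 0    # current processor index
    for name, cost in cells:
        # Move to the next processor once its share of the cost is filled.
        if acc >= target * (proc + 1) and proc < num_procs - 1:
            proc += 1
        assignment[proc].append(name)
        acc += cost
    return assignment

# Hypothetical nonuniform costs, mimicking a clustered particle distribution.
cells = [(f"cell{i}", c) for i, c in enumerate([1, 1, 8, 9, 7, 1, 1, 2, 9, 1])]
zones = partition_by_cost(cells, 3)
```

Because the problem domain changes as the simulation evolves, such a partition would be recomputed periodically from fresh cost estimates rather than fixed once at startup.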