Skip to Main Content
In phylogenetic inference, which aims at finding a phylogenetic tree that best explains the evolutionary relationship among a given set of species, statistical estimation approaches such as maximum likelihood (ML) and Bayesian inference provide more accurate estimates than other nonstatistical approaches. However, the improved quality comes at a higher computational cost, as these approaches, even though heuristic driven, involve optimization over multidimensional real continuous space. The number of possible search trees in ML is at least exponential, thereby making runtimes on even modest-sized datasets to clock up to several million CPU hours. Evaluation of these trees, involving node-level likelihood vector computation and branch-length optimization, can be partitioned into tasks (or kernels), providing the application with the potential to benefit from hardware acceleration. The range of hardware acceleration architectures tried so far offer limited degree of fine-grain parallelism. Network-on-chip (NoC) is an emerging paradigm that can efficiently support integration of massive number of cores on a chip. In this paper, we explore the design and performance evaluation of 2-D and 3-D NoC architectures for RAxML, which is one of the most widely used ML software suites. Specifically, we implement the computation kernels of the top three functions consuming more than 85% of the total software runtime. Simulations show that through appropriate choice of NoC architecture, and novel core design, allocation and placement strategies, our NoC-based implementation can achieve individual function-level speedups of 390x to 847x, speed up the targeted kernels in excess of 6500x, and provide end-to-end runtime reductions up to 5x over state-of-the-art multithreaded software.