Accelerators such as graphics processing units (GPUs) provide an inexpensive way of improving the performance of cluster systems. In such an arrangement, the individual nodes of the cluster are directly connected to one or more accelerator devices via PCI Express. This results in a static mapping of accelerators onto compute nodes, where each accelerator can only be accessed from exactly one compute node. While this static mapping enables efficient data transfers between a given accelerator and the compute node it belongs to, differing computational demands across jobs may produce either underutilized accelerators or nodes whose computational demands cannot be satisfied with the number of accelerators available to them. In particular, a small number of GPUs per node may force explicit MPI parallelism across compute nodes where it is not otherwise necessary. To address this limitation, we propose a novel accelerator-cluster architecture in which network-attached accelerators are dynamically assigned to compute nodes. This enables not only optimal accelerator utilization but also a more precise match between application requirements and accelerator hardware. We outline the general concept of our dynamic architecture and show that it can offer substantial benefits to certain classes of applications without significantly harming the performance of others.
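To make the contrast concrete, the following minimal Python sketch models the core idea: instead of a fixed per-node mapping, accelerators sit in a cluster-wide pool and are granted to whichever node requests them. All names (`Accelerator`, `DynamicPool`, node identifiers) are illustrative assumptions, not part of the proposed system.

```python
# Hypothetical sketch of dynamic accelerator assignment, contrasting with a
# static per-node mapping. Names and structure are illustrative only.

class Accelerator:
    def __init__(self, acc_id):
        self.acc_id = acc_id
        self.assigned_to = None  # compute node currently using this device

class DynamicPool:
    """Cluster-wide pool: any free accelerator can serve any node."""
    def __init__(self, accelerators):
        self.free = list(accelerators)

    def acquire(self, node, count):
        # Grant the request only if enough devices are free anywhere in the
        # cluster -- not just among those attached to the requesting node.
        if len(self.free) < count:
            return []
        granted = [self.free.pop() for _ in range(count)]
        for acc in granted:
            acc.assigned_to = node
        return granted

    def release(self, granted):
        for acc in granted:
            acc.assigned_to = None
        self.free.extend(granted)

pool = DynamicPool([Accelerator(i) for i in range(4)])
# With a dynamic pool, a single node can claim all four accelerators for a
# GPU-heavy job, avoiding forced MPI parallelism across nodes; under a
# static 1-GPU-per-node mapping this request could not be satisfied.
job = pool.acquire("node0", 4)
print(len(job))  # number of accelerators granted to node0
pool.release(job)
```

A real implementation would of course involve a network transport and a cluster scheduler; the sketch only illustrates the assignment semantics the abstract describes.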