This paper presents several challenges and solutions in designing an efficient Message Passing Interface (MPI) implementation for embedded FPGA applications. Popular MPI implementations are designed for general-purpose computers, which have significantly different properties and trade-offs from embedded platforms. Our work focuses on two types of interactions that are not present in typical MPI implementations. First, a number of improvements designed to accelerate software-hardware interactions are introduced, including a Direct Memory Access (DMA) engine with MPI functionality; the use of non-interrupting, non-blocking messages; and a proposed function, called MPI_Coalesce, to reduce the function call overhead from a series of sequential messages. These improvements resulted in a 5-fold speed-up compared to an embedded software-only MPI implementation. Next, a novel dataflow message passing model is presented for hardware-hardware interactions to overcome the limitations of atomic messages, allowing hardware engines to communicate and compute simultaneously. This dataflow model provides a natural method for hardware designers to build high-performance MPI systems. Finally, two hardware cores, Tee cores and message watchdog timers, are introduced to provide a transparent method of debugging hardware MPI designs.