Skip to Main Content
In this paper we present thread flux (TFlux), a complete system that supports the data-driven multithreading (DDM) model of execution. TFlux virtualizes any details of the underlying system therefore offering the same programming model independently of the architecture. To achieve this goal, TFlux has a runtime support that is built on top of a commodity operating system. Scheduling of threads is performed by the thread synchronization unit (TSU), which can be implemented either as a hardware or a software module. In addition, TFlux includes a preprocessor that, along with a set of simple compiler directives, allows the user to easily develop DDM programs. The preprocessor then automatically produces the TFlux code, which can be compiled using any commodity C compiler, therefore automatically producing code to any ISA. TFlux has been validated on three platforms. A Simics-based multicore system with a TSU hardware module (TFluxHard), a commodity 8-core Intel Core2 QuadCore-based system with a software TSU module (TFluxSoft), and a Cell/BE system with a software TSU module (TFluxCell). The experimental results show that the performance achieved is close to linear speedup, on average 21x for the 27 nodes TFluxHard, and 4.4x on a 6 nodes TFluxSoft and TFluxCell. Most importantly, the observed speedup is stable across the different platforms thus allowing the benefits of DDM to be exploited on different commodity systems.