The increasing computational and communication demands of the scientific and industrial communities require a clear understanding of the performance trade-offs involved in multi-core computing platforms. Such analysis can help application and toolkit developers design better, topology-aware communication primitives suited to the needs of various high-end computing applications. In this paper, we take on the challenge of designing and implementing a portable intra-core communication framework for streaming computing and evaluate its performance on some popular multi-core architectures developed by Intel, AMD and Sun. Our experimental results, obtained on the Intel Nehalem, AMD Opteron and Sun Niagara 2 platforms, show that we are able to achieve an intra-socket small-message latency between 120 and 271 nanoseconds, depending on the hardware platform, while the inter-socket small-message latency is between 218 and 320 nanoseconds. The maximum intra-socket communication bandwidth ranges from 0.179 GBytes/s (Sun Niagara 2) to 6.5 GBytes/s (Intel Nehalem). We were also able to obtain an inter-socket communication bandwidth of 1.2 and 6.6 GBytes/s for AMD Opteron and Intel Nehalem, respectively.