The main contributors to message delivery latency in message-passing environments are the copying operations needed to transfer a received message and bind it to the consuming process or thread. A significant portion of the software communication overhead is attributed to message copying. Recently, factors such as poor performance/power efficiency and limited design scalability in monolithic designs have been pushing high-performance processor architectures toward designs that feature multiple processing cores on a single chip, known as chip multiprocessors (CMPs). In this work we study and quantify the latency of different techniques for receiving and binding arrived data in message-passing applications running on the Cell processor, an asymmetric chip multiprocessor.