Enhancing the performance of tiled loop execution onto clusters using memory mapped network interfaces and pipelined schedules | IEEE Conference Publication