Main memory latency has a strong impact on the overall execution time of applications. Efficiently scheduling the costly DRAM memory resources spread across the different motherboards is a major concern in cluster computers. Most of these systems implement remote access capabilities that allow the OS to access remote memory. In this context, efficient scheduling becomes even more critical, since remote memory access latencies may be several orders of magnitude higher than local ones. These systems typically support interleaved memory at cache-block granularity. In contrast, this paper explores the impact on system performance of allocating memory at OS page granularity. Experimental results show that simply supporting interleaved memory at OS page granularity is a feasible solution that does not harm the performance of most of the benchmarks. Based on this observation, we investigated the causes of the performance drops in those benchmarks showing unacceptable performance at page granularity. The results of this analysis led us to propose two memory allocation policies, namely on-demand (OD) and Most-accessed in-local (Mail). The OD policy first places the requested pages in local memory; once this memory region is full, subsequent pages are placed in remote memory. This policy performs well when the most accessed pages are requested, and thus allocated, before the least accessed ones, which, as shown in this work, is the most common case. This simple policy achieves performance improvements of up to 25% in some benchmarks with respect to a typical block-interleaved memory system. Nevertheless, this strategy performs poorly when a noticeable fraction of the least accessed pages are requested before the most accessed ones. The Mail allocation policy solves this drawback by using profile information to guide the allocation of new pages.
This scheme always outperforms the baseline block interleaving policy and, in some cases, improves the performance of the OD policy by 25%.
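The on-demand behavior described above (fill local memory first, then spill to remote) can be sketched as follows. This is a minimal illustration under stated assumptions; the function name `od_allocate`, the capacities, and the page-request trace are hypothetical and not taken from the paper:

```python
# Sketch of the on-demand (OD) allocation policy: requested pages fill
# local memory first; once local memory is full, subsequent pages are
# placed in remote memory. Names and sizes are illustrative assumptions.

def od_allocate(page_requests, local_capacity):
    """Map each distinct requested page id to 'local' or 'remote'."""
    placement = {}
    local_used = 0
    for page in page_requests:
        if page in placement:          # page already allocated; reuse mapping
            continue
        if local_used < local_capacity:
            placement[page] = "local"  # local memory still has free frames
            local_used += 1
        else:
            placement[page] = "remote" # local memory full: spill to remote
    return placement

# With a local capacity of 2 pages, the first two distinct pages land in
# local memory and the rest in remote memory:
print(od_allocate([10, 11, 10, 12, 13], local_capacity=2))
# → {10: 'local', 11: 'local', 12: 'remote', 13: 'remote'}
```

The sketch also makes the policy's weakness visible: if the least accessed pages happen to be requested first, they occupy the fast local frames, which is the situation the Mail policy addresses by using profile information instead of request order.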