I. Introduction
State-of-the-art page tables use the radix-tree organization [5], [42]. On a TLB miss, address translation proceeds by walking a tree of pages that progressively direct the search to the leaf that contains the desired translation. This approach uses memory efficiently and has been highly optimized with multiple caching structures over decades. However, it is hardly scalable. The reason is that, to obtain the correct translation, the system potentially needs to perform up to four memory accesses in sequence. Each access uses as its address the value returned by the previous access. This process can be slow and does not leverage the memory-level parallelism afforded by modern processors. Furthermore, this process is getting slower, as another level is being added to the translation tree to cover the larger memory needs of emerging applications [40], [41].