Introduction
During the past three decades, data-oriented attacks have evolved from a theoretical exercise [1] to a serious threat [2]–[7]. During the same time, we have witnessed a plethora of effective security mechanisms that prompted attackers to investigate new directions and exploit less explored corners of victim systems. Specifically, recent advances in Control Flow Integrity (CFI) [8]–[12], Code Pointer Integrity (CPI) [13], [14], and code diversification [15]–[17] have significantly raised the bar for code-reuse attacks. In fact, CFI schemes have been adopted by Microsoft [18], Google [19], and LLVM [20].
Code-reuse attacks chain short code sequences, dubbed gadgets, to hijack an application's control flow. It suffices to overwrite a single control-flow structure, such as a function pointer or a return address, with the start of a crafted gadget chain to cause an application to perform arbitrary computation. In contrast, data-oriented attacks completely avoid changes to the control flow. Instead, these attacks aim to modify non-control data to cause the application to obey the attacker's intentions [5]–[7]. Typically, an attacker leverages memory corruption vulnerabilities that enable arbitrary read or write primitives to take control over the application's data. Stitching together a chain of data-oriented gadgets, which operate only on data, allows an attacker to either disclose sensitive information or escalate privileges, without violating an application's control flow. In this way, data-oriented attacks remain under the radar, despite code-reuse mitigations, and can have disastrous consequences [3]. We anticipate further growth in this direction in the near future, and emphasize the need for practical primitives that eliminate such threats.
Researchers have suggested different strategies to counter data-oriented attacks. Data Flow Integrity (DFI) [21] schemes dynamically track a program’s data flow. Similarly, by introducing memory safety to the C and C++ programming languages, it becomes possible to completely eliminate memory corruption vulnerabilities [22]–[25]. While both directions have the potential to thwart data-oriented attacks, they lack practicality due to high performance overhead, or suffer from compatibility issues with legacy code. Instead of enforcing data flow integrity, researchers have started exploring isolation techniques that govern access to sensitive code and data regions [26]–[28]. Still, most approaches are limited to user space, focus on merely protecting a single data structure, or rely on policies enforced by a hypervisor.
In this paper, we leverage virtualization extensions of Intel CPUs to establish selective memory protection (xMP) primitives capable of thwarting data-oriented attacks. Instead of enhancing a hypervisor with the knowledge required to enforce memory isolation, we take advantage of Intel's Extended Page Table pointer (EPTP) switching capability to manage different views on guest-physical memory, from inside a VM, without any interaction with the hypervisor. For this, we extended Xen's altp2m subsystem and the Linux kernel so that the guest itself can define, manage, and switch between memory views with different access permissions.
We use xMP to protect two sensitive kernel data structures that are vital for the system’s security, yet are often disregarded by defense mechanisms: page tables and process credentials. In addition, we demonstrate the generality of xMP by guarding sensitive data in common, security-critical (user-space) libraries and applications. Lastly, in all cases, we evaluate the performance and effectiveness of our xMP primitives.
In summary, we make the following main contributions:
We extend the Linux kernel to realize xMP, an in-guest memory isolation primitive for protecting sensitive data against data-oriented attacks in user and kernel space.
We present methods for combining Intel's EPTP switching and Xen altp2m to control different guest-physical memory views, and to isolate data in disjoint xMP domains.
We apply xMP to guard the kernel's page tables and process credentials, as well as sensitive data in user-space applications, with minimal performance overhead.
We integrate xMP into the Linux namespaces framework, forming the basis for hypervisor-assisted OS-level virtualization protection against data-oriented attacks.
Background
A. Memory Protection Keys
Intel's Memory Protection Keys (MPK) technology supplements the general paging mechanism by further restricting memory permissions. In particular, each paging structure entry dedicates four bits that associate virtual memory pages with one of 16 protection domains, which correspond to sets of pages whose access permissions are controlled by the same protection key (pkey). The access rights of each domain are determined by the thread-local PKRU register, which holds an access-disable and a write-disable bit for every key.
A benefit of MPK is that it allows user threads to independently and efficiently harden the permissions of large memory regions. For instance, threads can revoke write access from entire domains without entering kernel space, walking and adjusting page tables, and invalidating TLBs; instead, threads can just set the write-disable bit of the corresponding key in the PKRU register (e.g., via the WRPKRU instruction).
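For reference, the MPK workflow looks roughly as follows on Linux (the pkey_* wrappers shown here are provided by glibc 2.27 and later); error handling is omitted in this sketch.

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

static void mpk_example(const char *secret, size_t len)
{
        /* Allocate a protection key and tag a freshly mapped page with it. */
        int pkey = pkey_alloc(0, 0);
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        pkey_mprotect(buf, 4096, PROT_READ | PROT_WRITE, pkey);

        memcpy(buf, secret, len);            /* write while still accessible */

        /* Revoke write access: only the thread-local PKRU register changes;
         * no page-table walk and no TLB invalidation are needed. */
        pkey_set(pkey, PKEY_DISABLE_WRITE);

        /* Revoke data access entirely. */
        pkey_set(pkey, PKEY_DISABLE_ACCESS);
}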
Although Intel announced MPK in 2015 [34], it became available only recently, and so far only in the Skylake-SP Xeon family, which is dedicated to high-end servers. Hence, a need for similar isolation features remains on desktop, mobile, and legacy server CPUs. Another issue is that attackers with the ability to arbitrarily corrupt kernel memory can (i) modify the per-thread state (in kernel space) holding the access permissions of protection domains, or (ii) alter protection domain bits in page table entries. This allows adversaries to deactivate restrictions that otherwise are enforced by the MMU. Lastly, the isolation capabilities of MPK are geared towards user-space pages. Sensitive data in kernel space thus remains prone to unauthorized access. In fact, there is no equivalent mechanism for protecting kernel memory from adversaries armed with arbitrary read and write primitives. Consequently, there is a need for alternative memory protection primitives, the creation of which is the main focus of this work.
B. The Xen altp2m Subsystem
Virtual Machine Monitors (VMMs) leverage Second Level Address Translation (SLAT) to isolate physical memory that is reserved for VMs [35]. In addition to in-guest page tables that translate guest-virtual to guest-physical addresses, the supplementary SLAT tables translate guest-physical to host-physical memory. Unauthorized accesses to guest-physical memory, which is either not mapped or lacks privileges in the SLAT table, trap into the VMM [36], [37]. As the VMM exclusively maintains the SLAT tables, it can fully control a VM's view on its physical memory [29], [30], [38], [39]. Xen's physical-to-machine subsystem (p2m) manages these SLAT tables, and thus maintains a single, global view of the guest's physical memory that is shared by all of the guest's virtual CPUs.
Unfortunately, protecting data through a single global view (i) incurs a significant overhead and (ii) is prone to race conditions in multi-vCPU environments. Consider a scenario in which a guest advises the VMM to read-protect sensitive data on a specific page. By revoking read permissions in the SLAT tables, illegal read accesses to the protected page, e.g., due to malicious memory disclosure attempts, would violate the permissions and trap into the VMM. At the same time, for legal guest accesses to the protected page frame, the VMM has to temporarily relax its permissions. Whenever the guest needs to access the sensitive information, it has to instruct the VMM to walk the SLAT tables—an expensive operation. More importantly, temporarily relaxing permissions in the global view creates a window of opportunity for other vCPUs to freely access the sensitive data without notifying the VMM.
The Xen alternate p2m subsystem (altp2m) addresses this limitation: it maintains multiple p2m tables, i.e., multiple views of the guest's physical memory with potentially different access permissions, and allows the active view to be switched on a per-vCPU basis.
C. In-Guest EPT Management
Xen altp2m can be managed not only by the VMM but also from within the guest itself. Two Intel virtualization extensions make this possible without VM exits: the VMFUNC instruction, which lets guest software directly switch the active EPT (and hence the active altp2m view), and Virtualization Exceptions (#VE), which deliver EPT access violations to an in-guest exception handler instead of trapping into the VMM.
To pick up the above scenario, the guest can instruct the system to isolate and relax permissions to selected memory regions, on-demand, using Xen's altp2m views, and can switch between these views via VMFUNC without costly round trips to the VMM.
Threat Model
We expect the system to be protected from code injection [43] through Data Execution Prevention (DEP) or other proper W^X policy enforcement, and to employ Address Space Layout Randomization (ASLR) both in kernel [44], [45] and user space [15], [46]. Also, we assume that the kernel is protected against return-to-user (ret2usr) [47] attacks through SMEP/SMAP [35], [48], [49]. Other hardening features, such as Kernel Page Table Isolation (KPTI) [50], [51], stack smashing protection [52], and toolchain-based hardening [53], are orthogonal to xMP—we neither require nor preclude the use of such features. Moreover, we anticipate protection against state-of-the-art code-reuse attacks [4], [54]–[56] via either (i) fine-grained CFI [57] (in kernel [58] and user space [59]) coupled with a shadow stack [60], or (ii) fine-grained code diversification [61], [62], and with execute-only memory (available to both kernel [31] and user space [16]).
Assuming the above state-of-the-art protections prevent an attacker from gaining arbitrary code execution, we focus on defending against attacks that leak or modify sensitive data in user or kernel memory [16], [31], by transforming memory corruption vulnerabilities into arbitrary read and write primitives. Attackers can leverage such primitives to mount data-oriented attacks [7], [63] that (i) disclose sensitive data, such as cryptographic material, or (ii) modify sensitive data structures, such as page tables or process credentials.
Design
To fulfil the need for a practical mechanism for the protection of sensitive data, we identify the following requirements:
❶ Partitioning of sensitive kernel and user-space memory regions into individual domains.
❷ Isolation of memory domains through fine-grained access control capabilities.
❸ Context-bound integrity of pointers to memory domains.
xMP uses different Xen altp2m views to partition and isolate sensitive memory regions (❶, ❷), and context-bound authentication codes to protect the integrity of pointers into those regions (❸).
Although the x86 architecture allows for memory partitioning through segmentation or paging ❶, it lacks fine-grained access control capabilities for effective memory isolation ❷ (e.g., there is no notion of non-readable pages; only non-present pages cannot be read). While previous work isolates user-space memory by leveraging unused, higher-privileged x86 protection rings [64], isolation of kernel memory is primarily achieved by Software-Fault Isolation (SFI) solutions [31]. Even though the page fault handler could be extended to interpret selected non-present pages as non-readable, switching permissions of memory regions that are shared among threads or processes on different CPUs can introduce race conditions: granting access to isolated domains by relaxing permissions inside the global page tables may reveal sensitive memory contents to the remaining CPUs. Besides, each permission switch would require walking the page tables, and thus frequent switching between a large number of protected pages would incur a high run-time overhead. Lastly, the modern x86 architecture lacks any support for immutable pointers. Although ARMv8.3 introduced the Pointer Authentication Code (PAC) [65] extension, there is no similar feature on x86. As such, x86 does not meet requirements ❷ and ❸.
In this work, we fill this gap by introducing selective memory protection (xMP) primitives that leverage virtualization to define efficient memory isolation domains—called xMP domains—in both kernel and user space, enforce fine-grained memory permissions on selected xMP domains, and protect the integrity of pointers to those domains (Figure 1). In the following, we introduce our xMP primitives and show how they can be used to build practical and effective defenses against data-oriented attacks in both user and kernel space. We base xMP on top of x86 and Xen [40], as it relies on virtualization extensions that are exclusive to the Intel architecture and are already used by Xen. Still, xMP is by no means limited to Xen, as we further discuss in § VIII-D. Furthermore, xMP is both backwards compatible with, and transparent to, non-protected and legacy applications.
A. Memory Partitioning through xMP Domains
To achieve meaningful protection, applications may require multiple disjoint memory domains that must not be accessible at the same time. For instance, an xMP domain that holds the kernel's hardware encryption key must not be accessible upon entering an xMP domain containing the private key of a user-space application. The same applies to multi-threaded applications in which each thread maintains its own session key that must not be accessible by other threads. We employ Xen altp2m to maintain multiple views of the guest's physical memory, each with its own access permissions, and use these views to realize disjoint xMP domains.
The straightforward way of associating an altp2m view with an xMP domain is to grant that view full access to the domain's page frames, while restricting access to those frames in every other view.
To accommodate n xMP domains, we define n+1 altp2m views: a restricted view, in which the page frames of all xMP domains have their permissions restricted, and one view per domain that relaxes the permissions of exactly that domain's page frames.
An alternative approach to using
The system configures n + 1
B. Isolation of xMP Domains
We establish a memory isolation primitive that empowers guests to enforce fine-grained permissions on the guest's page frames. To achieve this, we extended the Xen interface to allow utilizing altp2m's memory access control facilities from inside the guest, so that a guest can restrict the permissions of its own page frames in selected views.
Consider an in-guest application that handles sensitive data, such as passwords, cookies, or cryptographic keys. To protect this data, the application can use the memory partitioning primitives that leverage Xen altp2m (§ IV-A) to place the pages holding the sensitive data into a dedicated xMP domain, and then revoke read and write access to these pages in every other view.
This scheme combines the best of both worlds: flexibility in defining policies, and fine-grained permissions that are not available to the traditional x86 MMU. Our primitives allow in-guest applications to revoke read and write permissions on data pages, without making them non-present, and to configure code pages as execute-only, hence satisfying requirement ❷.
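As an illustration of this primitive, the following sketch shows how a guest could restrict a sensitive page frame in all views except the one backing its xMP domain. The wrapper names (xmp_set_mem_access, xmp_nr_views) and the access constants are hypothetical stand-ins for our extended altp2m interface, not its actual API.

/* Hypothetical in-guest wrappers around the extended altp2m interface. */
enum xmp_access { XMP_ACCESS_NONE, XMP_ACCESS_R, XMP_ACCESS_RW, XMP_ACCESS_X };

static void xmp_isolate_gfn(unsigned long gfn, unsigned int domain_view)
{
        unsigned int view;

        for (view = 0; view < xmp_nr_views(); view++) {
                if (view == domain_view)
                        /* Full access only inside the owning domain's view. */
                        xmp_set_mem_access(view, gfn, XMP_ACCESS_RW);
                else
                        /* Neither readable nor writable anywhere else. */
                        xmp_set_mem_access(view, gfn, XMP_ACCESS_NONE);
        }
}

The same interface can mark code pages as execute-only (XMP_ACCESS_X), which the traditional x86 MMU cannot express.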
C. Context-bound Pointer Integrity
For complete protection, we have to ensure the integrity of pointers to sensitive data within xMP domains. Otherwise, by exploiting a memory corruption vulnerability, adversaries could redirect pointers to (i) injected, attacker-controlled objects outside the protected domain, or (ii) existing, high-privileged objects inside the xMP domain.
As x86 lacks support for pointer integrity (in contrast to ARM, in which PAC [65], [66] was recently introduced), we protect pointers to objects in xMP domains in software. We leverage the Linux kernel implementation of SipHash [67] to compute Keyed-Hash Message Authentication Codes (HMACs), which we use to authenticate selected pointers. SipHash is a cryptographically strong family of pseudorandom functions. Contrary to other secure hash functions (including the SHA family), SipHash is optimized for short inputs, such as pointers, and thus achieves higher performance. To reduce the probability of collisions, SipHash uses a 128-bit secret key. The security of SipHash is limited by its key and output size. Yet, with pointer integrity, the attacker has only one chance to guess the correct value; otherwise, the application will crash and the key will be re-generated.
To ensure that pointers cannot be illegally redirected to existing objects, we bind pointers to a specific context that is unique and immutable. The context we use in kernel space is the address of the data structure that legitimately holds the pointer (e.g., the owning thread's task_struct), which does not change for the lifetime of the object.
Modern x86 processors use a configurable number of page table levels that define the size of virtual addresses. On a system with four levels of page tables, addresses occupy the first 48 least-significant bits. The remaining 16 bits are sign-extended with a value dependent on the privilege level: they are filled with ones in kernel space and with zeros in user space [68]. This allows us to reuse the unused, sign-extended part of virtual addresses and to truncate the resulting HMAC to 15 bits. At the same time, we can use the most-significant bit 63 of a canonical address to determine its affiliation—if bit 63 is set, the pointer references kernel memory. This allows us to establish pointer integrity and ensure that pointers can be used only in the right context ❸.
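The following is a minimal sketch of such pointer-signing helpers, assuming 4-level paging (48-bit virtual addresses) and the kernel's siphash API; the helper names (xmp_sign_ptr, xmp_auth_ptr) are illustrative rather than the exact implementation.

#include <linux/siphash.h>
#include <linux/types.h>
#include <linux/bug.h>

#define XMP_HMAC_SHIFT  48
#define XMP_HMAC_BITS   15
#define XMP_HMAC_MASK   (((1UL << XMP_HMAC_BITS) - 1) << XMP_HMAC_SHIFT)

static siphash_key_t xmp_key;   /* per-domain key, kept on a read-only page */

static inline u64 xmp_hmac(u64 ptr, u64 ctx)
{
        /* Hash the pointer (without its sign-extension bits) together with
         * its context, and truncate the result to 15 bits. */
        return siphash_2u64(ptr & ~XMP_HMAC_MASK, ctx, &xmp_key) &
               ((1UL << XMP_HMAC_BITS) - 1);
}

static inline void *xmp_sign_ptr(void *p, u64 ctx)
{
        u64 v = (u64)p;

        /* Replace bits 48-62 (the sign extension) with the truncated HMAC;
         * bit 63 keeps indicating kernel vs. user space. */
        return (void *)((v & ~XMP_HMAC_MASK) |
                        (xmp_hmac(v, ctx) << XMP_HMAC_SHIFT));
}

static inline void *xmp_auth_ptr(void *p, u64 ctx)
{
        u64 v = (u64)p;
        /* Restore the canonical sign extension based on bit 63. */
        u64 plain = (v & ~XMP_HMAC_MASK) | ((v >> 63) ? XMP_HMAC_MASK : 0);

        /* A redirected or re-bound pointer fails this check. */
        BUG_ON(((v & XMP_HMAC_MASK) >> XMP_HMAC_SHIFT) != xmp_hmac(v, ctx));
        return (void *)plain;
}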
Contrary to ARM PAC, instead of storing keys in registers, we maintain one SipHash key per xMP domain in memory. After generating a key for a given xMP domain, we grant the page holding the key read-only access permissions inside the particular domain (all other domains cannot access this page). In addition, we configure Xen
Implementation
We extended the Linux memory management system to establish memory isolation capabilities that allow us to partition ❶ selected memory regions into isolated ❷ xMP domains. During the system boot process, once the kernel has parsed the firmware-provided memory map, it sets up its physical memory management, which comprises (i) the buddy allocator, (ii) the slab allocator, and (iii) vmalloc.
Note that (i) is responsible for managing (contiguous) page frames, (ii) manages memory at sub-page granularity, and (iii) supports only page-multiple allocations. To provide maximum flexibility, we extend both (i) and (ii) to selectively shift allocated pages into dedicated xMP domains (Figure 3); (iii) is transparently supported by handling (i). This essentially allows us to isolate either arbitrary pages or entire slab caches. By additionally generating context-bound authentication codes for pointers referencing objects residing in the isolated memory, we meet all requirements ❶-❸.
A. Buddy Allocator
The Linux memory allocators use get-free-page (GFP) flags to specify how page frames are to be allocated by the buddy allocator (e.g., GFP_KERNEL for regular kernel allocations).
Extensions to the slab and buddy allocator facilitate shifting allocated pages and slabs into xMP domains enforced by Xen altp2m.
During the assignment of allocated pages to xMP domains, we record the index of the target domain in the corresponding page structure, so that the pages can be returned to the default, unrestricted state when they are freed.
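As a rough illustration of this extension (xmp_protect_page() and the domain argument are hypothetical names, not the actual interface), an allocation that is shifted into an xMP domain could look as follows:

#include <linux/gfp.h>
#include <linux/mm.h>

static struct page *xmp_alloc_protected_page(gfp_t gfp, unsigned int domain)
{
        struct page *pg = alloc_pages(gfp | __GFP_ZERO, 0);

        if (pg)
                /* Revoke read/write access to this frame in every altp2m
                 * view except the one backing `domain', and remember the
                 * domain so the frame can be released back to the default
                 * view on free. */
                xmp_protect_page(pg, domain);
        return pg;
}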
B. Slab Allocator
The slab allocator builds on top of the buddy allocator to subdivide allocated pages into small, sub-page sized objects (Figure 3), to reduce internal fragmentation that would otherwise be introduced by the buddy allocator. More precisely, the slab allocator maintains slab caches that are dedicated to frequently used kernel objects of the same size [70]. For instance, the kernel uses a dedicated cache for all process credentials (the cred_jar cache, § VI-B).
Every slab cache groups collections of continuous pages into so-called slabs, which are sliced into small-sized objects. Disregarding further slab architecture details, as the allocator manages slabs in dedicated pages, this design allows us to place selected slabs into isolated xMP domains using the underlying buddy allocator. To achieve this, we extend the slab implementation so that we can provide the slab cache with the index of the xMP domain into which its slabs, and hence all objects allocated from the cache, should be placed.
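A simplified sketch of this idea, using the credentials cache as an example; kmem_cache_set_xmp() and XMP_DOMAIN_CRED are hypothetical names for our extension, and the cache creation only loosely mirrors the kernel's actual cred_jar setup.

#include <linux/cred.h>
#include <linux/init.h>
#include <linux/slab.h>

static struct kmem_cache *cred_jar;

static void __init cred_cache_init(void)
{
        cred_jar = kmem_cache_create("cred_jar", sizeof(struct cred), 0,
                                     SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);

        /* Instruct the extended slab allocator to back all of this cache's
         * slabs with pages in the credentials xMP domain. */
        kmem_cache_set_xmp(cred_jar, XMP_DOMAIN_CRED);
}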
C. Switches across Execution Contexts
The Linux kernel is a preemptive, highly-parallel system that must preserve the process-specific or thread-specific state on (i) context switches and (ii) interrupts. To preserve this information across context switches, and to prevent other threads from accessing isolated memory, it is essential to include the index of the thread's (open) xMP domain in its persistent state.1
1) Context Switches:
In general, operating systems associate processes or threads with a dedicated data structure, the Process Control Block (PCB): a container for the thread's state that is saved and restored upon every context switch. On Linux, the PCB is represented by the task_struct structure; we extend it with the index of the thread's active xMP domain, which is restored whenever the thread is scheduled.
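For illustration, switching the active view via VMFUNC leaf 0 (EPTP switching) and restoring a thread's domain on a context switch could be sketched as follows; the xmp_index field and the helper names are hypothetical.

#include <linux/sched.h>

/* Switch the active guest-physical memory view through VMFUNC leaf 0
 * (EPTP switching); no VM exit is involved. */
static inline void xmp_switch_view(unsigned int view)
{
        asm volatile(".byte 0x0f, 0x01, 0xd4"          /* vmfunc */
                     :: "a" (0), "c" (view) : "memory");
}

/* Hook in the context-switch path: restore the incoming thread's domain
 * (index 0 denotes the restricted default view). */
static inline void xmp_restore_view(struct task_struct *next)
{
        xmp_switch_view(next->xmp_index);
}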
2) Hardware Interrupts:
Interrupts can pause a thread's execution at arbitrary points. In our current prototype, accesses to memory belonging to any of the xMP domains are restricted in interrupt (IRQ) context. (We plan on investigating primitives for selective memory protection in IRQ contexts in the future.) To achieve this, we extend the prologue of every interrupt handler and cause it to switch to the restricted view. This way, we prevent potentially vulnerable interrupt handlers from illegally accessing protected memory. Once the kernel returns control to the interrupted thread, it will cause a memory access violation when accessing the isolated memory. Yet, instead of trapping into the VMM, the thread will trap into the in-guest #VE handler, which re-opens the thread's previously active xMP domain and resumes execution without involving the VMM.
3) Software Interrupts:
The above extensions introduce a restriction with regard to nested xMP domains. Without maintaining the state of nested domains, we require every thread to close its active domain before opening another one; by nesting xMP domains, the state of the active domain will be overwritten and lost. Although we can address this requirement for threads in process context, it becomes an issue in interrupt context: the former executes (kernel and user space) threads that are tied to different task_structs, each carrying its own xMP domain state, whereas the latter may run on behalf of whichever thread happens to be interrupted and thus has no dedicated state of its own.
In contrast to hardware interrupts that disrupt the system's execution at arbitrary locations, the kernel explicitly schedules software interrupts (softirqs) at defined points, e.g., on return from hardware interrupt handlers or in the per-CPU ksoftirqd kernel threads.
The Linux kernel configures 10 softirq vectors, one of which is dedicated to RCU callback processing; such callbacks may need to access or release objects residing in xMP domains, yet they can run in the context of an arbitrary, interrupted thread.
To approach this issue, we leverage the callback-free RCU feature of Linux (CONFIG_RCU_NOCB_CPU, enabled via the rcu_nocbs= boot parameter), which offloads RCU callback processing from softirq context to dedicated kernel threads. As these threads run in process context, they can open the required xMP domain just like any other thread.
D. User Space API
We grant user processes the ability to protect selected memory regions by extending the Linux kernel with four new system calls that allow processes to use xMP in user space (Figure 4). Specifically, applications can dynamically allocate and maintain disjoint xMP domains in which sensitive data can remain safe (❶-❷). Furthermore, we ensure that attackers cannot illegally influence a process’ active xMP domain state by binding its integrity to the thread’s context (❸).
Linux has provided an interface for Intel MPK since kernel v4.9. This interface comprises three system calls, pkey_alloc(), pkey_free(), and pkey_mprotect(), which manage protection keys and associate them with memory regions; access rights are subsequently toggled directly through the PKRU register.
User-space applications interact with the Linux kernel through an analogous set of system calls in xMP: they allocate xMP domains, assign memory regions to them, and open or close them on demand, while the kernel performs the underlying altp2m view updates on their behalf.
Contrary to the MPK implementation of Linux, we do not use the unprivileged VMFUNC instruction directly in user space; every domain switch is instead mediated by the kernel, which binds the thread's active-domain state to its context (❸).
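A hypothetical usage sketch of this API is shown below; the call names (xmp_alloc, xmp_protect, xmp_enter, xmp_exit) are placeholders for the four system calls and do not reflect their actual names or signatures.

#include <string.h>
#include <sys/mman.h>

static void protect_secret(const unsigned char *secret, size_t len)
{
        int dom = xmp_alloc();                  /* allocate a new xMP domain */
        unsigned char *key = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        xmp_protect(dom, key, 4096);            /* move the page into the
                                                   domain                    */

        xmp_enter(dom);                         /* open the domain           */
        memcpy(key, secret, len);               /* sensitive data accessible */
        xmp_exit(dom);                          /* close it; further accesses
                                                   to `key' now fault        */
}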
Use Cases
We demonstrate the effectiveness and usefulness of xMP by applying it on: (a) page tables and process credentials, in the Linux kernel; and (b) sensitive in-process data in four security-critical applications and libraries.
A. Protecting Page Tables
With Supervisor Mode Execution Protection (SMEP) [48], the kernel cannot execute code in user space; adversaries have to first inject code into kernel memory to accomplish their goal. Multiple vectors exist that allow attackers to (legitimately) inject code into the kernel. In fact, system calls use the routine copy_from_user() to copy user-controlled data into kernel memory; an attacker who can additionally tamper with page table entries can mark the pages holding such data executable.
Our goal is to leverage xMP to prevent adversaries from illegally modifying (i) page table contents and (ii) pointers to page tables. At the same time, xMP has to allow the kernel to update page table structures from authorized locations. With the exception of the initial page tables that are generated during the early kernel boot stage, the kernel uses the buddy allocator to allocate memory for new sets of page tables. Using the buddy allocator, we move every memory page holding a page table structure into a dedicated xMP domain, to which we grant read-write access permissions (§ V-A), and limit the access of remaining domains to read-only. As the kernel allocates the initial page tables statically, we manually inform Xen altp2m about the frames holding them, so that they are subject to the same restrictions.
In addition, we extend the kernel's process and thread creation functionality to protect the integrity of every process' pointer to its top-level page table (pgd) with a context-bound HMAC (§ IV-C).
Still, we cannot bind the
We highlight that even with KPTI [51], [74] (the Meltdown mitigation feature of Linux that avoids simultaneously mapping user and kernel space), it is possible to authenticate the pgd pointers of both the kernel and user page table copies before they are loaded into the CR3 register.
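For illustration, and reusing the hypothetical xmp_sign_ptr()/xmp_auth_ptr() helpers sketched in § IV-C, binding and verifying the pgd pointer could look as follows; choosing the owning mm_struct as the context, and ignoring KPTI's second pgd, are simplifying assumptions of this sketch.

#include <linux/mm_types.h>

static inline void xmp_bind_pgd(struct mm_struct *mm)
{
        mm->pgd = (pgd_t *)xmp_sign_ptr(mm->pgd, (u64)mm);
}

static inline pgd_t *xmp_verified_pgd(struct mm_struct *mm)
{
        /* Called right before the pgd is loaded into CR3; a redirected or
         * re-bound pointer fails authentication. */
        return (pgd_t *)xmp_auth_ptr(mm->pgd, (u64)mm);
}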
B. Protecting Process Credentials
Linux kernel credentials describe the properties of various objects that allow the kernel to enforce access control and capability management. This makes them an attractive target of data-oriented privilege escalation attacks.
Similarly to protecting paging structures, our goal is to prevent adversaries from (i) illegally overwriting process credentials in cred structures, and (ii) redirecting a task's credential pointers to forged or more privileged instances.
Linux prepares the slab cache cred_jar for the allocation of cred structures. Using our extended slab allocator (§ V-B), we place this cache into a dedicated xMP domain, so that credentials can only be modified after entering that domain; all other domains are limited to read-only access.
As every task_struct references its credentials through the cred and real_cred pointers, we bind these pointers to the owning task_struct with context-bound HMACs (§ IV-C) and verify them before the kernel dereferences the credentials.
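Analogously to the pgd case, a sketch of binding a task's credential pointers to the owning task_struct, again using the hypothetical § IV-C helpers (RCU annotations and locking are omitted):

#include <linux/cred.h>
#include <linux/sched.h>

static inline void xmp_bind_creds(struct task_struct *tsk)
{
        tsk->cred      = (const struct cred *)xmp_sign_ptr((void *)tsk->cred,
                                                           (u64)tsk);
        tsk->real_cred = (const struct cred *)xmp_sign_ptr((void *)tsk->real_cred,
                                                           (u64)tsk);
}

static inline const struct cred *xmp_task_cred(struct task_struct *tsk)
{
        /* Verified before the credentials are dereferenced; a pointer copied
         * from another task fails authentication. */
        return (const struct cred *)xmp_auth_ptr((void *)tsk->cred, (u64)tsk);
}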
C. Protecting Sensitive Process Data
An important factor for the deployment of security mechanisms is their applicability and generality. To highlight this property, we apply xMP to guard sensitive data in OpenSSL under Nginx, in mbed TLS, and in further security-critical applications and libraries, by isolating secrets such as private keys in dedicated in-process xMP domains.
Evaluation
A. System Setup
Our setup consists of an unprivileged domain (domU) running our xMP-enabled Linux kernel as a guest on top of our extended version of Xen.
B. Performance Evaluation
To evaluate the performance impact of xMP, we conducted two rounds of experiments, focusing on the overhead incurred by protecting sensitive data in kernel and user space. All reported results correspond to vanilla Linux vs. xMP-enabled Linux (both running as unprivileged Xen guests in the setup described above).
1) Kernel Memory Isolation:
We measured the performance impact of xMP when applied to protect the kernel’s page tables (PT) and process credentials (Cred) (§ VI-A and § VI-B). We used a set of micro (LMbench v3.0) and macro (Phoronix v8.6.0) benchmarks to stress different system components, and measured the overhead of protecting (i) each data structure individually, and (ii) both data structures at the same time (which requires two disjoint xMP domains).
Table I shows the LMbench results, focusing on latency and bandwidth overhead. This allows us to gain insight into the performance cost at the system software level. Overall, the overhead is low in most cases for both protected page tables and process credentials. When protecting page tables, we notice that the performance impact is directly related to functionality that requires explicit access to page tables, with outliers related to page faults and process creation.
To investigate the cause of the performance drop for the outliers (UNIX socket I/O,
Performance impact of xMP on Nginx with varying file sizes and number of connections (X-axis: [file size (KB)]-[# requests]).
Table II presents the results for the set of Phoronix macro-benchmarks used by the Linux kernel developers to track performance regressions. The respective benchmarks are split into stress tests, targeting one specific system component, and real-world applications. Overall, with only a few exceptions, the results show that xMP incurs low performance overhead, especially for page table protection. Specifically, we observe a striking difference between the read (R) and write (W) Threaded I/O tests: while the
2) In-Process Memory Isolation:
We evaluated the overhead of in-process memory isolation using our xMP-protected versions of the Nginx and mbed TLS servers (§ VI-C). In both cases, we used the server benchmarking tool
C. Scalability of xMP Domains
Hardware-based memory isolation features, similar to xMP, support only a small number of domains. For instance, Intel MPK and ARM Domain Access Control (DAC) implement only 16 domains. Nevertheless, we investigate scenarios in which a high number of domains becomes necessary. Modern infrastructures massively deploy OS-level virtualization (i.e., containers), for which Linux namespaces [88] provide an essential building block by establishing different views on selected global system resources. By integrating xMP into Linux namespaces to isolate selected system resources (§ VI), we establish (i) the foundation for hypervisor-assisted OS-level virtualization, and (ii) the means to evaluate the scalability of xMP domains.
To that end, we introduce xMP namespaces to isolate process page tables. (Note that xMP namespaces can be extended to isolate arbitrary data structures.) Specifically, we use the
To measure the impact of an increasing number of xMP domains, we customized the Phoronix Hackbench scheduler stress test. Our adjustments cause the benchmark to place groups of 10 processes each (five senders and five receivers exchanging 50K messages) into separate xMP namespaces. In its standard configuration, Xen supports up to 10 altp2m views, so we increased this limit to accommodate the larger number of xMP domains required for this experiment.
We compare the overhead of an xMP-capable Linux kernel with a vanilla one. Figure 6 shows the scheduling overhead of up to 250 distinct xMP namespaces. (Again, results are means over 10 runs.) Overall, the isolation overhead grows linearly with the number of xMP domains (each domain contains the page tables of 10 processes). However, as the number of processes increases (250 xMP domains correspond to 2.5K processes), the time required to schedule and run each stress test (i.e., 10 processes exchanging 50K messages) amortizes the overhead, which can even drop to about 2%. Further, this experiment demonstrates the ability of our prototype to scale to 250 distinct isolation domains, an order of magnitude more than what can be achieved by existing schemes, like Intel MPK and ARM DAC (16 domains).
Performance impact of up to 250 xMP domains on the scheduler, measured using the customized Phoronix Hackbench stress test.
Lastly, note that in the experiment above, page tables are assigned to isolated xMP domains during process creation, but are populated while the benchmark is executing, due to copy-on-write and dynamic memory allocations. Therefore, the experiment also captures the management overhead of our prototype when it dynamically propagates changes to the corresponding restricted domain views.
D. Security Evaluation
We evaluated the security of our memory protection primitives using real-world exploits against (i) page tables, (ii) process credentials, and (iii) sensitive data in user space. Despite a strong attacker with arbitrary read and write primitives to kernel and user memory, by meeting the requirements ❶-❸, our system blocks illegal accesses to sensitive data.
1) Attacking the Kernel:
We assume an attacker who aims to elevate their privilege using an arbitrary read and write primitive in kernel memory. To evaluate this scenario, we used a combination of real-world exploits that achieve the aforementioned capability. We first reconstructed an exploit to bypass KASLR [76].
In the next step, we implemented two different attacks that target (i) the page tables and (ii) the credentials of a given process, respectively. In the first attack, we used the write primitive to modify individual page table entries of the target process. This allowed us to grant write permission to an (otherwise execute-only mapped) kernel code page containing a rarely used system call handler, which is then overwritten with shellcode that disables SMEP and SMAP in the CR4 control register.
To systematically evaluate xMP, we consider attacks that can be equally applied to all kernel structures. We generalize the attack vectors against sensitive kernel structures in the following strategies. Under our threat model, attackers can:
directly modify the data structure(s) of interest;
redirect a pointer of the targeted data structure to an injected, attacker-controlled instance;
redirect a pointer of the targeted data structure to an existing instance with higher privileges.
xMP withstands modification attempts of the protected data structures (❶-❷), as only authorized kernel code can enter the associated xMP domains. For instance, when protecting page tables, without first hijacking the kernel’s execution, the attacker reaches an impasse on how to modify page tables isolated in xMP domains. Injecting code is thus prevented in the first place. Alternatively, the attacker can modify a thread’s pointer to a sensitive data structure. In this case, the modified value must comply with the added context-bound integrity (❸) that is enforced on every context-switch or right before accessing the sensitive data structure (§ IV-C). Since attackers do not know the secret key, they cannot compute an HMAC that would validate the pointer’s integrity. Consequently, attackers cannot redirect the pointer to an injected data structure.
To sidestep the secret key, attackers could overwrite the pointer with an existing pointer (holding a valid HMAC) to a data structure instance with higher privileges. Yet, as pointers to xMP-protected data are bound to the thread’s context (❸), attackers cannot redirect pointers to instances belonging to other threads. Note that attackers would have to overwrite the
2) Attacking User Applications:
We chose Heartbleed [3] as a representative data leakage attack due to its high impact. As a result of the lack of a bounds check of the attacker-controlled payload length field in TLS heartbeat requests, a vulnerable OpenSSL server can be coerced into leaking adjacent process memory, potentially including its private key. With the private key isolated in an xMP domain (§ VI-C), such out-of-bounds reads cause an access violation instead of disclosing the key.
3) Attacking Protection Primitives:
Our user-space API does not use the VMFUNC instruction directly in user space; all xMP domain switches are mediated by the kernel through the system calls described in § V-D.
Further, mediating the execution of domain switches in the kernel allows us to bind the active-domain state to the thread's context, so that it cannot be forged or replayed from user space (❸).
4) I/O Attacks:
Compromised I/O devices or drivers can access memory that holds sensitive data. To address this threat, the VMM should confine device-accessible memory (i) by employing the system's IOMMU (e.g., Intel VT-d [89]) or (ii) by means of SLAT. The former strategy ensures that sensitive memory in one of the xMP domains will not be mapped by the translation tables of the IOMMU; sensitive data structures become inaccessible to devices. In the latter approach, without IOMMU, the guest is likely to use bounce buffers (e.g., in combination with Virtio [90]) or directly access the devices. In both cases, a corrupted device or driver would access guest-virtual addresses, which are regulated by Xen's altp2m and thus remain subject to the xMP domain restrictions.
Discussion
A. Limitations
The Linux callback-free RCU feature [91] relocates the processing of RCU callbacks out of the softirq context and into dedicated kernel threads.
Currently, we manually instruct the kernel when to enter a specific xMP domain. Instead, we could automate this step by instructing the compiler to bind annotated data structures to xMP domains. In addition, the compiler could instrument kernel code with calls that enter/leave the xMP domain immediately before/after accessing the annotated data structure.
Also, we do not support nested xMP domains. In fact, we prohibit entering a domain without first closing the active one; by nesting xMP domains, the state of the opened domain would be overwritten. To address this, the kernel needs to securely keep track of the previously opened xMP domains by maintaining a stack of xMP domain states per thread. Note that this relates to adding xMP support in IRQ contexts.
B. Intel Sub-Page Write Permission
Intel announced the Sub-Page Write-Permission (SPP) feature for EPTs [35] to enforce memory write protection at sub-page granularity. Specifically, with SPP, Intel extends the EPT with an additional set of SPP tables that determine whether a 128-byte sub-page can be written. Selected 4KB guest page frames with restricted write permissions in the EPT can be configured to subsequently walk the SPP table to determine whether or not the accessed 128-byte block can be written.
Once this feature is implemented in hardware, it will enrich xMP in terms of performance and granularity. Let us consider the use case of protecting process credentials. Once initialized, the credentials themselves become immutable. However, meta information, such as reference counters, must be updated throughout the lifetime of the cred structure. With SPP, such frequently updated fields could remain writable at 128-byte granularity while the rest of the structure stays write-protected, reducing the number of required xMP domain switches.
C. Execute-Only Memory
A corollary of the lack of non-readable memory (§ II-A) is that the x86 MMU does not support execute-only memory— code pages have to be readable as well. This has allowed adversaries to mount Just-In-Time ROP (JIT-ROP) attacks [73], which can bypass code randomization defenses. By reading code pages, an attacker can harvest ROP gadgets and construct a suitable payload on the fly. A defense against JIT-ROP attacks is thus to enforce execute-only memory to prevent the gadget harvesting phase [9], [31], [32], [92]. By defining execute-only xMP domains for code pages, xMP can offer similar protection.
D. Alternative Hypervisors and Architectures
Xen is by no means the only system on which xMP can be integrated. Other hypervisors that implement (or can be extended with [93]) similar functionality to Xen's altp2m can serve as a foundation for xMP as well.
Related Work
While the possibility of non-control data (or data-oriented) attacks has been identified before [1], Chen et al. [2] were the first to demonstrate the viability of data-oriented attacks in real-world scenarios, ultimately rendering them a realistic threat. With FLOWSTITCH [94], Hu et al. introduced a tool that is capable of stitching together different data flows to generate data-oriented attacks on Linux and Windows binaries, despite fine-grained CFI, DEP, and, in some cases, ASLR being in place. Hu et al. [5] further show that data-oriented attacks are in fact Turing-complete. They introduce Data-Oriented Programming (DOP), a technique for systematically generating data-oriented exploits for arbitrary x86-based programs. Similarly, Carlini et al. [4] achieve Turing-complete computation using a technique they refer to as Control Flow Bending (CFB). In contrast to DOP, CFB is a hybrid approach that relies on the modification of code pointers. Still, CFB bypasses common CFI mechanisms by limiting code pointer modifications in a way that the modified control flows comply with CFI policies. Ispoglou et al. [7] extend the concept of DOP by introducing a technique they coin Block-Oriented Programming (BOP). Their framework automatically locates dispatching basic blocks in binaries, which facilitate the chaining of block-oriented gadgets into a successful attack.
On the other hand, researchers have started to respond to data-oriented attacks. For instance, DataShield [95] associates annotated data types with security sensitive information. Based on these annotations, DataShield partitions the application’s memory into two disjoint regions, and inserts bounds checks that prevent illegal data flows between the sensitive and non-sensitive memory regions. Similar to our work, solutions based on virtualization maintain sensitive information in disjoint memory views [26], [27], [96]. While MemSentry [27] isolates sensitive data, SeCage [26] additionally identifies and places sensitive code into a secret compartment. Both frameworks leverage Intel’s EPTP switching to switch between the secure compartment and the remaining application code. Yet, in contrast to our work, MemSentry and SeCage are limited to user space. Also, SeCage adds complexity by duplicating and modifying code that would normally be shared (e.g., libraries) between the secret and non-secret compartments.
EPTI [96] implements an alternative to KPTI using memory isolation techniques similar to xMP. PrivWatcher [28] leverages virtualization to ensure the integrity of process credentials. Contrary to our solution, PrivWatcher creates shadow copies of process credentials.
Conclusion
In this paper we propose novel defenses against data-oriented attacks. Our system, called xMP, leverages Intel’s virtualization extensions to set the ground for selective memory isolation primitives, which facilitate the protection of sensitive data structures in both kernel and user space. We further equip pointers to data in isolated memory with authentication codes to thwart illegal pointer redirections. We demonstrate the effectiveness of our scheme by protecting the page tables and process credentials in the Linux kernel, as well as sensitive data in various user applications. We believe that our results demonstrate that xMP is a powerful and practical solution against data-oriented attacks.
ACKNOWLEDGMENT
We thank Christopher Roemheld and Joseph Macaluso for helping us with the Linux kernel extensions and the use cases regarding user applications, respectively. Further, we thank our shepherd, Yuval Yarom, and the anonymous reviewers for their valuable feedback. This work was supported in part by the European Union’s Horizon 2020 research and innovation programme, under grant agreement No 830892 (SPARTA), the Office of Naval Research (ONR), through awards N00014-17-1-2891 and N00014-17-1-2788, and the National Science Foundation (NSF), through award CNS-1749895. Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the aforementioned supporters.