Introduction
With the growing popularity of IoT devices and the rising number of security and privacy risks associated with them [1], IoT device security has drawn significant attention from both academia and industry. Resource-constrained IoT devices have limited storage, memory and processing capabilities, yet they are used in critical infrastructure such as e-health, smart energy and Industry 4.0, and they make up the largest share of the billions of IoT devices envisioned. Accordingly, ARM has introduced TrustZone into Cortex-M processors (Cortex-M23 and Cortex-M33) as TrustZone-M [2], which brings a Trusted Execution Environment (TEE) to battery-powered, resource-constrained IoT devices as well.
TrustZone technology is a security extension incorporated into ARM processors. It provides a mechanism to partition system-on-chip (SoC) resources (e.g., memory and peripherals) into two worlds (or zones): the Secure world and the Normal world. The Secure world has a small code-base and runs only security-critical operations, whereas the Normal world runs all other apps. Keeping the Secure world's code-base minimal is important for having a trusted and formally verified TEE [3]. Since it is impossible to guarantee a bug-free software stack on an IoT device running multiple apps, third-party libraries, drivers and protocols, a TEE becomes essential for security-critical operations such as secure boot, cryptographic operations, software/firmware updates, remote attestation, and handling authentication peripherals (e.g., fingerprint readers and biometric scanners).
Challenge. Although TrustZone-M provides a hardware-based TEE that effectively isolates security-critical operations from untrusted software components, it lacks mechanisms for secure cross-world communication, i.e., communication between the TEE and the untrusted environment in an IoT device. TrustZone-M allows direct function calls between the two worlds: an app in the Normal world can directly call a secure function in the Secure world through an API call. However, TrustZone does not provide any mechanism to authenticate such incoming calls from Normal world apps. This allows an attacker to run malicious code that issues forged calls and repeatedly passes maliciously crafted messages in order to find vulnerabilities in secure software. This weakness was exploited in certain Motorola phones with a TrustZone-based TEE.¹ Moreover, TrustZone-M does not provide a mechanism to protect messages transferred between the two worlds. Most TrustZone-based TEEs use a shared memory area allocated in the Normal world memory region as the channel for transmitting messages. Unfortunately, this channel is vulnerable if the Normal world is compromised (Fig. 1): attackers could intercept and manipulate the messages transferred through it. Hence, malicious apps with privileged access in the Normal world can compromise the confidentiality, integrity and authenticity of communication between the two TrustZone worlds. High-end mobile devices with TrustZone-A support may use traditional crypto-based approaches, such as encryption and MAC operations based on a shared session key [4]. However, such solutions are too expensive in terms of computation and power consumption for resource-constrained IoT devices that are expected to run on batteries for months or years.
Fig. 1. Overview of the ShieLD architecture, which introduces cross-world (cross-zone) secure communication for an ARM TrustZone-M based TEE on a constrained IoT device. While the apps in the Secure world (TEE) are safe, the Normal world OS can have vulnerabilities that allow malicious apps to compromise the communication between a legitimate app (IoT App) in the Normal world and its corresponding secure app in the Secure world. ShieLD's Secure Monitor uses MPU_Vault for shielded cross-world communication.
ShieLD. In this paper, we present the design, implementation and evaluation of ShieLD, a framework for securing cross-zone/cross-world communication in IoT devices that support TrustZone-M. ShieLD enables legitimate apps to use a protected shared memory zone, which we call MPU_Vault, as the medium for a secure channel between a Normal world app and the secure services in the Secure world, without using cryptographic operations to protect message confidentiality. ShieLD protects the MPU_Vault using the Memory Protection Unit (MPU), which restricts which CPU execution contexts may access a given Normal world memory region. We leverage the MPU in a novel way to establish a secure communication channel between the two worlds; to the best of our knowledge, this is the first work both to solve the secure communication problem for this class of devices and to use the MPU for this purpose. To build the ShieLD communication framework, we also identified and resolved challenges such as task legitimacy, task interruption and termination, task code integrity, and privileged software integrity.
Usecase Scenario. Constrained IoT devices with security-critical sensors that handle biometric data, such as a fingerprint sensor, can utilize TrustZone-M for trusted execution. The firmware of such sensors is placed in the Secure world to protect its functionality. IoT apps, on the other hand, are added by developers in the Normal world and use the sensors by calling secure functions in the Secure world (as depicted in Fig. 1). In this scenario, the security guarantees of TrustZone could be undermined by a vulnerable cross-world communication channel. An attacker could intercept and manipulate messages transferred between the worlds to analyze the behavior of the secure firmware,² as was possible in the case of a TrustZone-based digital rights management solution developed by Discretix [5]. ShieLD prevents such an attack by enabling an MPU_Vault that is accessible only to the legitimate app in the Normal world. A malicious app in the Normal world could also abuse the communication channel to send arbitrary crafted messages to the secure service in the Secure world and launch different attacks, as in the recent Boomerang attack [6]. To prevent this, ShieLD authenticates the requesting app, thereby preventing malicious apps from making calls with malicious input data.
Contributions. The main contributions of this paper are:
Secure communication channel design: We propose ShieLD, a framework for secure communication between the two TrustZone worlds on resource-constrained IoT devices. We present a lightweight message protection scheme based on the MPU that provides security assurances similar to those of typical crypto operations.
Implementation and Evaluation: We demonstrate the practicality of ShieLD via a proof-of-concept implementation and evaluate its performance on TrustZone-M (on the Musca-A2 Test Chip Board). In our experimental evaluation, ShieLD achieved nearly 5x lower execution time than a message encryption approach that provides similar security guarantees.
The rest of the paper is structured as follows. Related work is discussed in Section 2. Section 3 describes the technologies and building blocks needed to understand ShieLD and how they relate to each other. Section 4 describes the threat model. Section 5 explains a usecase of ShieLD, followed by the ShieLD design in Section 6. In Section 7, we compare the design and architecture of the related work SeCReT with ShieLD. The proof-of-concept implementation of ShieLD is explained in Section 8, and Section 9 presents the performance evaluation and results. Section 10 discusses the scalability of ShieLD. The security analysis of the proposed design is presented in Section 11 and its limitations in Section 12. We conclude the paper in Section 13.
Related Work
We categorize the related work into three major areas: TEE, cross-world communication in TEEs and MPU-based isolation techniques.
2.1 Trusted Execution Environment (TEE)
A TEE ensures that sensitive code and data are stored, processed and protected in an isolated, trusted environment. Researchers have proposed virtualization-based TEEs, e.g., Terra [7], that rely on a hypervisor to create and manage TEEs without any modification to existing hardware. Others have leveraged the Trusted Computing Group's Trusted Platform Module (TPM) to provide higher assurance, e.g., [8], [9]. More recently, hardware extensions have also been proposed for creating TEEs. For example, Intel's Software Guard Extensions (SGX) [10], [11] is an architecture designed to support enclaves, i.e., isolated execution environments for code and data within an application's address space. ARM TrustZone [12] is a widely adopted TEE in embedded systems and mobile devices. vTZ [13] aims at virtualizing TrustZone in the ARM architecture. Iso-X [14] offers higher memory allocation flexibility than SGX; however, it is not supported by current processors. Keystone [15] is another emerging open-source TEE framework that provides customizable software enclaves for RISC-V. These TEEs target high-end devices and are not suitable for low-power IoT devices.
Since the introduction of TrustZone-M [2] in the Cortex-M family, systems and solutions for resource-constrained devices have started to utilize TrustZone-M [16], [17]. Securing firmware updates is crucial in the IoT life-cycle: ASSURED [17] extends TUF [18], an update framework that is resilient to key compromise. SAFES [16] is an architecture for self-measuring code integrity in a sandboxed environment at each I/O event on embedded devices, implemented on a Cortex-M33 processor with TrustZone enabled. [19] is a TrustZone-M-assisted virtualization architecture for the ARM Musca-A platform. CoreLockr-TZ [20] provides a platform for IoT devices that allows apps running in the Normal world to access secure services by making API calls. Such recent research shows the usefulness of TrustZone-M for constrained devices.
2.2 Cross-World Communication
The security of the cross-world communication channel is of paramount importance in ARM Cortex-A, Intel and other architectures as well. Several solutions attempt to provide a secure communication infrastructure between partitions. LTZVisor [21] and TZ-VirtIO [22] are systems based on ARM Cortex-A that utilize TrustZone to set up hardware-assisted virtualization, dividing the system into secure and non-secure partitions. Inter-VM communication is established using VirtIO [23]: every exchange of data between partitions is carried out through a (non-secure) shared memory overseen by the trusted hypervisor. InkTag [24] provides a virtualization architecture based on Intel's VMX hardware virtualization support. It allows applications to define access policies for their own secure files, so that the files remain inaccessible even if the privileged operating system is corrupted. InkTag maintains the communication between the untrusted OS and trusted applications using HAP page tables [24]. Virtual Ghost [25] uses compiler instrumentation (e.g., sandboxing) to protect the code and metadata of a process from the operating system. SeCReT [4] is a recent and relevant approach that aims to solve the cross-world communication problem for the Cortex-A family of microprocessors. It uses session keys protected by TrustZone to protect the code and data of the communicating processes. TrustZone has only recently been introduced in the Cortex-M family and, to the best of our knowledge, no prior work addresses secure cross-world communication in TrustZone-M. The above-mentioned solutions for cross-world communication are vulnerable to man-in-the-middle attacks, a malicious RTOS and hardware tampering [21], [22], [24], [25]. ShieLD eliminates the risk of the first two threats by ensuring the security of cross-world communication for data exchange between processes.
2.3 MPU-Based Isolation Techniques
The ARM Memory Protection Unit (MPU) is an optional programmable unit typically used as a memory isolation mechanism in lightweight architectures that do not require complex memory management (such as low-end devices based on Cortex-M3/M4/M23/M33) [26]. Several architectures utilize the MPU to provide per-task thread isolation, allocate stacks on demand, or establish a virtualization layer that protects critical software components. uVisor [27] utilizes the MPU to isolate groups of tasks and threads and allows them to access resources based on pre-defined access permissions. Tock OS [28] is another open-source OS targeting Cortex-M3/M4 processor architectures that uses the MPU to compartmentalize the system into three security levels. A virtualization layer between hardware components and application software that uses the MPU to prevent misuse of hardware components by third-party applications is presented in [29]. M2MON utilizes the MPU on Unmanned Vehicle (UV) peripherals to defend against several attacks by monitoring I/O activity [30]. ACES is another proposal that uses the MPU to isolate and compartmentalize applications in bare-metal embedded systems according to developer-specified policies; it uses an LLVM-based compiler to automatically create an instrumented binary from these policies [31]. Finally, [32] combines offline static analysis of the firmware, to identify process memory, with the MPU, to isolate the processes dynamically, and demonstrates this on real-time microcontrollers.
Technologies/Building Blocks
This section provides background information about the technologies underlying the design of ShieLD: the Cortex-A and Cortex-M architectures, ARM TrustZone-M and MPU-based memory protection.
3.1 Cortex-A and Cortex-M Architecture
The ARM Cortex-A is a series of application processors for devices that require a rich processing environment (with a commodity OS). These processors support the A64 and A32 instruction sets. To undertake complex compute tasks, supporting multiple software applications and modes of operation, they provide a virtual memory system architecture that separates OS space from the space for application programs. The Cortex-M, on the other hand, is a series of microcontroller processors that are programmed either bare metal (without libraries) or linked with libraries that provide OS-like features. Cortex-M based devices support the T32 instruction set, which is a subset of the A32 instruction set. These processors offer no complex memory management (no MMU), no cache, and often no FPU either. Moreover, the A and M profile architectures also differ in how the TrustZone security extensions operate on them. One difference is the mechanism for switching between secure and non-secure state. In TrustZone-A, the transition takes place via an SMC instruction, which enters a secure monitor implemented in software. In TrustZone-M, the transition is implemented in hardware with the SG instruction and is hence faster. The security state of a process in TrustZone-A depends on the value of the NS bit, whereas in TrustZone-M it is determined by whether the executing code resides in secure or non-secure memory according to the memory map.
3.2 ARM TrustZone-M
ARM TrustZone is a set of hardware security extensions incorporated into recent ARM processors (such as Cortex-A8, Cortex-A9, Cortex-M33 and Cortex-M23) [2], [35]. TrustZone partitions the device's hardware (Flash, SRAM, peripherals) and software resources into two security states, or worlds: the Secure or trusted world (also called the TEE), which has a minimal code-base and runs only security-critical operations, and the Normal world, which has a rich code-base (and is therefore more vulnerable and untrusted) and runs most user-level applications. Malicious code in the Normal world cannot affect the integrity and confidentiality of the code and data in the TEE.
TrustZone-M provides hardware-based isolation guarantees similar to those of conventional TrustZone for the Cortex-A family (TrustZone-A) but, unlike TrustZone-A, the transition (or context switch) between the two worlds depends on the memory map setup, without the need to enter a secure monitor mode. This design feature makes TrustZone-M more energy-efficient and hence suitable for low-power IoT devices. Saving and restoring the system context before and after the transition can be handled by the Secure world [36]. In TrustZone-M enabled MCUs, a system designer can define a particular memory region as either secure or non-secure by configuring the Security Attribution Unit (SAU). The SAU is programmable from the Secure world through memory-mapped SAU registers. A system designer can also use the Implementation Defined Attribution Unit (IDAU) to define a fixed memory map [2]. When the processor executes code in the secure memory region, it is in the Secure world; otherwise, it is in the Normal world. The system designer can further divide the secure memory region into two sections: secure and non-secure callable (NSC). The secure section contains trusted code and data (including the secure stack and heap and any other secure data), and the NSC region contains entry functions (i.e., branch instructions to the actual secure code in secure memory). The Secure world can access the non-secure memory region, but the reverse is not possible and is prevented by TrustZone.
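To make the address-based world selection concrete, the following C sketch models how enabled SAU regions attribute an address as Secure, Non-secure callable or Non-secure. It is a host-side simulation, not driver code: the field positions follow the ARMv8-M SAU_RBAR/SAU_RLAR layout, while the helper names and the example addresses are hypothetical, and IDAU input and overlapping-region rules are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv8-M SAU region layout: SAU_RBAR holds BADDR in bits [31:5];
 * SAU_RLAR holds LADDR in bits [31:5], NSC in bit 1 and ENABLE in bit 0. */
#define SAU_ADDR_MASK   0xFFFFFFE0u
#define SAU_RLAR_NSC    (1u << 1)
#define SAU_RLAR_ENABLE (1u << 0)

typedef enum { ATTR_SECURE, ATTR_NSC, ATTR_NONSECURE } attr_t;
typedef struct { uint32_t rbar; uint32_t rlar; } sau_region_t;

/* Encode an enabled region covering [base, limit] at 32-byte granularity. */
static sau_region_t sau_region(uint32_t base, uint32_t limit, bool nsc) {
    assert((base & 0x1Fu) == 0u && base <= limit);
    sau_region_t r;
    r.rbar = base & SAU_ADDR_MASK;
    r.rlar = (limit & SAU_ADDR_MASK) | (nsc ? SAU_RLAR_NSC : 0u) | SAU_RLAR_ENABLE;
    return r;
}

/* Attribute an address: a hit in an enabled region yields NS or NSC;
 * anything not covered by a region defaults to Secure. */
static attr_t sau_attribute(const sau_region_t *regs, size_t n, uint32_t addr) {
    for (size_t i = 0; i < n; i++) {
        if ((regs[i].rlar & SAU_RLAR_ENABLE) == 0u)
            continue;
        uint32_t lo = regs[i].rbar & SAU_ADDR_MASK;
        uint32_t hi = (regs[i].rlar & SAU_ADDR_MASK) | 0x1Fu; /* inclusive limit */
        if (addr >= lo && addr <= hi)
            return (regs[i].rlar & SAU_RLAR_NSC) ? ATTR_NSC : ATTR_NONSECURE;
    }
    return ATTR_SECURE;
}
```

With one Non-secure region for the Normal world image and one NSC region for the entry veneers, every other address defaults to Secure, so only the memory the Normal world may use has to be described explicitly.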
3.2.1 Calling Secure Function
As depicted in Fig. 2a, non-secure code can interact with a secure function in the Secure world using a direct function call through entry points/APIs in the NSC region. The first instruction of an entry point in the NSC region must be a secure gateway (SG) instruction, a CPU instruction introduced to prevent non-secure code from branching to an invalid entry point in secure code. The call to a secure function completes by returning to non-secure code using another new CPU instruction, Branch with exchange to Non-secure state (BXNS). The secure function returns to non-secure code by performing
Fig. 2. Secure and Normal world switch when (a) non-secure code calls a secure function via a secure function entry in the NSC memory region; (b) secure code calls a non-secure function.
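On the toolchain side, the ARMv8-M Security Extensions (CMSE) let the Secure image declare such entry points directly. The sketch below assumes a GCC or Clang build of the Secure image with -mcmse and marks a hypothetical service with the cmse_nonsecure_entry attribute; the toolchain then emits an SG veneer (placed in the NSC region by the linker script) and returns through BXNS. On a host build the macro expands to nothing, so the logic can be tested; secure_check_pin and its secret are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* With -mcmse, ACLE sets bit 1 of __ARM_FEATURE_CMSE when compiling for the
 * Secure state; the attribute then makes the toolchain emit an SG veneer for
 * the NSC region and return with BXNS, clearing registers that could leak
 * Secure state. A host build gets an ordinary function. */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#define SECURE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define SECURE_ENTRY
#endif

/* Hypothetical secret that stays in Secure memory. */
static const uint32_t secret_pin = 0x1234u;

/* Hypothetical secure service: only the comparison result crosses back to
 * the Normal world, never the secret itself. */
SECURE_ENTRY int32_t secure_check_pin(uint32_t candidate) {
    return (candidate == secret_pin) ? 1 : 0;
}
```

Nothing in this entry point identifies the caller: any Normal world code that can branch to the veneer reaches secure_check_pin, which is precisely the unauthenticated-call weakness described in the Challenge paragraph.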
3.2.2 Calling Non-Secure Function
Secure code can call a non-secure function directly using a CPU instruction known as Branch with link and exchange to Non-secure state (BLXNS) (as shown in Fig. 2b). The execution of
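The reverse direction can be sketched in the same way. Under the same -mcmse assumption, calls through a function type carrying the cmse_nonsecure_call attribute are compiled to BLXNS, and cmse_nsfptr_create() from <arm_cmse.h> marks a pointer as Non-secure by clearing its least significant bit. The registration and reporting functions are hypothetical, and demo_ns_handler is only a host-side stand-in for a Normal world handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#include <arm_cmse.h>
/* Calls through this type compile to BLXNS; the compiler clears Secure
 * register contents before branching into the Normal world. */
typedef void __attribute__((cmse_nonsecure_call)) ns_callback_t(int32_t);
#define AS_NS_PTR(p) cmse_nsfptr_create(p) /* clear LSB: Non-secure target */
#else
typedef void ns_callback_t(int32_t);        /* host build: plain call */
#define AS_NS_PTR(p) (p)
#endif

/* Hypothetical callback registered by a Normal world task. */
static ns_callback_t *ns_notify = NULL;

void secure_register_callback(ns_callback_t *cb) { ns_notify = AS_NS_PTR(cb); }

/* Secure code reporting a result back into the Normal world. */
void secure_report(int32_t result) {
    if (ns_notify != NULL)
        ns_notify(result);
}

/* Host-side stand-in for a Normal world handler. */
static int32_t last_seen = -1;
static void demo_ns_handler(int32_t value) { last_seen = value; }
```

A production version would also verify that the registered pointer really targets Non-secure memory before storing it; the ACLE provides TT-instruction intrinsics for such checks.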
3.3 MPU-Based Memory Protection
All Cortex-M processors except the Cortex-M0 can include an MPU, a programmable block inside the processor that restricts access to memory by dividing the entire memory space (including Flash and SRAM) into a number of MPU regions and assigning access permissions to each region. The MPU, which supports 8 or 16 regions depending on the implementation, is configured by privileged software through a series of 32-bit memory-mapped registers [26]. For example, the Region Attribute and Size Register (
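The region registers differ between MPU generations: the Region Attribute and Size Register belongs to the ARMv7-M MPU, whereas ARMv8-M cores such as the Cortex-M23 and Cortex-M33 describe a region with MPU_RBAR and MPU_RLAR, selected through MPU_RNR. The following host-side sketch, with hypothetical helper names and example addresses, encodes an ARMv8-M region and evaluates whether it grants a given data access; the background memory map and memory attributes are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv8-M MPU region layout (Cortex-M23/M33):
 * MPU_RBAR: BASE [31:5], SH [4:3], AP [2:1], XN [0]
 * MPU_RLAR: LIMIT [31:5], AttrIndx [3:1], EN [0] */
#define MPU_ADDR_MASK 0xFFFFFFE0u

/* AP field values: bit 1 of the field = read-only, bit 0 = unprivileged allowed. */
enum { AP_RW_PRIV = 0u, AP_RW_ANY = 1u, AP_RO_PRIV = 2u, AP_RO_ANY = 3u };

typedef struct { uint32_t rbar; uint32_t rlar; } mpu_region_t;

/* Encode an enabled region [base, limit] with the given AP and XN bits. */
static mpu_region_t mpu_encode(uint32_t base, uint32_t limit, uint32_t ap, bool xn) {
    assert((base & 0x1Fu) == 0u && base <= limit && ap <= 3u);
    mpu_region_t r;
    r.rbar = (base & MPU_ADDR_MASK) | (ap << 1) | (xn ? 1u : 0u);
    r.rlar = (limit & MPU_ADDR_MASK) | 1u; /* EN = 1, AttrIndx = 0 */
    return r;
}

/* True if this region grants the access; false if the address lies outside
 * the region or the region's permissions deny the access. */
static bool mpu_region_grants(mpu_region_t r, uint32_t addr, bool privileged, bool write) {
    if ((r.rlar & 1u) == 0u)
        return false;
    uint32_t lo = r.rbar & MPU_ADDR_MASK;
    uint32_t hi = (r.rlar & MPU_ADDR_MASK) | 0x1Fu;
    if (addr < lo || addr > hi)
        return false;
    uint32_t ap = (r.rbar >> 1) & 3u;
    if (write && (ap & 2u))
        return false;               /* read-only region */
    if (!privileged && (ap & 1u) == 0u)
        return false;               /* privileged-only region */
    return true;
}
```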
Threat Model and Goals
Our primary assumption is that an attacker with privileged access can compromise the software (both applications and the operating system) running in the Normal world to mount various attacks on secure software running in the Secure world. The attacker could modify legitimate applications' code to repeatedly call secure functions in the Secure world with maliciously crafted parameters. We also assume that the attacker could intercept and analyze messages exchanged between the two worlds of TrustZone-M via shared memory. The attacker's goal could, for example, be to analyze the behaviour of secure software for bug hunting. We further assume that an attacker with privileged access could modify the configuration of the Normal world MPU to change the access permissions of the Normal world memory regions it protects. However, we assume that the attacker cannot directly access the Secure world memory region because of hardware-enforced memory protection mechanisms such as the SAU. At the hardware level, we assume that ShieLD runs on resource-constrained IoT devices supporting ARM TrustZone-M hardware protection, which is implemented correctly and is not compromised.
Based on the above threat model and assumptions, we design ShieLD with the following security goals: (i) no software component running in the Normal world other than legitimate ones can access critical resources in the Secure world; (ii) during cross-world communication, only legitimate software components can access the MPU_Vault; and (iii) when a legitimate Normal world application completes successfully or is unexpectedly interrupted, the MPU_Vault it uses is protected against third-party access.
Smart Door Lock - A ShieLD Usecase
In this section, we present a real-world scenario in which the need for our proposed mechanism arises. A Smart Door Lock with an embedded fingerprint sensor has a critical function, biometric authentication, which 92 percent of enterprises consider "very effective" according to a recent Ping Identity survey [39]. However, a compromise of biometric data carries a high risk of unrecoverable loss: while a leaked password can be changed, biometric data is immutable.
A TrustZone-enabled Smart Door Lock, our usecase example (Fig. 4), can protect users' biometric data by 1) assigning the fingerprint sensor (peripheral) to the Secure world, and 2) keeping its firmware in the Secure world memory region to prevent any alteration of its behavior. The fingerprint scanning and authentication services provided by the fingerprint sensor can then be used securely by applications hosted in the Normal world. The Smart Door Lock App in the Normal world communicates with the secure services that handle secure peripherals (i.e., the fingerprint sensor). The application lifecycle is as follows: (i) the Fingerprint Sensor receives a fingerprint; (ii) triggered by this event, the Smart Door Lock App requests the Verify Fingerprint secure service; (iii) the Verify Fingerprint service verifies the fingerprint received by the Fingerprint Sensor; (iv) based on the verification result, an appropriate response is sent, either granting or denying access; and (v) finally, the task context is deleted from the Task Control Block.
Fig. 4. A TrustZone-enabled Smart Door Lock as a usecase example. 1) The fingerprint sensor (peripheral), assigned to the Secure world, detects a fingerprint input signal, which is handled by its firmware in the Secure world memory region; 2) a Smart Door Lock App hosted in the Normal world requests the fingerprint verification secure service; 3) the service invokes the fingerprint driver to obtain the received fingerprint; 4) on receiving the fingerprint, the Verify Fingerprint secure service verifies it; 5) a response is returned to the Door Lock App based on whether the fingerprint was verified.
There are two main security threats that we consider in this paper. First, applications in the Normal world can reverse engineer the behavior of a Secure world application by exploiting the ability to call any Secure world function without authentication [4]. Second, the large code-base of the Normal world exposes it to vulnerabilities that attackers could exploit to intercept the communication between a Normal world application and Secure world code (e.g., the firmware of the fingerprint sensor).
While the Smart Door Lock with an embedded fingerprint reader is presented as an example, the threat scenario considered in this paper applies to all usecases in which TrustZone is used to secure critical functions, such as e-health devices, industrial sensors and critical infrastructures.
ShieLD Architecture and Design
With the above scenario as motivation, in this section we present the ShieLD architecture and the details of all ShieLD operations for enabling and protecting the MPU_Vault. Fig. 5 depicts the high-level architecture of ShieLD, highlighting its components and their basic interactions. The shaded boxes represent ShieLD's implemented components, while the pattern-filled box represents an existing system component that required modifications to fulfill ShieLD's requirements. The Normal world runs the applications, usually referred to as tasks in embedded systems, and a modified interrupt handler on top of a Real-Time Operating System (RTOS) kernel. The Secure world runs the ShieLD components, namely the secure monitor with its app verifier and access control modules, which implement the security mechanisms. A small underlying secure kernel provides basic OS functions for software running in that world (e.g., process management, file access and memory management).
Fig. 5. High-level architecture of ShieLD. The shaded boxes are the new components that we propose and implement for ShieLD, while the pattern-filled box is an existing system component that we extend for ShieLD.
The secure monitor implements the core functionality of ShieLD. It comprises three main modules: interface, app verifier and access control. The interface module acts as a gateway between the Normal world and the other secure monitor modules. It receives and handles all API function calls from Normal world components to the secure monitor modules via ShieLD_veneers. The app verifier is responsible for authenticating a task that accesses Secure world resources. The access control module manages and protects access to the MPU_Vault for secure cross-world message transmission. In the following sections, we discuss each component in detail, together with the security mechanisms ShieLD provides to enable secure communication between the worlds of TrustZone-M. We begin by elaborating the steps of the cross-zone communication process, then discuss the protection of the MPU_Vault, and finally detail how we maintain the integrity of ShieLD components in the system.
6.1 Cross-Zone Communication Process
ShieLD uses the MPU_Vault as a secure medium for transmitting messages in cross-world communication. In this section, we discuss how the MPU_Vault is set up in a three-phase process: allocation request, task authentication and MPU_Vault setup.
6.1.1 Allocation Request Phase
The MPU_Vault setup is initiated by tasks in the Normal world. A task that needs access to a service in the Secure world first allocates a suitably sized block of memory in the Normal world and then sends a request to the secure monitor to set up its MPU_Vault. This is done using the ShieLD_veneer call shown in the following listing; the arguments for this call are passed to the secure monitor using general purpose registers (
The
Listing 1. MPU_Vault Allocation Request
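The following C sketch illustrates the shape such a request could take, under stated assumptions. The veneer name shield_request_vault, its status codes and the buffer name are hypothetical; under the AAPCS, the first two arguments travel in r0 and r1. The host stub performs only the argument checks a Secure-side handler would need before programming an ARMv8-M MPU region, whose base address and size must be multiples of 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_GRANULE 32u /* ARMv8-M MPU regions are 32-byte granular */

typedef enum { SHIELD_OK = 0, SHIELD_EBADARG = -1 } shield_status_t;

/* Hypothetical veneer: on the target this would be a Secure entry function
 * reached through the NSC region (buf in r0, size in r1). This host stub only
 * checks that the buffer can be covered exactly by one MPU region. */
shield_status_t shield_request_vault(void *buf, uint32_t size) {
    uintptr_t base = (uintptr_t)buf;
    if (buf == NULL || size == 0u)
        return SHIELD_EBADARG;
    if ((base % MPU_GRANULE) != 0u || (size % MPU_GRANULE) != 0u)
        return SHIELD_EBADARG;
    /* App verification and MPU_Vault setup would follow here. */
    return SHIELD_OK;
}

/* Normal world side: a statically allocated, MPU-aligned message buffer. */
static _Alignas(32) uint8_t door_lock_vault[256];
```

Allocating the buffer statically with 32-byte alignment keeps the request valid by construction; a misaligned or odd-sized buffer would force the Secure side either to reject it or to widen the region and expose neighbouring data.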
6.1.2 Task Authentication Phase
The second phase authenticates the requesting task to verify whether it is permitted to access Secure world resources. When the interface module receives a request for MPU_Vault allocation from a task in the Normal world, it invokes the app verifier module to verify the legitimacy of the requesting task. For this purpose, an access control list containing the tasks and the secure services they are allowed to access is required.
We assume that ShieLD is provided by the IoT device vendor as part of the secure firmware. Moreover, we assume that the device vendor also provides a certificate, called the hash list (HL in Fig. 5), that lists the tasks granted access to Secure world services. These assumptions are based on the fact that low-power IoT devices are designed for specific applications (such as industrial control systems and automotive systems) and do not run many software modules from different vendors. Therefore, legitimate tasks and their access to secure services can be predefined during device manufacturing. HL contains the hashes (digests) of the tasks and the unique identifiers (IDs) of the secure services to which the tasks are granted access. The hash list can be updated during firmware updates when new applications are added or existing ones are removed. Secure update mechanisms such as ASSURED [17] have recently been proposed for constrained platforms, and further studies show that it is possible to create secure and standard-compliant update mechanisms [40] even for this class of devices. HL is stored in the Secure world, isolated from the Normal world. In case of a software update, we assume that a new HL with updated hashes is provisioned in the secure update process along with the new images of the updated components.
When a task requests creation of an MPU_Vault, the app verifier module calculates the hash of the task code using a collision-resistant hash function and compares it against the hash values in the HL. To calculate the hash value of a running task, the start and end addresses of the task code are required. The app verifier can retrieve those addresses from (
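The lookup the app verifier performs can be sketched as follows. To keep the example short, it uses 32-bit FNV-1a purely as a placeholder digest; FNV-1a is not collision-resistant, and a real implementation would substitute a cryptographic hash such as SHA-256. The hl_entry_t layout and function names are hypothetical, and the range [start, end) stands for the task code addresses the app verifier retrieves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder digest: 32-bit FNV-1a. NOT collision-resistant; it only stands
 * in for the cryptographic hash (e.g., SHA-256) the design calls for. */
static uint32_t digest(const uint8_t *start, const uint8_t *end) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (const uint8_t *p = start; p < end; p++) {
        h ^= *p;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* One HL entry: digest of a task's code and the secure service it may use. */
typedef struct { uint32_t task_digest; uint32_t service_id; } hl_entry_t;

/* Hash the task's code range [start, end) and look for a matching
 * (digest, service) pair in the vendor-provisioned hash list. */
static bool app_verify(const hl_entry_t *hl, size_t n,
                       const uint8_t *start, const uint8_t *end,
                       uint32_t service_id) {
    assert(start <= end);
    uint32_t d = digest(start, end);
    for (size_t i = 0; i < n; i++)
        if (hl[i].task_digest == d && hl[i].service_id == service_id)
            return true;
    return false;
}
```

Matching on the (digest, service) pair rather than on the digest alone means a legitimate task is still refused access to services its HL entry does not list.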
6.1.3 MPU_Vault Setup Phase
The third and final phase in enabling secure cross-world communication is setting up the MPU_Vault. After the requesting task has been authenticated, the access control module configures an MPU-protected memory region covering the requested buffer address and size, and sets the access permission of that region to Read/Write (RW) by configuring the MPU_NS registers.
In Fig. 6, we illustrate an example of how system memory regions could be configured using the MPU to protect the MPU_Vault. The figure shows two tasks (Task1 and Task2) running on top of an RTOS kernel. The system memory is organized into four MPU regions with associated access permissions (as discussed in Table 1). An MPU-protected memory region (depicted as MPU_Vault in Fig. 6) is exclusively accessible to Task1. For example, while Task1 is running, the MPU configuration triggers a fault if Task2 or the RTOS kernel tries to access the MPU_Vault.
Fig. 6. Example of MPU-based memory protection, defining memory regions and access permissions for two tasks, Task1 (with MPU_Vault) and Task2, and an RTOS kernel.
Once the MPU region for MPU_Vault is configured, the access control module saves the address and
Finally, the access control module transfers the control flow to the requesting task by executing
MPU_Vault setup: (1) the Task in the Normal world allocates a shared MPU_Vault; (2) the Task sends a request for MPU_Vault setup to the Interface module via the ShieLD_veneers in the NSC memory region; (3) the App Verifier checks whether the Task is legitimately allowed to access secure services; (4a) the Interface module returns an error message if the Task is not legitimate; (4b) the App Verifier module creates a tskContext if the Task is legitimate; (5) the Access Control module sets up an MPU region for the MPU_Vault by configuring MPU_NS; (6) the Access Control module adds the MPU_Vault address and sets the
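The bookkeeping behind the tskContext can be sketched as a small fixed-size table in Secure memory. The field layout, the capacity and the use of the slot index as the MPU region number are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VAULTS 4u /* hypothetical capacity, bounded by free MPU regions */

/* Hypothetical layout of a tskContext record kept in Secure memory. */
typedef struct {
    uint32_t task_id;    /* e.g., address of the task's TCB */
    uint32_t vault_base;
    uint32_t vault_size;
    uint8_t  mpu_region; /* MPU region assigned to this vault (slot index here) */
    bool     in_use;
} tsk_context_t;

static tsk_context_t tsk_context[MAX_VAULTS];

static tsk_context_t *ctx_find(uint32_t task_id) {
    for (size_t i = 0; i < MAX_VAULTS; i++)
        if (tsk_context[i].in_use && tsk_context[i].task_id == task_id)
            return &tsk_context[i];
    return NULL;
}

/* Record a new vault; fails if the task already owns one or the table is full. */
static bool ctx_add(uint32_t task_id, uint32_t base, uint32_t size) {
    if (ctx_find(task_id) != NULL)
        return false;
    for (size_t i = 0; i < MAX_VAULTS; i++) {
        if (!tsk_context[i].in_use) {
            tsk_context[i] = (tsk_context_t){task_id, base, size, (uint8_t)i, true};
            return true;
        }
    }
    return false;
}

static void ctx_remove(uint32_t task_id) {
    tsk_context_t *c = ctx_find(task_id);
    if (c != NULL)
        c->in_use = false;
}
```

Refusing a second vault for the same task keeps the lookup unambiguous when the interrupt and release paths later search the table by task identifier.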
6.2 MPU_Vault Protection Mechanisms
In this section, we discuss the protection of the MPU_Vault during software and hardware interrupts, execution context switches and task termination. An MPU_Vault is associated with a task in the system and should be accessible only while its associated task is executing. However, there are several circumstances in which the task is suspended or terminated. An attacker with privileged access might then compromise the suspended or terminated task's MPU_Vault if it is still accessible. To prevent such attacks, ShieLD provides the following mechanisms.
6.2.1 Interrupt Control
Tasks are frequently interrupted by software and hardware interrupts (e.g., to react to events such as receiving input from sensors). Whenever an interrupt occurs, the processor suspends the current task and executes a predefined routine, known as an Interrupt Handler or Interrupt Service Routine (ISR), that handles the interrupt. When the ISR completes, the CPU resumes the interrupted task from the point where it was interrupted. This interrupt handling process would allow an attacker who compromises the ISR or the RTOS kernel to access the contents of the task's MPU_Vault.
Most RTOSs implement specific entry code at the start of an ISR and exit code at its end, which are responsible for saving and restoring the task context, respectively. To protect the MPU_Vault, we modify the ISR entry and exit code with our trampoline code to redirect the control flow to the secure monitor. When an interrupt is received, the entry trampoline transfers control to the secure monitor before the ISR executes. The access control module then searches for the task in
When the execution of the ISR is complete, the access permission of the MPU_Vault has to be set back to read/write before control is returned to the interrupted task. To do so, the return trampoline inserted in the ISR exit code redirects the control flow to the secure monitor before the task execution is resumed. The access control module again looks up the task in the
Algorithm 1. ShieLD’s MPU_Vault Lifecycle
Input: ShieLD_request
Output:
Save
Switch to ShieLD using
if MPU_NS exists then
if ShieLD_request == setMPUVault then
if
Set bit 0 of
Set bits [7:0] of
Set bits [2:1] of
Set bit 0 of
Update
Set
else
end
else if ShieLD_request == protectMPUVault then
if Is
Set bit 0 of
Set bits [7:0] of
Set bits [2:1] of
Set bit 0 of
else
end
else if ShieLD_request == unprotectMPUVault then
Set bit 0 of
Set bits [7:0] of
Set bits [2:1] of
Set bit 0 of
else if ShieLD_request == deleteMPUVault then
Set bit 0 of
Set bits [7:0] of
Set bits [2:1] of
Set bit 0 of
Delete
Clear
else
else
end
Return to task by executing
Restore
return
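The request handling in Algorithm 1 can be illustrated with the following host-side simulation in C. It is a hypothetical sketch, not ShieLD's code: the MPU is modelled as a struct rather than the real memory-mapped registers, and the context save/restore and world switch are omitted. Only the bit positions come from the ARMv8-M MPU (MPU_CTRL.ENABLE bit 0, MPU_RNR.REGION bits [7:0], MPU_RBAR.AP bits [2:1], MPU_RLAR.EN bit 0). The names `shield_handle` and `vault_ctx_t`, and the choice of privileged read-only as the "protected" permission, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_REGIONS     16u
#define MPU_CTRL_ENABLE 0x1u          /* MPU_CTRL bit 0                    */
#define RLAR_EN         0x1u          /* MPU_RLAR bit 0                    */
#define ADDR_MASK       (~0x1Fu)      /* BASE/LIMIT occupy bits [31:5]     */
#define AP_MASK         (0x3u << 1)   /* MPU_RBAR.AP, bits [2:1]           */
#define AP_RW_ANY       0x1u          /* AP = 0b01: read/write, any level  */
#define AP_RO_PRIV      0x2u          /* AP = 0b10: read-only, privileged  */

/* Simulated MPU: one RBAR/RLAR pair per region, selected through RNR. */
typedef struct {
    uint32_t ctrl;
    uint32_t rnr;
    uint32_t rbar[MPU_REGIONS];
    uint32_t rlar[MPU_REGIONS];
} mpu_regs_t;

typedef enum { SET_VAULT, PROTECT_VAULT, UNPROTECT_VAULT, DELETE_VAULT } shield_req_t;

/* Hypothetical per-task record playing the role of tskContext. */
typedef struct {
    bool              valid;   /* vault currently configured               */
    uint32_t          base;    /* 32-byte aligned start address            */
    uint32_t          limit;   /* start address of the vault's last block  */
    uint8_t           region;  /* MPU region number used for the vault     */
    volatile uint8_t *mem;     /* vault memory, flushed on delete          */
    uint32_t          len;
} vault_ctx_t;

static bool shield_handle(mpu_regs_t *m, vault_ctx_t *v, shield_req_t req) {
    /* SET requires that no vault exists; every other request requires one. */
    bool needs_existing = (req != SET_VAULT);
    if (needs_existing != v->valid)
        return false;
    assert(v->region < MPU_REGIONS);

    m->ctrl &= ~MPU_CTRL_ENABLE;      /* disable the MPU while reprogramming */
    m->rnr = v->region;               /* select the vault's region           */
    uint32_t *rbar = &m->rbar[m->rnr];
    uint32_t *rlar = &m->rlar[m->rnr];

    switch (req) {
    case SET_VAULT:
        *rbar = (v->base & ADDR_MASK) | (AP_RW_ANY << 1);
        *rlar = (v->limit & ADDR_MASK) | RLAR_EN;
        v->valid = true;
        break;
    case PROTECT_VAULT:               /* e.g. on ISR entry */
        *rbar = (*rbar & ~AP_MASK) | (AP_RO_PRIV << 1);
        break;
    case UNPROTECT_VAULT:             /* e.g. on ISR exit  */
        *rbar = (*rbar & ~AP_MASK) | (AP_RW_ANY << 1);
        break;
    case DELETE_VAULT:                /* task termination  */
        for (uint32_t i = 0; i < v->len; i++)
            v->mem[i] = 0;            /* flush the vault contents */
        *rbar = 0;
        *rlar = 0;                    /* EN = 0: region disabled  */
        v->valid = false;
        break;
    }
    m->ctrl |= MPU_CTRL_ENABLE;       /* on hardware, follow with DSB/ISB */
    return true;
}
```

Rejecting requests that do not match the vault's state (for example, a second setup or a protect request after deletion) mirrors the error branches of the algorithm.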
6.2.2 Control-Flow Integrity
When a task is interrupted, the CPU saves the current execution context state (such as Stack Pointer (
6.2.3 MPU_Vault Release
When a task that accesses an MPU_Vault is terminated, we assume that all of its resources, including the MPU_Vault, are deleted by the RTOS kernel. If the task being terminated has an associated MPU_Vault, ShieLD flushes the contents of the MPU_Vault and removes the task context information from
6.3 Integrity of ShieLD
In this section, we discuss how the ShieLD components are protected and how their integrity is maintained. The ShieLD components residing in the Normal world (MPU_NS and the trampoline code) may be compromised, as software running in the Normal world is untrusted and vulnerable to various attacks. It is therefore important to protect these components from unauthorized modification.
6.3.1 Trampolines Integrity
Our trampoline code in the interrupt handler and task termination routines is part of the RTOS kernel code and is therefore exposed to attackers. An attacker might tamper with the trampoline code to disable the MPU_Vault protection mechanisms entirely. Therefore, ShieLD protects the integrity of the kernel code, which is part of the static region of the RTOS kernel. In TrustZone-enabled systems, we assume that the Secure world boots first and later transfers control to the Normal world. Before passing control to the Normal world, ShieLD verifies the integrity of the kernel and enables MPU protection for its static region. Mapping the kernel code with read-only access permissions ensures that the ShieLD components in the RTOS kernel cannot be modified.
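The boot-time check described above comes down to three steps in a fixed order: measure the kernel's static region, compare the result with a reference value, and only then program a read-only MPU region over it. The sketch below shows that order under stated assumptions. FNV-1a is a non-cryptographic placeholder for the real measurement (a deployment would use a cryptographic hash such as SHA-256 or SHA-512). The reference value `golden` is assumed to be provisioned in the Secure world. `verify_and_lock_kernel` is a hypothetical name. Only the MPU_RBAR/MPU_RLAR bit positions follow the ARMv8-M layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit. NOT cryptographic: a stand-in for SHA-256/SHA-512. */
static uint64_t measure(const uint8_t *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ull;      /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ull;               /* FNV prime */
    }
    return h;
}

typedef struct { uint32_t rbar, rlar; } region_cfg_t;

/* Verify the kernel image, then describe a read-only MPU region from base
 * to the end of the 32-byte block containing limit. Returns false and
 * leaves *out untouched if the measurement does not match; the Normal
 * world should then not be started. */
static bool verify_and_lock_kernel(const uint8_t *image, size_t len,
                                   uint64_t golden, uint32_t base,
                                   uint32_t limit, region_cfg_t *out) {
    assert(len > 0 && (base & 0x1Fu) == 0 && limit >= base);
    if (measure(image, len) != golden)
        return false;
    /* AP = 0b11: read-only at any privilege; XN = 0 keeps the code executable. */
    out->rbar = base | (0x3u << 1);
    /* LIMIT | EN, memory attribute index 0. */
    out->rlar = (limit & ~0x1Fu) | 0x1u;
    return true;
}
```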
6.3.2 MPU Protection
As mentioned in Section 3.3, the MPU is programmable by privileged software, typically the RTOS kernel, which uses it to define access permissions and attributes for memory regions. This allows an attacker with privileged access in the Normal world to modify the MPU_NS configuration that protects the MPU_Vault, and then to manipulate the messages transferred via the MPU_Vault. To prevent this, ShieLD sets up MPU_NS so that it is writable/programmable only from the Secure world. In the ARMv8-M architecture, peripherals (such as the MPU) are accessed through read/write accesses to the memory address space using Memory-Mapped I/O (MMIO), i.e., the MPU registers are mapped to a memory region. Therefore, access to the MPU could be protected by the MPU itself, in the same way as any other memory access. ShieLD maps the MMIO address space of MPU_NS into an MPU region with read-only access permission, thereby preventing modification of MPU_NS by the RTOS or other privileged software running in the Normal world.
ShieLD versus SeCReT
As mentioned earlier, mobile and conventional IoT devices supporting TrustZone-A also have a vulnerable cross-world communication channel, since TrustZone does not guarantee message integrity or source authentication. SeCReT is a sophisticated solution that protects the communication between the Rich Execution Environment (REE) and the TEE by encrypting the messages exchanged between the two domains. SeCReT’s design is based on ARMv7-A, the 32-bit ARM A-profile architecture. SeCReT uses a session key when a Client Application (CA) and a Trusted Application (TA) communicate. Protecting the session key requires SeCReT’s involvement at different levels of the system, such as device boot, mode switches, interrupts, page table updates and the selected crypto libraries.
SeCReT runs in monitor mode, with the highest privilege in the system, and is responsible for creating the session key, maintaining the list of pre-authorized CAs allowed to invoke the TAs with the key, and storing all information relevant to session key management. SeCReT adds steps to the device’s secure boot sequence to calculate the hashes of the static regions of the authorized CAs and store them in the TEE. SeCReT also inserts trampoline code (i) at the beginning of user-mode exception handlers, to invoke SeCReT on switches between user and kernel mode and thus protect the integrity of the session key, and (ii) into the kernel code that handles process creation and termination. The trampoline invokes the SMC instruction with arguments that carry the information needed to maintain the session key, such as the process descriptor address and the kernel stack address. The trampolines are inserted in the kernel static region, which is protected by the kernel integrity monitor. The key value always resides in a designated memory page, and the permission of the page used for provisioning the key is configured as no-access. SeCReT saves the translation table base register (TTBR) in a structure, as an identifier for looking up the keys of the CA, together with the hashes of the CA static region and the value of the session key. SeCReT configures the access permission of the memory page using the Domain Access Control Register (DACR). The key only becomes accessible once the integrity of the CA has been verified. The integrity monitor also restricts page table updates in the REE, which protects the key. How the session key is used also depends on the libraries involved, and even simple operations could create copies of the key, leading to key compromise. SeCReT therefore instruments the crypto library used for encryption to prevent the key from being copied out of the protected memory, which prevents the stack from being used to cache part of the key during use.
SeCReT was designed for ARMv7-A, which supports the A-profile instruction set and differs from the ARMv8-M profile on which we build ShieLD. Most of SeCReT’s design choices target the protection of the session key in a multi-threaded environment where processes run with multiple contexts. Moreover, to protect the session key, SeCReT uses registers of the Virtual Memory System Architecture (VMSA), such as the DACR. ARMv8-M memory management is considerably simpler: there is no virtual memory, address translation or page management, all of which are part of SeCReT’s design considerations. Implementing SeCReT on ARMv8-M would require engineering effort in the following aspects: (i) exploring key management and secure key storage mechanisms for ARMv8-M devices, (ii) using the MPU and SAU rather than the DACR to manage memory accesses, (iii) designing/instrumenting crypto libraries to prevent duplication of keys, so as to give the same guarantees as SeCReT, (iv) identifying relevant locations for adding trampoline code in the interrupt vector tables, and (v) optimizing the design to compensate for the large overhead induced by key provisioning and message encryption on the M-profile architecture.
Implementation
In this section, we discuss the implementation of a ShieLD prototype on a real hardware board, which we use for a thorough experimental evaluation of the framework. The ShieLD design is based on ARMv8-M, the 32-bit ARM architecture for Cortex-M processors.
8.1 Runtime Environment
The ShieLD prototype builds on Trusted Firmware-M (TF-M) [43] on the Secure side, with CMSIS-RTOS2 as the Normal world OS. Approximately 156 lines of code (LOC) were written to modify the Normal world OS, and around 203 LOC were added to the TF-M code. TF-M provides a reference implementation of Secure world software for ARMv8-M [44]. It lays the foundations of a TEE by providing a set of secure run-time services such as secure storage, cryptography and attestation. Additionally, secure boot in TF-M ensures the integrity of the run-time software and supports firmware upgrades.
8.2 ShieLD Components
As shown in Fig. 5, ShieLD has several components, both in the Normal and Secure worlds. In this section, we describe the implementation of these components. We implemented the secure monitor as a runtime secure service on top of TF-M in the Secure world.
8.2.1 Interface
In our prototype, we implemented ShieLD services as secure functions. ShieLD provides ShieLD_veneers (see Fig. 5) as secure function APIs that NS tasks can call to use ShieLD services. The veneer gateway functions are marked with the nonsecure entry attribute (
Listing 2. Veneer Gateway Function Declaration
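A veneer of this kind could be declared as in the following sketch. It does not reproduce Listing 2. The function name `shield_veneer_setup_vault`, the status codes and the argument checks are hypothetical; only the `cmse_nonsecure_entry` attribute and the `__ARM_FEATURE_CMSE` macro come from the Arm C Language Extensions (bit 1 of the macro is set when compiling for the Secure state with `-mcmse`). The macro guard lets the same file compile on a host without CMSE support.

```c
#include <stdint.h>

/* Apply the Secure-entry attribute only when building Secure code with
 * -mcmse; elsewhere the macro expands to nothing. */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#define SHIELD_NSC_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define SHIELD_NSC_ENTRY
#endif

/* Hypothetical status codes returned across the world boundary. */
typedef enum {
    SHIELD_OK         =  0,
    SHIELD_ERR_DENIED = -1,
    SHIELD_ERR_ARG    = -2
} shield_status_t;

/* Hypothetical veneer: a Normal-world task asks the secure monitor to
 * set up an MPU_Vault covering [vault, vault + size). */
SHIELD_NSC_ENTRY shield_status_t shield_veneer_setup_vault(uint32_t vault,
                                                           uint32_t size);

SHIELD_NSC_ENTRY shield_status_t shield_veneer_setup_vault(uint32_t vault,
                                                           uint32_t size)
{
    /* The MPU works on 32-byte blocks, so reject anything else early,
     * including ranges that would wrap past the top of the address space. */
    if (size == 0u || (vault & 0x1Fu) != 0u || (size & 0x1Fu) != 0u ||
        vault > UINT32_MAX - (size - 1u))
        return SHIELD_ERR_ARG;

    /* A real veneer would also confirm that the range is Non-secure memory
     * (e.g. with cmse_check_address_range() from <arm_cmse.h>), then pass the
     * request to the App Verifier and the Access Control module. */
    return SHIELD_OK;
}
```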
8.2.2 App Verifier
To implement
8.2.3 Access Control
TF-M provides a partial implementation of an MPU driver API. We extend this mpu_armv8m_drv API for ShieLD’s Access Control module. The mpu_armv8m_drv API accesses the Non-secure MPU through MPU_BASE_NS, the Non-secure alias address of the MPU registers, which the Secure world uses to configure the MPU_Vault region on the NS side.
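The following sketch gives a rough idea of what the Access Control module writes through that alias for an MPU_Vault region. It does not use the mpu_armv8m_drv API. The base addresses match the CMSIS Cortex-M33 core header (`SCS_BASE_NS` = 0xE002E000, `MPU_BASE_NS` = `SCS_BASE_NS` + 0x0D90), and the field positions follow the ARMv8-M MPU_RBAR/MPU_RLAR layout. The per-register `*_NS` address macros, the `encode_vault` helper and the choice of AP = 0b01 with XN set are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Secure-side alias of the Non-secure System Control Space (CMSIS values). */
#define SCS_BASE_NS  0xE002E000UL
#define MPU_BASE_NS  (SCS_BASE_NS + 0x0D90UL)
/* Register offsets follow the ARMv8-M MPU register layout; these macro
 * names are hypothetical. */
#define MPU_RNR_NS   (MPU_BASE_NS + 0x08UL)
#define MPU_RBAR_NS  (MPU_BASE_NS + 0x0CUL)
#define MPU_RLAR_NS  (MPU_BASE_NS + 0x10UL)

typedef struct { uint32_t rbar, rlar; } mpu_region_t;

/* Encode a data-only vault of `size` bytes at `base`; both must be
 * multiples of 32. AP = 0b01 (read/write, any privilege), XN = 1 because
 * the vault holds messages, not code, and SH = 0b00 (non-shareable).
 * Returns false for empty, unaligned or wrapping ranges. */
static bool encode_vault(uint32_t base, uint32_t size, uint32_t attr_idx,
                         mpu_region_t *out) {
    if (size == 0u || (base & 0x1Fu) != 0u || (size & 0x1Fu) != 0u ||
        base > UINT32_MAX - (size - 1u))
        return false;
    uint32_t last = base + size - 1u;                   /* last byte covered */
    out->rbar = base | (0x1u << 1) | 0x1u;              /* BASE | AP | XN    */
    out->rlar = (last & ~0x1Fu)                         /* LIMIT             */
              | ((attr_idx & 0x7u) << 1)                /* AttrIndx [3:1]    */
              | 0x1u;                                   /* EN                */
    return true;
}
```

On hardware, the Secure world would select the region by writing its number to the RNR alias and then store `rbar` and `rlar` to the RBAR and RLAR aliases, with the Non-secure MPU disabled or the region not in use while it is being reprogrammed.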
Performance Evaluation
In this section, we empirically evaluate the performance of the ShieLD prototype. We perform a set of microbenchmarks on our Smart Door Lock App, which replicates the use-case scenario in Section 5. We choose the memory occupancy of the ShieLD code, latency and CPU time as our performance metrics, since memory and power consumption are the most constrained resources in low-power IoT devices. At a fixed clock frequency, the energy consumed by the ShieLD operations is directly proportional to their CPU time. We calculate the overhead induced on the application by enabling ShieLD services and discuss the application-independent and application-dependent results.
9.1 Experimental Setup
We evaluated the performance of ShieLD on the Musca-A2 Test Chip Board by Arm (Fig. 8). The Musca-A2 board implements the Arm CoreLink SSE-200 subsystem, featuring a dual-core Cortex-M33 with CPU0 enabled at 50 MHz [45]. We use TF-M and CMSIS-RTOS v2 with ShieLD components enabled for these experiments. Execution time is measured using the CoreSight debug port on the Musca-A2 Test Chip Board (see Fig. 8). The CoreSight debug infrastructure contains a 32-bit free-running counter that counts CPU clock cycles. This counter is part of the Data Watchpoint and Trace (DWT) unit, which we use to measure the execution time of our code. We repeat each measurement 10 times and our standard deviation is 2
The TrustZone-M-enabled IoT device used for implementation and evaluation: the Arm Musca-A2 Test Chip Board.
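The arithmetic behind these cycle-based measurements can be sketched as follows. On the target, the two samples would be read from the DWT cycle counter (exposed as `DWT->CYCCNT` in CMSIS-Core headers once trace is enabled in DEMCR and CYCCNTENA is set in DWT_CTRL). Here the helpers only convert raw counts, so they also run on a host. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CPU_HZ 50e6   /* CPU0 clock of the Musca-A2 in these experiments */

/* Cycles between two CYCCNT samples. Unsigned 32-bit subtraction stays
 * correct even if the counter wrapped once in between. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
    return end - start;
}

/* Convert a cycle count to microseconds at the given clock frequency. */
static double cycles_to_us(uint32_t cycles, double cpu_hz) {
    return (double)cycles * 1e6 / cpu_hz;
}

/* Mean of n repeated measurements (e.g. n = 10 runs), in microseconds. */
static double mean_us(const uint32_t *cycles, size_t n, double cpu_hz) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += cycles_to_us(cycles[i], cpu_hz);
    return n ? sum / (double)n : 0.0;
}
```

At 50 MHz, one microsecond corresponds to 50 cycles, and the 32-bit counter wraps only after 2^32 cycles (about 85.9 s). A single wrap during a measured interval is therefore handled by the unsigned subtraction, and multiple wraps cannot occur for intervals in the microsecond range.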
9.2 Communication Overhead of ShieLD and Crypto-Based Communication
Here we compare the total (round-trip) time of a single cross-zone interaction for (i) ShieLD-enabled communication, including the setup and release phases, (ii) crypto-based communication, and (iii) the current insecure (plaintext) communication. This evaluation primarily benchmarks the overhead of ShieLD on resource-constrained IoT devices.
Prior to our work, no solution existed that provides cross-zone secure communication in resource-constrained IoT devices. A cross-zone secure communication solution exists for powerful conventional TrustZone devices, namely SeCReT [4], which is discussed in Section 7. Due to platform-level differences, porting the entire mechanism of SeCReT to ARMv8-M-based IoT devices supporting TrustZone-M is infeasible. ARMv8-M devices have relatively limited resources compared to ARMv7-A devices, so the complete SeCReT solution, with its heavy use of cryptography, cannot be implemented on resource-constrained TrustZone-M devices. Therefore, for the sake of comparison, we only re-implement the cryptographic components of the SeCReT protocol that encrypt and decrypt messages on the Cortex-M platform. We take the essential components of a crypto-based communication mechanism, namely SHA-512 to ensure integrity and AES for encryption, and compare them with the confidentiality and integrity protection mechanisms of ShieLD.
The total time of ShieLD-enabled communication includes the execution time of verifying the integrity of the task, enabling the MPU_Vault, transferring the message and releasing the MPU_Vault on task termination. Unlike apps in traditional systems, tasks on IoT devices rarely terminate: the devices are usually continuously active, and tasks terminate only when a device is updated or rebooted.
For crypto-based communication, we enable only the confidentiality (encryption) and integrity (MAC) services. Fig. 9 compares the three communication modes for different message sizes. For crypto-based communication, the total execution time grows visibly with message size, because the time consumed by cryptographic operations is directly proportional to the size of the message being processed. In contrast, the overhead of ShieLD-protected communication remains insignificant as message size increases. An important feature of MPU-based protection is that the execution time of setting up and protecting a region is independent of the size of the message/region. Table 2 also compares these two communication modes and highlights the overhead for individual message sizes. Note that the overhead of crypto-based communication would increase further with authentication and key handling operations. This comparison shows that if the mechanism used for Cortex-A-based devices were replicated on IoT devices based on Cortex-M processors, it would provide the same security guarantees as ShieLD but with much greater overhead, owing to limited system resources and key management, which would slow down the entire system. In real-time OSes, slowing down system responses can have drastic effects on the safety and security of the device and the infrastructure.
Execution time (in microseconds) of ShieLD-enabled communication compared to crypto-based and plaintext cross-world communication.
9.3 ShieLD Overhead on Smart Door Lock Application
We measured the total overhead of the ShieLD protection mechanism in terms of CPU time during execution of the Smart Door Lock App. We considered the operations and control flow of the application described in Section 5. With ShieLD services enabled, the application lifecycle includes additional steps: (i) the Smart Door Lock App requests ShieLD to set up an MPU_Vault in the Normal world; (ii) ShieLD sets up an MPU_Vault for use by the Smart Door Lock App; (iii) when the Fingerprint Sensor receives input, the Smart Door Lock App requests the Verify Fingerprint service to verify the fingerprint, and the response is returned to the Smart Door Lock App through the allocated MPU_Vault; and (iv) the MPU_Vault is released on task termination.
CPU Time Overhead of ShieLD. The major overheads of the proposed solution are application-independent, as the cost of the individual operations (context switch, task authentication, MPU configuration, and memory alignment and allocation) remains the same irrespective of application behaviour and execution. The evaluation results for the Smart Door Lock App are based on the interactions of the application with the secure services. The overhead on the system due to MPU_Vault setup and release is 946
Memory Overhead of ShieLD. As resource-constrained IoT devices have very limited storage, our ShieLD code is highly optimized and increases the firmware binary (TF-M) size only by 100kb, which is 0.04% of the total TF-M binary. The memory overhead of ShieLD components is independent of the applications requesting ShieLD services.
9.4 CPU Time to Setup MPU_Vault
We measured the total CPU time to set up the MPU_Vault, which involves the following sequence of operations: (i) a task in the Normal world allocates a 32-byte-aligned memory region to be configured as the MPU_Vault and calls the secure monitor in the Secure world (MPU_Vault allocation request); (ii) the task context is saved, the control flow is transferred to the secure monitor, and the secure context is restored; (iii) the app verifier module calculates the hash of the task’s code binary using SHA-512, retrieves the task information from the tskLst and compares it against the hash values in HL (task authentication); (iv) the access control module configures an MPU region for the MPU_Vault; and (v) the Secure world context is saved on the secure stack, the execution context of the NS task is restored, and the control flow is transferred back to the task. The time of each individual operation is shown in Fig. 10, where context switch corresponds to the cumulative execution time of steps (ii) and (v). The total time required to set up the MPU_Vault is 878
Execution time (in microseconds) of operations to setup the MPU_Vault before ShieLD-based communication can begin: (i) Memory alignment and allocation for the MPU_Vault, (ii) Context switch between the secure and non-secure world, (iii) Task integrity verification and source authentication using HL and (iv) MPU configuration to set appropriate permissions
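Step (i) of this sequence, the 32-byte-aligned allocation, can be sketched with C11 `aligned_alloc`. The helper names are hypothetical. On an RTOS target, a statically allocated `_Alignas(32)` buffer or the RTOS allocator may be more appropriate than the C heap. What matters is that both the start address and the size are multiples of the 32-byte MPU granularity.

```c
#include <stdint.h>
#include <stdlib.h>

#define VAULT_ALIGN 32u   /* ARMv8-M MPU region granularity in bytes */

/* Round a requested message size up to whole 32-byte MPU blocks. */
static size_t vault_size(size_t msg_len) {
    return (msg_len + VAULT_ALIGN - 1u) & ~(size_t)(VAULT_ALIGN - 1u);
}

/* Allocate a vault whose start address and size both satisfy the MPU's
 * 32-byte granularity. aligned_alloc requires the size to be a multiple
 * of the alignment, which vault_size guarantees. Returns NULL for an
 * empty or oversized request, or if the allocation fails. */
static void *vault_alloc(size_t msg_len, size_t *out_size) {
    if (msg_len == 0u || msg_len > SIZE_MAX - (VAULT_ALIGN - 1u))
        return NULL;
    size_t sz = vault_size(msg_len);
    void *p = aligned_alloc(VAULT_ALIGN, sz);
    if (p != NULL && out_size != NULL)
        *out_size = sz;
    return p;
}
```

Rounding the size up, rather than rejecting odd-sized messages, keeps the request simple for the task. The cost is at most 31 bytes of padding per vault.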
9.5 CPU Time to Protect MPU_Vault
Here we measured the CPU time to protect the MPU_Vault in case of an interrupt. As mentioned in Section 6.2, every time an interrupt occurs, ShieLD performs a set of operations to protect the MPU_Vault: (i) redirect the control flow to the Secure world using trampoline code before the interrupt handler executes; (ii) change the access permission of the MPU_Vault by reconfiguring MPU_NS; (iii) save registers onto the secure protected stack for control-flow integrity; and (iv) return the control flow to the interrupt handler routine. These operations are also independent of the currently executing application. The average CPU time for the ShieLD protection mechanism during interrupt handling is 185
Scalability of ShieLD in Resource-Constrained IoT
In this section, we discuss the scalability of ShieLD services as the number of applications increases, from three perspectives: (i) availability of RAM, (ii) availability of MPU regions and (iii) size of MPU regions. IoT devices based on Cortex-M processors are designed to run fewer applications than high-end devices (based on Cortex-A). With kilobytes of RAM, these devices run 1-3 applications on average. As long as system memory accommodates additional applications, ShieLD services remain applicable. Since MPU-based protection is not bound to a fixed memory range, i.e., an MPU can protect a memory region ranging from 32 bytes to 4 GB in size [26], increasing the MPU_Vault size does not entail any performance overhead for ShieLD. As shown experimentally in Section 9.4, the performance overhead of enabling an MPU-protected region does not increase with region size. ShieLD services are therefore limited only by the availability of MPU regions. Depending on the processor implementation, 8-16 memory regions can be protected using the MPU. Considering that ShieLD uses the MPU to protect the MPU_Vault, the requesting task’s code and stack, and the RTOS static region, ShieLD can provide secure communication services to 3 applications simultaneously if the system has 16 available MPU-protected memory regions.
Security Analysis
We assume that an attacker with privileged access can compromise any component running in the Normal world, including ShieLD components, to launch various attacks on the MPU_Vault, which resides in the Normal world. The legitimate application (i.e., the owner of the MPU_Vault) is not itself malicious, but can be compromised. We also assume that an unprivileged task can interrupt ShieLD operations and try to access the MPU_Vault contents. Our assumption that the Secure world is intact and inaccessible to any attacker still applies. We conduct a security analysis of the system under these attacker assumptions using an attack tree (Fig. 11). The attack tree itemizes the attack vectors and provides a higher level of abstraction for the security analysis of ShieLD. The root of the tree represents the attacker’s final goal, which is to compromise the MPU_Vault by gaining access to it and modifying its contents (G). The child nodes represent the possible attacks or sub-goals. For a parent attack to succeed, at least one of its child attacks needs to succeed. Fig. 11 also lists a few (out of many) instances of vulnerabilities (V) that can be exploited to gain a bare minimum level of control within the Normal world, e.g., by including malicious software in the software stack of the device. The list of vulnerabilities is not exhaustive; it is a subset of all the vulnerabilities that the Normal world software might have. Some vulnerabilities are specific to particular attacks, such as using the debug facilities to interrupt execution, while others are generic and equally applicable to most of the attacks.
ShieLD’s threat analysis using an attack tree to itemize the attack vectors. The root of the tree represents the main goal of the attacker, which is to access and modify/overwrite the contents of the MPU_Vault. The child nodes represent attacks or sub-goals that are needed to achieve the main goal. In this attack tree, if one of the child attacks is successful, the parent attack is successful (OR condition). Each attack is possible by exploiting some vulnerability in the Normal world; a brief list of vulnerabilities is shown on the right side of the figure.
Now, we discuss the attacks one by one (A1 to A9 in Fig. 11) and how ShieLD’s mechanisms are designed to defend against each of them. For each attack, we argue that the defense mechanisms are sufficient and explain why the attack fails, so that the expected security guarantees of ShieLD hold. The attacks A1, A2, A3, A5 and A6 are possible if an attacker has privileged access in the Normal world (e.g., by compromising the Normal world OS). The privileged attacker may then exploit a legitimate task’s code (A1) by injecting a carefully crafted code sequence (A6) or modify the
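The OR semantics of such an attack tree can be made concrete with a small recursive evaluator. The node type and the toy tree used to exercise it are hypothetical and do not reproduce Fig. 11. A leaf is marked as succeeding when its underlying vulnerability remains exploitable despite ShieLD's defenses, and the goal G is reached if any path of succeeding children leads to it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Node of an OR attack tree. Leaves carry their own outcome; an inner
 * node (goal or sub-goal) succeeds if any of its children succeeds. */
typedef struct attack_node {
    const char *label;                         /* e.g. "G", "A1", "A2" */
    bool leaf_succeeds;                        /* used only for leaves */
    const struct attack_node *const *children;
    size_t n_children;
} attack_node_t;

static bool attack_succeeds(const attack_node_t *n) {
    if (n->n_children == 0)
        return n->leaf_succeeds;
    for (size_t i = 0; i < n->n_children; i++)
        if (attack_succeeds(n->children[i]))
            return true;                       /* OR: one child is enough */
    return false;
}
```

With this encoding, showing that the root fails means showing that every leaf fails, which is why the analysis below addresses each attack in turn.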
11.1 Security Analysis of ShieLD versus SeCReT [4]
This section compares ShieLD’s and SeCReT’s [4] respective approaches to ensuring (i) confidentiality of the transferred messages, (ii) integrity of the system components and (iii) availability of the secure communication service. ShieLD protects message confidentiality using the MPU: since MPU-protected regions are neither accessible nor modifiable by unauthorized software, message confidentiality is preserved. SeCReT, on the other hand, uses cryptography for message confidentiality. ShieLD protects the integrity of its system components like the MPU, trampoline code placed in the Normal world, privileged Normal world software and the
Limitations
In this section, we discuss some limitations of ShieLD, which also lay the groundwork for future work and improvements. The security guarantees of ShieLD rely on a correct implementation of the TrustZone-M specifications. There are precise guidelines for implementing TrustZone-M capabilities, as incorrect implementations could collapse the TEE. ShieLD’s guarantees hold only as long as TrustZone-M remains resilient against all classes of physical and side-channel attacks. Secure application design and implementation play a crucial role in overall system security: applications placed in TEEs should be free of exploitable bugs, and ShieLD relies on application developers in this regard. Another limitation is that ShieLD does not incorporate code analysis capabilities. As a result, the system remains prone to vulnerabilities such as missing input validation and system bugs leading to buffer overflows. Such flaws in the secure code can be exploited to inject maliciously crafted parameters to install rootkits, which are extremely hard to detect. Since secure code placed in the TEE has access to the entire secure software stack, hosting third-party applications in the TEE could lead to eavesdropping by curious applications; this is an interesting research problem and part of our future work. The number of applications ShieLD can protect on a system depends on the available resources. As mentioned in Section 10, the limited number of MPU regions (8-16 depending on the platform) bounds the number of applications ShieLD can protect, which is a further limitation.
Conclusion
We have presented ShieLD, a framework that enables secure communication between the two worlds of TrustZone-M within an IoT device, even in the presence of a vulnerable software stack (including the Normal world OS). ShieLD makes novel use of the MPU to provide a secure vault that is exclusively accessible to the legitimate Normal world application that needs to access and execute security-critical operations in the Secure world. ShieLD provides security services (authentication, confidentiality and integrity) similar to those of conventional crypto-based secure communication. We have implemented ShieLD on a TrustZone-M-enabled IoT device and evaluated its memory and execution time (which translates to power/energy) overhead. Our empirical evaluation shows that ShieLD is extremely efficient compared with crypto-based communication protection. Although ShieLD targets IoT devices featuring TrustZone-M, the techniques proposed in this paper could be extended to other TEEs. We plan to extend this work to IoT devices that use the RISC-V architecture, exploiting the Physical Memory Protection (PMP) [47] unit of RISC-V.