Confidential Machine Learning Computation in Untrusted Environments: A Systems Security Perspective

As machine learning (ML) technologies and applications rapidly change many computing domains, security issues associated with ML are also emerging. In the domain of systems security, many endeavors have been made to ensure ML model and data confidentiality. ML computations are often inevitably performed in untrusted environments and entail complex multi-party security requirements. Hence, researchers have leveraged Trusted Execution Environments (TEEs) to build confidential ML computation systems. We conduct a systematic and comprehensive survey by classifying attack vectors and mitigations in confidential ML computation in untrusted environments, analyzing the complex security requirements in multi-party scenarios, and summarizing the engineering challenges in confidential ML implementation. Lastly, we suggest future research directions based on our study.


I. INTRODUCTION
The recent advancements in machine learning (ML) and its applications are bringing far-reaching changes across many fields in computing. Many areas of study have made endeavors to improve ML in various aspects or have adopted ML for many purposes. However, security issues pose a formidable threat to ML-based technologies and services. For instance, the robustness of ML models and derived services against malicious actors is under discussion in the context of adversarial machine learning [1][2][3][4][5][6]. Also, existing research has shown that the data used during training can be leaked through inference results [7][8][9][10]. Protecting ML models and data has been among the most important security objectives in ML computations. The data used for training ML models often includes privacy-sensitive information in large volumes. Hence, failure to maintain the confidentiality of the data can lead to catastrophic consequences [11]. ML models, in turn, are often the intellectual property of service providers.
Secure computation of machine learning workloads has also been a main topic of interest in systems security for many years. A plethora of works in the realm of systems security has explored the security risks and solutions in protecting the confidentiality of protected assets such as ML models, programs, and data during computation [12][13][14][15][16]. Confidential ML computation is the most prominent approach in such endeavors. Confidential ML computation adopts trusted execution environments (TEEs) [17,18] to protect the confidentiality of protected assets such as the data and ML model, as well as the computations performed on them. TEEs are hardware-supported technologies that can be leveraged to protect sensitive code and data. TEEs have long been adopted in implementing confidential large-data computations [19][20][21][22]. More recently, a number of studies have tackled the unique challenges in TEE-based confidential ML computation [15,[23][24][25][26].
We provide a systematic and in-depth review of the current state of confidential ML computation through our comprehensive review of the existing works. We identified three prominent topics in confidential ML computation that the accumulated contributions have shaped. We classify and analyze the existing works in each of the three topics: threats and mitigation of attack vectors in untrusted computation environments, multi-party confidential ML computation, and retrofitting software and hardware architecture for confidential ML computations.
Regarding the first topic, we analyze the security risks in terms of attack vectors and their ramifications for the confidentiality of ML computations ( § IV). ML computations commonly must be performed in untrusted environments; the widespread use of cloud computing resources for data-intensive ML computations is the most common example. ML computations are performed in untrusted environments ranging from the cloud to edge and end-point devices, hence requiring protection from TEEs. However, applying TEEs to sensitive computations is not without security risks. TEE-protected workloads in untrusted environments such as the cloud face a large attack surface. Many works have shown that the confidentiality of TEEs can be compromised through various side-channel attacks [27][28][29][30][31]. There are also attacks that specifically target TEE-based ML computations [30,[32][33][34][35][36]. We also discuss vulnerabilities that might compromise confidential ML computations on the edge ( § V).
Secondly, we review the existing works in the context of multi-party scenarios and their security requirements ( § VI). Confidential ML computations often must satisfy security requirements for mutually distrusting parties simultaneously [12][13][14]. The participating entities of the computation, such as the ML model owner, data contributor, computation platform provider, and service user, are not necessarily the same party. Furthermore, their interests and security risks are frequently at odds, presenting unique security requirements that a confidential ML computation scheme has to satisfy. Such multi-party confidential ML schemes often require more than the simple application of TEEs; rather, a comprehensive framework specific to each scenario must be designed with careful security requirement analysis [12,13,[37][38][39].
Third, we discuss the engineering efforts that resolved the implementation challenges in building confidential ML computation ( § VII). Resolving engineering challenges in confidential ML computation designs and implementations has been discussed in many works. For instance, ML computations on large data had to be split into smaller batches to overcome the limited memory capacity of TEEs [23,40]. Additionally, porting ML frameworks (e.g., TensorFlow and PyTorch) to use TEEs is a daunting challenge, considering the complexity of such software.
This survey reviews over 140 works that contributed to building secure systems for ML and similar data computation, focusing on the two most prevalent TEEs: Intel SGX for cloud computing and ARM TrustZone for edge computing. We summarize the contributions and insights of this paper as follows:
• We conduct a thorough survey of attack vectors in confidential ML computation on an untrusted computation platform.
• We categorize the attack vectors on confidential ML computation through our study of existing attacks. We also discuss available mitigations.
• We generalize the TEE-based multi-party ML computation schemes into four scenarios and identify per-party security requirements. Also, we explain the contributions of the existing works according to our generalization.
• We summarize the engineering efforts that seek to optimize software and hardware for secure ML computations.
• Based on our thorough survey, we point out the relatively underexplored topics.

II. BACKGROUND AND RELATED SURVEYS
In this section, we explain the motivation for confidential ML computation that is commonly shared among the literature that we cover. Also, we describe the key techniques and terminologies that are essential for comprehending the contributions of the works that we review. Lastly, we explain the scope of the survey and its uniqueness with respect to existing surveys on the security of ML.

A. CONFIDENTIALITY IN ML COMPUTATION
Confidential ML computation seeks to protect the confidentiality of the assets involved in ML computations in untrusted computing environments. The protected assets can include data, ML model(s), ML programs, and by-products of the computation that can indirectly undermine the confidentiality of the protected assets. In ML computation, it may be desirable to maintain the confidentiality of the data used for training, the program used for training, or the resulting ML model. The ML computations that can be subject to confidentiality protection are model training and model inference.
Need for confidential ML computation. ML computations often inevitably take place in untrusted environments for several reasons. First, ML computations are often performed in public cloud services due to their convenience. Utilizing cloud computing services is cost-efficient compared to in-house computation infrastructures in many cases. Second, multiple parties may contribute to an ML process [12,13,37]. For instance, the owner of the data and the party that trains and claims ownership of the resulting ML model may be different parties that still want confidentiality for their respective assets. Third, computation platforms can be bound to a specific location due to service quality or security policy. A service that utilizes the results of inferences using an ML model may be latency-sensitive (e.g., a self-driving automobile). In such cases, the ML model may have to relocate to the location of the service for reliability and service quality. Similarly, a security policy may restrict the data from being removed from a designated infrastructure or location (e.g., a medical institution). This forces the ML model or ML program of a different party to be stored in an environment that is untrusted from the model owner's perspective. For these reasons, confidential ML computation methods are necessary for securing the valuable assets of ML computation in untrusted environments such as the cloud. Commercial confidential cloud computing services [41][42][43] that support ML computations reflect customer demand and an emerging market for confidentiality guarantees on cloud workloads.

B. TRUSTED EXECUTION AND CONFIDENTIAL ML COMPUTATION
Confidential computing schemes that leverage trusted execution environments have been actively explored in many works to protect data and code execution [19,20,23,44]. While cryptographic methods (e.g., homomorphic encryption) can achieve similar security goals, many academic works and existing services adopt TEE-based schemes due to their practicality in terms of performance [19].
CPU support for trusted execution. Trusted execution environments (TEEs) included in commodity processor architectures commonly provide hardware memory protection mechanisms that enable the isolation of in-TEE program code and data. The isolated code and data become accessible only when the current context is in a trusted execution mode. Hence, the usual programming model using TEEs is to split a program into trusted and untrusted domains. The trusted domain is protected by the hardware mechanisms and only accessed through a strictly controlled interface from the untrusted domain. The protected code inside the TEE and the trusted hardware form a Trusted Computing Base (TCB). Intel Software Guard Extensions (SGX) [17] and ARM TrustZone (TZ) [18] are the most prevalent TEEs, adopted by existing commercial devices and in many research works.
Intel SGX. SGX [17] introduces a set of hardware extensions to the x86 architecture to support isolated memory spaces and an execution mode called enclaves. Each process can create its own enclave to store its sensitive code and data. The memory pages that belong to enclaves, called the Enclave Page Cache (EPC), are protected by hardware from all privileged software such as the OS kernel, hypervisor, and even the BIOS. A context can enter an enclave only through the strictly controlled interface defined by Enclave Calls (ECALLs). SGX's security model defines only the in-enclave program and the CPU as the TCB. Due to the dominance of Intel processors in the server market, SGX is adopted in many in-cloud confidential ML computation schemes.
ARM TrustZone. ARM TrustZone (TZ) [18] is commonly used on mobile and edge devices due to the prevalence of ARM processors in that market. TZ divides the processor states into the secure world and the normal world. While SGX excludes the kernel from its TCB, TZ's secure world also includes a secure kernel. Hence, the TCB of TZ is the whole software stack in the secure world, including the trusted apps and the trusted kernel. Hardware mechanisms strongly enforce the isolation between the two worlds. Although TZ employs coarser-grained isolation (two security states) compared to Intel SGX (per-process enclaves), recent proposals also introduce the capability to host secure enclaves on TZ without requiring hardware modifications [45,46].
TEE designs for RISC-V. Apart from SGX and TZ, there are also several research proposals for other ISAs. In particular, Sanctum [47], Keystone [48] and CURE [49] are TEE architectures designed for RISC-V.
TEEs and confidential computing. Over the years, many works have leveraged TEEs for confidential computing systems. Haven [50] enables two-way protection of applications by first employing SGX hardware to protect the code and data inside the enclave, then utilizing software containers to isolate untrusted and unmodified binaries from the host system. Subsequent works follow the same approach for two-way isolation [14,[51][52][53][54]. SCONE [51] enables Docker [55] services to be protected by SGX. Ryoan [14] prevents untrusted data-processing modules from leaking information by utilizing NaCl containers [56]. Finally, Graphene-SGX [52] provides an efficient library OS for SGX based on Graphene [57]. There are also works that utilize TEEs to protect distributed computation (e.g., MapReduce and Spark), which requires coordination among enclaves on multiple systems [19,21,22]. All the works mentioned above provide a strong baseline on which secure ML systems using TEEs can be built.
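The split programming model described above can be illustrated with a conceptual sketch. This is plain Python standing in for the hardware mechanism, not the SGX SDK; the `Enclave` class, `ecall` method, and the toy `score` function are our own illustrative names:

```python
# Conceptual model of the TEE-style trusted/untrusted split.
# All names are illustrative; real enclaves are built with vendor SDKs,
# where the trusted interface is declared explicitly (e.g., SGX EDL).

class Enclave:
    """Trusted domain: holds protected state, reachable only via ECALLs."""

    def __init__(self, secret_key: bytes):
        self._secret_key = secret_key          # protected data (EPC-resident)
        self._ecalls = {"score": self._score}  # strictly controlled interface

    def _score(self, inference_input: list) -> int:
        # Trusted computation that uses protected state without exposing it.
        return sum(inference_input) % 97       # toy stand-in for a model

    def ecall(self, name: str, *args):
        """The only legal way to enter the trusted domain (an ECALL)."""
        if name not in self._ecalls:
            raise PermissionError(f"no such ECALL: {name}")
        return self._ecalls[name](*args)

# Untrusted host code: may invoke ECALLs, but in the hardware model it
# has no defined way to read enclave memory directly.
enclave = Enclave(secret_key=b"model-weights")
result = enclave.ecall("score", [3, 5, 8])
```

The point of the sketch is the narrow call gate: the untrusted side sees only the whitelisted entry points, mirroring how real TEEs restrict entry to declared ECALLs.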

C. ALTERNATIVE APPROACHES TO TRUSTED EXECUTION
Besides hardware support for trusted execution, there exist techniques that serve as building blocks for confidential ML computation. Our survey does not focus on those approaches, since they are already covered by several surveys. For instance, the survey in [58] reviews privacy-preserving cryptographic techniques to protect privacy in deep learning. The survey by Sagar et al. [59] focuses on cryptographic approaches for protecting confidential ML computation on untrusted platforms. Finally, Ji et al. [60] provide a comprehensive review of the applications of differential privacy techniques in machine learning. Therefore, we only explain the techniques briefly here.
Homomorphic Encryption. Homomorphic encryption (HE) schemes are cryptographic schemes that allow computation to be performed on encrypted data without decrypting it. Fully homomorphic encryption (FHE) allows arbitrary computations on encrypted data, thereby allowing computations to be performed without breaking the confidentiality of the data. However, due to its high performance overhead, many recent works have resorted to TEEs as a practical alternative [19].
VOLUME 4, 2016
Secure Multi-party Computation. Secure multi-party computation (MPC) [61][62][63][64] is a confidential computing approach that employs cryptographic protocols to split the computation on data shared between multiple parties such that no individual party can see another party's data. MPC mechanisms are supported by various cryptographic building blocks, such as secret sharing, coin tossing, oblivious transfer, and zero-knowledge proofs. In this survey, we only discuss TEE-based multi-party computation and leave cryptographic multi-party computation out of scope.
Differential Privacy. Differential privacy (DP) places a constraint on the algorithms used to publish aggregate information about a statistical database, limiting the disclosure of private information of records in the database.
For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring the confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. DP is essentially an algorithmic approach to data handling and is thus orthogonal to the use of TEEs. In fact, multiple TEE-based confidential ML systems apply DP to enhance data privacy [65,66].
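As a concrete illustration, here is a minimal sketch of the Laplace mechanism, the canonical DP building block, applied to a counting query. The function name and data are invented for illustration; production systems should rely on vetted DP libraries:

```python
import random

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two Exp(epsilon) variates is Laplace with scale 1/epsilon.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 61, 38]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

The released value stays close to the true count of 3 with high probability, yet any single individual's presence is statistically masked.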

D. SCOPE OF THIS STUDY AND SIMILAR RELATED SURVEYS
Our survey focuses on the systems endeavors that seek to protect the confidentiality of ML models, data, and ML programs in ML computations. We systematically analyze the in-system attack vectors that might compromise confidential ML computation protected by TEEs and their proposed mitigation methods. We also discuss the case of multi-party computation, where multifold security guarantees have to be satisfied simultaneously for parties with different interests. Furthermore, we survey the problems that one faces when implementing practical TEE-based confidential ML systems. These problems range from current memory capacity limitations to developer-unfriendly TEE programming environments. We do not focus on adversarial attacks [67][68][69], which aim to harm the integrity of ML services.
To the best of our knowledge, a comprehensive survey of secure ML computation from a systems security perspective has not been conducted. Although some surveys covering the non-system security aspects of machine learning have been presented recently, those surveys focus on a specific type of deployment environment [70,71] or a technique used to achieve security (e.g., differential privacy [60] and cryptography [58,59,72]). Mireshghallah et al. [72] discuss the privacy issues of deep learning systems, but primarily focus on indirect threats and algorithmic and cryptographic defenses. A survey by Meurisch et al. [73] discusses both the systemic and algorithmic challenges of data protection in general AI services. Our survey, on the other hand, discusses the protection of all valuable assets and is specific to ML computation.

III. DEFINING ENTITIES AND ASSETS IN CONFIDENTIAL ML COMPUTATION
The varying computation environments (e.g., cloud vs. edge) and the numerous entities involved in the computation, protected assets, and security goals complicate the security model of confidential ML computations. Hence, unified notations are essential for consolidating the contributions made by works with different attacker models and goals into a big picture that shows the current status of confidential ML computation. We establish a list of notations that will be used throughout this paper so that we can explain and classify the works consistently. Table 1 describes the notations we use to classify the entities participating in ML computation and the protected assets processed by the ML computation. We provide an in-depth explanation of the entities and protected assets in subsections § III-A and § III-B.
Overview of the ML workflow. Figure 1 describes an overview of the general ML workflow, annotated with our notations. Most ML computations start at the data collection phase, where data used for training is collected into a data set. Typically, the data set is preprocessed, i.e., is cleaned and transformed to a form that is easier to process by the ML algorithm. Then, the computation enters the model training phase, where a training program containing the logic for the ML algorithm (e.g., deep neural network (DNN)) fetches batches of training data to calculate the gradient update.
The training program uses gradient updates to update the ML model until it has processed all training data or met a predetermined stopping criterion. Finally, after training finishes, the trained ML model is deployed for model inference, where the model is employed to perform predictions. Commonly, the model is deployed in an inference service executed in the cloud that receives users' inference data as queries through an API endpoint. On receiving queries, the service feeds the inference data to the trained ML model to obtain the prediction results, then sends the results back to the users. As we discuss in § VI, there are scenarios where model inference services must be deployed to the edge and users' devices to perform the computation.
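The train-then-predict workflow described above can be sketched in miniature. The one-parameter least-squares model, the batching scheme, and the fixed epoch budget below are stand-ins for what a real ML framework would provide:

```python
def train(data, lr=0.01, epochs=100, batch_size=2):
    """Fit y ≈ w * x by mini-batch gradient descent."""
    w = 0.0
    for _ in range(epochs):                        # stopping criterion: epoch budget
        for i in range(0, len(data), batch_size):  # fetch batches of training data
            batch = data[i:i + batch_size]
            # gradient of mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad                         # gradient update
    return w

def predict(w, x):
    """Model inference: apply the trained parameter to new data."""
    return w * x

training_data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)]
model = train(training_data)
```

For this data the loop converges to a slope near 2, after which `predict` plays the role of the deployed inference service.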

A. ENTITIES INVOLVED IN ML COMPUTATION
E-1: Data Owner. Data owners are the ones who contribute data to train an ML model (training data owners, E-1-a) or send their private data to the model owner to request inference (inference data owners, E-1-b). In most cases, data owners are users of an ML service. The data owner and model owner can be the same entity, as in the case of offloading ML computation to the cloud.
If the data owner is different from the model owner and platform provider, for instance, when a user submits their data to a cloud-deployed ML model for prediction, the data owner wishes to preserve the privacy of their data. Such privacy must be protected even when the data is used by the ML model for training and inference. This means that: (1) the ML program that handles data for training or inference must respect its privacy, and (2) the resulting model must not contain traces of information that lead back to the data owners or violate their privacy (e.g., the model must be differentially private). Both constraints have been addressed by several related works, using TEEs [13,14,65] and using differential privacy techniques [74].

ID   Entity                       Description
E-1  Data Owner                   The one who contributes data to train a model (E-1-a), or the user who sends private data to an ML service for inference (E-1-b)
E-2  ML Model/Program Owner       Model owner who desires confidentiality of her model and program
E-3  Computation Platform Owner   The entity who has full control over the device on which ML computation is performed (e.g., cloud service provider, device owner)

TABLE 1. The notations (IDs) for the entities, protected assets, and computations in confidential ML computation and their descriptions. We explain existing literature and confidential ML scenarios in terms of these notations to maintain consistency and enable comparisons of existing works.

E-2: Model Owner. Model owners are the entities that possess the intellectual property (IP) of an ML model. In most cases, model owners want to keep their models private, as leaking the model can lead to a detrimental loss in profit for a business or introduce privacy risks. Researchers have demonstrated that sensitive information about training data can be extracted from pretrained ML models [75,76]. As a result, some ML services provide only limited access to the model through API queries. When a model is deployed for inference, the inference data owner could be considered an adversary that wishes to steal the underlying ML model. In this scenario, the model owner also desires that no information about the model be leaked during the ML inference process.
E-3: Platform Provider.
As training modern ML models often requires significant computation power, model owners and data owners often offload the computation to services that provide specialized infrastructure for ML workloads: the platform provider. In the most common case, the platform provider is a cloud service provider that hosts the system running the computation. However, as ML services are increasingly deployed to users' devices, the platform provider can also be the user of the service.

The platform provider is commonly untrusted by the parties that use it; however, with the introduction of trusted hardware such as Intel SGX [17] and ARM TrustZone [18], remote clients of untrusted platforms can establish trusted and secure execution environments, sometimes referred to as enclaves, on the platform.

B. PROTECTED ASSETS IN ML COMPUTATION
P-1: Data. Data is at the center of building and servicing ML models; data defines the model's behavior and the quality of the trained model. In ML training, data is usually aggregated into a data set, a collection of related data used for a specific purpose. Two types of data are utilized in an ML workflow: training data, which is employed to optimize the model, and inference data, which is used to obtain prediction results and never affects the trained model.
Data used for ML training and inference often contains privacy-sensitive information that directly or indirectly relates to thousands or even millions of individuals. Securing the confidentiality of the data is becoming an important ethical responsibility for research institutions and corporations as their data expands in spectrum and depth; moreover, failure to secure the confidentiality of the data may result in costly lawsuits.
P-2: ML Models. An ML model commonly consists of a model architecture and model parameters. The model architecture refers to the metadata associated with the topology of an ML model (e.g., the layers of the neural network, the connections between layers, or the activation functions). The model parameters, sometimes referred to as model weights, are tuned as a result of model training. The model parameters and architecture are typically stored, separately or together, by ML frameworks in a compact binary format such as Protocol Buffers [77] or HDF5 [78] that can be efficiently loaded by applications later for inference. Efforts have also been made to make model files more portable across platforms through the ONNX open format [79].
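The separation between architecture metadata and learned parameters can be illustrated with a toy example. JSON is used here as a readable stand-in for the compact binary formats named above, and the layer and weight names are fabricated:

```python
import json

# Architecture metadata: the topology of the model, no learned values.
architecture = {
    "layers": [
        {"type": "dense", "units": 4, "activation": "relu"},
        {"type": "dense", "units": 1, "activation": "sigmoid"},
    ]
}

# Model parameters (weights), tuned during training.
parameters = {
    "dense_0/kernel": [[0.2, -0.1, 0.4, 0.05]],
    "dense_1/kernel": [[0.7], [-0.3], [0.1], [0.9]],
}

# Frameworks store these compactly (Protocol Buffers, HDF5, ONNX); a
# JSON round-trip shows the same separation in a human-readable form.
blob = json.dumps({"architecture": architecture, "parameters": parameters})
restored = json.loads(blob)
```

Both parts must be protected: the parameters embody the training investment, while the architecture alone can already reveal proprietary design choices.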
Developing and training mature ML models often requires a tremendous amount of effort, and models are therefore considered intellectual property to be protected unless specifically declared public. Building a model requires a well-thought-out data modeling plan and a possibly long period of data acquisition. Designing and training an ML model requires highly trained data scientists, not to mention the cost of computing infrastructure (e.g., cloud computing resources).
P-3: ML Program. The ML program describes the procedures for handling ML models and data, and the ways in which the ML model interacts with data. Model hyperparameters are typically defined by the ML program instead of residing in a model file. Hyperparameters refer to information associated with the learning process, such as the learning rate, batch size, and regularization factors. Since the program also contains the trade secrets of an ML service provider, such as the method of processing data or how users' input is handled, it also needs to be kept private.

IV. SECURING OFFLOADED ML COMPUTATIONS IN THE CLOUD
Data-intensive ML computations are becoming one of the most common workloads in cloud computing services. Due to Intel's dominance in the cloud computing market, a plethora of works has leveraged Intel SGX [17] to achieve confidential computation in the cloud [13,14,21,23,95]. For instance, Microsoft Azure offers SGX-enabled computing resources [41].
Many works that discuss building secure ML computation in the cloud seek to protect at least one of the following: data (P-1), the ML model (P-2), and the ML program (P-3) offloaded by the client. Also, many works assume that a single entity has ownership (E-1, E-2, E-3) of all three protected assets (we discuss multi-party computation separately in § VI). The offloaded workload, while protected by SGX, faces formidable adversaries. An untrusted cloud service infrastructure can launch powerful attacks with system software (e.g., kernel) privilege [96,97] or even physical access to the hardware [98,99]. An untrusted co-tenant may launch attacks that abuse the resource sharing that inevitably occurs in the cloud [30]. Table 2 outlines the reviewed offensive and defensive research works related to ML computations, categorized by the attack vectors (AV) in untrusted environments. We group the attack vectors into three categories: attack vectors caused by untrusted software (SW), attack vectors obtained through physical access, and accelerator-related attack vectors. The following subsection ( § IV-A) discusses the basic security guarantees provided by protecting ML computations in SGX. We describe the attack vectors that are not mitigated by SGX by design in § IV-B. In § IV-C, we discuss their impact on confidential ML computation through papers that directly examine these attack vectors in confidential ML computations. In § IV-D, we discuss works that extend the basic security guarantees of SGX to cover side-channels from ML computations and to introduce the ability to offload to external accelerators. Finally, in § V, we briefly cover the microarchitectural side-channels of TrustZone, the TEE of edge devices.

A. SGX SECURITY GUARANTEES
SGX enclaves are protected from attack vectors AV-1-a and AV-2-a by design, as explained in § II-B. The confidentiality of computations inside the enclave is preserved with hardware-enforced memory isolation from all other system execution modes (e.g., kernel, hypervisor, or BIOS). Furthermore, SGX's Memory Encryption Engine protects the confidentiality and integrity of the memory used by secure enclaves [100]. Several physical attack vectors such as cold boot attacks and DMA accesses (AV-2-a in Table 2) are prevented by SGX hardware, as the memory encryption engine automatically encrypts and decrypts memory accesses to the EPC region as they leave the CPU package. However, an attacker with physical access can still observe the memory access pattern.
Remote attestation. SGX's hardware cryptographic functions and remote attestation feature allow the remote user to verify the platform's identity and the offloaded program's initial state. The remote user can request proof from the enclave and query the Intel CA to verify the authenticity of the SGX-capable CPU in the cloud. Furthermore, the enclave sends a signed measurement of its initial program state, thereby proving that it is indeed executing the program that the remote user sent. The cryptographic exchanges included in the remote attestation procedure become the basis for the remote user's trust in the enclave in the untrusted cloud. In other words, through remote attestation, trust is extended from the remote user's machine to the program running in the enclave. Subsequently, the remote user can send secrets to the enclave through an encrypted channel.
In the ML computation case, the secrets are the data, the ML model, and the ML program. A substantial body of work has used SGX to build secure ML systems by leveraging these security guarantees (e.g., Occlumency [23], TensorScone [26], secureTF [15] and TF Trusted [101]). Those works use SGX to protect the confidentiality of ML assets when the computation is offloaded to an untrusted cloud. As SGX hardware already provides most of the protection through hardware mechanisms, most works try to solve the engineering challenges associated with adapting ML computation to SGX, e.g., the memory limitation. We discuss these challenges in more detail in § VII. Moreover, we cover the works that apply SGX to build secure multi-party ML systems in § VI. In this section, we only cover techniques that improve upon the basic security guarantees of SGX in protecting ML computation, namely side-channel mitigation and extending trust to external accelerators.
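The attestation handshake can be modeled at a very high level. Here `hashlib`/`hmac` stand in for SGX's measurement and quoting hardware, and all key and function names are our own simplifications of the real protocol, which involves Intel's attestation service and asymmetric signatures:

```python
import hashlib
import hmac
import os

# Stand-in for a hardware secret; in SGX the attestation key is rooted
# in the CPU and vouched for by Intel's attestation infrastructure.
CPU_ATTESTATION_KEY = os.urandom(32)

def measure(program: bytes) -> bytes:
    """Hash of the enclave's initial code/data (an MRENCLAVE analogue)."""
    return hashlib.sha256(program).digest()

def enclave_quote(program: bytes):
    """Enclave reports its measurement, signed with the platform key."""
    m = measure(program)
    return m, hmac.new(CPU_ATTESTATION_KEY, m, hashlib.sha256).digest()

def remote_user_verify(program_sent: bytes, quote) -> bool:
    """Remote user checks (1) the signature is genuine and (2) the
    measurement matches the program they actually sent."""
    m, sig = quote
    expected = hmac.new(CPU_ATTESTATION_KEY, m, hashlib.sha256).digest()
    return hmac.compare_digest(sig, expected) and m == measure(program_sent)

ml_program = b"def train(model, data): ..."
ok = remote_user_verify(ml_program, enclave_quote(ml_program))
tampered = remote_user_verify(ml_program, enclave_quote(b"backdoored"))
```

Only after this check succeeds would the remote user provision the real secrets (data, model, program) over an encrypted channel.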

B. THREATS AGAINST ENCLAVE CONFIDENTIALITY
Protecting the confidentiality of in-enclave computation has proved to be a daunting challenge. A large volume of works has presented side-channel attacks that undermine the confidentiality of enclaves through various attack vectors that were not considered in the initial SGX security model. Those attacks collect execution times or access patterns of protected programs to infer sensitive information [28,102,103]. The existence of side-channels is becoming the most formidable challenge in providing security guarantees for ML and other general computations using SGX in the cloud.

1) Software Side-channel Attacks
Controlled channel attacks (AV-1-b). SGX's TCB does not include the OS kernel, yet enclaves must interact with the kernel for services such as system calls and interrupts.
Researchers have exploited side-channels in such interactions, called controlled channels. The controlled channel attack on SGX-protected computation was first introduced by Xu et al. [102]. In this attack, malicious privileged software (i.e., a hypervisor or OS) unmaps enclave pages and monitors the page faults of the enclave application to extract its access pattern. The initial controlled-channel attack can only infer a coarse-grained access pattern at the page level. Subsequent works leverage the kernel's ability to raise interrupts, in combination with microarchitectural cache side-channels, to improve the granularity of the leaked information [28,97].
Microarchitectural side-channel attacks (AV-1-c). Although applications protected by TEEs have strong confidentiality and integrity of code and data, microarchitectural side-channel attacks (AV-1-c) are a prevalent threat to TEE-protected computations [28,29,104]. Researchers have shown that newly discovered microarchitectural attacks on Intel x86 processors also affect SGX-protected executions. Most microarchitectural attacks aim to extract the access pattern by leveraging side-channels stemming from microarchitectural resources shared between the victim process and the attacker. Researchers have leveraged microarchitectural features such as the Translation Lookaside Buffer (TLB) [105], the last branch record (LBR) feature [103], the branch predictor [106], line fill buffers (LFB) [107], and caches [27,28,104] to steal secret information from software protected by SGX.
Adversaries with control over untrusted software within a system can still extract memory access patterns and sub-program execution times of TEE-protected computations through side-channels.
VOLUME 4, 2016
2) Attacks with Physical Access (AV-2)
Physical attacks are another concern for computations protected by TEEs. Multiple works demonstrate that the bus, where memory accesses and communication packets travel from the CPU to other system components, creates attack vectors that can impact the confidentiality of computation protected by SGX.
Leaking information on the bus (AV-2-b).
In SGX, although enclave memory contents are automatically encrypted as they leave the CPU package (e.g., on a store to DRAM), the address access pattern is still observable by an adversary who has a snooping device on the memory bus. For instance, Membuster [98] demonstrated the feasibility of the bus snooping attack, extracting queried English words from an enclave-protected dictionary program, among other examples. Therefore, we recognize physical access as another attack vector that must be considered in confidential computing design.
Adversaries with physical control over the system can leverage access to the bus to capture the memory access pattern of programs protected by TEE.

3) Attacks on Accelerators (AV-3)
ML computations are often highly parallelizable and can immensely benefit from the use of GPUs [108,109] and dedicated ML accelerators [110,111]. However, utilizing accelerators for confidential computation introduces several issues that could compromise the security of TEEs, as the accelerator hardware itself is not included in the TCB of TEEs.
Insecure communication between CPU and accelerators (AV-3-a). Add-on accelerators that communicate with the processor through an open and unencrypted medium (e.g., the system bus) can leak information under powerful attacks such as physical bus snooping or communication path manipulation by the untrusted OS. As most external accelerators do not encrypt their communication, an attacker can easily reverse-engineer the packets to infer sensitive information about offloaded workloads. We discuss an existing attack that exploits this attack vector to compromise the assets of ML computation [31] in the next section.
Side-channels in sharing accelerators (AV-3-b). Side-channels are also inevitable in the accelerator itself, since peripheral devices are shared among software in most systems. Enforcing the secure usage of accelerators (e.g., GPUs) would require hardware modifications to the accelerator architecture itself [89] or to the host-accelerator interfaces [90]. Several research works exploit side-channels from shared GPU usage to extract sensitive information from programs executing on the GPU [27,94,112].

C. KNOWN ATTACKS ON CONFIDENTIAL ML COMPUTATION
A number of works have devised attacks specific to ML computations to leak protected information.

1) Software Side-channel Attacks on ML Computation
Cache side-channels (AV-1-c). The most common microarchitectural attacks on confidential ML computation discussed in the literature are those that exploit the memory access pattern obtained through cache side-channels to violate the confidentiality of ML models (P-2) or data used in ML computation (P-1). For instance, Privado [80] demonstrated an attack on a secure inference service inside SGX that leaks information about the input data (P-1). In the attack, an attacker infers the output of the TEE-protected ML inference service by observing memory access patterns leaked through side-channels. More importantly, several researchers also demonstrated attacks that are able to recover information about the ML model (P-2) from cache side-channels alone. Ganred [81] employs a generative adversarial network (GAN) to reconstruct the target ML model from cache side-channel timing information.
Memory access patterns collected from side-channels can be exploited to extract protected inference queries and even ML models.
Risks of page sharing. Existing works have shown that an untrusted co-tenant in the cloud, a less powerful adversary than an untrusted cloud administrator, can still undermine the confidentiality of the ML model (P-2). Cache Telepathy [30] exploits library pages shared among co-tenants to launch cache side-channels (AV-1-c). More specifically, it shows that system-wide sharing of the physical pages that store general matrix multiplication (GEMM) routines can serve as a side-channel that allows the adversary to extract the deep neural network (DNN) model architecture of other co-tenants.
Sharing of resources among co-tenants in the cloud may serve as a potential side-channel for eavesdropping on ML computation.

2) Attacks with Physical Access and Attacks on ML Accelerators
Attacks with leaked information on the bus (AV-3-a and AV-2-b). Multiple research works have demonstrated that an attacker can utilize physical access to the bus to extract sensitive information from confidential ML models (P-2). For instance, Hua et al. [82] demonstrate an attack that extracts convolutional neural network (CNN) models deployed on a CNN accelerator. The attack feeds inputs to the accelerator, then observes the accelerator's memory access pattern on the bus and the timing between accesses to reconstruct the ML model. On the other hand, the Hermes attack [31] captures and analyzes PCI-e packets sent from the CPU to the GPU, then uses the obtained information to reconstruct the entire DNN model.
The use of accelerators through an insecure I/O channel may allow an adversary to extract protected ML models.
Side-channels in sharing GPUs (AV-3-b). Sharing a GPU in the cloud has proved to be a side-channel that can be leveraged by a malicious co-tenant to extract information about a GPU workload. Works have also demonstrated that such leakage can expose details of ML computation, in particular the confidential ML model (P-2). Several works exploit the side-channel from GPU sharing to recover the DNN architecture [27,94]. Leaky DNN [27] demonstrates an attack that extracts the DNN model inside the GPU by monitoring the resource usage of the victim kernel using the GPU's built-in performance counters. DeepPeep [94] combines multiple GPU-based side-channels (e.g., memory footprint, timing, power usage, and kernel percentages) to reverse-engineer the target DNN model offloaded to the GPU.
Sharing of GPU among cloud co-tenants may serve as a side-channel for information leak.
Power and EM side-channels (AV-2-c). Some research works demonstrate that power analysis attacks [113] and electromagnetic (EM) analysis attacks [84] can compromise confidential ML assets. For instance, several attacks predict the model architecture and parameters by observing power consumption and EM emissions [83,84,113,114]. Mitigating these attack vectors requires careful consideration at the architectural level, which is challenging.

D. EXTENDING SGX'S SECURITY GUARANTEES
1) Side-channel Defenses for Confidential ML Computation
The generally accepted mitigation for side-channel leakage is input-oblivious algorithms: such algorithms exhibit indistinguishable memory traces regardless of input values.
Oblivious computing. The most common approach is to use compiler transformations to turn input-dependent code into input-oblivious code. Input-oblivious algorithms can prevent most attacks that aim to leak the memory access patterns of code and data protected by TEEs, e.g., microarchitectural side-channel attacks (AV-1-c) and memory snooping attacks (AV-2-b). However, making the entire code base of an ML framework input-oblivious is expensive in terms of performance. For instance, Raccoon [115], a state-of-the-art method to achieve obliviousness, exhibits an average performance overhead of 21.8×, which makes it unsuitable for performance-sensitive ML workloads. Hence, some works modify only the possible sources of side-channels in an ML algorithm, mitigating the poor performance of fully input-oblivious algorithms [12,80].
Making ML programs oblivious. In [12], Ohrimenko et al. make ML computation oblivious at the algorithm level. The authors employ data-oblivious primitives (e.g., oblivious assignment, comparison, array access, and sorting) to construct oblivious ML algorithms that are free of side-channels. Five oblivious machine learning algorithms are introduced: K-Means, CNN, support-vector machine (SVM), matrix factorization, and decision trees. Privado [80] makes two key observations: first, most DNN computations only involve linear operations, which are data-oblivious; second, several types of DNN layers have input-dependent memory access patterns (e.g., ReLU and max-pool layers). The authors propose a framework that applies compiler techniques to eliminate side-channels from DNN models. Overall, we observe that because DNN algorithms mostly consist of input-oblivious operations [12,80], mitigating the side-channels in those algorithms incurs minimal overhead (0.02% for CNN training in [12] and 17.18% on average in Privado [80]).
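To illustrate the flavor of the data-oblivious primitives mentioned above, the following sketch shows an oblivious conditional assignment built from pure arithmetic. It is illustrative only: `oblivious_select` and `oblivious_max` are hypothetical names, and true constant-time behavior on real hardware additionally depends on the compiler and microarchitecture.

```python
# Sketch of a data-oblivious primitive in the spirit of oblivious assignment:
# the selection is computed with arithmetic instead of a branch, so the
# executed instruction stream does not depend on the secret predicate. A
# branching version (`b if pred else a`) would leak the predicate through
# the branch predictor or instruction cache.
def oblivious_select(pred: int, a: int, b: int) -> int:
    """Return b when pred == 1, a when pred == 0, without branching on pred."""
    return a + pred * (b - a)

def oblivious_max(xs):
    """Running maximum built only from oblivious selects."""
    m = xs[0]
    for x in xs[1:]:
        m = oblivious_select(int(x > m), m, x)
    return m

assert oblivious_select(1, 10, 20) == 20
assert oblivious_select(0, 10, 20) == 10
assert oblivious_max([3, 7, 1, 5]) == 7
```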
ML algorithms can be transformed such that they are input-oblivious through algorithm redesign or compiler-based techniques.

2) Extending Trust to Accelerators
Most TEE-based confidential ML computation literature we reviewed in this survey assumes CPU-only computation. This is due to SGX's security model, which does not include the OS kernel in its trusted computing base; enclaves must therefore communicate with accelerators such as GPUs via an untrusted medium (i.e., the kernel). As we will discuss in this section, the current SGX security model requires accelerators to actively participate in establishing a secure channel before they can be included in the confidential computation boundary. However, there is no known commercial-grade accelerator with such capabilities. As such, a number of works have proposed ways to utilize accelerators without compromising confidentiality, or GPU architectures that include the aforementioned essential features for confidential computation.
Blinding and verification. As computation executed on ML accelerators is subject to several attack vectors discussed previously in § IV-B, sending plaintext data to accelerators is not secure. Moreover, the integrity of results obtained from external accelerators cannot be guaranteed due to the threat of compromised accelerators. Many works propose techniques revolving around a combination of blinding and probabilistic result verification to allow workloads to be securely outsourced to external accelerators.
Blinding schemes obfuscate the offloaded workload with a random noise vector (the blinding factor), then de-blind the results upon retrieval. The approach seeks to ensure the confidentiality of the offloaded tasks. On the other hand, probabilistic verification schemes such as Freivalds' algorithm intend to verify the correctness of the results returned from the untrusted accelerators. The algorithm probabilistically verifies the correctness of offloaded linear operations (e.g., matrix multiplication) with reasonable accuracy and performance overhead. However, the scheme does not provide deterministic (e.g., cryptographically secure) integrity guarantees.
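The two techniques can be sketched in pure Python with small matrices (illustrative names and structure, not any cited system's API): the enclave blinds its input `X` with a one-time random matrix `R` before offloading the product to the untrusted accelerator, de-blinds the result using a precomputed `R @ W`, and applies Freivalds' check to the returned product.

```python
# Minimal sketch of blinding plus Freivalds' verification for an offloaded
# matrix product Y = X @ W. Assumptions: additive blinding with a one-time
# random matrix R, and a precomputed correction term R @ W held in the enclave.
import random

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def freivalds_check(A, B, C, rounds=20):
    """Accept C = A @ B with error probability at most 2**-rounds."""
    for _ in range(rounds):
        r = [[random.randint(0, 1)] for _ in range(len(C[0]))]
        # Compare A @ (B @ r) with C @ r: only matrix-vector work per round.
        if matmul(A, matmul(B, r)) != matmul(C, r):
            return False
    return True

# Enclave-side offload: blind X, let the "accelerator" compute, de-blind.
X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
R = [[random.randint(-100, 100) for _ in range(2)] for _ in range(2)]
RW = matmul(R, W)                   # precomputed blinding correction
Y_blind = matmul(mat_add(X, R), W)  # untrusted accelerator sees only X + R
Y = mat_sub(Y_blind, RW)            # de-blind inside the enclave
assert Y == matmul(X, W)
assert freivalds_check(X, W, Y)
```

Note that the de-blinding identity (X + R)W - RW = XW holds only for linear layers, which is why these schemes target linear operations.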
A number of works apply blinding and result verification to protect the workload offloaded to accelerators [85,87,88]. Slalom [85] was the first to propose the use of both methods to build a secure ML inference service that protects the user's input when it is offloaded to an external accelerator. Multiple follow-up works extend Slalom with performance and security improvements [86-88]. For instance, while Slalom only supports inference workloads, GOAT [86] improves performance with hyperparameter adjustment and allows offloading of training operations. ShadowNet [87], on the other hand, improves Slalom's security by also protecting the offloaded model's confidentiality against the untrusted system.
Performance is one of the biggest issues in applying blinding and verification schemes. The offloaded computation can achieve relatively high throughput compared to executing the same workloads in SGX, thanks to specialized acceleration hardware. However, the blinding operation is costly both in execution time and memory usage, which are scarce resources in TEE-protected computations. In particular, the blinding procedure requires expensive operations to load encrypted blinding factors into the enclave memory and to blind the workload [85]. Moreover, blinding a batch of data requires a similarly sized vector of blinding factors, which effectively halves the usable enclave memory.
Blinding schemes seek to provide confidentiality through obfuscation, while probabilistic verification schemes ensure the integrity of offloaded computations. Together, the two approaches provide confidentiality and integrity, but their guarantees are not deterministic. Also, using the two techniques incurs significant performance overheads.
GPU architectures that support TEEs. Graviton [89] proposes a GPU architecture with built-in support for cooperative confidential computing with the host CPU enclave (e.g., SGX), thereby allowing acceleration of confidential workloads. The architecture introduces lightweight modifications to the command processor of the baseline GPU, with functionalities to support remote attestation and secure context management. The authors also introduce mechanisms to securely isolate individual GPU contexts through protected memory regions and the GPU's ability to securely manage its own address space. The prototype was implemented with software changes that emulate the ideal hardware changes, and shows limited performance overhead, only 17-33% compared to native GPU execution. Most of the reported overhead comes from encrypting traffic between the CPU and GPU. A TEE-aware GPU can also allow a remote client to securely utilize GPU computation on an untrusted cloud even without trusted processors such as SGX. For instance, in Telekine [16], Hunt et al. propose an end-to-end system that allows remote users to construct a secure communication channel to the TEE-enabled GPU using API remoting. In the system, the untrusted server that hosts the TEE-enabled GPU only relays communication. Careful considerations are also made to mitigate side-channels from timing differences.
The approaches presented in these works indicate that the cooperation of the accelerator is an essential element in securing offloaded ML computation. The accelerator itself must be equipped with the capabilities required for secure communication, including secure key storage, secure firmware capable of self-integrity checking, bidirectional remote attestation, and symmetric-key channel encryption. The accelerator's user can then establish a secure communication channel between the CPU and the accelerator to thwart eavesdropping attackers (AV-3-a). The architectures should also be carefully designed so that they do not incur side-channels from resource sharing (AV-3-b). Unfortunately, no commercial-grade accelerator architecture meets such requirements, despite the proposals from many academic works. We expect that practical accelerator hardware that is aware of confidential computing will be necessary for the widespread adoption of confidential ML computing.
Secure accelerator architectures retrofit existing accelerator designs (e.g., GPUs) for confidential computing. While they provide low overhead (17-33%), only simulated research implementations exist, since they require hardware modifications to proprietary GPUs.
Hardware modifications to the host system. HIX [90] secures GPU computation by introducing changes to the software and hardware stacks of a host system with a trusted processor. On the software side, the authors propose protecting the GPU driver within trusted enclaves. The key hardware modifications are made to the PCI-e interconnect and the memory management unit (MMU) to secure CPU-GPU communication against untrusted software. We observe that secure accelerator architectures such as Graviton [89] provide stronger security guarantees, such as isolation among GPU contexts inside the GPU. Moreover, the communication between the CPU and GPU in HIX is vulnerable to physical bus snooping, as traditional GPUs do not support secure channel establishment.
Secure I/O with weaker security guarantees. On the other hand, there are attempts to deploy enclaves inside virtual machines and let the hypervisor mediate the secure use of accelerators [91,93]. That is, they simply assume that the hypervisor is trustworthy and therefore capable of mediating secure connections between the enclaves and I/O devices. Although such methods do not require hardware changes to the accelerator or the host system, they adopt a much weaker security model than that of SGX.
Approaches that introduce host-side modifications or hypervisor-based schemes for secure use of accelerators provide weaker security guarantees than secure accelerator architectures.

V. ARM TRUSTZONE AND COMPUTATION IN THE EDGE
While the majority of the literature that proposes attacks and defenses on ML computations assumes cloud computing scenarios where x86 processors are prevalent, ARM TrustZone (TZ) has also been leveraged for confidential ML computations on mobile and edge devices [24,87,116-119]. TZ has been employed to protect ML models deployed on edge and mobile devices for inference services [24,87,118]. Also, the federated learning scenario that we discuss in § VI-B necessitates TZ-based training on mobile devices.
A survey (SoK) paper [120] provides a comprehensive overview of the currently known security vulnerabilities of TZ and its applications. Besides vendor- and implementation-specific vulnerabilities, there are microarchitectural attacks on the confidentiality of TZ-protected computation.
Microarchitectural side-channels of TrustZone. In TZ-enabled CPUs, cache lines are extended with a non-secure (NS) bit that segregates the cache usage of normal-world and secure-world applications. However, programs from the two worlds have equal rights to contend for cache lines, which creates cross-world side-channels exploited by several works in the literature. ARMageddon [121] introduces a cross-core side-channel attack vector based on the cache coherence mechanisms between CPU cores to extract timing information from victim applications. On the other hand, using the cache contention side-channel between the normal and secure worlds, several works (e.g., TruSpy and TruSense) can extract sensitive information from TZ-protected computations [122,123]. Obtaining unprivileged timing sources is another challenge addressed by several attacks, as the performance counters, commonly employed for precise timing information, are often inaccessible from userspace. Several works utilize alternative timing sources (e.g., system calls and POSIX functions) to allow the side-channel attacks to be performed by normal unprivileged applications [121-123]. Prime+Count [124] is a cache attack that assumes the adversary can control applications in the secure world. The attack exploits the performance monitor unit (PMU) to build a covert channel that exfiltrates data from the secure world to the normal world. Apart from cache side-channels, the shared usage of other microarchitectural features such as the branch target buffer (BTB) is also exploited to leak sensitive information from confidential computations [125]. Nevertheless, the implications of TrustZone side-channels for confidential ML computation are underexplored, and we discuss this further in § VIII-A for this reason.

VI. SATISFYING MULTI-PARTY SECURITY REQUIREMENTS
In this section, we present our study on the existing literature that seeks to satisfy varying security requirements in different TEE-based multi-party computation scenarios. We found that there can be multiple security goals in a single scenario, depending on the point of view of different entities. Table 3 illustrates the scenarios that we identified as multi-party ML computation, classified using our definitions from Table 1, and the security requirements from different points of view (e.g., service user versus service provider). The term multi-party computation often refers to a subfield of cryptography. However, we use the term multi-party computation or multi-party ML computation to strictly refer to TEE-based approaches in this work.
ML computation scenarios involving more than one party have been extensively explored in existing works. In addition to protecting computations from the untrusted cloud with TEEs, the entities (E-1, E-2, and E-3) that contribute their data (P-1), ML models (P-2), or ML programs (P-3) are different parties and often mutually distrusting. As diverse stakeholders join and ownership of the assets is subdivided in multi-party computation scenarios, the security requirements for confidential ML applications are held to a higher standard than in a simple offloading scenario. More participants lead to increased communication complexity, a widened communication boundary, and a larger attack surface. Meanwhile, data owners and model owners (or the service providers) desire the privacy of their assets regardless of the situation.
Most TEE-based multi-party machine learning (MPML) schemes rely on the remote attestation capability of TEE-enabled processors to bridge the trust gap between distrusting parties. Remote attestation allows remote parties to verify the integrity of the code and data of a hosted enclave. Through remote attestation, each party can verify the code executing on the shared enclave before sending its protected asset to it. Ohrimenko et al. [12] were the first to propose a TEE-powered privacy-preserving MPML system, in which multiple data-contributing parties employ a trusted enclave hosted by a cloud provider to train a shared ML model while keeping their data secret from each other. The evaluation shows that TEE-based approaches have competitive execution times compared to cryptography-based approaches for MPML.

A. COLLABORATIVE ML AND MULTI-PARTY MLAAS
1) Scenario S-1: Collaborative ML (multiple data contributors, one shared model)
FIGURE 2. Scenario S-1 Collaborative ML. Mutually distrusting data contributors (E-1) contribute their data (P-1) to collectively train a model that is to be shared among them. The TEE-based ML computation system must ensure the confidentiality of the data for each contributor.
Collaborative ML involves mutually distrusting data contributors who seek to train a co-owned ML model without revealing their data [12,66]. All mutually distrusting parties (E-1-a) have identical security requirements on their data (P-1): the data must not be leaked to other contributors or to the untrusted computing platform. The scenario is often discussed along with the traditional MPC [61-64] that seeks
to provide cryptographic primitives for computing a result of common interest without revealing the data of multiple contributors. While the motivations are vastly similar, we only discuss TEE-based implementation works that focus on ML computations.
Trustworthy collaborative ML. Figure 2 illustrates the common approach to building a trustworthy collaborative ML scheme using TEEs. The same approach is used by Ohrimenko et al. [12]. The proposed systems allow the contributing parties to verify the ML model and program via remote attestation before sending data. The system's goal is to enforce trustworthy ML computation such that no party can learn another party's contributed data directly or indirectly, and only the final ML model is visible to the contributors. The key idea is to allow the contributors to verify that only privacy-preserving ML training algorithms can be performed inside the enclave. The proposed system employs an enclave trusted by all data owners to perform machine learning operations on private data. Moreover, the authors provide data-oblivious machine learning algorithms as an additional feature to prevent memory access side-channels. Later, Myelin [66] improved upon the system of [12] by applying differential privacy algorithms to the output of the trained model, enhancing the data privacy of the resulting model.
TEE-based collaborative ML computation enforces that only privacy-preserving algorithms agreed upon by all contributors are used.
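The S-1 flow can be sketched as follows. This is a hypothetical toy model: attestation is reduced to comparing a program hash, and the "model" is simply the mean of the pooled data; `Enclave` and `contribute` are illustrative names.

```python
# Toy sketch of collaborative ML with a shared enclave: each contributor
# attests the enclave (modeled as a hash check of the agreed training
# program) before releasing data; the enclave trains on the pooled data
# and reveals only the final model to the contributors.
import hashlib

AGREED_PROGRAM = b"privacy-preserving-trainer-v1"

class Enclave:
    def __init__(self, program: bytes):
        self.program = program
        self.pool = []          # visible only inside the enclave

    def measurement(self) -> str:
        return hashlib.sha256(self.program).hexdigest()

    def submit(self, data):
        self.pool.extend(data)

    def train(self):
        # Stand-in "model": the mean of all contributed values.
        return sum(self.pool) / len(self.pool)

def contribute(enclave, data):
    """Contributor side: attest first, send data only on a match."""
    if enclave.measurement() != hashlib.sha256(AGREED_PROGRAM).hexdigest():
        raise RuntimeError("attestation failed: unexpected training program")
    enclave.submit(data)

enc = Enclave(AGREED_PROGRAM)
contribute(enc, [1.0, 2.0])   # party A
contribute(enc, [3.0, 6.0])   # party B
assert enc.train() == 3.0     # only the shared model is revealed
```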

2) Scenario S-2: Multi-party Machine Learning as a Service (Different data contributors and model owners)
We use the term multi-party machine learning as a service (MPMLaaS) to refer to multi-party computation scenarios in which the service provider provides a service based on the ML model and ML program and allows users to process their data using the service. Figure 3 illustrates the approach for satisfying the security requirements commonly used in the literature [13,14,37].
FIGURE 3. Scenario S-2 Multi-party MLaaS. The MLaaS service provider (E-2) serves the ML model (P-2) and ML program (P-3) to service users (E-1) who input their data (P-1). Since data confidentiality depends on the ML model and how the ML program trains or infers using the data, a conflict in security requirements arises.
This scenario differs from S-1, in which the mutually distrusting data contributors had the same security requirements: not only do the security requirements of the service provider and the service users differ, they are also in conflict [13,14].
Conflict in security requirements. We see that simultaneously satisfying the confidentiality of ML models (P-2) and programs (P-3) from the service provider and service
user data (P-1) results in a conflict [13,14]. Assume that the service provider does not reveal the contents of the ML model and ML program used in her SGX-protected service to the service users. The service users then cannot inspect the program's logic and thus cannot be assured that the service will respect their privacy. It should be noted that conventional SGX-based secure cloud computing methods only verify the integrity and authenticity of the SGX enclave itself and the in-enclave program. Hence, several works have proposed methods to resolve this issue.
Data flow control with SFI and containers. A number of works have leveraged Software Fault Isolation (SFI) to mitigate the issue. The general approach is to apply SFI to SGX-based MPMLaaS or SaaS frameworks such that data flow control is enforced according to a given policy, thereby ensuring user data confidentiality without revealing program code. Ryoan [14] creates SGX-protected sandbox instances using Native Client [56] to prevent information leakage. Native Client provides SFI-based sandboxing that confines the interaction between the host and the client (e.g., through system calls). The technique has been adopted to complement SGX's guarantees with capabilities to isolate data processing modules, creating two-way sandboxes [13,14,37]. A later work, Chiron [13], bridges the conflict in confidential ML computation by employing Ryoan containers; it ensures that an untrusted program cannot leak data (from the data owner's (E-1) perspective) while maintaining the confidentiality of the ML assets at the same time. Finally, Perun [37] further generalizes the MPMLaaS parties into stakeholders and employs a trusted security-policy manager enclave to enforce the security policies of each stakeholder and to provision secrets between stakeholders' enclaves.
Applying SFI-based sandboxing for data leakage prevention can ensure ML program/model confidentiality and data protection simultaneously.

1) Scenario S-3: On-device ML
Deploying ML models to edge or mobile devices has progressively become commonplace. For instance, real-time inferences may be required for autonomous vehicles, and smartphone applications today often include an ML model that performs on-device inference. The key security requirements are (1) to protect the confidentiality of ML models (P-2), which can be the intellectual property of the service provider, against a curious or malicious device owner who attempts to reverse-engineer the ML model, and (2) to prevent leakage or abuse of on-device user data (P-1) by the distributed ML service. The work of Sun et al. [126] performs a large-scale study on mobile applications that employ on-device ML. The authors point out that 41% of the analyzed ML apps do not protect the ML model, and 66% of the ML apps that attempted to secure their ML model adopted insufficient protection. This work plainly illustrates the current state of confidential ML computation on edge and mobile devices: demand for confidentiality in on-device ML is on the rise, and research opportunities lie here.
Protecting ML models in mobile TEEs. Figure 4 illustrates the simplified approach of utilizing TEEs to protect ML models on edge devices, which is to encrypt the ML model and only store it in plaintext inside TEE-protected memory. MLCapsule [38] is one of the preliminary works that allow a service provider to deploy an inference service on a user's device while providing the same security guarantees and the same level of control over the model (by the model owner) as typical server-side execution. However, it uses SGX, a mostly cloud-based TEE, for its implementation and evaluation. The majority of confidential on-device ML research leverages ARM TrustZone, a TEE commonly found in the edge. Notably, Offline Model Guard (OMG) [118] leverages SANCTUARY [45], an SGX-like isolated execution environment powered by TZ, to protect ML computation on the user's device. The authors also show how it can securely obtain the user's data from peripheral devices such as microphones and sensors using one of TZ's features. While most proposed works on on-device ML focus on deploying pre-trained ML models for inference, PrivAI [73] allows service users to personalize proprietary ML models (i.e., update the model with the user's data) by deploying the training process inside the TEE on the user's device. Finally, instead of protecting the entire model's confidentiality, DarkneTZ [24] hides only the sensitive layers of a DNN to defend against membership inference attacks. Apart from the aforementioned works, the remaining works on secure on-device ML address challenges incurred by architectural limitations [116,117] or the secure usage of GPUs [87], which we cover in other sections (§ VII).
Mobile ML applications often employ insufficient protection for the model. TEEs are required to securely protect the ML model on the user's device.

2) Scenario S-4: Federated Learning
Federated learning (FL) is a distributed ML training method that exploits the parallelism of multiple machines to train a global model. It allows FL participants to train local models using their own data and collects only the participants' gradient updates. FL excludes the data collection step of traditional ML workflows, so it holds a privacy advantage over other ML paradigms [71,127]. TEEs can further fortify FL, which is already privacy-preserving to a certain degree even without them.
FL introduces a two-fold security requirement for each participating party. From the FL participant's (client's) perspective, the service provider may be dishonest and exfiltrate user data (P-1). The device owner, who performs training on his or her device, also wants confidentiality for the gradients collected during training. On the other hand, the service provider may want the deployed ML model/program to be protected (P-2, P-3) and trained without malicious alteration by service users or attackers with access to the device. Server-side secure gradient aggregation. Although only the gradients obtained by the clients are sent to the server in the FL paradigm, several works demonstrated that these updates may leak unintended private data [131][132][133]. The secure aggregation algorithm proposed by Bonawitz et al. [134] addresses this issue by leveraging secure MPC. However, secure aggregation cannot provide full coverage; for example, there is no guarantee that the server correctly implements the protocol. Multiple works [39,119,130] perform the gradient aggregation process inside a server-side TEE to protect participants' gradients from adversaries. These systems use remote attestation to verify the authenticity of FL servers. Integrity of client-side computation. Other works employ client-side TEEs to ensure that client-side FL computation is executed correctly. A malicious FL client could impair the integrity of the global model by sending erroneous gradient updates to the FL server [135]. Zhang et al. proposed TrustFL [128] for client-side computation integrity. In their scenario, honest servers do not trust their clients (i.e., the data owners). TrustFL leverages the clients' TEEs to attest the training process and ensure the integrity of its results. SEAR [129] employs TEEs to solve the Byzantine Generals Problem that arises in FL due to malicious participants. End-to-end protection.
Figure 5 illustrates a design for an FL system protected by TEEs on both the participants' devices and the server. A similar approach is employed by PPFL [119], which utilizes TEEs on both the mobile device and the server to conceal training and data aggregation. On the device side, the entire training process happens inside a TEE to prevent tampering. On the server side, SGX protects the data aggregation program from the untrusted cloud service provider. Their work shows that the proposed system is robust to data reconstruction, property inference, and membership inference attacks.
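The pairwise-masking idea behind secure aggregation [134] can be sketched in a few lines. In this toy version (our own simplification, omitting the dropout-recovery and key-agreement machinery of the real protocol), each pair of clients derives a shared mask; one adds it and the other subtracts it, so individual uploads look random but the masks cancel in the server-side sum.

```python
import random

def pairwise_masks(client_ids, dim, seed_fn):
    """Each pair (i, j) shares a seed; i adds +mask, j adds -mask, so the
    masks cancel in the server-side sum (sketch of Bonawitz et al. [134])."""
    masks = {cid: [0.0] * dim for cid in client_ids}
    for i in client_ids:
        for j in client_ids:
            if i < j:
                rng = random.Random(seed_fn(i, j))
                m = [rng.uniform(-1, 1) for _ in range(dim)]
                for k in range(dim):
                    masks[i][k] += m[k]   # client i adds the mask
                    masks[j][k] -= m[k]   # client j subtracts it
    return masks

clients = [0, 1, 2]
dim = 4
true_updates = {0: [1.0] * dim, 1: [2.0] * dim, 2: [3.0] * dim}  # toy gradients
masks = pairwise_masks(clients, dim, seed_fn=lambda i, j: hash((i, j)))

# Each client uploads only its masked update; individual gradients stay hidden.
masked = {c: [true_updates[c][k] + masks[c][k] for k in range(dim)]
          for c in clients}
aggregate = [sum(masked[c][k] for c in clients) for k in range(dim)]
# Masks cancel: the server recovers exactly the sum of the true updates (6.0).
assert all(abs(a - 6.0) < 1e-9 for a in aggregate)
```

This also makes the limitation discussed above concrete: nothing in the arithmetic forces the server to run this aggregation honestly, which is precisely the gap that a server-side TEE closes.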
Compared to on-device ML deployment, federated learning additionally requires a trusted gradient aggregator. Also, because of its decentralized nature, FL encounters several known challenges of decentralization. Table 4 summarizes the works discussed in this section. The table lists the works that fall into our defined multi-party ML scenarios, the TEEs they utilize, the protected assets (i.e., data, ML model, and ML program), and the computation performed (i.e., training or inference). Cloud-based ML deployment scenarios. S-1 and S-2 commonly seek to shield the computation from the untrusted cloud infrastructure using server-side TEEs. Also, both assume mutually distrusting service users and ensure data confidentiality for each user [12][13][14][66]. That is, in S-1 there are multiple parties with the same security requirements.

Table 4 (excerpt). Approach / Year / TEE. S-3: On-device ML — Vannostrand et al. [116] (2019, TZ); eNNclave [117] (2020, TZ); Offline Model Guard [118] (2020, TZ); PrivAI [73] (2020, SGX); DarkneTZ [24] (2020, TZ). Legend: provides property / partially provides property / does not provide property (property symbols not reproduced here); C = client-side TEE; S = server-side TEE; - = not applicable; * protected against membership inference; † in FL, the local gradients obtained from the data are protected instead of the data.
S-2 differs from S-1 in that the party that owns the ML model and the ML program performing training or inference, and the service users, are also mutually distrusting. Hence, there exist two points of view (E-1 and E-2) with differing security requirements, as shown in Table 3. For this reason, works such as Chiron [13] proposed SFI-based data-flow assurance to resolve the conflict. Edge-based ML deployment scenarios. Both S-3 and S-4 involve deploying ML computations to users' devices. In S-3, the service provider desires confidentiality of the ML model deployed on user devices. On the other hand, since inference using the ML model is performed locally, the user's data and inference results can be kept on the device. Hence, many existing works proposed TEE-based solutions for on-device ML model protection and safeguarded inference [24,38,73,87,116-118].
For federated learning (S-4), the ML models are often not considered confidential, as they are to be trained on the user devices. Instead, the service providers need to ensure correct execution (i.e., integrity) of the client-side training. On the other hand, the clients want confidentiality for the gradients aggregated from training.

VII. ENGINEERING CHALLENGES IN BUILDING ML COMPUTATIONS INSIDE TEE
In this section, we discuss the engineering challenges in building systems for confidential ML computation. Besides security issues, retrofitting existing software and hardware to achieve confidential ML has also been an important issue in the field of systems security.

A. MEMORY LIMITATIONS OF TEES
Limited memory capacity in SGX. Various works that apply confidential ML to the cloud struggle with the inherent memory limitation of SGX. As the size of training data is ever-growing, the memory capacity limitation of SGX may hinder adoption of the technology. Hence, many works proposed optimizations that allow data-intensive computations to run in SGX enclaves.
The current version of Intel SGX (v1.0) has a hard limit of 96MB of usable EPC memory (excluding the 32MB reserved memory) [100]. This limitation also incurs additional performance overhead from swapping memory pages between EPC and non-EPC memory [136]. For this reason, several works evaluate their approach under the memory limitation [20], while others use the simulation mode of SGX for evaluation, in the hope that future versions of SGX will support larger memory sizes [95]. Currently, some cloud providers offer virtual machines with Intel SGX 2.0 that allow up to 1TB of EPC memory [41,137]. However, processors with Intel SGX 2.0 are not yet widely available [138], and their mechanisms are still unclear. We expect that eliminating the memory limit would solve many of the technical challenges mentioned in this section. Memory capacity in ARM TrustZone. ARM TrustZone, a TEE commonly deployed for mobile and edge ML computations, also suffers from memory limits. While TrustZone does not impose a hardware-set limit, the amount of memory allotted to the normal world and the secure world (TrustZone side) is decided at boot and cannot be changed afterward. Since most applications run in the normal world, and normal-world performance directly impacts user experience, the secure world tends to receive a small default memory allocation. Splitting ML workloads into batches. As the sizes of ML models often far exceed the memory capacity of most TEEs, the most straightforward way to overcome the limitation is to split the models into smaller partitions and process them one by one inside enclaves. Partitioning most DNN models can be done without much effort, as they are already subdivided into multiple layers. The remaining challenges are how to partition the models and which partitioning schemes offer the most security benefit. To this end, Vannostrand et al.
[116] experimented with several partitioning schemes, namely layer-based, sub-layer, and branched partitioning. The authors found that each scheme is useful depending on the model size. Selective layer protection. Several works aim to identify a minimal set of sensitive layers, e.g., layers that contain information about the input [24,40,87,117], and protect only those layers inside secure enclaves. Notably, Gu et al. [40] proposed a ternary partitioning scheme in which the ML model is split into a sensitive section, consisting of the input and output layers, and the remaining non-sensitive layers. Occlumency [23], on the other hand, assumes that the DNN model does not require confidentiality and proposes techniques to load parts of the model from insecure memory into the enclave's secure memory during inference. On the edge, Mo et al. [24] propose protecting only the last layer of a DNN inside TrustZone to effectively thwart white-box membership inference attacks (MIA).
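As a minimal sketch of layer-based partitioning (our own simplified take on the idea in [116], not their actual algorithm), consecutive layers can be greedily grouped into partitions that each fit an enclave memory budget; layers larger than the budget would require the sub-layer schemes mentioned above.

```python
def partition_layers(layer_sizes_mb, budget_mb):
    """Greedily group consecutive DNN layers into partitions that each fit
    the enclave memory budget (sketch of layer-based partitioning)."""
    partitions, current, used = [], [], 0.0
    for size in layer_sizes_mb:
        if size > budget_mb:
            raise ValueError("single layer exceeds enclave budget; "
                             "sub-layer partitioning would be needed")
        if used + size > budget_mb:
            partitions.append(current)   # close the full partition
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        partitions.append(current)
    return partitions

# Hypothetical layer sizes (MB) against a ~96 MB usable EPC.
layers = [10, 35, 40, 25, 60, 5, 8]
parts = partition_layers(layers, budget_mb=96)
# -> [[10, 35, 40], [25, 60, 5], [8]]: three partitions, loaded one at a time
assert sum(len(p) for p in parts) == len(layers)
assert all(sum(p) <= 96 for p in parts)
```

Each partition is then decrypted and executed inside the enclave in turn, trading extra paging and transition cost for fitting within the EPC.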
Moreover, a security problem arises from the memory limitation itself: malicious entities can observe the access pattern when data is moved in and out of an enclave's secure memory for computation. Previous works have shown that access patterns alone can leak sensitive information, even when the memory content is encrypted [139]. To eliminate this leakage, applying oblivious computing algorithms has been suggested. Several works adapted ORAM schemes [140][141][142] or secure hardware storage [143] to extend the memory limit while eliminating the access-pattern side channel. Trustore [143] utilizes a PCIe FPGA device to provide additional storage for SGX enclaves with oblivious memory guarantees. Optimizing the memory usage of ML models. Several works reduce ML model size through TensorFlow Lite, a framework designed for mobile devices with limited capability [26,101]; the framework applies model reduction through quantization to shrink the memory footprint of ML models. Other works target the inefficient memory usage of enclave applications. In particular, Vessels [144] points out that the default memory allocation scheme in SGX uses large allocation sizes and has low memory reusability. Based on this observation, the authors propose mechanisms that optimize memory usage in ML computation, reducing the overall memory footprint by up to 90% while also improving execution time.
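The simplest oblivious-access technique, and the baseline that tree-based ORAM schemes [140][141][142] improve upon, is a linear scan that touches every entry regardless of which one is needed. The sketch below is illustrative only (Python comparisons are not truly constant-time, and it assumes non-negative integer values); the point is that the memory access pattern over `table` is identical for every `secret_index`.

```python
def oblivious_lookup(table, secret_index):
    """Read every entry so the access pattern reveals nothing about which
    element was actually selected (linear-scan ORAM baseline)."""
    result = 0
    for i, value in enumerate(table):
        mask = -int(i == secret_index)   # all-ones bitmask iff i matches
        result |= value & mask           # keeps value only for the match
    return result

weights = [13, 7, 42, 99]
assert oblivious_lookup(weights, 2) == 42
```

An observer logging which addresses of `weights` are read sees the same full scan for every query, unlike a direct `weights[secret_index]` access, which would leak the index.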
Partitioning ML models and optimizing memory usage help partially overcome the memory limitation of TEEs.

B. PERFORMANCE OVERHEAD
TEE transition overhead. Using TEEs incurs additional performance overhead. A common source of overhead in TEEs is the cost of transitions. Just like the transition between kernel mode and user mode, the transition between the untrusted domain and the TEE adds overhead; however, the transition between untrusted code and an enclave is considerably more costly, as explored in works that profiled the performance characteristics of Intel SGX [100]. As reported in [145], the number of cycles consumed for entering an enclave (ECALL) and exiting it (OCALL) is around 35 times that of an average system call. Generally, the performance overhead is directly proportional to the frequency of transitions between the untrusted domain and the TEE. Coping with limited memory. TEEs often have memory limitations, as mentioned above, and the workarounds proposed in previous works (e.g., [23,40,88]) introduce additional overhead. Since many such strategies involve splitting the workload into smaller batches, they also increase the transition overhead.
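A back-of-the-envelope model makes the batching argument concrete. The syscall cycle count below is an assumed, illustrative figure (only the ~35x multiplier comes from the profiling cited above); the sketch shows how amortizing one enclave round trip over many items shrinks total transition cost.

```python
import math

# Rough cost model: each enclave transition costs ~35x an average syscall.
SYSCALL_CYCLES = 250                     # assumed ballpark, for illustration
TRANSITION_CYCLES = 35 * SYSCALL_CYCLES

def transition_overhead(num_items, batch_size):
    """Total cycles spent crossing the enclave boundary when num_items
    work items are processed in batches (one enter + exit per batch)."""
    batches = math.ceil(num_items / batch_size)
    return batches * 2 * TRANSITION_CYCLES

one_by_one = transition_overhead(10_000, batch_size=1)    # 10,000 round trips
batched = transition_overhead(10_000, batch_size=256)     # 40 round trips
assert batched < one_by_one / 100   # batching cuts transition cost ~250x
```

The same arithmetic explains the tension noted above: memory-limit workarounds that shrink batch sizes move the system back toward the expensive left end of this trade-off.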
The memory limitation of SGX is one of the predominant sources of performance overhead in SGX, especially for ML. Also, ML applications in SGX should minimize the number of TEE state transitions.

C. PORTING ML FRAMEWORKS IN TEES
Cost of porting ML programs to TEEs. Porting complex programs such as ML frameworks (e.g., TensorFlow [146], PyTorch [147]) to SGX is a daunting task for developers. The SGX SDK offers build tools and APIs only in low-level languages (C/C++), whereas many ML frameworks are complex and include components written in higher-level languages (e.g., Python).
Deploying ML frameworks inside TEEs. Due to the high cost of porting complex ML frameworks, many existing works have proposed software containers such as Docker [55] and library OSes (LibOSes) [50,52,95,138] as alternative ways to deploy programs to cloud enclaves. Software containers virtualize all interactions of a program with the containing system (e.g., an SGX enclave), making the code portable across platforms. Such approaches allow ML frameworks to run in TEEs without much modification to their codebase. SCONE [51] is a Linux container system that protects Docker containers [55] with SGX enclaves. Later, TensorSCONE [26] brought TensorFlow to SCONE containers, while secureTF [15] introduced a secure distributed machine learning framework built upon TensorFlow and SCONE. LibOSes [50,52,95,138] are containers that bundle all of a program's dependencies, including an OS kernel emulation layer and modified standard libraries. LibOSes allow rapid and convenient deployment of programs, but they significantly increase the trusted code base inside the enclave. The availability of secure versions of widely used ML frameworks allows cloud users to deploy secure ML computation on the cloud with ease.
Software containers allow unmodified ML programs to run inside SGX but significantly increase the trusted codebase.

VIII. FUTURE DIRECTIONS
Through our survey, we identified issues that remain under-explored or unresolved by existing works. We provide our outlook on possible future directions in this section.

A. CONFIDENTIAL ML IN EDGE AND END-POINT DEVICES
A predominant number of the works that we reviewed contributed to SGX-based cloud ML computations. By comparison, works focused on ML computations using ARM TrustZone are relatively scarce. However, on-device ML computations have become more common [126], and works that explore confidential ML on edge and end-point devices are on the rise. Consequently, a few works have discussed the security issues that exist in this scenario (e.g., [24,117,118]). Moreover, although the security vulnerabilities and side channels of TrustZone are well-studied (systematized by Cerdeira et al. [120]), little is known about their implications for confidential ML computations. We expect that more in-depth exploration of the issues in deploying confidential ML computations to the edge will follow in the future.

B. SECURE ACCELERATOR ARCHITECTURES
Accelerators such as GPUs play a crucial role in achieving high performance in ML training and inference. However, no commercial-grade accelerator architecture with TEE-like protection is currently available. All SGX-based confidential ML computation papers that we reviewed assume CPU-only computation for this reason. Graviton [89] proposes a GPU architecture that can ensure the confidentiality of workloads offloaded from Intel SGX. However, Graviton's design requires hardware modifications to the GPU, which is not feasible given proprietary architectures and firmware; hence, the authors emulate the hardware modifications to evaluate their design. SafeTPU [148] is another proposed secure accelerator architecture, but it focuses on verifiable computation results and does not provide confidentiality. We expect research toward secure acceleration of ML computations to be on the rise. For instance, accelerators (e.g., neural processing units (NPUs)) designed with confidential ML computing in mind would be a great contribution to the field.

C. CONFIDENTIAL ONLINE TRAINING
Most works on secure on-device ML deployment only discuss the ML model's confidentiality during inference. We found that secure updates (e.g., personalizing pre-trained ML models) are rarely discussed in the existing works. Personalization of ML models can be useful in several real-life scenarios; for instance, ML models for self-driving cars would need to be adapted to the user's surrounding environment for better performance. We expect that a system that must support both inference and training will face an extended attack surface and unique problems.

D. COMPREHENSIVE APPROACH TO MODEL CONFIDENTIALITY
This survey covered the existing works that discovered attacks on model confidentiality through in-system attacks, such as side channels in the untrusted cloud. However, ML model confidentiality can also be undermined through API attacks that reconstruct the model from repeated inference results [32,149]. An attacker in the untrusted cloud who can launch side-channel attacks on an ML model in service can also issue inference queries to that service. However, the ramifications of an attack that combines in-system attack vectors and API attacks have not been explored. We expect that a comprehensive approach to model confidentiality, combining the existing works that discuss these threats in an ad-hoc manner, could be of great value as future work.
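To illustrate the API side of this threat in its simplest form, the toy below extracts a secret linear model exactly by querying its inference API on basis vectors. This is our own minimal example of the query-based extraction idea surveyed in [32,149], not a reproduction of those attacks; real DNN extraction requires far more queries and approximation.

```python
def victim_model(x, secret_w=(2.0, -1.0, 0.5)):
    """Inference API of a deployed linear model; only outputs are visible
    to the attacker, never secret_w itself."""
    return sum(wi * xi for wi, xi in zip(secret_w, x))

# API attack: query the model on unit basis vectors to read off each weight.
dim = 3
basis = [[1.0 if j == i else 0.0 for j in range(dim)] for i in range(dim)]
stolen_w = [victim_model(e) for e in basis]
assert stolen_w == [2.0, -1.0, 0.5]   # model reconstructed from 3 queries
```

Even a TEE that fully hides the weights in memory cannot prevent this channel, since the attack uses only legitimate inference outputs, which is why combining in-system and API-level defenses matters.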

IX. CONCLUSION
This survey provides a comprehensive study of the security and engineering challenges in implementing various types of confidential ML computation. We systematically summarized and categorized the existing works on advancing confidential ML computation in several aspects; we discussed known in-system attack vectors and proposed mitigations, solutions for each multi-party ML scenario, and engineering challenges in implementing secure and confidential ML computation. Lastly, we presented our view on research opportunities based on the literature that we reviewed. We hope that our survey can serve as a comprehensive overview for systems security researchers and industry practitioners who seek to build confidential ML computation systems.

SIWON HUH received the B.S. degree in mathematics and computer science and engineering from Sungkyunkwan University in 2021. He is currently pursuing a master's degree in computer science and engineering at Sungkyunkwan University, Suwon, South Korea. His research interests include blockchain identity management and artificial intelligence security.