A Survey and Guideline on Privacy Enhancing Technologies for Collaborative Machine Learning

As machine learning and artificial intelligence (ML/AI) become more popular and advanced, there is growing demand to turn sensitive data into valuable information via ML/AI techniques while revealing only the data that the concerned parties allow, or revealing no information about the data to third parties at all. Collaborative ML approaches like federated learning (FL) help address these needs and concerns by providing a way to use sensitive data without disclosing its critically sensitive features. In this paper, we provide a detailed analysis of the state of the art in collaborative ML approaches from a privacy perspective. A detailed threat model and security and privacy considerations are given for each collaborative method. We analyze Privacy Enhancing Technologies (PETs) in depth, covering secure multi-party computation (SMPC), homomorphic encryption (HE), differential privacy (DP), and confidential computing (CC) in the context of collaborative ML. We introduce a guideline for collaborative ML and privacy practitioners on the selection of privacy preserving technologies. This study constitutes the first survey to provide an in-depth focus on collaborative ML requirements and constraints for privacy solutions while also providing guidelines on the selection of PETs.

The rest of the paper continues as follows. First, some background information about privacy objectives for ML is provided in section II. Then related survey works are given in section III. Section IV covers different collaborative ML techniques. Potential privacy and security attacks on collaborative ML techniques and threat models are provided in section V. PETs and current applications to prevent privacy attacks in collaborative ML are presented in section VI. In section VII, we propose a methodology on how PETs should be chosen under different constraints and requirements. Section VIII provides a discussion on open issues and future directions. The paper concludes in section IX.

ML itself is about understanding data and benefiting from it. When we say data, we refer to a broad variety of data sources, including (but not necessarily limited to) application specific data, telemetry data, behavioral data, or personally identifiable information (PII). In the ML context, data can be used during the training and inference phases. Data content may explicitly include private indicators, like PII, that require special care. Even if the data does not include PII, it may be possible to extract sensitive information from the data if it includes potentially linkable or interpretable auxiliary information. Studies [6], [7], [8] unveil that model parameters are as important as data since they may leak information about the processed data. Therefore, where the data is kept and used is as important as its content. During the ML life cycle, which comprises training, model creation, and model inference, privacy issues may exist depending on the threat and data ownership model. We generalize privacy concerns into the following three categories:

1) PRIVACY FOR TRAINING DATA

Privacy breaches for training data may occur via unauthorized direct access to the data. Training data exposure may also happen in a Machine Learning as a Service (MLaaS) setting, where the model can be trained on the cloud. Monitoring by the data owner may not be possible if the cloud service does not provide any auditing mechanism on how and where the data is processed and stored during training and how it is erased after the training. In addition to protecting training data from unauthorized access, practitioners may want to protect the data against third parties who utilize the data to create an ML model.

2) PRIVACY FOR MODEL INFERENCE
Even though the data or models are isolated from the adversary, vulnerabilities remain during the inference phase. In this phase, there can be two different privacy concerns: 1) protection of the model parameters, and 2) protection of query data. In the first case, the adversary may collect query/result pairs and try to recover the model parameters. The transferability property of many models and the memory of deep neural networks (DNNs) enable this kind of attack [6]. For the second type of privacy threat, the user of the model, who sends queries to the model, may want to learn the query result without revealing any information about the query itself.
proposal, anonymizing the sender of the model updates is enough to mitigate privacy attacks while enabling possible integration of solutions against security attacks. Since the sender will be known by neither the server nor the other clients, there will be no privacy threat, and since there will be no encryption requirement for the updates sent by the clients to the server, the server can analyze the updates to prevent security attacks. To hide the identity of the sender client, the survey proposes a peer-to-peer privacy-enhanced data forwarding solution.

Boulemtafes et al. [12] provided a multi-level taxonomy from a deep learning perspective. The study considered the related works by dividing them into three ML phases: learning, inference, and release. For each phase, they investigated existing works with a different multi-level classification providing performance metrics. For the learning phase taxonomy, they presented works for both collaborative and traditional ML settings. Each setting was then analyzed in two categories, server-based and server-assisted. In the server-based category, all training is done on servers, while in the server-assisted setting the training is performed collaboratively, as in FL. In the inference phase, server-based and server-assisted PET studies were given. However, collaborative and traditional ML settings were not considered in this phase. For the model release phase, only differential privacy techniques were considered. The usage of confidential computing technologies is not considered in the study, and privacy preserving collaborative ML studies were covered only under the learning phase, not under the inference and release phases.

Li et al. [13] provided a taxonomy for federated learning, which classifies FL considering data distribution, machine learning model, privacy mechanism, communication architecture, the scale of federation, and motivation of federation. The authors analyzed mostly FL system building blocks and presented a comparison with conventional federated database and cloud systems. Existing studies were summarized from different aspects without dealing with the issue of privacy in detail. A design guideline was provided for FL considering effectiveness, efficiency, privacy, and autonomy as the design factors. Although their perspective resembles ours, our guideline focuses more deeply on privacy aspects.

Yin et al. [14] provided a 5W-based (who, what, when, where, why) taxonomy of privacy leakages in FL. The authors presented state of the art privacy preserving solutions covering HE, SMPC, DP, and other perturbation techniques. Solutions using trusted execution environments are not considered. Our study differentiates from that study by considering other collaborative ML techniques, including trusted execution environments as a privacy enhancing tool, comparing the PETs, and providing a guideline.

However, these techniques do not provide as strong privacy guarantees as PETs do.

Previous surveys consider FL challenges and issues from different aspects, providing a privacy perspective at some level.
In comparison with the previous works, our study aims to investigate all aspects of privacy attacks and solutions, covering a broader set of PETs for not only FL but also decentralized learning and split learning. In addition, we aim to support practitioners by giving a guideline on selecting the most appropriate privacy enhancing technologies based on their needs and the collaborative setting.

In recent years, collaborative utilization of the distributed data owned by different data owners has been in great demand since data is distributed among different entities. Depending on the application scenario, collaborative learning can generally be classified as cross-device and cross-silo learning. In cross-device learning, the clients are mobile or IoT devices with limited computing power and possibly unreliable communication. In contrast, the clients in cross-silo learning settings are a small number of organizations (e.g., medical) with reliable communications. It is important to understand the core challenges of collaborative ML settings, such as communication, computation, cost, and privacy requirements, to design efficient models. In this section, we provide background information on collaborative ML models. We consider Federated Learning, Decentralized Learning, and Split Learning as collaboration methods. There can be other methods where parties can collaborate, but for generalization and common understanding, we limit the study to these three methods. Centralized learning may also be considered a collaboration technique since the data is shared by end devices to contribute to the global model. On the other hand, there is no decoupling during model training in this setting. For the sake of completeness, we provide a definition of this model, but it is not considered in the privacy discussions for the rest of the paper.

In centralized learning, illustrated in Figure 1, each client transfers its own data to the server, where the data is aggregated, and then the model training is performed centrally. As a result, one single model is produced, which can be made available to the clients either by sending the model to them or by enabling inference over the server. In this kind of setting, the server usually has more computing power than the clients, e.g., the server is located in the cloud or has access to larger computing resources. Although this gives computation flexibility, it may not be preferred when communication cost is a concern, as all data is transferred to the server. Additionally, data transfer may not be allowed due to user privacy or legislative reasons.

Only the server is allowed to update the parameters in the global model. One example case is Google Keyboard [25], which includes features like text auto-correction, word completion, and next word or emoji prediction. Without collecting sensitive raw data from users, the models are trained collaboratively using FL. The result computed by the server is returned to each user. A principal advantage of this approach is that there is no need to directly access raw training data for model training.

FL can be classified as horizontal FL (HFL), vertical FL (VFL), and federated transfer learning (FTL) with respect to how data is distributed over the sample and feature spaces among clients [26]. In HFL, the samples are different for each data owner, but they share the same feature space. For this scenario, a server can aggregate the information from different data owners. In VFL, there is a large overlap in the sample spaces among multiple clients, but the feature spaces are different. A variety of secure models are proposed for VFL, including association rule mining, decision tree, and Naïve Bayes classifier. In [27], the authors propose secure machine learning where data is partitioned in the feature space. FL can be implemented using the most popular ML algorithms such as neural networks, decision trees, and linear/logistic regression where data is horizontally partitioned [13]. In the case of vertical FL, a more complex mechanism to decompose the loss function at each party is required [28], [29]. Being in the same feature space and having the same distribution are key assumptions for training and testing data in many machine learning algorithms. However, in many real-world applications, we may have two or more domains of interest where users' data have different feature spaces and follow different data distributions. FTL is a learning framework emerging in recent years where the data has little overlap over the sample and feature spaces. Benefiting from transfer learning [30], where knowledge (features and weights) from previously trained models is used for training newer models, FTL brings solutions for both the sample and feature space. Using FTL, complementary knowledge is transferred across domains in a federation; thus, flexible and effective models can be built for the target domain using the information from the other source domains.

FL approaches may differ in the optimization strategies that are performed during training cycles. The most preferred optimization method is SGD. FL approaches can use SGD in different settings. Federated SGD (FedSGD) and Federated Averaging (FedAVG) are the two algorithms that are widely adopted in FL implementations [22]. FedSGD implements a single batch gradient calculation on the local model. In each round, clients perform the gradient calculation for a single batch and send the result to the server. The server then aggregates the gradient updates from each client and applies the update. However, this requires a large number of rounds to achieve the desired convergence. FedAVG suggests an improvement to reduce the number of rounds by increasing the number of gradient calculations on the local model. In FedAVG, each client performs the iteration multiple times (called the epoch number) on the gradient. Then, the locally updated weights are sent to the server.

In split learning, each client computes the forward pass only up to a certain layer, called the cut layer, and sends the cut layer activations to the server, as illustrated in Figure 4. In this way, without sharing raw data, a round of forward propagation is completed. Then, at the server, the gradients are back propagated until the cut layer. The gradients at the cut layer are sent back to the clients. This process continues until the learning task converges. In split learning, the activations and gradients are communicated only from the split layer, unlike other methods where the parameters resulting from local training tasks are shared. There is a difference between split learning and federated learning regarding the computation load on the clients and how quickly split learning converges. Also, in terms of communication bandwidth, when the number of clients or the model size is large, split learning is more communication efficient. If the training data size is large but the number of clients and model size are small, FL is more communication efficient [35].

Compared to FL, SL provides better model privacy because the ML model is split between the clients and the server, which is useful for two reasons [35]. It offers model privacy since the users and the server have no access to each other's part of the model. Also, the processing workload at the client side can be significantly reduced by assigning computation of only a small part of the network to the clients considering the clients' capacity, which makes this method more suitable for resource constrained devices. In terms of speed, because of the sequential nature of ML model training across the clients in SL, it is significantly slower than FL.
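To make the split learning mechanics above concrete, the following is a minimal sketch of one training step, assuming a PyTorch model split at a cut layer; only the cut-layer activations travel from client to server and only their gradients travel back, never the raw data. The layer sizes and learning rates are illustrative assumptions.

```python
# A minimal split-learning step (sketch): raw data stays on the client;
# only cut-layer activations (client -> server) and their gradients
# (server -> client) cross the boundary.
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # layers before the cut (client side)
server_net = nn.Sequential(nn.Linear(64, 10))               # layers after the cut (server side)
client_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def split_learning_step(x, y):
    # Client: forward pass up to the cut layer.
    client_opt.zero_grad()
    activations = client_net(x)
    sent = activations.detach().requires_grad_(True)   # tensor transmitted to the server

    # Server: finish the forward pass, compute the loss, backpropagate to the cut layer.
    server_opt.zero_grad()
    loss = loss_fn(server_net(sent), y)
    loss.backward()
    server_opt.step()

    # Client: continue backpropagation using the gradient returned by the server.
    activations.backward(sent.grad)
    client_opt.step()
    return loss.item()

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
split_learning_step(x, y)
```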

However, questions remain regarding whether privacy leakage exists stemming from the cut layer activation information sent by the client to the server during the training process, as the activation information may leak information about the training data. In [36], the authors showed that it is possible to reconstruct the raw data from the activation values in the intermediate split layer, which are passed to the server. In their threat model, the server is honest-but-curious and tries to reconstruct the raw data from the activated vector of the cut layer.

A brief comparison of different key aspects of collaborative ML approaches is given in Table 2. Most of the collaborative ML approaches face common challenges, including the following:

• Non-IID data: It is not realistic to assume that the local data on each client is always IID.

• Unbalanced data size: The local training data size may vary for each client, e.g., the usage of the mobile service or application by some users may be much heavier than that of others.

Split learning is more communication efficient when the number of clients is increased and is highly scalable with respect to the number of model parameters. On the other hand, FL is more communication efficient when the number of data samples is increased but the number of clients and the model size are small [39]. SL outperforms FL in terms of accuracy and requires lower computation resources per client. For a setup with a large number of clients, SL requires lower communication bandwidth per client in comparison to FL [34]. Decentralized learning does not need a central authority to be trusted as federated learning does. Additionally, DecL is more resilient than FL because there is no single point of failure as in FL. However, DecL is generally slower to converge compared to FL [40].

In this section, we first give a brief introduction to our threat model in a collaborative ML setting, focusing on each collaborative ML model. Then, we cover security attacks in ML to make this work more comprehensive. Finally, we explain existing privacy attacks in machine learning.

This section examines potential threats in collaborative machine learning, which enables us to understand the attack model and construct mechanisms to defend the ML process against attacks from a privacy perspective [41]. For collaborative ML, we can consider the training dataset, the model parameters, and the hyper-parameters as assets which are sensitive and at risk of attacks. The data owners, the model owner, the model consumers, and the adversary are the actors specified in our threat model. The data owners own the data and are not willing to share it due to security reasons or privacy considerations. The model owner, who does not necessarily own the data, creates an ML model from the training data using data mining and machine learning techniques. The model owner is unwilling to share its model with other parties to prevent model inversion attacks and does not want to create a poisoned model because of attacks from malicious data owners. The model consumers are those who use the service provided by the model owner through some programming or user interface. The adversary, as a usual consumer, may access the interfaces and all communication between the parties. Additionally, the adversary may have a priori knowledge of data or models.

Adversarial knowledge is a key factor in determining the possible attack surfaces against ML models. The adversary might have limited, partial, or full knowledge of the model architecture, hyper-parameters, or training setup. From the dataset point of view, in the majority of the works it is assumed that the adversary has only some knowledge about the data distribution but not the training data samples. Attacks can be classified as black-box, white-box, and partial white-box with respect to the knowledge of the adversary [42], [43]. In black-box attacks, the adversary does not know about the model parameters, architecture, or training data [44], [45]. MLaaS,

The adversary cannot access the model parameters as in the white-box attacks.

We consider two main types of possible adversarial behavior, known as honest-but-curious and malicious [46].

Alternatively, the malicious server can inspect clients' updates and tamper with the training process by modifying each client's view of the global model to extract more information about the training data.

However, the honest-but-curious (white evil) server only observes the updates and wants to gain information about the clients' local data. In addition, an honest-but-curious client can only observe global updates and launch an attack to gain knowledge about other clients' local data.

In Figure 6, where we describe the decentralized learning attack model, there is no aggregator server, and all nodes send updates to each other. A malicious node observes updates from all other participants and can modify its own parameters to tamper with the training process and gain knowledge about other nodes' data. However, an honest-but-curious node can only observe the updates and the global model to launch an attack that extracts information about the training data.

In Figure 7, the attack model against split learning is depicted. The honest-but-curious server follows the operations as specified, and it wants to gain information about the raw data stored on the client. The server has access to the activated vector of the cut layer sent by the participants during the forward propagation. It aims to reconstruct the clients' raw data. Alternatively, an honest-but-curious node has access to the gradients sent by the server during the

or labels of the data. The adversary may also want to modify the output labels or input features of the training data, or may want to alter the ML model directly by tampering with the ML algorithm process. Poisoning attacks enable adversaries to insert backdoors or trojans into the model either at training time or after initial model training [49]. For example, Gu et al. [50] inserted stop sign images with a special sticker (the backdoor trigger) into the training set and labeled them as speed limit signs. In this way, a backdoor is generated in a street sign classifier where common street signs are classified properly, but stop signs carrying the backdoor trigger are incorrectly classified as speed limit signs. Thus, simply by placing a sticker on any stop sign, the adversary can trick the model into classifying it as a speed limit sign, causing potential accidents in self-driving cars.

The invisibility of the model updates generated by each client makes FL vulnerable to model-poisoning attacks. In order to add backdoors to the joint model, a malicious client can use model replacement. The adversary can act as a single client or collude with multiple clients to modify a classifier to assign desired labels [51]. In FL, user-level differential privacy can be used to defend against targeted poisoning attacks. In [52], the authors implement the sybil attack, a model poisoning attack, on differential privacy based federated learning and explore some protection mechanisms. The adversary arranges manipulation of model updates by creating several fake clients or colluding with compromised clients.

2) EVASION ATTACKS

These attacks occur during the inference/testing phase. The adversary aims to perturb the input samples fed to the ML classifier at inference/test time to cause a misclassification. For example, the adversary can change some pixels in the image of a 'stop' sign, which causes it to be predicted as a 'Speed Limit' sign by the classification model. In [53], the authors proposed data transformations, including dimensionality reduction, as a defense mechanism against evasion attacks. They demonstrate that the adversarial success rates are reduced at a fixed budget, but the attacks are not completely prevented.

Privacy concerns in machine learning may arise in many cases, such as sharing a public dataset, participating in a training procedure using sensitive data to generate a model, sharing the learned model publicly, and sharing query results with the end user. In all such cases, the privacy of an individual's data or a service provider's model is at risk. In the following, we explain such attacks, which can arise at both the training and testing phases of machine learning.

These kinds of attacks can be executed by one or more collaborating parties who want to learn sensitive information about other parties' data. For example, in the federated learning scenario, the server sends the updated model to the data owners in each iteration. The difference between two consecutive models sent by the server can be used by semi-honest or malicious data owners to recover some information about the private inputs of other data owners. Alternatively, the server can be honest-but-curious and try to learn some extra information about the private inputs, for example, by linking the model updates of each client. There are solutions to mitigate such attacks, like secure aggregation and making the FL server oblivious to hide the data owners' identities [54].

Nasr et al. [48] showed that the white-box membership inference attack is more effective than the black-box one when data from the training dataset is accessed by the adversary. In the honest-but-curious scenario, in order to differentiate between members and non-members, the model parameters and gradients are used as input to train another model. In the malicious case, instead of gradient descent, the adversary can perform gradient ascent by altering the gradient updates for the data whose membership is questionable. If the data is used

These attacks, also known as exploratory attacks, are oracle attacks where the goal is to obtain the parameters or structure of the model by inspecting the model's predictions, including the probabilities returned for each class. An adversary with access to a prediction API or model outputs tries to rebuild a surrogate model that approximately matches the target model. This attack can be implemented by querying the prediction API and learning the predictions for the input feature vectors. When there is no constraint on the number of queries or on the queries themselves, the adversary can construct a model similar to the target model by querying many times and using the inference results as training inputs. One of the trivial solutions to prevent such attacks is to limit the number of queries from users. The most obvious countermeasure for ML services is to remove the confidence values and only output the class labels [6]. Embedding a watermark can be another approach for intellectual property (IP) protection of ML models to determine whether a model has been stolen. Watermarking is regarded as selecting a set of inputs (i.e., a trigger set) that are labeled randomly and can be used by a legitimate model owner along with normal training data to generate a watermarked model. To demonstrate ownership of the model, the surrogate model is queried with the trigger set. If incorrect labels are matched with enough predictions, it can be concluded that the model has been stolen. In this way, a legitimate model owner can detect misuse of their models [57]. In addition, in [58] the authors propose an active defense that perturbs predictions to poison the adversary's training objectives. They claimed that the accuracy of the adversary could be decreased by up to 65%, while the defender's accuracy is not affected significantly.
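As a toy illustration of the extraction procedure described above (not a reproduction of any specific attack from the literature), the sketch below queries a victim model through its prediction interface and fits a surrogate on the returned labels; the models, data, and query budget are purely illustrative assumptions.

```python
# Toy model-extraction sketch: the adversary only sees API outputs, yet can
# train a surrogate that approximates the victim's decision behavior.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_private = rng.normal(size=(500, 5))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
target = DecisionTreeClassifier().fit(X_private, y_private)   # victim model behind a prediction API

X_query = rng.normal(size=(2000, 5))        # adversary-chosen query inputs
y_api = target.predict(X_query)             # labels returned by the prediction API
surrogate = LogisticRegression().fit(X_query, y_api)   # approximate copy trained on API outputs
```

Limiting the number of queries or withholding confidence scores, as mentioned above, directly constrains how informative `y_api` can be for the adversary.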

A model inversion attack is another oracle attack type that uses a priori information about the model and auxiliary data to explore training data or other sensitive data. The inferred information enables the adversary to reconstruct the data sample used to train the model, which may violate the privacy of an individual whose personal information is included in the data [41].

Property inference is the ability to infer properties other than those explicitly encoded as features, which the model producer did not intend to share. An example is inferring the fraction of the data that comes from a certain class; for instance, in a patient dataset the aim of the attack is to infer the fraction of men and women when such information was not an encoded attribute. The information is learned unintentionally from the model and is not related to the training task [19]. In collaborative settings, one of the clients can maliciously try to infer uncorrelated features of the dataset, e.g., if the model is a gender classifier, the adversary tries to infer the facial identity in the picture. The attack can be performed passively or actively. In the passive setting, the adversary saves snapshots of the joint model at different rounds, reflecting aggregated gradients. Then, the adversary calculates the difference between these gradients and tries to infer information based on the assumption that the gradient updates can leak the features of the input data learned by the model to predict the output. In the active setting, the adversary attaches an extra classifier for inference, e.g., for facial identity inference, and crafts the

The given formulation is called (ε, δ)-differential privacy, where δ is the relaxation parameter. If δ is neglected, then ε-differential privacy provides stronger privacy guarantees. ε is the control parameter for the privacy level, denoting the privacy budget. There are other approaches like Rényi differential privacy (RDP); that is, an algorithm is (α, ε)-RDP if the Rényi divergence of order α between any two adjacent databases is no more than ε. RDP can be preferred for its simple privacy budget accounting. A more detailed mathematical foundation can be found in the literature.

In Figure 2, which uses a central trust model, the FL server collects the client updates in cleartext. Central DP applications employ the same trust model but add noise on the server. In this way, the server sends the privacy protected model parameters. After receiving them from the server, clients perform local training, clip their updates to bound their contribution, and then send the clipped updates to the server. The server aggregates the updates and again adds noise proportional to the sensitivity. This process is repeated in each round of FL.

Three mechanisms can be used for noise sampling in DP: the Laplace [66], Gaussian, and exponential [61] mechanisms. The Gaussian mechanism is the most widely used in Central DP and is defined as:

For a query function f : D → R, a randomized algorithm M is given by M(D) = f(D) + N(0, σ²), where N(0, σ²) indicates noise drawn from a Gaussian distribution with standard deviation σ calculated from the sensitivity of f.
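As an illustration of how this mechanism is typically applied in the central DP federated rounds described above, the following is a minimal NumPy sketch, assuming the server clips each client update to an L2 bound and perturbs the average with Gaussian noise scaled to its sensitivity; the clipping bound and noise multiplier are illustrative values, not prescriptions.

```python
# Central-DP aggregation sketch: clip each client update, average, add Gaussian noise.
import numpy as np

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1,
                 rng=np.random.default_rng(0)):
    # Clip each update so any single client's contribution is bounded by clip_norm.
    clipped = [u * min(1.0, clip_norm / np.linalg.norm(u)) for u in client_updates]
    mean_update = np.mean(clipped, axis=0)
    # Sensitivity of the mean is clip_norm / n, so scale the noise accordingly.
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean_update + rng.normal(0.0, sigma, size=mean_update.shape)

updates = [np.array([0.2, -0.1, 0.4]), np.array([0.1, 0.3, -0.2])]
noisy_mean = dp_aggregate(updates)
```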

Central DP requires clients to trust the server, as the server controls the aggregation and the DP mechanism. If this is not the case, Central DP can be improved from the privacy point of view by adding noise on the client side so that clients do not have to trust the server. This model is called Local DP.

shuffled and sent to the server. In shuffling methods, privacy can be adjusted with the use of a moderate ε to enable the protocol with far smaller error than using only Local DP, while not solely relying on the trusted server model.
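As a minimal illustration of the Local DP idea (leaving the shuffler out), the sketch below assumes each client clips and perturbs its own update before sending it, so an untrusted server only ever sees noisy values; the clipping bound and noise scale are illustrative.

```python
# Local-DP sketch: noise is added on the client, so the server never sees raw updates.
import numpy as np

def local_dp_update(update, clip_norm=1.0, sigma=2.0, rng=np.random.default_rng(0)):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))   # bound the contribution
    return clipped + rng.normal(0.0, sigma, size=update.shape)  # perturb before sending

# The server simply averages what it receives; no trusted noise-adding step is required.
noisy = [local_dp_update(u) for u in [np.ones(4), np.zeros(4), -np.ones(4)]]
aggregate = np.mean(noisy, axis=0)
```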

The preference and trade-off between CDP and LDP come from the trust model of the deployment. CDP cannot provide privacy protection in the case of a malicious server model. Although LDP does protect the clients from a malicious server, it reduces the accuracy of the model. Moreover, the malicious colluding client model is not taken into account in DP itself. Combining other privacy enhancing techniques in a hybrid solution would be an alternative way. Without sacrificing accuracy, it is still possible to protect against the malicious server model in solutions using SMPC and HE. On the other hand, these approaches come at a price, inducing additional communication and computation costs. DP allows controlling and tracking privacy with the moments accountant method [96] so that the defined privacy budget given via the (ε, δ) parameters is not exceeded. In collaborative learning, the iterative nature of the training algorithm should also be reflected in privacy accounting. Privacy accounting for multiple iterations can be done using the composability feature of DP to compute and accumulate the privacy cost at each round of training.
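As a toy illustration of such accounting, the following sketch applies basic sequential composition, where the per-round (ε, δ) costs simply add up; this is looser than the moments accountant method cited above, and all budget values are illustrative.

```python
# Toy privacy accounting via basic sequential composition (epsilons and deltas add).
def within_budget(eps_per_round, delta_per_round, rounds, eps_budget, delta_budget):
    eps_total = eps_per_round * rounds
    delta_total = delta_per_round * rounds
    return eps_total <= eps_budget and delta_total <= delta_budget

# Example: 100 rounds at epsilon = 0.05 per round fit within an overall budget of epsilon = 8.
print(within_budget(0.05, 1e-7, 100, eps_budget=8.0, delta_budget=1e-5))
```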

Ensuring data privacy is a highly valuable advantage of HE. Without decrypting the data, HE allows multiple computations to be performed on encrypted data. HE is especially useful for privacy-preserving computation over data whose storage is outsourced to third parties. In particular, after having been

In the context of collaborative learning, in order to learn models privately using homomorphic encryption, parties first need to distribute keys. Below we first explain key distribution in the homomorphic encryption procedure between parties in collaborative learning.

In one scheme, a typical cross-silo FL system is considered [104]. Each client has an HE module, and there is an honest-but-curious aggregator server that coordinates the clients and aggregates the encrypted gradients. A cryptographic protocol such as SSL/TLS is used to secure the communication between the clients and the aggregator server; thus, the transferred messages cannot be learned. One client is selected randomly by the aggregator as the leader to generate an HE key pair. The selected client synchronizes the keys to all the other clients; it also generates the initial ML model and sends the weights to the other clients. Once the clients receive the key pair and the initial model, they start training and computing the local gradient updates. Clients encrypt the updates using the public key and send the results to the server. The server performs homomorphic computation on all received updates (e.g., adds them up) and sends the results to all clients. The aggregated gradients are decrypted by the clients, and the local models are updated. The assumptions behind this scheme may be considered prohibitively strong; it is assumed that the server will not collude with any client. If the server intentionally chooses a particular client as the leader, it can learn the HE key pair and consequently learn the gradients sent by other clients. In addition, the leader client has to communicate with all other clients to distribute the key pair, which can increase the communication cost.
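A minimal sketch of this aggregation flow is given below, assuming the python-paillier package (phe) as the additively homomorphic scheme; the scalar updates, key length, and averaging step are illustrative, and a real deployment would encrypt full parameter vectors.

```python
# HE-based aggregation sketch: clients encrypt, the server adds ciphertexts only.
from phe import paillier

# Key pair generated by the leader client and shared with the other clients (not the server).
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

client_updates = [0.12, -0.07, 0.31]                      # one scalar weight update per client
encrypted = [public_key.encrypt(u) for u in client_updates]

# Server side: additively homomorphic aggregation over ciphertexts, without the private key.
encrypted_sum = encrypted[0] + encrypted[1] + encrypted[2]

# Client side: decrypt the aggregate (only clients hold the private key in this scheme).
average_update = private_key.decrypt(encrypted_sum) / len(client_updates)
```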

In another scheme, Multiparty Homomorphic Encryption (MHE) [105] uses an HE scheme to encrypt and exchange the input data between multiple parties. Using a secret-sharing scheme, the secret key is distributed securely among the participants to preserve the privacy of the inputs. The participants need to collaborate with each other for the decryption process according to the access structure of the used secret sharing scheme. In this scheme, the clients collaboratively generate a private key. Even if the server colludes with one client, it cannot obtain information about the gradients of the other clients.

The multi-key HE scheme is an important class of multiparty HE, first proposed by López-Alt et al. [106]. In this scheme, computations can be performed on ciphertexts which are encrypted under different and independent keys without a joint key setup. The decryption of the ciphertext can be done jointly by all users who are involved in the computation.

Now, we give an overview of existing solutions in the literature which enhance privacy in collaborative ML using homomorphic encryption.

Mittal et al. [107] proposed a secure k-means data mining approach where the data is assumed to be distributed among different hosts (i.e., horizontally partitioned and stored). Their approach is to mine data securely using the k-means algorithm from the cloud even when adversaries exist. The privacy of data at each host must be preserved, and no inter-

The user models are uploaded to the server for aggregation, managed by a coordinator that implements privacy-preserving entity resolution and an additively homomorphic encryption scheme. The third party holds a private key and receives only the encrypted aggregated model updates from participants, which are not considered private in their setting. They provided a formal study on the impact of entity resolution errors on learning, since identifying corresponding entities in a vertically partitioned dataset is challenging.

In the cross-silo FL framework, to ensure that clients' updates are not revealed during aggregation, clients are allowed to mask their local gradient updates using additive HE. However, the computation and communication cost of HE operations is extremely high. In [104], the authors proposed BatchCrypt, a simple batch encryption technique, to solve the communication and computation bottlenecks caused by HE. Each client represents its gradient values with low-bit integers using quantization. Then a batch of quantized values is encoded into a long integer, and batch encryption is performed, which decreases the overhead of encryption and the total volume of ciphertext. The authors validated their technique with experimental results.
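The following is a much-simplified sketch of that batching idea, not the actual BatchCrypt implementation: gradients are quantized to small integers and packed into one long integer so that a single encryption covers many values. The slot width, quantization resolution, and the handling of signed values and addition overflow are simplifying assumptions here.

```python
# Simplified quantize-and-pack sketch: many gradient values -> one integer to encrypt.
import numpy as np

SLOT_BITS = 16    # bits reserved per value; spare bits absorb addition carries (assumption)
QUANT_BITS = 8    # quantization resolution (assumption)

def quantize(grads, clip=1.0):
    # Map values in [-clip, clip] to unsigned integers in [0, 2**QUANT_BITS - 1].
    g = np.clip(grads, -clip, clip)
    return np.round((g + clip) / (2 * clip) * (2**QUANT_BITS - 1)).astype(np.uint64)

def pack(quantized):
    # Concatenate the quantized values into one long integer, one slot per value.
    packed = 0
    for q in quantized:
        packed = (packed << SLOT_BITS) | int(q)
    return packed   # this single integer is what would be encrypted with additive HE

packed = pack(quantize(np.array([0.12, -0.34, 0.56])))
```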

In [111] and [112], the authors proposed privacy preserving multi-party machine learning based on federated learning and homomorphic encryption, where each node has a different HE private key in the same FL-based system. In [113], the authors proposed a privacy-preserving FL approach which uses a momentum gradient descent (MGD) optimization algorithm to accelerate the model convergence rate during the training process. To preserve the local privacy information of each agent, fully homomorphic encryption is adopted to encrypt the gradient parameters.

Secure multi-party computation (SMPC) is another paradigm for computation on encrypted data, in addition to the homomorphic encryption technique. In the SMPC setting, there are parties with their sensitive inputs, and they want to compute a joint function using their inputs but do not want to reveal their private inputs to each other. In other words, at the end of the protocol, the parties learn nothing beyond what is revealed by the output itself. The term "secure two-party computation" is used for the special case where the number of parties is just two. Note that in this computation paradigm, there is no need for a trusted server/party.

Collaborative ML training can be considered as a function that receives the sensitive data of the data owners as input and outputs the collaboratively learned machine learning model to the data owners or to a server. The output of the SMPC can also be inference results for given inputs instead of the model itself. In the literature, there are two generic secure multi-party protocols that can execute any function securely. One of them is Yao's garbled circuits, introduced by Andrew C. Yao in 1986 [114], which works only for the two-party case. In that solution, the sending party creates a circuit for the function to be computed, randomly selects two symmetric keys for each wire in the circuit for the two possible values '0' and '1', and then sends the truth table for each gate in the circuit in a random order, together with the corresponding keys for its inputs. The receiving party gets the corresponding keys for its input bits

Since collaborative ML is a function, the privacy of the parties' inputs can be protected by using generic SMPC solutions: Yao's garbled circuits or GMW. However, the realization of privacy preserving collaborative ML using these techniques is not very practical because of heavy computation and communication costs. Because of that, custom SMPC protocols have been proposed to enhance privacy in collaborative ML. Instead of constructing an SMPC protocol for the whole training algorithm, some parts of the training algorithm can be implemented in the SMPC protocol. For example, in the federated learning scenario, secure aggregation of the weight updates can be enough to prevent sensitive data leakage. Commonly used primitives in the construction of SMPC protocols are secure aggregation protocols [31], [124], secret sharing schemes, and generic protocols (GMW and Yao's garbled circuits). In the secure aggregation setting, the server (called the aggregator) aggregates the sensitive data of the parties without learning any sensitive information about the data. Below we give a short overview of existing solutions in the literature.
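To make the secure aggregation setting concrete before the overview, here is a minimal sketch of the pairwise additive masking idea used in protocols such as [31]: every pair of clients shares a random mask which one adds and the other subtracts, so the masks cancel in the server-side sum while individual updates stay hidden. Key agreement and dropout recovery, which the actual protocols handle, are omitted.

```python
# Pairwise-masking secure aggregation sketch: masks cancel in the sum.
import numpy as np

def masked_updates(updates, seed=0):
    n = len(updates)
    rng = np.random.default_rng(seed)
    # Pairwise masks r[i][j] shared between clients i and j (i < j).
    masks = {(i, j): rng.normal(size=updates[0].shape)
             for i in range(n) for j in range(i + 1, n)}
    masked = []
    for i, u in enumerate(updates):
        m = u.copy()
        for j in range(n):
            if i < j:
                m += masks[(i, j)]
            elif j < i:
                m -= masks[(j, i)]
        masked.append(m)      # what client i sends; on its own it reveals nothing
    return masked

updates = [np.ones(4) * k for k in range(3)]
aggregate = sum(masked_updates(updates))   # equals sum(updates): the masks cancel
```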

For the federated learning method, [31] proposed a solution to prevent the server from learning the weight updates by obscuring the aggregation from the server. Another work related to the federated learning setting proposed a method to privately aggregate the model parameter updates and detect malicious update values using a secret sharing scheme [120]. Similar to that work, many of the solutions in the literature make use of secret sharing; one such secret sharing based solution is [127], which was presented at ESORICS 2020.

side channel attacks [136] on the processors that provide the security functionality are still possible. Although vendors release some countermeasures or patches, they do not guarantee that the processors are safe from side channel attacks. Mitigating the risk is left to the solution provider. Hence, the security functionalities and threat model might limit the intended security design, and the system is protected only as much as the security of the vendor's solutions. As for other outsourced services, integrity, confidentiality, and privacy are growing needs for ML services that use remote or shared environments, regardless of whether different actors are involved. The confidential computing paradigm enabled by TEEs is a natural, pragmatic solution to this problem as it provides secure training and inference for ML by isolating sensitive computations from the untrusted stack. The use of confidential computing comes at a price; it requires additional capabilities in hardware. In the FL setting, a TEE can be used for server-side operations or in client devices. If the client side is a massive deployment environment, as in IoT cases, the requirement that each IoT device has a TEE would be costly. Although there are IoT specific solutions like ARM TrustZone, current implementations are mostly done on the server side. On the other hand, recent sybil attacks show that protection against malicious client devices is required to prevent sybil based poisoning attacks [137].

the authors focused on how to protect the DNN model in a MaaS setting using TEEs. They try to solve the problem arising from the fact that executing model inference within the TEE is not practical due to hardware constraints. The code running inside the TEE (enclave) is bounded by a threshold, e.g., 128 MB for Intel SGX. If the threshold is exceeded, then data swapping occurs, creating performance and security issues, as the data must be decrypted and encrypted during swapping operations. To overcome this problem, the model can be split up, and GPU accelerators are used for the untrusted code. Since GPUs do not provide trusted execution, how to outsource to the GPU and how to split the DNN are the most important issues, which most of the studies try to deal with. To provide some level of protection, in some of the works a blinding operation is performed before outsourcing the computation from the enclave to the GPU.

Depending on the trust model, there are two scenarios where TEE usage can be leveraged in FL cases. The first scenario is the untrusted aggregation server case. Even if SMPC is used to protect the model updates, a malicious server may still be a problem in semi-honest models. In this case, a TEE (like Intel SGX) can be used to provide protection for server-side operations. The second scenario is the existence of malicious client devices. Since client devices hold the data, they can see the model and can tamper with the protocol. Even if the devices behave benignly, external factors may create threats, e.g., a malicious mobile application can poison local training data or tamper with updates. In these cases, TEEs (like ARM TrustZone) on the client side can be used. In the worst-case scenario, when both the server side and the client side can be malicious, TEEs can be employed on both sides [146].

Although a lot of work has been done in ML using TEEs,

data transfer needs, respectively. We do not see such costs in DP and TEE. Regarding the communication round overhead, some SMPC solutions may be considered heavy because they need more than two communication rounds. From the accuracy perspective, DP is the worst candidate because noise is added to the data in this type of solution, while SMPC, HE, and TEE output the same result as the plain computation. It should also be considered, while comparing these solutions, that TEE needs special hardware while the others do not. This high-level comparison of the focused PETs and their functionalities is presented in Table 3.

In the context of privacy-preserving federated learning (PPFL), choosing the right privacy enhancing method is not straightforward because these methods differ in terms of effectiveness and computation cost. Using different metrics for evaluating data utility and data privacy can be a way to optimize the deployment of defense mechanisms [14]. In addition, since each privacy enhancing technique has its own dominant advantages, combining different techniques may be useful to develop effective PPFL frameworks.

This section proposes a methodology for practitioners to select the right privacy approach for collaborative ML models. We first introduce the collaborative ML considerations and PET constraints for different metrics. We start by discussing the collaborative ML model characteristics, investigating them from different angles. Then, we provide an analysis of the PET constraints, mapping the appropriate PETs to the identified constraints. For each metric, we provide questions to be evaluated by the practitioners. Then we introduce a selection machinery based on the evaluation of these questions, as shown in Figure 10.

Depending on the relations and interactions between the devices/parties, the collaboration model may change. The communication architecture is also tied to device interactions and affects the selection of the collaborative ML model, e.g., what kind of communication protocol will be used and whether any resource (energy, compute, storage) limitation is in place. These differences affect the PET selection in terms of what is shared during the training, the ownership of the model (or inference model), and the threat model, which will be analyzed separately.

What is shared among the parties involved in the training process changes with the collaboration model and impacts the communication and computation load. In FL and split learning, each client only exchanges the parameters with the server, so there is no communication between clients. In decentralized learning, each client exchanges with the neighboring clients synchronously or asynchronously

The distribution of data across the collaboration and ML models affects the computation operations that need to be executed privately and the selection of PET solutions. In the horizontal case, which requires simple operations such as averaging, the usage of somewhat or partially homomorphic encryption schemes is more practical compared to fully homomorphic encryption schemes.

What is the adversary model and what are the adversary's capabilities? How are data ownership and trust boundaries defined for the parties?

Based on the identified collaboration model, data ownership, trust model, and adversary capabilities, the threat model can be identified to decide on the appropriate privacy solution. The adversary model should be defined for both the server and the clients, each of which can be trusted, honest-but-curious, or malicious. Honest-but-curious clients access the updated model parameters received from the server. In addition, malicious clients can tamper with the training process in the rounds in which they participate. Similarly, an honest-but-curious server may infer all client updates during the training process. In addition, a malicious server can tamper with the training process. In scenarios where the server can be trusted for the computation of the global model but not for the privacy of the clients, and the clients may be regarded as trusted entities, a PET solution such as secure aggregation may be enough. However, when the server has the potential to manipulate the global model in the direction of its intent, then more advanced solutions that allow the clients to verify the correctness of the model need to be constructed, which will increase the overhead. Considering these types of threat models, possible PET solutions should be evaluated for whether they meet the requirement while keeping the computation and communication overhead as low as possible. Similar considerations are also needed for other threats, such as preventing clients from providing non-legitimate updates for the model construction. The construction of privacy solutions with low overhead for honest-but-curious adversaries is more feasible than constructing solutions against malicious adversaries. As a result, the decision on the appropriate PET solution is highly dependent on the threat model.

If there is a need for distributed computation to improve performance, this may also change the data ownership and trust boundaries, since the data or model will be distributed to compute nodes. In the following, we explain each constraint, as depicted in Figure 9, in detail. In addition to explaining the constraints, we provide recommendations on the selection of PET solutions, with selection motivations based on the characteristics of the PET solutions. We also depict these recommendations in Figure 9.

Communication rounds refer to the number of times data is communicated between participants during a protocol or learning process. The number of communication rounds between parties is the main issue for collaborative learning tasks, affecting the total communication overhead. Increasing communication efficiency is a challenging task and may require redesigning the algorithms, e.g., to reduce the communication cost of sending big weight matrices. If the bandwidth is limited and the concern is to have fewer communication rounds, then a privacy enhancing technique that brings no overhead in communication rounds, such as DP and HE, must be considered, whereas SMPC, which may require additional rounds, might not be feasible.
Since TEE solutions are mostly used to protect operations performed on the devices or for attestation purposes, no additional communication cost is introduced. However, if attestation is used in the TEE solution, additional communication rounds are needed.

How much data is allowed to be transferred between parties?

The amount of data transmitted among the parties during the protocol is an important efficiency parameter. In collaborative learning, the parameters might be updated and sent between parties several times. In SMPC solutions, there may be many communication rounds, which increases the total amount of data that needs to be transferred. In SMPC, the amount of data can also be very high because of the nature of the techniques. For example, in Yao's garbled circuits solution, which is a secure two-party computation protocol, computing a one-bit output of a function of two private bits of the two parties requires more than 768 bits to be transferred. When the transmitted data size is a constraint, DP can be preferred. TEEs do not put additional overhead on the network since their protection concerns the execution phase. If attestation is required in the system security design, additional steps are needed, and hence latency concerns should be considered.

How much computation resource is available to each party?

The need for computation increases as ML is adopted extensively as a use case enabler. Bringing privacy also adds one more angle to this demand. Therefore, one needs to think about the available resources while investigating the appropriate privacy solution. When privacy preserving techniques are considered, their computation overhead should be considered as well. Considering the current privacy enhancing techniques, the computation overhead, as discussed in Section VI.E, tends to be ordered as follows: HE > SMPC > DP. If the client devices have limited computation capability, the choice for a privacy solution should be first DP, then SMPC and HE, respectively.

Bringing privacy using a TEE depends on where the TEE is available. If server-side protection, e.g., protecting model privacy, is the concern, then client-side limitations would not affect

be applied, which may result in some information loss about the data and decrease the accuracy. For example, truncation of weights in a neural network can decrease the communication and computation cost but also decreases the accuracy. Using the TEE as part of the privacy solution does not affect the accuracy of the model.

Will the solution be deployed in a multi-tenant environment?

Concerns stemming from multi-tenant cloud environments also apply to ML use cases if an ML as a service (MaaS) cloud model is in place. For example, if FL server-side operations are performed in the cloud environment, then, as in other cloud-based services, memory isolation and shared environment access vulnerabilities will be important parameters. TEE is the only way forward to achieve protection for the data in memory. The most important aspect when using a TEE is to partition the code into trusted and untrusted parts, since enclaves have memory and performance constraints [155]. The code should be partitioned in a secure way so that information disclosed outside of the enclave cannot be used for adversarial purposes. Recent developments in library operating systems (lib-OS) pave the way for running applications without any decomposition effort. Although they are debatable in terms of performance overhead and trusted computing base size, they are preferred for ease of development and comparable performance. Graphene

Especially in cross-device cases, some devices cannot participate in some iterations for intentional or unintentional reasons. To clarify the issue, the following example can be given for unintentional dropouts. To construct an ML model used for word prediction, many mobile phone users make contributions based on their wording behavior. In the model training phase, it is natural that some of the users may have connection problems. For performance considerations, the usage of small subsets of users can be considered, especially in federated learning. The usage of different subsets for each iteration can be given as an example of an intentional dropout. Since secure multi-party computation requires no trusted party and enables the execution of functions without revealing private inputs, this technique can overcome dropout cases. Also, with the help of threshold cryptography, it is not necessary for all the clients to join in the recovery of secrets, sensitive data, and decryption keys, which may help SMPC and HE solutions handle device dropouts during the execution of collaborative ML.

In this section, we list some of the identified issues and research directions as follows.