Privacy preserving n-party scalar product protocol

Privacy-preserving machine learning enables the training of models on decentralized datasets without the need to reveal the data, both on horizontal and vertically partitioned data. However, it relies on specialized techniques and algorithms to perform the necessary computations. The privacy preserving scalar product protocol, which enables the dot product of vectors without revealing them, is one popular example for its versatility. Unfortunately, the solutions currently proposed in the literature focus mainly on two-party scenarios, even though scenarios with a higher number of data parties are becoming more relevant. For example when performing analyses that require counting the number of samples which fulfill certain criteria defined across various sites, such as calculating the information gain at a node in a decision tree. In this paper we propose a generalization of the protocol for an arbitrary number of parties, based on an existing two-party method. Our proposed solution relies on a recursive resolution of smaller scalar products. After describing our proposed method, we discuss potential scalability issues. Finally, we describe the privacy guarantees and identify any concerns, as well as comparing the proposed method to the original solution in this aspect.


INTRODUCTION
F EDERATED learning is a field that has recently grown in prominence due to increasing awareness of data privacy issues and data ownership as well as the rising need to combine data originating from different sources [1].It is a thriving research field that promises to make it possible to apply machine learning algorithms (or any other data analysis) on multiple decentralized datasets in a collaborative manner [2].This applies to both horizontally and vertically split data.Horizontally partitioned data describes the situation where different organizations collect the same information from different individuals (e.g. the same clinical data collected in multiple hospitals).Vertically partitioned data occurs when different organizations collect different information about the same individuals (e.g. insurance claims and hospital records).
In order to apply machine learning algorithms on decentralized data, various techniques have been proposed to run the necessary analyses in a privacy-preserving manner.The techniques for vertically partitioned data are generally referred to with the umbrella term of secure multiparty computation (SMPC) [3].SMPC is a research field that focuses on developing methods to calculate functions on decentralized data without revealing the data to other parties.
Examples of the various proposed techniques are machine learning algorithms to train Bayesian networks [4], neural networks [5], or random forests [6].These algorithms .The views expressed in this paper are those of the authors and do not necessarily reflect the policy of Statistics Netherlands.may rely on techniques such as secret sharing [7] and homomorphic encryption [8].Both secret sharing and homomorphic encryption work at their core by transforming the original values α and β, owned by different parties, into transformed values γ and δ such that f (α, β) = g(γ, δ), thus making it possible to calculate the result of function f (α, β) by calculating a different function g(γ, δ) without ever needing to reveal α or β.In the case of homomorphic encryption, this is achieved by using encryption schemes that are 'homomorphic' with respect to specific functions, allowing the user to calculate these functions using encrypted data [8].In the case of secret sharing, the core concept relies on obfuscating the raw data with a secret share (e.g., a random number), and then applying calculations to the obfuscated data in such a way that the secret shares will cancel out in the end [7].
Other techniques focus on specific calculations that can be used as building blocks for machine learning algorithms, such as the scalar product (or dot product) of vectors.The scalar product is an integral part of various machine learning algorithms, such as neural network training [9].Therefore, secure scalar product protocols have been widely studied in federated learning [10].In addition, it can be used in combination with clever data representations to calculate various statistical measures in a privacy preserving manner, such as the information gain of an attribute, as well as to classify an individual using a decision tree in a federated setting [10].More generally speaking the scalar product protocol can be employed to determine the size of a subset of the population that fulfills a set of criteria in a privacy preserving manner, even if the relevant attributes are spread across multiple data owners.
Because of its importance, multiple scalar product variants have been proposed.Du and Atallah proposed several methods for the scalar product [11], [12].Du et al. also proposed a similar method for secure matrix multiplication to be used in multivariate statistical analysis [13].Vaidya and Clifton [14] proposed a new method to alleviate the scalability issues of existing methods and used this method to determine globally valid association rules.Du and Zhan [10] proposed yet another alternative, with better time complexity than the method proposed by Vaidya and Clifton [14], and better communication cost than the methods proposed by Du and Atallah [11], [12].Du and Zhan [10] then used it to train a decision tree in a federated setting.Goethals et al. [15] discovered certain privacy flaws in some of the earlier mentioned protocols, and suggested an alternative with improved privacy guarantees.Shmueli and Tassa utilize a scalar product protocol to solve a problem with n parties [16], however, it should be noted that they solely use the scalar product protocol to solve multiple independent 2party sub-problems.
However, all these solutions focus on two party scenarios where the scalar product is concerned.Translating them to scenarios involving more than two parties is not straightforward, if at all possible.This is a significant drawback since in practice often three, or even more parties, can be involved.
In this study, we look at the method proposed by [10] and determine if, and how, it can be scaled to an arbitrary number of parties.This has applications for the various calculations which can (partially) be transformed into a scalar product problem mentioned before, such as calculating information gain or anything else that can be represented as a set-inclusion problem.

METHOD
In this section, we first introduce the notation used, then we describe the original solution proposed [10].We will then try to naïvely translate the original solution to an n-party situation.This naïve translation will result in several leftover terms in the equations which need to be solved.We will then discuss how these left-over terms can be solved.We will illustrate the steps in this translation with a threeparty scenario.Finally, we will give a formal definition for the n-party scenario.
In this paper, we use lowercase letters to denote scalars (e.g., 's'), uppercase for vectors (e.g., V ) and uppercase with a bold face for matrices (e.g., 'M').

Original protocol
The original protocol [10] works as follows.Alice and Bob have different features on the same individuals and want to calculate the scalar product of their private vectors A and B, both of size m where m is semi-honest commodity server we have named Merlin.The protocol consists of the following steps.3) Bob generates a random number v 2 and computes and sends the result to Bob. 5) Bob then calculates the final result v1 It should be noted that this protocol utilizes a secret sharing approach.Because of this, the extended n-party protocol will utilize the same secret sharing approach.

Naïve translation to a three-party scenario
For our three-party scenario we now have Alice, Bob and Claire who want to calculate the scalar product of their three vectors A, B, and C of size m as well as Merlin who will aid them in the calculation by fulfilling the role of commodity server.The first problem we encounter here is that A • B • C does not result in a scalar, it results in another vector.This means it is impossible to simply chain the scalar product protocol.Hence, we must first translate our scalar product problem into a different form so it can be solved for multiple parties.
To do this we create three diagonal matrices, matrices where only the diagonal has non-zero values, A, B, and C of size m×m, using the original vectors to fill the diagonals.This allows us to calculate A • B • C, the result of which is a matrix.To turn this back into a scalar we define a function ϕ which allows us to calculate the sum of the diagonal of a matrix.This means we have translated our 2-party scalar product problem into a 3-party matrix product problem where we calculate ϕ(A • B • C).This naïve translation has a similar form as the matrix multiplication method proposed by Du et al. [13] mentioned earlier in this article, however, it includes more than two parties and all of our matrices are diagonal matrices.
It should be noted that this matrix multiplication method cannot simply be used to replace the scalar product protocol, as this would result in individual level data being shared across parties.For example, when using the scalar product protocol to build a decision tree [10], we have diagonal matrices, and the diagonal only contains 0 and 1 values.It would be trivial to deduce which positions only contained a value of 1 at all parties based on the final result using the matrix multiplication approach, which would be a major breach of privacy, as this would allow one to know which individuals were selected.
Having successfully translated our problem into a form where we can work with three parties, we will now attempt to naïvely translate the protocol.First, it should be noted that Merlin should generate random diagonal matrices instead of vectors.Second, he needs to generate an extra matrix R c and scalar r c to send to Claire.Third, we need to introduce an extra step into our protocol for Claire that is equivalent to step 4 in the two-party protocol.And last, wherever vectors owned by Alice and Bob are multiplied we must now multiply matrices owned by Alice, Bob and Claire.It should also be noted that whenever we are now multiplying matrices, we need to apply the ϕ function to turn the resulting matrix into a scalar.Consequently, our naïvely adapted protocol will look as follows: Claire then sends u 3 to Bob. 6) Bob then calculates the final result As we can see our final result is not equal to ϕ(A

Solving the left-over terms
The first left-over that should be solved is the left-over of the form ) because we already add the various r x for each x ∈ {a, b, c} once in step 3−5, even in the naïve translation.We can solve this leftover term simply by replacing r x in step 3 − 5 with For example in step 4 instead of adding r a we will add 2r a in the 3-party protocol.
The remaining left-over terms are These left-over terms all have the form of ϕ(X • R y • R z ), where x, y, & z represent the different parties Alice, Bob, & Claire, and each of the multiplicands always belongs to a different party (e.g., they are never of the form ϕ(X • R x • R y )).Furthermore, the combined term R y • R z is known by Merlin, hence this can be rewritten as ϕ(X • M), where M = R y • R z and is owned by Merlin.This means that this left-over problem can be simplified into a 2-party scalar product problem, where Merlin is one of the parties.More generally these left-over terms within an n-scalar product protocol are themselves n − 1, or smaller, scalar product problems.These smaller scalar product protocols need to be solved with additional commodity servers (i.e., Merlin cannot play that role because he is involved as a party).In section 3.2 we will discuss how many commodity servers are needed for a given n-party protocol.
With the left-over terms solved we can now create a fully translated protocol to our three-party scenario.

Correct adaptation to a three-party scenario
To allow Alice, Bob, and Claire to calculate ϕ(A • B • C) the following protocol should be followed. and are solved by separate two-party scalar product protocols.The results are given to Claire and she computes ϕ Claire then sends u 3 to Bob. 7) Bob then calculates the final result: We have now successfully translated the two-party scalar product protocol into a three-party protocol. 2

Full translation to an n-party scenario
The n-party protocol can be formalized as follows: 1) If n = 2, use the two-party protocol [10], else go to next step.2) Let D 1 , D 2 , . . ., D n be the diagonal matrices containing the vectors owned by the n parties.3) Let ϕ be a function that calculates the sum of the diagonal of a matrix.4) R 1 , R 2 , .., R n are random diagonal matrices generated by a commodity server Merlin.
all but one of the r i terms are randomly generated.6) Merlin shares the pairs {R i , r i } with the i'th party for each i ∈ [1, n] 7) All parties calculate Di = D i + R i and share the result 8) Party 1 generates v 2 .9) Party 1 then calculates Where L i corresponds to leftover terms of the form , where all parties are involved, either as D i , providing their raw data, or as R j , using their random matrix, but never as both.12) These leftover terms represent a scalar product problem of at most n − 1 parties.Thus these sub problems can be solved separately using a smaller n-party scalar product protocol.

2.
A practical example of a 3-party scalar product protocol can be found in appendix ?? 13) Solving these leftover terms allows party n to calculate ϕ(D This allows us to calculate the scalar product for an arbitrary amount of parties.Pseudocode of the protocol can be found in algorithm 1.Now that we have shown that the protocol can be translated to a scenario with arbitrary n we will discuss how the protocol scales as well as potential security issues in the next section.

Commodity server
The n-party scalar product protocol contains multiple subprotocols of at most n − 1 sized all of which involve data owned by the commodity server in the n-party scalar protocol.These sub protocols will need to use a commodity server as well.However, the original commodity server Merlin cannot be reused as Merlin fulfills the role of data-owner in these sub protocols.In section 3.2 we will discuss what influence this will have as n grows and how potential issues can be minimized.

DISCUSSION
In this paper, we have translated an existing 2-party scalar product [10] protocol to an n-party protocol.We have shown that a naïve translation is insufficient.However, by using a more sophisticated approach, it is possible to adapt the protocol to work with an arbitrary number of parties.In appendix ??, a fully worked out example of the three-party protocol can be found.Appendix ?? provides references to a repository containing java and python implementations of the n-party protocol.
We will now discuss the security and privacy guarantees this n-party protocol provides as well as how the complexity scales as the number of parties grows and how practical it is to use this protocol.

Security
The proposed method requires a commodity server, which is a semi-honest trusted third party within the calculation.A semi-honest party is a party which executes its part in the protocol accurately, but may try to learn as much as it can from the messages it receives in the process [17].In this section we will discuss the exact risks involved with this.
As a method that relies on secret shares generated by a semi-trusted third party, this protocol utilizes an approach similar to assymetric encryption [18], with the individual secret shares performing the role of private keys.This limits the risks involved.However, the trusted third party does introduce a risk in itself.
The risk posed by requiring a semi-honest trusted third party to be the commodity server would be that several semi-honest parties could potentially cooperate with the commodity server in order to jointly learn private data of the other parties.It should be noted that this risk is higher in an Internet of Things (IoT) setting than in a formalized joint research setting.An IoT setting consists of many unverified devices and parties.A formal joint research setting allows all parties involved to verify, and enforce, for example by requiring audits and adding other legal agreements, the integrity of the other parties to a certain extent.This will minimize the risk in practice in this setting.While it would be preferable if privacy could be protected by design with technical solutions, there will always be a need for a certain degree of trust in the various parties involved and legal means are a perfectly acceptable way of achieving the required trust [2].
However, this does not remove the technical possibility of a joint attack when all parties are semi-honest.The local calculations done at a given node i are always of the form: Where Dx is locally known by every data-owner participating in this protocol.However, Dx is unknown to the commodity server in this protocol.Assuming the node cooperates with the commodity server, they could then separate Dx into its com- ponents D x and R x .Where D x is private data belonging to a different party and R x is the random diagonal matrix generated by the commodity server, thus learning D x .This is a serious concern.This issue is especially relevant in an IoT setting where the trustworthiness of the commodity servers and individual parties is very difficult to verify and enforce.
However, in a formal joint research setting, a sufficient level of trust can be achieved to minimize the risk of this attack by enforcing the commodity server to act as an honest party, not just semi-honest [2] [19] First, it is possible to simply enforce this using legal means and mandate it is honest, however this may not be accepted in practice.Second, it is possible to give all parties involved joint custody over the commodity servers, thus allowing each party to individually verify the commodity server is completely honest.
Joint custody over the commodity servers could, for example, be achieved by allowing any party to execute independent audits of the commodity server and giving them a veto over the hardware and software setup used on the servers.Such a setup allows each party to individually verify that the commodity server is honest, which works because each party has a vested interested in ensuring the honesty of the commodity server to protect their own data.This should allow the parties to jointly guarantee the commodity server are honest, even if the individual parties themselves are semi-honest.
It is important to note that these security concerns, and the possible solutions, are the same regardless of the size of n.That is to say, our proposed n-party protocol is equally as secure as the original 2-party protocol proposed by Du and Zhan because the original protocol also uses a trusted third party as commodity server which as we just discussed is the vulnerability exploited in a collusion attack.

Scalability
The number of subprotocols will grow with a factorial order of growth with respect to n.The reason it scales in this manner is because the subprotocols have the form of ϕ( m i=1 D i n j=m R j -i = j).Where all parties are involved, either as D i , providing their raw data, or as R j , using their random matrix, but never as both.There will be n!
x!(n−x)!such subprotocols for each 2 ≤ x < n.These subprotocols will have x R j factors and n − x D i factors.For example, a three-party protocol will have the following 3 subprotocols involving 2 R j factors: This growth in subprotocols will have an effect on the scalability.We will discuss the two aspects in which this matters in the following two sections.

Time and Space Complexity
The first aspect affected by the factorial order of growth is the time complexity of the protocol.The amount of direct subprotocols for an n-party protocol will be equal to for each x ∈ [2; n].These subprotocols may also have further subprotocols themselves.Furthermore, the amount of messages that need to be send for a given protocol are as follows; 1 message needs to be send from the commodity server to each of the n dataowners to share the relevant pair of {R i , r i }.Each party then shares its matrix Di with each other party, resulting in n • (n − 1) messages.Finally each party has to share its subresult once, resulting in a further n messages.This means a total of n + n 2 messages for a given protocol.
In order to put this into perspective we show the number of protocols as a function of n in figure 1.In addition to this, the results of a small experiment measuring the runtime performance, where the n-party protocol was used to calculate the number of individuals fullfilling certain attribute requirements, can be found in figure 2. This experiment was run on a windows laptop using an Intel(R) Core(TM) i7-10750H processor with 16GB of memory and 6 cores.All parties had a local datastation on this laptop, no significant optimization was implemented.As can be seen in figure 1 the required number of protocols and messages grow quickly as n grows.This is a significant downside of this protocol.The results of the small runtime experiment further supports this, as the runtime does grow rapidly as the number of parties grows.However, it also shows that the protocol can easily deal with larger datasets as dataset size barely influences the runtime.It should also be noted that there is considerable room for parallelization within the protocol, allowing the protocol to still be useable in practice.The following steps can be parallelized: first, every subprotocol can naturally be calculated in parallel as these are independent problems.Secondly every calculation in substep 11 detailed in section 2.5 can be calculated in parallel as well.Both options will reduce the running time of the protocol, considerably, al- lowing it to still be a practical solution in many settings.In addition to this, the actual use of the protocol within model training can be optimized, for example by running multiple n-party product protocols in parallel.

Commodity Servers
It should be noted that these subprotocols need their own commodity server because no party may be both data owner and commodity server in a given protocol.Hence, we cannot reuse the original commodity server Merlin as it fullfills the role of a data owner in the subprotocols.
A naïve solution to the problem posed by this need would be to set up sufficient commodity servers to deal with every sub-protocol.However, the amount of commodity servers needed will scale linearly with n, since a commodity server can be shared across all subprotocols of the same size.As the largest subprotocol in an n-party protocol will be an (n − 1)-party subprotocol, and a two-party protocol will have no subprotocol, we will need n − 1 commodity servers to solve an n-party problem.While this might be manageable for small n this eventually becomes untenable.
An alternative to this naïve solution would be to have the various parties double as commodity servers whenever they are not involved in a calculation themselves.To show that this is a viable, and safe solution, we will first divide the subprotocols into two categories.All subprotocols have the form ϕ( m i=1 D i n j=m R j |i = j), this can be further subdivided into subprotocols which contain only 1 D i term, which will have the form ϕ(D i • R j • ... • R m ), and subprotocols with multiple D i terms.
The first category of subprotocols, which only contain one D i term, can be solved by simply sharing the result of random matrices is itself a random matrix, provided there are at least two R j factors involved, which cannot be used to leak any information.For example, the sub-protocols in the three-party protocol can be solved this way without requiring extra commodity servers.Doing this will also be faster than using the two-party scalar product protocol as it only requires a straightforward multiplication instead of the entire scalar product protocol.It should however be noted that the solution to this subprotocol may never be revealed to the commodity server that owns the R j terms, as this would allow the commodity server to calculate D i .For example, if we are calculating A • R b • R c the result should never be revealed to Merlin, as revealing this would allow Merlin to learn Alice's data.This is of course also true in the original 2-party protocol.
The second category of subprotocol, which contains multiple D i terms, can reuse one of the parties which is not currently providing data (i.e. a D i term) as the new commodity server.This is secure as there is no need to reveal anything to the commodity server during the calculation.All it needs to do is generate and share the new {R i , r i } pairs for this subprotocol.As such, it never needs to see any (sub)results, and thus cannot reverse engineer anything.Additionally, the same party should never be used twice as a commodity server in any set of subprotocols.That is to say, if Alice handles a 3-party subprotocol then she should not handle any child protocols that arise as a consequence of this specific 3-party subprotocol.Fortunately, it is easy to avoid this as there will always be at least one new party available to fulfil the role of commodity server for the new subprotocols.
While this is a practical solution to the need for multiple commodity servers, it does come with the major caveat that one must be certain no parties will attempt to cooperate to jointly learn private data of the other parties.As pointed out in section 3.1, the protocol is vulnerable to this type of attack.

CONCLUSION
In this paper, we have explained how the two-party scalar product protocol by Du and Zhan [10] can be scaled to an n-party scalar product protocol.We have illustrated how it works using a three-party scenario, after which we have given the formal definition of the protocol for any number of parties.This protocol can be used to calculate a number of metrics, such as the information gain of an attribute [10], in a scenario with an arbitrary number of parties.The benefit of being able to calculate such metrics is that it opens up the door for other more complex analysis.For example, using the information gain one can build a decision tree or apply feature selection.
Similarly, by using an innovative data representation the n-party protocol can be used to classify an individual in a privacy preserving manner using a decision tree [10].By using other innovative data representations this n-party protocol could potentially be used for a wide variety of analysis and calculations.Aside from these benefits, which require the problem at hand to be rephrased into a scalar product problem, there is also the obvious benefit that it allows the use of the scalar product itself in an n-party scenario.This allows the use of any calculation that would normally rely on the scalar product in a classical machine learning setting but which cannot be executed easily in a federated setting without a private n-party scalar product protocol.
While not appropriate in every scenario (scalability and the need for more commodity servers or semi-honest servers as the number of parties grows are a practical concern), we believe this is still a valuable tool in the federated learning toolbox.

Future work
For future work we would like to devise n-party protocols with better time complexity, as well as find a way to remove the vulnerability to joint-attacks introduced by the need for a commodity server.
In addition to this it would be valuable to investigate to which extend our extension to n parties can be applied to the secure matrix multiplication proposed by Du et al. [13].The protocol used for matrix multiplication is very similar to the 2-party scalar product protocol we extended, as such our extension should be of use when extending this matrix multiplication protocol.
Lastly, we are planning to utilize the n-party scalar product protocol to implement various federated algorithms so we can test the practical viability of this protocol in a real life setting.
Lianne Ippel Lianne Ippel received her PhD in Statistics from Tilburg University on analyzing data streams with dependent observations, for which she won the dissertation award from General Online Research conference (2018).After a Postdoc at Maastricht University, she now works at Statistics Netherlands where she works at the methodology department on international collaborations and innovative methods for primary data collection.

•
F. van Daalen, I. Bermejo, and A. Dekker are with the Department of Radiation Oncology (MAASTRO) GROW School for Oncology and Reproduction Maastricht University Medical Centre+ Maastricht the Netherlands • L. Ippel is with Statistics Netherlands Heerlen the Netherlands.

1 ) 2 )
Merlin generates two random vectors R a , R b of size m and two scalars r a and r b such that r a + r b = R a •R b , where either r a or r b is randomly generated.Merlin then sends {R a , r a } to Alice and {R b , r b } to Bob.Alice sends Â = A + R a to Bob, and Bob sends B = B + R b to Alice.

1 ) 3 )
Merlin generates three random diagonal matrices R a , R b , R c and two random scalars r a , r b .It then calculates a third scalar r c such that r a + r b + r c = ϕ(R a •R b •R c ). Merlin then sends {R a , r a } to Alice, {R b , r b } to Bob and {R c , r c } to Claire.2) Alice calculates Â = A + R a and sends it to Bob and Claire, Bob sends B = B + R b to Alice and Claire, and Claire sends Ĉ = C + R c to Alice and Bob. Bob generates a random number v 2 and computes

1 )
Merlin generates three random diagonal matrices R a , R b , R c and two random scalars r a , r b .It then 1.A full elaboration of the equation can be found in appendix ?? calculates a third scalar r c such that r a + r b + r c = ϕ(R a •R b •R c ). Merlin then sends {R a , r a } to Alice, {R b , r b } to Bob and {R c , r c } to Claire.2) Alice sends Â = A + R a to Bob and Claire, Bob sends B = B + R b to Alice and Claire, and Claire sends Ĉ = C + R c to Alice and Bob. 3) Bob generates a random number v 2 and computes

Fig. 1 :
Fig. 1: Rate at which the number of protocols and messages grow as functions of n. (y-axis in log-scale)

Fig. 2 :
Fig. 2: Average time in ms necessary to calculate the number of individuals fullfilling the requirements of 2-5 attributes divided over 2-5 parties for different population sizes.(yaxis in log-scale) Prof. Andre Dekker, PhD (1974) is a medical physicist and professor of Clinical Data Science at Maastricht University Medical Center and Maastro Clinic in The Netherlands.His Clinical Data Science research group (50 staff) focuses on 1) federated FAIR data infrastructures, 2) AI for health outcome prediction models and 3) applying AI to improve health.Prof. Dekker has authored over 200 publications, mentored more than 30 PhD students and holds multiple awards and patents on the topic of federated data and AI.He has held visiting scientist appointments at universities and companies in the UK, Australia, Italy, USA and Canada.Inigo Bermejo Inigo Bermejo received the BSc degree on Computer Engineering from the University of the Basque Country, Spain, in 2006 and the PhD in Intelligent Systems from UNED, Spain, in 2015.He is currently a postdoctoral researcher at the Clinical Data Science group, Maastricht University.His research interests include privacy preserving techniques, prediction modelling and causal inference.