
Privacy Preserving Data Sharing With Anonymous ID Assignment

Open Access


Abstract:

An algorithm for anonymous sharing of private data among N parties is developed. This technique is used iteratively to assign these nodes ID numbers ranging from 1 to N. This assignment is anonymous in that the identities received are unknown to the other members of the group. Resistance to collusion among other members is verified in an information theoretic sense when private communication channels are used. This assignment of serial numbers allows more complex data to be shared and has applications to other problems in privacy preserving data mining, collision avoidance in communications and distributed database access. The required computations are distributed without using a trusted central authority. Existing and new algorithms for assigning anonymous IDs are examined with respect to trade-offs between communication and computational requirements. The new algorithms are built on top of a secure sum data mining operation using Newton's identities and Sturm's theorem. An algorithm for distributed solution of certain polynomials over finite fields enhances the scalability of the algorithms. Markov chain representations are used to find statistics on the number of iterations required, and computer algebra gives closed form results for the completion rates.
Published in: IEEE Transactions on Information Forensics and Security ( Volume: 8, Issue: 2, February 2013)
Page(s): 402 - 413
Date of Publication: 20 December 2012

SECTION I.

Introduction

The popularity of the Internet as a communication medium, whether for personal or business use, depends in part on its support for anonymous communication. Businesses also have legitimate reasons to engage in anonymous communication and to avoid the consequences of identity revelation: for example, to allow dissemination of summary data without revealing the identity of the entity the underlying data are associated with, or to protect a whistle-blower's right to be anonymous and free from political or economic retribution [1]. Cloud-based website management tools [2] provide capabilities for a server to anonymously capture a visitor's web actions. The problem of sharing privately held data so that the individuals who are the subjects of the data cannot be identified has been researched extensively [3]. Researchers have also investigated the relevance of anonymity and/or privacy in various application domains: patient medical records [4], electronic voting [5], e-mail [6], social networking [7], etc.

Another form of anonymity, as used in secure multiparty computation, allows multiple parties on a network to jointly carry out a global computation that depends on data from each party while the data held by each party remains unknown to the other parties [8], [9]. A secure computation function widely used in the literature is secure sum that allows parties to compute the sum of their individual inputs without disclosing the inputs to one another. This function is popular in data mining applications and also helps characterize the complexities of the secure multiparty computation [10], [11].

This work deals with efficient algorithms for assigning identifiers (IDs) to the nodes of a network in such a way that the IDs are anonymous using a distributed computation with no central authority. Given N nodes, this assignment is essentially a permutation of the integers \{1,\ldots, N\} with each ID being known only to the node to which it is assigned. Our main algorithm is based on a method for anonymously sharing simple data and results in methods for efficient sharing of complex data. There are many applications that require dynamic unique IDs for network nodes [12]. Such IDs can be used as part of schemes for sharing/dividing communications bandwidth, data storage, and other resources anonymously and without conflict. The IDs are needed in sensor networks for security or for administrative tasks requiring reliability, such as configuration and monitoring of individual nodes, and download of binary code or data aggregation descriptions to these nodes. An application where IDs need to be anonymous is grid computing where one may seek services without divulging the identity of the service requestor [13].

To differentiate anonymous ID assignment from anonymous communication, consider a situation where N parties wish to display their data collectively, but anonymously, in N slots on a third party site. The IDs can be used to assign the N slots to users, while anonymous communication [6], [14] can allow the parties to conceal their identities from the third party.

In another application, it is possible to use secure sum to allow one to opt-out of a computation beforehand on the basis of certain rules in statistical disclosure limitation [15] or during a computation [16] and even to do so in an anonymous manner. However, very little is known with respect to methods allowing agencies to opt-out of a secure computation based on the results of the analysis, should they feel that those results are too informative about their data [17].

The work reported in this paper further explores the connection between sharing secrets in an anonymous manner, distributed secure multiparty computation and anonymous ID assignment. The use of the term “anonymous” here differs from its meaning in research dealing with symmetry breaking and leader election in anonymous networks [18], [19]. Our network is not anonymous and the participants are identifiable in that they are known to and can be addressed by the others.

Methods for assigning and using sets of pseudonyms have been developed for anonymous communication in mobile networks [20], [21]. The methods developed in these works generally require a trusted administrator, as written, and their end products generally differ from ours in form and/or in statistical properties. To be precise, with N nodes n_{i}, the algorithms of this paper distribute a computation among the nodes generating a permutation s of \{1, 2,\ldots, N\} chosen with a uniform probability of N!^{-1} from the set of all permutations of \{1, 2,\ldots, N\}, where n_{i} will know only s_{i}. Such a permutation can also be produced by algorithms designed for mental poker [22]. The algorithms for mental poker [23] are more complex and utilize cryptographic methods, as players must, in general, be able to prove that they held the winning hand. Throughout this paper, we assume that the participants are semi-honest [10], also known as passive or honest-but-curious, and execute their required protocols faithfully. Given a semi-honest, reliable, and trusted third party, a permutation can also be created using an anonymous routing protocol [6], [14]. Despite the differences cited, the reader should consult [20] and consider the alternative algorithms mentioned above before implementing the algorithms in this paper.

This paper builds an algorithm for sharing simple integer data on top of secure sum. The sharing algorithm will be used at each iteration of the algorithm for anonymous ID assignment (AIDA). This AIDA algorithm, and the variants that we discuss, can require a variable and unbounded number of iterations. Finitely-bounded algorithms for AIDA are discussed in Section IX. Increasing a parameter S in the algorithm will reduce the number of expected rounds. However, our central algorithm requires solving a polynomial with coefficients taken from a finite field of integers modulo a prime. That task restricts the level to which S can be practically raised. We show in detail how to obtain the average number of required rounds, and in the Appendix detail a method for solving the polynomial, which can be distributed among the participants.

SECTION II.

A Review of Secure Sum

Suppose that a group of hospitals with individual databases wish to compute and share only the average of a data item, such as the number of hospital acquired infections, without revealing the value of this data item for any member of the group. Thus, N nodes n_{1}, n_2,\ldots n_N have data items d_{1}, d_2,\ldots d_N, and wish to compute and share only the total value T=d_{1}+d_{2}+\cdots+d_{N}. A secure sum algorithm allows the sum T to be collected with some guarantees of anonymity. Again, we assume the semi-honest model of privacy preserving data mining [10]. Under this model, each node will follow the rules of the protocol, but may use any information it sees during the execution of the protocol to compromise security.

Should all pairs of nodes have a secure communication channel available, a simple, but resource intensive, secure sum algorithm can be constructed. In the following algorithm, it is useful to interpret the values as being integer on first reading:

Table I Random Numbers Transmitted by a Secure Sum Execution

SECTION Algorithm 1

Secure Sum

Given nodes n_{1},\ldots, n_{N} each holding a data item d_{i} from a finitely representable abelian group, share the value T=\sum d_{i} among the nodes without revealing the values d_{i}.

  1. Each node n_{i}, i=1,\ldots, N chooses random values r_{i,1},\ldots, r_{i,N} such that r_{i,1}+\cdots+r_{i,N}=d_{i}


  2. Each “random” value r_{i,j} is transmitted from node n_{i} to node n_{j}. The sum of all these random numbers r_{i,j} is, of course, the desired total T.

  3. Each node n_{j} totals all the random values received as: s_{j}=r_{1,j}+\cdots+r_{N,j}


  4. Now each node n_{i} simply broadcasts s_{i} to all other nodes so that each node can compute: T=s_{1}+\cdots+s_{N}

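As an illustrative sketch, the four steps above can be simulated in a single process (the function name and structure are my own; a real deployment would exchange the shares over private channels), with all values taken modulo a public modulus M:

```python
import random

def secure_sum(data, M):
    """Simulate Algorithm 1: N nodes learn T = sum(data) mod M without
    any node revealing its own data item d_i."""
    N = len(data)
    # Step 1: node i splits d_i into N random shares summing to d_i (mod M).
    shares = []
    for d in data:
        r = [random.randrange(M) for _ in range(N - 1)]
        r.append((d - sum(r)) % M)   # final share forces the row sum to d_i
        shares.append(r)
    # Steps 2-3: node j receives r_{i,j} from every node i and totals them.
    s = [sum(shares[i][j] for i in range(N)) % M for j in range(N)]
    # Step 4: each s_j is broadcast, so every node can compute T.
    return sum(s) % M

# The running example of Table I, with arithmetic modulo 41
print(secure_sum([6, 10, 6, 2], 41))   # -> 24
```

Each node sees only uniformly random shares of the other nodes' items plus the broadcast partial sums, which is the intuition made precise by Theorems 1 and 2.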

Example 1: A Secure Sum Computation

In Table I two examples are shown, one for later use. The reader can ignore the columns labelled \hat{r}_{i,1} and \hat{d}_{i} and need not attribute any significance to the boldface type. In the example, the initial data items held by nodes n_{1}, n_{2}, n_{3} and n_{4} are d_{1}=6, d_{2}=10, d_{3}=6 and d_{4}=2 respectively. For example, node n_{2} would transmit 7, 3, -5, and 5 to nodes n_{1}, n_{2} (itself), n_{3} and n_{4} respectively. Node n_{2} would receive -10, 3, 11, and -8 from nodes n_{1}, n_{2} (itself), n_{3} and n_{4} respectively. Then node n_{2} would compute and transmit the total s_{2}=-4 of the values received to all nodes. Finally, n_{2} would compute the total of all the second round transmissions received, 24 = 18 + (-4) + 8 + 2.

Our choices for the “random” numbers r_{i,j} are for illustration and are not realistic. For example, given that each data item was originally in the range 0 to 10, the total would be in the range 0 to 40, and choosing random numbers modulo 41 would be more appropriate. \hfill{\square}

In seeking to specify the security and privacy provided by our algorithms, we are indeed fortunate to have an abundance of definitions to choose from [24], [25], even when restricting ourselves with the semi-honest assumption. The choice of definition should be dependent on considerations such as whether private or cryptographically secured communications channels are used, etc. We follow the suggestion of a reviewer that a particular information theoretic definition of privacy be used. The central arguments of the proofs should remain useful when evaluating the algorithms with respect to other models of secure multiparty computation.

We will have need of some notational conventions. Note that given a function or sequence f:D\rightarrow R, its restriction to a subdomain S\subseteq D is denoted by f\vert_{S}. For a function f of two variables we will denote its restriction to the domain A\times B by f\vert_{AB}. Using bracket notation the set \{1,\ldots, n\} is denoted by [n]. We also may identify an indexed set with its set of indexing subscripts as in writing n\vert_{[N]} or even [N] for the nodes \{n_{1},\ldots, n_{N}\} when clear from the context.

Suppose that a coalition C\subset [N] of the nodes seeks to garner information about the private data d\vert_{D} of the other nodes D=[N]\setminus C. Since the parties are semi-honest, the possibly useful outside information available to a coalition seeking to garner the private information of other parties consists of only the random numbers in the messages received from those parties r\vert_{DC}, the partial sums s\vert_{D}, and the total T. We view the algorithm at a valid completion, meaning that the relations between r_{ij}, d_{i}, s_{i} and T established by Algorithm 1 are satisfied. It is evident that the coalition C learns nothing.

Theorem 1

Suppose that \vert C\vert<N. Then for any given d\vert_{C}, T, s\vert_{D} and r\vert_{DC} there is a one-to-one correspondence between all valid outcomes with values d\vert_{D} and those with any valid replacement values d\vert_{D}\leftarrow\hat{d}\vert_{D}.

Proof

Given values d\vert_{D} at termination, suppose w.l.o.g. that 1\in D. An equally likely outcome would have been to have altered r_{ij} values \hat{r}_{i,1}=r_{i,1}+\hat{d}_{i}-d_{i} for i\in D and \hat{d}\vert_{D} replacing d\vert_{D} where all other values remain unchanged. The revised outcome is also valid. Of note is that since \sum d\vert_{D}=\sum\hat{d}\vert_{D}, it follows that \sum\hat{r}_{[N]\{1\}}=\sum r_{[N]\{1\}}=s_{1}. \hfill{\square}

In Table I an example with C=\{4\} shows two corresponding outcomes with d\vert_{D}=(6, 10, 6) and (8, 9, 5) respectively, with the values \hat{r}\vert_{D\{1\}} adjusted as specified in the proof. Items known by C are in boldface. Interpreting all entries modulo 41 is required to actually meet the conditions of the theorem.

More formally, let P(Y_{1},\ldots, Y_{N}\mid X_{1},\ldots, X_{N})=P(Y_{[N]}\mid X_{[N]}) denote the conditional probability distribution computed by a distributed protocol where X_{[N]} is the distribution of inputs provided by the N parties n\vert_{[N]} respectively, and similarly the outputs known to the respective parties are Y_{[N]}. For privacy nothing additional about D_{P}=(X\vert_{D}, Y\vert_{D}) should be learned by an adversary whose view V_{P} of the computation includes all information known by the coalition including C_{P}=(X\vert_{C}, Y\vert_{C}) and any random values generated by or messages passed to or from coalition members. Denoting the conditional mutual Shannon information between A and B given C by I(A; B\mid C) [26] define:

Definition 1: Subset Collusion Resistance

The algorithm computing the conditional probability P will be said to be subset collusion resistant to the set of semi-honest parties C=n\vert_{[N]}\setminus D provided that: I(V_{P}; D_{P}\mid C_{P})=0


Thus, in the above definition, privacy is preserved if an adversary possessing the collective information of the coalition can learn nothing beyond that by observing the algorithm—loosely that V_{P} and D_{P} are independent of each other when C_{P} is factored out. Because the outputs Y_{C} must be generated by any protocol implementing the function, they are included in the view V_{P}. It can be useful to add global information explicitly to the definition, but it is not necessary for our results. If a protocol/algorithm is subset collusion resistant for all subsets C of size t or less, then it is said to be t-private.

Theorem 2: Secure Sum is N-Private

The secure sum method of Algorithm 1 is resistant to the collusion of any subset of the participating nodes, where the data values are from a finite abelian group.

For use in the following proof, two applications of the chain rule for conditional information suffice to show that I(A; B, C\mid C, D)=I(A; B\mid C, D). We also use [27] I(A, B; C\mid D)=I(B; C\mid D)+I(A; C\mid D, B).

Proof

The cases \vert C\vert\geq N-1 are vacuously true. Since all parties are semi-honest, we may take V_{P}=(r\vert_{DC}, s\vert_{[N]}, r\vert_{C[N]}, d\vert_{C}, T). Because s\vert_{C} can be calculated from r\vert_{C[N]}, and d\vert_{C} from r\vert_{C[N]} and r\vert_{DC}, this reduces to V_{P}=(r\vert_{DC}, s\vert_{D}, r\vert_{C[N]}, T) and we compute:

\begin{aligned}
I(V_{P}; D_{P}\mid C_{P}) &= I(r\vert_{DC}, s\vert_{D}, r\vert_{C[N]}, T;\ d\vert_{D}, T\mid d\vert_{C}, T)\\
&= I(r\vert_{DC}, s\vert_{D}, r\vert_{C[N]};\ d\vert_{D}\mid d\vert_{C}, T)\\
&= I(r\vert_{C[N]};\ d\vert_{D}\mid d\vert_{C}, T) + I(r\vert_{DC}, s\vert_{D};\ d\vert_{D}\mid d\vert_{C}, T, r\vert_{C[N]})\\
&= I(r\vert_{C[N]};\ d\vert_{D}\mid d\vert_{C}, T) + 0\\
&\leq I(d\vert_{C};\ d\vert_{D}\mid d\vert_{C}, T) = 0
\end{aligned}

where the information processing inequality [28, p. 541], with probabilistic functions for conditional mutual information, is used in showing the inequality above. That I(r\vert_{DC}, s\vert_{D}; d\vert_{D}\mid d\vert_{C}, T, r\vert_{C[N]})=0 follows from Theorem 1. \hfill\square

The secure sum algorithm just given remains N-private even when the input data are known as a multiset to all parties. We term this input permutation collusion resistance as the coalition knows the data held by the remaining parties only as a multiset. Because the parties are semi-honest, it does not matter if the values are known a priori or a posteriori. In the preceding proof, given that the multiset of values \{d\vert_{[N]}\} is known, the input distribution simply requires that the revised values be a permutation of the original values, and that proof, as well as, more directly, the proof of Theorem 1, hold with this restricted choice. This yields a trivial result which will, nonetheless, be of importance in the sequel:

Corollary 3: Secure Sum Hides Permutations

The secure sum method of Algorithm 1 is input permutation resistant to the collusion of any subset of the participating nodes.

Other secure sum algorithms [16], [29] certainly can be used with physically or cryptographically secured communications channels. For example, it is easy to see that secure sum using a single Hamiltonian cycle [10] is input permutation collusion resistant provided that the coalition C is trapped in a connected region of the cycle. Such results can also be extended to provide privacy guarantees for the algorithms in subsequent sections should they utilize, e.g., a Hamiltonian cycle based secure sum. The secure sum technique can be employed with finite abelian groups other than GF(P) such as GF(2)^{n}.

SECTION III.

Transmitting Simple Data with Power Sums

Suppose that our group of nodes wishes to share actual data values from their databases rather than relying on only statistical information as shown in the previous section. That is, each member n_{i} of the group of N nodes n\vert_{[N]} has a data item d_{i} which is to be communicated to all the other members of the group. However, the data is to remain anonymous. We develop a collusion resistant method for this task using secure sum as our underlying communication mechanism. Our data items d_{i} are taken from a (typically finite) field F. In the usual case, each d_{i} will be an integer value and F will be the field GF(P) where P is a prime number satisfying P>d_{i} for all i. Thus, arithmetic will typically be performed modulo P, but other fields will also be used.

SECTION Algorithm 2

Anonymous Data Sharing with Power Sums

Given nodes n_{1},\ldots, n_{N} each holding a data item d_{i} from a finitely representable field F, make their data items public to all nodes without revealing their sources.

  1. Each node n_{i} computes d_{i}^{n} over the field F for n=1,2,\ldots, N. The nodes then use secure sum to share knowledge of the power sums:

    P_{1}=\sum_{i=1}^{N}d_{i}^{1},\quad P_{2}=\sum_{i=1}^{N}d_{i}^{2},\quad\ldots,\quad P_{N}=\sum_{i=1}^{N}d_{i}^{N}

  2. The power sums P_{1},\ldots, P_{N} are used to generate a polynomial which has d_{1},\ldots, d_{N} as its roots using Newton's identities as developed in [30]. Representing the Newton polynomial as

    p(x)=c_{N}x^{N}+\cdots+c_{1}x+c_{0}\tag{1}

    the values c_{0},\ldots, c_{N} are obtained from the equations:

    \begin{aligned}
    c_{N}&=-1\\
    c_{N-1}&=-\tfrac{1}{1}(c_{N}P_{1})\\
    c_{N-2}&=-\tfrac{1}{2}(c_{N-1}P_{1}+c_{N}P_{2})\\
    c_{N-3}&=-\tfrac{1}{3}(c_{N-2}P_{1}+c_{N-1}P_{2}+c_{N}P_{3})\\
    c_{N-4}&=-\tfrac{1}{4}(c_{N-3}P_{1}+c_{N-2}P_{2}+c_{N-1}P_{3}+c_{N}P_{4})\\
    &\ \ \vdots\\
    c_{N-m}&=-\frac{1}{m}\sum_{k=1}^{m}c_{N-m+k}P_{k}
    \end{aligned}\tag{2}

  3. The polynomial p(x) is solved by each node, or by a computation distributed among the nodes, to determine the roots d_{1},\ldots, d_{N}.

The power sums P_{i} can be collected and shared using a single round of secure sum by sending them as an array and applying the method to the vectors transmitted and received. The power sums are symmetric functions, and thus no association is made between n_{i} and the value of d_{i}. Nonetheless, the information contained in these sums can be used to find the values of the data items d_{1},\ldots, d_{N}.

The choice c_{N}=-1, chosen for consistency with [31], may be replaced by c_{N}=1 or any other nonzero value. Also, note that in the typical F=GF(P) case, the solution for the c_{i} requires finding the multiplicative inverses of 2,3,4,\ldots,N modulo P. While the Euclidean algorithm could be used, the inverses 1/x can easily be computed in the order x=1,2,\ldots,N by the formula: P=qx+r;\quad 1/x=-q\left(1/r\right)\quad(\bmod\ P)

After the integer division of P by x with quotient q and remainder r, the inverse 1/r will already be known, since r<x.
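As a sketch, this recurrence for batch inverses might be coded as follows (the function name is mine; it assumes P is prime and N < P):

```python
def batch_inverses(N, P):
    """Return inv with inv[x] = 1/x mod P for x = 1..N, computed in order
    via 1/x = -q * (1/r) (mod P), where P = q*x + r."""
    inv = [0] * (N + 1)
    inv[1] = 1
    for x in range(2, N + 1):
        q, r = divmod(P, x)          # P = q*x + r with 0 < r < x (P prime)
        inv[x] = (-q * inv[r]) % P   # 1/r is already known since r < x
    return inv

inv = batch_inverses(4, 11)
print(inv[2], inv[3], inv[4])   # -> 6 4 3, the inverses used in Example 2
```

The cost is one integer division per inverse, versus one extended Euclidean computation each if done independently.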

Table II Powers of Data Values d_{i} Chosen by Each Node Modulo P=11

Example 2: Transmission of Simple Data

Suppose that N=4 nodes n_{i} wish to share a data item d_{i} which takes values 0 to 10. Using our sample data again, the values will be d_{1}=6 for n_{1}, d_{2}=10 for n_{2}, d_{3}=6 for n_{3}, and d_{4}=2 for n_{4}. The choice of the prime P=11 and F=GF(11) will serve to represent these numbers. The modulo-11 inverses needed are 1/2=6, 1/3=4, and 1/4=3. The nodes compute the power sums shown in Table II. Solving each of the Newton identities (2) in turn yields c_{4}=-1=10, c_{3}=2, c_{2}=9, c_{1}=1, and c_{0}=6, and thus the polynomial of (1) is p(x)=10x^{4}+2x^{3}+9x^{2}+x+6\pmod{11}

All the nodes receive the values P_{1}, P_{2}, P_{3}, and P_{4} and can compute the polynomial and its roots to recover the original data items, 2, 6, 6, and 10, but not their indices. \hfill{\square}

There are, of course, many methods available to accomplish the solution of the polynomial p(x). A simple method which proved fast in practice is given in the Appendix.
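To make steps 2 and 3 concrete, here is a toy reconstruction of Example 2 (my own code; the root search is naive brute force over GF(P), whereas the Appendix gives a faster distributable method):

```python
def newton_coefficients(power_sums, P):
    """Coefficients [c_N, ..., c_0] of the Newton polynomial (1),
    solved from the identities (2) modulo the prime P."""
    N = len(power_sums)
    Psum = [0] + list(power_sums)          # 1-indexed: Psum[k] = P_k
    c = {N: (-1) % P}                      # c_N = -1
    for m in range(1, N + 1):
        acc = sum(c[N - m + k] * Psum[k] for k in range(1, m + 1))
        c[N - m] = (-pow(m, P - 2, P) * acc) % P   # 1/m by Fermat's little theorem
    return [c[i] for i in range(N, -1, -1)]

def roots_mod_p(coeffs, P):
    """All x in GF(P) with p(x) = 0, by direct Horner evaluation.
    A repeated data value (like the 6 here) appears once; multiplicities
    would be recovered by dividing p(x) by (x - root)."""
    roots = []
    for x in range(P):
        v = 0
        for a in coeffs:
            v = (v * x + a) % P
        if v == 0:
            roots.append(x)
    return roots

coeffs = newton_coefficients([2, 0, 10, 2], 11)   # power sums of Table II
print(coeffs)                   # -> [10, 2, 9, 1, 6], matching Example 2
print(roots_mod_p(coeffs, 11))  # -> [2, 6, 10]
```

The brute-force search is O(PN), which is why the choice of P (and hence the slot parameter S of later sections) cannot be raised arbitrarily.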

In the sharing procedure, each node n_{i} adds a vector D_{i}=(d_{i}^{1},\ldots, d_{i}^{N}) to the secure sum. The vector D_{i} can be regarded as a single integer in a very high radix. However, note that the multiset of values \{D_{1},\ldots, D_{N}\} becomes known to all participants. The protection of the values against collusion is dependent upon the version of secure sum used and the particular set of colluding parties (and not merely the size of that set). We content ourselves with observing:

Theorem 4: Power Sum Data Sharing is N-Private

The data sharing method of Algorithm 2 is resistant to the collusion of any subset of the participating nodes when based on the secure sum Algorithm 1.

Because the input data is present as a multiset in the output of every party and all parties are semi-honest, the result is implied by our previous discussions of the secure sum Algorithm 1. The data sharing is anonymous in the sense that the sources of the data items cannot be traced. Of course, it is possible that a given data value would make sense only for a certain participant due to some factor such as the relative sizes of the participants. The paper [16] shows how anonymous opt-out can be used to address some of these concerns.

SECTION IV.

Sharing Complex Data with AIDA

Now consider the possibility that more complex data is to be shared amongst the participating nodes. Each node n_{i} has a data item d_{i} of length b bits which it wishes to make public anonymously to the other participants.

As the number of bits per data item and the number of nodes becomes larger, the method of the previous section becomes infeasible. Instead, to accomplish this sharing, we will utilize an indexing of the nodes. Methods for finding such an indexing are developed in subsequent sections. Assume that each node n_{i} has a unique identification (ID) or serial number s_{i}\in\{1, 2,\ldots, N\}. Further, suppose that no node has knowledge of the ID number s_{i} of any other node, and that s_{1},\ldots, s_{N} are a random permutation of 1,\ldots, N. This, again, is termed an Anonymous ID Assignment (AIDA) [16].

Such an AIDA may be used to assign slots with respect to time or space for communications or storage. It may be possible to simply have a database with central storage locations C_{i} such that each node simply stores its data there setting C_{s_{i}}:=d_{i}. This could occur if there was a trusted central authority, or if the storage operation C_{s_{i}}:=d_{i} was untraceable [6], [14].

Given that there is no central authority (the situation for which secure sum was designed), secure sum can be used to accomplish the desired data sharing. Let \vec{0} be a vector of b bits. Each node creates a data item D_{i} of N\cdot b bits, made up of N components of b bits each, numbered 1, 2,\ldots, N. Component s_{i} of D_{i} holds d_{i}, and every other component is \vec{0}:

D_{i}=(\vec{0},\ \vec{0},\ \ldots,\ \underbrace{d_{i}}_{\text{component } s_{i}},\ \ldots,\ \vec{0})

The secure sum algorithm, given earlier in this paper, may now be used to collect the data items D_{1},\ldots, D_{N}. The algorithm is applied using GF(2)^{Nb} as the abelian group. The group operation is bitwise exclusive-or, and each node n_{i} will choose N-1 random entries r_{i,j}, each composed of N\cdot b randomly chosen bits, while calculating one entry, e.g., r_{i,i}, to ensure that the sum (by xor) equals D_{i}.
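A sketch of this xor-based collection (names and packing order are my own; slot 1 is taken as the leftmost b-bit component):

```python
import secrets

def share_by_slot(items, slots, b):
    """Simulate the data sharing of Section IV: node i embeds its b-bit
    item d_i at component s_i of an N*b-bit word D_i, and the words are
    combined by a secure sum over GF(2)^{N*b} (bitwise xor)."""
    N = len(items)
    D = [d << (b * (N - s)) for d, s in zip(items, slots)]
    total = 0
    for Di in D:
        # Node splits D_i into N shares: N-1 random words plus one
        # word chosen so that the xor of all N shares equals D_i.
        shares = [secrets.randbits(N * b) for _ in range(N - 1)]
        fixup = Di
        for sh in shares:
            fixup ^= sh
        for sh in shares + [fixup]:
            total ^= sh               # xor-accumulate every share
    # Any node can now read item k out of component k of the total.
    return [(total >> (b * (N - k))) & ((1 << b) - 1) for k in range(1, N + 1)]

# With the permutation s = (3, 2, 4, 1) of Example 3 and hypothetical
# 8-bit items:
print(share_by_slot([0xAA, 0xBB, 0xCC, 0xDD], [3, 2, 4, 1], 8))
# -> [221, 187, 170, 204], i.e. node 4's item occupies slot 1, and so on
```

Because the slots s_{i} form a permutation, the xor of the D_{i} is exactly their concatenation, with no component ever colliding.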

SECTION V.

How to Find an AIDA

We present a simple algorithm for finding an AIDA, which has several variants depending on the choice of the data sharing method at step (3) below. At one step, random integers or “slots” between 1 and S are chosen by each node. A node's index is determined by the position of its choice among all the chosen slots, but provisions must be made for collisions. The parameter S should be chosen so that S\ge N.

SECTION Algorithm 3

Find AIDA

Given nodes n_{1},\ldots, n_{N}, use distributed computation (without central authority) to find an anonymous indexing permutation s:\{1,\ldots, N\}\to\{1,\ldots, N\}.

  1. Set the number of assigned nodes A=0.

  2. Each unassigned node n_{i} chooses a random number r_{i} in the range 1 to S. A node assigned in a previous round chooses r_{i}=0.

  3. The random numbers are shared anonymously. One method for doing this was given in Section III. Denote the shared values by q_{1},\ldots, q_{N}.

  4. Let q_{1},\ldots, q_{k} denote a revised list of shared values with duplicated and zero values entirely removed, where k is the number of unique random values. The nodes n_{i} which drew unique random numbers then determine their index s_{i} from the position of their random number in the revised list as it would appear after being sorted: s_{i}=A+{\rm Card}\{q_{j}:q_{j}\leq r_{i}\}

  5. Update the number of nodes assigned: A=A+k.

  6. If A<N then return to step (2).
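The loop can be sketched as follows (my own simulation; the anonymous sharing of step 3 is idealized as a plain list here, since only the multiset of drawn values matters for the index computation):

```python
import random

def find_aida(N, S):
    """Simulate Algorithm 3: return s, a permutation of 1..N, where
    s[i] is the anonymous ID assigned to node i (nodes 0-indexed)."""
    s = [0] * N                      # 0 marks a not-yet-assigned node
    A = 0                            # step 1: number of assigned nodes
    while A < N:                     # step 6 loops back while A < N
        # Step 2: unassigned nodes draw from 1..S; assigned nodes send 0.
        r = [random.randint(1, S) if s[i] == 0 else 0 for i in range(N)]
        # Steps 3-4: drop zeros and duplicates, sort the unique survivors.
        shared = [x for x in r if x != 0]
        unique = sorted(x for x in set(shared) if shared.count(x) == 1)
        for i in range(N):
            if r[i] in unique:       # s_i = A + rank of r_i among survivors
                s[i] = A + 1 + unique.index(r[i])
        A += len(unique)             # step 5
    return s

print(find_aida(4, 10))   # e.g. [3, 2, 4, 1]; always a permutation of 1..4
```

Replaying Example 3's draws (6, 10, 6, 2, then 5 and 6) through this loop reproduces the assignment s = (3, 2, 4, 1).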

Example 3: Execution of Algorithm to Find an AIDA

Suppose that four nodes participate in searching for an AIDA. For simplicity we continue our running example with S=10 and random number choices 6, 10, 6, 2 again in the first round. The choices of n_{1} and n_{3} are 5 and 6 respectively in the second round while n_{2} and n_{4} choose 0 as they will already have indices assigned at that point. A trace of critical steps in the procedure is shown in Table III. The final AIDA result is then s_{1}=3 for node n_{1}, s_{2}=2 for node n_{2}, s_{3}=4 for node n_{3}, and s_{4}=1 for node n_{4}. \hfill\square

Table III Trace of an AIDA Algorithm Execution

The number of rounds this algorithm takes is modeled by a Markov chain. While no absolute upper bound is possible, we will see in Section VII that the performance is very good, as one might expect, when S is much larger than N. The various methods for sharing the random numbers at step (3) are addressed in Section VI.

The collusion resistance of AIDA depends upon the underlying secure sum algorithm used and the collusion resistance of that algorithm for a particular set of colluding nodes C. The strongest result possible can be obtained by using our simple, but inefficient, secure sum Algorithm 1:

Theorem 5: AIDA is N-Private

Algorithm 3 is resistant to the collusion of any subset of the participating nodes when the secure sum method of Algorithm 1 is used.

Proof

We sketch the essential step of the proof by viewing the AIDA algorithm at its final termination. Suppose that there are M iterations of steps (2)–(6). Let r^{m}_{1},\ldots, r^{m}_{N} denote the random values chosen by nodes n_{1},\ldots, n_{N} at step (2) in iteration m. Denote by i\rightarrow s(i) the final permutation s(i)=s_{i} produced by AIDA.

Let t(i) denote any permutation of [N]. Suppose that the random choices by node n_{i} at iteration m during execution had been \hat{r}^{m}_{i}=r^{m}_{t(i)} rather than r^{m}_{i}. This choice of random numbers would be equally likely and would have resulted in the final assignment n_{i}\rightarrow s(t(i)).

Let C denote the coalition of colluding nodes and D=[N]\setminus C the remaining nodes. Given any desired permutation p:D\rightarrow D, define

p^{\prime}(i)=p(i)\ \text{for}\ i\in D;\quad p^{\prime}(i)=i\ \text{for}\ i\in C.

The selection t(i)=s^{-1}(p^{\prime}(i)) yields n_{i}\rightarrow s(s^{-1}(p^{\prime}(i)))=p^{\prime}(i).

Thus all permutations of the values \{s(i)\vert i\in D\} are equally likely and this is independent of the number of iterations. \hfill\square

Setting C to be the empty set, we have shown:

Corollary 6: AIDA Produces Random Permutations

Algorithm 3 results in a permutation of the participating nodes chosen from the uniform distribution (all assignments are equally likely) when the secure sum method of Algorithm 1 is used.

Table IV Trace of the Slot Selection Method

SECTION VI.

Comparison of AIDA Variants

In the previous section the algorithm to find an AIDA required that the random numbers be shared anonymously at step (3). We now look at three methods which are variants of that procedure. The parameter S must be chosen in each case. The expected number of rounds depends only on the selection of S and not on the variant chosen.

A. Slot Selection AIDA

The slot selection method was developed in [16], where a more detailed explanation may be found. In this variant of the AIDA algorithm, each node n_{i} submits the Euclidean basis vector e_{r_{i}}\in GF(N+1)^{S}, zero except for a single one in component r_{i}, to a secure sum algorithm. A node which has received an assignment in a previous round, however, submits the zero vector. The sum T of these vectors is computed over the abelian group GF(N+1)^{S} using a secure sum algorithm. The random numbers chosen and their multiplicities are then simple to determine, as T_{k}=Card\{i:r_{i}=k\}.
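As a sketch of the tallying arithmetic only (not of the secure sum layer), the following C fragment computes T from the submitted basis vectors. The helper name `tally` and the centralized view are our own simplifications; in the protocol no single party ever sees an individual e_{r_{i}}, since the vectors enter a secure sum.

```c
#include <assert.h>

enum { S = 10, NNODES = 4 };  /* parameters of the running example */

/* Sum of basis vectors e_{r_i} over GF(N+1)^S, simulated centrally.
   Components are reduced mod N+1, which is safe because a slot can be
   chosen by at most N nodes. r[i] == 0 marks an already-assigned node,
   which contributes the zero vector. */
static void tally(const int r[], int T[]) {
    for (int k = 0; k < S; k++) T[k] = 0;
    for (int i = 0; i < NNODES; i++)
        if (r[i] > 0)
            T[r[i] - 1] = (T[r[i] - 1] + 1) % (NNODES + 1);
}
```

With the round-one choices r = (6, 10, 6, 2) of the running example, T records single hits (assignments) in slots 2 and 10 and a count of 2 (a conflict) in slot 6.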

Example 4: A Slot Selection AIDA

With the choice S=10 the AIDA example from the previous section would have executions of secure sum at each round with results as shown in Table IV. Using our example secure sum algorithm N=4 vectors of S=10 random numbers (not shown) would need to be chosen by each of the N=4 participating nodes at each round. \hfill\square

This variant of the algorithm has as its main drawback the very long message lengths that are encountered when using large S to keep the number of expected rounds small.

B. Prime Modulus AIDA

A prime P>S is chosen. Generally, P will be chosen as small as possible subject to this restriction. The random numbers chosen are distributed at step (3) as in Section III using the field F=GF(P) to compute the required power sums, the Newton polynomial p(x), and the polynomial roots.

This variant will be seen to result in shorter message lengths for communication between nodes. Again, the computation required to find the roots of the Newton polynomial is addressed in the Appendix. However, note that this computation can be delayed and thus overlaps any additional required rounds.

Additional rounds of the AIDA algorithm can proceed almost immediately, as it is not necessary to solve p(x)=0 before proceeding to the next round. Each node n_{i} merely computes the derivative polynomial p^{\prime}(x) and evaluates that polynomial at its chosen random value r_{i}. The value r_{i} is a multiple root if and only if p^{\prime}(r_{i})=0. Thus, if p^{\prime}(r_{i})=0 then node n_{i} chooses a new random number r_{i} for use in the next round. If p^{\prime}(r_{i})\ne 0 then node n_{i} has an assignment and for subsequent rounds will use r_{i}=0.
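The derivative test is straightforward to render in C; this sketch (helper names are ours) evaluates p and its formal derivative modulo the prime P=11 of the running example:

```c
#include <assert.h>

enum { P = 11 };  /* prime modulus of the running example, GF(11) */

/* Horner evaluation mod P; c[0] is the leading coefficient. */
static int eval_mod(const int *c, int deg, int x) {
    long y = 0;
    for (int i = 0; i <= deg; i++) y = (y * x + c[i]) % P;
    return (int)y;
}

/* Formal derivative mod P: d[i] = (deg - i) * c[i]. */
static void derive_mod(const int *c, int deg, int *d) {
    for (int i = 0; i < deg; i++)
        d[i] = (int)((long)(deg - i) * c[i] % P);
}

/* A node keeps its assignment iff p'(r_i) != 0, i.e., r_i is not a
   multiple root of the Newton polynomial. */
static int has_assignment(const int *c, int deg, int r) {
    int d[16];
    derive_mod(c, deg, d);
    return eval_mod(d, deg - 1, r) != 0;
}
```

For p(x)=10x^4+2x^3+9x^2+x+6 this reproduces Example 5: the multiple root 6 fails the test while 10 and 2 pass.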

Example 5: An Application of the Derivative Test

Continuing the example of Section V, at step (3) of the first round the Newton polynomial p(x) = 10x^4 + 2x^3 + 9x^2 + x + 6 and its derivative p^{\prime}(x) = 7x^3 + 6x^2 + 7x + 1 can be calculated (modulo 11) and evaluated as follows:

x          r_1=6   r_2=10   r_3=6   r_4=2
p(x)         0        0       0       0
p'(x)        0        4       0       7


Thus, only nodes n_{2} and n_{4} have received assignments in the first round. \hfill\square

C. Sturm's Theorem AIDA

It is possible to avoid solution of the Newton polynomial entirely. Sturm's theorem [32] allows the determination of the number of roots of a real polynomial p(x) in an interval (a, b] based on the signs of the values of a sequence of polynomials derived from p(x). The sequence of polynomials is obtained from a variant of the Euclidean Algorithm.

As in the previous variant, the power sums are collected and the Newton polynomial is formed. However, the field used for computation is the field of rational numbers {\bf Q}. The test p^{\prime}(r_{i})=0 again determines whether or not n_{i} has received an assignment. A computational advantage arises in that the nodes do not need to solve the Newton polynomial p(x) to determine the (now implicitly) shared values. Assume that x=0 is not a root of p(x), as any factor x^{k} has been divided out immediately. Each node n_{i} which has received an assignment forms g(x)=\gcd(p(x),p^{\prime}(x)) so that multiple roots can be counted separately. A multiple-roots version of Sturm's theorem [32] is then applied to calculate the number of roots of p(x) in the range (0,r_{i}]. (Note that r_{i} itself is not a multiple root, allowing application of the theorem.) The polynomial g(x) is a by-product of this computation. The same Sturm procedure is applied to g(x), thus obtaining a count of the multiple roots in the same range (0,r_{i}]. The index received by n_{i} is then:

s_{i}={\rm Card}\{x:p(x)=0\wedge 0<x\leq r_{i}\}-{\rm Card}\{x:g(x)=0\wedge 0<x\leq r_{i}\}


The collected power sums P_{i} are integers. To guarantee privacy, they should ideally be computed with Algorithm 2 using a field GF(P) with P greater than any possible value of P_{i}.

Our timings showed that using Sturm's theorem is not currently competitive with the various methods of polynomial solution using the “prime modulus” approach, running at best half as fast. Although we therefore do not give an example, the construction is straightforward. The application of Sturm's theorem requires use of an ordered field, resulting in large polynomial coefficients. Unfortunately, we do not currently know of a computationally reasonable analog of this result that is usable over a finite field. However, some results in this direction are available [33].

Table V Data Bits Required Per Message

D. Communications Requirements of AIDA Methods

We now consider the required number of data bits for each of the three variant methods just described. This is the number of data bits that would be transmitted in each packet by the secure sum algorithm introduced earlier. The required numbers of data bits B are slightly overestimated by the formulae:

B_{prime} = N\cdot\lceil\log_{2}(P+1)\rceil
B_{sturm} = {{N(N-1)}\over{2}}\cdot\lceil\log_{2}(N)\rceil\cdot\lceil\log_{2}(S)\rceil
B_{slot} = S\cdot\lceil\log_{2}(N+1)\rceil


Table V gives the number of required data bits for selected values of N and S. Header bits required by the transmission protocol, etc. are not included. Where the number of data bits required by the Sturm method is not competitive, a “–” has been entered to enhance readability of the table.
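A small C sketch of the three formulae (helper names are ours) makes the trade-off concrete; \lceil\log_{2}x\rceil is computed as the number of bits needed to represent values below x:

```c
#include <assert.h>

/* ceil(log2(x)): smallest b with 2^b >= x. */
static int ceil_log2(long x) {
    int b = 0;
    while ((1L << b) < x) b++;
    return b;
}

/* Data bits per secure-sum packet for the three AIDA variants. */
static long bits_prime(long n, long p) { return n * ceil_log2(p + 1); }
static long bits_sturm(long n, long s) {
    return n * (n - 1) / 2 * ceil_log2(n) * ceil_log2(s);
}
static long bits_slot(long n, long s)  { return s * ceil_log2(n + 1); }
```

For the running example (N=4, P=11, S=10) this gives B_{prime}=16, B_{slot}=30, and B_{sturm}=48 data bits per packet.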

The computational requirements of “slot selection” appear, at first, to be trivial. However, for every root that the “prime modulus” method must check, “slot selection” must input, output, and check \log_{2}(N) bits. This simple method can be recommended for small N and S, when multiple rounds of the algorithm can be tolerated.

Computational time required by the “prime modulus” method grows with S. For most of the entries in the table, the amount of time is trivial. At N=1{,}000 and S=10{,}000 the computation required becomes a consideration. Here, somewhat less than 0.03 seconds of CPU time were required using the method described in the Appendix. When S was raised to 10,000,000 this increased to 9 seconds. The computation can be distributed across the participating nodes easily and efficiently with the expected nearly N-fold reduction in wall clock time required for computation.

SECTION VII.

The Completion Rate After R Rounds

Two nodes might make identical choices of random numbers, or slots as they will be termed in this section. One can only guarantee that a complete assignment of N nodes, using S possibilities for the slots or random number choices and R rounds, will occur with at least a desired probability P_{S=s}(R,N)=P(R,N). The parameter S=s will usually be left implicit by its omission in this section.

The reader may observe that estimating the number of assignments made in one round is essentially the well-known birthday problem. This and similar approaches have been used to compute bounds for our problem and others [12], [16]. However, to compute the expected number of assignments in R rounds precisely, it is necessary to know the probability distribution of the assignments made in a single round, and not just the average number. We now show how to compute the probabilities P(R,N).

As a first step, consider the probability p(N,A,C) that a single round, starting with N nodes and s slots, results in A assignments and C slots with conflicts, i.e., slots chosen by more than one node. Note that Z=s-A-C is the number of slots that no (zero) nodes have chosen. The recurrence relations (3) determine p(N,A,C):

p(0,0,0)=1,\quad p(1,1,0)=1
p(N,A,C)=0\quad whenever C<0\vee A<0
p(N,A,0)=0\quad whenever N\ne A
p(N,A,C)=0\quad whenever N<A+2C
p(N,A,C)={{1}\over{s}}(s-A-C+1)\cdot p(N-1,A-1,C)+{{1}\over{s}}C\cdot p(N-1,A,C)+{{1}\over{s}}(A+1)\cdot p(N-1,A+1,C-1)\eqno{(3)}


The formulae are derived by assuming that N-1 nodes have chosen slots and looking at the next choice. The Nth node is to choose a slot, resulting in A assignments and C conflicts. The slot it chooses could be unassigned, already in conflict with multiple occupants, or already assigned with exactly one occupant. For example, in the last case, the number of assignments falls by one with probability (A+1)/s, where the configuration before the choice occurred with probability p(N-1, A+1, C-1). After the additional choice, there are N nodes, A of which are assigned, and the number of conflicted slots rises by 1 to C.

More base conditions than are necessary have been given in the recursion relations (3). For efficiency, the values p(N, A, C) need to be cached during computation. However, we will need only to remember p_{N,A}, the probability that beginning with N nodes and s slots, exactly A assignments have been made after one round:

p_{N,A}=\sum_{C=0}^{\lfloor N/2\rfloor}p(N, A, C)
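The recurrence (3) and the marginal p_{N,A} can be transcribed directly into C for small N; this sketch (function names are ours) recomputes subproblems rather than caching them, which is adequate at this scale:

```c
#include <assert.h>

enum { SLOTS = 10 };  /* s = 10, the running example */

/* p(N, A, C) of recurrence (3): probability that one round, starting
   with n choosing nodes and SLOTS slots, ends with a assignments and
   c conflicted slots. Plain recursion, adequate for small n. */
static double p3(int n, int a, int c) {
    double s = SLOTS;
    if (c < 0 || a < 0) return 0.0;
    if (n == 0) return (a == 0 && c == 0) ? 1.0 : 0.0;
    if (c == 0 && n != a) return 0.0;   /* p(N, A, 0) = 0 unless N = A */
    if (n < a + 2 * c) return 0.0;
    return (s - a - c + 1) / s * p3(n - 1, a - 1, c)
         + c / s             * p3(n - 1, a, c)
         + (a + 1) / s       * p3(n - 1, a + 1, c - 1);
}

/* p_{N,A}: marginal of p(N, A, C) over the conflict count C. */
static double pNA(int n, int a) {
    double t = 0.0;
    for (int c = 0; c <= n / 2; c++) t += p3(n, a, c);
    return t;
}
```

For N=4 and s=10 it reproduces the single-round distribution (0.504, 0.432, 0.036, 0.028) appearing in Example 6.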


Let P(R,N,A) be the probability of A assignments being made after R rounds, so that P(R,N)=P(R,N,N). Define \vec{P}(R,N) to be a vector of N entries, which may be interpreted as either a row or a column vector, giving probabilities:

\vec{P}(R,N)=\left(P(R,N,N), P(R,N,N-2),\ldots, P(R,N,1), P(R,N,0)\right)^{\top}

\vec{P}(0,N)=\vec{e}_{N}=\left(0, 0,\ldots, 0, 1\right)^{\top}


There is no entry for A=N-1 because a single collision results in A=N-2.

Table VI Percent Probability of an Incomplete Assignment After R Rounds

The matrix {\bf P}(N) gives the transition probabilities for a single round of AIDA starting with N nodes and ending with N-A nodes yet to be assigned:

{\bf P}(N)=\left(\matrix{p_{0,0} & p_{2,2} & p_{3,3} & \cdots & p_{N,N}\cr 0 & p_{2,0} & p_{3,1} & \cdots & p_{N,N-2}\cr 0 & 0 & p_{3,0} & \cdots & p_{N,N-3}\cr 0 & 0 & 0 & \cdots & p_{N,N-4}\cr \vdots & \vdots & \vdots & & \vdots\cr 0 & 0 & 0 & \cdots & p_{N,0}}\right)\eqno{(4)}


Note that across each row of the array, the number of unassigned nodes is the same. For example, all the probabilities in the second row are for a single collision. The equation for obtaining the vector of probabilities for the next round from that of the previous round is then given by matrix multiplication:

\vec{P}(R,N)={\bf P}(N)\cdot\vec{P}(R-1,N)=({\bf P}(N))^{R}\cdot\vec{P}(0,N)\eqno{(5)}


For the conventional form of a Markov chain, transpose {\bf P}(N) to get a right stochastic matrix whose rows each sum to 1. \vec{P}(R,N)=\vec{P}(R-1,N)\cdot{\bf P}(N)^{\top}\eqno{\hbox{(6)}}


Thus, the formulae (5) and (6) can be used to obtain the desired probability P(R,N)=\vec{P}(R,N)_{1} of termination in R rounds. The probability of incomplete assignment in R rounds will be denoted by I(R,N)=1-P(R,N). We have calculated these I(R,N) for small R with values as high as N=1000 and S=50{,}000{,}000 using floating point computation with 256 bit mantissas, and verified the results by simulation. We formally state:

Theorem 7

The completion rate P(R,N)=\vec{P}(R,N)_{1} of Algorithm 3 with N nodes after R rounds is given by:

P(R,N)=\left(({\bf P}(N))^{R}\cdot\vec{e}_{N}\right)_{1}


Table VI gives values for percentages of incomplete assignments. Percentages are rounded up to three significant digits.

Example 6

For N=4 nodes, S=10 slots, and R=2 rounds, (7) shows the probabilities for A=4, 2, 1, 0 assignments in terms of the same probabilities for R=1, as given by the matrix calculation \vec{P}(2,4)={\bf P}(4)\cdot\vec{P}(1,4) of (5). For R=1 the probability of a complete assignment is P_{S=10}(1,4)=50.4\%=1-49.6\%=1-I_{S=10}(1,4). This can be verified by multiplying the same matrix {\bf P}(4) with the starting state (0,0,0,1)^{\top} as the column vector on the right, showing that \vec{P}(1,4) is simply the right-most column of {\bf P}(4). For R=2 the probability of a complete assignment is 93.2832\%=1-6.7168\%.

\left(\matrix{0.932832\cr 0.065016\cr 0.001368\cr 0.000784}\right)=\left(\matrix{1 & 0.9 & 0.72 & 0.504\cr 0 & 0.1 & 0.27 & 0.432\cr 0 & 0 & 0.01 & 0.036\cr 0 & 0 & 0 & 0.028}\right)\left(\matrix{0.504\cr 0.432\cr 0.036\cr 0.028}\right)\eqno{(7)}


The entries in the matrix are, of course, calculated from the recurrence relations. However, one can validate some of the entries indirectly. For example, suppose the four nodes each choose a number in the range 1–10 and no assignments are made. Let C(n, k) denote the number of combinations of n items taken k at a time. Then p_{4,0}=0.028=(10+C(4,2)\cdot C(10,2))/10000 where the first term of the numerator is the number of ways in which all four numbers can be identical while the second term is the number of ways in which the four numbers can form two pairs of identical numbers. \hfill\square
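The matrix calculation of (7) can also be checked mechanically; this C sketch (helper name ours) applies one round of the transition matrix {\bf P}(4) per call:

```c
#include <assert.h>

/* One round of the Markov transition (5) for N = 4: out = M * in,
   where M is the 4x4 matrix P(4) of Example 6 (states A = 4, 2, 1, 0). */
static void step(double M[4][4], const double in[4], double out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = 0.0;
        for (int j = 0; j < 4; j++) out[i] += M[i][j] * in[j];
    }
}
```

Starting from the state (0,0,0,1)^{\top}, one call yields the right-most column of {\bf P}(4) and a second call reproduces the vector on the left-hand side of (7).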

An implementer would generally want to choose S to provide a high completion rate within just a few rounds. The material in this section is the most direct way to determine such completion rates. Section VIII indicates how additional information is obtained using computer algebra techniques.

SECTION VIII.

Algebraic Completion Statistics

For many purposes, the formulae of Section VII provide a satisfactory answer. However, the rich literature on absorbing Markov chains and the availability of computer algebra packages provide many other possibilities for analysis. To determine a desirable value for the number of slots S=s, one can take advantage of the fact that the probabilities p_{N,A} are representable as rational functions of s. In fact, {\bf P}(N) is the upper left-hand N by N corner of an infinite matrix {\bf P}. When N is small, the entries, which have no discernible pattern, can be calculated by a computer algebra package from the recurrence relations, yielding:

{\bf P}=\left(\matrix{1 & {{s-1}\over{s}} & {{(s-2)(s-1)}\over{s^{2}}} & {{(s-3)(s-2)(s-1)}\over{s^{3}}} & \cdots\cr 0 & {{1}\over{s}} & {{3(s-1)}\over{s^{2}}} & {{6(s-2)(s-1)}\over{s^{3}}} & \cdots\cr 0 & 0 & {{1}\over{s^{2}}} & {{4(s-1)}\over{s^{3}}} & \cdots\cr 0 & 0 & 0 & {{3s-2}\over{s^{3}}} & \cr \vdots & \vdots & \vdots & \vdots & }\right)


A. The Average Number of Rounds

Following standard methods for absorbing Markov chains [34], we determine the average number of rounds required.

Let {\bf Q}(N)^{\top} be the matrix formed by dropping the first row and the first column of {\bf P}(N), which correspond to the absorbing state A=N of the Markov chain (note the transposition). Given that {\bf I}_{k} is the k by k identity matrix, denote the fundamental matrix of the Markov chain [34] by:

{\bf N}(N)=({\bf I}_{N-1}-{\bf Q}(N))^{-1}

{\bf N} is, interestingly, a lower-triangular, infinite matrix whose upper-left N-1 by N-1 corner forms {\bf N}(N). Let \vec{1} denote a vector with all entries set to 1. The value of most interest from the standard theory [34] in our context is probably the last entry of the vector {\bf N}(N)\cdot\vec{1}:

R_{avg}(N)=\left({\bf N}(N)\cdot\vec{1}\right)_{N-1}

which is the average number of rounds the algorithm will take. Related statistics, such as the variance of the number of rounds, can be determined. One starting point would be the related work [35], which deals with bounds for the slotted ALOHA protocol where there are arrivals.

Example 7: An Average Number of Rounds Calculation

For larger values of N, say N>16, we would normally compute {\bf P}(N) and {\bf N}(N) numerically, but for our small running example with N=4 we proceed symbolically. A calculation with a computer algebra package gives:

R_{avg}(4)={{s^{2}\left(s^{2}+7s-2\right)}\over{(s-1)^{2}(s+1)(s+2)}}


Substituting s=10 yields 1.57127 rounds required on the average. If one is dissatisfied with this and would prefer a better average of 1.1 rounds, solution of:

s^{2}\left(s^{2}+7s-2\right)=\left({{11}\over{10}}\right)(s-1)^{2}(s+1)(s+2)

yields a root at s=59.2225, requiring a choice of s\geq 60. \hfill\square
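A quick numeric check of this calculation in C (helper names ours; the search assumes, as observed for this formula, that the average decreases toward 1 for s\geq 2):

```c
#include <assert.h>

/* R_avg(s) for N = 4, the closed form of Example 7. */
static double r_avg(double s) {
    return s * s * (s * s + 7 * s - 2)
         / ((s - 1) * (s - 1) * (s + 1) * (s + 2));
}

/* Smallest integer s whose average round count meets the target,
   assuming r_avg decreases monotonically for s >= 2. */
static long smallest_s(double target) {
    long s = 2;
    while (r_avg((double)s) > target) s++;
    return s;
}
```

The check confirms both numbers in the example: r_avg(10) ≈ 1.57127, and a target of 1.1 average rounds first holds at s = 60.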

B. Asymptotic Rates of Complete Assignments

Exact rates or somewhat simpler asymptotic bounds on the percentage of incomplete assignments I(R,N) for any given number of rounds can be obtained. The second-largest eigenvalue of the matrix {\bf P}(N) gives the asymptotic rate at which the percentage of incomplete assignments decreases between rounds. However, more precise information is available.

Since {\bf P}(N) is triangular, the eigenvalues are easy to determine, and for parameters of interest the second-largest is 1/s. We restrict our attention to cases in which these eigenvalues are distinct. The only currently known exception is the parameter S=9, which yields the eigenvalue 1/81 twice; for parameters 4\leq N\leq S\leq 1000 there are no other occurrences. In the presence of duplicate eigenvalues, Perron's formula for matrix powers [36] can be used to obtain a similar, albeit more complex, analysis. Instead, we suggest simply using a slightly decreased value of S to obtain the bounds using the methods developed below, should duplicate eigenvalues occur.

Using the matrix {\bf Q}(N), 1/s is the spectral radius, and Gelfand's formula yields the asymptotic rate of contraction 1/s. The sum of the entries of \vec{P}(R,N) is 1 and thus:

I(R,N)=\vec{1}\cdot\left(e_{N-1}\cdot{\bf Q}(N)^{R}\right)


Let v denote the eigenvalues taken from the diagonal of {\bf Q}(N) and let {\bf E}(N) denote the matrix composed of a basis of corresponding left eigenvectors as rows. Representing {\bf Q}(N) with respect to the eigenvector basis, we arrive at:

I(R,N)=\vec{1}\cdot\left(\left(\left(e_{N-1}\cdot{\bf E}(N)\right)\ast\left(v_{1}^{R},\ldots,v_{N-1}^{R}\right)\right)\cdot{\bf E}(N)^{-1}\right)\eqno{(8)}

where “∗” indicates component-by-component multiplication of the two vectors. The formula transforms the starting state e_{N-1} into the eigenvector basis, where the multiplication by powers of the eigenvalues performs the equivalent of the transformation by {\bf Q}(N)^{R}. The result is then transformed back to Euclidean coordinates, and the probabilities of the incomplete states are tallied with the inner product using \vec{1}.

Collecting the terms of formula (8) above yields an expression for I(R,N) of the form (9) with result:

Theorem 8

The expected percentage of incomplete assignments of N nodes after R rounds is:

I(R,N)=a_{1}v_{1}^{R}+a_{2}v_{2}^{R}+\cdots+a_{N-1}v_{N-1}^{R}\eqno{(9)}

where v_{1},\ldots, v_{N-1} are the distinct eigenvalues from the diagonal of the truncated transition matrix {\bf Q}(N), and a_{1},\ldots, a_{N-1} are the solution of the linear equations:

\left(\matrix{I(0,N)\cr I(1,N)\cr\vdots\cr I(N-2,N)}\right)=\left(\matrix{1 & \cdots & 1\cr v_{1} & \cdots & v_{N-1}\cr \vdots & & \vdots\cr v_{1}^{N-2} & \cdots & v_{N-1}^{N-2}}\right)\cdot\left(\matrix{a_{1}\cr a_{2}\cr\vdots\cr a_{N-1}}\right)\eqno{(10)}

Note that the Vandermonde coefficient matrix can be ill-conditioned. Both special methods and explicit formulae exist for use in computing its inverse. It follows immediately that:

Corollary 9

The expected percentage of incomplete assignments of N nodes after R rounds is asymptotically:

I_{0}(R,N)\approx a_{1}\cdot\left({{1}\over{s}}\right)^{R}

where a_{1} is computed by Theorem 8; i.e., the limit of the quotient of the two approximately equal quantities is 1 as R\to\infty.

The quantities a_{i} and v_{i} are implicitly functions of s. Again, keeping s a free variable is useful only for small N.

Example 8: Calculation of an Asymptotic Bound

In Section VII, using N=4 and S=10, we already computed I(0,4)=1, I(1,4)=0.496, and I(2,4)=0.067168. For subsequent use, we sort the entries from the diagonal of {\bf Q}(N) into descending order: v_{1}=1/10, v_{2}=7/250, and v_{3}=1/100. Substituting into (10) yields:

\left(\matrix{1\cr 0.496\cr 0.067168}\right)=\left(\matrix{1 & 1 & 1\cr 0.1 & 0.028 & 0.01\cr 0.01 & 0.000784 & 0.0001}\right)\cdot\left(\matrix{a_{1}\cr a_{2}\cr a_{3}}\right)


Solving these linear equations gives the form of (9) for the exact average rate of incompletion:

I(R,N)=7.5\left({{1}\over{10}}\right)^{R}-10.5\left({{7}\over{250}}\right)^{R}+4\left({{1}\over{100}}\right)^{R}\eqno{(11)}


Thus, we also obtain the simpler asymptotic rate for incomplete assignments:

I_{0}(R)=7.5\cdot\left({{1}\over{10}}\right)^{R}=a_{1}\cdot\left({{1}\over{s}}\right)^{R}


This is seen to be an upper bound that rapidly converges:

R          1        2          3           4
I(R)       0.496    0.067168   0.0072735   0.000743586
I_0(R)     0.75     0.075      0.0075      0.00075

\hfill\square
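The exact rate (11) and its one-term asymptote are easy to tabulate in C (helper names ours; a hand-rolled power loop avoids linking the math library):

```c
#include <assert.h>

static double powi(double x, int n) {   /* x^n without the math library */
    double y = 1.0;
    while (n-- > 0) y *= x;
    return y;
}

/* Exact incompletion rate (11) for N = 4, s = 10. */
static double I_exact(int R) {
    return 7.5 * powi(1.0 / 10, R) - 10.5 * powi(7.0 / 250, R)
         + 4.0 * powi(1.0 / 100, R);
}

/* One-term asymptote of Corollary 9: a_1 * (1/s)^R. */
static double I_asym(int R) { return 7.5 * powi(1.0 / 10, R); }
```

The sanity checks recover I(0)=1 and the tabulated values above, and confirm that the asymptote stays above the exact rate.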

SECTION IX.

Finite Termination

Although the algorithms developed here terminate with probability 1, there is no absolute upper bound on the number of rounds required. Under some assumptions, it has been proven that finite termination cannot be guaranteed for the simpler leader election problem [18], [19]. While there may be extreme conditions under which no algorithm for AIDA can be guaranteed to finitely terminate, we conjecture only that at least N sequential communications are required in such an algorithm. On the other hand, the algorithms of [20], [21] are already collision free, but do not generate a permutation chosen at random from all possible permutations. For the current problem, the number of rounds is typically small and we do not recommend seeking finitely bounded termination.

For completeness, we sketch a cryptographic approach that could guarantee finitely bounded termination, even without a trusted authority. (Note that some mental poker algorithms [23] could also be used to make this guarantee.) Suppose each node n_{i} has a unique, but not anonymous, identifier 1\leq A_{i}\leq N; then the numbers R_{i}=E(s_{i}\cdot N+A_{i}) are unique, where E() is an encryption function and s_{i} is the usual sufficiently long seed (random number) known only to n_{i}. The function E() may be cooperatively generated, with its inverse unknown to the nodes individually, using techniques from [37]. The use of these random numbers (r_{i}=R_{i}) at step (2) of Algorithm 3 would guarantee termination in a single round. However, polynomial solution for S of the required size is impractical.

The computational problems with this approach can be overcome by using the numbers R_{i} as pseudorandom bit streams in Algorithm 3. Each node n_{i} takes b bits from the head of its R_{i} bit stream at each round to form r_{i}. Two problems arise. First, nodes drawing identical random numbers in a round must append their position in the list of random numbers generated in that round, p_{i}=Card\{r_{j}<r_{i}:1\leq j\leq N\}<N, to the tail of their bit stream. This keeps all the bit streams distinct. Second, the analysis of the number of rounds required in Sections VII and VIII no longer applies a priori unless the encryption function E() has the difficult-to-obtain plausible deniability property [38]. However, provided that b>\lceil\log_{2}(N)\rceil, the bit strings R_{i} become shorter at each round and a fixed bound is easily calculated.

SECTION X.

Concluding Remarks

Each algorithm compared in Section VI can be reasonably implemented, and each has its advantages. Our use of the Newton identities greatly decreases communication overhead. This can enable the use of a larger number of “slots” with a consequent reduction in the number of rounds required. The solution of a polynomial can be avoided, at some expense, by using Sturm's theorem. The development of a result similar to Sturm's method over a finite field is an enticing possibility.

With private communication channels, our algorithms are secure in an information theoretic sense. Apparently, this property is very fragile. The very similar problem of mental poker was shown to have no such solution [22] with two players and three cards. The argument of [22] can easily be extended to, e.g., two sets each of N colluding players with a deck of 2N+1 cards rather than our deck of 2N cards.

In contrast to bounds on completion time developed in previous works, our formulae give the expected completion time exactly. We conjecture the asymptotic formula of Corollary 9, based on computational experience, to be a true upper bound.

All of the noncryptographic algorithms have been extensively simulated, and we can say that the present work does offer a basis upon which implementations can be constructed. The communications requirements of the algorithms depend heavily on the underlying implementation of the chosen secure sum algorithm. In some cases, merging the two layers could result in reduced overhead.

ACKNOWLEDGMENT

Ideas contributed by Dr. W. E. Clark were crucial in this research. He also tested several existing algorithms for finding roots of polynomials. The assistance of the reviewers is gratefully acknowledged, especially with respect to formalizing and strengthening the privacy results.

Appendix

The polynomials that must be solved are always of the form:

p(x)=C\cdot(x-r_{1})\cdot(x-r_{2})\cdots(x-r_{N})

where C is a constant. Thus, there are exactly N roots in GF(P), counting multiplicities. (Note that all formulae in this appendix are modulo P.) Algorithms such as Berlekamp [39] and Cantor-Zassenhaus [40] are implemented in computer algebra packages and were compared with the simple method developed below. A custom implementation of one of these methods might be faster but, as tested, they were at best 2.7 times slower for our selected values of N and S.

Following [41], we use a method well known in computer graphics: simply evaluate the polynomial p(x) at all values x=0,1,2,\ldots,P-1 to find its roots. This method also has the merit of being a task which is easily distributed among the nodes. With N=100 and S=10{,}000, less than 0.03 seconds of CPU time were required for the nondistributed version (2.2 GHz processor), and less than 0.0004 seconds of CPU time per node when the work is distributed among the 100 nodes. For an extreme case, with N=1{,}000 and S=10{,}000{,}000, Horner's rule needed 316 seconds versus 9 seconds of single-processor CPU time for our implementation.

Appendix

Newton Extrapolation Evaluation for Finding Roots

To use the Newton extrapolation formula [42], the polynomial p(x) must first be represented with respect to the Newton (backward) basis polynomials. Our computations will use the formulae specialized for consecutive integers. Let n_{0,k}(x), n_{1,k}(x),\ldots, n_{N,k}(x) designate the Newton basis polynomials constructed at the values x_{N}=N+k, x_{N-1}=N-1+k,\ldots, x_{0}=k:

n_{N,k}(x)=1
n_{N-1,k}(x)=(x+k-N)
n_{N-2,k}(x)=(x+k-N)(x+k-(N-1))
\cdots
n_{0,k}(x)=(x+k-N)(x+k-(N-1))\cdots(x+k-2)(x+k-1)


The Newton polynomial for p(x) is then given by:

p(x)={{b_{N-0,k}}\over{0!}}+{{b_{N-1,k}}\over{1!}}\cdot n_{N-1,k}(x)+\cdots+{{b_{N-N,k}}\over{N!}}\cdot n_{0,k}(x)


The backward repeated differences b_{j,k} are computed by:

b_{N,k}=p(k);\quad b_{j-1,k}=b_{j,k}-b_{j,k-1}\eqno{(12)}


During the first phase of the algorithm, computing p(-N),\ldots, p(-1), p(0) and then using the recursive formulae above yields values for b_{N,0}, b_{N-1,0},\ldots, b_{1,0}, b_{0,0}. Since p(x) is a polynomial of degree N, b_{0,k}=K, a constant, for all k. This fact allows the computation of b_{\ast,k+1} from b_{\ast,k} during the second phase of the algorithm by means of the formulae:

b_{0,k+1}=b_{0,k};\quad b_{j,k+1}=b_{j-1,k+1}+b_{j,k}\eqno{(13)}


Thus, we have the initially found values b_{N,k},\ldots,b_{0,k} with k=0 and we proceed to compute these values for k=1,\ldots,P-1 in succession. At each iteration a root is found if b_{N,k}=p(k)=0.

The complexity of the algorithm is effectively O(N\cdot P) when additions modulo P are performed in unit time.

Appendix

Deflation by a Root

We gain an approximately two-fold speedup with our own method to deflate the polynomial when a root r is found.

During the second phase of the algorithm, suppose that a root is found and that a deflation p_{1}(x)=p(x)/(x-r) occurs. Then let the Newton coefficients be represented by b_{j,k} and b^{1}_{j,k} for p(x) and p_{1}(x), respectively. That r is a root implies b_{N,r}=0. The values b^{1}_{\ast,r-1} are computed from the values b_{\ast,r} by:

b^{1}_{N_{1}-j,\,r-1}={{1}\over{j+1}}\cdot b_{N-j-1,\,r}\eqno{(14)}


Here N_{1}=N-1 since deg(p_{1})=deg(p)-1. These relations are obtained by dividing the formula for p(x) expressed as a Newton polynomial by (x-r) and equating coefficients.

Appendix

Coding the Polynomial Solution Algorithm

The algorithm was coded using the programming language C and tested using several different compilers and computers. It is assumed that any zero roots have been eliminated by shifting of coefficients. We use a single array b[j] to contain the values b_{j,k}. The array is initialized with the values b[0]=p(-N),\ldots, b[N-1]=p(-1), b[N]=p(0).


The first set of values b_{j,k}, for k=0, are found by taking recursive differences as developed in (12). This task is accomplished by the following code:

    for (topRow = N-1 downto 0) do
        for (j = 0 to topRow) do
            b[j] <- b[j+1] - b[j]
            if (b[j] < 0) then b[j] <- b[j] + P
        od
    od

The values b_{\ast,k+1} are found from those for b_{\ast,k} by the following code, according to the formulae (13):

    repeat
        k = k + 1
        for (j = 1 to N) do
            b[j] <- b[j] + b[j-1]
            if (b[j] >= P) then b[j] <- b[j] - P
        od
    until (b[N] = 0 or k = P-1)

The entry b[N] contains p(k) at each iteration. When b[N] is zero, the polynomial is deflated by:

    if (b[N] = 0) then
        for (j = 0 to N-1) do
            b[j] <- 1/(N-j) * b[j] mod P
        od
        N <- N - 1
        k <- k - 1
    fi

where 1/(N-j) is looked up in a table. Control returns to the repeat-until loop when k < (P-1).

The last two code segments above both contain the test {\tt b[N]=0}. We eliminated an if-then-else structure in favor of the two separate tests in order to create a smaller innermost loop. This doubled the execution speed. Introducing registers to contain {\tt b[j]} and {\tt b[j-1]} and rearranging the code to test {\tt b[j]<0} rather than {\tt b[j]>=P} in the inner loop also provided gains in speed. Use of optimization options in compilation did reduce runtime, but did not obviate the need for the noted coding changes. Our faith in the current generation of optimizing compilers has been slightly diminished.
