Quantifying Membership Privacy via Information Leakage

Machine learning models are known to memorize the unique properties of individual data points in a training set. This memorization capability can be exploited by several types of attacks to infer information about the training data, most notably, membership inference attacks. In this paper, we propose an approach based on information leakage for guaranteeing membership privacy. Specifically, we propose to use a conditional form of the notion of maximal leakage to quantify the information leaking about individual data entries in a dataset, i.e., the entrywise information leakage. We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework for privacy-preserving classification of sensitive data and prove that the entrywise information leakage of its aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of this leakage implies that increased consensus among teachers in labeling a query reduces its associated privacy cost. Finally, we derive upper bounds on the entrywise information leakage when the aggregation mechanism uses Laplace distributed noise.


Introduction
In recent years, many useful machine learning applications have emerged that require training on sensitive data. Such applications span a diverse range of fields, such as medical imaging [1], rumor identification in social media [2], and financial fraud detection [3]. While all machine learning applications by definition reveal some information about the training data, privacy concerns arise when machine learning models memorize properties that are unique to individual data entries. In fact, a variety of privacy attacks have demonstrated that it is indeed possible to exploit this "memorization" capability of models to infer information about data entries in the training set [4].
Arguably, the simplest type of privacy attacks against machine learning models is membership inference attacks in which an adversary infers whether or not a certain data point was used in the training [5,6]. In response to such attacks, a number of mitigation techniques have been proposed in the literature, with differential privacy-based methods being the most commonly studied. Differential privacy [7] provides provable and operationally meaningful privacy guarantees, and by definition neutralizes membership inference attacks. Roughly speaking, differential privacy ensures that all datasets differing in only one entry (i.e., adjacent datasets) produce an output with similar probabilities. Moreover, it has several useful properties, such as satisfying data-processing inequalities and composition theorems [7].
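To make the "similar probabilities on adjacent datasets" guarantee concrete, the sketch below (our own illustration, not part of the paper) numerically checks the ε-differential-privacy guarantee of the standard Laplace mechanism for a sensitivity-1 query; the function names are hypothetical:

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def dp_ratio_bound(eps, query_gap=1.0, xs=None):
    """Largest density ratio between the Laplace mechanism's outputs on two
    adjacent datasets whose (sensitivity-1) query values differ by query_gap."""
    b = query_gap / eps  # noise scale calibrated to sensitivity / epsilon
    if xs is None:
        xs = [i / 10 for i in range(-100, 101)]
    return max(laplace_pdf(x, 0.0, b) / laplace_pdf(x, query_gap, b) for x in xs)

# For every output x the density ratio stays below e^eps: the eps-DP guarantee.
eps = 0.5
assert dp_ratio_bound(eps) <= math.exp(eps) + 1e-9
```

The maximum ratio is attained exactly at e^ε, which is why the Laplace mechanism's guarantee is tight.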
The standard definition of differential privacy (i.e., pure differential privacy) uses a parameter ε to define a multiplicative upper bound on the changes in the probability of an output for all adjacent datasets in the input [8]. However, this definition is known to be very strict, and has limited applicability. As such, several relaxations of differential privacy have been proposed, the most notable of which is (ε, δ)-differential privacy [9]. A common interpretation of (ε, δ)-differential privacy is that the guarantees of ε-differential privacy hold except with probability δ. Thus, it provides the necessary flexibility for studying a larger class of privacy-preserving mechanisms such as the Gaussian mechanism [8].
Despite the advantages of (ε, δ)-differential privacy, one should note that its privacy guarantees are qualitatively different from those of pure differential privacy (see [10] for illustrative examples). On this account, Rényi differential privacy [10] was recently proposed as an alternative relaxation of pure differential privacy. While Rényi differential privacy satisfies the same useful properties as pure differential privacy, it does not offer an intuitive operational meaning, and its privacy guarantees are usually translated into (ε, δ)-differential privacy for interpretation.
In this paper, we propose to use (a conditional form of) the notion of maximal leakage [11] to measure the amount of information leaking about any single data entry in a dataset, i.e., the entrywise information leakage. Maximal leakage [11] is an operationally meaningful privacy metric that captures the inference capabilities of an adversary trying to deduce some information about the input data by observing the output. Specifically, maximal leakage quantifies the maximal gain in an adversary's ability to correctly guess any arbitrary discrete function of the input data by observing the output (as opposed to making a guess with no observations). Note that the original definition of maximal leakage quantifies the information leaking about the whole dataset, whereas we are interested in measuring the information leaking about single data entries in the dataset. As such, similar to [12], we consider an adversary who knows the values of all the entries in the dataset, except for a single data entry of interest. Intuitively, in this setup, observations only convey the unique information contributed by the unknown data entry since all other entries are already known to the adversary. To quantify this entrywise information leakage, we propose a conditional form of maximal leakage, namely the pointwise conditional maximal leakage. Then, by allowing the unknown entry to be any of the entries in the dataset, we can derive upper bounds on the entrywise information leakage, and provide meaningful worst-case privacy guarantees.
Maximal leakage satisfies several useful properties, most notably a data-processing inequality and a composition lemma [11]. The data-processing inequality ensures that no manipulation of the output can increase the information leakage, while the composition property characterizes the information leaked through multiple independent observations. Here, we show that the same properties hold for pointwise conditional maximal leakage, rendering it suitable for privacy analysis of more complex information systems.
We apply our privacy analysis to the Private Aggregation of Teacher Ensembles (PATE) framework [13,14]. PATE is a general framework for privacy-preserving classification of sensitive data, and operates by transferring the knowledge of an ensemble of models (called teachers) trained on disjoint partitions of the sensitive data to a student classifier. Specifically, the student is trained using a public unlabelled dataset which will be labelled by the teachers through an aggregation mechanism. The aggregation mechanism is essentially the Report-Noisy-Max mechanism [7] which adds noise to the teachers' predictions to enable derivation of privacy guarantees. PATE has several advantages as a privacy-preserving machine learning framework. First, the privacy guarantees result solely from the aggregation mechanism and are agnostic to the specific machine learning techniques used by each teacher. This is because the modular structure of PATE enables us to invoke the data-processing inequality to uncouple the information leaked through the training and aggregation, and guarantee that the overall leakage is less than both. Second, PATE lends itself well to distributed learning by allowing data owners to separately train their own predictors, hence mitigating the need for centralized storage of the sensitive data. Finally, the aggregation mechanism induces a favorable synergy between privacy and accuracy such that increased agreement among the teachers in labelling a query lowers its associated privacy cost. This synergy is one of the main focuses of this paper, and will be extensively studied.
The privacy guarantees of PATE are characterized in [13,14] in terms of differential privacy, together with experimental results. Here, we analyze the privacy of the framework in terms of the entrywise information leakage, with the goal of gaining a deeper understanding of the operation of the framework, especially the Report-Noisy-Max mechanism used for aggregating the teachers' predictions. As [13,14] present a thorough experimental study, we refrain from repeating the experiments and instead focus on giving a rigorous analysis of the framework that explains properties such as the privacy-accuracy synergy.

Contributions
Our contributions can be summarized as follows: i) Introducing pointwise conditional maximal leakage. We approach membership privacy from a novel angle by studying the information leakage of individual data entries in a database. We begin by deriving a data-processing inequality and a composition lemma for the pointwise conditional maximal leakage, and then apply them to the problem of studying the entrywise information leakage in PATE.
ii) Proving the privacy-accuracy synergy in PATE. We show that the entrywise information leakage of the aggregation mechanism in PATE (i.e., the Report-Noisy-Max mechanism) is Schur-concave [15,16] when the injected noise has a log-concave [17,18] probability density. As we will see, this implies that increased consensus among teachers lowers the privacy cost of labelling a query. Note that many commonly used probability distributions including the Laplace and Gaussian distributions are log-concave rendering this result fairly general.
iii) Deriving membership privacy guarantees for PATE with Laplace noise. We derive upper bounds on the entrywise information leakage when the noise injected in the aggregation mechanism has Laplace distribution.

Other Related Work
Information leakage metrics. In recent years, a large body of work has been dedicated to studying various information-theoretic privacy metrics. Most notably, mutual information has been frequently proposed and studied as such a metric (see e.g., [19][20][21]) by appealing to its operational meaning in communication theory. Similarly, in [22] another information-theoretic quantity, namely the total variation distance, is studied as a privacy metric in an information disclosure scenario. More closely related to our approach, several information leakage metrics have recently emerged that aim to capture the inference abilities of an adversary trying to guess a secret. For instance, [23] proposes to use the probability of correctly guessing the secret as a privacy metric. In [24] a class of tunable loss functions is introduced to capture a range of adversarial objectives, e.g., refining a belief or guessing the most likely value for the secret. Other methods include posing the privacy problem as a hypothesis test, e.g., in [25]. It is worth mentioning that the majority of the proposed privacy metrics have no clear operational meaning, which limits their applicability. A systematic survey of privacy metrics is provided in [26].
Privacy-preserving machine learning. Several centralized and decentralized solutions have been proposed in the literature that provide privacy guarantees in terms of differential privacy. To give a few examples, [27] proposes a collaborative framework for privacy-preserving deep learning where the guarantees of differential privacy are obtained by perturbing the gradients. Another example is [28], where the privacy analysis of gradient perturbation is improved by introducing the moments accountant framework. Other methods include privacy-preserving logistic regression [29,30], support vector machines [31] and empirical risk minimization [32,33].

Outline of the Paper
The rest of the paper is organized as follows: in Section 2 we will review the definition of maximal leakage and give a short summary of the operation of the PATE framework. In Section 3 we will present the definition of pointwise conditional maximal leakage, and state a few of its key properties. In Section 4 we will present our privacy analysis of the framework and state our results. Section 5 concludes the paper.

Background
Throughout this work, upper-case letters represent discrete random variables, upper-case calligraphic letters represent their corresponding alphabets, and lower-case letters represent the elements of the alphabets. Furthermore, ⟦1, n⟧ := {1, . . . , n} denotes the set of integers between one and n, | · | denotes the cardinality of a set, and log(·) denotes the natural logarithm. Finally, all sets considered in this paper are assumed to be finite.
We begin by reviewing a few key concepts.

Maximal Leakage
Let X be a random variable representing the data containing sensitive information, and Y be the publicly observed output of a probability kernel P Y |X with input X. Suppose that an adversary observes Y and wishes to guess an arbitrary discrete function of X, denoted by U .
Definition 1 (Maximal leakage [11]) Suppose P_XY is a joint distribution defined on the alphabets X and Y. The maximal leakage from X to Y is defined as

L(X → Y) := sup_{U : U − X − Y} log ( P{U = Û(Y)} / max_u P_U(u) ),   (1)

where the supremum is over all discrete U satisfying the Markov chain U − X − Y, and Û is the optimal (i.e., MAP) estimator of U given Y, taking values from the same alphabet as U.
Maximal leakage quantifies the maximal gain in the adversary's ability to correctly guess U after observing Y (compared to correctly guessing U with no observations). It is shown in [11, Theorem 1] that for finite alphabets X and Y, (1) simplifies to

L(X → Y) = log ∑_{y∈Y} max_{x∈X : P_X(x)>0} P_{Y|X}(y | x).   (2)
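The simplified expression from [11, Theorem 1], the log of the sum over outputs of the largest channel probability, is easy to evaluate for a finite channel. The following sketch is our own illustration (the function name is hypothetical), assuming the channel is given as a row-stochastic matrix and P_X has full support:

```python
import math

def maximal_leakage(P_Y_given_X):
    """L(X -> Y) = log sum_y max_x P(y|x) for a channel given as rows
    P_Y_given_X[x][y]; assumes a full-support prior on X."""
    num_y = len(P_Y_given_X[0])
    return math.log(sum(max(row[y] for row in P_Y_given_X) for y in range(num_y)))

# A noiseless channel leaks everything: log |X|.
identity = [[1.0, 0.0], [0.0, 1.0]]
assert abs(maximal_leakage(identity) - math.log(2)) < 1e-12

# A channel whose output ignores the input leaks nothing.
constant = [[0.5, 0.5], [0.5, 0.5]]
assert abs(maximal_leakage(constant)) < 1e-12
```

The two extreme channels recover the intuitive endpoints: full leakage log |X| and zero leakage.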

The PATE Framework
PATE [13,14] is a general framework for privacy-preserving classification of sensitive data. It operates by transferring the knowledge of an ensemble of classifiers, called teachers, trained on (disjoint) partitions of the sensitive data to a student classifier. More specifically, the PATE framework consists of the following three main components: Teacher models. A teacher is a classification model trained on one of the disjoint partitions of the training data, and can use any classification algorithm suited for the task. At inference, each teacher predicts a label independently of the others, which we will refer to as that teacher's vote. Thus, partitioning data into L sets (and correspondingly training L teachers) produces L primary votes for predicting the label of any new data point.
Aggregation mechanism. To predict the label of a new data point, the aggregation mechanism (i.e., the Report-Noisy-Max mechanism [7]) constructs the histogram of teachers' votes, adds calibrated noise to each of the bins, and outputs the class label with the maximum noisy vote as the final aggregate prediction. Note that the overall privacy guarantees of the framework result from the addition of noise in the aggregation mechanism.
Student model. The student model is trained using a public unlabelled dataset which will be labelled by the teachers' ensemble through the aggregation mechanism. Note that to limit the privacy cost of the overall system, the student must be trained with as few queries to the teachers as possible.
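The aggregation step described above can be sketched in a few lines. The code below is our own minimal illustration of Report-Noisy-Max with Laplace noise (function names and parameter values are hypothetical; the real PATE implementations differ in detail):

```python
import math
import random

def sample_laplace(rng, scale):
    """Inverse-CDF sampling from the Laplace distribution Lap(0, scale)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_max_label(votes, gamma, rng):
    """Report-Noisy-Max: add i.i.d. Lap(1/gamma) noise to each bin of the
    teachers' vote histogram and return the class with the largest noisy count."""
    noisy = [v + sample_laplace(rng, 1.0 / gamma) for v in votes]
    return max(range(len(votes)), key=noisy.__getitem__)

rng = random.Random(0)
votes = [48, 1, 1]  # strong consensus among 50 teachers on class 0
label = noisy_max_label(votes, gamma=1.0, rng=rng)
assert label == 0   # with this margin and seed, the noise does not flip the argmax
```

When the vote margin is large relative to the noise scale 1/γ, the returned label almost surely equals the plurality vote, which is exactly the regime where, as shown later, the privacy cost is low.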

Pointwise conditional maximal leakage
In this section, we introduce the notion of pointwise conditional maximal leakage, and state two of its important properties. Recall that maximal leakage is defined in a setup where an adversary wishes to guess an arbitrary discrete function U of the private input data X by observing the output Y . Here, we consider the case where the adversary has some a priori knowledge about X. We model this a priori knowledge as the outcome of a random variable, and accordingly define a conditional form of maximal leakage. Consider an adversary that knows the outcome of a random variable Z.
Definition 2 (Pointwise conditional maximal leakage) Suppose P_XYZ is a joint distribution defined on the alphabets X, Y and Z, and that the value of the random variable Z is a priori given as z ∈ Z. The pointwise conditional maximal leakage from X to Y given Z = z is defined as

L(X → Y | Z = z) := sup_{U : U − (X,Z) − Y} log ( P{U = Û(Y, z)} / P{U = Ũ(z)} ),   (3)

where Û is the optimal estimator of U given Y and Z = z, and Ũ is the optimal estimator of U given only Z = z.
Proposition 3 For finite alphabets X, Y and Z, the pointwise conditional maximal leakage can be expressed as

L(X → Y | Z = z) = log ∑_{y∈Y} max_{x∈X : P_{X|Z}(x|z)>0} P_{Y|XZ}(y | x, z).   (4)

The proof is given in Appendix A.1. Our definition of conditional maximal leakage differs slightly from the one proposed in [11]. In [11] the leakage is conditioned on the random variable Z itself, which translates into a maximization over the outcomes of Z in (4). We, on the other hand, condition the leakage directly on the outcomes of Z, since we are interested in characterizing the leakage for all outcomes, not just the one with the highest leakage.
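The closed form of Proposition 3 is a sum over outputs of the largest channel probability among the input values still possible given Z = z. The sketch below is our own illustration (names are hypothetical), assuming the channel at the given z is supplied as a matrix:

```python
import math

def pointwise_cond_max_leakage(P_Y_given_XZ, support_x):
    """L(X -> Y | Z = z) = log sum_y max over x in supp(P_{X|Z=z}) of P(y|x,z),
    where P_Y_given_XZ[x][y] are the channel rows at the fixed z."""
    num_y = len(P_Y_given_XZ[0])
    return math.log(sum(max(P_Y_given_XZ[x][y] for x in support_x)
                        for y in range(num_y)))

channel = [[0.9, 0.1], [0.2, 0.8]]
# If Z = z rules out all but one value of X, nothing is left to leak.
assert abs(pointwise_cond_max_leakage(channel, support_x=[0])) < 1e-12
# With both values of X possible, the leakage is log(0.9 + 0.8).
assert abs(pointwise_cond_max_leakage(channel, [0, 1]) - math.log(1.7)) < 1e-12
```

The first assertion mirrors the intuition behind the entrywise setup: conditioning on everything the adversary already knows leaves only the unknown entry's contribution to leak.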
Similarly to [11], we now state two important properties of the pointwise conditional maximal leakage: a data-processing inequality and a composition lemma. These properties will be used in the next section to analyze the entrywise information leakage of the PATE framework.

Lemma 5 (Composition) Suppose the Markov chain Y_1 − (X, Z) − Y_2 holds. Then, L(X → (Y_1, Y_2) | Z = z) ≤ L(X → Y_1 | Z = z) + L(X → Y_2 | Z = z).

Lemma 6 (Data-processing inequality) Suppose the Markov chain X − Y − W holds given Z = z. Then, L(X → W | Z = z) ≤ min { L(X → Y | Z = z), L(Y → W | Z = z) }.

Figure 1 (adapted from [13]): each partition of the sensitive training data is used to train a teacher. A student model is then trained using a public dataset labelled by the noise-perturbed predictions of the teachers. An adversary who knows all the data entries except for D* tries to guess D* by observing the teachers' responses to queries made by the student.
Lemma 5 states that the information leaked to multiple independent observations is upper bounded by the sum of the information leaked through each of the observations.
Lemma 6 states that all processing of the output can only decrease the information leakage. Further, it allows us to upper bound the end-to-end leakage of a complex mechanism in terms of the leakages of its smaller intermediate mechanisms.
The proofs of Lemma 5 and Lemma 6 are given in Appendix A.2 and A.3, respectively.

Information leakage analysis of PATE
In this section we will use the pointwise conditional maximal leakage to measure the information leaking about individual data entries in the PATE framework. We begin by describing our system model in Section 4.1. Then, in Section 4.2 we first prove that increased consensus among teachers in answering queries induces a lower privacy cost (i.e., the privacy-accuracy synergy), and then state bounds on the entrywise leakage when the noise is Laplace distributed.

System Model
Suppose d = ((x_1, y_1), . . . , (x_n, y_n)) ∈ (X × Y)^n represents the training data, where X is an arbitrary but finite domain set and Y = ⟦1, m⟧ is the label set. The pairs (x_i, y_i) are sampled independently according to some distribution P over X × Y, i.e., D ∼ P^n. We use the training data d to train L teachers for a classification task with m ≥ 2 classes in the PATE framework. Let (d^(1), . . . , d^(L)) represent a disjoint partitioning of the training set, i.e., ∪_{l=1}^L d^(l) = d and d^(l) ∩ d^(l′) = ∅ for all l ≠ l′. This results in a total of L teacher models, classifying queries independently of each other.
The student model is trained using a public and unlabelled dataset, which will be labelled by the teachers' ensemble in a privacy-preserving manner. Let (x_1, . . . , x_k) ∈ X^k be the independently sampled unlabelled dataset, and suppose that the student queries the ensemble about the label of x_i. Each teacher separately predicts a label for x_i, referred to as a vote.
Let v(x_i) = (v_1(x_i), . . . , v_m(x_i)) be the histogram of the teachers' votes, where v_j(x_i) = |{l ∈ ⟦1, L⟧ : f_l(x_i) = j}| is the number of teachers who classified x_i as belonging to class j. Note that ∑_{j=1}^m v_j(x_i) = L. The aggregation mechanism in PATE is essentially the Report-Noisy-Max mechanism [7], which operates by adding i.i.d. noise samples to the bins of the vote histogram and returning the class label with the highest (noisy) value. Let Lap(b) denote the Laplace distribution with location 0 and scale b, and suppose N = (N_1, . . . , N_m) is a sequence of i.i.d. Laplace random variables, where N_j ∼ Lap(1/γ) for all j ∈ ⟦1, m⟧ represents the noise added to the jth bin. Note that γ determines the dispersion of the noise, and thus affects the privacy guarantees of the system: roughly speaking, smaller values of γ correspond to larger noise, and in turn, stronger privacy guarantees. Finally, let Y_i = arg max_j {v_j(x_i) + N_j} be the random variable denoting the predicted label for x_i returned by the aggregation mechanism. Labelling the entire dataset (x_1, . . . , x_k) produces k such predictions, each of which entails a privacy cost. The system model is depicted in Figure 1.

Measuring the Entrywise Information Leakage
Now, we will measure the information leaking about individual data entries in the training set using the notion of pointwise conditional maximal leakage. In order to be able to quantify the entrywise leakage, let us consider the following scenario: assume an adversary knows the values of all the entries in the training set except for a single entry denoted by D * = (X * , Y * ). The adversary tries to guess the value of D * (or any arbitrary discrete function of it) by observing the queries made by the student and their corresponding labels returned by the aggregation mechanism. Clearly, in this setup, observations leak information only about the unknown entry D * since the adversary already knows all the other entries. Now, assume that the adversary has perfect knowledge of the algorithms used to train each teacher, and that the training is done deterministically. That is, we will assume that all classification algorithms and the resulting teacher models (i.e., predictors) are deterministic. Intuitively, this assumption describes the least private scenario where the training leaks a lot of information about D * , and the overall privacy guarantees stem only from the aggregation mechanism. As we are studying the worst-case leakage, our analysis remains valid for all PATE structures regardless of how the teachers are trained, or what classification algorithms are used.
It follows naturally from the previous assumptions that, in principle, the adversary knows all the votes except for the vote of the teacher whose training partition includes D*. Note that we are considering a general setup in which any single data entry can arbitrarily affect the vote of its teacher, resulting in observations which are highly informative for inferring the data entry of interest. Conversely, if the adversary could already predict the last vote, there would be no information left to be leaked.
Based on the scenario we described, let D− = D \ D* be the random variable representing the portion of the training set which is known to the adversary, and let V−(x_i) = (V−_1(x_i), . . . , V−_m(x_i)) be the random variable representing the histogram of the known votes for input x_i, where V(x_i) denotes the full vote histogram (the known votes plus the vote of the teacher trained on D*). Note that ∑_{j=1}^m V−_j(x_i) = L − 1 for all x_i ∈ X. For simplicity, let Y = (Y_1, . . . , Y_k) denote the sequence of random variables representing the predicted labels for the queries (x_1, . . . , x_k). We are interested in quantifying the information leaking about D* to Y assuming that the adversary knows D− = d−, that is,

L(D → Y | D− = d−) (a)= L(D* → Y | D− = d−),

where (a) follows from (5) since the Markov chain D− − D − Y clearly holds. Using Lemma 5, we can upper bound the information leaked through multiple queries by writing

L(D* → Y | D− = d−) ≤ ∑_{i=1}^k L(D* → Y_i | D− = d−),

that is, the information leaked to the output of multiple queries is upper bounded by the sum of the information leaked through individual queries. Further, using Lemma 6, we can upper bound the information leaked to the output of a single query as

L(D* → Y_i | D− = d−) ≤ min { L(D* → V(x_i) | D− = d−), L(V(x_i) → Y_i | D− = d−) },

i.e., the information leaked to the output of a single query is upper bounded by the smallest of the information leaked through the training and the information leaked through the aggregation mechanism.
As we can make no assumptions about the privacy of the training process, we now turn to evaluating the information leaked through the aggregation mechanism. Let δ_j = (0, . . . , 0, 1, 0, . . . , 0) be a sequence with all components equal to 0, except for the jth component, which equals 1. We will use δ_j to represent a single vote for class j. Then, we have

L(V(x_i) → Y_i | D− = d−) (a)= log ∑_{j=1}^m max_v P{Y_i = j | V(x_i) = v} (b)= log ∑_{j=1}^m P{ arg max_l { v−_l + (δ_j)_l + N_l } = j },   (11)

where V(x_i) is the full vote histogram, (a) follows from (5), and (b) follows from the fact that the probability of outputting class j is maximized when the last vote (i.e., the vote of the teacher whose training partition includes D*) is placed for class j.
We will now evaluate (11) using the ideas of majorization theory [15,16], and assuming that the noise in the aggregation mechanism has a log-concave [17,18] probability density. Specifically, we will find the v − maximizing (or minimizing) the information leakage of the aggregation mechanism for any noise with log-concave probability density.
Definition 8 (Schur-concave function) Consider a real-valued function Φ defined on I^n ⊂ ℝ^n. Φ is said to be Schur-concave on I^n if p ≻ q on I^n implies Φ(p) ≤ Φ(q), where ≻ denotes the majorization order.
Definition 9 (Log-concave function) A non-negative function f : ℝ^n → ℝ_+ is said to be log-concave if it can be written as f(x) = exp(φ(x)) for some concave function φ : ℝ^n → [−∞, ∞).
Note that many commonly used probability density functions (and their corresponding CDFs) are log-concave, such as the Laplace and the Gaussian distributions [17].
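For intuition about Definition 8 (our own illustration, not from the paper), Shannon entropy is a classical Schur-concave function: a vote histogram that majorizes another, i.e., is more "concentrated", yields a smaller value. The helper names below are hypothetical:

```python
import math

def majorizes(p, q):
    """True if p majorizes q: equal totals, and the partial sums of the
    descending-sorted entries of p dominate those of q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    if abs(sum(p) - sum(q)) > 1e-12:
        return False
    partial_p = partial_q = 0.0
    for a, b in zip(ps, qs):
        partial_p += a
        partial_q += b
        if partial_p < partial_q - 1e-12:
            return False
    return True

def entropy(p):
    """Shannon entropy in nats, a standard Schur-concave function."""
    return -sum(x * math.log(x) for x in p if x > 0)

concentrated = [0.9, 0.05, 0.05]   # strong "consensus"
uniform = [1 / 3, 1 / 3, 1 / 3]
assert majorizes(concentrated, uniform)
# Definition 8: p majorizing q implies Phi(p) <= Phi(q) for Schur-concave Phi.
assert entropy(concentrated) <= entropy(uniform)
```

The same ordering is what Theorem 10 establishes for the leakage of the aggregation mechanism: more concentrated vote histograms sit higher in the majorization order and incur lower leakage.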
Theorem 10 Consider the aggregation mechanism in PATE (i.e., the Report-Noisy-Max mechanism) where the noise has a log-concave probability density. Then, the entrywise information leakage of the aggregation mechanism in (11) is Schur-concave in v−, and is minimized when v−_j = (L − 1) · 1{j = j*} for each j ∈ ⟦1, m⟧ and some class j*. The proof is given in Appendix B.

Remark 11
The Schur-concavity of the entrywise information leakage of the aggregation mechanism implies that stronger consensus among teachers lowers the amount of information leaked about any individual data entry.
The preceding remark points to one of the main advantages of the PATE framework: increased accuracy of the teacher models results in stronger consensus in predicting the label of a given query, which, in turn, results in stronger privacy guarantees. Note that [13,14] come to the same conclusions regarding the synergy between privacy and accuracy for the case of Laplace and Gaussian noise distributions, whereas here we have given a systematic proof of this desired property and generalized it to the class of log-concave probability densities.
Applying Theorem 10 to (11) yields the following result for the leakage of the aggregation mechanism with Laplace noise.

Proposition 12
Consider the PATE framework where noise with Laplace distribution is used in the aggregation mechanism. Given the votes of all teachers except for one, the information leaked to the output of a single query is upper bounded by a quantity k(m), where H(0) := γ, and the bound is attained when v−_j = (L − 1)/m for each j ∈ ⟦1, m⟧. The proof of this result is given in Appendix C.1. Proposition 12 describes a bound that depends on the number of classes m. It can be verified through simple calculations that the bound is non-decreasing in m. Therefore, by letting m tend to infinity, we obtain the following simple bound describing the worst-case entrywise information leakage.
Theorem 13 Consider the PATE framework where noise with Laplace distribution is used in the aggregation mechanism, and suppose that all entries in the training set are known to an adversary except for a single entry D*. The information leaked about D* as a result of labelling a single query is bounded by lim_{m→∞} k(m). The proof of this result is given in Appendix C.2. Note that while Theorem 13 provides a simple and general upper bound on the entrywise information leakage, tighter bounds can be obtained by directly calculating the leakage of the aggregation mechanism in (11) using the conditional probabilities. This is demonstrated in the following example.
which is a tighter bound than the one provided by Theorem 13. Note that due to the Schur-concavity of the leakage, it was already expected that the information leakage would be largest for (4, 3, 2, 1), and it would have sufficed to consider only this case.
We conclude the paper by summarizing our results in the following corollary.
This result is a direct consequence of Theorem 13 and Lemma 5, and characterizes the overall information leaked about a single data entry as a result of training a student classifier using k queries to the teachers.

Conclusions
In this paper, we proposed an approach based on information leakage for quantifying membership privacy. In particular, we showed that the pointwise conditional maximal leakage, a conditional form of maximal leakage, can be used to measure the information leaking about individual data entries in a dataset. We applied our privacy analysis to PATE and derived novel privacy guarantees for this privacy-preserving classification framework in the form of upper bounds on its entrywise information leakage when the injected noise has Laplace distribution. We also showed that the privacy-accuracy synergy of PATE can be explained by studying the entrywise information leakage of the framework, whereas previously it was only intuitively justified through the lens of differential privacy.
As our work has taken a step towards gaining a deeper understanding of some underlying privacy principles, our results can be used in the design of machine learning algorithms that preserve both privacy and utility. For example, we envision that based on the precise characterization of the leakage presented in our work one can employ data-dependent perturbation, i.e., noise that scales based on the privacy cost entailed by each query. This reduces the overall amount of noise, which, in turn, will improve the utility of the system. Another potential application is in privacy thresholding schemes where queries which are expensive in terms of privacy will not be answered at all. Once again this method will improve both the privacy and the utility of the system since the expensive queries are precisely those which were not labelled with certainty by the teachers.
Appendix A Proofs of the results in section 3

A.1 Proof of Proposition 3
This result follows readily from [11, Theorem 1] by considering L(X → Y ) such that P X = P X|Z=z . Nevertheless, we provide an alternative proof.
Upper bound: First, we prove the upper bound on L(X → Y | Z = z). Consider any discrete U satisfying U − (X, Z) − Y, and define L_U(X → Y | Z = z) as in (19), where Û and Ũ are the MAP estimators of U. For each z ∈ Z, define U_z := {u : P_{U|Z}(u | z) > 0}. Then,

L_U(X → Y | Z = z) ≤ log ∑_{y∈Y} max_{x : P_{X|Z}(x|z)>0} P_{Y|XZ}(y | x, z)

for all U such that U − (X, Z) − Y holds.

Lower bound: To prove the lower bound on L(X → Y | Z = z), we exhibit a discrete U for which L_U(X → Y | Z = z) attains the bound. We fix a U such that U − (X, Z) − Y holds and H(X | U) = 0, that is, the value of X is completely determined by the value of U. Further, we assume that U | Z = z is uniformly distributed, i.e., P_{U|Z}(u | z) = 1/|U_z| for all z ∈ Z and u ∈ U_z. Hence, from (23) and (25) it follows that the upper and lower bounds coincide, which proves the proposition.

A.2 Proof of Lemma 5

Consider the Markov chain Y_1 − (X, Z) − Y_2. Applying the closed form of Proposition 3 to the pair (Y_1, Y_2) and factoring the conditional distribution along the Markov chain yields

L(X → (Y_1, Y_2) | Z = z) ≤ L(X → Y_1 | Z = z) + L(X → Y_2 | Z = z).

A.3 Proof of Lemma 6
Our proof follows the same reasoning as the proof of [11, Lemma 1]. For all discrete U satisfying the Markov chain U − (X, Z) − Y, where L_U is defined in (19), the optimal estimate of U from W can be no better than the optimal estimate from Y. Therefore, L(X → W | Z = z) ≤ L(X → Y | Z = z). Similarly, since W depends on X only through Y, we have L(X → W | Z = z) ≤ L(Y → W | Z = z). Thus, L(X → W | Z = z) ≤ min { L(X → Y | Z = z), L(Y → W | Z = z) }.

Appendix B Proof of Theorem 10
Before stating the proof, let us recall some concepts/results from majorization theory.
Definition 16 (Symmetric function) Let x = (x 1 , . . . , x n ) ∈ I n ⊂ R n and consider a real-valued function Φ : I n → R. The function Φ(x) is said to be symmetric if x can be arbitrarily permuted without changing the value of Φ(x).
Lemma 17 (Schur's condition) Let x = (x_1, . . . , x_n) ∈ I^n ⊂ ℝ^n and consider a continuously differentiable function Φ : I^n → ℝ. Φ(x) is Schur-concave on I^n if and only if it is symmetric on I^n and (x_i − x_j)(∂Φ/∂x_i − ∂Φ/∂x_j) ≤ 0 for all i, j. Since Φ(x) must be symmetric, it is sufficient to verify the reduced condition (x_1 − x_2)(∂Φ/∂x_1 − ∂Φ/∂x_2) ≤ 0.

Proposition 18 ([16, Theorem 2.21]) Let x = (x_1, . . . , x_n) ∈ ℝ^n_+ and let f : ℝ^n_+ → ℝ_+ be a Schur-concave function. Consider the problems of maximizing and minimizing f(x) subject to ∑_{i=1}^n x_i = S. Then, the global maximum is achieved by x_max = (S/n, . . . , S/n), and the global minimum is achieved by x_min = (0, . . . , 0, S, 0, . . . , 0).
We now prove that the entrywise information leakage of the aggregation mechanism is Schur-concave when the injected noise has a log-concave probability density. In order to simplify the proof, we will assume that the elements of v − (i.e., the histogram of known votes) can take non-negative real values. The results of the proof, however, will be readily applicable to histograms of non-negative integers.
Using (11), we define

f(v−) := ∑_{j=1}^m P{ arg max_l { v−_l + (δ_j)_l + N_l } = j },   (40)

where δ_j = (0, . . . , 0, 1, 0, . . . , 0) represents a single vote for class j, so that the leakage equals log f(v−). It is clear from (40) that the leakage does not depend on the order of the elements in v−; hence f is symmetric. Moreover, according to [15, 3.B.1], the composition of an increasing function with a Schur-concave function remains Schur-concave. Since log(·) is an increasing function, to prove the Schur-concavity of the entrywise leakage we only need to verify Schur's condition for f(v−). Let N = (N_1, . . . , N_m) denote the tuple of noise variables, where the elements are independent, identically distributed, and have a log-concave probability density. We now write out Schur's condition for f.

Without loss of generality, assume that v−_1 ≥ v−_2 ≥ · · · ≥ v−_m.
We now show that both A_1 − A_2 and B_{(1,j)} − B_{(2,j)} + B_{(3,j)} − B_{(4,j)} are non-positive. However, let us first recall some properties of log-concave functions.
Proposition 19 ([18, Lemma 1]) Consider g : ℝ → ℝ_+ and suppose that {x : g(x) > 0} = (a, b). Then, g(x) is log-concave if and only if for all a < x_1 ≤ x_2 < b and all δ ≥ 0 it holds that g(x_1 + δ) g(x_2) ≥ g(x_1) g(x_2 + δ).

Proposition 20 ([17, Remark 2]) Suppose g : ℝ → ℝ_+ is a continuously differentiable function and let {x : g(x) > 0} = (a, b). Then, g(x) is log-concave if and only if g′(x)/g(x) is a non-increasing function of x on (a, b).
We now apply Proposition 19 to the preceding equation by noting that u ≥ u + v−_2 − v−_1 (due to the non-increasing order of the elements in v−), which establishes that Schur's condition holds and completes the proof.

C.1 Proof of Proposition 12

Since N_l ∼ Lap(1/γ), the Laplace CDF gives

P{N_l < t + 1} = (1/2) e^{γ(t+1)} for t ≤ −1, and 1 − (1/2) e^{−γ(t+1)} for t ≥ −1.   (61)

Thus, it is straightforward to calculate the integrals A and C, and in particular H(0) = γ. Note that H(m) is non-negative and monotonically decreasing in m. Combining these facts yields the stated bound.

C.2 Proof of Theorem 13
In order to prove the bound, we will show that k(m) is concave and non-decreasing in m. Since m is an integer, we check the second-order difference of the leakage with respect to m. The first-order difference is ∆k(m) = k(m + 1) − k(m), where (a) follows from the fact that H(m) is non-negative. Thus, we have shown that k(m) is concave in m. Furthermore, it is straightforward to verify that (70) holds. Hence, letting m tend to infinity, we finally obtain the stated bound.