Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning

Federated Learning is expected to provide strong privacy guarantees, as only gradients or model parameters, but no plain text training data, are ever exchanged, either between the clients or between the clients and the central server. In this paper, we challenge this claim by introducing a simple yet very effective membership inference attack algorithm, which relies on a single training step. In contrast to the popular honest-but-curious model, we investigate a framework with a dishonest central server. Our strategy is applicable to models with ReLU activations and uses the properties of this activation function to achieve perfect accuracy. Empirical evaluation on visual classification tasks with the MNIST, CIFAR10, CIFAR100, and CelebA datasets shows that our method provides perfect accuracy in identifying one sample in a training set with thousands of samples. Occasional failures of our method led us to discover duplicate images in the CIFAR100 and CelebA datasets.


I. INTRODUCTION
When operating on user data, a fundamental advantage of Federated Learning (FL), compared to centralized learning, is the privacy gain experienced by the user. No plain text training data, but only gradients or model parameters, are exchanged with a Central Server (CS). However, a dishonest CS can still mount attacks against its users. In this work, we study the privacy risk for a participant in an FL system that results from a dishonest CS. This scenario is equally applicable to an active man-in-the-middle attacker between the client and the CS.
To motivate the study of this particular membership inference attack problem, we provide a fictitious, but plausible scenario where such an attack could be successfully mounted: The popular mobile phone manufacturer A wants to improve the operating system of their phones by training superior image recognition software, allowing their customers to easily categorize snapshots. In order to do this, company A employs FL and obtains the consent of (most of) its users for training on the customer's device using their local photo library. Thus, A sends a parameter vector θ of a neural network to each phone. Without user interaction, the mobile phone trains the neural network on the local photo library and communicates the result to A. Crucially, the images never leave the device, providing customers with a sense of privacy.
Some time later, in a kidnapping case, a photo of the victim is sent to the police. Suspecting that the photo may have been taken with a device produced by A, the police approach A, asking if it is possible to determine which user has this particular image on their mobile phone. In this paper, we show how A can craft a special set of parameters θ in order to accomplish the police's request. Letting each user train on θ for one iteration, A can deduce, by evaluating the answers, which user has this particular image in their photo library. The impact on the FL system is negligible, as only one iteration of training is needed, but A achieves perfectly accurate membership inference, as we will show in this work.

A. Privacy in FL
In each FL iteration, a subset of clients is selected. Every client in this subset is then queried and the results are stored. Before the next iteration begins, an aggregation function aggregate is used to obtain the next parameter vector. The data returned by the clients, as well as the aggregation function, vary depending on the FL algorithm. E.g., in Federated Stochastic Gradient Descent (FedSGD) [1], clients compute the (average) gradient on a local mini-batch; the CS takes the arithmetic average of these gradients and performs a gradient descent step. Similarly, in Federated Averaging (FedAvg) [2], clients train on several batches, possibly for multiple epochs, returning the full parameter vector. The CS uses the arithmetic average of these parameter vectors for the next iteration.
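To make the two aggregation rules concrete, the following minimal sketch shows one server round of each; the client callables, the sampling, and the learning rate are illustrative placeholders, not the interface of any particular FL framework.

```python
import random
import numpy as np

def fedsgd_round(theta, clients, lr=0.1, num_selected=10):
    """One FedSGD round: selected clients return mini-batch gradients,
    which the server averages before taking a gradient descent step."""
    selected = random.sample(clients, num_selected)
    gradients = [client(theta) for client in selected]  # phi = client(theta)
    return theta - lr * np.mean(gradients, axis=0)

def fedavg_round(theta, clients, num_selected=10):
    """One FedAvg round: selected clients return full parameter vectors
    after local training, which the server averages for the next iterate."""
    selected = random.sample(clients, num_selected)
    responses = [client(theta) for client in selected]
    return np.mean(responses, axis=0)
```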
We are interested in the privacy aspect of this setup from the client's point of view. In particular, we want to study the membership inference problem, where the CS wants to learn whether a particular training sample is present in the training data of a particular client. From Algorithm 1 it is clear that the only interaction between a particular client and the CS is through the function φ = client(θ), where the CS provides a parameter vector θ and retrieves the answer φ. Thus, we will focus solely on this interaction between the CS and one particular client.
Moreover, we perform our attack using only one query, i.e., the CS determines membership based only on the output φ = client(θ) for one input θ.

B. Previous Work
Privacy of FL has been an active area of research since the field's inception. We focus on privacy at training time and, in particular, on the task of membership inference conducted by the CS. Most literature either assumes a client-side attacker [3], [4] or assumes that a malicious CS behaves honest-but-curious [5]-[7]. Much work has also been devoted to reconstructing input from gradients [4], [8], [9]. This problem is rather trivial if the gradient at a single input is available, but becomes challenging when a whole batch is processed, which is the typical scenario in FedSGD, where each client reports its gradient computed on one mini-batch to the CS.
As far as the authors are aware, an active attacker is only mentioned in [5] and actively studied in [10], [11]. However, both works are aimed at reconstructing client data; no attempt is made to study the membership inference task.
In [11], an active, global adversary is considered. However, the attacker at the CS in [11] requires additional training for several iterations and a significant amount of additional training data. This is far more than what is afforded to the attacker in this work, where the CS may query the client only once and, apart from the one sample to be tested, does not have any additional data.
Additionally, there is a vast literature on the privacy of machine learning models in general [12]-[14]. However, these works assume a fully trained model and tend to exploit overfitting to be able to perform membership inference. The scenario considered here is different in that the attack is performed during a single iteration.
A different line of research considers defense mechanisms such as encrypted parameter updates and homomorphic encryption schemes [15]- [17].

II. OUR METHOD

A. Setup and Attacker Model
Our method relies crucially on the following assumptions.
1) The model f(x; θ) uses ReLU activations and the structure depicted in Fig. 1 can be embedded in f.
Fig. 1. Neural network for the identification of M values. Blue (hidden) nodes have ReLU activations. The weights for the neurons in the first hidden layer are ±1, while their biases are ∓η_m. The weights for the neuron in the second hidden layer are −1 and its bias is ε.
2) The client uses a form of stochastic gradient descent, e.g., SGD or Adam [18], to minimize a loss function.
3) The function client(), i.e., all training hyperparameters (batch size, optimizer, . . . ), is known to the CS.

While certainly adaptable to other settings, we will detail and showcase our strategy for FedAvg in the task of image classification. Thus, we will additionally assume the following.
4) The dataset D = {(x_n, y_n)}_{n=1,...,N} consists of images x_n and discrete labels y_n ∈ {1, . . ., L}.
5) The model outputs logits and we use cross-entropy with softmax, i.e., the client attempts to minimize the loss function at the training samples D,

L(θ) = −(1/N) Σ_{n=1}^{N} log [ exp(f_{y_n}(x_n; θ)) / Σ_{l=1}^{L} exp(f_l(x_n; θ)) ].   (1)

6) FedAvg is used, i.e., the response φ to the CS is the full parameter vector after training. The client trains for E epochs using J mini-batches with batch size B = N/J.
7) The structure depicted in Fig. 1 can be embedded in the last layers of f, obtaining b as a logit in the output.

The CS is given the target sample s = (x_t, y_t), where t ∈ {0, 1}. Note that we have s ∈ D if t = 1, while for t = 0, s = (x_0, y_0) is another sample from the dataset, not present in D. Using s, the CS needs to produce a parameter vector θ, which is used by the client to compute the response φ = client(θ, D). This response is used by the CS to obtain an estimate T of t. If T = t, the CS correctly identified membership.
Remark II.1. Note that FedAvg is the more challenging task for an attacker, as FedSGD is essentially a special case of FedAvg. In order to see this, assume that FedAvg is used with E = 1 epoch and J = 1 mini-batch, i.e., B = N. Then the entire training set D consists of only one single mini-batch, and the difference θ − φ between the parameter vectors is simply the (scaled) gradient of (1) at D.
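This reduction is easy to verify numerically. The following sketch, assuming a plain SGD client with a toy linear model and an illustrative learning rate, checks that one local epoch on a single mini-batch returns a φ for which θ − φ equals the scaled gradient of (1).

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
lr = 0.1  # illustrative learning rate

# theta: parameters before local training.
theta = [p.detach().clone() for p in model.parameters()]
loss = torch.nn.functional.cross_entropy(model(x), y)  # the loss (1)
grads = torch.autograd.grad(loss, list(model.parameters()))

# FedAvg with E = 1 epoch and J = 1 mini-batch: one SGD step on all of D.
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= lr * g
phi = list(model.parameters())  # the client's response

# theta - phi is exactly the learning-rate-scaled gradient, i.e., FedSGD.
for t, f, g in zip(theta, phi, grads):
    assert torch.allclose(t - f, lr * g)
```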
Remark II.2. Note that the dishonest server model used here is equivalent to a man-in-the-middle attacker between the CS and the client with the capability to intercept and manipulate traffic.
Remark II.3. Our method crucially relies on Assumption 1, i.e., that it is possible to choose parameters θ of f(x; θ) such that the network depicted in Fig. 1 is realized inside f. For the sake of presentation, we assume that this embedding is possible in the final layers (Assumption 7), which simplifies the calculation of derivatives, but this is not strictly necessary for our method to be applicable.

B. The Strategy
Before we detail the construction of the model parameters θ that the CS uses to determine membership of the target sample s in D, we want to provide the main idea behind it. Essentially, we embed the structure shown in Fig. 1 in the client's model. Each hidden node of this structure has a ReLU activation function. It is designed such that the derivative of its output w.r.t. all parameters in this structure is zero, unless all the inputs a_j are close to η_j. By choosing η_j equal to the values a_j produced by s, we can use this property to test for membership: If s ∉ D, and no other sample randomly falls within this narrow interval, the initial parameters will not change during training.
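This vanishing-gradient property can be illustrated directly on the structure of Fig. 1. The sketch below, assuming illustrative values for M, the η_m, and ε, writes the structure as a function of the activations a and checks that the gradient w.r.t. ε is zero for inputs far from η, but nonzero when a = η.

```python
import torch

def gadget(a, eta, eps):
    # First hidden layer: weights +/-1 and biases -/+eta_m with ReLU, so
    # the pair of neurons for component m together computes |a_m - eta_m|.
    h = torch.relu(a - eta) + torch.relu(eta - a)
    # Second hidden layer: weights -1 and bias eps.
    return torch.relu(eps - h.sum())

eta = torch.tensor([0.5, -1.0, 2.0])  # activations produced by the target s

# Inputs far from eta: the output is 0 and the gradient w.r.t. eps vanishes
# (the same holds for all weights and biases of the hidden layers).
eps = torch.tensor(1e-3, requires_grad=True)
b = gadget(torch.zeros(3), eta, eps)
b.backward()
assert b.item() == 0.0 and eps.grad.item() == 0.0

# Inputs equal to eta (the target is processed): b = eps > 0 and the
# gradient w.r.t. eps is nonzero, so eps will change during training.
eps.grad = None
b = gadget(eta, eta, eps)
b.backward()
assert abs(b.item() - 1e-3) < 1e-12 and eps.grad.item() == 1.0
```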
The following lemma summarizes the central properties of the network in Fig. 1.
Lemma II.1. Given the network depicted in Fig. 1, we have

b = ReLU( ε − Σ_{m=1}^{M} |a_m − η_m| ).

Furthermore, for any parameter θ (weight or bias) in the hidden layers of the network depicted in Fig. 1, we have

∂b/∂θ = 0   whenever   Σ_{m=1}^{M} |a_m − η_m| > ε.

Given a target sample s = (x_t, y_t), the CS obtains θ as follows.
1) The model f consists of two parts f = f_1 ∘ f_0, where the structure in Fig. 1 is embedded in f_1 (cf. Assumption 7). Initialize the parameters θ_0 of f_0 randomly. In the parameter vector θ_1 of f_1, initialize all weights with zeros and all biases with −1.
2) Let the M largest (in absolute value) components of f_0(x_t) be (a_1, . . ., a_M) and let their values be (η_1, . . ., η_M).
3) Use these M components (a_1, . . ., a_M) as inputs to the model in Fig. 1 and set the corresponding weights and biases in θ_1. The output b corresponds to the y_t-th component of the output.

As all but the y_t-th output are zero, we use (1) to obtain the loss function, and by Lemma II.1 its derivative w.r.t. ε is nonzero only if the inputs a_1, . . ., a_M are close to η_1, . . ., η_M. Crucially, by Lemma II.1, the derivative of the loss w.r.t. all network parameters in the hidden layers of the network in Fig. 1 is zero if Σ_{m=1}^{M} |a_m − η_m| > ε. Thus, if all inputs x_n in the training set result in a_1, . . ., a_M that are not close to η_1, . . ., η_M, then these parameters remain unchanged during the entire training process. For the decision function, we measure the change of the parameter ε.
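A sketch of steps 1)-3) follows. The concrete shape of f_1 (one hidden layer of 2M gadget neurons, one second-layer gadget neuron, and a linear output layer) and the zero output-layer biases, which make all non-target logits exactly zero, are assumptions of this sketch rather than the only possible embedding.

```python
import torch

def craft_f1_parameters(f0, x_t, y_t, M=4, eps=1e-3, num_logits=10):
    """Craft theta_1 so that f_1 realizes the Fig. 1 gadget and outputs
    b = ReLU(eps - sum_m |a_m - eta_m|) as the y_t-th logit."""
    feats = f0(x_t).flatten()
    idx = feats.abs().topk(M).indices  # indices of the M largest components
    eta = feats[idx]                   # their values eta_1, ..., eta_M
    d = feats.numel()

    # Step 1: all weights zero; hidden biases -1 keep unused ReLUs inactive.
    W1, b1 = torch.zeros(2 * M, d), -torch.ones(2 * M)
    W2, b2 = torch.zeros(1, 2 * M), -torch.ones(1)
    W3, b3 = torch.zeros(num_logits, 1), torch.zeros(num_logits)

    # Steps 2-3: paired first-layer neurons compute ReLU(a_m - eta_m) and
    # ReLU(eta_m - a_m); their sum is |a_m - eta_m|.
    for m in range(M):
        W1[2 * m, idx[m]], b1[2 * m] = 1.0, -eta[m].item()
        W1[2 * m + 1, idx[m]], b1[2 * m + 1] = -1.0, eta[m].item()
    # Second hidden layer: weights -1 and bias eps.
    W2[0, :], b2[0] = -1.0, eps
    # Output layer: route b to the y_t-th logit; all other logits are zero.
    W3[y_t, 0] = 1.0
    return (W1, b1), (W2, b2), (W3, b3)
```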

C. Decision
After obtaining the parameter vector φ = client(θ, D) resulting from the training at the client, the CS needs to make a decision T. This is performed by a threshold test on the ε-component of φ, denoted ε̄.
The CS computes one iteration of the optimization procedure with the crafted parameters θ on the single sample s = (x_t, y_t), obtaining φ̂ = client(θ, {s}). This is possible by Assumption 3. The decision statistic is

∆ = B (ε̄ − ε) / (ε̂ − ε),   (9)

where ε̂ denotes the ε-component of φ̂ and B is the batch size. For a fixed threshold ξ, the CS decides for T = 1 if ∆ ≥ ξ and T = 0 otherwise.
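A minimal sketch of this decision rule, assuming the ε-components have already been extracted from the crafted parameters θ, the client's response φ, and the local reference φ̂:

```python
def decide(eps0, eps_bar, eps_hat, B, xi=0.1):
    """eps0: crafted value of epsilon; eps_bar: epsilon-component of the
    client's response phi; eps_hat: epsilon-component of client(theta, {s})
    computed locally by the CS; B: batch size; xi: decision threshold."""
    delta = B * (eps_bar - eps0) / (eps_hat - eps0)  # the statistic (9)
    return 1 if delta >= xi else 0  # T = 1 declares s a member of D
```

Intuitively, if s ∈ D, the change of ε at the client matches the locally computed change up to the mini-batch averaging, which is compensated by the factor B, so ∆ is bounded away from zero; if s ∉ D and no other sample activates the structure, ε remains unchanged and ∆ = 0. This is what makes a fixed threshold such as ξ = 0.1 effective.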

III. EXPERIMENTS
We train a simple network from the tutorial of the FL framework flower [19] on the MNIST, CIFAR10, CIFAR100, and CelebA datasets.
The simple network architecture is Lin ∘ (ReLU ∘ Lin)² ∘ (MaxPool ∘ ReLU ∘ Conv)², where we use the notation F² = F ∘ F to denote functional composition. Python code to reproduce our experiments can be found at https://github.com/g-pichler/dishonest_mia.
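For concreteness, a PyTorch sketch of such an architecture is given below; the layer sizes are those of the standard tutorial CNN and are meant as an illustration, not as an exact specification of the experimental model.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (MaxPool o ReLU o Conv) ...
        x = self.pool(F.relu(self.conv2(x)))  # ... applied twice
        x = x.flatten(1)
        x = F.relu(self.fc1(x))               # (ReLU o Lin) ...
        x = F.relu(self.fc2(x))               # ... applied twice
        return self.fc3(x)                    # final Lin produces the logits
```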

A. Training
All parameters are listed in Table I. For each run, we randomly select N samples D from the training set (MNIST, CIFAR10, CIFAR100, or CelebA). For half the runs, we randomly pick an element s ∈ D; for the other half, we randomly pick an element from the remaining samples, thus ensuring s ∉ D. We then spawn a flower client and a flower server, implementing our dishonest strategy. Finally, after every run, we record ∆, as given by (9), as well as whether s ∈ D or not. A failure of our method occurs if ∆ ≥ ξ when s ∉ D (false positive) or if ∆ < ξ when s ∈ D (false negative). We perform 400 runs on the training set, using the fixed threshold ξ = 0.1.
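Schematically, one run proceeds as in the sketch below; craft_parameters, client, and decision_statistic are placeholders for the steps of Section II, and the flower client/server plumbing is omitted.

```python
import random

def run_once(train_set, N, B, xi=0.1):
    D = random.sample(train_set, N)            # the client's local dataset
    member = random.random() < 0.5             # half the runs have s in D
    s = random.choice(D) if member else random.choice(
        [z for z in train_set if z not in D])  # otherwise s is held out
    theta = craft_parameters(s)                # Section II-B (placeholder)
    phi = client(theta, D)                     # one round of dishonest FedAvg
    delta = decision_statistic(theta, phi, s, B)  # Delta as in (9)
    return member, delta >= xi                 # (ground truth, decision T)
```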
All experiments concluded without a single false decision by the CS, achieving an overall accuracy of 100%.
We experienced very occasional failures of our method on the CIFAR100 and CelebA datasets during exploration. When investigating, we noticed that there are 14 exact duplicate images in the training set of the CIFAR100 dataset and 82 exact duplicates in the training set of the CelebA dataset. More details on these duplicates are provided in the discussion (Section IV-A). In addition, there are several near-duplicate images present in the CelebA dataset. These images are indistinguishable for a human and large portions of the images are identical. These (near) duplicates lead to very occasional false positives of our method. We did not observe a single failure that could not be attributed to one of these (near) duplicates.
We chose ε = 0.001 relatively large to obtain robustness against numerical errors. Then, M = 4 is sufficient to achieve perfect accuracy, while choosing M small, e.g., M ∈ {1, 2}, leads to false positives. This is showcased in the ablation study on M in Table II, where we provide the False Positive Rate (FPR), the False Negative Rate (FNR), the accuracy (acc.), and the Area Under the Curve (AUC) of the Receiver Operating Characteristic. All values are rounded to two decimal places.

IV. DISCUSSION AND POSSIBLE EXTENSIONS
We presented a simple algorithm allowing membership inference attacks in an FL setup. Our attack achieves perfect accuracy when identifying a single target sample among thousands of training samples, using only a single query. However, it should still be regarded as a proof of concept.
We refrained from performing our experiments on larger models in order to keep the computation time to a minimum. The model size does not impact our strategy, as most weights would be set to zero anyway.
We only performed identification based on up to 16 components as a proof-of-concept.However, if enough neurons are available, an extension to more components is certainly feasible.
Finally, we want to remark that the strategy presented here may in fact be used for attribute inference.For instance, M −1 known attributes can be used to select a specific target sample and a private attribute can then be inferred using multiple queries.

A. Duplicates
As our strategy may appear to fail if exact duplicates are present in the dataset, we performed a search for such images in all four datasets (MNIST, CIFAR10, CIFAR100, CelebA). Both MNIST and CIFAR10 are free of exact duplicate images. We (re)discovered the exact duplicates in the CIFAR100 dataset that were pointed out by [21].
We found 14 and 82 duplicates in the training sets of CIFAR100 and CelebA, respectively; the testing sets contain 2 and 7 duplicates, respectively. Additionally, 9 of the 14 duplicates in the CIFAR100 training set, as well as 51 of the 82 duplicates in the CelebA training set, occur with different labels. Both duplicates in the testing set of CIFAR100 carry different labels.
To our surprise, we also found that the training and testing sets of CIFAR100 and CelebA contain identical images. There are 10 images that occur in both the training and testing set of CIFAR100, 6 of which are labeled differently in the two sets. For CelebA, 11 images are present in both sets, all of which are labeled differently for training and testing.
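Such exact duplicates can be found by hashing raw pixel data. The sketch below, using torchvision's CIFAR100 loader and SHA-256 over the raw image bytes as one possible implementation choice, groups the indices of exact duplicates.

```python
import hashlib
from collections import defaultdict
import torchvision

def find_exact_duplicates(dataset):
    """Group dataset indices by a hash of the raw image bytes; every group
    with more than one entry is a set of exact duplicates."""
    groups = defaultdict(list)
    for i in range(len(dataset)):
        img, label = dataset[i]                 # (PIL image, integer label)
        digest = hashlib.sha256(img.tobytes()).hexdigest()
        groups[digest].append((i, label))
    return {h: g for h, g in groups.items() if len(g) > 1}

train = torchvision.datasets.CIFAR100(root="data", train=True, download=True)
for group in find_exact_duplicates(train).values():
    print(group)  # duplicate image IDs with their (possibly different) labels
```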
Table III provides a list of all duplicates we found in the CIFAR100 dataset with their IDs and labels.