Decentralised Federated Learning for Hospital Networks With Application to COVID-19 Detection

Federated Learning (FL) is a distributed machine learning technique that enables local learning of global machine learning models without the need to exchange data. The original FL algorithm, Federated Averaging (FedAvg), is extended in this work by means of consensus theory. Unlike standard FL algorithms, the resulting one, named FedLCon, does not need a coordinating server, which represents a single point of failure and needs to be trusted by all the clients. Furthermore, the consensus paradigm is also applied to the Adaptive Federated Learning (AdaFed) algorithm, which extends FedAvg with an adaptive model averaging procedure. Performance comparison tests are performed over a real-world COVID-19 detection scenario.


I. INTRODUCTION
Federated Learning (FL) was originally introduced in [1] with the Federated Averaging (FedAvg) algorithm, as an alternative to conventional approaches for training learning models using data coming from mobile devices. Unlike distributed optimization, FL deals with non-IID, unbalanced, and massively distributed data by means of a federation of clients communicating with a central server in a privacy-preserving way that avoids the exchange of any data. Contrary to a standard distributed learning setting, depicted in Figure 1 a), where data are shared and partitioned by a server among the clients for distributed processing, the standard FL setting envisages that all data are locally processed by the clients and that the only task of the server is to perform a so-called model averaging procedure. In fact, the FL server iteratively updates its model during each communication round by averaging the trainable weights gathered from the clients of the federation after they have trained on their locally available data, as depicted in Figure 1 b).
The associate editor coordinating the review of this manuscript and approving it for publication was Valentina E. Balas.
The original FedAvg [1] algorithm averages the clients' models with an a-priori weighting strategy that typically depends on the size of the various clients' datasets. On the contrary, the Adaptive Federated Learning (AdaFed) algorithm [2], recently proposed by the authors, envisages a dynamic and adaptive heuristic weighting scheme that takes into account the performance attained by the various clients, so that the impact of low-performing and/or malicious clients can be reduced.
Both FedAvg and AdaFed share the same overall architecture, with a centralized server orchestrating the learning procedure of the entire federation. On the one hand, this provides a communication-efficient solution with privacy guarantees, but on the other hand, it introduces a single point of failure in the system. Moreover, a centralized architecture of this kind requires all the clients to completely trust the server behaviour, since they do not have any means to verify it and do not communicate with each other. To this end, a possible solution would be to allow point-to-point client agreements, as depicted in Figure 1 c), so that information is directly exchanged only with trusted entities. This work aims at developing a fully decentralized solution able to extend the FL paradigm to a group of federated entities of arbitrary topologies. The main challenge to be faced in such a decentralized setting is to ensure the convergence of the clients' models towards a common one [3]. In this direction, the present work proposes a scheme that relies on consensus theory, as introduced in the previous work [4]. The developed framework may find application over federations characterized by sparse communication graphs, as depicted in Figure 2, that arise in scenarios in which broader communications are prevented by regulations and/or physical connections.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The highlights and main contributions of this work are:
• The FL framework is extended, by means of a consensus-based algorithm, to a fully decentralized setting that does not require a coordinating server and is robust with respect to malicious clients and clients with poor quality data;
• The consensus-based FL algorithm FedLCon is designed so that it may provide a seamless decentralization solution for any FL algorithm that employs a centralized model averaging procedure;
• FedLCon is applied to decentralize both FedAvg and AdaFed;
• Validating examples are discussed, demonstrating the applicability and performance properties on the X-ray COVID-19 detection problem.
The remainder of the paper is organized as follows: Section II discusses the relevant works in the literature; Section III shows how the COVID-19 detection problem is formulated in a real-world decentralized scenario; Section IV presents the proposed algorithms; Section V reports the results of our tests; Section VI draws the conclusions and discusses future research directions.

II. RELATED WORKS
This section first presents a brief overview of the application of FL algorithms to the healthcare domain and then discusses some recent Deep Learning (DL) applications for COVID-19 detection.

A. FEDERATED LEARNING FOR HEALTHCARE
The distributed and privacy-preserving nature of FL makes it an ideal choice for healthcare applications. In fact, FL has been broadly applied wherever the collaboration of stakeholders (e.g., hospitals, laboratories, public bodies) can significantly improve the handling of a specific problem, such as electronic health records (EHRs) management, remote health monitoring and medical imaging [6].
In particular, EHRs are used to store patients' health information, such as diagnoses, medications and analyses, in a digital format [7]. EHRs are hence a valuable source of data that could be used and studied to facilitate the diagnosis and assessment of various diseases, but their sharing among different institutions poses several regulatory challenges. In [8] the authors address both the requirement of extensive computational resources and the vulnerability of transmission channels by proposing a privacy-aware and resource-saving collaborative learning protocol (PRCL). PRCL, based on an FL framework, makes use of a so-called model splitting method, which consists in outsourcing the most computationally demanding part of the learning procedure to a cloud server, and of a lightweight data perturbation method to prevent direct and indirect data leaks. The concept of splitting is also employed in [9], where an FL-based method called SplitNN is introduced; the basic idea is to split the model into multiple parts, each one trained by a single client, thus not requiring raw data sharing. Reference [10] uses an FL setting based on the soft-margin $\ell_1$-regularized sparse Support Vector Machine (sSVM) classifier and on an iterative cluster Primal Dual Splitting (sPDS) algorithm to predict hospitalizations for cardiac events within a calendar year, based on information in the patients' EHRs prior to that year. The authors of [11] focus on privacy aspects by introducing an FL framework based on a differential privacy mechanism whose effectiveness is proved using real-world EHRs of 1 million patients. The concept of differential privacy is also employed in [12], where artificial noise is added to the parameters at the clients' side before aggregation.
The work [13] aims at predicting patient mortality from EHRs by means of an FL method called Federated-Autonomous Deep Learning (FADL), whose main novelty is that part of the model is trained by using all data sources in a distributed manner and other parts are trained using specific data sources. Prediction of patient mortality as a binary classification problem is explored in [14], where a stochastic FL algorithm called Stochastic Channel-Based Federated Learning (SCBF) is introduced; the privacy of each client is enforced by updating the weights of the server model by stochastically selecting the clients whose local gradients present the largest magnitude. In [15], the problems of mortality and ICU stay-time forecasting are addressed via an FL-based algorithm called Community-Based Federated Machine Learning (CBFL), which clusters patients into clinically meaningful communities capturing similar diagnoses and geographical locations, and then learns a different model for each cluster. Finally, the authors of [16] address preterm-birth prediction by means of a novel FL algorithm called Federated Uncertainty-Aware Learning Algorithm (FUALA), which weights the contribution of each client based on its performance and introduces ensembling at prediction time.
Other than EHRs, a significant amount of healthcare-related data is produced by wearable devices, which allow for remote monitoring of patients' activity and health status, gathering information like blood pressure and oxygenation levels, heart rate, sleep cycles and several other indicators. The work [17] proposes FedHealth, an FL algorithm tailored for wearable healthcare, which performs data aggregation in a privacy-preserving way thanks to a standard FL framework, and builds personalized models using transfer learning. In [18], the authors propose to tackle the problem of heterogeneity in labels across users via model distillation techniques, and prove the validity of the approach on the Heterogeneity Human Activity Recognition (HHAR) dataset. Mood prediction and monitoring is carried out in [19], where metadata related to the use of a virtual keyboard, such as key letters, special characters and phone accelerometer values, are used in an FL setting called FedMood, with fusion methods applied for data normalization.
Among data produced by healthcare facilities, medical images constitute one of the most common diagnostic tools for a vast number of diseases, as they reveal the interior of a body and offer a visual representation of the state of several organs and tissues. In [20], the authors developed an FL model to support brain tumor segmentation using Deep Neural Networks (DNNs) together with differential privacy techniques to enhance privacy among clients, evaluating the resulting framework over the BraTS dataset [21]. In [22], brain structural relationships are investigated using magnetic resonance imaging (MRI) scans distributed across institutions via an FL framework which accounts for data standardization, confounding factors correction, and multivariate analysis related to the variability of high-dimensional features. Brain tissue-based MRI analytics via FL is also investigated in [5], and FL is also used to perform reconstruction of MRI from under-sampled data [23]. Li et al. [24] propose an FL method to perform functional MRI (fMRI) classification via a randomization mechanism to coordinate the weight sharing process. The recent study [25] shows a real-world application of FL to breast density classification based on the Breast Imaging Reporting and Data System (BI-RADS) [26].

B. DEEP LEARNING FOR COVID DETECTION
Artificial Intelligence (AI) techniques have been largely applied in the fight against COVID-19. In this context, a significant share of the Deep Learning (DL) applications are related to the autonomous detection of COVID-19 cases from X-ray images and Computed Tomography (CT) scans. In [27], a 3D DNN is used to predict the probability of COVID-19 infection over a dataset of 499 chest CT samples, each one with the lung region segmented using a pre-trained UNet [28]. In this direction, [29] proposes a DL algorithm capable of detecting, localizing, and quantifying the severity of COVID-19 manifestation from chest CT scans, as well as unsupervised clustering of abnormal slices. First the lung region of interest (ROI) is localized in a chest scan, then a 2D ROI classification network classifies the considered ROI as normal vs abnormal. The Grad-CAM method is then used to obtain a fine-grained map of the extracted pathological tissue, enabling the creation of a so-called Corona score, related to the volumetric measure of the disease extent. Finally, unsupervised clustering of normal and abnormal slices is carried out to learn the different patterns of the abnormal manifestation of the disease.
The use of chest CT scans for COVID-19 detection is also explored in [30], [31], and [32]. In [33], two ''infection'' metrics were introduced; the use of DL techniques on chest CT scans makes it possible to quantify the volume of infection and the percentage of infection. Compared to CT scans, X-ray images are cheaper and faster to obtain, which is why a large number of works based on them can be found in the literature. In [34], a deep convolutional neural network (CNN) model, called COVID-Net, is introduced for the detection of COVID-19 cases based on chest X-ray (CXR) images, together with the COVIDx benchmark dataset. In [35], transfer learning is used for COVID-19 classification from CXR and, in particular, a deep CNN called DeTraC is introduced. Similar approaches are followed by [36], [37], and [38], where the authors make use of transfer learning for COVID-19 detection.
The usage of FL in the context of COVID analysis has already been explored in works such as [39], [40], and [41], where the authors compare the detection performance of various DL models also employing transfer learning, and [42], where FedAvg is employed to predict the future oxygen requirements of symptomatic patients in a federation of 20 institutions. The authors of [43] propose a dynamic logic for the clients' participation in the averaging procedure, depending on both their computational time and model performance (following a logic similar to the one behind AdaFed), while [44] focuses on mortality prediction in hospitalized patients.
We mention that DL has also been used for other critical tasks related to the pandemic, e.g., for modeling the disease transmission dynamics [45], for ''drug-repurposing'' [46], [47], for drug discovery [48] (where a DL model named CogMol first learns candidate molecules that can interfere with the COVID-19 virus, and then generates candidate drugs) and for protein structure prediction [49], [50].
Overall, FL is a promising solution for healthcare applications that may even be an enabling technology in scenarios where privacy is critical and data sharing is prevented by strict regulations.

III. PROBLEM FORMULATION
The COVID-19 pandemic underlined the importance of scientific collaboration and knowledge sharing to overcome the great challenges that a threat to the entire world poses. Motivated by this, we explore the problem of COVID-19 detection using X-ray images in a federated setting, namely considering a network of hospitals that collaborate with each other by sharing the knowledge extracted from the X-ray images of their patients in a privacy-preserving and communication-efficient way.
We model the hospital network by a graph, whose vertices represent the various hospitals and whose edges capture the possibility of communication between pairs of hospitals. In fact, due to regulatory and trust reasons, some hospitals may be prevented from directly sharing information with certain institutions: for this reason, we will conduct the following analysis considering a sparse/non-complete graph.
The remainder of the section will present the needed background on Consensus Theory and FL algorithms.

A. BACKGROUND ON CONSENSUS THEORY
Given a graph of N vertices connected by a set of edges, the general goal of a consensus algorithm is to derive a fully distributed information-sharing law that steers the state of all the various clients (agents) towards a common consensus value.
The following matrices are defined: the adjacency matrix $A = (a_{ij})_{i,j \in I} \in \mathbb{R}^{N \times N}$, where $I$ is the set of $N$ clients and with $a_{ij} = 1$ if an edge connects clients $i$ and $j$, and $a_{ij} = 0$ otherwise; the out-degree diagonal matrix $O = (o_{ij})_{i,j \in I} \in \mathbb{R}^{N \times N}$, with $o_{ii} = \sum_{j} a_{ij}$ computed as the clients' out-degree; the Laplacian matrix $L := O - A$; and the diagonal matrix $P = \mathrm{diag}(p_i)_{i \in I} \in \mathbb{R}^{N \times N}$, with $p_i$ representing the weight given to client $i$.
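As an illustration of these definitions, the following minimal sketch builds the four matrices for a small path graph. The graph, the number of clients and the weights are hypothetical values chosen only for the example.

```python
import numpy as np

# Hypothetical undirected graph with N = 4 clients: 0 - 1 - 2 - 3 (a path).
N = 4
edges = [(0, 1), (1, 2), (2, 3)]

# Adjacency matrix: a_ij = 1 if an edge connects clients i and j.
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Out-degree diagonal matrix: o_ii = sum_j a_ij.
O = np.diag(A.sum(axis=1))

# Laplacian matrix L = O - A; each of its rows sums to zero.
L = O - A

# Weight matrix P = diag(p_i); these weights are illustrative assumptions.
p = np.array([100.0, 200.0, 300.0, 400.0])
P = np.diag(p)
```

Since the graph is undirected, both A and L are symmetric, which is the setting assumed by the consensus results recalled in this section.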
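Let $x_i(t)$ be the state of agent $i$ at time-step $t$, and let $I_i$ be its set of neighbors. Under the hypothesis of a connected and undirected consensus graph, and under the following discrete-time update rule:

$$x_i(t+1) = x_i(t) + \frac{\epsilon}{p_i} \sum_{j \in I_i} a_{ij} \big( x_j(t) - x_i(t) \big), \quad i \in I, \tag{1}$$

in which the sampling time $\epsilon$ is such that $\epsilon < \min_{i \in I} (p_i / o_{ii})$ [51], [52], the clients reach a consensus value on their states $x_i$ that coincides with the weighted average of their initial conditions:

$$\lim_{t \to \infty} x_i(t) = \frac{\sum_{j \in I} p_j x_j(0)}{\sum_{j \in I} p_j}, \quad \forall i \in I. \tag{2}$$

The convergence of the agents follows the dynamics of the discrete-time system (1), which can be equivalently written in matrix form [52] as

$$x(t+1) = H_p \, x(t), \tag{3}$$

with $H_p = I - \epsilon P^{-1} L$, where $I$ denotes here the identity matrix. From (3), starting from the well known definition of dominant time constant for a discrete-time linear time-invariant system and of its settling time [53], it follows that the agents will reach convergence, with a 99% precision, after a number of steps $n$:

$$n = \left\lceil \frac{\ln(0.01)}{\ln \left( \max_i |\lambda_i(H_p)| \right)} \right\rceil, \tag{4}$$

where $\lambda_i(H_p)$ is the $i$-th eigenvalue different from 1 of the matrix $H_p$ and $\lceil \cdot \rceil$ denotes the ceiling function of its argument, with a resulting 1%-settling time $t_a \approx n \cdot \epsilon$.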
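The behaviour described above can be checked numerically. The sketch below is only illustrative: the graph, the weights, the initial states and the choice of a sampling time equal to 90% of its upper bound are assumptions. The number of steps is taken as the smallest integer for which the slowest mode of the iteration matrix has decayed to 1% of its initial amplitude.

```python
import math
import numpy as np

def consensus_matrix(A, p, eps):
    """Return H_p = I - eps * P^{-1} L for adjacency A and weights p."""
    L = np.diag(A.sum(axis=1)) - A
    return np.eye(len(p)) - eps * (L / p[:, None])

def settling_steps(H, tol=0.01):
    """Steps needed for the slowest non-unit mode of H to decay below tol."""
    lam = np.abs(np.linalg.eigvals(H))
    # Discard the single eigenvalue equal to 1 (connected graph assumed).
    lam = np.delete(lam, np.argmin(np.abs(lam - 1.0)))
    return math.ceil(math.log(tol) / math.log(lam.max()))

# Hypothetical path graph 0 - 1 - 2 - 3 with illustrative weights p_i.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p = np.array([1.0, 2.0, 3.0, 4.0])
eps = 0.9 * np.min(p / A.sum(axis=1))   # strictly below min_i p_i / o_ii

H = consensus_matrix(A, p, eps)
n = settling_steps(H)

x0 = np.array([10.0, -2.0, 4.0, 7.0])   # initial states x_i(0)
x = x0.copy()
for _ in range(n):
    x = H @ x                            # one synchronous consensus step

target = np.dot(p, x0) / p.sum()         # weighted average of the initial states
```

After the loop, every entry of `x` lies close to `target`, and the weighted sum of the states is preserved at every step, which is why the agreement value is the weighted average rather than the plain mean.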

B. BACKGROUND ON FEDERATED LEARNING
Consider a set $I$ of $N$ federated clients, sharing the same DL model architecture, i.e., a deep neural network. Let $D_i = \{(\alpha_n, \beta_n),\ n \in \{1, \ldots, |D_i|\}\}$ be the dataset of client $i \in I$, with cardinality $|D_i|$, and $w_i$ be the vector of its trainable parameters. We denote the total available data as $D = \bigcup_i D_i$. In the federation, each client $i$ is trained to minimize the loss function $l_i(\alpha_n, \beta_n \mid w_i)$ over its entire dataset $D_i$: given the generic input $\alpha_n$, the loss function is used to quantify the quality of the model, with parameters $w_i$, against the corresponding ground truth value $\beta_n$, with $(\alpha_n, \beta_n) \in D_i$. The choice of the loss function depends on the particular machine learning problem to be addressed. In general, regression tasks employ the mean squared error, whereas classification ones employ the categorical cross-entropy. We set:

$$L_i(w_i) = \frac{1}{|D_i|} \sum_{(\alpha_n, \beta_n) \in D_i} l_i(\alpha_n, \beta_n \mid w_i) \tag{5}$$

as loss function of client $i$ over its entire dataset $D_i$. The goal of the federation is then to find the optimal vector $w^*$ of parameters that, when shared by all clients, solves the minimization problem with joint cost function defined as [54]:

$$w^* = \arg\min_{w} L(w), \qquad L(w) = \sum_{i \in I} \bar{p}_i L_i(w), \tag{6}$$

with $\bar{p}_i = |D_i| / |D|$. While in the standard machine learning setting the optimization (6) is tackled by a centralized system, which computes the gradient $\nabla L(w)$ given the whole set $D$, in a distributed one the gradient $\nabla L(w)$ has to be estimated starting from the gradients of the clients $\nabla L_i(w_i)$. Moreover, in standard (non-federated) distributed learning, data can be distributed arbitrarily by a centralized entity over the clients. The typical assumption for this distribution is that the datasets $D_i$ are IID with respect to $D$, implying $E[L_i(w)] = L(w)$. In practice, under this assumption $L_i(w)$ provides a good approximation of $L(w)$ [1] and the locally computed gradients $\nabla L_i(w_i)$ can be averaged to reconstruct $\nabla L(w)$.
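A small numerical check may help to see why the coefficients are chosen proportional to the dataset sizes: when every client loss is a per-sample mean, the weighted sum of the client losses coincides with the loss computed centrally on the pooled data. The sketch below assumes a linear model with mean squared error and synthetic, unbalanced datasets; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(w, X, y):
    """Mean squared error of the linear model X @ w (stand-in for a client loss)."""
    return float(np.mean((X @ w - y) ** 2))

# Three hypothetical clients with unbalanced synthetic datasets.
sizes = [20, 50, 130]
datasets = [(rng.normal(size=(m, 3)), rng.normal(size=m)) for m in sizes]
w = rng.normal(size=3)                      # parameter vector shared by all clients

# Joint cost: sum over the clients of (|D_i| / |D|) times the client loss.
total = sum(sizes)
joint = sum(m / total * mse(w, X, y) for m, (X, y) in zip(sizes, datasets))

# Same quantity computed centrally on the pooled dataset.
X_all = np.vstack([X for X, _ in datasets])
y_all = np.concatenate([y for _, y in datasets])
pooled = mse(w, X_all, y_all)
```

The two values agree up to floating-point error, regardless of how unbalanced the clients are. What is lost in the federated setting is not this identity but the guarantee that each client loss alone resembles the pooled one.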
On the contrary, in the federated setting such an IID hypothesis cannot be assumed, as the training data are processed without any re-distribution and $L_i(w)$ could provide an arbitrarily bad approximation of $L(w)$. For this reason, in FedAvg [1] the authors proposed a round-based iterative procedure for model averaging.
FedAvg is divided into two main phases, which are repeated iteratively. In the first phase (local training), the server selects a subset of clients that update the weights of their models by training on their local datasets $D_i$ with a gradient descent update rule:

$$\bar{w}_i(t) = w_i(t-1) - \eta \nabla L_i\big(w_i(t-1)\big), \tag{7}$$

where $0 < \eta < 1$ is the learning rate and $\bar{w}_i(t)$ is the locally updated weight of the model of agent $i$ at time-step $t$. We mention that in the FL setting it is typically assumed that all clients share a common initial weight vector, i.e., $w_i(0) = w_0\ \forall i \in I$ [1]. Actually, the local weight update is performed iteratively over a given number $E$ of training epochs using a variation of gradient descent (mini-batch gradient descent) that splits each dataset $D_i$ into a set of mini-batches. For the sake of simplicity, equation (7) exemplifies the update rule with $E = 1$ and over the complete dataset, whereas the pseudo-code presented below reports the mini-batch multi-epoch version of the algorithm.
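To make the difference between the single full-dataset step and the mini-batch multi-epoch update concrete, the following sketch implements both with one function. It is a minimal illustration and not the training code used in the paper: a linear regressor with mean squared error replaces the deep network, and the learning rate, data and batch size are assumptions.

```python
import numpy as np

def client_update(w, X, y, lr=0.1, epochs=1, batch_size=None, seed=0):
    """Return the locally updated weights after mini-batch gradient descent.

    A linear regressor with mean squared error is used only as a stand-in
    for the deep network trained by each client.
    """
    rng = np.random.default_rng(seed)
    w = w.copy()
    m = len(y)
    batch_size = m if batch_size is None else batch_size
    for _ in range(epochs):
        order = rng.permutation(m)              # reshuffle at every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad                      # gradient step on one mini-batch
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])              # noiseless synthetic targets
w0 = np.zeros(3)

w_full = client_update(w0, X, y)                          # one epoch, full dataset
w_mini = client_update(w0, X, y, epochs=4, batch_size=16) # four epochs, mini-batches
```

With the default arguments the function performs exactly one full-batch gradient step, while the second call performs sixteen smaller steps over four epochs.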
In the second phase (centralized averaging), the server collects the $\bar{w}_i$'s, computes the weight vector $w(t)$ as the weighted average

$$w(t) = \sum_{i \in I} \bar{p}_i \bar{w}_i(t), \tag{8}$$

and propagates the weight vector $w(t)$ to all the clients:

$$w_i(t) = w(t), \quad \forall i \in I. \tag{9}$$

We report the pseudo-code for FedAvg (see Algorithm 1), showing an implementation where the clients perform $E$ local training epochs using mini-batch Gradient Descent with a batch size of $B$. In the code, $\nabla L_i(b \mid w_i)$ denotes the gradient computed over the mini-batch $b$ and it is assumed for simplicity that all clients participate in the averaging procedure.

Algorithm 1 FedAvg
 1: SERVER UPDATE:
 2: for each communication round t = 1, . . . , T do
 3:   select a subset of clients for the averaging procedure
 4:   for all selected client i do
 5:     CLIENT UPDATE
 6:     receive $\bar{w}_i$ from client i
 7:   end for
 8:   set $w(t) = \sum_i \bar{p}_i \bar{w}_i(t)$
 9:   propagate w in the federation ($w_i(t) = w(t)$, ∀i)
10: end for
11: CLIENT UPDATE:
12: for each local epoch e = 1, . . . , E do
13:   for each mini-batch b from $D_i$ do
14:     $w_i(t-1) \leftarrow w_i(t-1) - \eta \nabla L_i(b \mid w_i(t-1))$
15:   end for
16: end for
17: return $\bar{w}_i(t) = w_i(t-1)$ to the server
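The averaging step performed by the server can be sketched in a few lines. The function name and the toy weight vectors below are hypothetical; the only ingredient taken from the text is the use of coefficients proportional to the dataset sizes.

```python
import numpy as np

def fedavg_aggregate(client_weights, dataset_sizes):
    """Average client weight vectors with coefficients |D_i| / |D|."""
    sizes = np.asarray(dataset_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return coeffs @ np.stack(client_weights)

# Three hypothetical clients with two trainable parameters each.
local = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 100, 200]

w_global = fedavg_aggregate(local, sizes)
# Every client would then restart the next round from w_global.
```

In this toy case the third client holds half of the data, so its model contributes half of the aggregate.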
Like several other variants of FedAvg [54], [55], AdaFed shares the same centralized setting and the two-phase approach; however, at its core there is an adaptive model averaging procedure paired with an adaptive loss function, which heuristically provide a more resilient solution to imbalanced data distributions:
• Weighted Model Average: During the server update, the performance $\bar{p}_i(t)$ of each client model is evaluated over a common test set, and each model is weighted accordingly in the averaging;
• Adaptive Loss: The server propagates to the federation both the updated model and a new loss function, adapted to the performance $p(t)$ of its own model over a dedicated test set according to a use-case dependent metric.
We report the pseudo-code for AdaFed (see Algorithm 2) and refer the reader to [2] for a more detailed discussion of the algorithm.
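The effect of the performance-based weighting can be illustrated with a toy example. The sketch below is not the AdaFed implementation: normalising the scores so that they sum to one is an assumption made here, and the accuracies and weight vectors are invented for the illustration (see [2] for the actual scheme).

```python
import numpy as np

def performance_weighted_average(client_weights, scores):
    """Average client models with coefficients proportional to their scores.

    Normalising the scores so that they sum to one is an assumption of this
    sketch, not a detail taken from the AdaFed paper.
    """
    s = np.asarray(scores, dtype=float)
    return (s / s.sum()) @ np.stack(client_weights)

# Two well-behaved clients and one outlier, e.g. trained on corrupted data.
local = [np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([9.0, -7.0])]
accuracy = [0.90, 0.90, 0.20]     # hypothetical accuracies on the common test set

w_perf = performance_weighted_average(local, accuracy)
w_unif = np.mean(np.stack(local), axis=0)
```

Compared with the uniform average, the performance-weighted model stays much closer to the two clients that agree with each other, which is the mechanism that limits the influence of a poorly performing participant.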

IV. PROPOSED FL ALGORITHM
Algorithm 2 AdaFed
 1: ServerUpdate:
 2: [...]
 3: for each communication round t do
 4:   for each client i = 1, 2, . . . , K do
 5:     receive $\bar{w}_i$ from client i
 6:     evaluate $\bar{w}_i$ on the server test set
 7:     use the evaluation to determine the weight $\bar{p}_i$
 8:   end for
 9:   set $w(t) = \sum_i \bar{p}_i(t) \bar{w}_i(t)$
10:   evaluate its performance p(t) on the server test set
11:   adapt the loss function l(t) depending on p(t)
12:   propagate w(t) and l(t) to the clients
13: end for
14: ClientsUpdate:
15: for each local epoch e = 1, . . . , E do
16:   for each mini-batch b from $D_i$ do
17:     $w_i(t-1) \leftarrow w_i(t-1) - \eta \nabla L_i(b \mid w_i(t-1))$
18:   end for
19: end for
20: set $\bar{w}_i(t) = w_i(t-1)$
21: return $\bar{w}_i(t)$ to the server

Similarities between the FL framework and the one for discrete-time weighted average consensus may be found in the interpretation of the weights $w_i(t)$ of the FL clients as the states $x_i(t)$ of a set of agents seeking consensus (even if the former are not dynamical systems). Based on this interpretation, in [4] we proposed to combine (7)-(9) and (1) as described in the following.
At each communication round $t$, the weights $\bar{w}_i$ of each client are computed by (7), but the update of the weight vectors $w_i(t)$ is performed via a consensus round. Let $k$ be the consensus round index and recall that $n$ is the number of iterations required to reach consensus within the round. To reach consensus the federated clients exchange information $n$ times, starting from the initial values $x_i(0) = \bar{w}_i(t)$, $\forall i \in I$. The following iteration rule is executed for $k = 0, \ldots, n-1$:

$$x_i(k+1) = x_i(k) + \frac{\epsilon}{|D_i|} \sum_{j \in I_i} a_{ij} \big( x_j(k) - x_i(k) \big), \quad i \in I, \tag{10}$$

with $n$ computed by equation (4). Due to the structure of the update rule (10) and the convergence properties of (1), already discussed in Section III-A, one has that

$$x_i(n) \approx \frac{\sum_{j \in I} |D_j| \, \bar{w}_j(t)}{\sum_{j \in I} |D_j|} = \sum_{j \in I} \bar{p}_j \bar{w}_j(t) = w(t), \quad \forall i \in I,$$

i.e., at the end of the communications (when consensus is reached among the federated clients), the proxy variables $x_i$ approximate the weights $w(t)$ computed in the centralized FL case with equation (8). Setting

$$w_i(t) = x_i(n), \quad \forall i \in I,$$

the procedure can be repeated starting from the training of equation (7) for all communication rounds $t$. Note that each communication round $t$ now yields $n$ information exchanges since it involves a consensus round, but at the same time it does not envisage the presence of any centralized entity.

Algorithm 3 FedLCon Applied to FedAvg
 1: DECENTRALIZED FEDERATED TRAINING:
 2: for each communication round t = 1, . . . , T do
 3:   for all clients i ∈ I do
 4:     for each local epoch e = 1, . . . , E do
 5:       for each mini-batch b from $D_i$ do
 6:         $w_i(t-1) \leftarrow w_i(t-1) - \eta \nabla L_i(b \mid w_i(t-1))$
 7:       end for
 8:     end for
 9:     set $\bar{w}_i(t) = w_i(t-1)$
10:   end for
11:   update $w_i(t)$ via a CONSENSUS ROUND
12: end for
13: CONSENSUS ROUND:
14: Compute n according to (4) depending on the topology
15: Set $x_i(0) = \bar{w}_i(t)$, ∀i ∈ I
16: for k = 0, . . . , n − 1 do
17:   for all clients i ∈ I do
18:     update $x_i$ according to (10)
19:   end for
20: end for
21: set $w_i(t) = x_i(n)$, ∀i ∈ I
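The equivalence between the consensus round and the centralized average can be verified on a toy federation. The following sketch is written under several assumptions: a hypothetical ring of five clients, invented dataset sizes, random vectors in place of trained weights, and a sampling time set to 90% of its admissible bound. It applies the neighbour-mixing rule to whole weight vectors and compares the outcome with the size-weighted average.

```python
import math
import numpy as np

def consensus_round(local_weights, A, sizes, tol=0.01):
    """Decentralised averaging of client weight vectors (rows of local_weights)."""
    sizes = np.asarray(sizes, dtype=float)
    degree = A.sum(axis=1)
    eps = 0.9 * np.min(sizes / degree)              # below min_i |D_i| / o_ii
    H = np.eye(len(sizes)) - eps * ((np.diag(degree) - A) / sizes[:, None])
    lam = np.sort(np.abs(np.linalg.eigvals(H)))[:-1]  # drop the eigenvalue 1
    n = math.ceil(math.log(tol) / math.log(lam.max()))
    x = np.array(local_weights, dtype=float)
    for _ in range(n):
        x = H @ x                                    # each client mixes with its neighbours
    return x, n

# Hypothetical ring of five clients, three trainable parameters per client.
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
sizes = [400, 500, 600, 700, 800]
rng = np.random.default_rng(0)
local = rng.normal(size=(5, 3))

mixed, n = consensus_round(local, A, sizes)
central = np.average(local, axis=0, weights=sizes)   # centralized weighted average
```

Each row of `mixed` is the model held by one client at the end of the round; all rows approximate `central` although no client ever communicates with a server or with a non-neighbouring client.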
The resulting consensus-based distributed FL algorithm (FedLCon) [4], applied to decentralize FedAvg, is reported as a pseudo-code (see Algorithm 3) in the same form as the two previous cases.
Remark 1. As the consensus round is transparent to the FedAvg algorithm, different consensus algorithms can be used to exploit the communication and/or topology properties of the application scenarios.
Remark 2. Regarding the application of FedLCon to AdaFed, there are some design choices to be made depending on the use case characteristics. If all clients share a common validation dataset, on which they evaluate their model performance, they can directly obtain their weight $\bar{p}_i(t)$ at the start of the consensus round and then update their loss functions at its end, when all the clients share a practically identical model. On the contrary, if each client has a different validation dataset, a possible solution is to let each client assign a performance weight to its neighbours; for example, each client $i$ may compute the weight $\bar{p}_i(t)$ by averaging the performance of the models of its neighbours tested on their own test sets. Independently of this choice, at the end of the consensus round all the clients' models will converge towards the weighted average model envisaged by AdaFed. The pseudo-code for the decentralized version of AdaFed is reported in Algorithm 4.

Remark 3.
The introduction of the consensus round (and its $n$ information exchanges) yields a communication overhead, which is the main disadvantage of the proposed algorithm. From (4), it is clear that $n$ is influenced by the eigenvalues of the matrix $H_p$, which in turn depend on the Laplacian matrix $L$ of the communication network. We mention that, in general, such eigenvalues do not depend directly on the number of clients in the federation and instead capture the connectivity level of the topology, meaning that the scalability of FedLCon is mostly affected by the number of links available in the communication topology and by their positioning. Note that, in the cases in which the communication overhead becomes non-negligible in terms of both bandwidth consumption and training time, one may still deploy adequate countermeasures, such as resorting to a transfer learning approach to limit the number of trainable parameters or to multi-hop consensus protocols that virtually increase the federation connectivity.
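The dependence of the overhead on connectivity rather than on size alone can be explored with a short experiment. The sketch below assumes unit client weights and a sampling time equal to 90% of its admissible bound, and compares two hypothetical topologies, a ring and a complete graph, for growing federations.

```python
import math
import numpy as np

def steps_to_consensus(A, tol=0.01):
    """Consensus steps for unit weights p_i = 1 on the graph with adjacency A."""
    degree = A.sum(axis=1)
    eps = 0.9 / degree.max()                         # below min_i 1 / o_ii
    H = np.eye(len(A)) - eps * (np.diag(degree) - A)
    lam = np.sort(np.abs(np.linalg.eigvalsh(H)))[:-1]  # drop the eigenvalue 1
    return math.ceil(math.log(tol) / math.log(lam.max()))

def ring(N):
    """Adjacency matrix of a ring: every client has exactly two neighbours."""
    A = np.zeros((N, N))
    for i in range(N):
        A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
    return A

def complete(N):
    """Adjacency matrix of a complete graph: every pair of clients is linked."""
    return np.ones((N, N)) - np.eye(N)

for N in (7, 14, 28):
    print(N, steps_to_consensus(ring(N)), steps_to_consensus(complete(N)))
```

On the ring, the number of steps grows quickly with the number of clients because the graph becomes relatively sparser, whereas on the complete graph it stays small; adding links, or emulating them through multi-hop protocols, is therefore the lever that controls the overhead.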
Remark 4. The requirement of completing $n$ information exchanges every consensus round causes FedLCon-based solutions to require more time to complete the model averaging process than centralized ones. We mention that, in general, the information exchange process is expected to require a significantly lower amount of time than the training, making the impact of this overhead negligible in most applications.
The discussion of this section can be summarized by the following theorem: Theorem 1: By exchanging information following the consensus-based protocol (10) n times, a federation of distributed clients is able to conduct a decentralized model averaging procedure that is equivalent to the one obtainable in a centralized setting. In fact, discrete-time dynamical systems and consensus theory assure that, with 99% precision, the decentralized and centralized averaged models are identical. Furthermore, since the learning process and the model averaging procedures are decoupled, (10) allows for effectively decentralizing any model averaging-based FL algorithm.

V. SIMULATIONS
In this Section, we compare the FedLCon paradigm applied to FedAvg and AdaFed over the X-ray COVID-19 detection problem. Note that, unlike in the original FedAvg formulation, all the clients are involved in the averaging process by means of the consensus rounds. We consider a federation of $|I| = 7$ clients and we set the parameters of the algorithms as local client epochs $E = 4$, communication rounds $T = 15$, and local batch size $B = 64$. In the AdaFed implementation, the performance weight $\bar{p}_i$ of the $i$-th client is computed as its accuracy over a common evaluation dataset of 4876 samples. The model of each client is a Transfer Learning one [56], composed of the VGG19 [57] network pre-trained on ImageNet [58] and one dense layer with 1024 neurons and ReLU [59] activation function. Since the problem at hand is a binary classification problem, a binary cross-entropy loss function is employed. The class-dependent weights of the loss function are set to be inversely proportional to the performance $p(t)$ of the model on the federation evaluation set with respect to its corresponding class, and are updated after each communication round by means of its $F_1^c$-score, i.e., $\kappa_c = 1/(F_1^c + 0.1)$, where $c$ denotes the considered class.

Algorithm 4 FedLCon Applied to AdaFed
 1: DECENTRALIZED FEDERATED TRAINING:
 2: for each communication round t = 1, . . . , T do
 3:   for all clients i ∈ I do
 4:     for each local epoch e = 1, . . . , E do
 5:       for each mini-batch b from $D_i$ do
 6:         $w_i(t-1) \leftarrow w_i(t-1) - \eta \nabla L_i(b \mid w_i(t-1))$
 7:       end for
 8:     end for
 9:     set $\bar{w}_i(t) = w_i(t-1)$
10:     evaluate $\bar{w}_i(t)$ on the federation test set
11:     use the evaluation to determine the weight $\bar{p}_i$
12:   end for
13:   update $w_i(t)$ via a CONSENSUS ROUND
14:   evaluate the performance p(t) of $w_i(t)$
15:   adapt the loss function $l_i$ depending on p(t)
16: end for
17: CONSENSUS ROUND:
18: Compute n according to (4) depending on the topology
19: Set $x_i(0) = \bar{w}_i(t)$, ∀i ∈ I
20: for k = 0, . . . , n − 1 do
21:   for all clients i ∈ I do
22:     update $x_i$ according to (10), replacing $|D_i|$ with $\bar{p}_i$
23:   end for
24: end for
25: set $w_i(t) = x_i(n)$, ∀i ∈ I
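The class-weight update described above can be sketched as follows. The reciprocal rule with the 0.1 offset is taken from the text; how the weights enter the binary cross-entropy is an assumption of this sketch, and the class names and F1 values are hypothetical.

```python
import numpy as np

def class_weights(f1_per_class, offset=0.1):
    """Return the weight 1 / (F1 + offset) for every class."""
    return {c: 1.0 / (f1 + offset) for c, f1 in f1_per_class.items()}

def weighted_bce(y_true, y_prob, kappa_pos, kappa_neg):
    """Binary cross-entropy with one weight per class (assumed form)."""
    y_prob = np.clip(y_prob, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    per_sample = -(kappa_pos * y_true * np.log(y_prob)
                   + kappa_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(per_sample.mean())

# Hypothetical per-class F1-scores measured after a communication round.
f1 = {"covid": 0.4, "non_covid": 0.9}
kappa = class_weights(f1)

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.6, 0.2, 0.3, 0.1])
loss = weighted_bce(y_true, y_prob, kappa["covid"], kappa["non_covid"])
```

The class on which the model currently performs worse receives the larger weight, so its errors dominate the loss in the following round; the offset keeps the weight bounded when an F1-score is close to zero.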

A. DATASET DESCRIPTION
To tackle the COVID-19 detection problem, we make use of the COVID-19, Pneumonia and Normal Chest X-ray PA Dataset [60] and formulate the problem as a binary classification one, i.e., the detection of COVID-19 cases. Standard data augmentation is performed by rotating (+45°, −45°) and flipping upside down each image to increase the dataset size. Before the augmentation and before dividing data among the $N = 7$ clients, the dataset is shuffled and 20% of it is used as the evaluation test set, with the remaining 80% evenly divided among the clients, leading to every client having access to about 580 images.
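The shuffling and partitioning procedure can be sketched on sample indices alone. The function below is an illustrative assumption and not the preprocessing code of the paper; in particular, the total number of samples used in the example is a placeholder.

```python
import numpy as np

def split_federation(num_samples, num_clients=7, eval_fraction=0.2, seed=0):
    """Shuffle sample indices, hold out an evaluation set, split the rest evenly."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    n_eval = int(round(eval_fraction * num_samples))
    eval_idx, train_idx = idx[:n_eval], idx[n_eval:]
    return eval_idx, np.array_split(train_idx, num_clients)

# Placeholder dataset size, chosen only to keep the example small.
eval_idx, client_idx = split_federation(num_samples=1000)
```

Working on indices keeps the split independent of how the images are stored; `np.array_split` tolerates sizes that are not multiples of the number of clients, so local datasets differ by at most one sample.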

B. FEDERATION MODELLING
By the very nature of the proposed consensus-based algorithm FedLCon, the topology of the considered network plays an important role. As pointed out in Section III, the algorithm is developed under the hypothesis of a connected and undirected consensus graph. The network topology that we consider is depicted in Figure 3, which shows that the algorithm is tested on a sparse communication graph over which a limited number of information exchanges are allowed. The resulting number of steps for each consensus round is $n = 10$.

C. SIMULATION 1 -ORIGINAL DATA
In this simulation, we want to evaluate whether the federation is able to solve the considered problem in the absence of any training disturbances. In fact, given the relatively small size of the dataset, the federated clients may in principle be subject to overfitting. Figures 4 and 5 show that the two proposed algorithms exhibit a similar performance across all the communication rounds, with AdaFed converging slightly faster to the final value. We remark that, in federations with balanced and IID data, AdaFed is expected to perform very similarly to FedAvg, as its dynamic weight update has a greater effect on uneven data distributions [2]. For benchmarking purposes, in Figure 4 we also include a dashed line that represents the performance attained after 60 epochs by a single, centralized server that trains the same neural network on the entirety of the data. This comparison highlights how the decentralized FL setting shows only a slight performance decrease even when compared to a fully centralized, non-federated solution.

D. SIMULATION 2 - GAUSSIAN NOISE
In this simulation, the presence of two clients with poor quality data is simulated by adding Gaussian noise to their images. In particular, we choose to corrupt the images of clients 1 and 7 by adding a Gaussian noise perturbation, with mean µ = 0 and variance σ² = 1, to each of the pixels of their images. Having to deal with perturbed data, AdaFed starts to show its robustness compared to FedAvg, demonstrating a better performance starting from the third communication round, as shown in Figures 7 and 8. This performance advantage is due to AdaFed automatically giving a lower weight, in the model averaging procedure, to the clients with corrupted data (lines 10 and 11 of Algorithm 4), effectively limiting their negative contribution to the federation training.
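The perturbation applied to the two low-quality clients can be sketched as follows, using only the Python standard library. Images are represented as nested lists of pixel values, client identifiers are 1-based as in the text, and the function names and the fixed seed are assumptions made for the example. The pixel scale and any clipping after the perturbation are not specified here, so the sketch leaves the noisy values unclipped.

```python
import random


def add_gaussian_noise(image, mean=0.0, std=1.0, rng=None):
    """Return a copy of `image` (a list of pixel rows) with i.i.d. Gaussian noise added."""
    if rng is None:
        rng = random.Random()
    return [[pixel + rng.gauss(mean, std) for pixel in row] for row in image]


def corrupt_clients(client_images, noisy_clients=(1, 7), seed=0):
    """Perturb every image of the selected clients and leave the others untouched.

    Hypothetical helper: `client_images` maps a client id to its list of images.
    """
    rng = random.Random(seed)
    corrupted = {}
    for cid, images in client_images.items():
        if cid in noisy_clients:
            corrupted[cid] = [add_gaussian_noise(img, rng=rng) for img in images]
        else:
            corrupted[cid] = images
    return corrupted


if __name__ == "__main__":
    # Seven clients, each holding one 4x4 grey image (toy data).
    clean = {cid: [[[0.5] * 4 for _ in range(4)]] for cid in range(1, 8)}
    noisy = corrupt_clients(clean)
    print(noisy[1][0][0])
```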

E. SIMULATION 3 - LABEL SWAP
In this simulation, a malicious attack affecting clients 1 and 7 is introduced by inverting the labels of all their data. From Figure 8 it can be noted how both FedAvg and AdaFed are similarly affected in terms of final accuracy. In fact, the extent of the attack (almost a third of the available data) removes a significant portion of information from the knowledge base available to the federation. Nevertheless, AdaFed demonstrates a better resilience to this kind of scenario (Figure 9), as the corrupted clients are entirely removed from the model averaging procedure.
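The label-swap attack and the share of data it affects can be illustrated with a short sketch. Labels are assumed to be encoded as 0/1, client identifiers are 1-based as in the text, and the function names are hypothetical. With the training data evenly divided among seven clients, two attacked clients hold 2/7 ≈ 0.29 of the training samples, which matches the "almost a third" mentioned above.

```python
def invert_labels(labels):
    """Swap the two classes of a binary label list (0 <-> 1)."""
    return [1 - y for y in labels]


def attack_clients(client_labels, malicious_clients=(1, 7)):
    """Invert all labels of the malicious clients and copy the others unchanged."""
    return {
        cid: invert_labels(labels) if cid in malicious_clients else list(labels)
        for cid, labels in client_labels.items()
    }


def corrupted_fraction(client_labels, malicious_clients=(1, 7)):
    """Fraction of training samples held by the malicious clients."""
    total = sum(len(labels) for labels in client_labels.values())
    attacked = sum(
        len(labels) for cid, labels in client_labels.items() if cid in malicious_clients
    )
    return attacked / total


if __name__ == "__main__":
    # Toy federation: seven clients with four labels each.
    labels = {cid: [0, 1, 1, 0] for cid in range(1, 8)}
    print(attack_clients(labels)[1], corrupted_fraction(labels))
```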

F. SIMULATION 4 - LABEL SWAP AND GAUSSIAN NOISE
In the final simulation we consider a more complex scenario that combines the previous situations by including both clients with poor quality data (clients 3 and 5) and malicious clients (clients 1 and 7). In this simulation the combination of label swapping and additive Gaussian noise significantly lowers the performance of FedAvg, whereas AdaFed limits its performance degradation, as depicted in Figures 10 and 11.

VI. CONCLUSION AND FUTURE WORKS
This paper presents an FL consensus-based paradigm called FedLCon, grounded in results from discrete-time average consensus theory, that is used to decentralize two FL algorithms: FedAvg and AdaFed. Decentralizing existing FL algorithms through a solution such as FedLCon enables the application of FL over federations with sparse communication graphs, further enhancing its privacy-related features. The paper presents the results attained by the tested algorithms on several scenarios for a COVID-19 detection task.
intelligence. He is also the Scientific Coordinator of the ESA-Funded Research Project ARIES, related to Wildfire Emergency Management, and Work Package Leader at the EU-Korea H2020 Project 5G-ALLSTAR. Since 2020, he has been serving as an Associate Editor for the International Journal of Control, Automation and Systems (Springer). His main research interests include network control and intelligent systems, where he has published about 50 papers in international journals and conferences.

SABATO MANFREDI was a Visiting Profes[...]neering Department, Imperial College London, London, since 2012. He has authored/coauthored more than 100 scientific publications including 18 single-author papers and the monograph: Multilayer Control of Networked Cyber-Physical Systems: Application to Monitoring, Autonomous and Robot Systems (Advances in Industrial Control Series, Springer, 2017). He has collaborated with many international universities and companies and holds a European patent. He is also a Proponent member of an Academic Spin-Off. He is also involved in a range of academic, industrial, and consulting projects. His research interests include automatic control with a special emphasis on nonlinear and complex networks, distributed control and optimization, sensor/drone networks, and new technologies/algorithms for smart city and cyber-physical systems.

DANILO MENEGATTI (Student Member, IEEE) received the master's degree in control engineering from the Department of Computer, Control, and Management Engineering "Antonio Ruberti" (DIAG), University of Rome "La Sapienza," in 2020, where he is currently pursuing the Ph.D. degree in automatic control, bioengineering and operations research. His research interests include intelligent systems, distributed learning, and reinforcement learning applications.