Introduction
In recent years, Internet of Things (IoT) and smart wearable devices have proliferated across several vertical domains (e.g., home automation systems, smart glasses, health monitors, health-fitness trackers, smart grids, etc.). These devices use a variety of wireless technologies and thus manifest different topological properties. For instance, LoRa deployments support a one-hop connection to the gateway with a distance-dependent spreading factor, manifesting a star topology. In contrast, Thread, Zigbee, and BLE Mesh manifest a mesh topology. Mesh deployment for local connectivity is gaining extra momentum with the rapid evolution of standards.
The huge amount of data that sensors and edge devices collect is critical. It can be gathered and analyzed using modern ML techniques to build classification models that learn, predict, and meet the end user’s requirements optimally. ML algorithms can enable various applications, including controlling and monitoring home appliances, controlling autonomous vehicles, and monitoring health parameters of the elderly such as heart rate and fall detection. However, the data from an edge device may carry very personal information. Thus, data privacy and security are significant challenges, as users are usually unwilling to share sensitive information and data by placing it all in one central location.
Federated Learning (FL) [1] has recently emerged to address this issue and meet participants’ essential requirements for privacy and data security. FL is a new paradigm for building models in distributed ML setups that offers the opportunity to learn a model from multiple disjoint sensitive local datasets while keeping user data private through distributed training [1], [2].
FL has attracted much attention because of its ability to preserve the privacy of the client’s data by sharing only the locally trained model parameters instead of the local data itself. Figure 1 shows a reference deployment scenario for a healthcare application. This example shows how the Community Hospital, the Research Medical Centre, and the Cancer Treatment Centre all train their local models using their local private data and share only the model parameters to create a global model in order to improve the system performance.
Thus, FL performs the aggregation and analysis of the local models and updates the models on the participants’ devices or servers without sharing any device’s data with others, thus keeping private data protected. Each participating device trains a model on its local data, usually using classification or regression algorithms (e.g., a Deep Neural Network (DNN)), and shares only the model parameters with a central server. The FL central server then aggregates the parameters of these local models to create a global model and broadcasts it to update the local models. The system iterates this procedure until it reaches a convergence state [3]. This paradigm of FL is also known as Centralized Federated Learning (CFL).
A. Motivation
The data generated by IoT devices and smartphone terminals has become essential for driving intelligent services through applications of Machine Learning (ML). Various applications ranging from healthcare to autonomous vehicles are rapidly deploying IoT solutions. It is thought that FL can offer more secure and shared services for a wider range of applications, helping to support the steady growth of distributed ML applications [4]. Although CFL systems are indeed promising, they face several limitations and challenges because they require a central FL server. Efficient communication is a critical challenge that must be addressed in FL to ensure that all participating devices are connected and that performance is not compromised [5]. In particular, CFL is hard to implement in scenarios where a reliable and robust central server is difficult to find (e.g., a fully self-driving and autonomous network). Moreover, the CFL network suffers from a single point of failure and a communication bottleneck at the central server. Therefore, this research will study a Decentralized Federated Learning (DFL) approach to overcome these challenges. In the DFL model, all neighbouring devices share their model parameters directly in a peer-to-peer manner. In this case, each device acts as a server by aggregating the parameters from its neighbours and then averaging them to update a sub-global model that can be shared with others within one hop of a communication link.
The aim of this paper is to evaluate the feasibility of implementing the DFL approach under spatio-temporal dynamics. DFL algorithms are implemented using clusters of mesh networking groups located in different environments. Our system model can be translated into two practical deployment modalities: 1) Autonomous and decentralized IoT networks: networks formed by limited-capability IoT and edge devices where decentralized FL is intrinsically required, for instance, networks formed by wearable IoT devices on battlefields or in remote hospitals. 2) Edge/fog-assisted IoT networks: hierarchical networks where the data from IoT devices is collected and processed by edge/fog gateways [6]. These gateways then form a mesh network where DFL must be implemented. A typical example of such a deployment is an agricultural IoT setup [7] or another smart city IoT application. Both cases can be translated into the system model considered in this paper. In addition, this study will implement an FL network based on the slotted ALOHA protocol as a sub-optimal MAC protocol to evaluate the performance of the network and the optimal configuration under the proposed setup.
The combination of DFL with the slotted ALOHA mesh networking protocol is proposed to preserve users’ privacy, increase the protection of confidential data, increase the prediction accuracy of the implemented algorithms, and build a robust system. This system can exploit a greater percentage of the users’ data for training a global model that improves the local models on each participating device without sharing any data with other participants except the local model parameters.
B. Key Contributions
To date, only limited studies have been conducted using FL with a central server to achieve better results. The objectives of this study are to introduce and simulate the wireless communication stage between devices in the learning process for both Centralized and Decentralized FL, based on qualitative examinations of the CFL approach with a central server and the DFL approach over a Wireless Mesh Networking (WMN) setup without a central server. Furthermore, this study simulates the communication network for the CFL and DFL models to design a robust wireless communication network during training, which helps to evaluate how the model performs under different network conditions, such as congestion and interference.
C. Paper Organization
The rest of the paper is organized as follows. In Section II, the related work on the FL approach is provided. Background and challenges for DFL over WMN are presented in Section III. Section IV introduces the system model in terms of theoretical analysis of device communication in the network and learning metrics for DFL over WMN. The learning criteria for DFL are demonstrated in Section V. The proposed framework simulation and results are summarized in Section VI. Finally, a summary of this research and future work is presented in Section VII.
Related Work
Many recent papers have investigated the fundamentals of CFL algorithms [8], [9], [10], [11], and [12]. CFL and its central-server aggregation algorithm, called Federated Averaging (FedAvg), were first proposed and implemented in [1]. The FedAvg algorithm creates a global model by averaging the aggregated parameters from the participants [1].
In [8], a comprehensive survey of CFL for mobile edge networks was presented. The authors examined the critical implementation issues with existing solutions and potential applications of CFL in IoT and mobile edge networks. In addition, some existing limitations and challenges in CFL are highlighted, such as the difficulty of aggregating sufficient data, the heterogeneous data distribution of real applications, and the theoretical analysis of device communication and convergence. The work [9] reviewed the challenges in implementing CFL, future research directions, and the existing CFL approaches. In [10] and [12], the authors surveyed the CFL implementations, devised a taxonomy, and overviewed the currently proposed solutions and their challenges in the CFL framework. They presented the essentials of preserving privacy and checking fairness in CFL.
The study in [13] conducted a thorough and comprehensive examination of the architecture, design, and deployment of FL, comparing it to centralized and distributed on-site ML-based systems. Furthermore, the challenges and potential future directions for research in FL were discussed, where some classification problems of FL topics and research fields were also presented, based on a thorough literature review, including taxonomies for its important technical and emerging aspects, such as the core system model and design, application areas, privacy and security, and resource management.
The authors in [11] examined the “In-Edge-AI” model for edge networks to allow for efficient collaboration between terminal devices and terminal servers to exchange learning model parameter updates. They explored two use scenarios: edge caching and compute offloading. To efficiently support these scenarios, they trained a double deep Q-network (DDQN) model via CFL. Lastly, the authors in [8], [9], [10], [11], and [12] addressed several existing issues in CFL for actual applications, such as the ability of mobile devices to handle a high computation load and the power consumption and battery life needed to stay connected to a central server. Furthermore, CFL raises concerns about flexibility because it may cease to function due to the aggregation server’s failure (i.e., due to a malicious assault or physical flaw). Moreover, training CFL models via IoT networks necessitates many communication resources to allow participants to communicate with a central server [14].
Most existing DFL systems are based on gossiping schemes, and the number of neighbours in the learning process is chosen regardless of communication challenges, end-user capability, and network capacity. For instance, the works in [15] and [16] implement a classic DFL algorithm that allows a user to aggregate the model parameters from an estimated number of multiple neighbouring devices, and Ramanan and Nakayama [17] propose an alternative approach that uses a blockchain-based FL scheme to aggregate updates for the participants’ devices. However, these approaches suffer from several limitations related to the communication constraints of real deployment environments, the data size, and the terminal capability (i.e., energy consumption and computation cost) of blockchain-based transactions. In summary, we list some existing works on FL-related topics alongside our paper’s contribution in Table 1.
Background and Challenges for DFL Over WMN
The implementation of the DFL approach over a WMN is supposed to reduce communication costs, cope with the single point of failure issue in CFL, and provide innovative capabilities in a range of aspects, including healthcare systems (e.g., monitoring physiological data like heart-rate variability [18] to classify various cardiac pathologies), industry, and smart homes [19], [20], [21].
In addition, the combination of DFL and WMN using mesh protocols will likely help preserve privacy, guarantee a robust network, and improve the battery life of the devices by reducing communication costs. Instead of transferring data from the device to the central server provider, which requires large bandwidth and consumes significant power on the edge device, the decentralized approach can be applied to minimize these back-and-forth journeys of data. This decentralized fashion is implemented by processing the data on the edge device and communicating with the other neighbours over the mesh networking links to exchange and update the model parameters.
This paper will simulate and implement the DFL model over WMN system protocols. The model and the system performance will be evaluated by training the model using a dataset divided into training data, validation data, and test data. The performance metrics of the algorithms will be prediction accuracy, communication cost, and latency. To date, DFL and WMN protocols have been implemented in applications only separately and individually.
This research proposes integrating DFL and WMN into one intelligent system to optimize for robust communication networks that could be applied in many IoT applications to ensure participant privacy preservation and data security. Although DFL and WMN have impressive characteristics and features, they reveal several challenges and problems faced by engineers that can influence the model’s accuracy. The following is a short summary of those limitations:
Coexistence: the network will be designed for low-power IoT devices under IEEE 802.15.4-2006, and the WMN would need to be designed to coexist with other technologies in IoT systems, not to replace them [32].
Convergence speed: DFL algorithms usually adopt a peer-to-peer single-layer architecture in which each participant collects and aggregates the local model updates of its one-hop neighbours, rather than relying on a central server as in CFL. With such a multi-hop architecture, the wireless routing paths between participants can easily become saturated, resulting in a slower convergence speed [33].
Unbalanced data: the amount of data varies across participants, resulting in different local training data quality.
Lack of stability and flexibility in communication networks with a massive number of devices in real-time applications.
The communication stage is one of the four main steps in the learning process that cannot be neglected, yet most researchers do not consider it analytically. In this paper, this stage will be addressed in detail. We propose using mesh networking to maximize the communication stage’s flexibility and the channel’s capacity during the learning process. The motivation is that DFL algorithms can efficiently update the terminal edge devices with the parameters through the Thread protocol or any other mesh networking protocol (e.g., ZigBee or Bluetooth). This will allow us to design and develop a global model that can precisely analyze the end-users’ data without sharing the data with a central server or any other device within the network. In other words, the data stays protected locally and never leaves the device itself, which achieves personalization, guarantees high Quality of Service (QoS), and enhances the performance of devices in IoT applications.
To the best of our knowledge, this research is the first work that combines DFL and WMN using the slotted ALOHA protocol. The model results verify the intuition, showing that implementing DFL over mesh networks can offer more flexibility, as no central server is required, and offers more available communication channels. More participants can be involved in the learning process in the form of neighbour groups. The rest of this paper will introduce the wireless communication characteristics for mesh networking and the DFL criteria. The wireless communication constraints will be considered, and CFL and DFL models will be implemented by simulations.
A. Traditional Machine Learning (ML) on Edge Devices
There are different kinds of ML and deep learning algorithms used in various proposals. For instance, Convolutional Neural Network (CNN) algorithms are powerful tools widely used in image classification and other classification problems [34], since CNNs have a proven ability to achieve high accuracy and can efficiently learn from thousands of image datasets. To implement CFL and DFL algorithms, the local ML algorithms must be embedded in the terminal devices (participants) to train on the local data before sharing the parameters with the server in the CFL approach or with the neighbours in the DFL approach. Details on the challenges and opportunities of ML-enabled edge devices have been addressed in many research papers in the recent past [35], [36] and are out of the scope of this paper. In this research, the CFL and DFL models will be implemented for a classification problem, and the CNN algorithm will be the main algorithm used to train the local models in the proposed system.
B. Centralized Federated Learning (CFL)
The main objective of CFL systems is to train, in coordination with a central server for model aggregation, a shared global model from participating devices that act as local learners. Figure 2 shows the fundamental CFL architecture and the main four steps to train a CFL network, where these steps are iterated until reaching the convergence status [12]:
Local learning: each edge device uses its local dataset to train the model locally and update the parameters of the ML model (e.g., neural biases and weights).
Upload model parameters to the central server: participants upload (transmit) their parameters to the central server via communication channels.
Global aggregation: the central server aggregates the local models’ parameters from those successfully received to update a new version of the global model on the server.
Download (broadcast) and synchronize the devices with the latest global model updates [14], then return to step 1 and repeat until the system converges.
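The four steps above can be sketched as a minimal simulation loop. This is an illustrative sketch only: `local_train` is a toy stand-in for the SGD-based local trainer (here a single gradient step on a least-squares objective), and models are plain NumPy weight vectors rather than DNNs.

```python
import numpy as np

def local_train(weights, data, lr=0.1):
    """Toy stand-in for local SGD (step 1): one gradient step on a least-squares loss."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(global_w, datasets, lr=0.1):
    """One CFL round: local training on each device (steps 1-2), then server-side
    averaging of the uploaded parameters (steps 3-4)."""
    local_models = [local_train(global_w.copy(), d, lr) for d in datasets]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
datasets = []
for _ in range(5):                     # five participants with private local data
    X = rng.normal(size=(50, 2))
    datasets.append((X, X @ true_w))

w = np.zeros(2)                        # initial global model broadcast by the server
for _ in range(200):                   # iterate rounds until (approximate) convergence
    w = fedavg_round(w, datasets)
```

The local data never leaves a participant; only `local_models` (the parameters) reach the averaging step, mirroring the privacy property described above.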
In the CFL process, each device’s local algorithm has an optimization technique to update the model iteratively, such as Stochastic Gradient Descent (SGD). Afterwards, the global model emerges from aggregating the local parameters from participants’ devices, which can then be weighted according to the perceived quality of the devices’ updates [34]. One crucial property of CFL is that the participant user data never transfers between devices and the server, which reduces communication costs and data-sharing privacy concerns. However, due to the central server node, the CFL system will run into scalability issues. Even if the server node’s hardware and software capabilities have been optimized, the server node’s performance will not improve when thousands of client nodes join [37]. Communication bottlenecks may appear as the amount of traffic increases exponentially and the system becomes overburdened. Furthermore, accessing a central node may not be possible in some scenarios, for instance, self-driving vehicles and high-mobility sensor systems. Decentralized architectures have recently been proposed to avoid communication bottlenecks and protect data privacy [38], [39]. These remove the centralized server; each participant communicates only with its one-hop neighbours in its local area, exchanging local models and parameter updates [40].
C. Decentralized Federated Learning (DFL)
As shown in Figure 3, the DFL framework does not need a central server to coordinate the training tasks and contains only terminal participants (nodes) [37]. The idea is that each participant exchanges parameter updates with neighbours in a peer-to-peer manner. Besides the local model algorithm, the FedAvg algorithm is employed on each terminal participant to create its own global model in the DFL approach with no central server. Each participant trains on its local data and averages its local parameters with those aggregated from the selected neighbours using the FedAvg algorithm, then broadcasts the updated global model to the neighbours again at each iteration. Afterwards, the same procedure is applied to all other participants until the system converges.
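A minimal sketch of this peer-to-peer procedure, under the assumptions of a fixed mesh topology, synchronous rounds, and a toy least-squares local trainer (`local_step` is a hypothetical stand-in for the local CNN training):

```python
import numpy as np

def local_step(w, data, lr=0.1):
    """Toy local trainer: one gradient step on a least-squares loss."""
    X, y = data
    return w - lr * X.T @ (X @ w - y) / len(y)

def dfl_round(weights, neighbours, datasets, lr=0.1):
    """One DFL round: every node trains locally, then replaces its model with the
    FedAvg of its own update and those of its one-hop neighbours (A_i + 1 models)."""
    trained = [local_step(w.copy(), d, lr) for w, d in zip(weights, datasets)]
    return [np.mean([trained[i]] + [trained[j] for j in neighbours[i]], axis=0)
            for i in range(len(weights))]

rng = np.random.default_rng(1)
true_w = np.array([0.5, 1.5])
datasets = []
for _ in range(4):                                   # four mesh participants
    X = rng.normal(size=(40, 2))
    datasets.append((X, X @ true_w))
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a 4-node line mesh
weights = [np.zeros(2) for _ in range(4)]
for _ in range(300):                                 # iterate until consensus
    weights = dfl_round(weights, neighbours, datasets)
```

No node ever sees another node’s dataset, only its parameters; with a connected topology the nodes reach consensus despite each averaging only over one hop.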
In [41], the Combo algorithm was proposed, in which the participants send and average a segment of the model parameters to reduce the required communication bandwidth without affecting the system performance or convergence rate. Even though the proposed DFL algorithms overcome some of the problems associated with general FL systems that require a central server, they use model averages to fuse models at the local clients, which is not always very efficient in heterogeneous data scenarios. For each client, the local model parameters are updated toward that client’s local optimum, so averaging the parameters across clients yields an average of the clients’ local optima. The optimum of each participant’s loss function may be quite far from the others’, and also far from the global optimum, because different participants own different sets of training data. Those datasets typically have different distributions or even no overlap, which is defined as data heterogeneity [42], [43], [44].
Furthermore, most prior works do not consider the wireless environment in the communication network, so they do not account for wireless impairments caused by channel fading, link blockages, and wireless interference. Most researchers choose the number of participants by estimation without considering the communication medium constraints, which is not always efficient in terms of reliability and flexibility for real-time applications (e.g., AV and UAV networks).
Therefore, this study will focus on designing a model based on the gossip method [45] that can deal with decentralized approaches and heterogeneous datasets, and it will precisely analyze the communication process between the participants in the network, considering the interference and the noise in the transmission medium to simulate a realistic scenario of the communication links between the terminal devices throughout the learning process.
D. Wireless Mesh Networking (WMN)
Nowadays, many access points have overlapping coverage areas, and almost every traditional wireless network has to be connected to the wired network. In this scenario, installing IoT devices is costly and extremely difficult. Thus, a WMN benefits from its flexibility to connect the devices within the network and offers a different perspective than non-mesh networks. The connectivity needs in wireless mesh networks are reduced mainly because the devices within the network have multi-route capability to send and receive packets. In addition, the WMN has a range of advantages such as self-healing and self-organizing, attracting a vast number of investigations and research developments [46]. Furthermore, WMN can reduce the networking cost for innovative home applications using low-profile hardware.
The routing protocols contribute significantly to WMNs as they help find the best path across multi-hop networks in unreliable wireless media. The WMN protocols have been widely investigated to achieve higher throughput, low latency, and low power consumption [46], [47]. According to related research, ZigBee, Bluetooth Low Energy (BLE), Z-Wave, and Thread are the most common protocols used in wireless mesh-networking applications. Each protocol has its own specification for particular implementations depending on the user requirements. For instance, the Z-Wave protocol is advantageous for long-range coverage because of its low-frequency band (900 MHz) compared to others in the 2.4 GHz band (i.e., Bluetooth and Thread). According to [48], all protocols achieve similar performance (i.e., latency and throughput) for small networks and small payloads. By contrast, for large mesh networks with multi-hop paths between the transmitter and the receiver, the Thread protocol achieves better performance in terms of latency and efficiency.
System Model
A. Communication in WMN
In this paper, a typical receiver is considered that is connected to a corresponding desired transmitter. A Rayleigh fading channel is adopted for the small-scale fading model and complemented with a single-slope large-scale path-loss. Hence, the received power at the typical receiver from the desired transmitter is $P_{k}h_{k0}d_{k0}^{-\alpha }$, where $P_{k}$ is the transmit power, $h_{k0}$ is the Rayleigh fading power gain, $d_{k0}$ is the transmitter-receiver distance, and $\alpha$ is the path-loss exponent.
The signal can be correctly decoded at the typical receiver if the corresponding SINR (Signal to Interference plus Noise Ratio) exceeds a certain threshold $T_{k}$: \begin{equation*} P(SINR\ge T_{k}) =P\left ({\frac {P_{k}h_{k0}d_{k0}^{-\alpha }}{\sum _{i\in \varphi }I_{i}+N_{0}}\ge T_{k}}\right ) \tag{1}\end{equation*} where $I_{i}$ is the interference received from device $i$ in the set of interferers $\varphi$ and $N_{0}$ is the thermal noise power.
The proposed IoT network is assumed to have a small thermal noise power $N_{0}$ relative to the aggregate interference; hence the SINR can be approximated by the SIR (Signal to Interference Ratio): \begin{align*} P(SIR\ge T_{k}) & \cong P\left ({\frac {P_{k}h_{k0}d_{k0}^{-\alpha }}{\sum _{i\in \varphi }I_{i}}\ge T_{k}}\right ) \tag{2}\\ & \cong P\left ({h_{k0}\ge \frac {T_{k}d_{k0}^{\alpha }\sum _{i\in \varphi }I_{i}}{P_{k}}}\right ). \tag{3}\end{align*}
Since the proposed IoT device network has Rayleigh fading channels, the power gain $h_{k0}$ is exponentially distributed with unit mean; hence \begin{align*} P(SIR\ge T_{k}) & \cong \mathbb {E}_{a_{i}}\left ({\mathbb {E}_{h_{i0}}\left({\int _{\frac {T_{k}d_{k0}^{\alpha }\sum _{i\in \varphi }I_{i}}{P_{k}}}^{+\infty }e^{-x}\,dx}\right)}\right ) \tag{4}\\ & \cong \mathbb {E}_{a_{i}}\left ({\mathbb {E}_{h_{i0}}\left({\text {exp}\left({-\frac {T_{k}d_{k0}^{\alpha }\sum _{i\in \varphi }I_{i}}{P_{k}}}\right)}\right)}\right ) \tag{5}\\ &\cong \mathbb {E}_{a_{i}}\left ({\mathbb {E}_{h_{i0}}\left({\prod \nolimits _{i\in \varphi }\text {exp}\left({-\frac {T_{k}d_{k0}^{\alpha }I_{i}}{P_{k}}}\right)}\right)}\right ) \tag{6}\\ & \cong \mathbb {E}_{a_{i}}\left ({\mathbb {E}_{h_{i0}}\left({\prod \nolimits _{i\in \varphi }\text {exp} \left({-\frac {T_{k}d_{k0}^{\alpha }P_{i}h_{i0}d_{i0}^{-\alpha }a_{i}}{P_{k}}}\right)}\right)}\right ) \tag{7}\end{align*}
The Bernoulli distribution property of the activity indicators $a_{i}$ (each device transmits with probability $P_{A}$) can be used to simplify (7).
Consequently, the probability of successful transmission for the participant devices within the network area can be explicitly obtained as \begin{equation*} P(SIR\ge T_{k}) \cong \prod _{i\in \varphi } \left ({1-P_{A}+\frac {P_{A}}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^{\alpha }}}\right)\gamma _{ki}} }\right ) \tag{8}\end{equation*} where $\gamma _{ki}=P_{i}/P_{k}$ is the interferer-to-desired transmit power ratio.
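Eq. (8) can be checked numerically. The sketch below (with assumed example values for $\alpha$, $T_{k}$, $P_{A}$, distances, and powers) evaluates the closed form and compares it with a Monte Carlo simulation of Rayleigh fading and Bernoulli ALOHA activity:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, Tk, PA = 4.0, 1.0, 0.3          # path-loss exponent, SIR threshold, ALOHA prob. (assumed)
d_k0, Pk = 1.0, 1.0                    # desired-link distance and transmit power (assumed)
d_i0 = np.array([2.0, 3.0, 4.0])       # interferer distances to the typical receiver (assumed)
Pi = np.array([1.0, 1.0, 1.0])         # interferer transmit powers (assumed)
gamma = Pi / Pk

# Closed form, eq (8): product over interferers
p_closed = np.prod(1 - PA + PA / (1 + Tk * (d_k0 / d_i0) ** alpha * gamma))

# Monte Carlo: exponential power gains (Rayleigh fading) and Bernoulli(PA) activity
n = 200_000
h0 = rng.exponential(size=n)                        # desired-link fading gain
hi = rng.exponential(size=(n, len(d_i0)))           # interferer fading gains
ai = rng.random((n, len(d_i0))) < PA                # slotted-ALOHA activity indicators
I = (Pi * hi * d_i0 ** -alpha * ai).sum(axis=1)     # aggregate interference
p_mc = np.mean(h0 >= Tk * d_k0 ** alpha * I / Pk)   # empirical P(SIR >= T_k)
```

The closed form and the Monte Carlo estimate should agree to within the sampling error of the simulation.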
B. Achievable Transmission Capacity Over Slotted-ALOHA
The ALOHA protocol is a class of fully decentralized MAC protocols [50] that performs neither carrier sensing nor collision avoidance. The slotted-ALOHA protocol was introduced to enhance the utilization of the shared communication medium and reduce the chances of collisions for multiple transmitting devices by synchronizing the transmissions of devices to the beginning of discrete timeslots.
In the WMN, the probability of each device transmitting a packet to its neighbours in a given timeslot is $P_{A}$. The achievable transmission capacity over slotted ALOHA is then \begin{equation*} C_{ALOHA} =P_{A}(1-P_{A})\text {log}(1+ SIR)P(SIR\ge T_{k}). \tag{9}\end{equation*}
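A small sketch of evaluating (9) over a grid of access probabilities, with the interference coefficients and the logarithm base (here base 2, i.e., capacity in bits) assumed for illustration:

```python
import numpy as np

def success_prob(PA, c):
    """Eq (8) in product form; c[i] = T_k * (d_k0/d_i0)^alpha * gamma_ki."""
    return np.prod(1 - PA + PA / (1 + c))

def aloha_capacity(PA, Tk, c):
    """Eq (9): slotted-ALOHA transmission capacity for access probability PA,
    evaluated at the threshold SIR T_k (log base 2 assumed)."""
    return PA * (1 - PA) * np.log2(1 + Tk) * success_prob(PA, c)

c = np.array([0.0625, 0.0123, 0.0039])   # assumed interference coefficients
grid = np.linspace(0.01, 0.99, 99)
caps = [aloha_capacity(p, 1.0, c) for p in grid]
best = grid[int(np.argmax(caps))]        # capacity-maximising access probability
```

With weak interference the maximizer sits just below $P_{A}=0.5$: the $P_{A}(1-P_{A})$ factor alone peaks at 0.5, and the decreasing success probability pulls the optimum slightly lower.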
From (8), the outage probability must not exceed a target value $\theta _{k}$, giving the constraint \begin{equation*} 1-\prod \nolimits _{i\in \varphi }\left ({1-P_{A}+\frac {P_{A}}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^{\alpha }}}\right)\left({\frac {P_{i}}{P_{k}}}\right)}}\right )\le \theta _{k}. \tag{10}\end{equation*}
To simplify (9), the natural logarithm is applied, and the maximum achievable transmission capacity with respect to $P_{A}$ is obtained from \begin{align*} \mathrel {\mathop {\mathrm {arg\,max}}\limits ^{}_{P_{A}}}f(P_{A}) &=\mathrm {arg\,max}\biggl (\text {ln}(P_{A})+\text {ln}(1-P_{A}) \\ &\quad +\,\text {ln}(\text {log}(1+T_{k})) \\ &\quad +\,\sum _{i\in \varphi }\text {ln}\left({1-P_{A} +\frac {P_{A}}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^{\alpha }}}\right)\gamma _{ki}}}\right)\biggr ) \tag{11}\\ \textrm {s.t.}~\varepsilon _{k} &\le \sum _{i\in \varphi }\text {ln}\left ({1-P_{A}+\frac {P_{A}}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^{\alpha }}}\right)\gamma _{ki}}}\right ) \tag{12}\end{align*} with $\varepsilon _{k}\triangleq \text {ln}(1-\theta _{k})$ following from (10).
Applying the first-order Taylor series expansion $\text {ln}(1-x)\cong -x$ to the constraint (12) gives \begin{equation*} \varepsilon _{k}\le \sum _{i\in \varphi }\left ({-P_{A}\left({1-\frac {1}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^{\alpha }}}\right)\gamma _{ki}}}\right)}\right ). \tag{13}\end{equation*}
Applying the same approximation to the objective (11) yields \begin{align*} \mathrel {\mathop {\mathrm {arg\,max}}\limits ^{}_{P_{A}}}f(P_{A}) &= \mathrel {\mathop {\mathrm {arg\,max}}\limits ^{}_{P_{A}}}\biggl (\text {ln}(P_{A})+\text {ln}(1-P_{A}) \\ &\quad +\,\text {ln}(\text {log}(1+T_{k})) \\ &\quad +\,\mathop {\sum }\limits _{i\in \varphi }\left ({-P_{A}+\frac {P_{A}}{1+T_{k}\left({\frac {d_{k0}^{\alpha }}{d_{i0}^ {\alpha }}}\right)\gamma _{ki}}}\right )\biggr ). \tag{14}\end{align*}
Letting $f_{2}\triangleq \sum _{i\in \varphi }\left({1-\frac {1}{1+T_{k}\left({d_{k0}^{\alpha }/d_{i0}^{\alpha }}\right)\gamma _{ki}}}\right)$, this becomes \begin{align*} \mathrel {\mathop {\mathrm {arg\,max}}\limits ^{}_{P_{A}}}f(P_{A}) & = \mathrel {\mathop {\mathrm {arg\,max}}\limits ^{}_{P_{A}}}\bigl (\text {ln}(P_{A})+\text {ln}(1-P_{A}) \\ & \quad +\,\text {ln}\left ({\text {log}(1+T_{k})}\right )-P_{A}f_{2}\bigr ). \tag{15}\end{align*}
Differentiating (15) with respect to $P_{A}$ (the term $\text {ln}(\text {log}(1+T_{k}))$ is constant) gives \begin{equation*} \frac {\partial f(P_{A})}{\partial P_{A}} =\frac {1}{P_{A}}-\frac {1}{1-P_{A}}-f_{2}.\end{equation*}
The transmission probability under the slotted ALOHA protocol at which this derivative vanishes, i.e., the capacity-maximising operating point, is \begin{equation*} P_{A}(0) =\frac {f_{2}+2-\sqrt {f_{2}^{2}+4}}{2f_{2}}.\end{equation*}
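This closed form can be sanity-checked numerically, e.g., confirming that it matches a grid search over the approximated objective in (15) and that it tends to $1/2$ as $f_{2}\to 0$ (an interference-free network):

```python
import numpy as np

def optimal_pa(f2):
    """Root of 1/PA - 1/(1-PA) - f2 = 0 in (0, 1): the capacity-maximising
    slotted-ALOHA transmission probability."""
    return (f2 + 2 - np.sqrt(f2 ** 2 + 4)) / (2 * f2)

def f(PA, f2):
    """The Taylor-approximated objective ln(PA) + ln(1-PA) - PA*f2 from eq (15)
    (the constant ln(log(1+T_k)) term is dropped, as it does not affect the argmax)."""
    return np.log(PA) + np.log(1 - PA) - PA * f2

f2 = 0.5                                         # assumed interference level
pa_star = optimal_pa(f2)                         # closed-form optimum
grid = np.linspace(0.01, 0.99, 981)
numeric = grid[int(np.argmax(f(grid, f2)))]      # brute-force optimum on a 0.001 grid
```

As interference grows ($f_{2}$ larger) the optimal access probability shrinks below 0.5, which matches the intuition that congested meshes should transmit less aggressively.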
A special case is adopted in which the system’s outage probability is set according to its SINR threshold value to find the maximum transmission capacity based on the slotted ALOHA protocol for the wireless mesh network. Then (9) can be reformulated to restate the maximum achievable transmission capacity as follows:\begin{align*} \text {max}(C_{ALOHA}) &=P_{A}(1-P_{A})\text {log}(1+T_{k})(1-\theta _{k}), \\ &=P_{A}(1-P_{A})\text {log}(1+T_{k}) \\ &\quad \times \prod \nolimits _{i\in \varphi }\left ({1-P_{A} +\frac {P_{A}}{1+T_{k}\left({d_{k0}^{\alpha }/d_{i0}^{\alpha }}\right)\gamma _{ki}}}\right ). \tag{16}\end{align*}
From this analysis, intuition is provided as to what extent the network parameters (e.g., the access probability $P_{A}$, the SIR threshold $T_{k}$, and the interferer distances and powers) determine the achievable transmission capacity of the mesh network.
C. Latency
The required time to achieve a convergence state in any FL model plays a key role in evaluating the system performance. Therefore, the average learning latency for the participants will be one of the performance metrics. Latency in the proposed centralized and decentralized FL is defined as the expected time duration (in seconds) required for the model to complete learning in a typical one-hop mesh communication network. Let $T_{cmm}^{(s_{i})}$ be the time to exchange parameters with neighbour $s_{i}$, $T_{computation}$ the local training time, $T_{broadcast}$ the time to broadcast the updated model, and $R$ the number of learning rounds; the total latency is then \begin{equation*} T_{total} =R\sum _{s_{i}=1}^{A_{i}}T_{cmm}^{(s_{i})}+R(T_{computation}+T_{broadcast}). \tag{17}\end{equation*}
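Eq. (17) is straightforward to compute; a toy example with assumed timings:

```python
def total_latency(R, t_cmm, t_comp, t_bcast):
    """Eq (17): R rounds of (per-neighbour communication + computation + broadcast)."""
    return R * (sum(t_cmm) + t_comp + t_bcast)

# Assumed values: 10 rounds, 3 neighbours at 0.2 s each,
# 1.5 s of local training, 0.3 s broadcast.
latency = total_latency(10, [0.2, 0.2, 0.2], 1.5, 0.3)  # 10 * (0.6 + 1.5 + 0.3) = 24.0 s
```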
The summation of the communication process times $T_{cmm}^{(s_{i})}$ over the $A_{i}$ neighbours, together with the computation and broadcast times, is thus accumulated over the $R$ learning rounds.
D. Accuracy and Loss
Accuracy and loss are the two main model metrics applied to adjust the model weights during training and to measure system performance, in order to optimize a model (e.g., a convolutional neural network) and solve, for example, a face recognition problem. Accuracy is calculated as follows:\begin{equation*} \text {Accuracy} =\frac {\mathrm {Number~of~Correct~Predictions }}{\mathrm {Total~Number~of~Predictions}}. \tag{18}\end{equation*}
Loss measures the difference between the actual output value and the value predicted by the implemented model. In classification models whose outputs are arrays of probability values between 0 and 1, the most common loss function is the cross-entropy loss, also known as logistic loss, log loss, or logarithmic loss. The probability of each predicted value is weighed against the actual desired output (0 or 1), and the loss measures how far each sample’s prediction is from the expected value: large differences incur a large loss and minor differences a small one, so an overall cross-entropy loss of 0 means the model is perfect. The cross-entropy loss function is defined as:\begin{align*} Loss & =-\sum _{j=1}^{m}\sum _{i=1}^{n}y_{(i,j)}\text {log}(p_{(i,j)}),{} \tag{19}\\ & \text {for $n$ classes and $m$ samples.}\end{align*}
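Both metrics can be sketched in a few lines of NumPy (note the per-sample mean used here differs from the plain sum in (19) only by the constant factor $m$; the labels and probabilities are made-up illustrative values):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Eq (18): fraction of correct class predictions."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def cross_entropy(y_onehot, p):
    """Eq (19) averaged per sample: -mean over samples of sum over classes of y*log(p)."""
    eps = 1e-12                       # guard against log(0)
    return -np.mean(np.sum(y_onehot * np.log(p + eps), axis=1))

y_true = np.array([0, 1, 2, 1])       # assumed true classes for 4 samples, 3 classes
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6],
              [0.3, 0.4, 0.3]])       # assumed predicted class probabilities
acc = accuracy(y_true, p.argmax(axis=1))
loss = cross_entropy(np.eye(3)[y_true], p)
```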
Therefore, the objective is almost always to maximize the accuracy and minimize the loss of the FL models, as with any other learning model.
The Learning Criterion for DFL
The designed system considers a group of
Similarly, let
There are
All participants’ devices must run the same machine-learning model (e.g., a CNN), sharing a common weight parameter matrix $\textbf {W}$. The global learning objective is\begin{equation*} \mathrel {\mathop {\mathrm {arg\,min}}\limits ^{}_{\textbf {W}}}F(\textbf {W}) \triangleq \frac {1}{A_{i}}\sum _{(i\in \varphi )}f_{i}(\textbf {W}) \tag{20}\end{equation*}
\begin{equation*} s_{i} =1,2,\ldots,A_{i},\quad \forall \,\,\,SINR\thinspace \,\ge \thinspace \,T_{k}.\end{equation*}
The local loss function for the $i$-th participant is\begin{equation*} f_{i}(\textbf {W}) =\frac {1}{M}\sum _{m=1}^{M}l\bigl (h_{\textbf {W}}(X_{i}^{m}),y_{i}^{m}\bigr ), \tag{21}\end{equation*} where $m=1,2,\ldots,M$ indexes the $M$ local data samples.
At the
In the CFL models, there are two optimization levels. First, at the local-model level, the local optimizer in each device updates the local parameters based on the local dataset, typically using the common Stochastic Gradient Descent (SGD) algorithm. Second, at the global-model level, a global optimizer uses the parameters aggregated from the participants (i.e., the Federated Averaging (FedAvg) algorithm) to create and update the global model.
Each participant $i$ computes its local gradient and updates its weights as\begin{align*} {\nabla f}_{i}(\textbf {W}_{i}^{t})&=\frac {1}{M}\sum _{m=1}^{M}\nabla l\bigl (h_{\textbf {W}_{i}^{t}}(X_{i}^{m}),y_{i}^{m}\bigr ), \tag{22}\\ &\quad {{~{{\forall m=1,\ldots,M \text { and }i\in \varphi }}}} \\ \textbf {W}_{i}^{t}&:=\textbf {W}_{i}^{t}-\eta _{i}\nabla f_{i}(\textbf {W}_{i}^{t}). \tag{23}\end{align*}
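As an illustration of the local update in (22) and (23), the sketch below applies scalar SGD to a toy linear model with squared loss; the model, loss, and learning rate are stand-ins for the CNN and cross-entropy loss used in the paper.

```python
# Minimal sketch of the local update in (22)-(23) for a linear model
# h_W(x) = w * x with squared loss l = 0.5 * (w*x - y)^2. The gradient
# is averaged over the M local samples, then the weights take one SGD
# step of size eta, as in (23).

def local_gradient(w, samples):
    # (22): average gradient of the loss over the local dataset
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def sgd_step(w, samples, eta=0.1):
    # (23): W := W - eta * grad f_i(W)
    return w - eta * local_gradient(w, samples)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, so the optimum is w = 2
w = 0.0
for _ in range(50):
    w = sgd_step(w, data)
print(round(w, 3))
```

After 50 local steps the weight converges to the optimum of the toy problem, mirroring how each participant's local optimizer drives its own loss down before parameters are exchanged.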
Here, $\eta _{i}$ denotes the learning rate used by participant $i$.
The total number of local training
For all participants, the updates are performed simultaneously: each participant receives its neighbours’ weights and gradients and averages them with its own local weights and gradients.
In the next step, each participant in the network aggregates the successfully received local parameters from its neighbours. These averaged weights are then applied to the local model using the device dataset:\begin{align*} \hat {\textbf {W}}_{i}^{t} & =\frac {1}{A_{i}+1}\left({\textbf {W}_{i}^{t}+\sum _{(s_{i}=1)}^{A_{i}}\hat {\textbf {W}}_{s_{i}}^{t}}\right) \tag{24}\\ \hat {\nabla }f(\hat {\textbf {W}}_{i}^{t}) & =\frac {1}{M}\sum _{(m=1)}^{M}\hat {\nabla }l\left ({h_{\hat {\textbf {W}}_{i}^{t}}(X_{i}^{m}),y_{i}^{m}}\right ); \tag{25}\\ \hat {\textbf {W}}_{i}^{t} & =\textbf {W}_{i}^{(t+1)}. \tag{26}\end{align*}
These new parameters are then shared with the neighbours again, and the process repeats until the averaged gradient falls below the convergence bound $\varepsilon _{k}$:\begin{align*} \mathrel {\mathop {\mathrm {arg\,min}}\limits ^{}_{\textbf {W}}}\nabla F_{i}(\textbf {W})&\triangleq \biggl [\frac {1}{A_{i}+1}\Bigl (\hat {\nabla }f_{i}(\hat {\textbf {W}}_{i}^{t}) \\ &\quad +\,\sum _{s_{i}=1}^{A_{i}}\hat {\nabla }f_{s_{i}}(\hat {\textbf {W}}_{s_{i}}^{t})\Bigr )\biggm |A_{i}\ge 1\biggr ]\le \varepsilon _{k}. \tag{27}\end{align*}
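The aggregation step (24) and the convergence test (27) can be sketched with scalar weights as follows; all numerical values are illustrative.

```python
# Sketch of the neighbour aggregation in (24) and the convergence test
# in (27). Each device averages its own parameters with those of its
# A_i one-hop neighbours; learning stops once the averaged gradient
# magnitude drops below the bound epsilon_k. Scalars stand in for the
# full weight matrices for simplicity.

def aggregate(own_w, neighbour_ws):
    # (24): hat(W)_i = (W_i + sum of neighbour weights) / (A_i + 1)
    return (own_w + sum(neighbour_ws)) / (len(neighbour_ws) + 1)

def converged(own_grad, neighbour_grads, epsilon):
    # (27): averaged gradient must fall below epsilon_k (requires A_i >= 1)
    avg = (own_grad + sum(neighbour_grads)) / (len(neighbour_grads) + 1)
    return abs(avg) <= epsilon

w_hat = aggregate(1.0, [0.8, 1.2, 1.0])
print(w_hat)
print(converged(0.02, [-0.01, 0.005], epsilon=0.01))
```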
The participants communicate and exchange the parameters and the values of the system loss function over a wireless mesh network in a peer-to-peer manner. The number of successfully transmitting devices (participants) is subject to the wireless communication constraints. Consequently, it is possible to show that every device
To sum up, the DFL process is divided into
Algorithm: Decentralized Federated Learning
All participants have initial weights with a common parameter matrix W
for each iteration t
    for each device i
        for each local training step: update the local weights using (22) and (23)
        end
        from the one-hop neighbours satisfying the communication constraints,
        receive the weights and gradients, then aggregate and average them using (24)–(26)
        If the averaged gradient satisfies the convergence bound (27):
        Yes: end process (Gradient Convergence)
        No: continue
    end
end
Simulations and Results
A. The Simulation of the Network Communication
The proposed network for the CFL and DFL approaches has many participants distributed over a two-dimensional bounded space: a large-scale circular area with radius
On the one hand, in the CFL model the central server is positioned at the central point of the two-dimensional bounded space, as illustrated in Figure 4 (a), and the participants are randomly distributed within the target area. However, successful transmission and participation in the learning process are subject to the wireless communication constraints (see section IV) to approximate real application environments.
(a) The distribution for random participants around the CFL centre server. (b) Example of three participants communicating with neighbours in DFL approach.
On the other hand, in DFL, each point (participant)
The positions of the participants are independently and randomly distributed within the network area (circular area), where the distance for each participant
The participants are assumed to have equal transmit power (0.8 Watts). Only a finite number of participants can simultaneously exchange and update their parameters under the slotted ALOHA protocol. When the given parameters (i.e., signal power, distance, and interference) were applied in the SINR equation (8), the theoretical results matched the simulation results, and both showed that the number of successfully transmitting devices decreases as the SINR threshold increases, for different user intensities in the network, as shown in Figure 5.
The simulation (markers) and theoretical results (solid lines) of the relationship between the SINR threshold and the probability of success for the participants within different network intensities.
In contrast, reducing the SINR threshold allows more participants to take part in the learning process, but the required bandwidth and the system’s latency also increase. Therefore, the trade-off between the probability of success and the SINR threshold must be examined to achieve higher throughput and capacity with acceptable latency.
Consequently, the outcomes of the proposed wireless communication model in Figure 5 confirm the conclusions of the theoretical analysis in (8) and (16): a trade-off between the probability of successful transmission and the SINR threshold is required to satisfy the FL network targets in terms of capacity and the number of users (participants) within the network during the learning process.
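The qualitative trend in Figure 5 can be reproduced with a small Monte-Carlo sketch. The receiver placement (at the area centre), path-loss exponent, noise power, and transmit probability below are assumptions made for illustration; the paper's exact SINR expression is the one given in (8).

```python
# Monte-Carlo sketch of the success-probability trend in Figure 5:
# devices on a disk transmit with probability p_tx (slotted ALOHA) and
# succeed when their SINR at a centre receiver exceeds the threshold.
# Path-loss exponent, noise, and p_tx are illustrative assumptions.
import math, random

def success_probability(n_devices, radius, threshold_db, p_tx=0.5,
                        power=0.8, alpha=4.0, noise=1e-9, trials=2000):
    threshold = 10 ** (threshold_db / 10)
    successes = samples = 0
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        # uniform positions on a disk of the given radius
        pos = []
        for _ in range(n_devices):
            r = radius * math.sqrt(rng.random())
            t = 2 * math.pi * rng.random()
            pos.append((r * math.cos(t), r * math.sin(t)))
        tx = [rng.random() < p_tx for _ in range(n_devices)]
        rx = [power * max(math.hypot(x, y), 1.0) ** -alpha if on else 0.0
              for (x, y), on in zip(pos, tx)]
        total = sum(rx)
        for s, on in zip(rx, tx):
            if not on:
                continue
            samples += 1
            if s / (noise + (total - s)) >= threshold:  # SINR test
                successes += 1
    return successes / max(samples, 1)

# Success probability should fall as the SINR threshold rises.
low  = success_probability(20, 50.0, threshold_db=-10)
high = success_probability(20, 50.0, threshold_db=0)
print(low > high)
```

Because the random seed is fixed, the same channel realizations are scored against both thresholds, which isolates the effect of the threshold itself.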
B. CFL Setup and Simulation
The simulation settings are proposed for implementing an FL with a central server deployed in the centre of the target disk area with a radius
To simplify the simulation, all participants are assumed to have the same transmit power and are randomly distributed around the centre within the network area. The system is trained using the Python and TensorFlow frameworks on the well-known MNIST dataset, with a local algorithm on the edge devices performing digit recognition from handwritten images. While this is a simple and well-understood problem, it is used here to illustrate the principle. The MNIST dataset contains 60K labelled image samples, each with a size of 28 × 28 pixels.
Low mobility scenarios, where the number of successfully transmitting participants in the system can remain the same across all iterations until the system converges.
High mobility scenarios, where participants move over a wide geographic area and many participants join and leave the network during the learning process, such as in autonomous vehicle and Unmanned Aerial Vehicle (UAV) networks. Thus, the number of successful transmissions usually differs in each learning iteration.
This simulation implements the first case, the low mobility scenario, leaving the second case for future work. In both cases, the number of successfully transmitting participants is subject to the communication network constraints (described in section IV) and to the network parameters in terms of device intensity and participant distribution.
Based on the proposed wireless communication model in VI-A, the system is allocated the target SINR threshold (−10 dB) in order to achieve a higher transmission capacity; the probability of a participant being in transmit mode
Consequently, the total number of participants within the target area is 80 devices, of which 65 transmitted successfully and took part in the CFL process.
In this study, the CFL with a central server is designed by using the FedAvg algorithm as a global model optimizer. The main function of the server algorithm is to aggregate and average the participants’ parameters to update the new global model at each iteration and then measure the accuracy and the loss of the model outcomes.
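The server-side FedAvg step can be sketched as a simple element-wise average. Since each participant here holds the same number of samples (1000), the sample-size weighting of the full FedAvg algorithm reduces to a plain mean; plain lists stand in for the CNN weight tensors used in the actual simulation.

```python
# Minimal sketch of the server-side FedAvg aggregation: the server
# averages the weight vectors received from the successfully
# transmitting participants to form the new global model. With equal
# local dataset sizes, weighted FedAvg reduces to a plain mean.

def fedavg(client_weights):
    n = len(client_weights)
    # element-wise mean across all clients' weight vectors
    return [sum(ws) / n for ws in zip(*client_weights)]

# Three participants report their locally trained weight vectors.
updates = [
    [0.1, 0.2, 0.3],
    [0.3, 0.2, 0.1],
    [0.2, 0.2, 0.2],
]
print(fedavg(updates))
```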
The evaluation of the CFL network’s accuracy and loss used the cross-entropy cost function. The simulated system in subsection VI-A has 65 successfully transmitting participants under the communication constraints of section IV.
It can be noted from Figure 6 that the procedure moved rapidly towards the global minimum in the first 50 epochs, with the accuracy increasing from 17% to 90%. After 200 iterations, the model reached the convergence state with an accuracy of 98.1% and a loss of 0.15. The latency of the model, i.e., the time required to reach the predefined convergence bound, was 1850 seconds. Despite the constraints applied to the proposed CFL model to build a robust network reflecting realistic communication conditions, the model converged faster and achieved slightly higher accuracy and lower loss than FL baselines that do not consider communication constraints in practice. In other words, such baselines assume a fixed number of participants in the learning process without considering network challenges, which is not reliable in real applications. In contrast, our proposed CFL model selects participants based on the communication model and considers the constraints of real application environments, obtaining a trustworthy and robust network while achieving high accuracy and low loss.
Although the system achieved high accuracy and reached convergence after around 200 iterations in the designed simulation, the central server is a potential single point of failure. In practice, the system could go down completely if the central server fails or the link to the server is blocked (a communication bottleneck).
C. DFL Setup and Simulation
The simulation set-up uses the Python and TensorFlow frameworks to implement and evaluate the DFL system over a WMN for IoT devices. For an accurate comparison, we retain the previous CFL design in terms of the communication link constraints, the model optimizers and their parameters, low (fixed) mobility, and the same total number of participants in the learning process, but without a central server. The system was evaluated using the performance metrics of validation accuracy, latency, and convergence speed. The simulation settings are listed as follows:
Training settings
The participants trained a classification CNN model on the MNIST dataset. As in CFL, each participant has 1000 random samples to train the local model, but in this DFL approach the parameters are shared directly with neighbours to create and update the global model at each device, without needing a central server.
The local models of each participant were trained and updated using a stochastic gradient descent (SGD) optimizer with the same hyper-parameters as in the CFL simulation: a learning rate of 0.01 and a batch size of 32.
Furthermore, the designed system also used two other local-model optimizer algorithms, the Adam algorithm [52] and the Root Mean Square Propagation (RMSProp) optimizer [53], to observe how the performance of the DFL model is affected by different optimization methods.
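For reference, the update rules of the three local optimizers can be written out for a single scalar parameter. The learning rate of 0.01 matches the setting above; the remaining hyper-parameter values follow common defaults and are assumptions, not values taken from the paper.

```python
# Scalar-parameter update rules for the three local optimizers compared
# in the simulation: SGD, RMSProp, and Adam. Decay rates and epsilon
# follow common defaults and are illustrative assumptions.
import math

def sgd(w, g, eta=0.01):
    return w - eta * g

def rmsprop(w, g, state, eta=0.01, rho=0.9, eps=1e-8):
    # running average of squared gradients scales the step size
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return w - eta * g / (math.sqrt(state["v"]) + eps)

def adam(w, g, state, eta=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # bias-corrected first and second moment estimates
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - eta * m_hat / (math.sqrt(v_hat) + eps)

# One step on the same gradient from the same starting point.
g, w0 = 2.0, 1.0
print(sgd(w0, g))
state_r, state_a = {}, {}
print(round(rmsprop(w0, g, state_r), 4))
print(round(adam(w0, g, state_a), 4))
```

The three rules take different step sizes for the same gradient, which is why the choice of optimizer can change the convergence speed observed later in the results.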
Network Settings
In the simulation, the network simulates the communication stage between the participants’ devices in a peer-to-peer manner over wireless mesh networking, avoiding the communication-bottleneck challenge at the central server in the CFL scenario. Each participant successfully communicates and exchanges parameters with a neighbour if the desired transmitter has an SINR above the target threshold and no collision occurs. The number of successfully transmitting participants is configured to vary between participants in an asymmetric manner. For instance, consider two participants A and B within the network: participant A can receive parameter updates from participant B, whereas participant B will not successfully obtain parameter updates from participant A unless it satisfies the network communication constraints.
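The asymmetric neighbour relation described above can be represented as a directed adjacency, as in the sketch below; the SINR table values are illustrative.

```python
# Sketch of the asymmetric neighbour relation: link success is stored
# as a directed adjacency, so A receiving from B does not imply B
# receiving from A. The SINR values (dB) below are illustrative.

def build_links(sinr, threshold):
    """sinr[i][j]: SINR at receiver i for transmitter j."""
    n = len(sinr)
    return {i: [j for j in range(n)
                if j != i and sinr[i][j] >= threshold]
            for i in range(n)}

# Receiver-side SINR table (dB) for participants A=0, B=1, C=2.
sinr_db = [
    [None, -5.0, -12.0],   # at A: B clears a -10 dB threshold, C does not
    [-11.0, None, -3.0],   # at B: A falls below the threshold
    [-9.0, -8.0, None],    # at C: both A and B succeed
]
table = [[s if s is not None else float("-inf") for s in row]
         for row in sinr_db]
links = build_links(table, threshold=-10.0)
print(links)  # A hears B, but B does not hear A
```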
Comparison Setting
The DFL system outcomes are compared with the CFL outcomes in terms of system performance. Both DFL and CFL trained their models under the same factors (i.e., the dataset, the total number of participants within the network, and the network geographic area) in order to make them comparable. Thus, the accuracy, loss, latency, and convergence speed were measured.
Simulation Results
Based on the successful transmission conditions and the network capacity, every device within the network created its one-hop neighbour group to exchange parameters and performed the DFL model training. The model results were recorded for some random participants in the DFL simulation as an example to evaluate the system behaviour.
The designed network shows that these recorded participants connected successfully with varying numbers of neighbours (those that meet the communication constraints and can exchange parameters and update their global models).
The DFL simulation showed that the classification prediction achieved about 90% accuracy within the first 20 iterations for all recorded participants with any of the three model optimizers mentioned in the training settings (SGD, Adam, and RMSProp).
The results and statistics of some participants within the network are illustrated in Figure 7, and Table 2. The results for the DFL approach without a central server show high accuracy and low cross-entropy loss in predicting the digit number from the handwritten samples after 135 iterations.
In this study, the wireless mesh networking model of section VI-A is integrated with the DFL model to determine the number of successful participants during the learning process. The system shows that each participant can communicate with particular neighbours depending on the neighbours’ locations, the desired participant’s transmit power, and interference from other devices.
The latency of each participant varies slightly, depending mainly on the number of iterations that the participant requires to reach the system convergence state. The results show that the average latency of the participants was 420 seconds, under the assumption that the computation and broadcast times are equal for all participants. The expected system convergence in the DFL model is around 130 iterations on average.
As shown in Figure 7, the system achieved sufficiently high accuracy and low loss in the data predictions. The randomly chosen participants’ records show that the participants can learn in parallel and follow similar progress toward convergence in terms of accuracy and loss. The designed system offers the benefits of the DFL framework, where the data never leaves the participants’ devices and privacy restrictions are respected. In comparison with the centralized model, the decentralized model achieved competitive results without needing a central server. For instance, participant 1 reached 98.2% accuracy and 0.088 cross-entropy loss after only 135 iterations.
In contrast, the centralized model reached the convergence state only after more than 200 iterations, with an accuracy of 98.1% and a loss of 0.15. Based on the latency criteria in subsection IV-C, CFL has higher latency than DFL, as the results show that the communication cost and the number of iterations of CFL were higher than those of the DFL approach.
Thus, DFL can reduce latency and loss and increase convergence speed, thereby outperforming CFL.
From these results, we conclude that DFL models over a WMN deliver strong classification performance given sufficient data. In this study, the DFL framework is combined with wireless mesh networking using the slotted-ALOHA protocol to improve communication between the participants during the learning process.
The DFL approach over a WMN using the slotted-ALOHA protocol can be very competitive with CFL. The simulated models show that the designed DFL system can achieve better latency, more flexibility, and similar accuracy without installing a central server in the network.
Conclusion
To sum up, DFL reduced the communication cost compared to CFL, as the participants’ devices communicate directly and exchange their parameter-update packets only with one-hop neighbours, using slotted-ALOHA as the devices’ MAC protocol over a mesh network topology. In this study, the network communication model simulated a realistic mesh networking scenario, accounting for frequency interference in the network environment, and was then combined with the DFL model to train the network efficiently and reliably. Combining the DFL framework with the mesh networking protocol yields a comprehensive improvement in model performance, making the DFL approach very competitive with CFL.
The following is a summary of this research and future work:
Analysis of the wireless communication stage between the participants during the learning process in order to simulate the real scenarios of interaction between the IoT devices, reducing the communication resources and increasing the system flexibility.
Implementation of the DFL system using the FedAvg algorithm to enforce consensus by sharing local model updates, extending established gossip methods with consensus techniques.
Emphasis on an experimental IoT setup, considering convergence speed, complexity, communication cost, and average prediction accuracy on the DFL embedded devices.
The CFL and DFL algorithms were implemented with the communication-stage challenges included in the simulation, to reflect real environment scenarios and real applications.
In the future, DFL models need to be designed for high-mobility sensors and devices in the wireless mesh networking system to build a robust system, increase the system flexibility and scalability and enhance the performance for some applications, such as a Driverless Transport System.