Incentivization and Aggregation Schemes for Federated Learning Applications

Currently, the data collected by the Internet of Things (IoT) still relies on the cloud-centric data aggregation and processing approach for preparing machine learning models. This approach puts the privacy of the participants at risk. In this paper, federated learning (FL) is proposed for privacy-preserving collaborative model training on data distributed across IoT users. To motivate participants, we must incentivize the whole process by rewarding each participant for their contribution to the training process of the federated learning model. The process of collective training takes place over a long duration of time and multiple iterations. However, participants in the training process may have varying levels of willingness to participate (WTP) and may contribute duplicate or poor-quality data. Therefore, in each iteration, participants must be rewarded based on their contribution in that specific iteration. In this paper, a methodology to reward each participant based on their contribution and a model aggregation technique are proposed. The aggregation technique uses Polyak-averaging to aggregate weights of local models, with the weightage assigned to each local model being proportional to its accuracy on the test dataset. Performance evaluation shows that the federated learning model formed using our aggregation approach achieves the performance level of machine learning as we perform more iterations and performs slightly better than the model formed using the FedAvg algorithm. Additionally, our incentivization methodology provides better performance-based rewards compared to other profit-sharing schemes.


I. INTRODUCTION
WITH recent developments in the applications of the Internet of Things (IoT) combined with Artificial Intelligence (AI), numerous opportunities for innovation and rejuvenation have opened up for traditional industries. Using data from a relatively small number of volunteers spread across a vast region, we can create a generalized model accessible across the devices of a large number of users. Conventionally, to harness the benefits of IoT, the data collected through volunteers' smart IoT devices is transmitted to the cloud to create a centralized machine learning model [1], [2], [3]. This centralized model is then made available as an endpoint for performing inferencing tasks, thereby generating revenue for the model owner. Sending raw data to the remote cloud exposes the data to attacks and incurs larger bandwidth costs and transmission latency. Moreover, sending data to the cloud fails to provide quality of service (QoS) and a satisfactory user experience due to network congestion and traffic [4]. Additionally, since the transmitted data inevitably contains private information, it can be misused for targeted advertisements and other malicious activities such as burglary [5], [6], [7].
To address data privacy concerns, it is preferable to keep raw data on the local device. However, for an effective AI model, a large amount of data must be available. Federated Learning (FL), proposed by Google in 2016 [8], provides an alternative to the cloud-centric approach by allowing models to be trained on local devices, with only the model weights needing to be updated on the cloud. This approach involves the existence of two types of models: a global model and multiple local models. Local models are prepared by training on local data, and their weights are transmitted to the cloud where they are aggregated to form the global model [9]. In the next iteration, the global model's weights are sent back to each local device, where they are recalculated using the local data, thus creating a new local model. To prevent battery issues, local devices may perform model training during idle periods, and users' devices can contribute to training and sharing the model while being charged [10]. This collaborative approach allows many users to contribute data for creating a global model while keeping the raw data on the local device [11], [12], [13].
The focus of this paper is on the FL solution, where user participation is a significant concern. Different users may have varying willingness to participate (WTP) at different times. Therefore, an incentivization scheme needs to be developed to ensure sustainable participation with a dynamic WTP, rewarding participants based on their contributions. Furthermore, existing solutions are vulnerable to malicious behavior by local devices [14], [15]. Malicious devices can undermine the accuracy of the global model by providing fake data and altering the weights during the training process, potentially rendering the global model unusable [16], [17]. The lightweight nature of the training devices, not primarily designed for deep learning, makes perpetual training detrimental to their sustainability [18]. Moreover, current solutions do not fully prevent privacy breaches. Although the training phase is decoupled from centralized data management, model inversion attacks can be performed against gradients or uploaded model parameters to gain access to private raw data [19]. Additionally, divergent data supplied by different IoT devices can slow down the aggregation process even with well-defined initial parameters [20], [21], [22]. The data provided by local devices also depends on communication latency and computational ability, leading to imbalanced data distribution [14]. Therefore, a central server should discard the local straggler's model during the training process, ensuring that its weights are not considered in altering the weights of the global model.
In this paper, we propose a scheme for intelligent aggregation of local models and incentivization for participants with a variable WTP, ensuring robustness to dynamic participant behavior, including participant dropouts. We design a contract-theoretic incentive mechanism that tracks the contributions of participants, allowing the model owner to reward them based on their level of contribution towards creating a global model. The major contributions of this paper can be listed as follows:
1) The proposed incentivization model can be applied to any FL application that aims to incentivize participation in the process, thereby addressing the privacy concerns of the cloud-centric approach.
2) The proposed incentivization model quantifies the willingness to participate by considering the quantity of data contributed, taking into account the system resources of the device and the impact of the contribution on the global model.
3) The proposed incentivization model offers improved performance-based rewards compared to other profit-sharing schemes while also encouraging long-term participation.
4) The proposed aggregation model prevents the weights contributed by underperforming participants from aggregating with the global model.

II. RELATED WORK
Notable contributions have previously been made in the field of edge computing and FL. Initially, edge-computing-based solutions were proposed [23] to alleviate the burden on networks and cloud servers due to the increasing number of devices and to enhance privacy. To leverage the high bandwidth and low latency between the edge server and the client, computation offloading [24] was proposed to divide complex computing tasks and offload their execution from mobile devices to the edge server. FL has been proposed as a means to preserve user privacy while collaboratively forming a model on distributed devices. The authors in [18] proposed optimization algorithms for FL based on edge computing. Islam et al. [25] proposed differential privacy techniques to prevent model inversion attacks.
Several incentive-based mechanisms have been proposed in order to encourage user participation. The authors in [26], [27] made the optimistic assumption that all devices would unconditionally comply when asked to participate, which is impractical due to the resource costs incurred by participants in training a model. Irfan et al. in [28] explored the leader-follower approach, where the leader auctions for followers to form coalitions. The authors in [29] proposed a contract theory that maps contributed resources to appropriate rewards. In [30], the interaction between devices and the owner was modeled as a Stackelberg game, where the model owner purchases the services provided by the devices. A Deep Reinforcement Learning (DRL) based approach that learns system states from historical training records was proposed in [31].
Previous works on developing a robust incentivization model did not take variable WTP into consideration and mostly relied on the assumption that participants would exhibit the same behavior in the future as they did historically. However, in a highly dynamic world, this is practically impossible. The proposed incentivization and aggregation models consider each contribution as a standalone entity and reward participants based on their individual contributions. Furthermore, Table 1 lists the major contributions, strengths, weaknesses, and recent works related to devising incentivization models.

III. SYSTEM MODEL
In this section we explain the proposed model in detail. The model is divided into two parts, namely, the incentivization model and the aggregation model. The incentivization model focuses on rewarding each participant based on their contribution, while the aggregation model presents a method to combine local models into a new global model.

A. INCENTIVIZATION METHOD
The proposed incentivization model in this paper involves two main entities: the model owner and the participant.
Participants can be categorized into two types based on their incentive to participate in the process, referred to as Type I and Type II participants.
Type I participants contribute their data to create a model and seek monetary incentives from the process, while Type II participants utilize the prepared model in their daily activities, gaining non-monetary benefits. Since Type II participants only use the model without contributing their data, they are not provided with monetary benefits. The focus of the incentivization scheme proposed in this paper is on Type I participants.
Existing solutions using Federated Learning (FL) assume that Type I participants have the same willingness to participate (WTP) throughout the model creation process. Therefore, they are rewarded equally for all their contributions, regardless of the quantity and quality of the data they provide.
The proposed scheme utilizes a contract to record every contribution made by participants during the process. The contract is updated by evaluating the impact of each participant's local model's weights on the accuracy of the global model after aggregation. Initially, Type I participants will be contacted by the model owner, and upon agreeing to the terms and conditions, they will need to provide their device's IMEI or equivalent identification number, which will be used for their contributions. If a participant changes their device, they will need to register again with the new device.
It is assumed that the model owner will contact participants in a way that ensures there is no class imbalance in the classification problem. The system resources of the contributing device can be determined using the identification number and measured relative to a standard device. The entire process is divided into two phases. The first phase concludes with the preparation of the initial global model, while the second phase encompasses all future requests for updating the global model. Figure 1 illustrates the practical working procedure of the proposed methodology. The diagram consists of three parts: the first part includes the global model and its owner, the second part represents the cloud, and the third part represents the devices of Type I participants. The cloud maintains two crucial components of the proposed methodology: the test dataset and the contract. The test dataset is a pre-prepared dataset for a similar problem, stored in the cloud to calculate rewards based on the quality of data contributed in each iteration. Upon receiving the weights of the local model and the amount of contributed data from all the participants, the contract calculates the rewards, and the aggregation process begins in the cloud. The global model is updated after each iteration, benefiting from a larger amount of data compared to the previous iteration, which enhances its robustness. Since the test dataset and the contract stored in the cloud do not contain any sensitive participant data, the proposed methodology does not pose a risk to participant privacy. The three types of devices shown on the rightmost part of the figure indicate that participants are evaluated based on the quantity and quality of their contributions, without bias, irrespective of the size and system resources of their devices. Once selected by the model owner, each participant starts collecting data on their device. 
Upon the model owner's request to update the global model, the participant initiates local model formation using the collected data, and the weights of the local model are sent to the cloud. The data used for forming a local model in any participant's device is kept separate and not used for future local model formations. Additionally, Table 2 provides a list of symbols used in the mathematical model along with their respective meanings.
In the first phase, participants are rewarded solely based on the quantity of data they contribute, taking into account the system resources of their devices. The incentivization model considers that participants should respond to the model owner's request within a specified time limit. Contributions made after the deadline will not be rewarded. The incentivization scheme takes various parameters into account, including the amount of data contributed, the system resources of the device, the consistency in responding to requests, and the impact of the participant's contribution on the global model. The final reward after any iteration can be calculated as follows:
\[ \iota_{ij} = \lambda_1 \alpha_{1ij} + \lambda_2 \alpha_{2ij} + \lambda_3 \alpha_{3ij} \tag{1} \]
where, for participant i in iteration j, ι ij represents the reward, α 1ij represents the data contributed relative to other participants, α 2ij represents the data contributed given system resources relative to other participants, and α 3ij represents the impact the local model had on the global model. λ 1 is the reward received per unit of α 1ij , λ 2 is the reward received per unit of α 2ij , and λ 3 is the reward received per unit of α 3ij .
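Reading each λ as a reward per unit of the corresponding α, the per-iteration reward can be sketched as a weighted sum. The following is a minimal illustration, not the authors' implementation: the function name `iteration_reward` is hypothetical, and the default λ values are the ones reported later in the experimental setup.

```python
def iteration_reward(alpha1, alpha2, alpha3, lambdas=(1.0, 2.0, 4.0)):
    """Combine the three attribute scores into one reward.

    alpha1: data contributed relative to other participants
    alpha2: data contributed given system resources, relative to others
    alpha3: impact of the local model on the global model
    lambdas: reward per unit of each attribute (assumed defaults 1, 2, 4)
    """
    lam1, lam2, lam3 = lambdas
    return lam1 * alpha1 + lam2 * alpha2 + lam3 * alpha3


# First-phase style call: alpha3 is fixed to 1 because no global model exists yet.
first_phase_reward = iteration_reward(0.5, 0.25, 1.0)
```

Because the attributes enter linearly, doubling one λ doubles the influence of that attribute on the reward, which is how a model owner would express their own priorities.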
Let us consider that we have n different devices with system resources, η 1 , η 2 , . . . , η n , and participant i contributed β ij amount of data in iteration j. We can calculate the total data contributed and total data contributed given system resources as, Let φ i be the number of times participant i responded to the request made by the model owner for an update. Let χ i and ψ i be the number of times the contribution made by participant i had a positive impact and a negative impact on the global model, respectively.
Assuming β ij is not equal to zero we have, At j = 0 we are in the first phase, otherwise, we are in the second phase.
In the first phase, we do not have a global model prepared. Therefore, we cannot calculate ϵ ij , which is the impact that local model's weights of participant i had on the global model in iteration j, hence α 3ij = 1 at j = 0.
In the first phase, the reward of every participant can be calculated using Equation (1) where, For the second phase, the global model's weights are sent to all the participants with the request to update them on their local data.
The impact of the local training on the global model is calculated as
\[ \epsilon_{ij} = \omega(\mu_{j-1}\nu_{ij}) - \omega(\mu_{j-1}) \]
where ω(µ j−1 ν ij ) is the accuracy, on the test dataset, of the model formed by fitting the local training data starting from the global model's weights, and ω(µ j−1 ) is the accuracy of the previous global model on the test dataset. Note that, since both accuracies lie between 0 and 1, ϵ ij lies between −1 and 1.
The participant's contribution to the formation of a new global model can be classified as a positive contribution or a negative contribution based on the value of ϵ ij . A positive contribution should be rewarded, while a negative contribution should be penalized. Erratic participants are those whose contributions negatively impact the global model. However, since negative contributions can sometimes be unintentional, it is important to consider human error and not be too harsh on the participants. To address this, participants who make a negative contribution more than a threshold κ times will not receive any reward for another negative contribution, and the weights contributed by that participant will be discarded in the aggregation process. The threshold ensures that participants are not discouraged by the fear of failure, while also preventing participants from taking the model owner's rewards for granted. A participant who has exceeded the threshold is still rewarded for any later positive contribution in the same way as any other participant with a positive contribution. This approach ensures that participants are appropriately rewarded for their positive contributions while accounting for the impact of negative contributions and encouraging improvement.
Given β ij is not equal to 0, if ϵ ij ≥ 0, we increment χ i as
\[ \chi_i \leftarrow \chi_i + 1 \]
Otherwise, we increment ψ i as
\[ \psi_i \leftarrow \psi_i + 1 \]
We must choose the reward function such that the amount of reward and penalty increases for higher magnitudes of ϵ ij . Therefore, we calculate tan( ϵ ij ) due to its property of tending to ∞ and −∞ when ϵ ij tends to 1 and −1 respectively.
In the second phase, the reward of each participant can be calculated using (1), where Here, the term tan( ϵ ij ) is the reward for improving the accuracy of the global model.
Given ψ i < κ, we calculate the reward by adding tan( ϵ ij ). In case of a negative contribution (ϵ ij < 0), a negative value will be added, which acts as a penalty that reduces the reward the participant received for contributing data and spending system resources. The values of α 1ij and α 2ij are calculated as given in equations (12) and (13), and the value of α 3ij is given by:
\[ \alpha_{3ij} = \tan(\epsilon_{ij}) \]
In case ϵ ij = 0, tan( ϵ ij ) is equal to 0, so α 3ij is equal to 0 and the reward depends only on the values of α 1ij and α 2ij .
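The phase-dependent reward logic described above can be sketched as follows. This is an illustrative reading, not the authors' code. The names `ParticipantLedger` and `reward_step` are hypothetical, α 1ij and α 2ij are passed in precomputed, and the strike counter is incremented before it is compared with κ, following the ordering in Algorithm 2. The `scale` argument is an assumption of this sketch: with `scale = π/2` the tangent diverges as ϵ ij approaches ±1, which matches the limiting behaviour stated in the text, while `scale = 1` evaluates tan(ϵ ij) literally.

```python
import math
from dataclasses import dataclass


@dataclass
class ParticipantLedger:
    phi: int = 0  # number of requests the participant responded to
    chi: int = 0  # number of positive contributions
    psi: int = 0  # number of negative contributions


def reward_step(ledger, beta, alpha1, alpha2, eps, j,
                kappa=2, lambdas=(1.0, 2.0, 4.0), scale=math.pi / 2):
    """Return the reward for one iteration and update the ledger in place."""
    if beta == 0:            # no data contributed: no reward, counters unchanged
        return 0.0
    ledger.phi += 1
    if j == 0:               # first phase: no global model yet, so alpha3 = 1
        alpha3 = 1.0
    else:
        if eps >= 0:
            ledger.chi += 1
        else:
            ledger.psi += 1
            if ledger.psi >= kappa:   # threshold reached: no reward at all
                return 0.0
        alpha3 = math.tan(scale * eps)  # negative eps yields a penalty
    lam1, lam2, lam3 = lambdas
    return lam1 * alpha1 + lam2 * alpha2 + lam3 * alpha3
```

With κ = 2 under this ordering, a participant's first negative contribution is penalised through the tangent term, and the second one earns nothing. A later positive contribution is rewarded normally.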
In case ψ i >= κ and ϵ ij < 0, ι ij = 0.
Algorithm 1 outlines the process of creating a local model. In the beginning, a local model for participant i in iteration j, denoted as localModel ij , is initialized using the baseline model configuration referred to as baselineModel in the algorithm. If we are in the first phase, the local model is trained on the participant's local data using the default initial weights. These default initial weights are determined through Xavier (Glorot) initialization. Once trained, the local model is sent to the cloud. If we are in the second phase, the initial weights of the local model are set equal to the weights of the existing global model. The local model is then trained on the participant's local data using these initial weights, and the updated local model is sent to the cloud. In both phases, each participant's contribution is based solely on their own data, while the second phase keeps the local model consistent with the global model.
Algorithm 2 shows the process of rewarding each participant. The meanings of all the symbols used in this algorithm are explained in Table 2.
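The local model generation procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the model is reduced to a single dense weight matrix stored as nested lists, `train_fn` is a hypothetical caller-supplied training routine, and the names `glorot_uniform` and `local_update` are not from the paper. The Glorot uniform bound sqrt(6 / (fan_in + fan_out)) is the standard one.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization for a fan_in x fan_out matrix."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def local_update(j, global_weights, local_data, train_fn,
                 fan_in, fan_out, data_bytes, resources, seed=0):
    """Build the message a participant sends to the cloud in iteration j."""
    if j == 0:
        # First phase: start from default (Glorot) initial weights.
        weights = glorot_uniform(fan_in, fan_out, random.Random(seed))
    else:
        # Second phase: start from a copy of the current global weights.
        weights = [row[:] for row in global_weights]
    weights = train_fn(weights, local_data)  # fit on local data only
    # Only weights and metadata leave the device, never the raw data.
    return {"weights": weights, "data_bytes": data_bytes,
            "resources": resources}
```

Passing the training routine as an argument keeps the sketch independent of any particular deep learning library.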

B. AGGREGATION OF LOCAL MODEL
Algorithm 1 Local Model Generation Algorithm.
7: localModel ij with weights µ j−1 ν ij fits local training data and we get updated µ j−1 ν ij
8: end if
9: Weights µ j−1 ν ij are sent to the cloud along with the amount of data trained upon in bytes and system resources of the participant's device

Algorithm 2 Reward Calculation Algorithm for Each Participant.
1: λ 1 , λ 2 , λ 3 ← constants
2: We have the system resources η i of the device and the amount of data in bytes β ij contributed by participant i in iteration j. Using these we calculate
12: if β ij ≠ 0 then
15: if ϵ ij > 0 then
18: else
19: ψ i ← ψ i + 1
20: if ψ i < κ then
22: else
23: ι ij ← 0
24: end if
25: end if
26: end if
27: end if

To create a robust global model, we need to aggregate the local models received from participants. In the first phase, the global model is selected as the best-performing local model based on its performance on the test dataset. In the second phase, Polyak averaging is applied to aggregate the weights of multiple local models. Polyak averaging calculates a weighted average of these local models, with different weightage given to each model in each phase. In the first phase, the weight of the best-performing local model is set to 1, while the weights of other local models are set to 0. Thus, at j = 0,
\[ \mu_j = \sum_{i=1}^{n} \xi_{ij}\,\nu_{ij} \]
where ν ij represents the weights of the model contributed by participant i in iteration j, and ξ ij is the weightage applied to this model. The value of ξ ij is given by ξ ij = 1 for the best-performing local model, and ξ ij = 0 for other local models. In the second phase, the weightage of each local model is proportional to the accuracy of the local model formed by training the global model's weights on the participant's local data. Thus, the global model in the second phase is determined as:
\[ \mu_j = \sum_{i=1}^{n} \xi_{ij}\,\mu_{j-1}\nu_{ij} \]
where ξ ij represents the weightage applied to µ j−1 ν ij , the weights of the model formed by training the global model on the participant's local data. The weightage assigned to each local model is calculated from τ ij , a linear transformation of ω(µ j−1 ν ij ). Algorithm 3 demonstrates the process of aggregating local models into a global model.
It starts by initializing the global model globalModel j in iteration j with the baseline model configuration denoted as baselineModel, where all the weights are set to 0. The weights of the global model are then aggregated based on the proposed aggregation model. Once the weights are finalized, the global model is sent to the model owner.
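The aggregation step can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: each model is flattened to a list of floats, the function name `aggregate` is hypothetical, and the linear transformation τ is taken to be the identity, so the weightage is simply each model's test accuracy divided by the sum of accuracies. Models from participants whose negative-contribution count has reached κ are skipped.

```python
def aggregate(local_weights, accuracies, strikes, j, kappa=2):
    """Weighted average of local weight vectors into global weights mu_j.

    local_weights: list of flat weight vectors, one per participant
    accuracies:    test accuracy of each local model
    strikes:       negative-contribution count (psi) of each participant
    """
    eligible = [i for i in range(len(local_weights)) if strikes[i] < kappa]
    if not eligible:
        raise ValueError("no eligible local models to aggregate")
    if j == 0:
        # First phase: weightage 1 for the best model, 0 for the rest.
        best = max(eligible, key=lambda i: accuracies[i])
        xi = {i: (1.0 if i == best else 0.0) for i in eligible}
    else:
        # Second phase: weightage proportional to accuracy (tau = accuracy).
        total = sum(accuracies[i] for i in eligible)
        if total == 0:
            raise ValueError("all eligible models have zero accuracy")
        xi = {i: accuracies[i] / total for i in eligible}
    mu = [0.0] * len(local_weights[0])  # global weights start at 0
    for i in eligible:
        for k, w in enumerate(local_weights[i]):
            mu[k] += xi[i] * w
    return mu
```

For example, `aggregate([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5], [0, 0], j=1)` averages the two vectors equally because their accuracies are equal.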

IV. RESULTS
In this section, we describe the experiments conducted to evaluate the performance of the proposed incentivization and aggregation methodologies.

A. EXPERIMENTAL SETUP
Algorithm 3
4: if ψ i < κ then
5: if j = 0 then
6: µ j ← µ j + ν ij * ξ ij
7: else
8: µ j ← µ j + µ j−1 ν ij * ξ ij
9: end if
10: end if
11: end for
12: The weights of the globalModel j are set to µ j and it is sent to the model owner

A simulation of the proposed methodologies was performed using the MNIST handwritten digits and Extended MNIST (EMNIST) letters datasets. Both datasets are derived from the NIST Special Database 19 [33] and converted to a 28 × 28 pixel image format. MNIST, which stands for Modified National Institute of Standards and Technology, is considered one of the standardized datasets used for learning, classification, and computer vision systems [34]. It has also been used for performance analysis of recently proposed FL techniques [35], [36]. The MNIST handwritten digits dataset consists of 60,000 images in the training dataset, with 10 classes covering digits from 0 to 9, and 10,000 images in the test dataset. The EMNIST letters dataset merges all the uppercase and lowercase classes of the English alphabet to form a balanced 26-class classification task. The EMNIST letters dataset has 145,600 images in the training dataset and 20,800 images in the test dataset [37]. The FL process was conducted with four hypothetical Type I participants who contributed a random amount of data across 10 iterations. The images and their labels from the aforementioned datasets were distributed randomly to ensure that the setup reflects a practical environment. The baseline model used consists of a simple neural network with one hidden layer, with the same number of neurons as the total number of pixels in each image of the dataset. The Rectified Linear Unit (ReLU) is used as the activation function for neurons in the hidden layer. The softmax activation function is used for the neurons in the output layer to obtain a probability for each class for each input. The Adam gradient descent algorithm and logarithmic loss function are used for learning the weights.
The number of epochs was fixed at 10, and the batch size was set to 200. Default values were used for the learning rate and other important hyperparameters to keep the baseline model as simple as possible. The purpose here is not to maximize the performance of the classification algorithm, but to compare the performance of the Machine Learning and FL models and to reward each participant based on the quality of their contribution and their WTP.
In the incentivization model, the hyperparameters are set as λ 1 = 1, λ 2 = 2, and λ 3 = 4. These values indicate that, for the hypothetical model owner, the impact created by any local model on the global model is twice as important as the quantity of data contributed by any participant, given their system resources, and four times as important as the quantity of data contributed by any participant. The value of κ is set to 2, which means that the model owner will not reward participants solely based on the quantity of data contributed if the participant has contributed poor quality data more than twice. It is important to note that these values may vary from one model owner to another.
Figure 2 shows the accuracy of the model prepared by applying FL and the baseline Artificial Neural Network (ANN) model to the total number of images contributed in each iteration. The x-axis represents the iteration number, and the y-axis shows the percentage accuracy achieved. In the early iterations, FL performs worse than Machine Learning because the latter trains on all the images as a whole, whereas in FL only the weights are aggregated. The comparison shows that the proposed FL model, like other existing FL models, approaches the performance of the Machine Learning model as the number of iterations increases.

B. PERFORMANCE EVALUATION
In the MNIST handwritten digits dataset, the difference between the accuracies of Machine Learning and FL techniques on the test dataset reduced to 0.36%, making the performance of both approaches almost identical. In the EMNIST Letters dataset, after 10 iterations, the difference in performance was around 2.62%, which makes the performance of both approaches moderately comparable. The baseline model in both datasets can be altered to enhance the performance of Machine Learning and FL.
Figure 3 compares the performance of the FedAvg algorithm with the proposed aggregation model. The FedAvg algorithm [38] is the most commonly used aggregation scheme for Federated Learning. Our aggregation scheme takes inspiration from the FedAvg algorithm and performs Polyak averaging of the local models' weights, with more weightage given to higher accuracy models. The FedAvg algorithm serves as an important benchmark for comparing our aggregation scheme with existing FL aggregation algorithms. From the figure, we can see that the proposed model shows greater accuracy on the test dataset in the first iteration than the FedAvg algorithm for both datasets. As the number of iterations increases, both schemes become comparable. The reason for the comparable performance of FedAvg and our proposed aggregation model is the use of high-quality standard datasets for the experiments. Even when the data is divided randomly among participants, each participant's local model is trained on unbiased, high-quality data, resulting in almost equal weightage being applied to every contributed local model's weights for aggregation using Polyak averaging. In a practical environment, we cannot expect such a scenario since we cannot assume good quality data from every participant. In a real-world scenario, our proposed aggregation model is expected to have an edge, as more accurate models will contribute more to the global model.
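The argument above, that the two schemes coincide on uniformly clean data and diverge when model quality varies, can be illustrated with a small sketch of the per-model weightage under each rule. FedAvg weights each local model by its share of the training samples. The accuracy-proportional rule below is a simplified reading of the proposed scheme that assumes the linear transformation of accuracy is the identity. Both function names are hypothetical.

```python
def fedavg_weights(sample_counts):
    """FedAvg weightage: each model's share of the total training samples."""
    total = sum(sample_counts)
    return [n / total for n in sample_counts]


def accuracy_weights(accuracies):
    """Accuracy-proportional weightage (assumes tau equals the accuracy)."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


# Clean data: similar accuracies give near-equal weightage.
clean = accuracy_weights([0.9, 0.9])
# One poor-quality contributor: its model is down-weighted by accuracy,
# while FedAvg would still weight it by how much data it holds.
mixed = accuracy_weights([0.98, 0.40])
by_size = fedavg_weights([100, 300])
```

In the `mixed` case the low-accuracy model receives a smaller weightage even if, as in `by_size`, it holds the larger share of the data.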
Figure 4 compares each attribute mentioned earlier for each contribution across iterations of two participants, along with the reward they earned relative to other participants. The x-axis denotes the iterations, and the y-axis denotes the normalized reward received and the normalized value of the attributes. Since both the reward and attributes are normalized, their values range from 0 to 1. In the performed simulations, Participant A contributed good quality data across all the iterations, while Participant B was an erratic contributor, as it did not contribute in iterations 5 and 10. Therefore, no reward was provided to Participant B in those iterations. We can observe that the reward is a function of all the attributes combined and does not depend solely on a single attribute.
We plot the normalized reward received by a participant using our proposed model and the reward they would have received in other schemes against the iteration number. The Consistent Participant represents a participant who regularly contributed good quality data, whereas the Erratic Participant represents a participant who did not respond to each request made by the model owner and contributed inferior quality data. Since we considered 4 participants in this simulation, each participant receives 25% of the total reward allocated under the Egalitarian scheme. For the Marginal Gain scheme, we considered that the relative reward received by the participant is equal to the change in accuracy of the global model with and without that participant in a particular iteration. We can observe that for a Consistent Participant, the proposed incentivization model rewards more than both the traditional schemes. In the case of an Erratic Participant who did not contribute in iterations 5 and 10, the Egalitarian scheme gives undue rewards even when there is no contribution, whereas the rewards from the Marginal Gain scheme hardly change after the first few iterations.
The reason is that as the number of iterations increases, the global model's accuracy tends to stabilize, hence the marginal gain of the whole process does not change much. We could have included the amount of data contributed by a participant in a particular iteration along with the change in accuracy as a part of the gain in the Marginal Gain scheme, but that would have made this scheme highly dependent on the data contributed as the number of iterations increased. Our proposed incentivization methodology improves upon these existing methodologies by rewarding participants according to the quality and quantity of their contributions.
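The two baseline profit-sharing schemes discussed above can be sketched as follows. This is an illustration under assumptions rather than the exact computation used in the simulations: the function names are hypothetical, negative marginal gains are clipped to zero, and the gains are normalized so that the shares sum to 1.

```python
def egalitarian_shares(n_participants):
    """Egalitarian scheme: every participant gets an equal share."""
    return [1.0 / n_participants] * n_participants


def marginal_gain_shares(acc_with_all, acc_without_each):
    """Marginal Gain scheme: share follows the accuracy change with vs. without.

    acc_with_all:     global accuracy when every participant is included
    acc_without_each: global accuracy when participant i is left out
    """
    gains = [max(acc_with_all - a, 0.0) for a in acc_without_each]
    total = sum(gains)
    if total == 0:
        # Accuracy has stabilized: nobody changes it, so all gains vanish.
        return [0.0] * len(gains)
    return [g / total for g in gains]
```

The second function makes the stabilization effect visible: once leaving any single participant out no longer changes the global accuracy, all marginal gains collapse towards zero regardless of how much data was contributed.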

V. CONCLUSION
In this paper, we proposed an incentivization and aggregation methodology that is robust against the variable WTP of participants in a practical environment. The proposed incentivization model utilizes the data contributed by each participant and system resources for each participant's device, as well as the impact that any local model's weights have on the global model. The impact on the global model can be calculated by measuring the change in accuracy of the existing global model on the test dataset before and after training the existing global model's weights on the local data. This change in accuracy is also utilized as the weightage assigned to each local model during the Polyak averaging in the proposed aggregation model. We conducted simulations on the MNIST and EMNIST datasets to demonstrate the applicability of our proposed model. We can apply our proposed incentivization and aggregation methodologies to any specific use-case by using a targeted test dataset and a carefully tuned baseline model. Having a test dataset tailored to the particular use-case is crucial for evaluating the quality of each contribution and determining appropriate rewards. The test dataset is considerably smaller than the training dataset, therefore it can be prepared using the model owner's resources based on the given problem statement. This way, we can effectively utilize the proposed federated learning incentivization and aggregation methodologies. Our performance evaluation shows that FL with the proposed aggregation model performs similarly to machine learning and slightly better than the FedAvg algorithm as the number of iterations increases. Moreover, the proposed incentivization model provides more rewards to consistent participants compared to the Egalitarian and Marginal Gain Profit Sharing schemes.