Decentralized and Incentivized Federated Learning: A Blockchain-Enabled Framework Utilising Compressed Soft-Labels and Peer Consistency

Federated Learning (FL) has emerged as a powerful paradigm in Artificial Intelligence, facilitating the parallel training of Artificial Neural Networks on edge devices while safeguarding data privacy. Nonetheless, to encourage widespread adoption, Federated Learning Frameworks (FLFs) must tackle (i) the power imbalance between a central authority and its participants, and (ii) the challenge of equitably measuring and incentivizing contributions. Existing approaches to decentralize and incentivize FL processes are hindered by (i) computational overhead and (ii) uncertainty in contribution assessment (Witt et al. 2023), limiting FL's scalability beyond use cases where trust between participants and the server is established. This work introduces a cutting-edge, blockchain-enabled federated learning framework that incorporates Federated Knowledge Distillation (FD) with compressed 1-bit soft-labels, aggregated through a smart contract. Furthermore, we present the Peer Truth Serum for Federated Distillation (PTSFD), which cultivates an incentive-compatible ecosystem by rewarding honest participation based on an implicit yet effective comparison of worker contributions. The primary innovation stems from its lightweight architecture that simultaneously promotes decentralization and incentivization, addressing critical challenges in contemporary FL approaches.


I. INTRODUCTION
The ascent of Machine Learning (ML) has been marked by a growing emphasis on decentralized and privacy-preserving solutions. One of the leading solutions, Federated Learning (FL), allows training of Deep Neural Networks (NNs) across distributed devices, ensuring data remains localized and hence addressing privacy concerns. Federated Averaging (FedAvg) [2], a cornerstone algorithm in FL, achieves this by aggregating locally trained models to produce a global model. However, FL's transformative potential is curtailed by (i) a trust deficit emanating from the imbalance of power between workers and the central server and (ii) the absence of a practical reward mechanism to incentivize worker contributions. Notably, while blockchain's inherent transparency and immutability hold promise in addressing the trust issue, its effective integration with FL for scalable deployments has remained elusive [3], [4], [5]. Furthermore, designing mechanisms that effectively reward worker contributions without compromising data privacy remains an open research problem in FL [1], [6]. Comparing and evaluating worker contributions in FedAvg is non-trivial since data always stays private [1], [7]. Traditional solutions like Leave-one-out [8] or the Shapley value [9], [10] introduce computational overhead and hinge on a centralized authority, constraining their adoption in decentralized, blockchain-based solutions. Lastly, the prohibitive cost of storing vast amounts of data on blockchain systems, compounded by the intensive computational demands, means that popular methods like FedAvg struggle to fit within General Purpose Blockchain Systems (GPBS). This has prompted researchers to turn towards Application Specific Blockchain Systems (ASBS) or off-chain aggregation [11], [12], [13], [14], [15], both of which bring their own set of challenges.

A. Contributions
In response to these challenges, this work introduces the Peer Truth Serum for Federated Distillation (PTSFD), a blockchain-enabled and incentivized FL framework. It utilizes Federated Knowledge Distillation (FD) on 1-bit compressed soft-labels, combined with the Peer Truth Serum for Crowdsourcing [16], which is adjusted for the FD case.
1) Incentivization: We present the Peer Truth Serum for Federated Distillation (PTSFD), an informed-truthful multi-task peer prediction mechanism tailored for the FD case. It discerns contributions based on the correlation of reported 1-bit compressed soft-labels. This approach drastically reduces storage requirements, making it particularly suited for blockchain.
2) Decentralization: The reduced storage requirement and the simplicity of our method promote decentralization. The framework can be deployed on simple smart contracts hosted on prevalent blockchains, such as the Ethereum Virtual Machine [3], eliminating the need for specialized ASBS and simplifying the entire FL process.
3) Scalability and Efficiency: Our method, which builds upon Federated Knowledge Distillation, significantly reduces both communication overheads and blockchain storage needs. This sets the stage for large-scale, practical FL deployments without sacrificing efficiency or scalability.
We substantiate our contributions through theoretical validations and exhaustive experimental analyses. Our findings reveal a system that maintains a strong incentive-compatible equilibrium, demonstrating resilience against adversarial actions. Moreover, it showcases efficiency gains in storage and communication costs compared to FedAvg in various FL scenarios. The core of this work lies in its pioneering architecture, laying the groundwork for a lightweight, fully decentralized, incentivized, and efficient Federated Learning paradigm.

A. Federated Averaging
The most common algorithmic approach to FL problems is FedAvg, where the training process iterates over the following steps:
1) The central server selects a subset of clients $W$, which participate in this training round.
2) The central server sends the current model $\theta$ to the selected clients.
3) The selected clients perform local training on their private data, leading to updated client models $\theta_i$.
4) The updated models $\theta_i$, $\forall i \in W$, are sent back to the central server.
5) The central server aggregates the updated models into a new global model.
This training paradigm requires a two-way communication of the model $\theta$ (resp. $\theta_i$) at every iteration, which can result in significant communication overhead for state-of-the-art NN models with hundreds of millions of parameters. To address this challenge, various approaches have been proposed, including pruning methods [17] and advanced compression techniques [18], [19], [20], [21], [22], [23], [24], [25]. However, despite these advances, the fundamental issue of scaling FedAvg to larger models remains, impeding the utilization of blockchain for storing or aggregating models [1].
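The aggregation in step 5 can be illustrated with a minimal sketch. It assumes that each client model is flattened into a parameter vector and that the server weights each client by its local dataset size, which is the usual FedAvg choice but is not spelled out in the text above. The function name `fedavg_aggregate` and the toy values are hypothetical.

```python
import numpy as np


def fedavg_aggregate(client_models, num_samples):
    """Step 5: average client parameter vectors, weighted by local dataset size.

    Weighting by dataset size is an assumption made for this sketch.
    """
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(client_models)  # shape: (num_clients, num_params)
    return weights @ stacked           # weighted sum over the client axis


# Two clients with toy two-parameter models; client 2 holds three times more data.
theta_1 = np.array([1.0, 2.0])
theta_2 = np.array([3.0, 6.0])
theta_global = fedavg_aggregate([theta_1, theta_2], num_samples=[1, 3])
```

Note that the server has to receive and process every parameter of every client model, which is the cost that makes this step hard to move on-chain.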

B. Knowledge Distillation and Federated Settings
1) Knowledge Distillation: Knowledge Distillation (KD), depicted in Fig. 1, is a technique in deep learning where a smaller NN model (often called the "student") is trained to mimic the behavior of a larger, pre-trained model (referred to as the "teacher") [27]. This is accomplished not by transferring the model parameters directly, but rather by aligning the output distributions of both models. Traditional training methods train a model directly on ground-truth labels, using a cross-entropy loss that measures the discrepancy between the model's predictions and these true labels. In contrast, KD employs a divergence-based loss, such as the Kullback-Leibler (KL) divergence, to measure the difference between the student's predicted probabilities and those of the teacher model. This divergence indicates how closely the student is able to mimic the behavior of its teacher.
A distinct feature of KD is the use of "softened" labels. In traditional classification tasks, hard labels are used, which unequivocally classify a data point into one category. However, the teacher model in KD provides "soft" labels in the form of probabilities, indicating the confidence levels across various categories. These probabilities can be further softened using a temperature parameter $T$ to yield a smoother distribution, capturing the nuances of decision boundaries and offering richer guidance to the student model. This process allows the student to inherit not just the overt knowledge from the ground-truth labels but also the implicit, or "dark", knowledge embedded in the teacher model's predictions. Since only soft-labels are necessary to perform backpropagation, models with varying architectures can learn from the teacher.
The appeal of KD lies in its ability to produce compact models with performance that closely mirrors that of much larger networks. These compact models are advantageous for deployment in resource-constrained environments, such as mobile devices or edge devices, without sacrificing much in terms of accuracy. With the above foundation in KD, we can delve deeper into its application in the FL setup, specifically focusing on Federated Distillation.
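As an illustration of the divergence-based loss and the temperature $T$ described above, the following is a minimal NumPy sketch. It assumes that both models expose raw logits and that the loss is the KL divergence from the softened teacher distribution to the softened student distribution. The function names and the small constant `eps` are choices made for this sketch, not part of the original method description.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; larger T yields a smoother distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, T=2.0, eps=1e-12):
    """Mean KL(teacher || student) over a batch of softened predictions."""
    p = softmax(teacher_logits, T)  # soft labels provided by the teacher
    q = softmax(student_logits, T)  # softened student predictions
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))


teacher = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.5]])
student = np.array([[2.0, 1.5, 0.0], [1.0, 1.0, 1.0]])
loss = kd_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge.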
2) Federated Distillation: Drawing from the KD paradigm, Federated Distillation (FD) [28], [29], [30] extends the soft-label philosophy to a federated landscape, where the aggregated soft-logits from workers act akin to the soft-label output from an overarching teacher model. While FedAvg directly communicates the model parameters to transfer information between the central server and the clients, FD uses soft-label predictions $Y_i^{pub}$ obtained on a separate public distillation dataset $X^{pub}$ for this purpose. More precisely, the locally updated model $f_{\theta_i + \Delta\theta_i}$ is indirectly communicated to the central server by sending its predictions on the distillation dataset, i.e.,
$$Y_i^{pub} = f_{\theta_i + \Delta\theta_i}(X^{pub}). \quad (1)$$
Therefore, unlike traditional FL techniques such as FedAvg, which mandate a consistent model structure across clients due to the aggregation of model parameter updates, FD does not require a single NN architecture but allows each worker to adopt a distinct architecture that might be best suited to its local data or computational restrictions. Additionally, the communication cost of FD scales with the size of the distillation dataset rather than with the number of NN parameters. This characteristic of FD can lead to communication savings [26], especially for large models. In this work, we modify a recently proposed, highly communication-efficient FD method [26], called Compressed Federated Distillation (CFD), which is based on the multi-round protocol developed in [28], [30]. In our modified version of CFD, every client performs the following steps in each communication round, as depicted in Fig. 2:
1) Train on the local dataset to improve the local model.
2) Compute the soft-label predictions $Y_i^{pub}$ on $X^{pub}$ and compress them to 1 bit with the quantizer $Q$.
3) Upload the integer-encoded compressed soft-labels to the smart contract (in a two-step commit-reveal fashion outlined in Algorithm 3).
4) (Blockchain) Aggregate predictions $Y_{aggr}^{pub}$ by majority vote over all $Y_i^{pub}$.
5) Download the aggregated predictions $Y_{aggr}^{pub}$ from the blockchain.
6) Distill the current model $\theta$ using $X^{pub}$ and $Y_{aggr}^{pub}$.
The authors of [26] showed that CFD substantially reduces the amount of information that must be exchanged, through the quantization $Q$ and the use of a small public distillation dataset (e.g., obtained by random subset selection). The savings amount to roughly two orders of magnitude when compared to Federated Distillation, and more than four orders of magnitude when compared to FedAvg.
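The compression and majority-vote aggregation of the round described above can be sketched as follows. The sketch assumes that 1-bit quantization reduces each soft-label to the index of its most probable class, which matches the integer encoding mentioned in the protocol but is our reading rather than a definition given here. The helper names `compress_1bit` and `majority_vote` are hypothetical, and ties are broken in favor of the lowest class index.

```python
import numpy as np


def compress_1bit(soft_labels):
    """Reduce each soft-label vector to the integer index of its argmax class (assumed form of Q)."""
    return np.argmax(soft_labels, axis=1)


def majority_vote(votes, num_classes):
    """Aggregate integer-encoded labels of shape (num_workers, num_samples) per sample."""
    votes = np.asarray(votes)
    num_samples = votes.shape[1]
    counts = np.zeros((num_classes, num_samples), dtype=int)
    for worker_votes in votes:
        # Each worker contributes exactly one vote per sample.
        counts[worker_votes, np.arange(num_samples)] += 1
    return counts.argmax(axis=0)  # ties resolve to the lowest class index


# Three workers, two public samples, three classes.
soft_1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
soft_2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
soft_3 = np.array([[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]])
votes = [compress_1bit(s) for s in (soft_1, soft_2, soft_3)]
aggregated = majority_vote(votes, num_classes=3)
```

Because every worker uploads only one small integer per public sample, the aggregation involves nothing heavier than counting, which is what makes it plausible to run inside a smart contract.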

C. Blockchain Technology in FL Context
Blockchain was initially introduced with Bitcoin by Satoshi Nakamoto in 2008 [31]. It is a distributed ledger managed by nodes in a peer-to-peer network, where cryptographic links of information ensure resistance to modification and immutability. The network is governed by a consensus mechanism [32] among peers, which supersedes the need for central coordination. The advent of general-purpose blockchains [3], with smart contract functionality supporting Turing-completeness, allows for decentralized, immutable, and transparent business logic on top of a blockchain. This technology is able to mitigate open problems of FL environments due to its inherent properties, namely:
Decentralization. In server-worker architectures, workers are exposed to a power imbalance and a single point of failure. A malicious server could (i) exclude workers arbitrarily or (ii) withhold reward payments. Furthermore, a server-worker design is not suitable for an environment where multiple entities share a common and equal interest in advancing their respective models. The decentralized property of blockchain systems ensures a federal [...]
Transparency and Immutability. Since every peer in the system shares the same data, data on the blockchain can only be updated and never deleted. A transparent and immutable reward logic in an FL context ensures trust on the worker side. On the other hand, each worker is audited and can therefore be held accountable for malicious behavior.
Cryptocurrency. Many general-purpose blockchain systems come with cryptocurrency functionality, e.g., the option to implement payment schemes within the business logic of the smart contract. Based on a reward mechanism of the FL system, workers can be rewarded immediately, automatically, and deterministically without the need for a trusted third party.
To analyze blockchain systems, we categorize them into two main types:
1) Application Specific Blockchain Systems (ASBS): Blockchains that have to be adapted to a specific FL use case require a novel infrastructure. This causes overhead in terms of complexity during development and deployment and is hence more likely to introduce vulnerabilities.
2) General Purpose Blockchain Systems (GPBS): These are limited due to restricted virtual machines and predefined consensus layers, but allow for easy development, deployment, and operation utilizing already existing frameworks [4], [5], [33].
These types can be either public or permissioned/private. Public blockchains, like Ethereum, are open networks where anyone can participate, which makes them expensive to use, as every transaction has to be replicated by every node in the network. Permissioned blockchains, such as Hyperledger Fabric [5], restrict participation to authorized entities, offering a controlled, efficient, and private environment that may be preferable for FL scenarios with known and trusted participants.

D. Related Work
We focus on related Federated Learning Frameworks (FLFs) that (i) are both decentralized and reward participation and (ii) use blockchain at their core to decentralize FL, that is, parameters are aggregated or stored in a decentralized way. We extended the systematic analysis established by [1] to compare FL, the application of blockchain, and the contribution measurement in Table I. Note that the inherent complexity of FLFs leads to heterogeneity in terms of application, overall design, special focus, and details. [34] designs an FL system for home appliances using blockchain and a new normalization technique for differential privacy. Similarly, [40] introduces a regional FL framework for vehicles, integrating a reputation mechanism and a blockchain-secured trading platform. Focusing on robust mechanism designs, [15], [41] propose an FL protocol on blockchain using contest theory for worker engagement. [42] employs a Stackelberg game-based FL system considering contributions, deadlines, and upload times. [43] introduces DeepChain, a blockchain-secured FL framework with a special focus on privacy. [44] presents a two-layered blockchain for mobile edge networks. [45] minimizes communication costs in IoT FL through a double-layer aggregation model. [46] offers a specialized Democratic Learning (DemL) solution for on-device learning, including a unique consensus mechanism. [47] introduces Proof of FL (PoFL), an energy-efficient blockchain mechanism. [48] provides a secure FL framework for UAV-assisted sensing, incorporating differential privacy and reinforcement learning-based incentives. [49] designs a Mobile Crowdsensing framework that uses blockchain and edge intelligence for resource-constrained environments. [50] evaluates participant contributions transparently in its FL framework. [39] combines FL and blockchain for secure data sharing in neural training, using Shapley values for fair rewards. Lastly, [51] integrates blockchain and model distillation to accommodate model heterogeneity and enhance communication efficiency, yet it falls short in detailing blockchain operations and providing a theoretical analysis.
1) Incentivization of FLF: Measuring contributions in FL to fairly reward clients remains an open research challenge [1], [6]. Various metrics and methods are currently used for this purpose, each with its own set of challenges and limitations. [42], [46], [48] rely on self-reported information such as data size to determine rewards. However, this approach is susceptible to malicious behavior, as false reporting leads to a maximal return. Alternatives include using similarity measures like the Euclidean distance of model updates [34], or employing voting systems for contribution assessment [41], [45]. Despite their utility, these methods lack rigorous theoretical and experimental validation and are vulnerable to attacks. Explicit methods like the Shapley value [39], [50] or simple test-set accuracy [40], [47] have been utilized for explicit reward measurement. However, when applied in a decentralized context, these explicit methods (i) require complex adjustments to the blockchain consensus mechanism, (ii) cause infeasible overhead (especially the Shapley value), or (iii) require a central authority that measures the contribution against the test set, introducing a single point of failure. While [44] acknowledges multiple factors like data quality and task satisfaction as affecting rewards, it remains vague about its contribution measurement methodology. Similarly, [49], [52], and [46] lack specificity in this regard.
2) Decentralization of FLF: The trade-off in using blockchain lies between scalability and decentralization. Although it is theoretically favorable to decentralize all FL operations on-chain, namely Aggregation (A), Coordination (C), Payment (P), and Storage (S), doing so may introduce prohibitive computational and storage costs. This is because all blockchain nodes must replicate both computation and storage at all times. Specifically, the need to store and manipulate data-heavy objects, such as millions of NN parameters, on-chain restricts the framework to a limited number of participants. In summary, Table I compares decentralized and incentivized FLFs to the approach presented in this work. Our approach is unique in allowing for NN flexibility (see Section II-B) while simultaneously maintaining full decentralization and scalability.

A. Problem Statement
We assume a federation $F$ of workers $W$ who have a common interest in advancing their private Neural Networks based on (i) additional data from other participants and (ii) the unlabeled public dataset $X^{pub}$ through Federated Distillation (FD). We consider an environment where all participants of $F$ have equal power. For example, no central entity such as a central server should have the power to either censor or manipulate the reward distribution. Each worker participating in the training is responsible for submitting predictions on the public dataset $X^{pub}$ based on their locally trained model, together with the label distribution $labelCount_i$ of these predictions. To enable decentralization, a smart contract atop a blockchain will replace the central server. This contract will (i) aggregate the workers' predictions and (ii) calculate the rewards considering other contributions. To ensure accountability and to prevent free-riding, each worker must stake a deposit $D_i$. The total deposit $D = \sum_{i \in F} D_i$ will be used to pay $\tau_i$ for each contribution at the end of the training process. Note that $\tau_i \geq D_i$ if worker $i$'s contributions to $F$ are above average and $\tau_i \leq D_i$ otherwise. Malicious behaviors, such as (i) withholding after committing and (ii) committing an incorrect label distribution $labelCount_i$, will result in the slashing of the deposit and exclusion from $F$.
The worker selection process is beyond the scope of this work. Reputation systems [36], [53] or required registrations might be feasible solutions. Our proposed framework is designed to be lightweight and blockchain agnostic. By employing 1-bit compressed logits on a public test set, instead of aggregating millions of parameters of modern NNs (FedAvg), and incorporating a computationally simple, correlation-based reward mechanism, our framework uniquely enables (i) on-chain aggregation and (ii) on-chain reward calculation, while maintaining compatibility with both ASBS and GPBS. While deployment on public blockchains is theoretically possible, many promising public blockchain projects are still in their technological infancy, either lacking smart contract functionality or facing scalability restrictions. These constraints currently make deploying our system on public blockchains economically infeasible, due to high transaction fees and limited transactions per second. Consequently, our framework is specifically designed for the cross-silo case on permissioned blockchains. We assume the following properties:
1) Honest Majority Assumption: We assume that the majority of the nodes in the blockchain network are honest and follow the protocol. This is critical for the blockchain's consensus mechanism to function correctly.
2) Sybil Attack Resistance: We assume that our blockchain network is resistant to Sybil attacks, where an adversary controls multiple nodes. This is especially important for the GPBS deployment, where entry to the network is more open.
3) Confidentiality and Integrity: We assume that the blockchain ensures the confidentiality and integrity of the data and code.

B. Reward Mechanism Motivation
As no entity is in possession of the true labels of $X^{pub}$ in the decentralized Federated Learning setting, workers' evaluations cannot be verified. This might encourage workers to report random data without actually classifying $X^{pub}$. This can be mitigated by rewarding peer consistency, i.e., the reward of a report depends on its consistency with the labels given by other workers. However, the best strategy in such schemes is for all workers to report the same answer without investing effort in finding the real label. The solution to these issues is to set up a mechanism in which the expected profit of each individual worker is maximized if they put high effort into solving the task while acting truthfully. In contrast to a server-worker relationship, our framework assumes multiple stakeholders with a common interest in improving their respective models. The initially staked deposit $D$, which will be used to pay $\tau$, manifests this mutual interest. Yet, contributions may be of different quality to the overall federation. Low-quality workers may even have a negative effect on the overall federation, even if their intention is truthful. At the same time, some classes in $X^{pub}$ may be less common and are therefore more important to classify correctly. Hence, a mechanism is required to:
1) incentivize only workers with the best abilities for the task,
2) incentivize these workers to invest their utmost effort in obtaining the most accurate answer, and
3) incentivize workers who are able to classify uncommon samples in $X^{pub}$ with higher rewards.
Algorithm 1: The Peer Truth Serum for Crowdsourcing [16].

C. Peer Truth Serum for Federated Distillation
The Peer Truth Serum for Crowdsourcing (PTSC) is a promising multi-task peer prediction mechanism. Through a scoring rule $\tau$, it rewards workers for surprisingly common reports, encouraging honest and high-effort behavior without the need for ground-truth knowledge [16]. PTSC merges the reward mechanism of [54] with the Peer Truth Serum concept [55], [56], ensuring incentive compatibility across a non-binary solution space suitable for heterogeneous workers. Introducing the Peer Truth Serum for Federated Distillation (PTSFD), we adapt the PTSC framework, as described in Algorithm 1, to the Federated Distillation setting. In this scenario, a group of workers performs statistically independent tasks, where a task refers to classifying a sample $j$, with $j \in X^{pub}$. The discrete density function of the reported labels is represented as $R_i(x)$, $x \in C$. Here, $R_i(x)$ excludes the contribution from worker $i$ and denotes the fraction of reported labels, given by
$$R_i(x) = \frac{num_{-i}(x)}{\sum_{y \in C} num_{-i}(y)},$$
where $num_{-i}(x)$ counts the reports of label $x$ submitted by all workers other than $i$. Furthermore, PTSFD incorporates an adjustable penalty term $\beta$. This modification acknowledges that the primary motivation might be the utility of an improved model, making the payment secondary (as seen in (6)). Consequently, the reward for a report $x$ of worker $i$ on sample $j$, compared against the report $y$ of a peer on the same sample, is
$$\tau_{i,j}(x, y) = \lambda \left( \frac{\mathbb{1}_{x = y}}{R_i(x)} - \beta \right),$$
where $\lambda$ adjusts the payment magnitude and $\beta$ modulates the reward-accuracy ratio. The cumulative reward for worker $i$ is computed over all tasks as
$$\tau_i = \sum_{j \in X^{pub}} \tau_{i,j}.$$
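To make the scoring rule concrete, the following sketch computes the cumulative reward of one worker. It rests on assumptions that should be checked against Algorithm 1 and [16]: the per-sample reward is taken to be λ·(1[x = y] / R_i(x) − β), i.e., the PTSC payment with β in place of the constant offset, R_i(x) is estimated from the reports of all workers except i, and the peer is passed in explicitly instead of being drawn at random. All function and variable names are hypothetical.

```python
from collections import Counter


def label_frequencies_excluding(reports, worker):
    """R_i(x): fraction of each reported label among all workers except `worker`."""
    counts = Counter()
    for name, labels in reports.items():
        if name != worker:
            counts.update(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def ptsfd_total_reward(reports, worker, peer, lam=1.0, beta=1.0):
    """Sum of assumed per-sample rewards lam * (1[x == y] / R_i(x) - beta)."""
    freq = label_frequencies_excluding(reports, worker)
    total = 0.0
    for x, y in zip(reports[worker], reports[peer]):
        r_x = freq.get(x, 0.0)
        # If no other worker reported x, a match is impossible and only the penalty applies.
        bonus = 1.0 / r_x if (x == y and r_x > 0.0) else 0.0
        total += lam * (bonus - beta)
    return total


# Integer-encoded labels of three workers on four public samples.
reports = {"a": [0, 1, 1, 0], "b": [0, 1, 1, 1], "c": [0, 1, 0, 0]}
reward_a = ptsfd_total_reward(reports, worker="a", peer="b")
```

In this toy example both labels have frequency 0.5 among the other workers, so each of the three matches with the peer earns 1/0.5 − 1 = 1 and the single mismatch costs 1, giving a total of 2. Matching on a rarer label would earn more, which is the "surprisingly common" effect the mechanism relies on.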

D. Game-Theoretic Analysis
The setting can be considered a two-stage game. In stage 1, workers choose the amount of effort $e$ they want to invest in classifying $X^{pub}$. To simplify the analysis, we assume two levels of effort, high $e_1$ and low $e_0$. Here, $e_1$ represents the best work possible exerted by the worker, and $e_0$ represents no effort (i.e., no local NN training). Unlike FedAvg-based systems, the proposed framework applies FD; hence, it does not require a uniform NN among the clients but allows for flexible architectures appropriate for the hardware constraints of the respective workers (see Section II-B). In stage 2, workers decide on what to report. The baseline model assumes each worker solves every task. Yet, without loss of generality, workers could be randomly allocated to solve tasks such that each sample of $X^{pub}$ is classified by at least two different workers.
Workers. We assume workers to be individually rational, aiming to maximize their expected profit $\Pi_i$. $U_i$ represents the expected utility function of worker $i$, which can vary among workers. The expected rewards of contributing to the federation $F$ are twofold: (i) the expected utility of the improved model $\theta_i^{improved}$ and (ii) the utility of the expected monetary reward from $S$ for contributing to classify $X^{pub}$. We assume that the training process incurs variable costs $c_i(e_i)$, where $c_i$ is an increasing function of effort $e_i$. Specifically, $c_i(e_1) > c_i(e_0)$, where $e_0$ denotes no effort and $e_1$ denotes high effort of worker $i$. Effort represents the quality and quantity of private data, model quality, number of training iterations, etc. A detailed Pareto-optimal cost analysis [57], [58] under real-world assumptions will be explored in future work. Additionally, to offset free-riding of inactive but registered workers of $S$ who benefit from an improved model $U_i(\theta_i^{improved})$ [...]
Incentive Compatibility. In order to evaluate PTSFD in game-theoretic terms, we analyze each worker's expected profit $\Pi_i = Rewards_i - Costs_i$. We assume Individual Rationality (IR), i.e., workers try to maximize their expected profit and do not participate if $\Pi \leq 0$. For the sake of simplicity, we further assume that the gain in model improvement based on the a-priori known distribution of labels in $X^{pub}$ [...] We define the mechanism to be incentive compatible if the honest strategy is the dominant strategy for every worker. We use an equilibrium analysis to determine the resulting behavior of each worker. In particular, $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ represents a strategy profile of the workers. This profile is an equilibrium if, for any worker $i \in W$, the worker's expected profit is maximized under the honest strategy profile $\sigma$. Suppose that worker $i$ believes that the peer workers are honest and that their answer on a given sample $j$ is positively correlated with worker $i$'s answer $x$, when obtained with high effort $e_1$. Specifically, worker $i$ believes that answer $x$ is not less likely for sample $j$ than in the distribution over all tasks.
Honest Strategy. For every sample $j$ in $X^{pub}$, the worker calculates the probability scores over all possible classes in $C$ (the output of the softmax layer of a NN). Let us further assume worker $i$ is in possession of a trained model $\theta_i$ with an overall accuracy $Accuracy_{\theta_i}$. We define the relative certainty $A_{ij}$ of any prediction of client $i$ on an element $j$ of $X^{pub}$ as the product of the local classifier accuracy and the sample-specific maximum probability score.
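The relative certainty defined above can be computed directly from the softmax outputs. The sketch below assumes that the probability scores are available as an array with one row per public sample. The function name `relative_certainty` and the example values are hypothetical.

```python
import numpy as np


def relative_certainty(accuracy, probabilities):
    """A_ij: local classifier accuracy times the maximum softmax score of each sample."""
    probabilities = np.asarray(probabilities, dtype=float)
    return accuracy * probabilities.max(axis=1)


# A worker whose model reaches 90% accuracy, evaluated on two public samples.
probs = [[0.8, 0.2], [0.5, 0.5]]
certainty = relative_certainty(0.9, probs)  # one value per sample
```

A confident prediction from an accurate model yields a value close to 1, while an undecided prediction (second sample) is scored lower even though the model accuracy is the same.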
Under the assumption that the local client data $X_i^{priv}$ is representative of the entire data distribution $D$, this metric gives a heuristic measure of the data-specific certainty in the model prediction. Based on this metric, each worker decides whether to report predicted labels, discarding those for which the reward is expected to be negative. This leads to the expected profit [...] (8). Assuming individual rationality, $E(\Pi_{i,j}) \geq 0$ is required in order to incentivize worker $i$ to submit a vote on sample $j$. Following (8), we can derive the minimum prediction quality required to incentivize worker $i$ to participate, i.e., $\Pi_i \geq 0$.
Notice that the federation can set the overall quality threshold by adjusting the hyperparameters $\lambda$ and $\beta$ appropriately, assuming similar variable costs $c(e)$ on the workers' side.
Heuristic Strategy. The heuristic strategy assumes that worker $i$ does not exert any effort to obtain $Y_j^{eval} = x \in C$. The expected reward is based on the probability of matching a peer's answer. Given that answer $x$ is independent of the task, the probability of coincidentally matching a peer is equivalent to the frequency of answer $x \in C$.
It is important to note that the expected profit for $\beta = 1$ is 0, and it is strictly negative for $\beta > 1$. [...] $= Y_j^{eval}$ consistently results in a negative expected profit for all $j \in X^{pub}$.
This is true provided the self-predicting condition [16] is met, i.e., [...] Considering a scenario where workers collude (they report $x$ for both $Y_j^{eval} = x$ and $Y_j^{eval} = y$), $R$ will adjust such that [...] This change in $R$ entirely offsets the increase in the probability of a match. Therefore, only an honest strategy paired with a high-quality model will yield a positive expected reward for a given worker. This results in an equilibrium $\sigma_{honest}$ for the PTSFD mechanism, thereby demonstrating its incentive compatibility.
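The statement about the heuristic strategy can be retraced with a short calculation. It assumes the per-sample reward has the PTSC-style form $\lambda\,(\mathbb{1}_{x=y}/R_i(x) - \beta)$ and that a worker reporting $x$ without effort matches a peer with probability equal to the frequency $R_i(x)$. Both are assumptions of this sketch rather than results quoted from the analysis above.

```latex
\begin{align}
\mathbb{E}\left[\tau_{i,j}\right]
  &= \lambda \left( \frac{\Pr[\text{peer reports } x]}{R_i(x)} - \beta \right) \\
  &= \lambda \left( \frac{R_i(x)}{R_i(x)} - \beta \right) \\
  &= \lambda \, (1 - \beta).
\end{align}
```

Under these assumptions the expected reward of an effortless report is zero for $\beta = 1$ and negative for $\beta > 1$, independent of which label $x$ is reported, so the expected profit cannot be positive once any cost is taken into account.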

IV. 1-BIT COMPRESSED FEDERATED DISTILLATION FRAMEWORK WITH SMART CONTRACT LOGIC
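The protocol consists of the following steps: (i) Task Specification & Smart Contract Deployment, (ii) Worker Registration & Deposit Submission, (iii) Local Model Training and Prediction, (iv) Commit Predictions, (v) Reveal Predictions, (vi) Aggregation & Reward Distribution, and (vii) Knowledge Distillation from $X^{pub}$, as depicted in Fig. 3.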

A. Task Specification & Smart Contract Deployment
To form federation $F$, participants with similar interests must agree on the requirements and specifics of an FD task, specifically:
1) Task description and data distribution (e.g., images of a certain type).
2) Reference to a public dataset $X^{pub}$ and potential classes $C$ for the Federated Distillation pipeline. This will subsequently be utilized by workers to predict the labels on each sample of the dataset.
3) Reference to the address of $S$.
4) Deposit amount $D_i$ that each worker must stake.
5) PTSFD and reward mechanism details ($\lambda$ and $\beta$ values).
After forming a federation $F$, either an external third party or one of the workers from $F$ deploys the governing smart contract $S$, which contains the aggregation and PTSFD logic of the FD task, stakes the necessary deposit $D_i$, and lists the addresses of all eligible workers in $F$.

B. Worker Registration & Deposit Submission
Based on the task specifications, interested workers register on the smart contract $S$ using their respective blockchain address (public key) and submit the required deposit $D_i$. $S$ verifies whether the applying worker belongs to the federation. Given that $|F| \gg |W|$, PTSFD motivates workers that are valuable to $F$ in terms of data and computational capacity to engage while dissuading low-quality workers, as demonstrated in Section IV-E. To preclude free-riding, workers in $F$ who are not registered should not have access to $S$. This restriction can be implemented by deploying $S$ on a suitable blockchain system or by shuffling $X^{pub}$, ensuring only registered clients can access the correct indices.

C. Local Model Training and Prediction
The entire training procedure encompasses two phases: the local model training phase on local data X priv i , Y priv i and the KD phase from X pub , Y pub aggr , which occurs as the protocol's final step, as detailed in Section II-B.
Training on Local Data. Each worker either has a pre-trained model or begins training a NN on their specific private data until convergence (optionally, until a predetermined minimum accuracy agreed upon within F is achieved). Notably, unlike FedAvg, FD doesn't mandate a common shared NN architecture across all workers, thus enabling them to select an optimal architecture tailored to their computational resources.
Label Prediction. Upon completing the training, workers compute the soft labels for the samples in X pub .
Algorithm 2: Local Label Count for Worker i.
Label Count. The PTSFD mechanism necessitates data on the label distribution R(x) over X pub for reward calculations. Hence, to minimize computational overhead on the blockchain, each worker i must compute the label count labelCount i ∈ N |C| for each label present in X pub (as described in Algorithm 2). The supplemental validation function to ensure the accurate computation of labelCount i depends on the specific blockchain system and is outside this work's purview.
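The local label count can be sketched in a few lines of Python. This is a minimal illustration, not the paper's Algorithm 2: it assumes that the 1-bit compression amounts to an arg-max decision per sample, and the function names `one_bit_labels` and `label_count` as well as the toy soft labels are hypothetical.

```python
from collections import Counter

def one_bit_labels(soft_labels):
    """Reduce each soft-label vector to the index of its most probable class.

    Assumption: 1-bit compression is modelled as an arg-max (one-hot) decision.
    """
    return [max(range(len(p)), key=p.__getitem__) for p in soft_labels]

def label_count(hard_labels, num_classes):
    """Count how often the worker predicted each class over the public dataset."""
    counts = Counter(hard_labels)
    return [counts.get(c, 0) for c in range(num_classes)]

# Hypothetical soft labels of one worker for four public samples, |C| = 3.
soft = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
hard = one_bit_labels(soft)      # [0, 1, 0, 2]
counts = label_count(hard, 3)    # [2, 1, 1]
```

Computing the count off-chain keeps the on-chain work to a single vector addition per worker, which is the motivation given above.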

D. Commit and Reveal
Information on the blockchain is transparent to every node. Even in a private blockchain setup, workers in W could wait for peers to publish Y 1 bit p and replicate their results without expending any effort. To prevent this kind of copying and to ensure that workers apply effort to classify X pub , a two-step commit and reveal scheme is employed.
Commit. Prior to publishing the results to S, where all peer workers could view the submission, each worker i computes a cryptographic hash hashCommit i = H(Y 1 bit i , labelCount i , salt i ) over its predictions Y 1 bit i , its label count labelCount i , and a salt salt i . The property of pre-image resistance of a cryptographic hash function (i.e., it should be challenging to find any message m such that hashCommit i = H(m)) and the property of collision resistance (i.e., it should be challenging to find two distinct messages m 1 and m 2 with H(m 1 ) = H(m 2 )) ensure that no worker can either retrieve a peer's Y 1 bit i from its commitment or alter their own previously committed Y 1 bit i . Each worker i transmits hashCommit i to S upon completing their training. It is noteworthy that the commit phase on S concludes once |W | ⊆ |W| workers have registered with S or when the maximum time T max commit is reached.
Reveal. During the reveal phase on S, each worker that successfully committed in the commit phase must disclose Y 1 bit i , labelCount i , and salt i within the timeframe T max reveal via a transaction function call to S. To counteract withholding attacks, a worker's deposit D i is forfeited if worker i fails to reveal within the allotted time T max reveal . The smart contract then verifies the commitment's validity by checking that H(Y 1 bit i , labelCount i , salt i ) == hashCommit i .
Algorithm 3 provides the pseudocode for this scheme in Solidity on the Ethereum blockchain.
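The commit and reveal logic can be illustrated off-chain with a short Python sketch. It is not the Solidity code of Algorithm 3: SHA-256 stands in for the contract's hash function H, the JSON encoding of the payload is an assumption, time limits are omitted, and the class `CommitRevealContract` with its methods is a hypothetical stand-in for S.

```python
import hashlib
import json
import secrets

def commit_hash(labels, label_count, salt):
    """H(Y_i, labelCount_i, salt_i); SHA-256 over a JSON payload is an assumed encoding."""
    payload = json.dumps([labels, label_count], separators=(",", ":")).encode() + salt
    return hashlib.sha256(payload).hexdigest()

class CommitRevealContract:
    """Toy in-memory stand-in for the smart contract S (no deadlines, no deposits)."""

    def __init__(self):
        self.commits = {}    # worker -> hashCommit_i
        self.revealed = {}   # worker -> (labels, label_count)

    def commit(self, worker, hash_commit):
        if worker in self.commits:
            raise ValueError("worker already committed")
        self.commits[worker] = hash_commit

    def reveal(self, worker, labels, label_count, salt):
        # Accept the reveal only if it reproduces the stored commitment.
        if commit_hash(labels, label_count, salt) != self.commits.get(worker):
            return False
        self.revealed[worker] = (labels, label_count)
        return True

    def forfeited(self):
        # Workers that committed but never revealed would lose their deposit D_i.
        return sorted(w for w in self.commits if w not in self.revealed)

contract = CommitRevealContract()
salt = secrets.token_bytes(32)
labels, counts = [0, 1, 0, 2], [2, 1, 1]
contract.commit("worker_a", commit_hash(labels, counts, salt))
contract.commit("worker_b", commit_hash([1, 1, 1, 1], [0, 4, 0], secrets.token_bytes(32)))

ok = contract.reveal("worker_a", labels, counts, salt)           # matches the commitment
tampered = contract.reveal("worker_a", [1, 1, 0, 2], counts, salt)  # altered labels are rejected
```

In this run `worker_b` never reveals, so `contract.forfeited()` lists it as the worker whose deposit would be withheld.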

E. Aggregation & Reward Distribution
We apply PTSFD to calculate the reward distribution for each worker. To calculate the rewards, S first aggregates labelCount i across all workers i ∈ W to obtain the global label count, from which the label distribution R(x) over X pub follows. Each worker is rewarded for its prediction on sample j with respect to its peers according to (4). The final rewardScore for worker i is the sum of all individual rewards over X pub , denoted τ i = rewardScore(i), where λ is a scaling parameter for the reward and n peers j is the number of peer workers who also submitted a label prediction on sample j. The aggregated predictions Y pub aggr are calculated by majority vote of Y 1 bit i ∀i ∈ W . We merge the reward computation and aggregation into a single algorithm outlined in Supplementary Materials B. Note that implementation details may differ fundamentally depending on the underlying blockchain architecture.
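A compact Python sketch of the combined aggregation and reward step is given below. It is an illustration under stated assumptions rather than the algorithm of Supplementary Materials B: the per-peer payoff is assumed to be 1/R(x) − β on a match and −β otherwise, averaged over the n peers j peers of sample j and scaled by λ, which is consistent with the example in Section IV-I and the expected reward of 1 − β quoted in Section V-F. The function name, the dictionary layout, and the use of `None` for a withheld report are hypothetical.

```python
from collections import Counter

def aggregate_and_reward(reports, label_counts, lam=1.0, beta=1.0):
    """Majority-vote aggregation plus an assumed peer-consistency reward."""
    num_classes = len(next(iter(label_counts.values())))
    # Global label count: element-wise sum of labelCount_i over all workers.
    global_count = [sum(lc[c] for lc in label_counts.values()) for c in range(num_classes)]
    total = sum(global_count)
    R = [g / total for g in global_count]  # empirical label distribution R(x)

    num_samples = len(next(iter(reports.values())))
    scores = {w: 0.0 for w in reports}
    aggregated = []
    for j in range(num_samples):
        votes = {w: y[j] for w, y in reports.items() if y[j] is not None}
        # Majority vote over all submitted labels for sample j.
        aggregated.append(Counter(votes.values()).most_common(1)[0][0] if votes else None)
        for w, label in votes.items():
            peers = [v for p, v in votes.items() if p != w]
            if not peers:
                continue  # no peer to compare against, no reward or penalty
            matches = sum(1 for v in peers if v == label)
            # Assumed payoff: average over peers of (1/R(label) on a match) minus beta.
            scores[w] += lam * (matches / (len(peers) * R[label]) - beta)
    return aggregated, scores

# Hypothetical reports of three workers on four public samples with two classes.
reports = {"a": [0, 1, 0, 1], "b": [0, 1, 0, 1], "c": [0, 1, 1, 1]}
label_counts = {"a": [2, 2], "b": [2, 2], "c": [1, 3]}
aggregated, scores = aggregate_and_reward(reports, label_counts)
```

Here workers a and b agree on every sample and end up with equal scores, while c is penalized once for the dissenting label on the third sample.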

F. Knowledge Distillation on Public Dataset
Finally, workers download the aggregated predictions Y pub aggr from the blockchain and perform several epochs of KD using X pub and Y pub aggr to improve their respective models (θ improved i → θ i + Δθ i ). Optionally, the training process of each client, as shown in Section II-B2, can be repeated until a specific threshold is achieved as specified in the Smart Contract S. Note that λ should decrease with every consecutive round, as most evaluated labels will not change.

G. Complexity Analysis
Because we are executing this protocol on the blockchain, it's essential to understand the computational and storage costs involved. In this section, we discuss the overheads related to computation and storage that our proposed algorithm introduces. While the actual implementation on a general-purpose blockchain system might differ based on the underlying virtual machine, our PTSFD implementation, as illustrated in Supplementary Materials B, provides a useful reference to estimate the complexity.
1) Computational Complexity: The algorithm presented in Supplementary Materials B first calculates the global label distribution and counts class votes across all workers (lines 7-12). This computational overhead is O(m • n), where m = |X pub | and n = |W |. Subsequently, we examine each data sample in X pub , rewarding or penalizing a worker based on its peers. We also determine the aggregated class label for each sample during this algorithm phase (lines 13-29). The process of calculating the reward for each worker based on its peers results in a computational overhead of O(m • Σ_{i=1}^{n} n peers ). The global label calculation adds a cost of O(m • |C|). In the baseline scenario, where each worker processes all data samples from the public dataset and is considered a peer of all other workers, the overall computational cost is given by (16).
For more efficient solutions, we distribute public dataset samples among workers such that each sample is classified by a maximum of two workers. Implementing PTSFD in this manner would reduce the overhead as detailed in (17).
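For orientation, the three cost terms named above can be combined for the baseline scenario. The following is a sketch derived only from those terms, assuming n peers = n − 1 for every worker; it is not necessarily the exact form of (16).

```latex
\begin{align*}
\underbrace{O(m \cdot n)}_{\text{label counts and votes}}
+ \underbrace{O\Big(m \cdot \sum_{i=1}^{n} (n-1)\Big)}_{\text{peer rewards}}
+ \underbrace{O(m \cdot |C|)}_{\text{global labels}}
&= O\big(m \cdot n + m \cdot n\,(n-1) + m \cdot |C|\big) \\
&= O\big(m \cdot (n^{2} + |C|)\big).
\end{align*}
```

Under this reading the quadratic peer-comparison term dominates as n grows, which is why limiting the number of workers per sample is attractive.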
2) Storage Complexity: Two types of storage costs are associated with the proposed algorithm: permanent storage and temporary memory variables. Votes, M , S, R i , and τ 0 need memory storage during the computation, resulting in O(|C| • (m + 2) + n) additional memory storage. Whether the reported frequencies labelCount i or each worker's final reward share rewardScore need permanent blockchain storage depends on the requirements of the specific blockchain system. Ideally, only globalLabels = Y pub aggr is stored permanently on the blockchain. The minimum amount of data required for each round is represented by (18), where η accounts for the overhead due to encoding necessities.

H. PTSFD in Comparison to Shapley Value
Quantifying individual contributions in FL is essential for the equitable distribution of rewards and the growth of FL systems. The Shapley value is a concept in cooperative game theory that distributes total gains among players by measuring the marginal contributions each player makes to different possible coalitions. For a worker i in a set W, with utility evaluation function V : 2^W → R, the Shapley value φ i (V ) is given by
$$\phi_i(V) = \frac{1}{|W|!} \sum_{\pi \in \Pi} \left[ V\left(S^{\pi}_i \cup \{i\}\right) - V\left(S^{\pi}_i\right) \right],$$
where Π is the set of all permutations of W, and S π i ⊂ W is the set of workers preceding i in permutation π ∈ Π. The Shapley value adheres to several axioms for fairness:
1) Efficiency: Total utility is fully distributed among the workers: Σ_{i∈W} φ i (V ) = V (W).
2) Symmetry: Workers contributing identically to every subset receive identical rewards.
3) Null Player: Workers who do not enhance utility for any coalition receive no reward.
4) Additivity: The Shapley value is linear over the utility functions of combined games.
Despite these favorable properties, the computation of the Shapley value scales with O(2^n). In FL, V (S) = V (θ S ), where θ S is the FL model trained from scratch on the subset of datasets {X, Y } S = {X i , Y i }, i ∈ S for every permutation:
$$\theta_S = A\left(\theta_{init}, \{X, Y\}_S\right),$$
where A(•) is a learning algorithm and θ init denotes the initial model. The computational demands involved in calculating the Shapley value render it challenging to efficiently execute even a single step on a general-purpose blockchain. Recent studies have attempted to approximate the Shapley value to mitigate this computational overhead [59], [60], [61]. In contrast, PTSFD offers a lightweight method to elicit truthful contributions implicitly, bypassing the exhaustive computation of the marginal utilities. PTSFD, as an alternative, is designed to be compatible with blockchain technology, enabling scalable and decentralized FL without excessive computation, while still aspiring to maintain fairness in incentives.
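To make the cost of the exact Shapley value concrete, the permutation form can be written out for a toy game. This is a minimal sketch: the additive utility `toy_utility` and the per-worker contributions are hypothetical, whereas in FL every call to the utility function would correspond to training and evaluating a model on a coalition's data.

```python
from itertools import permutations
from math import factorial

def shapley_values(workers, utility):
    """Exact Shapley values by enumerating all |W|! orderings of the workers."""
    values = {w: 0.0 for w in workers}
    for order in permutations(workers):
        preceding = frozenset()
        for w in order:
            # Marginal contribution of w to the workers preceding it in this ordering.
            values[w] += utility(preceding | {w}) - utility(preceding)
            preceding = preceding | {w}
    n_fact = factorial(len(workers))
    return {w: v / n_fact for w, v in values.items()}

# Hypothetical additive utility: each worker adds a fixed amount, c adds nothing.
contribution = {"a": 0.5, "b": 0.3, "c": 0.0}

def toy_utility(coalition):
    return sum(contribution[w] for w in coalition)

phi = shapley_values(list(contribution), toy_utility)
```

For this additive game each worker's Shapley value equals its own contribution, the null player c receives zero, and the values sum to V (W), in line with the axioms above. Even with three workers the enumeration already evaluates the utility 36 times.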

I. Limitations
Despite the benefits of our decentralized FD protocol, our framework encounters the following limitations.
Public Dataset. Although FD offers numerous advantages, such as reduced information exchange and the flexibility of independent NN architectures, the FD training process necessitates access to a public dataset X pub . This may not be available for certain use cases. While [62] demonstrated that highly disparate data distributions might suffice for FD, relying on A ij as a heuristic for the evaluation certainty of a sample limits the variance of distributions between X pub and X priv . The scoring method proposed by [63] appears promising in addressing this.
Public Blockchains. Despite significantly reducing computational and storage demands, our framework remains unsuitable for current public blockchain systems due to (i) the high costs associated with storing Y pub aggr and the computational overhead of PTSFD, and (ii) the transparency of Y pub aggr to nodes that are not members of W ∈ F and hence did not make a deposit. Both issues may be addressed by upcoming advancements in the public blockchain sector.
Self Predicting Condition. PTSFD is incentive-compatible and yields an optimal outcome when workers are honest. However, the mechanism's incentive compatibility is contingent upon the satisfaction of (12). If classes are evenly distributed across X pub , the condition will always be met. Example of a violation: Let P r(x = a) = 0.8 and P r(x = b) = 0.2, but R(a) = 0.9 and R(b) = 0.1. Although the worker's Y eval i = a, their expected reward would be greater if Y report i = b, as 0.8/0.9 − β < 0.2/0.1 − β.
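The arithmetic of the example can be checked with a few lines of Python. The sketch assumes that the expected reward of reporting x takes the form P r(x)/R(x) − β, as in the inequality above; the function name `expected_reward` and the choice β = 1 are illustrative only.

```python
def expected_reward(posterior, R, report, beta):
    """Expected reward of reporting `report`, assuming the form Pr(x)/R(x) - beta."""
    return posterior[report] / R[report] - beta

posterior = {"a": 0.8, "b": 0.2}   # worker's belief about the peer's label
beta = 1.0                          # illustrative penalty; the comparison does not depend on it

# Skewed label distribution from the example: deviating to "b" pays more.
skewed_R = {"a": 0.9, "b": 0.1}
honest_skewed = expected_reward(posterior, skewed_R, "a", beta)    # 0.8/0.9 - 1
deviate_skewed = expected_reward(posterior, skewed_R, "b", beta)   # 0.2/0.1 - 1

# Evenly distributed classes: honest reporting is the better choice.
uniform_R = {"a": 0.5, "b": 0.5}
honest_uniform = expected_reward(posterior, uniform_R, "a", beta)
deviate_uniform = expected_reward(posterior, uniform_R, "b", beta)
```

With the skewed distribution the dishonest report yields the higher expected reward, whereas with evenly distributed classes the honest report does, matching the statement above.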

V. EXPERIMENTS
In this section, we empirically evaluate the PTSFD framework and analyze the reward distribution under different levels of effort as well as its robustness in the event of malicious behavior. We do not consider explicit variable costs c i (e). Furthermore, we set the reward scaling parameter λ = 1 for all experiments. We do not account for lagging workers, so W = W for all experiments. All experiments are based on multiple rounds of the proposed protocol. The code for our experiments is made publicly available. Specifically, we experimentally validate the following properties of PTSFD:
1) Performance: Choosing to participate in the federation should lead to a significant improvement in model accuracy for each worker.
2) Fairness: The greater the effort a worker exerts in terms of training accuracy and amount of training data, the higher the reward they should receive.
3) Robustness: Malicious workers should receive substantially less reward, even under high collusion rates.

A. Data Sets and Models
We analyze the decentralized 1-bit compressed FD with the PTSFD protocol on three different federated image classification problems, using EMNIST [64] / MNIST [65], CIFAR-10 [66] / STL-10 [67] and Fashion MNIST [68] datasets on ResNet-18 [69] and LeNet [70]. Different values of the Dirichlet parameter α simulate various data distributions, such as iid and non-iid. We first train models locally on X priv and then perform KD using the public dataset X pub . Even though in real-world PTSFD applications, workers might train different model architectures and vary the number of local training epochs based on their hardware constraints, we employ a single default NN architecture for simplicity. We simulate heterogeneity through varying local training accuracy (early stopping), non-iid data, and different sizes of X pub . It is important to note that the distribution of the distillation data deviates from the worker's data, mirroring realistic FL scenarios (e.g., MNIST contains handwritten digits, while EMNIST features a different set of handwritten numbers; similarly, CIFAR-10 includes a distinct set of images compared to the STL-10 dataset). We use the Adam optimizer [71] with a fixed learning rate of 0.001 for both the distillation and training processes. We minimize cross-entropy loss for local model training on X priv , Y priv and minimize Kullback-Leibler Divergence on X pub , Y pub aggr .
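A Dirichlet-based split of a labelled dataset among workers can be sketched as follows. This is one common way to realize such a partition and is an assumption about the procedure, not a transcription of the experiment code; the function name `dirichlet_partition` and the toy label vector are hypothetical. Only the standard library is used, with Dirichlet samples obtained by normalizing Gamma draws.

```python
import random

def dirichlet_partition(labels, num_workers, alpha, seed=0):
    """Split sample indices among workers, class by class, with Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    shards = [[] for _ in range(num_workers)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one sample from a symmetric Dirichlet.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_workers)]
        total = sum(gammas) or 1.0  # guard against numerical underflow for tiny alpha
        start, acc = 0, 0.0
        for w in range(num_workers):
            acc += gammas[w] / total
            # The last worker takes the remainder so that every index is assigned once.
            end = len(idxs) if w == num_workers - 1 else round(acc * len(idxs))
            shards[w].extend(idxs[start:end])
            start = end
    return shards

# Toy dataset: 1000 samples with 10 balanced classes, split among 10 workers.
labels = [i % 10 for i in range(1000)]
shards = dirichlet_partition(labels, num_workers=10, alpha=0.1, seed=0)
```

Small values such as α = 0.1 concentrate each class on few workers (non-iid), while large values such as α = 100 approach an even split (iid).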

B. Storage and Communication Cost
Given that storage and computation costs on the blockchain are critical scalability constraints for decentralized FLFs [1], we assess our framework's storage costs in comparison to FedAvg for specific target accuracies. Table II shows the communication cost (upstream/downstream) as well as the storage cost required to achieve a specific accuracy target for ResNet-18 and LeNet on the CIFAR-10, MNIST, and Fashion-MNIST datasets, respectively. All experiments were run under different data distributions by varying the Dirichlet parameter α while simulating 4, 10, and 25 workers respectively. As observed from the summarized results, our framework achieves accuracy similar to that of a typical Federated Learning system using FedAvg as the aggregation mechanism but at a fraction of the communication and storage costs. For instance, when training ResNet on CIFAR-10 with α = 100, we can achieve the target accuracy with a minimal communication cost of 0.84 MB and storage cost of 0.92 MB, compared to the substantial costs incurred by FedAvg (5.33 GB and 5.37 GB for communication and storage costs, respectively, an improvement of roughly 5,000x). The remaining experiments, showcasing model quality improvement, reward fairness, and simulation of collusion or heuristic behavior, were conducted using LeNet on EMNIST/MNIST as training and distillation datasets. We believe that these experiments sufficiently demonstrate the desired results and would yield comparable outcomes for other datasets or models.

D. Fair Effort-Reward Correlation
Fig. 5 illustrates the correlation between effort, in terms of both heterogeneous training efforts (top) and varying data quantities (bottom), and the subsequent reward distribution. PTSFD facilitates realistic FL scenarios where workers, due to hardware constraints, can employ various local model architectures and train them for different numbers of epochs. For instance, we emulated heterogeneity by training 10 workers using diverse early stopping criteria, where higher local training accuracy, indicative of greater effort, corresponded to better rewards. Concurrently, variations in private data quantity, while assuming consistent data quality, also influence contribution quality. Overall, superior training accuracy and a more extensive local dataset result in greater rewards.

E. Robustness of PTSFD
In order to ensure the desired quality of label predictions, the federation can adjust the parameter λ to scale the reward based on its underlying collateral (with λ = 1 consistently used in our experiments). Additionally, β can be set to modify the penalty for incorrect answers, thus tuning the confidence threshold required for individually rational workers (as described in (9)) to submit a prediction. The initially staked deposit acts as a safeguard against malicious behavior, as such actions can lead to losses.
Fig. 7. Average reward with varying ratio of colluding and heuristic workers under different penalty β. Federated Learning setting with 10 workers running LeNet on EMNIST digits for 10 epochs. For all experiments, 40000 data points from the MNIST data set were used as public dataset X pub .
In our experiment, workers can opt to withhold their reports if they lack confidence in their predictions. Fig. 6 demonstrates the reward variations with different penalty factors β across diverse confidence levels. For this experiment, the local training data was divided based on a Dirichlet distribution with parameter α = 0.1, mimicking scenarios where workers might not have access to uniform data. As a result, certain workers' local models could be ill-equipped to predict classes previously unavailable to them. These workers will only submit their predictions if their confidence in the most probable label surpasses a certain benchmark. Our findings indicate that, by adjusting β, PTSFD can effectively deter subpar contributions from contaminating the federated training process.
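The withholding behavior described above can be sketched as a simple confidence filter. The function name `report_or_abstain`, the fixed `threshold` argument, and the use of `None` to encode a withheld report are assumptions for illustration; how a rational worker would derive its threshold from β is governed by (9) and is not modelled here.

```python
def report_or_abstain(soft_labels, threshold):
    """Report the arg-max class only if its confidence exceeds the threshold."""
    reports = []
    for probs in soft_labels:
        best = max(range(len(probs)), key=probs.__getitem__)
        # None encodes a withheld report for this public sample.
        reports.append(best if probs[best] > threshold else None)
    return reports

# Hypothetical soft labels of one worker on three public samples with two classes.
soft = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
reports = report_or_abstain(soft, threshold=0.6)   # [0, None, 1]
```

Raising the threshold trades coverage for precision: fewer samples are labelled, but the submitted labels are those the local model is most certain about.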

F. Robustness in Case of Malicious Behavior
Building on the game-theoretic analysis detailed in Section II-I-D, we experimentally confirm our theoretical claims in a real FL setting. We have demonstrated that both heuristic behaviors (like bypassing local training to randomly report labels on a public dataset) and strategic tactics like collusion yield an expected reward of 1 − β. Fig. 7 (top) depicts the reward differences between colluding and honest workers. Colluding predictions are structured as
Conversely, Fig. 7 (bottom) contrasts the rewards of heuristic workers, who predict randomly on the public dataset, with those of diligent participants. Overall, the data indicate that genuine engagement results in the most significant rewards, even amidst prevalent malicious actions. In-depth cost considerations are reserved for future studies.
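The expected reward of 1 − β for uninformed reporting can be reproduced with exact arithmetic. The sketch assumes a per-sample payoff of 1/R(x) − β on a match and −β otherwise, and a peer whose label follows R independently of the report; both are modelling assumptions, and the function name and the example distributions are hypothetical.

```python
from fractions import Fraction

def expected_reward_independent(report_dist, R, beta):
    """Expected payoff when the report is drawn independently of the peer's label."""
    total = Fraction(0)
    for x, q in report_dist.items():        # q = probability of reporting x
        for y, r in R.items():              # r = probability that the peer reports y
            payoff = (1 / R[x] if x == y else 0) - beta
            total += q * r * payoff
    return total

R = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
beta = Fraction(1, 2)

uniform_guess = {c: Fraction(1, 3) for c in R}   # random label on every sample
constant_report = {0: Fraction(1)}               # always the same label

reward_random = expected_reward_independent(uniform_guess, R, beta)
reward_constant = expected_reward_independent(constant_report, R, beta)
```

Both strategies evaluate to exactly 1 − β under these assumptions, because the factor 1/R(x) cancels the probability R(x) of a chance match regardless of how the uninformed report is chosen.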

VI. CONCLUSION
In this work, we introduced a novel decentralized and reward-based 1-bit compressed Federated Distillation scheme on the blockchain, incorporating the Peer Truth Serum [16] specifically for Federated Distillation. The 1-bit compression ensures explicit comparability between contributions, a critical feature for automatically computing rewards on a smart contract atop a general-purpose blockchain system, where each worker is regarded as an equitable member of the federation. We have demonstrated that, in terms of storage on the blockchain and communication overhead, our framework is significantly more efficient than Federated Averaging. Additionally, the system not only offers flexibility in neural network architecture but also allows adaptation to various thresholds of contribution quality by adjusting the penalty term β. Furthermore, both theoretical insights and experimental evidence suggest our proposed mechanism is resilient to random reporting and collusion. We are confident that our findings will further the scalability of Federated Learning tasks in fully decentralized environments, where all entities have an equal interest in enhancing their models.
FedAvg. The possibility to apply binary soft-label quantization, i.e., Q b with b = 1, ensures three important properties for a decentralized CFD on blockchain, namely:
• It heavily reduces the amount of information processed in the aggregation process.
• It makes contributions by workers explicit and comparable.
• It supersedes the need for additional encryption like noise-inducing Differential Privacy or computationally heavy secure multiparty computation.
i ) without contributing, fixed participation costs c fix S are necessary. The initially staked deposit is used to pay contributing workers. Thus, c fix S = D before i − D after i describes the implicit costs for accessing Y pub aggr .

Algorithm 3: Commit and Reveal Protocol.

Fig. 4. Influence of X pub and X priv on model accuracy with corresponding Dirichlet α setting. These experiments were run using LeNet on MNIST/EMNIST respectively as training/distillation datasets.
and respectively as training/distillation data. Our Federation comprises 4, 10 and 25 workers for different experiments. The training data is distributed among workers according to a Dirichlet distribution with the Dirichlet parameter α. Fig. 4 (top row) illustrates the data distribution of 10 labels across 10 different workers for α = 100, α = 1, and α = 0.1.

Fig. 5. Effect of local training accuracy & local data size on reward using LeNet on EMNIST.

Fig. 4 illustrates the impact of the sizes of the local dataset |X priv | and the public dataset |X pub | on the accuracy for EMNIST/MNIST under both non-iid distributions (α = 0.1 & α = 1.0) and the iid distribution (α = 100) for 10 clients. A marked improvement in model quality is evident for every worker following their KD execution. While the size of the local training dataset plays a more substantial role than that of the distillation dataset, the significance of the latter should not be underestimated, particularly for non-iid distributions, where the gain can be attributed to the inclusion of additional data from the public dataset.

Leon Witt received the master's degree in mechanical engineering and business administration from RWTH Aachen, Germany, with exchange semesters in Zurich and Los Angeles, and the second master's degree in industrial engineering from Tsinghua University, in 2017. He is currently working toward the PhD degree with the Department of Computer Science and Technology, Tsinghua University in Beijing. His research interests lie at the intersection of federated artificial intelligence, blockchain and mechanism design.

Usama Zafar received the bachelor of engineering (BE) degree in software engineering from the National University of Sciences and Technology (NUST), in 2015, and the master's (MSc) degree in computer science from Tsinghua University, in 2019. He is currently working toward the PhD degree with the Department of Information Technology, Uppsala University, Sweden. His research interests include distributed machine learning, as well as security and privacy issues in federated machine learning.

TABLE II: COMMUNICATION AND STORAGE COSTS (IN MB) FOR ACHIEVING SPECIFIC ACCURACY IN FL ACROSS VARIOUS DATASETS, ARCHITECTURES, AND DATA HETEROGENEITY LEVELS