Towards a Resource-Efficient Semi-Asynchronous Federated Learning for Heterogeneous Devices


Abstract:

Our proposed resource-efficient semi-asynchronous federated learning (RE-SAFL) approach presents a comprehensive and effective solution for training large models, such as Automatic Speech Recognition (ASR) models, in a distributed and semi-asynchronous manner. In our research, we highlight the importance of a resource-efficient work-allocation approach when deploying complex tasks such as ASR in real time on edge devices like mobile phones. To validate our approach, we conducted experiments on a real FL test-bed using Android-based mobile devices. By addressing the resource constraints of client devices and optimizing work allocation, our RE-SAFL framework opens up new possibilities for training large models in semi-asynchronous federated environments.
Date of Conference: 28 February 2024 - 02 March 2024
Date Added to IEEE Xplore: 05 April 2024
Conference Location: Chennai, India

I. Introduction

Federated learning (FL) is a popular distributed learning paradigm that has captured the interest of researchers globally. In a typical FL approach, the central server and the participating client devices are assumed to be fully synchronous [1]: the server waits for all clients to complete their local training before aggregating the weights. However, the availability of each client varies considerably over time due to system heterogeneity and internet connectivity. To address these challenges, researchers explored the potential of asynchronous updates, resulting in the emergence of asynchronous federated learning (AsynchFL) [2]. AsynchFL introduces flexibility and improves the scalability of the FL framework by enabling clients to update their local models independently and asynchronously. The transition from synchronous to asynchronous FL marks a significant advancement in the performance and capabilities of federated learning, but it also introduces new challenges, such as managing asynchrony, handling stale gradients, preserving convergence properties, and ensuring consistency in the global model.

In a fully AsynchFL method [2], [3], every client update triggers an update of the server model. This introduces a bias: faster clients update the global model more frequently, while slower clients submit updates computed against an earlier round of the global model, leading to staleness that degrades model convergence [4].

Several approaches mitigate this problem. In [2], [5], and [6], the authors introduce a staleness function that assigns lower weights to model updates from stale clients during server aggregation. In another work [7], the authors propose a semi-asynchronous approach named SAFA, which uses a lag-tolerance hyperparameter to quantify client staleness and discards local training results from clients deemed too stale under that threshold. FedBuff [8] instead has the server wait for updates from a minimum number of clients, storing their weights in a buffer before performing aggregation. Both SAFA and FedBuff operate as hybrids between the fully asynchronous and fully synchronous modes of operation. TimelyFL [9] introduces a heterogeneity-aware semi-AsynchFL method with adaptive partial training: it includes more available devices in the global aggregation without introducing staleness by assigning partial model training to clients with lower capacity. The buffered, staleness-weighted aggregation that underlies these semi-asynchronous designs is sketched below.
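To make these mechanisms concrete, the following minimal Python sketch combines the two ideas surveyed above: a FedBuff-style buffer that waits for a fixed number of client updates before aggregating [8], and a polynomial staleness function in the spirit of [2] that down-weights updates trained against older rounds of the global model. The buffer size, the server learning rate, and the exact form of the staleness function are illustrative assumptions, not the configuration used by RE-SAFL or by the cited systems.

import numpy as np

def staleness_weight(staleness, alpha=0.5):
    """Polynomial staleness function (assumed form): a fresh update gets
    weight 1.0; updates trained against older global rounds get less."""
    return (1.0 + staleness) ** (-alpha)

class BufferedAsyncServer:
    """Semi-asynchronous server: aggregates once `buffer_size` client
    updates have arrived, weighting each update by its staleness."""

    def __init__(self, initial_model, buffer_size=10, server_lr=1.0):
        self.model = np.asarray(initial_model, dtype=float).copy()
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self.round = 0      # current global round
        self.buffer = []    # list of (weight, model delta) pairs

    def handle_update(self, delta, trained_round):
        """Receive one client's model delta together with the global
        round that the client trained against."""
        staleness = self.round - trained_round
        self.buffer.append((staleness_weight(staleness), np.asarray(delta)))
        if len(self.buffer) >= self.buffer_size:
            self._aggregate()

    def _aggregate(self):
        # Weighted average of the buffered deltas; stale clients count less.
        total = sum(w for w, _ in self.buffer)
        step = sum(w * d for w, d in self.buffer) / total
        self.model += self.server_lr * step
        self.buffer.clear()
        self.round += 1     # advance the global round

# Toy usage: clients report deltas trained on global rounds of varying age.
server = BufferedAsyncServer(np.zeros(4), buffer_size=3)
rng = np.random.default_rng(0)
for trained_round in (0, 0, 0):     # round 0: all updates are fresh
    server.handle_update(rng.normal(size=4), trained_round)
for trained_round in (1, 0, 1):     # round 1: the middle update is stale
    server.handle_update(rng.normal(size=4), trained_round)
print(server.round, server.model)   # two aggregations have occurred

The buffer decouples the aggregation pace from the slowest client, while the staleness weight bounds the influence of outdated updates; together, these two ingredients are what place SAFA, FedBuff, and related schemes between the fully synchronous and fully asynchronous extremes.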
