SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture | IEEE Conference Publication | IEEE Xplore
Scheduled Maintenance: On Monday, 30 June, IEEE Xplore will undergo scheduled maintenance from 1:00-2:00 PM ET (1800-1900 UTC).
On Tuesday, 1 July, IEEE Xplore will undergo scheduled maintenance from 1:00-5:00 PM ET (1800-2200 UTC).
During these times, there may be intermittent impact on performance. We apologize for any inconvenience.

SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture


Abstract:

The advent of serverless computing has ushered in notable advancements in distributed machine learning, particularly within parameter server-based architectures. Yet, the...Show More

Abstract:

The advent of serverless computing has ushered in notable advancements in distributed machine learning, particularly within parameter server-based architectures. Yet, the integration of serverless features within peer-to-peer (P2P) distributed networks remains largely uncharted. In this paper, we introduce SPIRT, a fault-tolerant, reliable, scalable and secure serverless P2P ML training architecture. designed to bridge this existing gap. Capitalizing on the inherent robustness and reliability innate to P2P systems, we emphasized Intra-peer scalability for concurrent gradient to mitigate communication overhead from increased peer interactions. SPIRT, employs RedisAI for in-database operations, achieves an 82% reduction in model update times. This architecture showcases resilience against peer failures and adeptly manages the integration of new peers. Furthermore, SPIRT ensures secure communication between peers, enhancing the reliability of distributed machine learning tasks. Even in the face of Byzantine attacks, the system’s robust aggregation algorithms maintain high levels of accuracy. These findings illuminate the promising potential of serverless architectures in P2P distributed machine learning, offering a significant stride towards the development of more efficient, scalable, and resilient applications.
Date of Conference: 22-26 October 2023
Date Added to IEEE Xplore: 25 December 2023
ISBN Information:

ISSN Information:

Conference Location: Chiang Mai, Thailand

Contact IEEE to Subscribe

References

References is not available for this document.