TELKA: Twin-Enhanced Learning for Kubernetes Applications | IEEE Conference Publication | IEEE Xplore

Abstract:

Chaos engineering is the discipline of injecting computing and network faults, such as increased network latency and unavailability of computing nodes, into an IT system to help developers identify and tackle problems that could arise in a production environment. Several tools have emerged to ease the application of chaos engineering to complex IT systems, leveraging microservice and container-based applications deployed on Kubernetes. However, applying such tools requires several phases to be put into practice, from defining a steady state to establishing an effective response plan if something goes wrong. To ease the application of chaos engineering in improving the resilience of Kubernetes applications, this work presents a smart scheduler for Kubernetes called TELKA (Twin-Enhanced Learning for Kubernetes Applications), which combines chaos engineering, Digital Twin (DT), and Reinforcement Learning (RL) methodologies to mitigate the effects of computing and network faults. Instead of interacting directly with the physical Kubernetes application, TELKA learns by interacting with a digital twin, thus reducing the learning time and the operation costs related to the application of chaos engineering. Experimental results compare TELKA with other approaches to show its effectiveness in mitigating the adverse effects of injected faults.
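
To make the idea of learning a placement policy against a twin rather than a live cluster more concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the `ToyTwin` class, its three nodes, the latency figures, the fault penalty, and the choice of a tabular, one-step (bandit-style) Q-learning update with no bootstrapped next-state term. The abstract does not specify TELKA's state, action, or reward design.

```python
import random


class ToyTwin:
    """Hypothetical stand-in for a digital twin: three nodes, one fault at a time."""

    BASE_LATENCY_MS = [10.0, 12.0, 15.0]  # assumed per-node latency, illustrative only
    FAULT_PENALTY_MS = 200.0              # assumed extra latency on the faulty node

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.faulty = 0

    def inject_fault(self):
        # Chaos step: degrade one randomly chosen node and expose it as the state.
        self.faulty = self.rng.randrange(len(self.BASE_LATENCY_MS))
        return self.faulty

    def place(self, node):
        # Reward is the negative latency observed after placing a pod on `node`.
        latency = self.BASE_LATENCY_MS[node]
        if node == self.faulty:
            latency += self.FAULT_PENALTY_MS
        return -latency


def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; each episode is one fault and one placement."""
    twin = ToyTwin(seed)
    rng = random.Random(seed + 1)
    n = len(ToyTwin.BASE_LATENCY_MS)
    q = [[0.0] * n for _ in range(n)]  # q[state][action], state = index of faulty node
    for _ in range(episodes):
        state = twin.inject_fault()
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore
        else:
            action = max(range(n), key=lambda a: q[state][a])  # exploit
        reward = twin.place(action)
        # One-step update toward the observed reward (no next-state term).
        q[state][action] += alpha * (reward - q[state][action])
    return q


def greedy_policy(q):
    # For each fault state, the node with the highest learned value.
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy places the pod on the lowest-latency node that is not currently faulty. All trial and error happens inside `ToyTwin`, which mirrors the abstract's motivation for learning on a twin: no fault is ever injected into a running application during training.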
Date of Conference: 26-29 June 2024
Date Added to IEEE Xplore: 31 October 2024
Conference Location: Paris, France
Distributed Systems Research Group, University of Ferrara, Ferrara, Italy
University of Bologna, Bologna, Italy
University of Bologna, Bologna, Italy
University of Bologna, Bologna, Italy
Department of Mathematics and Computer Science, St. John’s University, Queens, NY, USA
University of Bologna, Bologna, Italy
Distributed Systems Research Group, University of Ferrara, Ferrara, Italy
University of Bologna, Bologna, Italy
Operational Innovation, IBM TJ Watson Research Center, NY, USA
Distributed Systems Research Group, University of Ferrara, Ferrara, Italy
Distributed Systems Research Group, University of Ferrara, Ferrara, Italy

