
SMSS: Stateful Model Serving in Metaverse With Serverless Computing and GPU Sharing



Abstract:

With the rapid development of information technology, the concept of the Metaverse has swept the world and set off a new wave of industrial transformation. Constructing living and manufacturing scenes in the Metaverse requires the joint participation of scientists and engineers from many fields, with the "human" at the core. In the Metaverse, predicting human behavior and responses with deep learning models is valuable because the predictions enable more satisfactory services for participants. Deploying multi-stage machine learning inference models has therefore become a bottleneck in advancing the Metaverse. Thanks to its scalability and pay-as-you-go billing model, emerging serverless computing can effectively handle machine learning inference workloads. However, the statelessness of serverless computing and its lack of good GPU resource-sharing support make it difficult to deploy machine learning models directly on serverless platforms and realize these advantages. We therefore propose SMSS, a stateful model inference service deployed on a serverless computing platform that supports GPU sharing. Since serverless computing platforms do not support stateful workflow execution, SMSS adopts log-based workflow runtime support. We also design a two-layer GPU-sharing mechanism to fully exploit the potential of inter-model and intra-model GPU sharing. We evaluate the effectiveness of SMSS with real workloads. Our experimental results show that log-based stateful workflow runtime support ensures stateful task execution with low overhead while facilitating error localization and recovery, and that two-layer GPU sharing reduces the cold-start time of inference tasks by up to two orders of magnitude.
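The abstract's log-based workflow idea can be illustrated with a minimal sketch: each step's output is appended to a durable log before the next step runs, so a stateless (serverless) worker can replay the log after a crash instead of recomputing completed steps. The class and field names below are hypothetical illustrations, not SMSS's actual API.

```python
import json
import os

class LogBackedWorkflow:
    """Minimal sketch of a log-based stateful workflow runtime.

    Each step's output is persisted to an append-only log before the
    next step executes; on restart, the log is replayed to recover
    state, which also aids error localization (the last log entry
    pinpoints the step that failed). Illustrative only.
    """

    def __init__(self, log_path):
        self.log_path = log_path

    def _read_log(self):
        # Replay previously completed steps, if any.
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path) as f:
            return [json.loads(line) for line in f]

    def _append(self, entry):
        # Make the step's result durable before moving on.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def run(self, steps, initial_input):
        completed = self._read_log()
        state = completed[-1]["output"] if completed else initial_input
        for i, step in enumerate(steps):
            if i < len(completed):
                continue  # already durable in the log; skip on replay
            state = step(state)
            self._append({"step": i, "output": state})
        return state
```

A rerun of the same workflow replays the log and returns the stored result without re-executing any step, which is how a stateless function can resume a multi-stage inference pipeline mid-flight.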
Published in: IEEE Journal on Selected Areas in Communications ( Volume: 42, Issue: 3, March 2024)
Page(s): 799 - 811
Date of Publication: 21 December 2023


