Impact Statement:
The contribution of this study lies in proposing and implementing a solution for the partitioning, secure deployment, and distributed inference of LLMs. We achieve secure isolation and encrypted transmission of the models at the hardware level. This approach is directly relevant to practical deployment scenarios: it provides insights and guidance for the secure deployment and efficient inference of LLMs on cloud platforms and within clusters, and it offers practical solutions for research and applications in related fields.
Abstract:
In recent years, the emergence of large language models (LLMs) has profoundly transformed how we work and live. These models have shown tremendous potential in fields such as natural language processing, speech recognition, and recommendation systems, and they increasingly play crucial roles in applications such as human–computer interaction and intelligent customer service. Efficient inference solutions for LLMs in data centers have been researched extensively, with a focus on meeting users' quality-of-service (QoS) requirements. In this article, we focus on two additional requirements that responsible LLM inference should meet under QoS constraints: security throughout the model execution process and low maintenance overhead for the inference system. To meet them, we propose LLMaaS, a trusted model inference platform built on a serverless computing platform and aimed at providing inference as a service for LLMs. First, we design a trusted serverless computing platform based on softw...
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 6, Issue: 2, February 2025)