Scaling LLM Inference Architectures: A Performance Analysis for Chatbot Applications

IEEE Conference Publication | IEEE Xplore