Cost-Efficient VM Selection for Cloud-Based LLM Inference with KV Cache Offloading (IEEE Conference Publication)