ServerlessPD: Fast RDMA-Codesigned Disaggregated Prefill-Decoding for Serverless Inference of Large Language Models

ServerlessPD: Fast RDMA-Codesigned Disaggregated Prefill-Decoding for Serverless Inference of Large Language Models | IEEE Conference Publication | IEEE Xplore

IEEE Account

Purchase Details

Profile Information

Need Help?