Abstract:
RDMA has been widely deployed in production datacenters. Conventional wisdom holds that the intra-host network delivers stable, high performance. However, intra-host resources have stagnated relative to the rapidly evolving RDMA NIC (RNIC), so RNIC traffic may not receive sufficient intra-host resources when it contends with CPU-to-memory traffic. A line of recent work from large-scale production datacenter operators demonstrates the emergence of intra-host congestion and the associated performance collapse, which forces us to revisit the practice of intra-host congestion control. However, the ability to efficiently control RDMA intra-host networks is far less mature than that for inter-host networks, which brings challenges in congestion monitoring, intra-host resource allocation, and RNIC traffic adjustment. In this paper, we propose RDMA intra-Host Congestion Control (RHCC), which combines CPU-to-memory traffic congestion avoidance at sub-RTT granularity with proactive RNIC traffic adjustment. RHCC ensures fast congestion avoidance and can work with different inter-host congestion control methods. We implement RHCC on commodity servers and RNICs and conduct experiments to evaluate its performance. The results show that RHCC can increase network throughput by up to 2× and reduce latency by up to 1.4×.
Published in: IEEE Transactions on Networking (Early Access)