INSERT: In-Network Stateful End-to-End RDMA Telemetry | IEEE Conference Publication | IEEE Xplore


Abstract:

Remote Direct Memory Access (RDMA) has been widely adopted in modern data centers thanks to its high-throughput, low-latency data transfer capability and reduced CPU overhead. However, traditional network-flow-based monitoring interprets RDMA communication poorly and therefore yields little insight into it. In this paper, we present INSERT, an end-to-end RDMA telemetry system that provides seamless visibility into RDMA communication from the network layer all the way to the application layer. To this end, INSERT combines (i) eBPF-based transparent RDMA tracing on end hosts and (ii) stateful RDMA network telemetry on the programmable data plane. We implement RDMA network telemetry on programmable SmartNICs, where we address practical challenges of maintaining fine-grained state on massively parallel packet-processing pipelines. We demonstrate that INSERT can perform reasonably accurate telemetry at line rate for different types of RDMA traffic, even in the presence of out-of-order packets, and finally showcase two practical use cases that can benefit from it.
Date of Conference: 20-23 May 2024
Date Added to IEEE Xplore: 12 August 2024
Conference Location: Vancouver, BC, Canada

I. Introduction

RDMA-based transport has been adopted by a growing number of data center applications [1]–[3], as their increasingly demanding throughput and latency requirements can no longer be adequately met by traditional TCP/IP-based networking. RDMA over Converged Ethernet (RoCE) allows InfiniBand (IB) RDMA packets to be encapsulated in Ethernet/IP/UDP and carried over the commodity Ethernet fabric, facilitating the adoption of RDMA on existing data center networks. Given RDMA’s growing popularity, it is increasingly important to monitor RDMA traffic and the associated applications at the RDMA protocol level. End-to-end RDMA telemetry allows data center operators to quickly detect network congestion and migrate some of the associated RDMA applications to mitigate it. It can also help operators monitor the use or misuse of shared RDMA hardware resources by tenants. RDMA transaction monitoring can be a valuable input not only for optimizing application performance and troubleshooting performance anomalies such as tail latencies [4], but also for uncovering RDMA-based security attacks [5]. Despite these potential benefits, the existing monitoring facilities of modern data centers have not kept pace with RDMA’s popularity, and realizing these benefits faces several unique challenges, as described below.
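
To make the RoCE encapsulation mentioned above more concrete, the following is a minimal sketch of what monitoring "at the RDMA protocol level" involves at the packet level. It assumes RoCEv2, where the IB transport packet travels as the payload of a UDP datagram addressed to destination port 4791 and begins with the 12-byte IB Base Transport Header (BTH). The function name `parse_bth` and the dictionary keys are hypothetical and chosen only for illustration; this is not the INSERT implementation.

```python
import struct

ROCEV2_UDP_PORT = 4791  # UDP destination port used by RoCEv2
BTH_LEN = 12            # the IB Base Transport Header is 12 bytes long


def parse_bth(udp_payload: bytes) -> dict:
    """Decode the BTH at the start of a RoCEv2 UDP payload (illustrative helper)."""
    if len(udp_payload) < BTH_LEN:
        raise ValueError("payload too short to contain a BTH")
    # Layout: opcode (8 bits) | SE, M, PadCnt, TVer (8 bits) | P_Key (16 bits)
    #         | reserved (8 bits) + destination QP (24 bits)
    #         | AckReq (1 bit) + reserved (7 bits) + PSN (24 bits)
    opcode, flags, pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", udp_payload[:BTH_LEN]
    )
    return {
        "opcode": opcode,
        "solicited_event": bool(flags & 0x80),
        "pad_count": (flags >> 4) & 0x3,
        "partition_key": pkey,
        "dest_qp": qp_word & 0xFFFFFF,    # low 24 bits: destination queue pair
        "ack_req": bool(psn_word >> 31),  # top bit: acknowledgement requested
        "psn": psn_word & 0xFFFFFF,       # low 24 bits: packet sequence number
    }


# Usage: a hand-built header with opcode 0x04, dest QP 0x12, AckReq set, PSN 5.
sample = struct.pack("!BBHII", 0x04, 0x00, 0xFFFF, 0x000012, 0x80000005)
fields = parse_bth(sample)
```

The destination queue pair and packet sequence number are the kind of per-connection fields that a flow-level monitor keyed only on the IP/UDP 5-tuple would not see, which is one way to read the claim that flow-based monitoring interprets RDMA communication poorly.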
