We consider reliable multicast in overlay networks where nodes have finite-size buffers and are subject to failures. Within this framework we address end-to-end reliability and throughput scalability. We propose a simple architecture that uses distinct point-to-point TCP connections between adjacent pairs of end-systems, together with a back-pressure control mechanism that regulates transfers across adjacent TCP connections and a backup buffering system that handles node failures. This architecture, which we call the one-to-many TCP overlay, is a natural extension of TCP to the one-to-many case: through the window back-pressure mechanism, it adapts the rate of group communication to local congestion in a decentralized way. Using theoretical analysis, experiments on the Internet, and large-scale network simulations, we show that this architecture provides end-to-end reliability and can tolerate multiple simultaneous node failures, provided the backup buffers are sized appropriately. We also show that, under the random perturbations caused by cross traffic described in the paper, the throughput of this reliable group communication is always bounded below by a positive constant that does not depend on the group size. This scalability result contrasts with known results on the non-scalability of IP-supported multicast for reliable group communication.
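The window back-pressure idea can be illustrated with a toy, round-based simulation. This is a minimal sketch under our own assumptions, not the paper's protocol. TCP window dynamics are collapsed into a rule that forwards one packet per round only when the downstream buffer has room. The binary-tree shape, the buffer size `capacity`, and the link-availability probability `p_link_up` (a stand-in for cross-traffic perturbations) are all hypothetical. The backup buffers used for failure recovery are omitted. The sketch shows two properties the architecture relies on. First, a node never overruns a child's finite buffer, so no packet is dropped. Second, every receiver gets an in-order, gap-free stream whose rate adapts to whichever downstream link is currently slowest.

```python
import random
from collections import deque


class Node:
    """Hypothetical overlay node with one finite FIFO buffer of packet ids."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.buffer = deque()
        self.children = []
        self.delivered = []  # packet ids handed to the local application, in order

    def has_room(self):
        return len(self.buffer) < self.capacity


def build_binary_tree(depth, capacity):
    """Return the nodes of a complete binary tree in breadth-first order."""
    nodes = [Node("n0", capacity)]
    frontier = [nodes[0]]
    for _level in range(depth):
        next_frontier = []
        for parent in frontier:
            for _child_index in range(2):
                child = Node(f"n{len(nodes)}", capacity)
                parent.children.append(child)
                nodes.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return nodes


def simulate(depth=3, capacity=4, rounds=200, p_link_up=0.7, seed=1):
    """Run synchronous rounds of back-pressure forwarding over the tree.

    Toy model (assumption): each link is independently usable with
    probability p_link_up per round, loosely mimicking cross traffic.
    """
    rng = random.Random(seed)
    nodes = build_binary_tree(depth, capacity)
    root = nodes[0]
    next_pkt = 0
    for _round in range(rounds):
        # The source refills its buffer with fresh packets up to capacity.
        while root.has_room():
            root.buffer.append(next_pkt)
            root.delivered.append(next_pkt)
            next_pkt += 1
        # Decide all moves on start-of-round state, so a packet moves one hop per round.
        movers = []
        for node in nodes:
            if not node.buffer:
                continue
            if not node.children:
                movers.append(node)  # leaf: the local application consumes a packet
            elif all(c.has_room() and rng.random() < p_link_up for c in node.children):
                movers.append(node)  # every child can accept a copy: forward it
            # Otherwise back-pressure applies: hold the packet, never drop it.
        for node in movers:
            pkt = node.buffer.popleft()
            for child in node.children:
                child.buffer.append(pkt)
                child.delivered.append(pkt)
        # Finite buffers are never exceeded.
        assert all(len(n.buffer) <= n.capacity for n in nodes)
    return nodes


if __name__ == "__main__":
    nodes = simulate()
    leaves = [n for n in nodes if not n.children]
    for leaf in leaves:
        # Each receiver sees a gap-free, in-order prefix of the source stream.
        assert leaf.delivered == list(range(len(leaf.delivered)))
    slowest = min(len(leaf.delivered) for leaf in leaves)
    print(f"slowest receiver got {slowest} packets in 200 rounds")
```

Because a node waits until all of its children can accept a copy, a slow or congested link throttles its upstream neighbours instead of causing losses. This is, in miniature, the decentralized rate adaptation described above; the paper's actual mechanism operates on TCP windows rather than on discrete rounds.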