Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with n end hosts, existing systems either require O(n^2) measurements, and thus lack scalability, or can only estimate latency but not congestion or failures. Our earlier extended abstract [Y. Chen, D. Bindel, and R. H. Katz, "Tomography-based overlay network monitoring," Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), 2003] briefly proposes an algebraic approach that selectively monitors k linearly independent paths that can fully describe all O(n^2) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Our scheme assumes only knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this paper, we improve, implement, and extensively evaluate such a monitoring system. We further make the following contributions: i) a scalability analysis indicating that for reasonably large n (e.g., 100), the growth of k is bounded as O(n log n); ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes; iii) measurement load balancing schemes; iv) topology measurement error handling; and v) the design and implementation of an adaptive streaming media system as a representative application. Both simulation and Internet experiments demonstrate that we obtain highly accurate path loss rate estimates while adapting to topology changes within seconds and handling topology errors.
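The algebraic idea of monitoring only linearly independent paths can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the common tomography model in which a path's log transmission rate is the sum of the log transmission rates of its links, so each path is a 0/1 row over the links. The function names `select_basis` and `estimate_loss`, the four-host star topology, and the link loss values are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction
import math


def select_basis(path_matrix):
    """Greedily pick linearly independent path rows (exact arithmetic).

    Returns (basis, combos): basis lists the indices of the paths to monitor,
    and combos[i] maps basis indices to the coefficients that reproduce path i.
    """
    reduced = []  # (pivot column, reduced row, that row as a combination of basis paths)
    basis, combos = [], []
    for i, row in enumerate(path_matrix):
        vec = [Fraction(v) for v in row]
        acc = {}  # combination of basis paths subtracted from this row so far
        for pivot, rvec, rcombo in reduced:
            f = vec[pivot]
            if f != 0:
                vec = [a - f * b for a, b in zip(vec, rvec)]
                for b, c in rcombo.items():
                    acc[b] = acc.get(b, Fraction(0)) + f * c
        pivot = next((j for j, v in enumerate(vec) if v != 0), None)
        if pivot is None:
            combos.append(acc)  # dependent path: fully explained by the basis
        else:
            scale = vec[pivot]
            rcombo = {b: -c / scale for b, c in acc.items()}
            rcombo[i] = Fraction(1) / scale
            reduced.append((pivot, [v / scale for v in vec], rcombo))
            basis.append(i)
            combos.append({i: Fraction(1)})
    return basis, combos


def estimate_loss(combos, measured_loss):
    """Estimate every path's loss rate from the measured basis paths only."""
    logs = {b: math.log(1.0 - p) for b, p in measured_loss.items()}
    return [
        1.0 - math.exp(sum(float(c) * logs[b] for b, c in combo.items()))
        for combo in combos
    ]


# Hypothetical star topology: hosts A, B, C, D each attach to a hub by one link.
# Rows are the paths AB, AC, AD, BC, BD, CD; columns are the four access links.
paths = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1],
]
basis, combos = select_basis(paths)

# Assumed link loss rates, used here only to synthesize the "measurements".
link_loss = [0.0, 0.1, 0.0, 0.2]
measured = {}
for b in basis:
    survive = 1.0
    for used, loss in zip(paths[b], link_loss):
        if used:
            survive *= 1.0 - loss
    measured[b] = 1.0 - survive

estimates = estimate_loss(combos, measured)
```

In this toy case six paths share four links, so only four paths need to be probed and the remaining two are inferred. A real deployment would also have to handle measurement noise and paths with 100% loss, which this sketch ignores.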