The flow size distribution is a useful metric for traffic modeling and management. Estimating it from sampled data, however, is problematic. Previous work has shown that flow sampling (FS) offers enormous statistical benefits over packet sampling, but its high resource requirements preclude its use in routers. We present dual sampling (DS), a two-parameter family which, to a large extent, provides FS-like statistical performance by approaching FS continuously, at a computational cost similar to that of packet sampling. Our work builds on a Fisher-information-based approach recently used to evaluate a number of sampling schemes, excluding FS, for TCP flows. We revise and extend this approach to make rigorous and fair comparisons between FS, DS, and other methods. We show that DS significantly outperforms other packet-based methods, including Sample and Hold, the closest packet-sampling-based competitor to FS. We describe a packet-sampling-based implementation of DS and analyze its key computational costs to show that router implementation is feasible. Our approach offers insights into numerous issues, including the notion of “flow quality” for understanding the relative performance of methods, and how and when employing sequence numbers is beneficial. Our work is theoretical, with some simulation support and case studies on Internet data.
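The following is a minimal Python sketch of the textbook forms of three schemes named above (packet sampling, flow sampling, and Sample and Hold) applied to a list of flow sizes. It is meant to show how each scheme distorts what is observed. It is an illustration under simplifying assumptions, not the authors' DS method or implementation. The function names are hypothetical. Each flow is processed in isolation, so packet interleaving across flows and the memory limits of Sample and Hold are ignored. DS itself is not modeled, because its exact definition is not given here.

```python
import random
from typing import List


def packet_sample(flow_sizes: List[int], p: float, rng: random.Random) -> List[int]:
    """Packet sampling: keep each packet independently with probability p.

    Returns the sampled size of every flow that has at least one sampled
    packet. Flows with no sampled packets are invisible to the observer.
    """
    observed = []
    for size in flow_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < p)  # Binomial(size, p)
        if kept > 0:
            observed.append(kept)
    return observed


def flow_sample(flow_sizes: List[int], q: float, rng: random.Random) -> List[int]:
    """Flow sampling: select each flow independently with probability q.

    A selected flow is observed in full, so its size is recorded exactly.
    """
    return [size for size in flow_sizes if rng.random() < q]


def sample_and_hold(flow_sizes: List[int], p: float, rng: random.Random) -> List[int]:
    """Sample and Hold: while a flow has no table entry, each of its packets
    creates one with probability p. Once the entry exists, every later packet
    of that flow is counted. Assumes an unbounded flow table (simplification).
    """
    observed = []
    for size in flow_sizes:
        for i in range(size):
            if rng.random() < p:
                # Packet i and all remaining packets are counted; the first i are missed.
                observed.append(size - i)
                break
    return observed


if __name__ == "__main__":
    rng = random.Random(1)
    flows = [1, 1, 2, 3, 5, 8, 40, 200]  # made-up flow sizes in packets
    print("packet sampling :", packet_sample(flows, 0.1, rng))
    print("flow sampling   :", flow_sample(flows, 0.1, rng))
    print("sample and hold :", sample_and_hold(flows, 0.1, rng))
```

Under packet sampling, a flow of k packets is thinned to a Binomial(k, p) count, and small flows often vanish entirely. This is one reason why recovering the original flow size distribution from packet-sampled data is hard. Flow sampling keeps each selected flow intact, but it must classify every packet by flow to decide whether the packet belongs to a selected flow. Sample and Hold sits in between: it misses only the packets that arrive before a flow's entry is created.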