Skip to Main Content
In recent years there has been renewed interest in the problem of transferring bulk data (terabytes) utilizing commercial ISPs. The need to transfer bulk data arises in various scientific and industrial applications. Today, this data is moved using postal service in conjunction with hard drives and DVDs or special high performance dedicated networks. The key insight underlying the recent work was that many of the applications are delay- tolerant and hence the bulk data can be transferred at minimal cost, utilizing already paid-for off-peak bandwidth resulting from diurnal traffic patterns, using store-and-forward through intermediate storage nodes. In this paper we expand on this theme and consider the computational complexity of transferring data over a network whose links have time-varying capacities. We show that the general problem of finding a cost-optimal transfer of the bulk data can be solved in polynomial-time using minimum cost flow algorithms on a time-expanded version of the underlying network. Our solution involves graph transformations. We present additional transformations that enable the handling of half-duplex links (e.g. fiber-optic links) as well as node processing constraints (e.g. limitations on the processing power available for filtering or archiving). An important characteristic of our solution is the ability to handle nodes with storage. We consider nodes with storage that varies over time in terms of both capacity and cost. We show that our approach provably extends to cover the case of linear costs, providing polynomial- time algorithms. However, the flat-fee storage model is NP- complete and hence unlikely to be tractable in polynomial-time. Interestingly, with constrained storage, the optimal solutions may involve loops, i.e. the data may pass through the same node more than once on its way from the destination to the source along the optimal route. We use data from one of the world's leading ISPs and perform a comprehensive evaluation - - of our algorithm. We show that there exists a huge potential for cost savings in real-world networks with time-varying costs for both link capacities and node storage.