This work presents an analytical queuing network model for bulk data transfers over long distances. Large-scale scientific applications such as climate modeling and high-energy physics generate huge amounts of data that need to be replicated across long distances for collaboration purposes. In addition, businesses need to copy critical data to remote sites for disaster recovery. The performance of these applications depends on the latency of the link, which is dictated by the physical distance between the sites. The model presented in this paper captures the most prominent characteristics of remote data transfer applications and simplifies performance prediction. This study shows that the overall throughput is mostly determined by the latency of the link, the bandwidth of the link, the number of round trips required for each I/O operation, and the number of simultaneously active I/O operations (the asynchronous window size). Through laboratory and field experiments, we show that the model approximates the performance of remote data transfers within a small error margin.
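The four quantities identified above can be combined in a simple pipelining estimate: if each I/O operation requires `round_trips` network round trips and `window` operations are in flight concurrently, the achievable rate is the pipelined rate capped by the link bandwidth. The sketch below is a generic bandwidth-delay calculation under those assumptions, not the queuing network model derived in the paper; all function and parameter names are illustrative.

```python
def estimate_throughput(latency_s, bandwidth_bps, round_trips, window, block_bytes):
    """Rough throughput estimate (bits/s) for pipelined remote I/O.

    latency_s     -- one-way link latency in seconds
    bandwidth_bps -- link bandwidth in bits per second
    round_trips   -- round trips needed to complete one I/O operation
    window        -- simultaneously active I/O operations (async window size)
    block_bytes   -- payload moved per I/O operation
    """
    rtt = 2 * latency_s                    # one round trip = twice the one-way latency
    time_per_op = round_trips * rtt        # protocol chatter dominates each operation
    pipelined_bps = window * block_bytes * 8 / time_per_op
    return min(bandwidth_bps, pipelined_bps)   # cannot exceed the link bandwidth

# Example: 50 ms one-way latency, 1 Gb/s link, 2 round trips per 1 MiB block,
# and 8 concurrent operations.
rate = estimate_throughput(0.050, 1e9, 2, 8, 1 << 20)
```

Note how the estimate makes the abstract's claim concrete: at high latency the link bandwidth barely matters, and raising the asynchronous window size or cutting round trips per operation scales throughput almost linearly until the link is saturated.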