We consider the problem of data durability in low-bandwidth, large-scale distributed storage systems. Because bandwidth between replicas is limited, these systems suffer from long repair times after a hard disk crash, which makes them vulnerable to data loss when several replicas fail within a short period of time. Recent work has suggested that the probability of data loss can be predicted by modeling the number of live replicas with a Markov chain. This prediction can in turn be used to determine the number of replicas needed to keep the loss probability below a desired value. Previous authors have suggested that the model parameters can be estimated using an expression that is either constant or linear in the number of replicas. Our simulations, however, show that neither is correct: the parameter values grow sublinearly with the number of replicas. Moreover, we show that using a linear expression causes the probability of data loss to be underestimated, while the constant expression produces a significant overestimation. Finally, we provide an empirical expression that yields a good approximation of the sublinear parameter values. Our work can be viewed as a first step towards more accurate models for predicting the durability of this type of system.
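To make the modeling idea concrete, the following is a minimal sketch of a birth-death Markov chain over the number of live replicas. It is not the model or the empirical expression from this work. Several assumptions are ours: the function name `loss_probability`, the per-replica failure rate `fail_rate`, the base repair rate `repair_rate`, and in particular the form `repair_rate * i ** alpha` for the repair rate in state `i`, where `alpha = 0` stands in for a constant expression, `alpha = 1` for a linear one, and `0 < alpha < 1` for a sublinear one. The sketch computes the probability that a chain starting one replica short of full replication reaches zero live replicas before it recovers.

```python
def loss_probability(n, fail_rate, repair_rate, alpha):
    """Chance of hitting 0 live replicas before returning to n, starting at n - 1.

    Hypothetical model (an assumption, not taken from the source):
      state i        = number of live replicas, 0 <= i <= n
      i -> i - 1     at rate i * fail_rate (independent replica failures)
      i -> i + 1     at rate repair_rate * i ** alpha
    """
    if n < 2:
        raise ValueError("need at least 2 replicas")

    # rho[j] = product over i = 1..j of (failure rate in i) / (repair rate in i).
    # This is the standard gambler's-ruin quantity for birth-death chains.
    rho = [1.0]
    for i in range(1, n):
        death = i * fail_rate
        birth = repair_rate * i ** alpha
        rho.append(rho[-1] * death / birth)

    # P(hit 0 before n | start at k) = sum(rho[k:]) / sum(rho); here k = n - 1.
    return rho[-1] / sum(rho)


if __name__ == "__main__":
    # Illustrative rates only; they are not measurements from any system.
    for label, alpha in [("constant", 0.0), ("sublinear", 0.5), ("linear", 1.0)]:
        p = loss_probability(n=3, fail_rate=1.0, repair_rate=10.0, alpha=alpha)
        print(f"{label:9s} alpha={alpha}: {p:.6f}")
```

Under these assumed rates, the sublinear case falls between the linear case (lowest loss probability) and the constant case (highest), which matches the direction of the under- and overestimation described above. The actual magnitudes depend entirely on the chosen rates and on the assumed power-law form.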