The amount of data stored in enterprise data centers quadruples every 18 months. This trend presents a serious challenge for backup management and sets new requirements for the performance and efficiency of traditional backup and archival tools. In this work, we discuss potential performance shortcomings of existing backup solutions. During a backup session, a predefined set of objects (client filesystems) must be backed up. Traditionally, no information is provided on the expected duration and throughput requirements of the different backup jobs. This may lead to an inefficient job schedule and a longer backup session. We analyze historic data on backup processing from eight backup servers in HP Labs, and introduce two additional metrics associated with each backup job, called job duration and job throughput. Our goal is to use this additional information for the automated design of a backup schedule that minimizes the overall completion time for a given set of backup jobs. This problem can be formulated as a resource-constrained scheduling problem, which is known to be NP-complete. Rather than solving it exactly, we propose an efficient heuristic, called FlexLBF, for building an optimized job schedule. The new job schedule significantly reduces the backup time (by up to 50%) and the resource usage (by up to 2-3 times). Moreover, we design a simulation-based tool that automates parameter tuning, sparing system administrators manual configuration while helping them achieve nearly optimal performance.
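To illustrate how per-job duration and throughput estimates can drive a schedule, the following minimal sketch applies a generic longest-job-first greedy placement under a per-drive throughput cap. It is not the FlexLBF heuristic itself, whose details are not given in this abstract. The names `Job`, `greedy_schedule`, `num_drives`, and `max_throughput`, the resource model (jobs sharing a drive as long as their summed throughput stays under a cap), and the sample numbers are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    duration: float    # expected run time (e.g. hours), estimated from historic data
    throughput: float  # expected average data rate (e.g. MB/s), from historic data


def greedy_schedule(jobs, num_drives, max_throughput):
    """Longest-job-first greedy placement (illustrative assumption, not FlexLBF).

    Jobs are taken in order of decreasing duration. Each job goes to the
    least-loaded drive if its throughput still fits under max_throughput;
    otherwise simulated time advances to the next job completion.
    Returns (makespan, {job name: (drive index, start time)}).
    """
    pending = sorted(jobs, key=lambda j: j.duration, reverse=True)
    load = [0.0] * num_drives   # summed throughput of jobs running on each drive
    running = []                # min-heap of (finish_time, seq, drive, throughput)
    starts = {}
    now = 0.0
    makespan = 0.0
    seq = 0                     # tie-breaker so heap entries always compare

    while pending:
        job = pending[0]
        if job.throughput > max_throughput:
            raise ValueError(f"job {job.name} can never fit on a drive")
        drive = min(range(num_drives), key=lambda d: load[d])
        if load[drive] + job.throughput <= max_throughput:
            pending.pop(0)
            load[drive] += job.throughput
            finish = now + job.duration
            heapq.heappush(running, (finish, seq, drive, job.throughput))
            seq += 1
            starts[job.name] = (drive, now)
            makespan = max(makespan, finish)
        else:
            # No drive has room: wait for the earliest-finishing job to end.
            finish, _, freed_drive, freed = heapq.heappop(running)
            now = finish
            load[freed_drive] -= freed
            if not running:
                load = [0.0] * num_drives  # clear any floating-point residue

    return makespan, starts


# Hypothetical sample data: four jobs, two drives, 80 MB/s cap per drive.
jobs = [
    Job("A", 4.0, 60.0),
    Job("B", 3.0, 60.0),
    Job("C", 2.0, 40.0),
    Job("D", 1.0, 40.0),
]
makespan, starts = greedy_schedule(jobs, num_drives=2, max_throughput=80.0)
print(makespan, starts)
```

In this toy run, jobs A and B start immediately on separate drives, while C and D must wait until B finishes and then share the freed drive. Starting the longest jobs first is a common way to keep a long job from being left to run alone at the end of a session.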